Intel RealSense 3D cameras bring hand and finger tracking to home PCs and come with an easy-to-use SDK for developers, which makes them a great new input method for both VR games and screen-based games.
Ideally, we'd like to be able to make games that don't require the player to touch any kind of peripheral at any point. But, as with the Kinect and the EyeToy, we run into problems when we face one common task: entering text. Even inputting a character's name without using a keyboard can be tedious.
In this post, I'll share what I've learned about the best (and worst!) ways to let players enter text via gesture alone, and show you how to set up the Intel RealSense SDK in Unity, so you can try it in your own games.
(Note that I'm focusing on ASCII input here, and specifically the English alphabet. Alternative alphabets and modes of input, like stenography, shorthand, and kanji, may be served better in other ways.)
Input Methods We Can Improve On
There are other approaches for peripheral-free text input out there, but they have flaws. We'd like to come up with an approach that improves on each of them.
Virtual Keyboard
Keyboards are the gold standard for text entry, so what about just mimicking typing on a keyboard in mid-air or on a flat surface?
Sadly, the lack of tactile feedback is more important than it might seem at first glance. Touch-typing is impossible in this situation, because kinesthesia is too inaccurate to act on inertial motion alone. The physical and responsive touch of a keyboard gives the typist second-sense awareness of finger position and acts as an ongoing error-correction mechanism. Without that, one’s fingers tend to drift off target, and slight positioning errors compound quickly, requiring a "reset" to home keys.
Gestural Alphabet
Our first experiment with RealSense for text input was an effort to recognize American Sign Language finger-spelling. We discovered that there are several difficulties that make this an impractical choice.
One problem is speed. Proficient finger-spellers can flash about two letters per second or 120 letters per minute. At an average of five letters per word, that’s 24 WPM, which is considerably below even the average typist's speed of 40 WPM—a good finger-speller is about half as fast as a so-so keyboarder.
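The arithmetic behind that comparison is easy to check. Here is a quick sketch; the function name is made up for illustration, and the figures are the ones quoted above:

```python
def words_per_minute(letters_per_second, letters_per_word=5):
    """Convert a letter rate into words per minute."""
    return letters_per_second * 60 / letters_per_word


finger_spelling_wpm = words_per_minute(2)  # two letters per second
typing_wpm = 40                            # average typist, as quoted above

print(finger_spelling_wpm)               # words per minute for finger-spelling
print(finger_spelling_wpm / typing_wpm)  # fraction of average typing speed
```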
Another problem is the need for the user to learn a new character set. One of the less-than-obvious values of a standard keyboard is that it comports to all of the other tools we use to write. The printed T we learn in kindergarten is the same T seen on the keyboard. Asking users to learn a new character set just to enter text in a game is a no-go.
Joystick and Spatial Input
Game consoles already regularly require text input for credit card numbers, passwords, character names, and other customizable values. The typical input method is to display a virtual keyboard on the screen and allow a spatially sensitive input to "tap" a given key.
There are many iterations of this concept. Many use a joystick to move a cursor. Others may use a hand-tracking technology like Intel RealSense or Kinect to do essentially the same thing (with a wave of the hand acting as a tap of the key). Stephen Hawking used a conceptually similar single-switch system, in which a cursor scanned across an on-screen keyboard and a small cheek movement selected the highlighted key. But all of these systems create a worst-of-both-worlds scenario where a single-point spatial input, essentially a mouse pointer, is used to work a multi-touch device; it's like using a pencil eraser to type one letter at a time.
Some interesting work has been done to make joystick text input faster and more flexible by people like Doug Naimo at Triggerfinger, but the input speed still falls short of regular typing by a large margin, and is really only valuable when better or faster input methods are unavailable.
My Chosen Input Method
All this talk about the weaknesses of alternate text-input methods implies that the humble keyboard has several strengths that are not easily replaced or improved upon. How can these demonstrated strengths be conserved in a text entry system that requires no hands-on peripherals? I believe the answer lies in two critical observations:
- The ability to use as many as 10 fingers is impossible to meet or beat with any single-point system.
- The tight, layered, and customizable layout of the keyboard is remarkably efficient—but it is a 2D design, and could be expanded by incorporating a third dimension.
With all of this in mind, I came up with a simple, two-handed gestural system I call "facekeys". Here's how it works.
Starting Simple: A Calculator
Before getting to a full keyboard, let's start with a numpad—well, a simple calculator. We need ten digits (0 to 9) and five operators (plus, minus, divide, multiply, and equals). Aiming for using all ten fingers, we can break these into two five-digit groups, and represent these on screen as two square-footed pyramids, with the operators as another pyramid:
Each finger corresponds to one face of each pyramid. Each face can be thought of as a "key" on a keyboard, so I call them facekeys. The left hand enters digits 1 to 5 by flexing fingers individually, while the right enters digits 6 to 0. Flexing the same finger on both hands simultaneously—both ring fingers, say—actuates a facekey on the operator pyramid.
Non-digit (but essential) functions include a left-fist to write the displayed value to memory, a right-fist to read (and clear) memory, and two closed fists to clear the decks and start a new calculation.
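To make the mapping concrete, here is a minimal sketch of how one "stroke" could be decoded into a key. It assumes some hand tracker has already told us, for each hand, either that it is a fist or which fingers are flexed. The names `decode_stroke`, `FIST`, and the finger-to-key assignments are all hypothetical; the scheme above doesn't fix which finger gets which digit or operator.

```python
FINGERS = ("thumb", "index", "middle", "ring", "pinky")
FIST = "fist"

# Assumed assignments: which finger gets which key is an illustrative choice.
LEFT_DIGITS = dict(zip(FINGERS, "12345"))
RIGHT_DIGITS = dict(zip(FINGERS, "67890"))
OPERATORS = dict(zip(FINGERS, "+-/*="))


def decode_stroke(left=frozenset(), right=frozenset()):
    """Map one stroke to a key name, or None if it is not recognised.

    Each argument is either FIST or a set of flexed finger names.
    """
    if left == FIST and right == FIST:
        return "CLEAR"            # two fists: start a new calculation
    if left == FIST and not right:
        return "MEM_WRITE"        # left fist: store the displayed value
    if right == FIST and not left:
        return "MEM_READ"         # right fist: recall (and clear) memory
    if FIST in (left, right):
        return None               # fist plus fingers: not defined here
    left, right = set(left), set(right)
    if len(left) == 1 and left == right:
        return OPERATORS[next(iter(left))]   # same finger on both hands
    if len(left) == 1 and not right:
        return LEFT_DIGITS[next(iter(left))]
    if len(right) == 1 and not left:
        return RIGHT_DIGITS[next(iter(right))]
    return None


print(decode_stroke(left={"index"}))
print(decode_stroke(left={"ring"}, right={"ring"}))
```

Returning `None` for anything ambiguous (two different fingers at once, say) keeps accidental strokes from turning into keypresses.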
When I first tested this out, I assumed that users holding their hands palm-downward (as if typing on a keyboard) would be most comfortable. However, it turns out that a position with palms facing inward is more comfortable, and allows for both longer use and more speed:
It also turns out that visual feedback from the screen is very important, especially when learning. We can provide this via a familiar calculator-style digit readout, but it's also good to make the pyramids themselves rotate and animate with each stroke, to establish and reinforce the connection between a finger and its corresponding facekey.
This system is comfortable and easily learned, and is also easily extensible. For instance, the lack of a decimal point and a backspace key gets frustrating quickly, but these inputs are easily accommodated with minor modifications. First, a right-handed wave can act as a backspace. Second, the equals facekey can be replaced with a decimal point for entry, and a "clap" gesture can become the equals operator, which has the delightful result of making calculations rhythmic and modestly fun.
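Here is one way the calculator logic behind those gestures might look, as a rough sketch rather than the code from my demo. It assumes gestures have already been turned into key names: digits, `"."`, the four arithmetic operators, `"BACKSPACE"` for the wave, and `"EQUALS"` for the clap. The class and key names are invented for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class GestureCalculator:
    """Tiny four-function calculator driven by key names."""

    def __init__(self):
        self.entry = ""       # characters of the number being entered
        self.total = None     # running result
        self.pending = None   # operator waiting for its second operand

    def _commit(self):
        """Fold the current entry into the running total."""
        if self.entry in ("", "."):
            return
        value = float(self.entry)
        self.entry = ""
        if self.total is None or self.pending is None:
            self.total = value
        else:
            # Dividing by zero is not handled here and will raise.
            self.total = OPS[self.pending](self.total, value)

    def press(self, key):
        if key in OPS:
            self._commit()
            self.pending = key
        elif key == "EQUALS":          # clap
            self._commit()
            self.pending = None
        elif key == "BACKSPACE":       # right-handed wave
            self.entry = self.entry[:-1]
        elif key == ".":
            if "." not in self.entry:
                self.entry += "."
        elif key.isdigit():
            self.entry += key
        return self.display()

    def display(self):
        if self.entry:
            return self.entry
        return "0" if self.total is None else f"{self.total:g}"


calc = GestureCalculator()
for key in ["1", "2", "BACKSPACE", ".", "5", "+", "2", "EQUALS"]:
    shown = calc.press(key)
print(shown)
```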
Extending This to a Full Keyboard
A peripheral-free calculator is one thing, but a replacement for a typical keyboard with 80+ keys is quite another. There are, however, some very simple and practical ways to continue development around this keyfacing concept.
The standard keyboard is arranged in four rows of keys plus a spacebar: numbers and punctuation on top with three rows of letters beneath. Each row is defined by its position in space, and we can use that concept here.
Instead of moving your hands toward or away from a fixed point like the camera, a more flexible method is to make the system self-referential. We can let the player define a comfortable distance between their palms; the game will then set this distance internally as 1-Delta. The equivalent of reaching to different rows on the keyboard is then just moving hands closer or farther apart from one another: a 2-Delta distance accesses "second row" keys, and 3-Delta reaches third row keys.
The "home keys" are set to this 1-Delta distance, and keyfacing proceeds by mapping letters and other characters to a series of pyramids that sequentially cover the entire alphabet. Experimentation suggests 3-4 comfortable and easily reproducible Deltas exist between hands that are touching and shoulder-width. Skilled users may find many more, but the inherent inaccuracy of normal kinesthesia is likely to be a ceiling to this factor.
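As a rough sketch of the Delta idea, the function below snaps a measured palm-to-palm distance to the nearest whole multiple of the player's calibrated Delta. The function name, the `tolerance` dead zone, and the use of metres are my own assumptions for illustration; the dead zone is there so that hands hovering between two rows select nothing rather than flickering between them.

```python
def row_for_distance(palm_distance, delta, max_rows=4, tolerance=0.3):
    """Return the 1-based row for a palm distance, or None if between rows.

    palm_distance and delta share the same unit (metres assumed here);
    delta is the comfortable distance the player set during calibration.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    ratio = palm_distance / delta
    row = max(1, min(max_rows, round(ratio)))
    if abs(ratio - row) > tolerance:
        return None   # too far from any row centre to be sure
    return row


delta = 0.20  # calibrated 1-Delta distance
for distance in (0.20, 0.41, 0.30, 0.62):
    print(distance, row_for_distance(distance, delta))
```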
Simple gestures provide another axis of expansion. The keyboard's Shift key, for instance, transforms each key into two, and the Ctrl and Alt keys extend that even more. Simple, single-handed gestures would create exactly the same access to key layers while maintaining speed and flexibility. For instance, a fist could be the Shift key. A "gun" hand may access editing commands or any number of combinations. By using single-handed gestures to modify the keyfaces, the user can access different characters.
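Put together, a keypress becomes a lookup on row, hand, finger, and an optional modifier gesture made by the other hand. The sketch below uses a purely alphabetical layout and a couple of made-up editing commands; none of these assignments are a recommendation, they are just placeholders to show the layering.

```python
FINGERS = ("thumb", "index", "middle", "ring", "pinky")

# Hypothetical layout: ten keys per row, left hand first, then right hand.
ROWS = {
    1: "abcdefghij",
    2: "klmnopqrst",
    3: "uvwxyz.,?!",
}

# Hypothetical editing layer reached with a "gun" hand.
EDIT_COMMANDS = {"a": "SELECT_ALL", "c": "COPY", "d": "DELETE"}


def lookup(row, hand, finger, modifier=None):
    """Return the character or command for one facekey stroke."""
    slot = FINGERS.index(finger) + (0 if hand == "left" else 5)
    base = ROWS[row][slot]
    if modifier is None:
        return base
    if modifier == "fist":        # acts like Shift
        return base.upper()
    if modifier == "gun":         # editing layer; None if nothing is mapped
        return EDIT_COMMANDS.get(base)
    raise ValueError(f"unknown modifier: {modifier}")


print(lookup(1, "left", "thumb"))
print(lookup(2, "right", "pinky", modifier="fist"))
print(lookup(1, "left", "middle", modifier="gun"))
```

Because the modifier occupies one hand, each modified layer only exposes the other hand's five keys per row in this sketch.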
Ready to try it yourself? First, you’ll need to install the Intel RealSense SDK and set up the plugin for Unity.
Crash Course in Unity and RealSense
Here's a quick walkthrough explaining how to install and set up the RealSense SDK and Unity. We'll make a simple test demo that changes an object's size based on the user's hand movement.
1. What You'll Need
You will need:
- An Intel RealSense 3D camera (either embedded in a device or an external camera)
- Unity Professional 4.0 or higher
- The free Intel RealSense SDK
You may also wish to use this free plugin that lets you write Unity code in Visual Studio; it's up to you.
I'm going to use the spaceship from Unity's free Space Shooter project, but you can just use a simple cube or any other object if you prefer.
2. Importing the RealSense Unity Toolkit
The package containing the Unity Toolkit for Intel RealSense technology contains everything you need for manipulating game objects. The Unity package is located in the \RSSDK\Framework\Unity\ folder. If you installed the RealSense SDK in the default location, the RSSDK folder will be in C:\Program Files (x86)\ (on Windows).
You can import the Unity Toolkit as you would any package. When doing so, you have the options to pick and choose what you want. For the purpose of this tutorial, we will use the defaults and import everything.
As you can see in the following image, there are now several new folders under the Assets folder.
- Plugins and Plugins.Managed contain DLLs required for using the Intel RealSense SDK.
- RSUnityToolkit is the folder that contains all the scripts and assets for running the toolkit.
We won’t go into what all the folders are here; I’ll leave that for you to investigate!
3. Setting Up the Scene
Add the ship to the scene.
Next, add a directional light to give the ship some light so you can see it better.
It should look like this:
4. Adding the Scale Action
To add scaling capabilities to the game object, we have to add the ScaleAction script. The ScaleAction script is under the RSUnityToolkit folder, in the Actions subfolder.
Simply grab the script and drag and drop it directly onto the ship in the Scene view. You will now be able to see the ScaleAction script parameters in the Inspector.
5. Setting the Parameters
Starting with the Start Event, expand the arrow to show the default trigger. In this case, we don't want to use Gesture Detected; we want to use Hand Detected.
Right-click on Gesture Detected and select Remove. Then click the Start Event’s Add button and select Hand Detected. Under Which Hand, select ACCESS_ORDER_RIGHT_HANDS.
Now we'll set the Stop Event. Expand the Stop Event and remove the Gesture Lost row. Next, click the Stop Event’s Add button and choose Hand Lost. Next to Which Hand, select ACCESS_ORDER_RIGHT_HANDS.
We won’t have to change the Scale Trigger because there is only one option for this anyway. We'll just use the default.
6. Trying It Out
That's it! Save your scene, save your project, and run it; you'll be able to resize the ship on screen with a gesture.
Now It's Your Turn!
We've discussed the ideas behind inputting text without touching a peripheral, and you've seen how to get started with the RealSense SDK in Unity. Next, it's up to you. Take what you've learned and experiment!
First, get your demo interpreting different characters based on which fingers you move. Next, reflect this on the screen in an appropriate way (you don't have to use my pyramid method!). Then, take it further—what about trying a different hand position, like with palms facing the camera, or a different input motion, like twisting a Rubik's cube?
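As a starting point for that first step, here is a small sketch of turning a stream of per-finger readings into discrete "flex" events. It assumes your tracker gives you some per-frame openness value for each finger, from 0 (fully folded) to 100 (fully extended); the class name and the two thresholds are made up and will need tuning. Using separate press and release thresholds means a finger hovering near the boundary doesn't fire repeatedly.

```python
class FlexDetector:
    """Detects the moment a single finger flexes, with hysteresis."""

    def __init__(self, press_below=40, release_above=60):
        self.press_below = press_below      # openness that counts as flexed
        self.release_above = release_above  # openness that re-arms the key
        self.flexed = False

    def update(self, openness):
        """Feed one sample; return True only on the frame a flex begins."""
        if not self.flexed and openness < self.press_below:
            self.flexed = True
            return True
        if self.flexed and openness > self.release_above:
            self.flexed = False
        return False


detector = FlexDetector()
samples = [90, 70, 35, 45, 38, 65, 30]
events = [detector.update(s) for s in samples]
print(events)
```

You would keep one detector per finger per hand, then feed the resulting events into whatever key mapping you choose.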