HoloLens

Augmented Reality is one of the most exciting new paradigms for computing, with the potential to be useful in a number of contexts. Microsoft’s HoloLens is not a consumer-level product, but it does provide a platform for developing AR concepts for future platforms; perhaps it is the 70s Xerox Alto to the 1984 Macintosh. Technically, the SLAM implementation works incredibly well: holograms can be placed in an environment, appear stable, and can be walked around and observed. It also has world-scale anchoring, so objects maintain their position between sessions. The most apparent technical issue with the HoloLens is the field of view, which will be discussed later.

The HoloLens' limited field of view (source)

The other difficulty I faced while prototyping for the HoloLens was the set of restrictions enforced by the SDK: while you can access the spatial mapping of your environment and the RGB camera, you cannot access the raw depth camera data. This prevents using the device as a head-mounted Kinect or Leap Motion.

The implication of this is that you are essentially restricted to Microsoft’s “Gaze, Gesture, Voice” (GGV) input system, so custom gestures are out unless you can implement them with just the RGB camera. Microsoft also does not yet provide any mechanism for training your own gestures, as is possible with the Kinect.

Perhaps Microsoft will open up the input system in the future, but at the moment it feels like a real mistake to have such restrictions for a device that is supposed to be for developing the future of a new medium.

Number input

For this prototype I set two criteria, the first being to design for precise number input. With hand tracking and gestural input, it is fairly easy to provide rough input, for example adjusting a slider or directly manipulating an object. This is the kind of interaction that is common in demo materials.

Scaling in AR

While such interactions are fine for roughly finding values or making adjustments, sometimes exact numbers are required. There are fewer concepts for how to do this in AR; usually it is just an on-screen keyboard.

The second criterion was not to use an external device, whether that’s a gestural input glove or a keyboard to just type in values. While it may be true that a hardware keyboard will ultimately always be faster than an “in AR” input method, and would make sense for desk work, it requires additional equipment that might not always be available, and uses none of the affordances of Augmented Reality. Using a device also means that one or both hands are occupied and can’t be used for AR interactions without putting down the input device.

Concepts

I started by sketching out a number of potential interactions, which used a mix of interaction methods, both GGV and others.

Concepts 1

Concepts 2

Because of the previously mentioned constraints, I decided to first focus on the concepts that used the GGV system, and then create concept videos for some of the others that would be possible with greater hardware access.

Prototypes

The first prototype built was the standard number pad input, which has the advantage of being familiar (although phone keypads and calculators have different opinions on the layout).

Calculator vs Phone, requiring two mental models with little difference in speed or accuracy since the 1950s

The box at the bottom is the display

The second prototype re-arranged the numbers into a circle. The initial thought came from the tennis principle of always returning to the centre of the court; in this arrangement every key is equidistant from the centre. The positions of the digits piggyback on the layout of the analogue clock to provide familiarity.

HoloLens Prototype 2
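As a rough illustration of the circular arrangement (a sketch, not the code from the prototype), the key positions can be computed by treating each digit as an hour on a clock face. The function name `clock_keypad_layout`, the `radius` parameter, and the choice to put 0 at the 12 o'clock position are all assumptions made for this example.

```python
import math


def clock_keypad_layout(radius=1.0):
    """Return {digit: (x, y)} with each digit placed at a clock-hour position.

    The centre of the keypad is (0, 0), so every key is the same
    distance (radius) from the centre.
    """
    layout = {}
    for digit in range(10):
        # Assumption: digits 1-9 sit at their own hour, 0 sits at 12 o'clock.
        hour = 12 if digit == 0 else digit
        # 12 o'clock points up (90 degrees); each hour is 30 degrees clockwise.
        angle = math.radians(90 - hour * 30)
        layout[digit] = (radius * math.cos(angle), radius * math.sin(angle))
    return layout
```

With this layout, moving from the centre to any key covers the same distance, which is the "return to the centre of the court" idea expressed in coordinates.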

Using this interface didn’t provide a real difference in ease of input. If there was any improvement, it was ‘within paradigm’; my inclination is that any interface within the ‘gaze and click’ paradigm will have a similar level of ease or difficulty.

The final prototype used Microsoft’s voice recognition as input. It would recognise the spoken words for each digit, as well as ‘dot’, ‘point’ and ‘decimal’ for the decimal point, ‘negative’, ‘minus’, and ‘positive’ to switch between positive and negative values, and ‘clear’ to delete everything.

The voice prototype is just an output box that is activated when looked at.
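The word handling described for the voice prototype can be sketched as a small state machine that takes one recognised word at a time. This is an illustration in Python rather than the prototype's own code, and it does not use Microsoft's speech API; the class name `SpokenNumberEntry` and its behaviour for edge cases (ignoring a second decimal point and any unknown word) are assumptions.

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
DECIMAL_WORDS = {"dot", "point", "decimal"}
NEGATIVE_WORDS = {"negative", "minus"}


class SpokenNumberEntry:
    """Builds the text of a number from recognised words, one at a time."""

    def __init__(self):
        self.digits = ""       # digits and at most one decimal point
        self.negative = False  # sign is stored separately from the digits

    @property
    def text(self):
        """The string that would be shown in the output box."""
        return ("-" if self.negative else "") + self.digits

    def hear(self, word):
        """Apply one recognised word and return the updated display text."""
        word = word.strip().lower()
        if word in DIGIT_WORDS:
            self.digits += DIGIT_WORDS[word]
        elif word in DECIMAL_WORDS:
            # Assumption: a second decimal point is silently ignored.
            if "." not in self.digits:
                self.digits += "."
        elif word in NEGATIVE_WORDS:
            self.negative = True
        elif word == "positive":
            self.negative = False
        elif word == "clear":
            self.digits = ""
            self.negative = False
        # Assumption: any other word is ignored.
        return self.text
```

For example, feeding it the words "minus", "three", "point", "one", "four" in turn leaves `text` as `-3.14`, and a following "positive" changes it to `3.14`.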

Unlike the previous prototype, this was a paradigm change. While it wasn’t necessarily faster to enter numbers, and some of the usual problems of voice recognition (i.e. non-recognition) cropped up, it was the least stressful method of input. However, this is only one role of an interface: if I had to edit the number or fix a mistake, it would be a real pain to do with voice. Saying numbers is also not a great interface for exploration, which usually works well when the numbers are abstracted into something (e.g. a piece of graphics) that can be played with without thinking about the actual numbers themselves; it’s hard to play with the spoken names of numbers without thinking about numbers. Voice control also has a number of restrictions, particularly requiring an environment where other people aren’t speaking and accidentally triggering your voice commands.

As can be seen in the video, field of view and the size of an interface are important factors. It takes a lot of movement to move the cursor around if the interface is large or you are standing close to it. It can also be difficult to see the output box for feedback in both of these prototypes; an alternative would be to have these “vision locked”, fixed to your view rather than placed as virtual objects.
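The relationship between interface size, viewing distance and head movement can be made concrete with a little trigonometry: a flat panel of width w viewed head-on from distance d spans an angle of 2·atan(w / 2d). The sketch below is illustrative only; the function names are made up for this example, and the field of view is passed in as a parameter rather than taken from any device specification.

```python
import math


def angular_size_deg(width_m, distance_m):
    """Angle in degrees spanned by a flat panel viewed head-on from its centre line."""
    return math.degrees(2 * math.atan(width_m / (2 * distance_m)))


def fits_in_view(width_m, distance_m, fov_deg):
    """True if the whole panel fits inside the given field of view.

    fov_deg is whatever field of view you assume for the display.
    """
    return angular_size_deg(width_m, distance_m) <= fov_deg
```

A 0.5 m wide panel spans roughly 53 degrees from 0.5 m away but only roughly 14 degrees from 2 m away, so standing close to a large interface means much larger head movements to sweep the gaze cursor across it.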

A key insight gained from the experiment is that speed of input is hugely influenced by how ready you can be to do the next thing. With gaze input this is limited to knowing where the next number is, or looking at it (with the eyes, not moving the head). What you can’t do is be half ready to press it. With a keyboard, by contrast, you can have a finger hovering over the key you’re going to press, or at least near it; the layout of the keyboard aims to have all the keys you need within reach of the home position.