I was skeptical of the Kinect, even as I felt inclined to buy it. Having missed PAX this year, I missed my biggest and best opportunity to try it before I bought it. It may have been best that way. A few moments into using my Kinect on Friday, I experienced one of those wonderfully personal moments of future shock that I'm not sure I would have experienced amidst a crowd.

I texted my brother via XBL [1] that the future as promised by science fiction staples like Minority Report and Star Trek has arrived so that we can use it for such banal purposes as playing minigames and controlling our ESPN viewing... Hyperbole aside, I do think that the convergence of hardware and software in the Kinect, as a consumer device and a "toy", is a well-timed baby step towards whatever the future of human-computer interaction is shaping up to be.

I know that other people have felt similar shock at touch and multitouch devices like the Surface and the iPhone/iPad, and I know that other people have felt similar shock at the Wii's early motion control. I've been trying to pin down why those felt mundane when I first tried them and yet the Kinect continues to feel like "magic". I know the technical details of the Kinect about as well as I do those of the aforementioned technologies, and so I think the answer must lie in the combination and gestalt of several technologies coming together for the first time.

When the Kinect is "firing on all cylinders", it is a brilliant thing indeed. Certainly it is not perfect, and there are many places where it can be bettered (and presumably will be, through continuous software improvement over the next few years), but there seems to be quite a bit of promise in the cheeky Johnny 5 impersonator. [2]

If I had to choose the one thing that particularly impresses me, it is the facial recognition. When it works, and it works even better after a few minutes of training for the "Kinect ID", it is an amazing thing. The ability to sign in with just your face is pretty awesome, but where it shines is in "hop in/hop out" gaming.

I had an interesting debate about hiding secrets with respect to the iPad, and after that discussion I really want to try writing (or at least playing) a game that uses the facial recognition to "enforce" hot-seat board game secret sharing... (It would need some sort of override password for those rare times when facial recognition doesn't match, but it would still probably be interesting.)
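To make the idea concrete, here is a minimal sketch of the gating logic. Everything in it is hypothetical: identify_current_player() stands in for whatever Kinect-ID-style recognition call a real title would use, and the game object's secrets_for()/verify_password() helpers are invented for illustration, not anything from a real SDK.

    # Hypothetical sketch: gate a hot-seat player's secrets behind face recognition.
    # identify_current_player() is a stand-in for a Kinect-ID-style lookup, and the
    # `game` object's helpers are invented for illustration; none of this is a real API.
    from getpass import getpass

    def identify_current_player(camera):
        """Return the recognized player's name, or None if there is no confident match."""
        ...  # stand-in for whatever face-recognition service the game would use

    def reveal_secrets(game, expected_player, camera):
        if identify_current_player(camera) == expected_player:
            return game.secrets_for(expected_player)

        # Recognition missed or mismatched: fall back to a per-player override
        # password for those rare times the camera gets it wrong.
        attempt = getpass(f"{expected_player}, enter your override password: ")
        if game.verify_password(expected_player, attempt):
            return game.secrets_for(expected_player)

        raise PermissionError("Those aren't your secrets to see; pass it back.")

The interesting design question is how forgiving the fallback should be: too easy and it defeats the point, too strict and a bad camera angle locks a player out of their own hand.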

Kinect versus Windows

The more that I interact with the Kinect, the more I want to also play with it (or a brother of it) on my Windows PC. Considering that the Kinect is a USB device and that most of the other Xbox 360 USB devices have slowly gained official Windows drivers, I don't think it is a stretch to imagine that many people at Microsoft have already discussed just this.

One of the things that I find interesting is that so many of the individual component technologies have already been in Windows for some time, albeit underutilized. Out of the box, Windows 7 has support for multi-touch and speech recognition, and it can already support many of the UI patterns that work for "Kinect hovering". It is an interesting thing, in fact, that we don't see more uses of these built-in Windows features, and more explorations of them as user interaction tools.
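The "hovering" pattern in particular isn't tied to any one sensor. Here is a rough sketch of the dwell-to-select idea, assuming the surrounding application already knows which on-screen target the pointer (mouse, finger, or tracked hand) is currently over; the class and its names are mine, not anything from a Microsoft SDK.

    # Sketch of "hover to select" (dwell) activation, the sort of pattern the
    # Kinect's hand-waving UI relies on and that a mouse or touch pointer could
    # drive just as well. Only the timing logic is shown; nothing here is a real API.
    import time

    class DwellSelector:
        def __init__(self, dwell_seconds=1.5):
            self.dwell_seconds = dwell_seconds  # how long the pointer must rest on a target
            self.current_target = None
            self.hover_started_at = None

        def update(self, target, now=None):
            """Feed the target currently under the pointer (or None).
            Returns that target exactly once when it has been hovered long enough."""
            now = time.monotonic() if now is None else now
            if target != self.current_target:
                # Moved onto a new target (or off of all targets): restart the clock.
                self.current_target = target
                self.hover_started_at = now
                return None
            if target is not None and now - self.hover_started_at >= self.dwell_seconds:
                self.hover_started_at = float("inf")  # fire only once per continuous hover
                return target
            return None

Feed update() from whatever pointer source is available each frame, and the same selection behavior falls out whether the "pointer" is a mouse, a fingertip, or a skeletal-tracked hand.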

As someone with tablet envy some days, I continue to debate the merits of purchasing a multi-touch monitor for my desktop, or a standalone Windows tablet. It still surprises me how many people don't know that Windows supports touch input (much less how long that support has been available).

Speech recognition is the definitive tale of an underutilized Windows feature. It has been built into Windows since XP (with free versions for several versions previous to XP), and it has supported everything the Kinect speech rec does, plus more. How many people do you think have tried it? (For the curious: Control Panel -> Ease of Access -> Speech Recognition in Windows 7.)

The big thing that separates the Kinect from most desktop speech recognition is that the Kinect's microphone is an "array microphone"; most computers do not ship with one, and few vendors sell them. Windows Vista's speech recognition wizard suggested purchasing an array microphone, and I believe that at one point Microsoft tried suggesting to OEMs that they sell computers with them.

There is an obvious and funny little pattern that follows: if people had array microphones on their computers, they would use speech recognition more. But because "nobody" uses speech recognition, manufacturers don't see why they should include expensive array microphones in their systems. In the end, Windows' speech recognition has been relegated to "just a toy", because consumers don't buy (and don't know to buy, and sometimes do not have access to buy) array microphones. Windows' Speech Recognition itself doesn't get the priority it needs to get better, because users aren't using it.

The Kinect is a genius [3] departure from the usual rules in this ugly feedback cycle. By focusing on being a "toy" first and foremost, the Kinect is putting front and center, and in people's living rooms, just the sorts of specialty hardware that Windows wishes it could put on people's desktops.

I've heard it said before that the Xbox 360 has been great for Microsoft because it acts as a clean, proprietary slate for Microsoft to research future Windows UI improvements. I think this is even more the case with the Kinect. The Kinect is quite possibly the best HCI research lab in a box yet deployed commercially. For now we shall be glad and happy to put it to its banal entertainment uses, but I for one plan to watch excitedly as people begin to grasp that whatever it is we've started here, it is more than just a toy. I am going to play my role as rat in this new research maze, because that will be fun. I am also looking forward to whatever researcher roles I can take on.

Converge the UIs!

So, this is obviously research in progress. I think nowhere is this more apparent than in the sometimes odd sub-contexts of the Kinect Hub and Kinect Guide. It seems unfortunate that in this first release the Xbox team was unable to provide the same user interface/experience regardless of whether you were Kinect hand/speak-waving through it or controller waggling at it.

First of all, I think the Kinect Guide (which you pull up by flag semaphoring a "G" for "Guide" to the Kinect) is actually superior to the current home "blade" of the modern Guide, and almost reminiscent of the old home (Xbox Live) "blade" on the original firmware dashboard (for those who remember it). A controller would not have too much difficulty waggling over it, but it violates the current pivot control of the Guide's blades by making use of both directions. Some other pivot control would be needed, but one is already needed anyway, given that the Kinect Guide lacks many of the Xbox Guide's tools as it is.

I would almost welcome an option to try using the Kinect Guide as a new default for my controllers, with some mechanism such as a double-tap to revisit the more featureful Xbox Guide.

On the other hand, the Kinect Hub is a weird cousin to the current Xbox dashboard. Lacking a pivot control that is anything like the dashboard's, the Kinect Hub seems content to just provide a hodge-podge of items from across the dashboard's pivots, spread out in a couple of rows of tiles. There don't appear to be any visible differences between the latest less-3D dashboard update's tiles and the Kinect Hub's tiles, and yet one responds to hand-waving and provides voice prompts while the other does not. At this point the distinction seems arbitrary and useless.

It seems like Microsoft is just a stone's throw from a truly useful convergence of its "dashboard" UIs: at this point Media Center, Xbox, Kinect, and Windows Phone 7 all seem to share similar concepts of pivots and tiles. Yet, weirdly, there seem to be so many subtle differences between the various approaches, and having three of the four accessible on an Xbox 360 seems like a particularly silly kick in the pants.

It is possible that the touch, controller, and hand-waving are too disparate to bridge into a single UI structure, but I'd be surprised if that were the case. Again, it seems that Microsoft is so close to establishing a great design pattern and useful UI across all of these disparate input systems and use cases. I hope that the current differences are much more a matter of coding velocity and priority needs for this particular launch than siloed warfare. [4] I can only assume that time will tell, but I've got a feeling there should be at least one internal evangelist currently crusading on this topic.
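For what it's worth, "bridging" the inputs doesn't have to mean one input pretending to be another. Here is a sketch of the kind of thing I mean, with every event, button, and pivot name invented for illustration: each input system gets normalized into a tiny shared vocabulary of navigation commands, and a single pivot/tile shell only ever sees that vocabulary.

    # Sketch: normalize controller, touch, and hand-tracking input into one small
    # set of navigation commands that a single pivot/tile "dashboard" consumes.
    # All names here are invented; this is not a real Xbox, Kinect, or Windows Phone API.
    from enum import Enum, auto

    class NavCommand(Enum):
        NEXT_PIVOT = auto()
        PREV_PIVOT = auto()
        SELECT = auto()
        BACK = auto()

    CONTROLLER_MAP = {"bumper_right": NavCommand.NEXT_PIVOT,
                      "bumper_left": NavCommand.PREV_PIVOT,
                      "a": NavCommand.SELECT,
                      "b": NavCommand.BACK}

    TOUCH_MAP = {"swipe_left": NavCommand.NEXT_PIVOT,
                 "swipe_right": NavCommand.PREV_PIVOT,
                 "tap": NavCommand.SELECT}

    HAND_MAP = {"hand_swipe_left": NavCommand.NEXT_PIVOT,
                "hand_swipe_right": NavCommand.PREV_PIVOT,
                "hover_select": NavCommand.SELECT}  # e.g. the dwell pattern sketched earlier

    class PivotDashboard:
        """One UI shell; it never knows which input system produced the command."""
        def __init__(self, pivots):
            self.pivots = pivots
            self.index = 0

        def handle(self, command):
            if command is NavCommand.NEXT_PIVOT:
                self.index = (self.index + 1) % len(self.pivots)
            elif command is NavCommand.PREV_PIVOT:
                self.index = (self.index - 1) % len(self.pivots)
            elif command is NavCommand.SELECT:
                print(f"activating a tile on: {self.pivots[self.index]}")
            return self.pivots[self.index]

    dashboard = PivotDashboard(["my xbox", "games", "video", "kinect"])
    dashboard.handle(CONTROLLER_MAP["bumper_right"])  # a controller bumper pivots forward...
    dashboard.handle(HAND_MAP["hand_swipe_left"])     # ...and a hand swipe pivots forward too

The real systems are surely messier than a dictionary lookup, but this is the shape of convergence I'm hoping for.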

Now if you'll excuse me, I'm going back to the maze. I really want to win that adventure club watch for my avatar...


[1] Anyone hate me if I use the word "xboxted" for this purpose?
[2] I am willing to bet that I'm not the only person thinking or shouting "Johnny 5, Alive!" when the Kinect goes through its boot-up recalibration.
[3] Might we say, "Apple-like"?
[4] Go Team XAML, though! If one silo has to "win", as a .NET developer I can't help but express my interest in seeing the Windows Phone 7 "Silverlight-everywhere" approach succeed.