If you haven’t seen Word Lens in action within the past few days, I’m not sure exactly what to do with you. I’d like to take the opportunity to show how Word Lens illustrates a point.
Right now, every application in the world is struggling to reach the top-right corner of a four-quadrant chart whose two axes are “clarity of user experience” and “quality of implementation”. Word Lens is just the right app to show off nearly every subtlety of this chart.
First of all, there are a ton of translation applications. The phone is an ideal device for the traveler, and plenty of translation software already existed elsewhere and was easy to port. Interface-wise, these apps range from the plainest of plain to the painstakingly crafted. The plainest of plain aren’t necessarily the bottom of this scale. Many apps have tried to do their own thing in a half-assed way, or simply run straight off a cliff with weird control choices or badly contrasting colors. The interesting thing is that even the interfaces that have been sweated over for hundreds of hours and refined over months and years aren’t the top of the heap.
Right above the best of the traditional interfaces lie the best examples of natural interfaces. The best interface is the one you don’t even think of as an interface, the one you understand from the start and get around in effortlessly. Being able to point a camera at a sign and see it translated on screen as if it had been printed that way, instead of having to transcribe the text and run the translation yourself, is a leap as wide as the chasm between command-line interfaces and traditional GUIs. The last few days are a brilliant testament to the idea that everyone gets this on a fundamental level, whether or not they have ever used a computer, held a phone or cared about technology before.
Word Lens isn’t just cool because of how much it does with the technology it applies; it’s cool because it seems like there’s no technology at all, because it seems like the device is finally not only adapting itself to you, but adapting your surroundings to you. For this reason alone, it is worthy of our applause. (Some other people, meanwhile, deserve our scorn for their lack of planning: Word Lens would fit perfectly on any Top 100 list of 2010.)
“Natural interfaces” can go wrong, too! Horrendously wrong. Most, if not all, other “augmented reality” applications fall into this bucket. They feel like technology for technology’s sake. They call attention to, and amplify, technology’s part in a problem that needs solving, but they only scratch the surface, add more UI and never actually solve the problem. If something is more than two blocks away, you’re probably better served by a map with an arrow pointing the way than by pins overlaid on the camera view whose relative distances are hard to tell apart. You turn to a device precisely because you can’t see that many blocks away, and the camera can’t solve that problem for you.
So not every natural interface is actually natural, and a natural interface can’t solve every problem. At some point, you probably need to turn to abstraction to deal with the problem at hand. A badly done natural interface will probably crash and burn far worse than a mediocre GUI would. But where it works, it really, really works.
The second axis on our chart is quality of implementation, and sadly, Word Lens falls short. It has trouble settling on a stable reading of the words in the image and will happily flicker back and forth between interpretations. It has trouble finding the letters at all, for that matter. When all of that works, it does seem to get the (laudable and necessary) detection of background pattern and text color right, but reportedly the translation itself isn’t that good. And all this processing is slow, so everything is pretty choppy.
I can accept that. It doesn’t make a perfect (or maybe not even an acceptable, I don’t know) translator, but it’s gotten the underlying model right. With the world already wowed, it’s far easier to knuckle down and start fixing the shortcomings. Within ten years, it might be realistic either to do the translation online on the fly or to have embedded technology that really works out all the edge cases.
On the other hand, the many teams with working translators will wake up and smell the coffee. They will trot out reasonable defenses like decent translations, but they’ll start working on something like this too, and it’ll take them months or years, depending on their existing budget and R&D. Once Word Lens starts working convincingly, it’s going to be something people want in every translator. It may not be the perfect mode for every translation, but it’s brilliant, so why not have it?
To make a truly great application, you need both a great user experience and a high-quality implementation. You can get away without either if your kind of application just isn’t that good to begin with, and you can do very well for yourself by having just one of them. But if you’re not figuring out how to do both convincingly, and, even worse, aren’t figuring out what people really want to do instead of making them use your application on its own terms, prepare to have your lunch eaten at some point.
When I hear about X-recognition (for arbitrary values of X) and how it simplifies some UI, I’m always skeptical, because in most cases the next sentence talks about how many problems there are and how it will only improve with time. Recognition could obviously improve many UIs if you could get the implementation right. But this is algorithmically hard, and if you don’t have good algorithms to start with, the chance of substantially improving on your starting point is low. On the other hand, if you had the algorithm and it were good enough, you would probably use it for something else. As you said, in several years we may see acceptable solutions, but it won’t be this app, which set the first marks for this idea, that gets us there. The overall recognition technology will lead the race here, and if it reaches an acceptable level, we will all wonder how many other ideas it will enable. (Or, then again, we won’t wonder too much, because today’s science fiction will have predicted much of what is to come, in an endless self-fulfilling prophecy.)
By Johannes · 2010.12.21 16:22
Johannes: There’s been OCR before, there’s been translation before and there’s been real-time photo editing before. There’s just never been an openly available product like Word Lens that puts them all together in a reasonably well-performing form.
I might not regard it as the very best thing ever, or even as excelling at any of its individual technologies, but it would be a mistake to dismiss it because of that.
By Jesper · 2010.12.21 21:36