Men are from Earth, computers are from Vulcan
- By Curt Monash
- May 31, 2001
Natural language computer interfaces
were introduced commercially about 15 years ago. They failed miserably.
For example, Artificial Intelligence Corporation's Intellect was a natural
language DBMS query/reporting/charting tool. It was actually a pretty good product.
But it's infamous among industry insiders as the product for which IBM, in one of its first
software licensing deals, got about 1,700 trial installations -- and less than
a 1% sales close rate. Even its successor, Linguistic Technologies' English
Wizard, doesn't seem to be attracting many customers, despite consistently good
product reviews.
Another example was HAL, the natural language command interface to 1-2-3. HAL
is the product that first made Bill Gross (subsequently the founder of Knowledge
Adventure and idealab!) and his brother Larry famous. However, it achieved no
success, and was quickly dropped from Lotus' product line.
In retrospect, it's obvious why natural language interfaces failed. First of
all, they offered little advantage over the forms-and-menus paradigm that dominated
enterprise computing in both the online-character-based and client-server-GUI
eras. If you could not meet an application
need with forms and menus, you couldn't meet it with natural language either.
Even worse, NL actually had a couple of clear disadvantages versus traditional
interfaces. First of all, it required (ick!) typing, often more typing than
the forms and menus did. Second, forms and menus tell the user exactly what
he can do. Natural language, however, lets him give orders the computer doesn't
know how to follow. This is inefficient, not to mention frustrating.
However, even in 1983, it was obvious that the typing objection would go away
some day, because of speech recognition -- once desktop computers reached 100
MIPs
or so. (Effective keyboard-replacement speech recognition -- as opposed to true
natural language understanding -- is mainly a matter of processing power.) 15
years later, standard PCs exceed 100 MIPs (assuming that 1 MIPs = a couple of
megahertz for these purposes), and speech recognition is indeed getting practical.
In fact, as has become increasingly evident recently, speech recognition is
now a hot technology. Bill Gates has been talking it up for a couple of years.
Increasingly, the press has swung to believing him.
That said, speech recognition is as misunderstood (no pun intended) as most
artificial intelligence technologies. Yes, it beats typing, in a number of circumstances:
- On the telephone (duh!),
- "Busy hands" and/or "busy eyes" applications and lo-cales
(doctors' offices, trading floors, warehouses, etc. -- and, some day in the
future, your kitchen and car),
- People simply reluctant to type (e.g., anybody with sufficient wrist or
back problems, and many males over the age of 45).
But before our computers talk back and forth with us in the voice of Majel
Barrett Roddenberry, applications are going to have to add two important
elements required for truly functional natural-language interfaces:
- Intuitively clear names for everything on (or just behind) the screen, and
- Application-specific disambiguation logic.
For most practical purposes, the latter requirement equates to a new generation
of document selection technology.
"The Rule of Names"
According to legend, knowing something's name gives you power over it. When
that "something" is a button or menu choice on a speech-enabled computer,
the legend is literally true. But when a feature doesn't have an obvious name,
you can't easily invoke it.
When applications consisted mainly of forms and menus, this was rarely a problem.
Everything had a clear role and label. But Web pages are less organized. Hyperlinks
can be scattered all over the place, with little rhyme or reason.
Frankly, I don't think this is a hard problem to solve. It wouldn't take a
lot of XML to divide the page into clear regions, so that commands like "Show
me article #3" (on a search results list) could be interpreted in the obvious
way. But it does take at least some discipline; random Web pages will not necessarily
be easy to "talk" to.
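To make the idea concrete, here's a rough sketch of what I have in mind. The markup, tag names, and function are all my own invention for illustration, not any existing standard: a page carved into named regions, and a little logic that maps a spoken command onto one of them.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical page marked up into named regions. The tag and
# attribute names are illustrative only.
PAGE = """
<page>
  <region name="search-results">
    <item index="1" href="/articles/nl-interfaces">NL interfaces</item>
    <item index="2" href="/articles/speech-reco">Speech recognition</item>
    <item index="3" href="/articles/text-search">Text search</item>
  </region>
</page>
"""

def resolve_command(command, page_xml):
    """Map a command like 'Show me article #3' to a link target."""
    match = re.search(r"article\s*#?(\d+)", command, re.IGNORECASE)
    if not match:
        return None
    wanted = match.group(1)
    root = ET.fromstring(page_xml)
    region = root.find(".//region[@name='search-results']")
    if region is None:
        return None
    for item in region.findall("item"):
        if item.get("index") == wanted:
            return item.get("href")
    return None

print(resolve_command("Show me article #3", PAGE))  # /articles/text-search
```

Nothing here is clever; that's the point. Once the page declares which region is the results list, "the obvious interpretation" really is obvious.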
Cybernetic listening skills
The bigger challenge is to make sure that the application can respond in some
useful way, no matter what command it's given. This is even more difficult than
it was 15 years ago, because of the radical increase in "casual" computer
usage. In the old days, we could assume the user had some clear business reason
for using the application and, if necessary, that she/he had time to be trained
(even if people rarely sat still for as much training as they really needed).
Therefore, we could assume that users had at least a general idea
of what the application did, and hence of which commands the computer could
obey. From an NL standpoint, we could assume that what they actually "said"
(which in those days meant "typed") was at least reasonably close
to what they were "supposed" to say.
Now, however, some of the most important applications are
Internet e-commerce and portals, competing and begging for the users' attention.
The user is there strictly on a voluntary basis, and if he doesn't get immediate
gratification, he's gone, history, hasta la bye-bye. Site-specific training
isn't even a consideration. And even if somebody did actually take a class on
"How to use Excite," the knowledge would be obsolete in six months.
So if applications are to have natural language interfaces that please
users, they have to be able to respond meaningfully to pretty much any command.
Ideally, voice-enabled systems would be like the computers on Star Trek, which
can return information from vast archives, brew a pot of Earl Grey tea, play
three parts of a quartet, create self-aware life forms, or answer questions
like "Computer, what is the nature of the universe?". More realistically,
they should be able, for example, to respond to a command like "Tell me
about flights to Miami" by automatically giving the user a travel-reservation
application or Web page, and entering Miami in the appropriate form field.
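A sketch of that routing step might look like the following. The intent table, patterns, and application names are invented for illustration; the point is just that the system has to do two things: pick an application, and pre-fill its form fields from the user's words.

```python
import re

# Invented intent table: a pattern, the application to launch,
# and the form field to pre-fill from the captured text.
INTENTS = [
    (re.compile(r"flights? to (\w+)", re.IGNORECASE),
     "travel-reservations", "destination"),
    (re.compile(r"weather in (\w+)", re.IGNORECASE),
     "weather", "city"),
]

def route(command):
    """Return (application, prefilled_fields) for a spoken command."""
    for pattern, app, field in INTENTS:
        match = pattern.search(command)
        if match:
            return app, {field: match.group(1)}
    return None, {}  # caller falls back to disambiguation or text search

app, fields = route("Tell me about flights to Miami")
print(app, fields)  # travel-reservations {'destination': 'Miami'}
```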
If one thinks about the complications in such a system, it becomes clear that
there are only two possible ways an application can be designed to respond
meaningfully to an enormous range of plausible requests.
- It can do the equivalent of saying "I'm sorry, I didn't understand
that," "I'm sorry, I can't do that," and so on.
- It can interpret many commands as text-search strings, and return appropriate
results.
The first strategy -- application-specific disambiguation logic, clear responses
to "errors," etc. -- is absolutely necessary. No software is perfectly
intelligent; the user will have to be asked for disambiguation help from time
to time (just as clerks today ask customers to repeat their requests!). I'm
not going to go into much detail about how that works because, frankly, it's
a tricky thing to get right. Users hate unnecessary disambiguation steps.
They also hate the incorrect responses that result from ambiguity, but they do tolerate
being asked for help when it's truly needed. In short, whatever you build the
first time around will probably be wrong. So build something fast; then run,
don't walk, to the nearest usability lab, find out how you screwed up, and redo
your system until you get it right.
I'm convinced that the second strategy --
heavy reliance on text-search technology -- is a requirement as well. Just try
to name a major Web site that doesn't use text search. True, text search has
gotten a bad rap recently, mainly because a whole generation of search engines
didn't really work. But it will stage a comeback.
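Putting the two strategies together yields a fallback cascade. Here's a minimal sketch of my own, with invented confidence thresholds, of how an application might combine them: act when one interpretation is clearly right, ask for help when several are plausible, and treat everything else as a search string.

```python
def handle(command, interpretations, search):
    """Fallback cascade sketch: execute a confident interpretation,
    else prompt the user to disambiguate, else fall back to plain
    text search. 'interpretations' is a list of (score, action)
    candidates produced upstream; 'search' is the site's ordinary
    text-search function. Thresholds are invented for illustration."""
    confident = [a for score, a in interpretations if score > 0.8]
    plausible = [a for score, a in interpretations if 0.4 < score <= 0.8]
    if len(confident) == 1:
        return ("execute", confident[0])
    if confident or plausible:
        # Ask the user to choose -- only when truly needed.
        return ("disambiguate", confident + plausible)
    # Strategy two: treat the command as a search string.
    return ("search-results", search(command))

# Example: no confident interpretation, so fall back to text search.
action, payload = handle("cheap flights miami",
                         interpretations=[],
                         search=lambda q: ["result list for: " + q])
print(action, payload)
```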
Expect a lot of discussion of these issues in future Monash NetWatchers; the
document selection business is one of our major areas of research.
About the Author
Curt A. Monash, Ph.D., is president and CEO of Monash Information Services and Elucidate Technologies LLC, located in Lexington, Mass.