Speech Recognition & Synthesis in Science Fiction

Last month everyone and their brother blogged about Michael Schmitz's article on Human Computer Interaction in Science Fiction Movies. However, this blog wasn't up and running then, and I feel I should include it for completeness.

Regarding speech interfaces, especially in Star Trek, Schmitz writes, "In almost all movies the speech interface is conversational and intuitive, the difficulties especially of speech recognition and evaluation are never considered." No kidding! This is a big reason why I'm motivated to work on this project -- Why the hell hasn't it been invented already? It looked so easy on the big screen when I was a kid!

I've been going back and rewatching some of the Star Trek: The Next Generation episodes (a guilty geeky pleasure), and although the interaction there was definitely conversational, it was very rigid and formal, like most Star Trek speech. Terms like "standby," "hold execution," and "resume program" were the norm. These terms are standard sci-fi/Star Trek jargon and do not feel out of place on screen. However, I'm pondering the effect of this formal, terse jargon in a real-life speech interface. Would it make for a more natural discussion with a computer? It certainly makes the computer's job easier.

For example, I might say to my friend, "Hey, dude, what's goin' on?" But to the computer, I would say, "Status?" Terse, efficient... yet cold and rigid. However, I suggest that keeping this separation of common speech from command speech is useful. It's my opinion that a speech interface should not be someone's "friend" or "companion," but a guide, a tool, an assistant. Using a secondary, more formal mode of speech might reinforce this implicit separation of "friend" versus "computer."
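
To make the "computer's job is easier" point concrete, here is a minimal sketch of a fixed command vocabulary. It assumes some speech recognizer has already turned audio into text, which is the hard part and is not shown. The command phrases come from the examples above; the action names (`report_status` and so on) and the function names are hypothetical, made up purely for illustration.

```python
import re

# Hypothetical command vocabulary. The phrases are the ones quoted in
# the post; the action names on the right are invented for this sketch.
COMMANDS = {
    "status": "report_status",
    "standby": "enter_standby",
    "hold execution": "pause_program",
    "resume program": "resume_program",
}


def normalize(utterance: str) -> str:
    """Lowercase and drop punctuation so 'Status?' becomes 'status'."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return " ".join(words)


def interpret(utterance: str):
    """Return an action name for an exact command phrase, else None.

    Anything outside the vocabulary is ignored rather than guessed at.
    """
    return COMMANDS.get(normalize(utterance))


if __name__ == "__main__":
    for heard in ["Status?", "Hold execution.", "Hey, dude, what's goin' on?"]:
        print(repr(heard), "->", interpret(heard))
```

With only a handful of exact phrases to match, casual speech simply falls through as "not a command." A conversational interface would have to work out what "what's goin' on?" means, and whether it was addressed to the computer at all.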

Comments (2)

Makes sense to me...

Seems like having a formal command language makes it less likely that there will be issues with the computer thinking a side-conversation with another human is something it should act upon.

Exactly! I'm always amazed at how quickly people can slip in and out of "modes of speech" anyway. For example, have you ever found yourself just unconsciously not swearing around children or your in-laws? Yes, people have problems with it sometimes, but all in all, these minor shifts in speaking are not cumbersome.

(Probably a better example is how we change our words when talking with children, customers, or co-workers. We use a different vocabulary, and yes, even different sentence structure, depending on who we're talking to. My Southern accent reappears dramatically when I'm back home with my family.)

I think adjusting your speech to "computer-ese" will probably be just as easy. The question is -- do people *want* to do that? (I know there's plenty of research on that... I just haven't read it yet.)

Also, this brings up a topic I want to explore later -- how long will it take for speech interfaces to affect how human languages evolve? Obviously we have new "computer words" already, like "google," but what about when we're actually talking to computers? Television has unified English, for example, by having more and more people listen to the same dialects. Will speaking to a computer affect us the same way in the long term? Will we abandon words that are difficult for the computers to interpret? It might take a couple of decades, but that's a very short time in the history of most languages.

About

This page contains a single entry from the blog posted on May 20, 2007 8:30 PM.
