I worked at Courseware Publishing International (CPI) as Director of ESL Software Design from 1995 to 1997. CPI's work with speech recognition was directed at producing a series of products collectively called Conversation Navigation. The series began with a vocabulary training program called See It Hear It Say It! The second product in the series was Traci Talk: The Mystery. The latter, which I edited and helped develop from a script originally written by Phil Hubbard, is an interactive branching mystery game created specifically for learners of ESL, who resolve the mystery by interacting with characters on the screen. These interactions can be carried out almost entirely via speech recognition.
My participation in these products included the following:
I worked with the voice talent in the studio, preparing scripts for studio use and directing the recording process. Afterwards, I used Sound Forge to sample the audio, enhance it as necessary using sound editing tools, and save it to 16-bit WAV files.
I created BNF grammars to specify the full range of acceptable utterances at any given juncture requiring the user's spoken input. Using the IBM Voice Type Application Factory speech recognition software, I compiled these into context files and tweaked the grammars and other parameters until satisfactory recognition of a wide range of utterances was achieved.
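As a rough illustration of what such a grammar does, here is a minimal Python sketch of a BNF-style rule set for one juncture, with a helper that lists every utterance it accepts. It does not reproduce the IBM grammar syntax or any actual Traci Talk grammar; the rule names, the phrases, and the functions `expand` and `is_accepted` are all invented for the example.

```python
from itertools import product

# Hypothetical grammar for one juncture. The rule names and phrases are
# invented for illustration and are not taken from the Traci Talk grammars.
GRAMMAR = {
    "<request>": [["<opener>", "<question>"], ["<question>"]],
    "<opener>": [["excuse me"], ["pardon me"]],
    "<question>": [["where is", "<place>"], ["how far is", "<place>"]],
    "<place>": [["the station"], ["the campus"]],
}


def expand(symbol, grammar=GRAMMAR):
    """Return every utterance (as a string) that `symbol` can produce."""
    if symbol not in grammar:  # a terminal: a literal phrase
        return [symbol]
    utterances = []
    for alternative in grammar[symbol]:
        # Expand each part of the alternative, then join every combination.
        parts = [expand(part, grammar) for part in alternative]
        utterances.extend(" ".join(combo) for combo in product(*parts))
    return utterances


def is_accepted(utterance, start="<request>", grammar=GRAMMAR):
    """True if the whitespace- and case-normalized utterance is generated."""
    normalized = " ".join(utterance.lower().split())
    return normalized in set(expand(start, grammar))
```

Even this toy grammar shows why tuning mattered: a handful of short rules multiplies into many acceptable utterances, and each alternative added to a rule widens what the recognizer has to tell apart.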
When I started working for Courseware Publishing International in fall 1995, the decision to go ahead with See It Hear It Say It! had already been made and the interface had been designed, so there was little I could do about the direction of the product other than to grade the vocabulary already selected. (The production delays that my suggestions for improvement would have entailed did not fit the company's fiscal situation at the time.)
However, I was fortunate to be in on the ground floor of the development of Traci Talk: The Mystery, to be able to work closely with Phil Hubbard, an ESL-specialist professor at Stanford University, and to advise staff at CPI early enough to influence the project's pedagogical integrity. Dr. Hubbard had just started working as a consultant to develop the story line for the project, and a center working with speech recognition at Stanford University had been engaged to write the DLLs that would allow the IBM Continuous speech recognition software to work with Authorware, the authoring system used as the development platform.
Aside from these partnerships, the program was created in less than two years almost entirely by a minimal full-time staff of five at the offices of CPI in Cupertino, one of whom was the company president, engaged mainly in administration and logistical support (organizing partnerships, arranging sound studios and voice talent, etc.). Most of the creative work was done by a single programmer using Authorware and a single graphics artist who used a 3-D modeling tool to create the characters and articulate them, though an additional graphics artist was brought in part time during the second year. During this period, staff often had to break off to devote time and energy to other products that the company was developing and maintaining, so perhaps half of those two years was spent on Traci Talk. The final push to completion was assisted by the hiring of a part-time graphics artist and a tester for Traci Talk and other CPI programs.
Traci Talk in a nutshell
Traci Talk is a mystery game in which the user is set a problem and must then go about collecting data in order to solve it. The task is presented in the form of an email that identifies the student user as a famous detective engaged to resolve the mystery of a missing object. The email transmission is truncated so that users find out only that, to start the adventure, they must take a train to 'Cupervale'. Users then meet the four principal characters in four sets of interactions in each of two game modes. In these interactions, users can converse with the characters by articulating utterances displayed on the screen.
What students can say is displayed on the screen in two or usually three choices of interaction driving the story line. In addition, the users can articulate a set of commands also displayed on the screen driving the program's help features. These commands and conversational gambits can either be spoken or clicked on. Progress through the adventure is marked by the appearance of icons down one side of the screen. That is, users with no icons start the program with the introduction, but as they engage the various characters, icons appear and are recorded in their profiles so if they restart the game, they can click on one of the icons to resume the game at that point. However, they can't jump ahead in the game until they have touched on the linear sequence that will give them an icon for each character.
The interactions start with Ron on the train to Cupervale. The next episode is with Hector, the cab driver who takes the student users to their apartment on campus, as arranged by the sender of the email. There they meet Sandy, their flaky neighbor. Finally, they meet Ana at dinner that evening. Each of these characters reveals in conversation a part of the puzzle that establishes the relationships and antagonisms between them, and it is in talking with the characters that users divine the nature of the missing object and form hypotheses about who might have taken it.
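The icon mechanism described above amounts to a simple unlock rule, which the following Python sketch tries to capture. It is only a reading of the description, not the Authorware logic: the `Profile` class and its methods are hypothetical, and the assumption that an icon is awarded as soon as a character is engaged is mine.

```python
# Episode order as described in the text: train, cab, apartment, dinner.
EPISODES = ["Ron", "Hector", "Sandy", "Ana"]


class Profile:
    """Hypothetical user profile holding the icons earned so far."""

    def __init__(self, icons=None):
        self.icons = list(icons or [])

    def next_episode(self):
        """First episode without an icon, or None when all are earned."""
        for name in EPISODES:
            if name not in self.icons:
                return name
        return None

    def can_open(self, name):
        """Earned episodes can be resumed; only the next one can be started."""
        return name in self.icons or name == self.next_episode()

    def engage(self, name):
        """Enter an episode, awarding its icon on the first visit (assumed)."""
        if not self.can_open(name):
            raise ValueError(f"{name} is not yet available")
        if name not in self.icons:
            self.icons.append(name)
```

A new profile can only open the episode with Ron; once Ron's icon is recorded, the user can either resume with Ron or move on to Hector, but cannot skip ahead to Sandy or Ana.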
Before talking with any of the four characters, users are presented with three sets of 5 questions which they can either ignore or use to guide their conversations with the characters. In other words, there are 15 items for each character that the program will suggest as information users might seek to discover in conversation with the principals, for a total of 60 items. The users may revisit the characters as often as they like and have as many conversations as they wish to get as much information as they can for the first part of the adventure (and conversations at this stage can vary widely in length and subject matter), but in order to go on to the second part, they have to answer 8 questions taken at random from the 60 available. In other words, users who have spoken to each of the characters at this stage of the program will have 4 icons on their screen which they can click on to move about to different parts of the program, but they can't progress to the solution phase of the game until they have gathered sufficient information about the characters to answer all 8 of the 'gatekeeper' questions correctly.
Once past the 'gatekeeper', students find that they can call each of the four principals to their apartment for further interviews. Here they can ask questions concerning the first part of the conversation, and the characters will elaborate on their stories and grievances with one another. Given enough information and, as is often the case with such adventures, perhaps a bit of luck, the assiduous user is likely to stumble on the solution to the mystery.
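The arithmetic of the gatekeeper (4 characters x 3 sets x 5 questions = 60 items, 8 drawn at random, all 8 required) can be sketched in Python as follows. The question identifiers are placeholders rather than the real questions, the function names are hypothetical, and drawing the 8 without repetition is an assumption the text does not confirm.

```python
import random

CHARACTERS = ["Ron", "Hector", "Sandy", "Ana"]
SETS_PER_CHARACTER = 3
QUESTIONS_PER_SET = 5
GATEKEEPER_SIZE = 8


def build_question_pool():
    """Return placeholder IDs for the 4 x 3 x 5 = 60 guiding questions."""
    return [
        (name, set_number, question_number)
        for name in CHARACTERS
        for set_number in range(1, SETS_PER_CHARACTER + 1)
        for question_number in range(1, QUESTIONS_PER_SET + 1)
    ]


def draw_gatekeeper(pool, rng=random):
    """Pick 8 distinct questions at random (distinctness is assumed)."""
    return rng.sample(pool, GATEKEEPER_SIZE)


def passes_gatekeeper(drawn, answers, answer_key):
    """The user advances only if every drawn question is answered correctly."""
    return all(answers.get(q) == answer_key[q] for q in drawn)
```

Because the 8 questions can come from any of the four characters, a user who has skimped on one character's conversations is likely to be sent back for more, which is presumably the point of the random draw.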
Pedagogical Integrity of Traci Talk
Traci Talk's unique success was its partnership between commercial and academic interests using what was at the time a practical and appropriate application of state-of-the-art speech recognition tools. 'Traci' is in fact an acronym for 'teacher ranging across the computer interface'. Traci was conceived as an agent who appears when the program is started but who otherwise remains dormant unless called by the student user. The user can in fact literally call "Traci" to make the agent appear, or click a button if that doesn't work. In speech recognition terms, this means that whenever the speech recognition engine is switched on awaiting an utterance, one of the expected utterances is always "Traci."
Other expected utterances include, "I beg your pardon," or "Sorry, I didn't catch that." These utterances, when recognized, will cause the character to paraphrase something he or she has just said. Notice that the utterance is almost never repeated in this case. In other words, for almost everything a user hears a character say in Traci Talk, there is a paraphrase which can be elicited in case of poor comprehension. As with normal conversation, the character will usually simplify or say more directly what has just been said, although sometimes requests for paraphrase can result in interesting or humorous asides.
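One way to picture this is that the set of utterances active at any juncture is the union of that juncture's conversational choices and a small set of always-available phrases. The Python sketch below follows that reading; the `Line` record, the `handle` function, its return values, and the sample line of dialogue are invented for illustration and do not reflect how the program was actually built.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Something a character says, with its prepared paraphrase."""
    text: str
    paraphrase: str


CALL_AGENT = "traci"
PARAPHRASE_REQUESTS = {"i beg your pardon", "sorry, i didn't catch that"}


def normalize(utterance):
    """Lower-case, collapse whitespace, and drop trailing punctuation."""
    return " ".join(utterance.lower().split()).rstrip(".?!")


def active_utterances(choices):
    """Everything the recognizer listens for at this juncture."""
    return {normalize(c) for c in choices} | PARAPHRASE_REQUESTS | {CALL_AGENT}


def handle(recognized, last_line, choices):
    """Map a recognized utterance to a hypothetical (action, payload) pair."""
    heard = normalize(recognized)
    if heard not in active_utterances(choices):
        return ("not_recognized", None)
    if heard == CALL_AGENT:
        return ("show_agent", None)
    if heard in PARAPHRASE_REQUESTS:
        # The character rephrases rather than repeats what was just said.
        return ("say", last_line.paraphrase)
    return ("branch", heard)
```

For example, with `Line("The train is running a little late today.", "The train is late.")` as the last line spoken, `handle("I beg your pardon", ...)` returns the paraphrase, while `handle("Traci", ...)` asks for the agent to appear.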
Traci, the agent, was conceived as a personified PDA or Personal Digital Assistant to the user. We imagined a student posing as a famous detective immersed in a world where people are speaking on higher-order topics in a difficult-to-understand language, who somehow has the ability to pull out a PDA in the course of a conversation and use it to help facilitate understanding of the conversation. Here, for example, the student can call the PDA by saying "Traci," and then say "Repeat," and the character will say exactly what he or she has said before. The student can also read what was said, whereas conversations with the characters are aural in nature (they can even be completely aural, as users can switch off their own prompts in case they have memorized their scripts). Students can also tell Traci to Go Back to a previous utterance ("Go Back" takes them back one step, "Go Back Four" takes them back four steps, and so on). At the point returned to, users can branch the conversation in other directions if they wish. Students can also see the conversation task questions they are meant to be working on using the Traci interface. All the commands are voice activated or clickable, whichever the user prefers (or whichever works, although when properly configured, the speech recognition in Traci Talk works very well and accepts a wide range of pronunciations and even mispronunciations).
Traci also records conversations with each character to a text file. Users can read these conversations using Notepad (Traci Talk is PC-based and doesn't work on the Macintosh) and should copy them to a personal work area, since the next conversation with that character will overwrite the previous one.
The program was created to allow ESL students to enjoy realistic interaction in simulated spoken conversation with plausible characters. I have seen students whose level of English was quite low appear to enjoy, perhaps for the first time ever, the experience of holding their own in a conversation in the target language (such students greatly appreciated my handing them transcripts of their conversations to take away with them). The program was designed to give students at any level a set of tools to facilitate such interaction, as well as to promote collaboration among students who might share and discuss what each had learned in separately branching conversations with the characters. The program differs from others of a similar genre (e.g. Who is Oscar Lake) in the greater depth of branching available to students, and in its application of speech recognition to enhance the conversation and the enjoyment of the interactive experience.
Who is Oscar Lake too often provides students with choices that give alternate ways to say things but do not in fact lead down different branches. Traci Talk's choices almost always result in branching, and Traci provides a greater range of help than any found in Oscar Lake. Oscar, with its video-animated characters and drag-and-drop capabilities, is a 'slicker' production, however. Incidentally, we found out about Oscar Lake at a TESOL Conference where we were showing our prototype versions of Traci Talk, so the one did not influence the other.
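The Go Back behaviour can be thought of as popping entries off a history of conversation junctures and then branching again from wherever the user lands. The Python sketch below is one way to model that; the `Conversation` class, the juncture labels, the limited table of number words, and the choice to stop at the first juncture when asked to go back too far are all assumptions made for the example.

```python
# Spoken numbers the sketch understands; the real command set is not known.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_go_back(command):
    """Return the number of steps requested, or None for other commands."""
    words = command.lower().split()
    if words[:2] != ["go", "back"]:
        return None
    if len(words) == 2:
        return 1  # plain "Go Back" means one step
    if len(words) == 3 and words[2] in NUMBER_WORDS:
        return NUMBER_WORDS[words[2]]
    return None


class Conversation:
    """History of junctures visited in one conversation with a character."""

    def __init__(self, start):
        self.history = [start]

    @property
    def current(self):
        return self.history[-1]

    def advance(self, juncture):
        """Record the juncture reached by the user's latest choice."""
        self.history.append(juncture)

    def go_back(self, steps=1):
        """Rewind `steps` junctures, never past the start (assumed)."""
        steps = min(steps, len(self.history) - 1)
        if steps > 0:
            del self.history[-steps:]
        return self.current
```

After rewinding, a call to `advance` with a different juncture simply starts a new branch from the point returned to, which matches the description of users taking the conversation in another direction.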
Last updated: April 19, 2002 in Hot Metal Pro 6.0