Speech Recognition, Technologies and Applications
The 1980s were characterised by a major push towards large-vocabulary continuous speech
recognition systems, led by the US Defense Advanced Research Projects Agency (DARPA)
community, which sponsored a research programme to achieve high word accuracy for a
thousand-word continuous speech recognition database management task. Major research
contributors included Carnegie Mellon University (CMU), developers of the well-known
Sphinx system (Lee et al., 1990), BBN with the BYBLOS system (Chow et al., 1987), Lincoln
Labs (Paul, 1989), MIT (Zue et al., 1989), and AT&T Bell Labs (Lee et al., 1990).
The support of DARPA has continued since then, promoting speech recognition technology
for a wide range of tasks. DARPA targets and performance evaluations have mostly been
based on the measurement of word (or sentence) error rates as the system figure of merit.
Such evaluations are conducted systematically over carefully designed tasks with
progressive degrees of difficulty, ranging from the recognition of continuous speech spoken
with stylized grammatical structure (as routinely used in military tasks, e.g., the Naval
Resource Management task) to transcriptions of live (off-the-air) news broadcasts (e.g., NAB,
involving a fairly large vocabulary of over 20K words) and conversational speech.
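The word and sentence error rates mentioned above can be illustrated with a minimal sketch. It assumes the conventional definition of word error rate, namely the minimum number of word substitutions, deletions and insertions needed to turn the recognised word string into the reference transcription, divided by the number of reference words. The function names and the whitespace tokenisation are illustrative assumptions; the text does not specify the scoring protocol used in the DARPA evaluations, which may also apply text normalisation.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def sentence_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Fraction of sentences containing at least one word error."""
    if len(references) != len(hypotheses) or not references:
        raise ValueError("need equally sized, non-empty sentence lists")
    wrong = sum(r.split() != h.split() for r, h in zip(references, hypotheses))
    return wrong / len(references)


if __name__ == "__main__":
    ref = "show all ships in port"
    hyp = "show ships in the port"
    # One deletion ("all") and one insertion ("the") over five reference words.
    print(word_error_rate(ref, hyp))
```

Because insertions count as errors, the word error rate defined this way can exceed 1 when the hypothesis is much longer than the reference.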
In recent years, major efforts have focused on developing machines able to communicate
naturally with humans. The main characteristic of these recent steps is dialogue
management, in which speech applications reach some desired state of understanding by
making queries and confirmations (as in human-to-human speech communication).
Among such systems, Pegasus and Jupiter, developed at MIT, have been particularly
noteworthy demonstrators (Glass & Weinstein, 2001), and the How May I Help You
(HMIHY) system at AT&T has been an equally noteworthy service, first introduced as part
of AT&T Customer Care for their Consumer Communications Services in 2000 (Gorin, 1996).
Finally, after almost five decades of research and many valuable achievements
along the way (Minker & Bennacef, 2004), the challenge of designing a machine that truly
understands speech as well as an intelligent human still remains. However, the accuracy of
contemporary systems for specific tasks has gradually increased to the point where
successful real-world deployment is feasible.