Speech Recognition for Smart Homes




3. ASR development 
Half a century of ASR research has seen progressive improvement, from a simple machine 
responding to a small set of sounds to advanced systems able to respond to fluently spoken 
natural language. To provide a technological perspective, some major highlights in the 
research and development of ASR systems are outlined below: 
The earliest attempts in ASR research, in the 1950s, exploited fundamental ideas of acoustic-
phonetics to devise systems for recognising phonemes (Fry & Denes, 1959) and isolated 
digits from a single speaker (Davis et al., 1952). These efforts continued into the 1960s 
with the entry of several Japanese laboratories, such as the Radio Research Lab, NEC and 
Kyoto University, into the arena. In the late 1960s, Martin and his colleagues at RCA 
Laboratories developed a set of elementary time-normalisation methods, based on the 
ability to reliably detect the presence of speech (Martin et al., 1964). Martin ultimately 
founded one of the first companies to build, market and sell speech recognition products. 
During the 1970s, speech recognition research achieved a number of significant milestones, 
first in the area of isolated word or discrete utterance recognition, based on fundamental 
studies in Russia (Velichko & Zagoruyko, 1970), Japan (Sakoe & Chiba, 1978), and the 
United States (Itakura, 1975). Another milestone was the genesis of a longstanding group 
effort toward large vocabulary speech recognition at IBM. Finally, researchers at AT&T Bell 
Laboratories initiated a series of experiments aimed at making speech recognition systems 
truly speaker independent (Rabiner et al., 1979). To achieve this goal, sophisticated 
clustering algorithms were employed to determine the number of distinct patterns required 
to represent all variations of different words across a wide population of users. Over 
several years, this latter approach advanced to the point where the techniques for handling 
speaker-independent patterns are now well understood and widely used. 
Isolated word recognition was thus a key research focus of the 1970s, leading into 
continuous speech recognition research in the 1980s. During that decade, a shift in 
technology was observed from template-based approaches to statistical modelling, 
including the hidden Markov model (HMM) approach (Rabiner et al., 1989). Another 
technology, reintroduced in the late 1980s, was the application of neural networks to 
speech recognition, and several system implementations based on neural networks were 
proposed (Waibel et al., 1989). 
The 1980s were also characterised by a major impetus toward large vocabulary, continuous 
speech recognition systems, led by the US Defense Advanced Research Projects Agency 
(DARPA) community, which sponsored a research programme to achieve high word accuracy 
on a thousand-word continuous speech recognition database management task. Major research 
contributions came from Carnegie Mellon University (CMU), inventors of the well-known 
Sphinx system (Lee et al., 1990), BBN with the BYBLOS system (Chow et al., 1987), Lincoln 
Labs (Paul, 1989), MIT (Zue et al., 1989), and AT&T Bell Labs (Lee et al., 1990). 
The support of DARPA has continued since then, promoting speech recognition technology 
for a wide range of tasks. DARPA targets and performance evaluations have mostly been 
based on the measurement of word (or sentence) error rates as the system figure of merit. 
Such evaluations are conducted systematically over carefully designed tasks with 
progressive degrees of difficulty, ranging from the recognition of continuous speech spoken 
with stylised grammatical structure (as routinely used in military tasks, e.g., the Naval 
Resource Management task) to transcriptions of live (off-the-air) news broadcasts (e.g., 
NAB, involving a fairly large vocabulary of over 20,000 words) and conversational speech. 
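To make this figure of merit concrete: word error rate (WER) is conventionally computed 
from the minimum edit (Levenshtein) distance between the recogniser's output and a 
reference transcript, as WER = (S + D + I) / N, where S, D and I are the numbers of 
substituted, deleted and inserted words and N is the number of words in the reference. 
The following minimal Python sketch (an illustration of the standard metric, not code from 
any of the systems cited here) computes WER by dynamic programming: 

    def word_error_rate(reference: str, hypothesis: str) -> float:
        """WER = (substitutions + deletions + insertions) / reference length,
        computed via Levenshtein edit distance over words."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = minimum edits turning the first i reference words
        # into the first j hypothesis words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                     # i deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j                     # j insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j - 1] + sub,   # match / substitution
                              d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1)         # insertion
        return d[len(ref)][len(hyp)] / len(ref)

    # Hypothetical example: against the reference "turn on the kitchen light",
    # "turn of the light" has one substitution and one deletion -> WER = 2/5 = 0.4
    print(word_error_rate("turn on the kitchen light", "turn of the light"))

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why 
it is reported as an error count normalised by reference length rather than as an accuracy. 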
In recent years, major efforts have focused on developing machines able to communicate 
naturally with humans. The main characteristic of these recent steps is dialogue 
management, in which speech applications reach some desired state of understanding by 
making queries and confirmations, much as in human-to-human speech communication. Among 
such systems, Pegasus and Jupiter, developed at MIT, have been particularly noteworthy 
demonstrators (Glass & Weinstein, 2001), and the How May I Help You (HMIHY) system at 
AT&T has been an equally noteworthy service, first introduced as part of AT&T Customer 
Care for their Consumer Communications Services in 2000 (Gorin, 1996). 
Finally, we can say that after almost five decades of research and many valuable 
achievements along the way (Minker & Bennacef, 2004), the challenge of designing a machine 
that truly understands speech as well as an intelligent human still remains. However, the 
accuracy of contemporary systems on specific tasks has gradually increased to the point 
where successful real-world deployment is entirely feasible. 
