Speech Recognition for Smart Homes




3. ASR development 
Half a century of ASR research has seen progressive improvement, from a simple machine 
responding to a small set of sounds to advanced systems able to respond to fluently spoken 
natural language. To provide a technological perspective, some major highlights in the 
research and development of ASR systems are outlined below: 
The earliest attempts in ASR research, in the 1950s, exploited fundamental ideas of acoustic-
phonetics to devise systems for recognising phonemes (Fry & Denes, 1959) and isolated 
digits from a single speaker (Davis et al., 1952). These efforts continued into the 1960s 
with the entry of several Japanese laboratories, such as the Radio Research Lab, NEC and 
Kyoto University, into the arena. In the late 1960s, Martin and his colleagues at RCA 
Laboratories developed a set of elementary time-normalisation methods, based on the 
ability to reliably detect the presence of speech (Martin et al., 1964). Martin ultimately 
founded one of the first companies to build, market and sell speech recognition products. 
During the 1970s, speech recognition research achieved a number of significant milestones, 
first in the area of isolated word or discrete utterance recognition, based on fundamental 
studies in Russia (Velichko & Zagoruyko, 1970), Japan (Sakoe & Chiba, 1978), and the 
United States (Itakura, 1975). Another milestone was the genesis of a longstanding group 
effort toward large vocabulary speech recognition at IBM. Finally, researchers at AT&T Bell 
Laboratories initiated a series of experiments aimed at making speech recognition systems 
truly speaker independent (Rabiner et al., 1979). To achieve this goal, sophisticated 
clustering algorithms were employed to determine the number of distinct patterns required 
to represent all variations of different words across a wide population of users. Over 
several years, this latter approach advanced to the point where the techniques for handling 
speaker-independent patterns are now well understood and widely used. 
Isolated word recognition was thus a key research focus of the 1970s, leading into 
continuous speech recognition research in the 1980s. During that decade, a shift in 
technology was observed from template-based approaches to statistical modelling, 
including the hidden Markov model (HMM) approach (Rabiner et al., 1989). Another 
technology, reintroduced in the late 1980s, was the application of neural networks to 
speech recognition, and several system implementations based on neural networks were 
proposed (Waibel et al., 1989). 
The 1980s were also characterised by a major impetus toward large vocabulary, continuous 
speech recognition systems, led by the US Defense Advanced Research Projects Agency 
(DARPA) community, which sponsored a research programme to achieve high word accuracy 
on a thousand-word continuous speech recognition database management task. Major research 
contributions came from Carnegie Mellon University (CMU), inventors of the well-known 
Sphinx system (Lee et al., 1990), BBN with the BYBLOS system (Chow et al., 1987), Lincoln 
Labs (Paul, 1989), MIT (Zue et al., 1989), and AT&T Bell Labs (Lee et al., 1990). 
The support of DARPA has continued since then, promoting speech recognition technology 
for a wide range of tasks. DARPA targets and performance evaluations have mostly been 
based on the measurement of word (or sentence) error rates as the system figure of merit. 
Such evaluations are conducted systematically over carefully designed tasks with 
progressive degrees of difficulty, ranging from the recognition of continuous speech spoken 
with stylised grammatical structure (as routinely used in military tasks, e.g., the Naval 
Resource Management task) to transcriptions of live (off-the-air) news broadcasts (e.g., 
NAB, involving a fairly large vocabulary of over 20,000 words) and conversational speech. 
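To make this figure of merit concrete: word error rate (WER) is conventionally computed 
from the minimum edit (Levenshtein) distance between the recogniser's output and a 
reference transcript, as WER = (S + D + I) / N, where S, D and I are the numbers of 
substituted, deleted and inserted words and N is the number of words in the reference. 
The following minimal Python sketch (an illustration of the standard metric, not code from 
any of the systems cited here) computes WER by dynamic programming: 

    def word_error_rate(reference: str, hypothesis: str) -> float:
        """WER = (substitutions + deletions + insertions) / reference length,
        computed via Levenshtein edit distance over words."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = minimum edits turning the first i reference words
        # into the first j hypothesis words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                     # i deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j                     # j insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j - 1] + sub,   # match / substitution
                              d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1)         # insertion
        return d[len(ref)][len(hyp)] / len(ref)

    # Hypothetical example: against the reference "turn on the kitchen light",
    # "turn of the light" has one substitution and one deletion -> WER = 2/5 = 0.4
    print(word_error_rate("turn on the kitchen light", "turn of the light"))

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why 
it is reported as an error count normalised by reference length rather than as an accuracy. 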
In recent years, major efforts have focused on developing machines able to communicate 
naturally with humans. The main characteristic of these recent steps is dialogue 
management, in which speech applications reach some desired state of understanding by 
making queries and confirmations, much as in human-to-human speech communication. Among 
such systems, Pegasus and Jupiter, developed at MIT, have been particularly noteworthy 
demonstrators (Glass & Weinstein, 2001), and the How May I Help You (HMIHY) system at 
AT&T has been an equally noteworthy service, first introduced as part of AT&T Customer 
Care for their Consumer Communications Services in 2000 (Gorin, 1996). 
Finally, we can say that after almost five decades of research and many valuable 
achievements along the way (Minker & Bennacef, 2004), the challenge of designing a machine 
that truly understands speech as well as an intelligent human still remains. However, the 
accuracy of contemporary systems on specific tasks has gradually increased to the point 
where successful real-world deployment is entirely feasible. 
