most people to use a remote control to set the timer on their video recorder to record forthcoming broadcasts. In addition, as devices decrease in size and average users increase in age, manual manipulation becomes more difficult. From a system architecture point of view, embedded speech recognition is increasingly regarded as a simple approach to user interfacing. Adoption in the embedded sphere contrasts with the more sluggish adoption of larger distributed system approaches (Tan & Varga, 2008). However, there is a price to be paid for such architectural simplicity: complex speech recognition algorithms must run on under-resourced consumer devices. This forces the development of special techniques to cope with the limited computing speed and memory of such systems.
Resource scarcity limits the range of feasible applications; on the other hand, it forces algorithm designers to optimise their techniques in order to guarantee sufficient recognition performance even in adverse conditions, on limited platforms, and under significant memory constraints (Tan & Varga, 2008). Of course, ongoing advances in semiconductor technologies mean that such constraints will naturally become less significant over time.
Indeed, increased computing resources coupled with more sophisticated software methods may be expected to narrow the performance differential between embedded and server-based recognition applications: the boundary between the applications these two approaches can realise will blur, allowing advanced features such as natural language understanding, rather than only simple command-and-control, to become possible in an embedded context. At that point there will no longer be significant technological barriers to using embedded systems to create a smart VI-enabled home.
At present, however, embedded devices typically have relatively slow memory access and scarce system resources, so it is necessary to employ a fast and lightweight speech recognition engine in such contexts. Several such embedded ASR systems have been introduced in (Hataoka et al., 2002), (Levy et al., 2004), and (Phadke et al., 2004) for sophisticated human-computer interfaces within car information systems, cellular phones, and interaction devices for physically handicapped persons (and other embedded applications), respectively.
It is also possible to perform speech recognition in smart homes by utilising a centralised server, connected to a set of microphones and loudspeakers scattered throughout the house, which performs all the processing. This requires significantly greater communications bandwidth than a distributed system (since there may be arrays of several microphones in each location, each with 16-bit sample depth and perhaps a 20 kHz sampling rate) and introduces communications delays, but it allows the ASR engine to operate on a faster computer with fewer memory constraints.
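As a rough illustration of the bandwidth argument, the short C sketch below computes the raw audio data rate implied by the figures above; the four-microphone array size is an illustrative assumption, not a figure from the text.

#include <stdio.h>

int main(void) {
    /* Figures from the text: 16-bit samples at a 20 kHz sampling rate. */
    const int bits_per_sample = 16;
    const int sample_rate_hz  = 20000;
    /* Assumed for illustration only: a four-microphone array per location. */
    const int mics_per_array  = 4;

    /* Raw (uncompressed) data rate for one array, in bits per second. */
    long bps = (long)bits_per_sample * sample_rate_hz * mics_per_array;

    printf("One microphone: %d kbit/s\n",
           bits_per_sample * sample_rate_hz / 1000);
    printf("One array:      %ld kbit/s (%.2f Mbit/s)\n",
           bps / 1000, bps / 1e6);
    return 0;
}

Even without compression, a single microphone streams 320 kbit/s and a four-element array approaches 1.3 Mbit/s; a distributed design avoids this traffic entirely by processing the audio locally.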
As the capabilities of embedded systems continue to improve, the argument for a centralised solution will weaken. We therefore confine the discussion here to a set of distributed embedded systems scattered throughout a smart home, each capable of performing speech recognition and VI. Low-bandwidth communication between such devices to allow co-operative ASR (or CPU cycle-sharing) is an ongoing research theme of the authors, but it does not affect the basic conclusions at this stage.
In the next section, the open-source Sphinx engine is described as a reasonable choice among existing ASR systems for smart-home services. We explain why Sphinx is suitable as a VI core in smart homes by examining its capabilities in an embedded speech recognition context.
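As a foretaste of that discussion, the sketch below shows how an embedded device might decode a single pre-recorded utterance with PocketSphinx, the resource-light member of the Sphinx family. It is a minimal sketch, not a definitive implementation: the function names follow the classic PocketSphinx C API (the 5prealpha-era interface), and the model paths, the input file name, and the assumption of 16 kHz, 16-bit mono raw PCM input are placeholders to be adapted to the target device.

#include <stdio.h>
#include <pocketsphinx.h>

int main(void) {
    /* Placeholder model paths: substitute the acoustic model, language
       model, and dictionary installed on the target device. */
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm",  "/usr/share/pocketsphinx/model/en-us/en-us",
        "-lm",   "/usr/share/pocketsphinx/model/en-us/en-us.lm.bin",
        "-dict", "/usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict",
        NULL);
    if (config == NULL) return 1;

    ps_decoder_t *ps = ps_init(config);
    if (ps == NULL) return 1;

    /* A pre-recorded utterance, assumed to be 16 kHz 16-bit mono raw PCM
       to match the default acoustic model. */
    FILE *fh = fopen("utterance.raw", "rb");
    if (fh == NULL) return 1;

    int16 buf[512];
    size_t nread;
    ps_start_utt(ps);
    while ((nread = fread(buf, sizeof(int16), 512, fh)) > 0)
        ps_process_raw(ps, buf, nread, FALSE, FALSE);
    ps_end_utt(ps);

    int32 score;
    const char *hyp = ps_get_hyp(ps, &score);
    printf("Recognised: %s\n", hyp ? hyp : "(nothing)");

    fclose(fh);
    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}

The same decoder structure works with buffers taken directly from a microphone rather than a file, which is how a distributed smart-home node would use it in practice.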