Speech Recognition for Smart Homes

9. Audio aspects 
As mentioned in section 1, smart home VI provides a good implementation target for 
practical ASR: the set of users is small and can be predetermined (and even pre-enrolled, 
so that switched speaker-dependent ASR becomes possible), physical locations are well-
defined, the command set and grammar can be constrained, and many noise sources are 
already under the control of (or monitored by) a home control system. 
In terms of the user set, for a family home, each member would separately train the system 
to accommodate their voice. A speaker recognition system could then detect the speech of 
each user and switch the appropriate acoustic models into Sphinx. It would be reasonable 
for such a system to be usable only by a small group of people. 
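As a sketch of how such speaker switching might begin, the fragment below enrols each 
family member from example feature vectors (e.g. MFCC frames) and attributes an utterance 
to the nearest enrolled speaker. The nearest-mean classifier and the `enroll`/`identify` 
names are illustrative assumptions, not part of Sphinx:

```python
import numpy as np

def enroll(speaker_features):
    """Model each enrolled speaker as the mean of their training
    feature vectors (rows = frames, e.g. MFCC vectors)."""
    return {name: feats.mean(axis=0)
            for name, feats in speaker_features.items()}

def identify(models, utterance_feats):
    """Attribute an utterance to the enrolled speaker whose mean
    vector lies closest (Euclidean distance) to the utterance mean."""
    query = utterance_feats.mean(axis=0)
    return min(models, key=lambda name: np.linalg.norm(models[name] - query))
```

A deployed system would use a stronger speaker model (GMMs or similar) and, once a 
speaker is identified, load the matching speaker-dependent acoustic model into the 
recogniser.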
Physical locations – the rooms in the house – will have relatively constant acoustic 
characteristics, which can therefore be catered for by audio pre-processing. Major sources 
of acoustic noise, such as home theatre, audio entertainment systems, games consoles and 
so on, would likely be under the control of the VI system (or electronically connected to 
it), so that methods such as spectral subtraction (Boll, 1979) would perform well, having 
advance knowledge of the interfering noise. 
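A minimal sketch of spectral subtraction under these assumptions — the noise magnitude 
spectrum is supplied in advance (e.g. by the home controller), frames are non-overlapping, 
and the noisy phase is reused; practical implementations add overlap-add windowing and 
smoothing to suppress musical noise:

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, frame=256):
    """Basic spectral subtraction (after Boll, 1979): subtract a known
    noise magnitude spectrum from each frame of the noisy signal,
    floor negative magnitudes at zero, and resynthesise each frame
    using the noisy signal's phase."""
    out = np.zeros_like(noisy, dtype=float)
    for start in range(0, len(noisy) - frame + 1, frame):
        spec = np.fft.rfft(noisy[start:start + frame])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # subtract, floor at 0
        phase = np.angle(spec)                           # keep noisy phase
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * phase), n=frame)
    return out
```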
It would also be entirely acceptable for a VI system, when required to perform a more 
difficult recognition task, such as LVCSR for email dictation, to automatically reduce the 
audio volume of currently operating entertainment devices. 
Suitable noise reduction techniques for a smart home VI system may include methods such 
as adaptive noise cancellation (ANC) (Hataoka et al., 1998) or spectral subtraction which 
have been optimized for embedded use (Hataoka et al., 2002).
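The following is a generic LMS-based ANC sketch, not the embedded-optimized 
implementation of Hataoka et al.; it assumes a separate reference feed of the interfering 
noise (e.g. the entertainment system's line output) alongside the room microphone:

```python
import numpy as np

def lms_cancel(primary, reference, taps=8, mu=0.01):
    """Adaptive noise cancellation via the LMS algorithm: an FIR
    filter over the noise reference is adapted to predict the noise
    component picked up by the primary microphone; the prediction
    error is the cleaned speech estimate."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]  # most recent reference samples first
        y = w @ x                        # current noise estimate
        e = primary[n] - y               # error = speech estimate
        w += 2 * mu * e * x              # LMS weight update
        out[n] = e
    return out
```

Because speech is uncorrelated with the reference, the filter converges to model only the 
acoustic path of the noise, leaving the speech in the error signal.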
The largest difference between a smart home ASR deployment and current computer-based 
or telephone-based dictation systems is microphone placement (McLoughlin, 2009): in the 
latter, headset or handset microphones are used which are close to the speaker's mouth. A 
smart home system able to respond to queries anywhere within a room in the house would 
have a much harder recognition task to perform. Microphone arrays, steered by phase 
adjustments, are able in some cases, and with some success, to 'focus' the array on a 
speaker's mouth (Dorf, 2006). 
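The simplest such phase-steering scheme is delay-and-sum beamforming: each channel is 
delayed so that the wavefront from the target direction aligns across microphones, then 
the channels are averaged. A sketch assuming integer sample delays (fractional delays 
would require interpolation):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Steer a microphone array by applying a per-channel sample delay
    and averaging: speech from the target direction adds coherently,
    while sound from other directions partially cancels."""
    n = len(channels[0])
    out = np.zeros(n)
    for ch, d in zip(channels, delays):
        shifted = np.zeros(n)
        if d >= 0:
            shifted[d:] = ch[:n - d] if d > 0 else ch  # delay by d samples
        else:
            shifted[:n + d] = ch[-d:]                  # advance by -d samples
        out += shifted
    return out / len(channels)
```

With M microphones and independent noise per channel, averaging reduces the noise power 
by roughly a factor of M while preserving the aligned speech.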


A preferable method, however, is to encourage users to direct their own speech in the 
same way that they do when interacting with other humans: they turn to face the listener, 
or at least move or lean closer. This behaviour can be encouraged in a smart home by 
providing a focus for the users. This might take the form of a robot head/face, which has 
the added advantage of being able to provide expressions – a great assistance during a 
dialogue when, for example, lack of understanding can be communicated back to a user 
non-verbally. This research is currently conducted almost exclusively by Japanese groups: 
see for example (Nakano et al., 2006). 
A reasonable alternative is the use of a mobile device, carried by a user, which they can 
speak into (Prior, 2008). This significantly simplifies the required audio processing, at the 
expense of requiring the user to carry such a device. 
