11. Conclusion
The major components of a smart home ASR system currently exist within the speech
recognition research community, as the evolutionary result of half a century of applied and
academic research. Command-and-control of appliances and devices within the home, and in
particular its constrained grammar syntax, allows a recognizer such as Sphinx
to operate with high levels of accuracy. The results presented here relate accuracy to
vocabulary size, and give associated metrics for reducing vocabulary (and thus maximising
accuracy) through the use of restricted grammars for specialised applications.
Audio aspects related to the smart home and the use of LVCSR for multi-user dictation
tasks are currently major research thrusts, as is the adaptation of ASR systems for use in
embedded devices. The application of speech recognition to performing WWW queries is
likely to be particularly important for the adoption of such systems within a usable smart
home context. This work is ongoing, and is likely to be greatly assisted if current research
efforts towards a semantic web come to influence the WWW as a whole.
The future of ASR within smart homes will be assured first by the creation of niche
applications that serve users in a friendly and capable fashion. That the technology
largely exists has been demonstrated here, although there is still some way to go before such
technology is adopted by the general public.