Speech Recognition, Technologies and Applications
and user satisfaction in a world where users expect to be surrounded and served by many
kinds of computers and digital consumer electronics products.
In parallel, advances in networking, driven primarily by the Internet, have made computer
networks common in everyday life (Tanenbaum, 1996). This has spawned new services and
new concepts of cost-effective, convenient connectivity, in particular wireless local-area
networks. Such connectivity has in turn promoted the adoption of digital infotainment.
Fig. 1. An illustration of the range and scope of potential smart home services, reproduced
by permission of ECHONET Consortium, Japan (ECHONET, 2008).
Recent trends reveal that consumers are increasingly buying bundles of services in the areas
of utilities and entertainment. Meanwhile, technical studies of connected appliances
(Lahrmann, 1998; Kango et al., 2002b) and home networking (Roy, 1999) show increasing
promise and growing convergence between these areas. Figure 1 illustrates many of the
services that can be provided for various activities within a house (ECHONET, 2008). An
appliance can be defined as smart when it is 'an appliance whose data is available to all
concerned at all times throughout its life cycle' (Kango et al., 2002). In practice, smart
appliances often use emerging technologies and communication methods (Wang et al.,
2000) to enable various services for both consumers and producers.
Here we define smart homes as those with characteristics such as central control of home
appliances, networking capability and interaction with users through intelligent interfaces.
For natural interaction with users, one of the most user-friendly methods is vocal
interaction (VI). Importantly, VI also suits the physical environment of the smart home
well. A VI system that can be accessed in the garage, bathroom, bedroom and kitchen would
require at least a distributed set of microphones and loudspeakers, along with a centralised
processing unit. By contrast, a comparable keyboard, mouse and monitor (KMM) solution
would require a keyboard, mouse and monitor in each room, or would require the user to walk
to a centralised location to perform input and control. The former solution is impractical for cost
and environmental reasons (imagine using KMM whilst in the shower); the latter is not
user-friendly.
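To make the distributed arrangement concrete, the following minimal Python sketch illustrates the idea; it does not describe any particular system. Each room holds only a hypothetical `RoomEndpoint` (microphone plus loudspeaker), while a single hypothetical `CentralProcessor` performs recognition and appliance control for the whole home. For simplicity, utterances are assumed to arrive already transcribed as text, and a toy "turn <appliance> on/off" grammar stands in for a real ASR engine. All class names, method names and the grammar are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RoomEndpoint:
    """Hypothetical per-room microphone/loudspeaker pair with no local processing."""
    room: str
    spoken: List[str] = field(default_factory=list)  # messages played on this loudspeaker

    def capture(self, utterance: str) -> Tuple[str, str]:
        # Assumption: a real endpoint would stream audio; here it forwards text.
        return (self.room, utterance)

    def play(self, message: str) -> None:
        self.spoken.append(message)


class CentralProcessor:
    """Hypothetical central unit: recognition and control exist once for the whole home."""

    def __init__(self) -> None:
        self.endpoints: Dict[str, RoomEndpoint] = {}
        self.appliance_state: Dict[str, str] = {}

    def register(self, endpoint: RoomEndpoint) -> None:
        self.endpoints[endpoint.room] = endpoint

    def recognise(self, utterance: str) -> Optional[Tuple[str, str]]:
        # Stand-in for ASR (assumption): accept only "turn <appliance> on|off".
        words = utterance.lower().split()
        if len(words) == 3 and words[0] == "turn" and words[2] in ("on", "off"):
            return words[1], words[2]
        return None

    def handle(self, room: str, utterance: str) -> None:
        endpoint = self.endpoints[room]
        command = self.recognise(utterance)
        if command is None:
            endpoint.play("Sorry, I did not understand.")
            return
        appliance, state = command
        self.appliance_state[appliance] = state
        # Reply through the loudspeaker in the room the request came from.
        endpoint.play(f"{appliance} turned {state}")


if __name__ == "__main__":
    hub = CentralProcessor()
    for name in ("garage", "bathroom", "bedroom", "kitchen"):
        hub.register(RoomEndpoint(name))
    hub.handle(*hub.endpoints["bathroom"].capture("turn heater on"))
    print(hub.appliance_state)               # {'heater': 'on'}
    print(hub.endpoints["bathroom"].spoken)  # ['heater turned on']
```

The sketch highlights the structural point made above: supporting another room adds only one more microphone/loudspeaker endpoint, while the recognition logic exists once in the central unit. A per-room KMM set offers no such sharing.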
Practical VI presupposes a viable two-way communication channel between user and
machine that frees the user from having to sit in front of a KMM. VI does not entirely
replace a monitor (viewing holiday photographs is still more enjoyable on a monitor than
through a loudspeaker), and in some instances, such as entering or navigating complex
technical documents, a keyboard or mouse will still be necessary. However, a user-friendly
VI system can augment the other access methods and offer more ubiquitous access,
answering queries and allowing control in the shower, whilst walking up stairs, in the dark
and even during the messy process of stuffing a turkey.
The following sections focus on ASR and its associated issues as an enabling technology for
VI in smart home computing, beginning with an overview of the evolution and current state
of the art of ASR.