In particular, a speech recognizer that provides a confidence level can tie in with sub-phrase arguments to determine requests-for-clarification (RFC), which are themselves serviced through examination of the interruptibility type, T.
So given a recognition confidence level C, an RFC will be triggered if:

    log(C) / γ < R(V)                                                    (2)

where γ is system and scale dependent, determined through system training.
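As a minimal sketch of how this trigger condition might be evaluated, assuming an illustrative lookup table R keyed by interruptibility type and placeholder numerical values that are not taken from the original system:

```python
import math

# Assumed interruptibility-dependent thresholds R(V); real values, like the
# scale factor gamma, would be determined through system training.
R = {
    "immediate": -0.5,       # placeholder value
    "end_of_phrase": -1.0,   # placeholder value
}

def rfc_triggered(confidence: float, gamma: float, interrupt_type: str) -> bool:
    """Trigger a request-for-clarification when log(C) / gamma < R(V)."""
    return math.log(confidence) / gamma < R[interrupt_type]

# Example: a low-confidence recognition result on an immediately interruptible utterance
print(rfc_triggered(confidence=0.2, gamma=2.0, interrupt_type="immediate"))
```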
Interruptibility type includes two super-classes: 'immediate' and 'end-of-phrase'.

Immediate interrupts may be verbal or non-verbal (a light, a tone, a gesture such as a raised hand, or a perplexed look on a listener's face, depending on the designed interface). An immediate interrupt would be useful either when the utterance is expected to be so long that it is inconvenient to wait until the end, or when the meaning requires clarification up-front. An example of an immediate interrupt would be during an email dictation, where the meaning of an uncertain word needs to be checked as soon as the uncertainty is discovered; reviewing a long sentence that has just been spoken in order to correct a single mistaken word is both time-consuming and clumsy in computer dialogue terms.
An end-of-phrase interrupt is located at a natural reply juncture, and could be entirely natural to the speaker, as in “did you ask me to turn on the light?”
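A minimal sketch of how a dialogue manager might choose between the two super-classes follows; the expected-duration threshold and the clarification flag are illustrative assumptions, not part of the original design:

```python
from enum import Enum

class InterruptType(Enum):
    IMMEDIATE = "immediate"
    END_OF_PHRASE = "end_of_phrase"

# Assumed threshold: utterances expected to run longer than this are treated
# as too long to wait out before seeking clarification.
LONG_UTTERANCE_SECONDS = 10.0

def choose_interrupt_type(expected_duration_s: float,
                          needs_upfront_clarification: bool) -> InterruptType:
    """Interrupt immediately when the utterance is expected to be long or the
    meaning must be clarified up-front; otherwise wait for the natural reply
    juncture at the end of the phrase."""
    if needs_upfront_clarification or expected_duration_s > LONG_UTTERANCE_SECONDS:
        return InterruptType.IMMEDIATE
    return InterruptType.END_OF_PHRASE

# Example: a long email dictation with an uncertain word detected mid-sentence
print(choose_interrupt_type(expected_duration_s=30.0,
                            needs_upfront_clarification=True))
```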
    7. Embedded speech recognition 
Nowadays, embedded speech technology is an active research area that attracts not only researchers from academia but also industrial groups interested in investing in this promising new market. Thus, more and more companies have launched embedded speech systems. These provide alternative control interfaces for consumer appliances to replace knobs, switches, buttons and so on. In specific niche applications with limited vocabulary size, the success of such products may well advance the public acceptance of speech technology. Current examples include voice dialling for GSM telephones, and media players.
As consumer devices become increasingly complex, the range of features naturally increases, and thus it has become more and more difficult for users to produce the appropriate sequences of key presses to set a control. A typical example is the inability of
    most people to use a remote control to set the timer on their video recorder to record 
    forthcoming broadcasts. In addition, as devices decrease in size, and average users increase 
    in age, manual manipulation has similarly become more difficult. From a system 
architecture point of view, embedded speech recognition is increasingly being considered a simple approach to user interfacing. Adoption in the embedded sphere contrasts with the more sluggish adoption of larger distributed system approaches (Tan & Varga, 2008).
However, there is a price to be paid for such architectural simplicity: complex speech recognition algorithms must run on under-resourced consumer devices. In fact, this forces the development of special techniques to cope with limited resources in terms of computing speed and memory on such systems.
Resource scarcity limits the available applications; on the other hand, it forces algorithm designers to optimise techniques in order to guarantee sufficient recognition performance even in adverse conditions, on limited platforms, and with significant memory constraints
    (Tan & Varga, 2008). Of course, ongoing advances in semiconductor technologies mean that 
    such constraints will naturally become less significant over time. 
In fact, increased computing resources coupled with more sophisticated software methods may be expected to narrow the performance differential between embedded and server-based recognition applications: the border between applications realized by these techniques will blur, allowing advanced features such as natural language understanding, rather than simple command-and-control, to become possible in an embedded context. At this point there will no longer be significant technological barriers to the use of embedded systems to create a smart VI-enabled home.
However, at present, embedded devices typically have relatively slow memory access and a scarcity of system resources, so it is necessary to employ a fast and lightweight speech recognition engine in such contexts. Several such embedded ASR systems have been introduced in (Hataoka et al., 2002), (Levy et al., 2004), and (Phadke et al., 2004) for sophisticated human-computer interfaces within car information systems, cellular phones, and interaction devices for physically handicapped persons (and other embedded applications) respectively.
It is also possible to perform speech recognition in smart homes by utilising a centralised server which performs the processing, connected to a set of microphones and loudspeakers scattered throughout a house: this requires significantly greater communications bandwidth than a distributed system (since there may be arrays of several microphones in each location, each with 16-bit sample depth and perhaps a 20 kHz sampling rate) and introduces communications delays, but allows the ASR engine to operate on a faster computer with fewer memory constraints.
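As a rough illustration of that bandwidth argument, the sketch below estimates the raw audio uplink for a centralised server; the number of microphones per array and the number of monitored locations are assumed figures, not taken from the text:

```python
# Rough raw-audio bandwidth estimate for a centralised smart-home ASR server.
SAMPLE_RATE_HZ = 20_000     # ~20 kHz sampling rate per microphone (from the text)
SAMPLE_DEPTH_BITS = 16      # 16-bit sample depth (from the text)
MICS_PER_ARRAY = 4          # assumed microphones in each array
LOCATIONS = 8               # assumed number of monitored rooms

bits_per_second = SAMPLE_RATE_HZ * SAMPLE_DEPTH_BITS * MICS_PER_ARRAY * LOCATIONS
print(f"Raw audio uplink: {bits_per_second / 1e6:.2f} Mbit/s")  # 10.24 Mbit/s here
```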
As the capabilities of embedded systems continue to improve, the argument for a centralised solution will weaken. We confine the discussion here to a set of distributed embedded systems scattered throughout a smart home, each capable of performing speech recognition and VI. Low-bandwidth communication between devices in such a scenario, to allow co-operative ASR (or CPU cycle-sharing), is an ongoing research theme of the authors, but does not affect the basic conclusions at this stage.
In the next section, the open-source Sphinx is described as a reasonable choice among existing ASRs for smart home services. We will explain why Sphinx is suitable for use in smart homes as a VI core by examining its capabilities in an embedded speech recognition context.

