  • Speech Recognition for Smart Homes




    [Figure 2: block diagram. Box labels: Microphone; User Interface; Multiplexer; Local info & Commands; Request Formatting; Request Phrasing; Query Generation (x3); Semantic Web; Local DB (Wikipedia); Web; Information Extraction (x3); Request Matching; Refine; Results Presentation. Signal labels: User Request; Results.]
    Fig. 2. Overall structure of the WWW vocal query access system. 


    Speech Recognition, Technologies and Applications
    The semantic web, currently being promoted and researched by Tim Berners-Lee and others 
    (see Wikipedia 2008), goes a long way towards providing a solution: it divorces the 
    graphical/textual nature of web pages from their information content. In the semantic web, 
    pages are based around information. This information can then be marked up and displayed 
    graphically if required. When designing smart home services that benefit from vocal 
    interaction with the semantic web, the same information could be marked up and presented 
    vocally, where the nature of the information warrants a vocal response (or the user requires 
    a vocal response). 
    There are three alternative methods of VI relating to the WWW resource: 
    • The few semantic web pages, with information extracted, converted to speech (either as 
      specified in the page or using local preferences), and then presented vocally. 
    • HTML web pages, with information extracted, refined and then presented vocally. 
    • Vocally marked-up web pages, presented vocally. 
    Figure 2 shows the overall structure proposed by the authors for vocal access to the WWW. 
    On the left is the core vocal response system handling information transfer to and from the 
    user. A user interface and multiplexer allow different forms of information to be 
    combined. Local information and commands relate to system operation: asking the computer 
    to repeat itself, take and replay messages, give the time, update status, increase volume and 
    so on. For the current discussion, it is the ASR aspects of the VI system which are most 
    interesting: 
    User requests are formatted into queries, which are then phrased as required and issued 
    simultaneously to the web, the semantic web and a local Wikipedia database. The semantic 
    web is preferred, followed by Wikipedia and then the WWW. 
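The fan-out described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the three `search_*` functions are hypothetical stand-ins for the real back ends, and the convention that an empty list means "no usable result" is an assumption made here.

```python
from concurrent.futures import ThreadPoolExecutor

# Preference order taken from the text: semantic web, then Wikipedia, then the WWW.
PREFERENCE = ("semantic_web", "wikipedia", "www")


def issue_query(query, sources):
    """Send `query` to every source concurrently and return (name, result)
    for the most preferred source that produced a non-empty result."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    for name in PREFERENCE:
        if results.get(name):
            return name, results[name]
    return None, None


# Hypothetical stand-ins for the three information sources.
def search_semantic_web(query):
    return []  # pretend no semantic markup exists for this query


def search_wikipedia(query):
    return [f"Local article about {query}"]


def search_www(query):
    return [f"Web page mentioning {query}"]


sources = {
    "semantic_web": search_semantic_web,
    "wikipedia": search_wikipedia,
    "www": search_www,
}
chosen = issue_query("weather radar", sources)
print(chosen)
```

In this sketch the call waits for all three sources before choosing. A real system would probably act on the fast local result first, as the following paragraph of the text describes.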
    WWW responses can then be refined by the local Wikipedia database. For example, too 
    many unrelated hits in Wikipedia indicate that query adjustments may be required. 
    Refinement may also involve asking the user to choose between several options, or may 
    simply require rephrasing the question presented to the information sources. Since the 
    database is local, search time is almost instantaneous, so a request to refine the query 
    can be put to the user very rapidly if required, possibly before the WWW search has 
    completed. 
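One way to picture the refinement decision is as a small rule applied to the local hit list. The sketch below is an assumption-laden illustration: the `topic` field on each hit, the `max_options` threshold, and the choice to rephrase when there are no hits at all are all hypothetical and do not come from the text.

```python
def refinement_action(local_hits, max_options=3):
    """Decide how to refine a query from the local database hits alone.

    Returns a pair (action, topics) where action is one of
    "proceed", "ask_user" or "rephrase".
    `max_options` is an assumed threshold, not a value from the text.
    """
    topics = sorted({hit["topic"] for hit in local_hits})
    if not topics:
        # Assumption: no local hits means the query should be rephrased.
        return ("rephrase", [])
    if len(topics) == 1:
        # All hits agree on one topic, so no refinement is needed.
        return ("proceed", topics)
    if len(topics) <= max_options:
        # A few candidate meanings: let the user choose between them.
        return ("ask_user", topics)
    # Too many unrelated hits: the query itself needs adjusting.
    return ("rephrase", [])


# Hypothetical hit list for an ambiguous request.
hits = [
    {"title": "Mercury (planet)", "topic": "astronomy"},
    {"title": "Mercury (element)", "topic": "chemistry"},
]
print(refinement_action(hits))
```

Because the decision uses only local data, it can run as soon as the local search returns, without waiting for the slower sources.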
    Finally, results are obtained as Wikipedia information, web pages or semantic 
    information. These are analysed, formatted, and presented to the user. Depending on the 
    context, information type and amount, the answer is given vocally, graphically or 
    textually. A query cache and learning system (not shown) can be used to improve query 
    processing and matching based on the results of previous queries. 
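The query cache is not specified in the text, so the following is only a hedged sketch of the simplest form it could take: a dictionary keyed on a normalised query string. The class name `QueryCache` and the normalisation rule (lower-casing and collapsing whitespace) are assumptions made for illustration; the learning component is not modelled.

```python
class QueryCache:
    """Hypothetical in-memory cache of previous query results."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(query):
        # Normalise case and whitespace so trivially different phrasings match.
        return " ".join(query.lower().split())

    def get(self, query):
        """Return the cached result for `query`, or None if it is not cached."""
        return self._store.get(self._key(query))

    def put(self, query, result):
        """Store `result` under the normalised form of `query`."""
        self._store[self._key(query)] = result


cache = QueryCache()
cache.put("Capital of  France", "Paris")
print(cache.get("capital of france"))
```

A lookup that returns `None` would fall through to the normal query path, with the new result stored afterwards.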
    4.3 Dictation 
    Dictation involves the automatic translation of speech into written form, and is 
    differentiated from other speech recognition functions mostly because user input does not 
    need to be interpreted (although doing so may well aid recognition accuracy), and usually 
    there is little or no dialogue between user and machine. 
    Dictation systems imply large vocabularies and, in some cases, an application will include 
    an additional specialist vocabulary for the application in question (McTear, 2004). 
    Domain-specific systems can lead to increased accuracy. 


