Shruti: Desktop Version
In recent years, it has become critical to bridge the gulf of between the man and the machine. The Internet has become an integral part of today’s life and the greatest knowledge repository on Earth. Technology for accessing the Internet and harnessing the myriad powers of the personal computer is a must if one is not to fall behind. The need of the hour is intelligent human-computer interfacing, enabling a wider community such as the rural neo-literates and pre-literates, the physically challenged (like the visually impaired and the speech impaired) to interact with computer systems in a natural way.
The speech interfaces like Shruti may have manifold uses. They could serve as:
• Computer interfaces for the visually challenged, for whom graphical interfaces are not viable.
• The voice of the speech impaired.
• Computer interfaces for neo-literates and pre-literates.
• Modules in software to help pre-literates learn languages using a computer.
• Interfacing modules in multilingual environments, where, depending on the need, the computer can talk in different languages.
Text to speech has been one of the greatest challenges of modern computational science. While the utterance of flat speech by a computer has been achievable – the greatest challenges in the field are to impose natural intonation and prosody based on the characteristics of the language, dialect, person and context.
The diagram shown below gives a complete idea of the modules of a Text to Speech converter. The diagram is detailed and gives a clear idea of a TTS converter:
Various techniques exist to convert a given text to speech. Initially, a grapheme to phoneme mapper is required to convert the given graphemes (the smallest unit of written language) to a list of phonemes (the smallest unit of spoken language). The next stage is to render the string of phonemes – to synthesize the speech. Speech synthesizers can be broadly classified into two different classes. Some synthesizers are articulatory where speech synthesis is controlled by parameters that represent the speech production system rather than the signal itself, the other being concatenative synthesizers where different signal units from a dictionary are concatenated to produce synthetic speech. However, the prime challenge in all cases is the quality of the sound produced and its naturalness.
The desktop version of Shruti implements the Text-to-Speech converter for regional languages like Hindi and Bengali using concatenative approach. Concatenative approach finds voice units corresponding to a Phoneme and concatenates them to produce the sound file. Smoothening algorithms are also applied on the concatenated speech and the noted improvements are achieved in this process.
After understanding the basic essence of Text-to-Speech software let’s quickly understand how the desktop version of Shruti is implemented.
|