Shruti: Desktop Version
In recent years, it has become critical to bridge the gulf between man and machine. The Internet has become an integral part of today’s life and the greatest knowledge repository on Earth. Technology for accessing the Internet and harnessing the power of the personal computer is a must for anyone who does not want to fall behind. The need of the hour is intelligent human-computer interfacing that enables a wider community, such as rural neo-literates and pre-literates and the physically challenged (for example, the visually impaired and the speech impaired), to interact with computer systems in a natural way.
Speech interfaces like Shruti have many uses. They could serve as:
• Computer interfaces for the visually challenged, for whom graphical interfaces are not viable.
• The voice of the speech impaired.
• Computer interfaces for neo-literates and pre-literates.
• Modules in software to help pre-literates learn languages using a computer.
• Interfacing modules in multilingual environments, where, depending on the need, the computer can talk in different languages.
Text-to-speech conversion has been one of the greatest challenges of modern computational science. While getting a computer to utter flat speech is achievable, the greatest challenge in the field is to impose natural intonation and prosody based on the characteristics of the language, dialect, person and context.
The diagram below shows the modules of a text-to-speech (TTS) converter:
Various techniques exist to convert a given text to speech. First, a grapheme-to-phoneme mapper converts the given graphemes (the smallest units of written language) to a list of phonemes (the smallest units of spoken language). The next stage renders the string of phonemes, that is, synthesizes the speech. Speech synthesizers can be broadly divided into two classes. Articulatory synthesizers control speech synthesis through parameters that represent the speech production system rather than the signal itself. Concatenative synthesizers join signal units taken from a dictionary to produce synthetic speech. In all cases, however, the prime challenge is the quality and naturalness of the sound produced.
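The grapheme-to-phoneme stage can be pictured as a table lookup. The sketch below is only an illustration under stated assumptions: the romanized graphemes, the phoneme symbols, and the names g2p_table and graphemes_to_phonemes are all hypothetical, and the greedy longest-match rule is a common simple strategy rather than the rule set Shruti actually uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical grapheme-to-phoneme entries; not Shruti's real table. */
struct G2PEntry {
    const char *grapheme;  /* written unit (romanized here) */
    const char *phoneme;   /* symbol for the spoken unit */
};

static const struct G2PEntry g2p_table[] = {
    {"kh", "KH"}, {"k", "K"}, {"aa", "AA"}, {"a", "A"}, {"n", "N"},
};

#define G2P_COUNT (sizeof g2p_table / sizeof g2p_table[0])

/* Converts text to a space-separated phoneme string using greedy
 * longest-match lookup. Returns the number of phonemes written, or
 * -1 if a grapheme is unknown or the output buffer is too small. */
int graphemes_to_phonemes(const char *text, char *out, size_t out_size)
{
    int count = 0;
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    while (*text != '\0') {
        const struct G2PEntry *best = NULL;
        size_t best_len = 0;
        size_t need;
        size_t i;

        /* Pick the longest grapheme that matches at the current position. */
        for (i = 0; i < G2P_COUNT; i++) {
            size_t len = strlen(g2p_table[i].grapheme);
            if (len > best_len &&
                strncmp(text, g2p_table[i].grapheme, len) == 0) {
                best = &g2p_table[i];
                best_len = len;
            }
        }
        if (best == NULL)
            return -1;

        /* Room for an optional separator, the symbol, and the NUL. */
        need = strlen(best->phoneme) + (count > 0 ? 1 : 0);
        if (used + need + 1 > out_size)
            return -1;
        if (count > 0)
            out[used++] = ' ';
        strcpy(out + used, best->phoneme);
        used += strlen(best->phoneme);

        text += best_len;
        count++;
    }
    return count;
}
```

With this toy table, the input "khaana" is split into four units, because "kh" and "aa" are preferred over the shorter "k" and "a" wherever they match.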
The desktop version of Shruti implements the text-to-speech converter for regional languages such as Hindi and Bengali using the concatenative approach. This approach finds the voice unit corresponding to each phoneme and concatenates the units to produce the sound file. Smoothing algorithms are then applied to the concatenated speech, which noticeably improves the result.
With the basics of text-to-speech software covered, the rest of this chapter describes how the desktop version of Shruti is implemented.
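The text does not say which smoothing algorithm is applied at the joins. As an illustration only, the sketch below uses a linear crossfade, one common way to soften the boundary between two concatenated units. The function name append_with_crossfade, the 16-bit sample type and the overlap parameter are assumptions made for this example.

```c
#include <stddef.h>

/* Appends unit[0..unit_len) to the samples already in out, blending
 * the last `overlap` samples of out with the first `overlap` samples
 * of the unit by a linear crossfade. On success returns 1 and updates
 * *out_len; returns 0 if the overlap is too long or out is too small. */
int append_with_crossfade(short *out, size_t *out_len, size_t out_cap,
                          const short *unit, size_t unit_len,
                          size_t overlap)
{
    size_t start;
    size_t i;

    if (overlap > *out_len || overlap > unit_len)
        return 0;
    if (*out_len - overlap + unit_len > out_cap)
        return 0;

    start = *out_len - overlap;

    /* Inside the overlap the weight of the incoming unit rises
     * linearly while the weight of the existing signal falls. */
    for (i = 0; i < overlap; i++) {
        double w = (double)(i + 1) / (double)(overlap + 1);
        double mixed = (1.0 - w) * out[start + i] + w * unit[i];
        out[start + i] = (short)mixed;
    }

    /* Past the overlap the unit is copied unchanged. */
    for (i = overlap; i < unit_len; i++)
        out[start + i] = unit[i];

    *out_len = start + unit_len;
    return 1;
}
```

Calling the function once per voice unit builds the full utterance; an overlap of zero reduces it to plain concatenation.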
3.1 Features of Shruti
The front-end of the software is written in Java (refer to the block diagram shown above). It takes the input text and produces the output sound file. The processing is done by two backend dynamic link libraries.
Both libraries are written in C, and each implements one important part of the text-to-speech synthesizer.
The first dynamic link library implements the Natural Language Processing (NLP) unit, which is referred to as the Hindianalyser module in the remainder of this thesis.
The second dynamic link library implements the Indian Language Phonetic Synthesizer (ILPS) unit, which is referred to as the Hindiengine module in the remainder of this thesis.
These dynamic link libraries are loaded at run time when required, and the appropriate functions from the library are called.
3.2 Overview of Shruti
The processing part of the desktop (Win32 API) text-to-speech software can be divided into two submodules:
HindiAnalyser: It takes the input supplied by the frontend and produces tokens, each of which corresponds to a unique sound clip in the sound library.
HindiEngine: It takes the tokens and the sound units from the library and generates the whole sound clip. After generation, smoothing algorithms are applied to make the speech smoother.
The frontend is responsible for taking the input and playing the WAV file generated by the backend.
The next figure shows a dataflow diagram for the software.
The front-end is written in Java, and an important feature of this implementation is that the Java program calls the dynamic link libraries built with Visual C++. This is done using the Java Native Interface (JNI). In the code for the dynamic link library built with Visual C++, the following code snippet is added:
This function of the DLL can then be accessed from the Java code. Two files, “jni.h” and “jni_md.h”, are included during the build process. See the references for the source code of this implementation.
The Java Development Kit (JDK) must be installed on the desktop computer running the software. The JDK is bulky, so it is not practical for Embedded Shruti, where memory is the main concern and installing the JDK is more of a burden than a benefit. Embedded Shruti therefore uses the Windows CE API and the Microsoft Foundation Classes (MFC) customized for Windows CE. Such an implementation does not need a JDK on the hardware (a Pocket PC in this case) on which the software runs.
The implementation based entirely on the Windows CE API and MFC customized for Windows CE requires a dynamic link library (mfcce400d.dll) of 819 KB, which is considerably smaller than the JDK. The JDK for Windows CE with the fewest features has a size of 8.5 MB.
The backend DLLs are built using eMbedded Visual C++ and transferred to the system folder of the device running Windows CE.
Library          Windows CE    Win32
Hindianalyser    43 KB         256 KB
Hindiengine      29 KB         260 KB
would have required at least 1 MB of memory. The port of GDBM to Windows CE used in Embedded Shruti, by contrast, requires only a dynamic link library called gdbmce.dll, which is 31 KB in size. It suits the application because a hash-based structure was needed rather than a database that implements SQL queries.
The names and sizes of the dynamic link libraries needed to run Embedded Shruti are as follows:
1. gdbmce.dll 31 KB (database application)
2. hindianalyser.dll 43 KB (NLP module)
3. hindiengine.dll 29 KB (ILPS module)
4. mfcce400d.dll 819 KB (standard SDK emulation)
This data shows that this implementation needs much less disk space than an implementation that uses the JDK and builds the software on top of it.
The next chapter explains the different implementations of Embedded Shruti one by one, together with the key features of each. Each implementation is referred to as a model. A performance comparison is provided afterwards.
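The comparison can be checked with a few lines of arithmetic. The sketch below simply adds up the four DLL sizes listed above and sets the total against the 8.5 MB quoted for the smallest JDK. The names Component, embedded_dlls and total_size_kb are illustrative, and the conversion assumes 1 MB = 1024 KB.

```c
#include <stddef.h>

struct Component {
    const char *name;
    unsigned size_kb;
};

/* Sizes in KB, copied from the list above. */
static const struct Component embedded_dlls[] = {
    {"gdbmce.dll", 31},
    {"hindianalyser.dll", 43},
    {"hindiengine.dll", 29},
    {"mfcce400d.dll", 819},
};

/* Smallest JDK for Windows CE quoted in the text: 8.5 MB,
 * converted with 1 MB = 1024 KB (an assumption about the unit). */
static const unsigned jdk_min_kb = 8704;

/* Adds up the sizes of all listed components. */
unsigned total_size_kb(const struct Component *items, size_t count)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += items[i].size_kb;
    return total;
}
```

The four libraries add up to 922 KB, so the JDK alone is more than nine times the size of everything Embedded Shruti needs.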
Embedded Shruti
The last chapter introduced the desktop version of Shruti and explained the structure of the source code along with the dataflow diagram for the software. This chapter explains the different models of Embedded Shruti one by one and cites the drawbacks of each model that led to a new, modified and more efficient model.
4.1 Model 1: Windows CE crude port
This is the first model of Embedded Shruti. It starts from the source code of the Win32 version. First, the structure of the native source code is identified, along with the points where the API functions used in the native code are not supported by the Windows CE API. At each of these points the code is modified so that the native code remains consistent. The input-output behaviour of the native code must not change.
Embedded Shruti is designed in a modular way. There are three modules in Model 1. These are the following:
The frontend was written in Java in the native Win32 code, but in Embedded Shruti it is built using MFC customized for Windows CE in eMbedded Visual C++.
The frontend is a dialog box with the following contents:
1. Input Text Box: It takes from the user the text that is to be converted to speech. At present the input should be in Hindi or Bengali. If a multilingual keyboard is not available, spell the Bengali or Hindi words using the English alphabet and enter them into the text box.
2. Analyse Button: When clicked, the Analyse button reads the input text from the text box and writes it to a temporary file on the device called “TextIscii.txt”. The dynamic link library reads this file later. After saving the input text, the button handler loads the dynamic link library for natural language processing, hindianalyser.dll. Code snippet for loading dll is provided. The DLL must export the functions that other executables can call; the method of exporting functions from a DLL is given shortly. Before that, the procedure to load a DLL and call an exported function from executable code is given.
If the library is not loaded successfully, hInst1 will be NULL. Once the library is loaded into main memory, the exported function is accessed through a function pointer. Functions exported from dynamic link libraries can be accessed only through function pointers.
//Getting the address of the analyser function into the function pointer
Analyse is the name of the function exported from the dynamic link library. The functions exported by the dynamic link library are listed in the .def (module-definition) file of the library's source code. A typical .def file will look like:
The first line of the .def file specifies the name of the library, which is the Analyser library in this case. After that, the functions exported from the DLL are listed under the EXPORTS header; a DLL may export any number of functions. The DLL must contain a function with each name listed under EXPORTS. The calling program (the executable in this case) copies the address of that function into its function pointer and calls the function with the appropriate inputs.
pFunction obtained above will be NULL if the DLL exports no such function.
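Going by the description above, a module-definition file of this kind might read as follows. This is a reconstruction rather than the original file: the library name Analyser and the exported function Analyse are taken from the text, and nothing else is assumed.

```
LIBRARY Analyser
EXPORTS
    Analyse
```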
if (pFunction == NULL)
    MessageBox(L"Unable to load the Analyse function");
else
    MessageBox(L"Analyse function exported from the dll called");
Once the function pointer has been obtained in pFunction, the function can be called with a DWORD parameter. Because its return type is integer, it returns an integer value after processing the input text file “TextAscii.txt”.
Once the library is no longer needed, it is advisable to free it. Dynamic link libraries are loaded into RAM, and devices running Windows CE have very limited RAM, so the DLL should be unloaded as soon as the work is done.
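The look-up, NULL check and call through a function pointer follow a pattern that can be shown in portable C. On the device this look-up would typically go through the Win32 calls LoadLibrary and GetProcAddress. To stay compilable without the Windows headers, the sketch below replaces them with a hypothetical in-memory export table. The names fake_analyse, export_table, find_export and call_export are invented for illustration, and DWORD is modelled as unsigned long.

```c
#include <stddef.h>
#include <string.h>

/* Signature described in the text: takes a DWORD, returns an int. */
typedef int (*AnalyseFn)(unsigned long);

/* Hypothetical stand-in for the function exported by the DLL. */
static int fake_analyse(unsigned long flags)
{
    return flags == 0 ? 0 : 1;
}

/* Hypothetical export table playing the role of the EXPORTS list. */
struct ExportEntry {
    const char *name;
    AnalyseFn fn;
};

static const struct ExportEntry export_table[] = {
    {"Analyse", fake_analyse},
};

/* Models the look-up step: returns NULL when the name is not exported. */
AnalyseFn find_export(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof export_table / sizeof export_table[0]; i++)
        if (strcmp(export_table[i].name, name) == 0)
            return export_table[i].fn;
    return NULL;
}

/* Looks up an export and calls it only if the pointer is valid.
 * Returns 1 and stores the result on success, 0 if the look-up failed. */
int call_export(const char *name, unsigned long arg, int *result)
{
    AnalyseFn pFunction = find_export(name);

    if (pFunction == NULL)
        return 0;
    *result = pFunction(arg);
    return 1;
}
```

The point of the pattern is that the pointer is always tested before the call, so a missing export produces an error report instead of a crash.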
//Unloading a dynamic link library
The methods discussed above are necessary for working with dynamic link libraries. The next important difference between the native Win32 source code and the Windows CE version is file handling. As mentioned above, clicking the Analyse button saves the text input to a file on disk. Windows CE does not support file operations such as fopen, fread, fclose and fseek. When porting, it is therefore very important to find the equivalent of each of these file operations in the Windows CE API.
In the Windows CE API all devices are accessed through handles. The developer can access a file on disk, a USB port or a sound device using handles. No additional layer such as fopen or fseek is defined. The following code snippets show how to create, read and write a file using the Windows CE API.
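As a porting aid, each fopen mode can be paired with the CreateFile arguments that give roughly the same behaviour. The table below is one plausible mapping worked out from the documented meaning of the CreateFile access and creation-disposition values, not taken from Shruti's source. The constant names are stored as text so that the sketch compiles without the Windows headers, and ModeMapping and map_fopen_mode are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One row per fopen mode. The access and disposition columns hold
 * the names of the Win32 constants as text. */
struct ModeMapping {
    const char *fopen_mode;
    const char *desired_access;
    const char *creation_disposition;
    int seek_to_end;  /* nonzero: move the file pointer to the end after opening */
};

static const struct ModeMapping mode_table[] = {
    {"r",  "GENERIC_READ",                 "OPEN_EXISTING", 0},
    {"w",  "GENERIC_WRITE",                "CREATE_ALWAYS", 0},
    {"a",  "GENERIC_WRITE",                "OPEN_ALWAYS",   1},
    {"r+", "GENERIC_READ | GENERIC_WRITE", "OPEN_EXISTING", 0},
    {"w+", "GENERIC_READ | GENERIC_WRITE", "CREATE_ALWAYS", 0},
    {"a+", "GENERIC_READ | GENERIC_WRITE", "OPEN_ALWAYS",   1},
};

/* Returns the mapping for a text-mode fopen string, or NULL if the
 * mode is not in the table. */
const struct ModeMapping *map_fopen_mode(const char *mode)
{
    size_t i;

    for (i = 0; i < sizeof mode_table / sizeof mode_table[0]; i++)
        if (strcmp(mode_table[i].fopen_mode, mode) == 0)
            return &mode_table[i];
    return NULL;
}
```

The append modes are only approximated here: seeking to the end once after opening is not the same as forcing every write to the end of the file, which is what fopen's "a" guarantees.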
A complete reference to the CreateFile function is provided below:
This function creates, opens, or truncates a file, communications resource, disk device, or console. It returns a handle that can be used to access the object. It can also open and return a handle to a directory.
lpFileName: Pointer to a null-terminated string that specifies the name of the object (file, communications resource, disk device, console, or directory) to create or open.
If *lpFileName is a path, there is a default string size limit of MAX_PATH characters. This limit is related to how the CreateFile function parses paths.
When lpFileName points to a communications resource to open, the developer must include a colon after the name. For example, specify "COM1:" to open that port.
dwDesiredAccess: Specifies the type of access to the object. An application can obtain read access, write access, read-write access, or device query access. This parameter can be any combination of the following values.
0: Specifies device query access to the object. An application can query device attributes without accessing the device.
GENERIC_READ: Specifies read access to the object. Data can be read from the file and the file pointer can be moved. Combine with GENERIC_WRITE for read-write access.
GENERIC_WRITE: Specifies write access to the object. Data can be written to the file and the file pointer can be moved. Combine with GENERIC_READ for read-write access.
dwShareMode: Specifies how the object can be shared. If dwShareMode is 0, the object cannot be shared. Subsequent open operations on the object will fail until the handle is closed.
To share the object, use a combination of one or more of the following values:
FILE_SHARE_READ: Subsequent open operations on the object will succeed only if read access is requested.
FILE_SHARE_WRITE: Subsequent open operations on the object will succeed only if write access is requested.
lpSecurityAttributes: Ignored; set to NULL.
dwCreationDisposition: Specifies which action to take on files that exist, and which action to take when files do not exist. This parameter must be one of the following values:
CREATE_NEW: Creates a new file. The function fails if the specified file already exists.
CREATE_ALWAYS: Creates a new file. If the file exists, the function overwrites the file and clears the existing attributes.
OPEN_EXISTING: Opens the file. The function fails if the file does not exist.
OPEN_ALWAYS: Opens the file, if it exists. If the file does not exist, the function creates the file as if dwCreationDisposition were CREATE_NEW.
TRUNCATE_EXISTING: Opens the file. Once opened, the file is truncated so that its size is zero bytes. The calling process must open the file with at least GENERIC_WRITE access. The function fails if the file does not exist.
dwFlagsAndAttributes: Specifies the file attributes and flags for the file.
Any combination of the following attributes is acceptable for the dwFlagsAndAttributes parameter, except that all other file attributes override FILE_ATTRIBUTE_NORMAL.
FILE_ATTRIBUTE_ARCHIVE: The file should be archived. Applications use this attribute to mark files for backup or removal.
FILE_ATTRIBUTE_HIDDEN: The file is hidden. It is not to be included in an ordinary directory listing.
FILE_ATTRIBUTE_NORMAL: The file has no other attributes set. This attribute is valid only if used alone.
FILE_ATTRIBUTE_READONLY: The file is read only. Applications can read the file but cannot write to it or delete it.
FILE_ATTRIBUTE_SYSTEM: The file is part of or is used exclusively by the operating system.
hTemplateFile: Ignored; as a result, CreateFile does not copy the extended attributes to the new file.
Return values: An open handle to the specified file indicates success. If the specified file exists before the function call and dwCreationDisposition is CREATE_ALWAYS or OPEN_ALWAYS, a call to GetLastError returns ERROR_ALREADY_EXISTS, even though the function has succeeded. If the file does not exist before the call, GetLastError returns zero. INVALID_HANDLE_VALUE indicates failure. To get extended error information, call GetLastError.
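The rules for the five creation dispositions (CREATE_NEW, CREATE_ALWAYS, OPEN_EXISTING, OPEN_ALWAYS and TRUNCATE_EXISTING) reduce to a small decision table on whether the file already exists. The sketch below encodes that table in portable C. The DISP_ and OUTCOME_ enumerators are stand-ins invented for this example and are not the real Win32 constants or their values.

```c
/* Stand-ins for the five creation dispositions. */
enum Disposition {
    DISP_CREATE_NEW,
    DISP_CREATE_ALWAYS,
    DISP_OPEN_EXISTING,
    DISP_OPEN_ALWAYS,
    DISP_TRUNCATE_EXISTING
};

enum Outcome {
    OUTCOME_FAIL,         /* the call fails */
    OUTCOME_CREATED,      /* a new file is created */
    OUTCOME_OPENED,       /* the existing file is opened unchanged */
    OUTCOME_OVERWRITTEN,  /* the existing file is replaced */
    OUTCOME_TRUNCATED     /* the existing file is cut to zero bytes */
};

/* What happens for a given disposition, depending on whether the
 * file existed before the call. */
enum Outcome disposition_outcome(enum Disposition d, int file_exists)
{
    switch (d) {
    case DISP_CREATE_NEW:
        return file_exists ? OUTCOME_FAIL : OUTCOME_CREATED;
    case DISP_CREATE_ALWAYS:
        return file_exists ? OUTCOME_OVERWRITTEN : OUTCOME_CREATED;
    case DISP_OPEN_EXISTING:
        return file_exists ? OUTCOME_OPENED : OUTCOME_FAIL;
    case DISP_OPEN_ALWAYS:
        return file_exists ? OUTCOME_OPENED : OUTCOME_CREATED;
    case DISP_TRUNCATE_EXISTING:
        return file_exists ? OUTCOME_TRUNCATED : OUTCOME_FAIL;
    }
    return OUTCOME_FAIL;
}

/* Nonzero when a successful call would still leave
 * ERROR_ALREADY_EXISTS as the last error, as described above. */
int reports_already_exists(enum Disposition d, int file_exists)
{
    return file_exists &&
           (d == DISP_CREATE_ALWAYS || d == DISP_OPEN_ALWAYS);
}
```

For the temporary text file written by the Analyse button, the overwrite-or-create behaviour of CREATE_ALWAYS is the closest match to fopen with mode "w".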
//Read a file
The file can be read through the handle only if it was opened with GENERIC_READ access using CreateFile. The call to ReadFile must therefore come after the file has been opened appropriately.
Here the file is read into the character array rbuff, where cBytes specifies the number of bytes to read and readBytes receives the number of bytes actually read from the file. readBytes is passed by address so that ReadFile can modify it and the change is visible in the calling function.
API reference to ReadFile:
This function reads data from a file, starting at the position indicated by the file pointer. After the read operation has been completed, the file pointer is adjusted by the number of bytes actually read.
hFile: Handle to the file to be read. The file handle must have been created with GENERIC_READ access to the file. This parameter cannot be a socket handle.
lpBuffer: Pointer to the buffer that receives the data read from the file.
nNumberOfBytesToRead: Number of bytes to be read from the file.
lpNumberOfBytesRead: Pointer to the number of bytes read. ReadFile sets this value to zero before doing any work or error checking.
lpOverlapped: Unsupported; set to NULL.
The ReadFile function returns when one of the following is true: the number of bytes requested has been read or an error occurs.
Return values: Nonzero indicates success. If the return value is nonzero and the number of bytes read is zero, the file pointer was beyond the current end of the file at the time of the read operation. Zero indicates failure. To get extended error information, call GetLastError.
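The return-value rule gives a simple way to detect the end of a file: keep reading until a call succeeds with zero bytes read. The sketch below shows that loop in portable C. Because ReadFile itself needs the Windows headers, the hypothetical mem_read_file function mimics its contract on an in-memory buffer; MemFile, mem_read_file and read_all are names invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file" standing in for a file handle. */
struct MemFile {
    const char *data;
    size_t size;
    size_t pos;   /* current file pointer */
};

/* Mimics the ReadFile contract described above: sets *bytes_read to
 * zero first, returns nonzero on success, and reports zero bytes
 * read (still a success) once the file pointer is at end of file. */
int mem_read_file(struct MemFile *f, void *buf, size_t to_read,
                  size_t *bytes_read)
{
    size_t avail;

    if (bytes_read == NULL)
        return 0;
    *bytes_read = 0;
    if (f == NULL || buf == NULL)
        return 0;

    avail = f->size - f->pos;
    if (to_read > avail)
        to_read = avail;
    memcpy(buf, f->data + f->pos, to_read);
    f->pos += to_read;          /* advance by the bytes actually read */
    *bytes_read = to_read;
    return 1;
}

/* Reads in small chunks until a successful read returns zero bytes.
 * Returns the total number of bytes copied to out, or -1 on failure. */
long read_all(struct MemFile *f, char *out, size_t out_cap)
{
    char chunk[4];
    size_t total = 0;
    size_t got;

    for (;;) {
        if (!mem_read_file(f, chunk, sizeof chunk, &got))
            return -1;
        if (got == 0)
            break;              /* success with zero bytes: end of file */
        if (total + got > out_cap)
            return -1;
        memcpy(out + total, chunk, got);
        total += got;
    }
    return (long)total;
}
```

On the device the same loop shape applies, with ReadFile and a real handle in place of mem_read_file and MemFile.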
//Write to a file
The file can be written through the handle only if it was opened with GENERIC_WRITE access using CreateFile. The call to WriteFile must therefore come after the file has been opened appropriately.
Here the character array wbuff is written to the file identified by the handle exampleHandler, where cBytes specifies the number of bytes to write and writeBytes receives the number of bytes actually written to the file. writeBytes is passed by address so that WriteFile can modify it and the change is visible in the calling function.
API Reference to WriteFile:
This function writes data to a file. WriteFile starts writing data to the file at the position indicated by the file pointer. After the write operation has been completed, the file pointer is adjusted by the number of bytes actually written.