4.2 Model 2: Windows CE port using GDBM (Without voice.db)
Chapter 2 introduced the GNU Database Manager (GDBM) and explained why it was chosen in place of Microsoft SQL Server for CE. The crude port discussed in the previous model has several disadvantages. In this model, the linear scan of the epoch file is avoided by using GDBM, a database based on extendible hashing, which was ported to the Windows CE platform. See the GDBMCE source code for details.
At this point it is important to understand what extendible hashing means, since this application needs a hash-based database rather than an SQL database. Each line of the epoch file begins with a token, followed by four epoch values to be used in the ILPS algorithm (Hindiengine module). Databases based on extendible hashing are very efficient when a record is retrieved by a specified key (in this case the token name, such as 0704). The expected cost of a retrieval is O(1 + alpha), where alpha is the load factor, which stays small in a well-balanced database. In this model the epoch file is read and stored in GDBM databases, with the token name as the key and an epoch as the value. Four epoch databases are created, with the following contents:
epoch1.db: Key is the token name; the value is the first epoch value on the line for that token in the epoch.txt file.
epoch2.db: Key is the token name; the value is the second epoch value on the line for that token in the epoch.txt file.
epoch3.db: Key is the token name; the value is the third epoch value on the line for that token in the epoch.txt file.
epoch4.db: Key is the token name; the value is the fourth epoch value on the line for that token in the epoch.txt file.
For example, take one line from the epoch.txt file:
0165179 104 206 307 409
epoch1.db: Key is 0165179, value is 104
epoch2.db: Key is 0165179, value is 206
epoch3.db: Key is 0165179, value is 307
epoch4.db: Key is 0165179, value is 409
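The split described above can be sketched in C as follows. This is only an illustrative sketch, not the GDBMCE code: the names EpochRecord, parse_epoch_line and epoch_value_for_db are hypothetical, as are the 31-character limit on the token and the choice to store each epoch value as a decimal text string. The actual write to each database (for example through the GDBM store call) is deliberately left out.

```c
#include <stdio.h>
#include <string.h>

#define TOKEN_MAX   32  /* assumed upper bound on token length (incl. NUL) */
#define EPOCH_COUNT 4   /* four epoch values per line, as described above  */

/* One parsed line of epoch.txt: a token followed by four epoch values. */
typedef struct {
    char token[TOKEN_MAX];
    int  epoch[EPOCH_COUNT];
} EpochRecord;

/* Parse a line such as "0165179 104 206 307 409".
 * Returns 1 on success, 0 if the line does not have the expected shape. */
int parse_epoch_line(const char *line, EpochRecord *rec)
{
    int n = sscanf(line, "%31s %d %d %d %d",
                   rec->token,
                   &rec->epoch[0], &rec->epoch[1],
                   &rec->epoch[2], &rec->epoch[3]);
    return n == 1 + EPOCH_COUNT;
}

/* Format the value that would be stored in epoch<db_index>.db (1..4)
 * under the key rec->token. The value is written as decimal text
 * (an assumption of this sketch).
 * Returns the number of characters written, or -1 if db_index is out of
 * range or the output buffer is too small. */
int epoch_value_for_db(const EpochRecord *rec, int db_index,
                       char *out, size_t out_size)
{
    int len;

    if (db_index < 1 || db_index > EPOCH_COUNT)
        return -1;

    len = snprintf(out, out_size, "%d", rec->epoch[db_index - 1]);
    if (len < 0 || (size_t)len >= out_size)
        return -1;
    return len;
}
```

For the sample line, parse_epoch_line fills the token with 0165179, and calling epoch_value_for_db with db_index 1 to 4 produces the strings 104, 206, 307 and 409, which correspond to the entries listed for epoch1.db to epoch4.db.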
The present version of Embedded Shruti uses epoch1.db. To produce better-quality speech, later versions of the software might use the other epoch database files.
The intonation file is also stored in a GDBM database and used accordingly in the program. Thus the new model of the Hindiengine can be represented by the following picture:
Both the epoch database and the intonation database are stored on disk (secondary storage) rather than in main memory (RAM). The advantage is that the linear scan is now avoided: given the token name, which is the key, the epoch can be obtained in almost O(1) time.
Having covered the basic structure of this model, let us take a detailed look at extendible hashing and at why it is such an efficient data structure when a value is to be retrieved by its key.