Saturday, April 17, 2010

Support Vector Machines for Speech Recognition

Support Vector Machines are a way of classifying data. It is often said that SVMs are easier to use than neural networks, but I'm yet to provide a comment on that :).
These days I’m trying to find a way to apply an SVM to improve the accuracy of our speech recognition application. To achieve this I'm going to use LIBSVM.

The procedure for training the SVM is (see the sketch after this list):
  • Transform data to the format of an SVM package
  • Randomly try a few kernels and parameters
  • Test
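Here is how those steps might look with LIBSVM's bundled Python interface (svmutil). The file names and parameter values below are only assumptions for illustration, not results from our system, and depending on how LIBSVM is installed the import may be from libsvm.svmutil instead.

from svmutil import svm_read_problem, svm_train, svm_predict

# Load training and testing data in the LIBSVM format described below
# (phoneme_train.txt and phoneme_test.txt are hypothetical file names)
y_train, x_train = svm_read_problem('phoneme_train.txt')
y_test, x_test = svm_read_problem('phoneme_test.txt')

# Try an RBF kernel with guessed parameters; normally C and gamma
# would be tuned, e.g. using LIBSVM's cross-validation option (-v)
model = svm_train(y_train, x_train, '-t 2 -c 1 -g 0.5')

# Test: predicted labels and accuracy on the held-out data
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)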

First, I wanted to check whether I could use this approach to classify and predict phonemes. The first step was to prepare the input files.
The format of the training and testing data files is:

<label> <index1>:<value1> <index2>:<value2> ...
.
.
.
Each line contains an instance and is ended by a '\n' character. For classification, labels in the testing file are only used to calculate accuracy or errors.
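For example, a two-class problem with three features per instance might look like this (the labels and feature values here are purely hypothetical):

+1 1:0.12 2:-0.56 3:0.87
-1 1:0.03 2:0.44 3:-0.21
+1 1:0.95 2:0.13 3:0.66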
Hence I analyzed a voice signal using MATLAB to check whether I can represent phonemes in vector format. For this I first recorded a voice signal and saved it in .wav format. When the signal is plotted, some phonemes are intuitively separable whereas others are not. However, even these separable phonemes contain different numbers of samples, while the input file format requires each data element to be a fixed-size vector. So now I’m trying to check whether I can represent every phoneme as a fixed-size vector using different transforms, filters, etc.
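As one illustration of what such a fixed-size representation could look like, the sketch below (in Python with NumPy/SciPy rather than MATLAB) takes the magnitude spectrum of a phoneme segment and keeps a fixed number of frequency bins, so segments of different lengths all map to vectors of the same size. This is only a sketch of the idea; the file name phoneme_a.wav and the choice of 64 bins are assumptions for the example.

import numpy as np
from scipy.io import wavfile

def phoneme_to_vector(samples, n_bins=64):
    # Magnitude spectrum of the (variable-length) phoneme segment
    spectrum = np.abs(np.fft.rfft(samples))
    # Truncate or zero-pad so the result always has exactly n_bins values
    if len(spectrum) >= n_bins:
        vec = spectrum[:n_bins]
    else:
        vec = np.pad(spectrum, (0, n_bins - len(spectrum)), mode='constant')
    # Normalise so that differences in loudness do not dominate
    return vec / (np.linalg.norm(vec) + 1e-10)

# Hypothetical usage with a mono recording of a single phoneme
rate, samples = wavfile.read('phoneme_a.wav')
vector = phoneme_to_vector(samples.astype(float))
print(len(vector))  # always 64, regardless of the segment length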

Thursday, April 8, 2010

Want to type e-mails in Sinhalese?


The easiest way of typing in a native language is to use a transliteration application. For Sinhalese, the most popular application was the UCSC real-time Unicode converter that can be found here.
Now it is being replaced by Google Transliteration Labs, which provides Unicode conversion support for a number of languages, including Sinhalese.
Compared to the UCSC Unicode converter, it provides a more intuitive transliteration scheme.

Transliteration Vs Translation


Translation is the process of converting the meaning of a word from one language to another, whereas transliteration is the process of converting the sound of a word to another language. In practice, transliteration allows you to type other languages phonetically in English letters through these converters.
To enable this feature in Gmail and make life easier, refer to this.

Thursday, April 1, 2010

Using neural networks for speech recognition


Our 'Sinhala Speech Recognition System' is now showing elementary behavior, but we need to improve it further to get satisfactory performance. To achieve that, we are working on several fronts, such as improving and training the language model and the acoustic model, noise filtering techniques, and machine learning approaches.
I’m going to use a neural network based approach to handle the uncertainty caused by user variation and environmental noise. My intention is to use this blog entry to illustrate, step by step, how to use a neural network in speech recognition.

Introduction to neural networks
A neural network is a collection of processing elements called neurons, each connected to other similar elements. The first artificial neural network was designed in the late 1950s, and it was much simpler than any biological neural network. Biological neural networks are far more complex, and many studies are ongoing to discover the secrets behind these biological systems. Neural networks are used in applications where regular computer programs fall short, such as image recognition, speech recognition and decision making.
A neural network differs from a regular program in that it has to be trained before it can perform a task, whereas in regular programming the task is explicitly programmed.

Structure of a neural network
Each neuron in a neural network has a set of inputs, a weight that multiplies each input, and an output. The output is calculated by applying an activation function to the sum of the weighted inputs.
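To make that concrete, here is a minimal sketch of a single neuron in Python. The inputs, weights and the choice of a sigmoid activation are only illustrative assumptions, not the configuration used in our system.

import math

def neuron_output(inputs, weights, bias=0.0):
    # Weighted sum of the inputs (a bias term is often added; it defaults to 0 here),
    # followed by a sigmoid activation function
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical example: a neuron with three inputs and three weights
print(neuron_output([0.5, -1.2, 0.3], [0.8, 0.1, -0.4]))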