Saturday, April 17, 2010

Support Vector Machines for Speech Recognition

Support Vector Machines (SVMs) are a way of classifying data. SVMs are said to be easier to use than neural networks, but I have yet to form my own opinion on that :).
These days I'm trying to find a way to apply an SVM to improve the accuracy of our speech recognition application. To achieve this I'm going to use LIBSVM.

The procedure for training the SVM is to:
  • Transform the data to the format of an SVM package
  • Randomly try a few kernels and parameters
  • Test
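
As a rough illustration of the "randomly try a few kernels and parameters" step, here is a minimal Python sketch that only builds option strings for LIBSVM's svm-train tool. The -t, -c, -g and -v flags are real svm-train options; the function name random_trials, the file name train.txt, the parameter ranges and the choice of 5-fold cross-validation are my own assumptions for illustration.

```python
import random

# LIBSVM kernel codes for the -t option: 0 linear, 1 polynomial, 2 RBF, 3 sigmoid.
KERNELS = {"linear": 0, "polynomial": 1, "rbf": 2, "sigmoid": 3}


def random_trials(n_trials, seed=0):
    """Return n_trials randomly chosen svm-train option strings.

    The exponent ranges below are assumed search ranges, not values
    prescribed by LIBSVM itself.
    """
    rng = random.Random(seed)  # fixed seed so the trials are repeatable
    trials = []
    for _ in range(n_trials):
        name = rng.choice(sorted(KERNELS))
        cost = 2.0 ** rng.randint(-5, 15)    # -c: penalty parameter C
        gamma = 2.0 ** rng.randint(-15, 3)   # -g: kernel gamma (ignored by the linear kernel)
        # -v 5 asks svm-train for 5-fold cross-validation accuracy.
        trials.append(f"-t {KERNELS[name]} -c {cost} -g {gamma} -v 5")
    return trials


if __name__ == "__main__":
    for options in random_trials(3):
        # train.txt is a hypothetical training file in LIBSVM format.
        print(f"svm-train {options} train.txt")
```

Each printed line could then be run from a shell, and the cross-validation accuracies compared to pick a kernel and parameters for the final test.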

First, I wanted to check whether I could use an SVM to classify and predict phonemes. The first step was to prepare the input files.
The format of the training and testing data files is:

<label> <index1>:<value1> <index2>:<value2> ...
.
.
.
Each line contains an instance and is ended by a '\n' character. For
classification, the labels in the testing file are only used to calculate accuracy or errors.
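
To make the format concrete, here is a small Python sketch that turns a label and a feature vector into one line of such a file. The helper names to_libsvm_line and write_libsvm_file are my own, not part of LIBSVM. Indices start at 1, and zero-valued features are left out, which the sparse format allows.

```python
def to_libsvm_line(label, features):
    """Format one instance as '<label> <index>:<value> ...' with 1-based indices."""
    # Zero-valued features may be omitted in the sparse format.
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


def write_libsvm_file(path, labels, vectors):
    """Write one instance per line, each terminated by a newline character."""
    with open(path, "w") as f:
        for label, vec in zip(labels, vectors):
            f.write(to_libsvm_line(label, vec) + "\n")
```

For example, to_libsvm_line(1, [0.5, 0.0, -2.0]) gives the line "1 1:0.5 3:-2".
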
Next, I analyzed a voice signal using MATLAB to check whether I can represent phonemes in vector format. For this, I first recorded a voice signal and saved it in .wav format. When I plotted the signal, I observed that some phonemes are visually separable whereas others are not. However, even the separable phonemes contain different numbers of samples, while the input file format requires each data element to be represented as a fixed-size vector. So now I'm trying to check whether I can represent every phoneme as a fixed-size vector using different transforms, filters, etc.
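
One simple way to get a fixed-size vector from segments of different lengths is to cut each segment into the same number of frames and keep one value per frame. The Python sketch below does this with the log energy of each frame. This is only an assumed example of such a transform, not the representation I have settled on; the function name fixed_size_features and the default of 10 frames are arbitrary choices.

```python
import math


def fixed_size_features(samples, n_frames=10):
    """Map a variable-length segment to n_frames log-energy values."""
    if len(samples) < n_frames:
        raise ValueError("segment is shorter than the number of frames")
    features = []
    for k in range(n_frames):
        # Frame boundaries scale with the segment length, so every
        # segment yields exactly n_frames non-empty frames.
        start = k * len(samples) // n_frames
        end = (k + 1) * len(samples) // n_frames
        frame = samples[start:end]
        energy = sum(x * x for x in frame) / len(frame)
        # A small constant avoids log(0) for silent frames.
        features.append(math.log(energy + 1e-12))
    return features
```

A phoneme with 137 samples and one with 999 samples would both come out as vectors of length 10, which could then be written in the LIBSVM input format.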
