Upside Inside Out: Support Vector Machines for Speech Recognition

Support Vector Machines (SVMs) are a way of classifying data. SVMs are said to be easier to use than neural networks, but I have yet to form an opinion on that :).

These days I’m trying to find a way to apply an SVM to improve the accuracy of our speech recognition application. To achieve this I'm going to use LIBSVM.

The procedure for training the SVM is to:

1. Transform data to the format of an SVM package
2. Randomly try a few kernels and parameters
3. Test
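The "try a few kernels and parameters" step can be sketched as a small script that builds command lines for LIBSVM's svm-train tool. This is only a rough illustration: the search ranges for cost and gamma, the number of trials, and the file name phonemes.train are assumptions made for the example, and it presumes svm-train is installed and on the PATH. The options used are -t (kernel type), -c (cost), -g (gamma) and -v (n-fold cross-validation).

```python
import itertools
import random

# Kernel codes accepted by svm-train's -t option.
KERNELS = {"linear": 0, "polynomial": 1, "rbf": 2, "sigmoid": 3}


def candidate_commands(train_file, n_trials=5, folds=5, seed=0):
    """Return svm-train command lines for a few randomly chosen settings."""
    rng = random.Random(seed)
    costs = [2.0 ** e for e in range(-5, 16, 2)]    # assumed search range
    gammas = [2.0 ** e for e in range(-15, 4, 2)]   # assumed search range
    grid = list(itertools.product(KERNELS.values(), costs, gammas))
    commands = []
    for kernel, cost, gamma in rng.sample(grid, n_trials):
        commands.append([
            "svm-train",
            "-t", str(kernel),
            "-c", str(cost),
            "-g", str(gamma),   # has no effect for the linear kernel
            "-v", str(folds),   # report cross-validation accuracy
            train_file,
        ])
    return commands


if __name__ == "__main__":
    # "phonemes.train" is a hypothetical file name.
    for cmd in candidate_commands("phonemes.train"):
        print(" ".join(cmd))
```

Each printed line could then be run by hand or passed to subprocess.run, and the setting with the best cross-validation accuracy kept for the final test.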

First, I wanted to check whether I could use this to classify and predict phonemes. The first step was to prepare the input files.

The format of the training and testing data files is:

<label> <index1>:<value1> <index2>:<value2> ...
.
.
.

Each line contains an instance and is ended by a '\n' character. For
classification, <label> is an integer indicating the class label
(multi-class is supported). For regression, <label> is the target
value which can be any real number. For one-class SVM, it's not used
so can be any number. Except using precomputed kernels (explained in
another section), <index>:<value> gives a feature (attribute) value.
<index> is an integer starting from 1 and <value> is a real
number. Indices must be in ASCENDING order. Labels in the testing file are only used to calculate accuracy or errors.
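To make the format concrete, here is a minimal sketch of a helper that turns a label and a feature vector into one line of such a file. The function name format_instance is made up for this example. It numbers features from 1 in ascending order and leaves out zero-valued features, relying on the format being sparse.

```python
def format_instance(label, features):
    """Format one instance as '<label> <index>:<value> ...'.

    `features` is a sequence of numbers. The feature at 0-based position i
    is written with index i + 1, so indices start at 1 and are ascending.
    Zero values are skipped (sparse representation).
    """
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value}")
    return " ".join(parts)


if __name__ == "__main__":
    print(format_instance(1, [0.5, 0, -2.25]))
```

For the label 1 and the vector [0.5, 0, -2.25], this prints "1 1:0.5 3:-2.25". Writing one such line per instance, each ended by '\n', gives a training or testing file.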

So I analyzed a voice signal in MATLAB to check whether I could represent phonemes in vector format. I first recorded a voice signal and saved it in .wav format. When I plotted this signal, I observed that some phonemes are intuitively separable whereas others are not. However, even the separable phonemes contain different numbers of samples, while the input file format requires each data element to be represented as a fixed-size vector. So now I’m trying to check whether I can represent every phoneme as a fixed-size vector using different transforms, filters, etc.
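One simple possibility, shown here purely as an illustration and not as the representation finally chosen, is to cut each phoneme segment into a fixed number of frames and keep one log-energy value per frame. The function name fixed_size_features, the choice of 10 frames, and the use of log-energy are all assumptions made for this sketch.

```python
import math


def fixed_size_features(samples, n_frames=10):
    """Summarise a variable-length segment as n_frames log-energy values."""
    if len(samples) < n_frames:
        raise ValueError("segment is shorter than the number of frames")
    features = []
    for k in range(n_frames):
        # Frame boundaries are proportional to the segment length, so every
        # sample is used and frame lengths differ by at most one sample.
        start = k * len(samples) // n_frames
        stop = (k + 1) * len(samples) // n_frames
        frame = samples[start:stop]
        energy = sum(x * x for x in frame) / len(frame)
        features.append(math.log(energy + 1e-10))  # offset avoids log(0)
    return features


if __name__ == "__main__":
    short_segment = [0.1] * 137    # made-up sample values
    long_segment = [0.2] * 4021    # made-up sample values
    print(len(fixed_size_features(short_segment)),
          len(fixed_size_features(long_segment)))
```

Segments of different lengths then map to vectors of the same length, which is what the input file format needs. Whether such coarse features are enough to separate phonemes is a separate question that would have to be tested.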


Saturday, April 17, 2010
