Friday, November 18, 2011

svm_toolkit: Examples

The svm_toolkit gem is now functional. There are some examples bundled with the gem, to illustrate how to use the library.

There are three stages to using libsvm:
  1. create or load a problem definition (an instance of class Problem)
  2. train a model with given parameters
  3. evaluate the model

The Problem Definition

Loading a problem definition from file

A fairly standard data format for training SVM models is the svmlight format. This format represents one instance per line, with each line split into tokens by spaces. The first token is the class of the instance. The remaining tokens are index:value pairs, with a colon separating each index from its value. Any index that is not listed takes the default value of 0, so the format is compact when many features have the value 0.

-1 1:1 3:-0.535714 5:-0.692308

The above line defines an instance with:
  • class value of -1
  • index 1 has value 1
  • index 2 has value 0 (default value)
  • index 3 has value -0.535714
  • index 4 has value 0 (default value)
  • index 5 has value -0.692308
Load a file in this format using the command:

Problem.from_file("australian_scale.txt")
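
To make the line format concrete, here is a minimal plain-Ruby sketch that expands one svmlight line into a label and a dense feature array. The helper parse_svmlight_line is hypothetical and is not part of svm_toolkit (Problem.from_file does the loading for you); it also assumes integer class labels and that you know the total number of features in advance.

```ruby
# Hypothetical helper, not part of svm_toolkit: expands one svmlight line
# into [label, dense_features]. Assumes integer class labels.
def parse_svmlight_line(line, num_features)
  tokens = line.split
  label = Integer(tokens.first)
  features = Array.new(num_features, 0.0)   # unlisted indices default to 0
  tokens.drop(1).each do |token|
    index, value = token.split(":")
    features[Integer(index) - 1] = Float(value)   # svmlight indices start at 1
  end
  [label, features]
end

label, features = parse_svmlight_line("-1 1:1 3:-0.535714 5:-0.692308", 5)
puts "class #{label}, features #{features.inspect}"
```

Indices 2 and 4 do not appear in the line, so they keep the default value of 0.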

Creating a problem definition in code

Sometimes your data will need to be constructed in code, or preprocessed from a different file format. The toolkit supports generating problem definitions from two arrays: an array of instance definitions, and an array of the instance labels.
 
# Sample dataset: the 'Play Tennis' dataset
# from T. Mitchell, Machine Learning (1997)
# --------------------------------------------
# Labels for each instance in the training set
# 1 = Play, 0 = Not
Labels = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

# Recoding the attribute values into range [0, 1]
Instances = [
[0.0,1.0,1.0,0.0],
[0.0,1.0,1.0,1.0],
[0.5,1.0,1.0,0.0],
[1.0,0.5,1.0,0.0],
[1.0,0.0,0.0,0.0],
[1.0,0.0,0.0,1.0],
[0.5,0.0,0.0,1.0],
[0.0,0.5,1.0,0.0],
[0.0,0.0,0.0,0.0],
[1.0,0.5,0.0,0.0],
[0.0,0.5,0.0,1.0],
[0.5,0.5,1.0,1.0],
[0.5,1.0,0.0,0.0],
[1.0,0.5,1.0,1.0]
]

The above definition creates 14 instances. Each row in the array Instances represents one instance, and simply lists the value for each feature. Each entry in the array Labels represents the label for the corresponding row in Instances.

For example, the third instance has feature values [0.5,1.0,1.0,0.0] and label 1.

Create a problem definition from these two arrays using the command:

Problem.from_array(Instances, Labels)
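
The recoding of attribute values into [0, 1] is not spelled out above. The sketch below shows one mapping that appears to reproduce the rows of Instances when applied to the Play Tennis table; the mapping is inferred from the data and is an assumption, and the constants and the recode helper are hypothetical names, not part of svm_toolkit.

```ruby
# Assumed recoding of the four Play Tennis attributes into [0, 1].
# Inferred from the Instances array; not provided by svm_toolkit.
OUTLOOK     = { "Sunny" => 0.0, "Overcast" => 0.5, "Rain" => 1.0 }
TEMPERATURE = { "Cool" => 0.0, "Mild" => 0.5, "Hot" => 1.0 }
HUMIDITY    = { "Normal" => 0.0, "High" => 1.0 }
WIND        = { "Weak" => 0.0, "Strong" => 1.0 }

# Hypothetical helper: turns one row of attribute names into feature values.
def recode(outlook, temperature, humidity, wind)
  [
    OUTLOOK.fetch(outlook),
    TEMPERATURE.fetch(temperature),
    HUMIDITY.fetch(humidity),
    WIND.fetch(wind)
  ]
end

third = recode("Overcast", "Hot", "High", "Weak")
puts third.inspect
```

Using fetch rather than [] means a misspelled attribute value raises a KeyError instead of silently producing nil in the feature array.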

Train a Model


The most complex part of training a model is setting the parameters to use. The parameters depend on the problem to be solved and on the chosen kernel; more complete descriptions are available on the libsvm website. The two most important parameters are svm_type and kernel_type.

svm_type determines the type of problem to solve. A typical classification problem has type Parameter::C_SVC; libsvm also supports NU_SVC, ONE_CLASS, EPSILON_SVR, and NU_SVR.

kernel_type determines the kernel to use for building the model. There is a choice of Parameter::RBF, Parameter::LINEAR, Parameter::SIGMOID, and Parameter::POLY.

Depending on the kernel type, you will also want to set one or more of:
  • cost (for all kernel types)
  • degree (for the polynomial type)
  • gamma (for the polynomial, radial-basis function, and sigmoid types)
There are other training settings, but the above are enough for most purposes.
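
To give a feel for what gamma does, here is a small plain-Ruby sketch of the radial-basis function kernel, exp(-gamma * |u - v|^2), which is the form libsvm documents for its RBF kernel. The function rbf_kernel is a hypothetical illustration and is not part of svm_toolkit; libsvm computes the kernel internally during training.

```ruby
# Hypothetical illustration of the RBF kernel, not part of svm_toolkit:
# K(u, v) = exp(-gamma * squared Euclidean distance between u and v)
def rbf_kernel(u, v, gamma)
  squared_distance = u.zip(v).sum { |a, b| (a - b)**2 }
  Math.exp(-gamma * squared_distance)
end

x = [0.0, 1.0, 1.0, 0.0]
y = [0.0, 1.0, 1.0, 1.0]

same   = rbf_kernel(x, x, 4)   # identical points
narrow = rbf_kernel(x, y, 4)   # larger gamma: similarity falls off quickly
wide   = rbf_kernel(x, y, 1)   # smaller gamma: similarity falls off slowly
puts [same, narrow, wide].inspect
```

A larger gamma makes the kernel value drop faster as two instances move apart, so each training instance influences a smaller neighbourhood.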

A Parameter instance is created by passing in values for the above as a hash. For example, to set up a simple classification problem with an RBF kernel:

params = Parameter.new(
:svm_type => Parameter::C_SVC,
:kernel_type => Parameter::RBF,
:cost => 10,
:gamma => 4
)

After creating the parameters, training the model on a given problem (here, a Problem instance named TrainingSet) is as simple as:

model = Svm.svm_train(TrainingSet, params)

Evaluate a Model

There is a convenience method to evaluate a model on a given dataset, returning a simple count of the number of errors made:

model.evaluate_dataset(TrainingSet, true)

The second argument, a boolean, is optional. Passing 'true' makes the method print out the expected and actual output of the model for each instance.
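
The idea of counting errors can be sketched in plain Ruby as follows. The helper count_errors is hypothetical and only illustrates the counting; it is not the toolkit's implementation of evaluate_dataset, and it takes ready-made arrays of predicted and expected labels rather than a model and a Problem.

```ruby
# Hypothetical helper, not svm_toolkit's evaluate_dataset: counts how many
# predicted labels differ from the expected labels.
def count_errors(predicted, expected, verbose = false)
  errors = 0
  predicted.zip(expected).each do |got, want|
    puts "expected #{want}, got #{got}" if verbose
    errors += 1 unless got == want
  end
  errors
end

errors = count_errors([0, 1, 1, 0], [0, 0, 1, 1])
puts "#{errors} errors"
```

Note that an error count measured on the same data used for training tends to be optimistic; a separate test set gives a fairer estimate.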
