Saturday, November 19, 2011

Creating RBF models with svm_toolkit

Support-vector machine models are very sensitive to their parameter settings. When creating a model, we need to test a range of parameter values and select those which give the best generalisation performance. Radial-basis functions (RBF) are a popular kernel choice, but an RBF model has two parameters which need tuning: cost and gamma.
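
To make the role of gamma concrete, here is a minimal sketch of the standard RBF kernel, exp(-gamma * ||x - y||^2), in plain Ruby. The rbf_kernel helper is hypothetical and is not part of svm_toolkit; it only illustrates how gamma controls how quickly similarity falls off with distance. (Cost, the other parameter, sets the penalty for misclassified training points and does not appear in the kernel itself.)

```ruby
# Hypothetical helper, not part of svm_toolkit: the standard RBF kernel
# exp(-gamma * squared Euclidean distance) between two feature vectors.
def rbf_kernel(x, y, gamma)
  squared_distance = x.zip(y).inject(0.0) { |acc, (a, b)| acc + (a - b)**2 }
  Math.exp(-gamma * squared_distance)
end

x = [0.0, 1.0]
y = [1.0, 0.0]

# A larger gamma makes the kernel value drop off faster with distance.
[0.01, 1.0, 100.0].each do |gamma|
  puts "gamma=#{gamma}: #{rbf_kernel(x, y, gamma)}"
end
```

With a small gamma, distant points still look similar to each other; with a large gamma, only very close points do, which is why the search has to cover several orders of magnitude.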

The svm_toolkit contains some methods to make this search process simpler. The main addition is Svm.cross_validation_search, which takes four required parameters and an optional fifth:
  1. a dataset for training, which is an instance of the Problem class.
  2. a dataset to use for cross-validation, which is also an instance of the Problem class.
  3. an array of values to use for the cost parameter. It is recommended that these grow exponentially, in powers of two.
  4. an array of values to use for the gamma parameter. These are also recommended to grow exponentially, in powers of two.
  5. optionally, a value of true, which generates a contour plot of the cross-validation performance against the two parameters.
The method returns the best performing model, based on minimising the number of errors on the cross-validation dataset. (A future extension will support measures of performance other than overall accuracy.)
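
The idea behind such a search can be sketched in plain Ruby. This is not the svm_toolkit implementation: grid_search and toy_errors are hypothetical names, and the error counts come from a made-up function rather than from training real models. The sketch only shows the shape of the loop: try every (cost, gamma) pair and keep the one with the fewest cross-validation errors.

```ruby
# Hypothetical sketch of an exhaustive grid search. The block stands in for
# "train a model with this cost and gamma, then count its errors on the
# cross-validation set".
def grid_search(costs, gammas)
  best = nil
  costs.each do |cost|
    gammas.each do |gamma|
      errors = yield(cost, gamma)
      # Keep the first pair seen with the lowest error count.
      if best.nil? || errors < best[:errors]
        best = { cost: cost, gamma: gamma, errors: errors }
      end
    end
  end
  best
end

# Exponentially growing grids, in powers of two.
costs = [-1, 0, 1, 3].collect { |n| 2.0**n }
gammas = [-3, -1, 1].collect { |n| 2.0**n }

# Toy error surface, invented for illustration: fewest errors at
# cost = 2 and gamma = 0.5.
toy_errors = lambda do |cost, gamma|
  (10 * ((Math.log2(cost) - 1).abs + (Math.log2(gamma) + 1).abs)).round
end

best = grid_search(costs, gammas) { |cost, gamma| toy_errors.call(cost, gamma) }
puts "best cost=#{best[:cost]}, gamma=#{best[:gamma]}, errors=#{best[:errors]}"
```

The number of models trained is the product of the two array lengths, so coarse exponential grids keep the search affordable.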

The image below shows an example contour plot. (The contour plot is drawn using PlotPackage.)


Rescaling Features


The Problem class provides a rescale method, which rescales each feature in the current problem to fall in a given range: the default is for the features to end up in the range [0, 1]. For example,

problem.rescale

will make sure the problem's features are all in the range [0, 1].
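
For readers unfamiliar with this kind of rescaling, the following is a minimal sketch of min-max rescaling on a plain array of rows. It is an assumption about what rescaling to a range typically means, not the code inside Problem#rescale; rescale_features is a hypothetical helper, and mapping a constant feature to the lower bound is a choice made here for illustration.

```ruby
# Hypothetical helper, not part of svm_toolkit: min-max rescale each feature
# (column) of a dataset so its values fall in [lower, upper].
def rescale_features(rows, lower = 0.0, upper = 1.0)
  columns = rows.transpose.map do |column|
    min, max = column.minmax
    span = max - min
    column.map do |value|
      # A constant feature has no spread; map it to the lower bound.
      span.zero? ? lower : lower + (upper - lower) * (value - min) / span.to_f
    end
  end
  columns.transpose
end

rows = [
  [1.0, 200.0],
  [2.0, 400.0],
  [3.0, 800.0]
]

# Each column is rescaled independently of the others.
p rescale_features(rows)
```

Rescaling matters for RBF kernels because the kernel depends on distances, so a feature with a large numeric range would otherwise dominate the others.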

Training and Evaluation

Assuming our data are divided into training, cross-validation and test sets, the following program will train, optimise and evaluate an RBF model:

require "svm_toolkit"

# load in datasets to use for training, cross validation, and testing
TrainingData = Problem.from_file "training_set.dat"
CrossValData = Problem.from_file "cross_val_set.dat"
TestData = Problem.from_file "test_set.dat"

# Make sure all features are in range [0, 1]
TrainingData.rescale
CrossValData.rescale
TestData.rescale

# decide on the range of costs and gammas to search over
Costs = [-5, -3, -1, 0, 1, 3, 5, 8, 10, 13, 15].collect {|n| 2**n}
Gammas = [-15, -12, -8, -5, -3, -1, 1, 3, 5, 7, 9].collect {|n| 2**n}

# create the best model, and display the contour plot of results
best_model = Svm.cross_validation_search(
  TrainingData,
  CrossValData,
  Costs,
  Gammas,
  true
)

# evaluate model on the test set
puts "Test set errors: #{best_model.evaluate_dataset(TestData)}"

# save the model for later use
best_model.save "model.dat"
