Classification using the SVM (Support Vector Machine) Algorithm

Harika Bonthu
2 min read · Jun 9, 2020

In this document, we are going to build a very basic Classification model using the SVM Algorithm in Python.

First of all, when do we use Classification? Classification is used to predict a categorical target variable, which may be binary (two classes) or multiclass (more than two classes).

Some of the popular Classification algorithms are Decision Trees, Support Vector Machine, K-Nearest Neighbours, Random Forest, etc.

Introduction to Support Vector Machine: SVM separates the classes of the output variable by drawing a classifier, called a hyperplane. In a 2D feature space the hyperplane is a line; in a 3D space it is a plane.

Support Vector Machines (SVM) in 2D

In the above figure, the hyperplane H3 is the best (optimal) classifier because it has the maximum distance, called the margin, from the support vectors (the data points closest to the hyperplane).
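The idea above can be seen directly in scikit-learn. This is a minimal sketch on hypothetical 2D toy data (not the Fish dataset): after fitting a linear SVC, the model exposes the support vectors, and the margin width can be computed as 2 divided by the norm of the weight vector.

```python
# Hypothetical toy example (not the Fish data): two linearly separable
# 2D classes, to show the hyperplane, margin, and support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],   # class 0
              [4, 4], [5, 4], [4, 5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1)
clf.fit(X, y)

w = clf.coef_[0]                # normal vector of the hyperplane
margin = 2 / np.linalg.norm(w)  # width of the margin
print(clf.support_vectors_)     # the closest points on each side
print(margin)
```

Only the support vectors determine the hyperplane; moving any other point (without crossing the margin) leaves the fitted model unchanged.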

Kernel

A kernel is a transformation applied to the existing features so that a classifier can be drawn easily even for data points that are not linearly separable in the original feature space.
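A quick sketch of why kernels matter, using scikit-learn's synthetic `make_circles` data (an assumed stand-in, not the Fish dataset): two concentric circles cannot be split by a straight line, so a linear kernel performs poorly, while the RBF kernel separates them cleanly.

```python
# Hypothetical illustration: concentric circles are not linearly
# separable, but an RBF kernel handles them without manual feature
# engineering.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel='linear').fit(X, y).score(X, y)  # near chance level
rbf_acc = SVC(kernel='rbf').fit(X, y).score(X, y)        # near perfect
print(linear_acc, rbf_acc)
```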

Here, we are going to use the Fish dataset from Kaggle. I have downloaded the dataset and added it to my Github repository for easy access.

Here is how to add a file to a GitHub repository and read CSV data from GitHub.

The Fish dataset has 7 columns: Species, Weight, Length1, Length2, Length3, Height, Width. Our aim is to predict ‘Species’ based on the rest of the features. Species is a categorical variable holding the values ‘Bream’, ‘Roach’, ‘Whitefish’, ‘Parkki’, ‘Perch’, ‘Pike’, ‘Smelt’.

Python Code for implementing SVM:

# Install necessary libraries (run in a shell, not inside Python;
# note the PyPI package is named scikit-learn, not sklearn):
# pip install pandas matplotlib seaborn scikit-learn

# import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Create a dataframe for the Fish dataset
dataset_url = "https://raw.githubusercontent.com/harika-bonthu/02-linear-regression-fish/master/datasets_229906_491820_Fish.csv"
fish = pd.read_csv(dataset_url)

# Define the features and target variables
# X -> features, y -> target
X = fish.drop(['Species'], axis='columns')
y = fish.Species

# Split the data into training/testing sets using train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train a linear SVM classifier using the fit method
svm_model_linear = SVC(kernel='linear', C=1)
svm_model_linear.fit(X_train, y_train)

# Predict the species for the unseen test set
svm_pred = svm_model_linear.predict(X_test)

# Model accuracy for X_test: score() predicts the outputs for X_test and
# computes the accuracy by comparing the predictions with y_test.
accuracy = svm_model_linear.score(X_test, y_test)
print(accuracy)
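A single accuracy number can hide weak performance on individual species, so a per-class report is worth printing as a follow-up. This is a sketch using synthetic stand-in data (so the snippet is self-contained); with the Fish dataset you would pass `y_test` and `svm_pred` from the listing above.

```python
# Sketch: per-class precision/recall with classification_report,
# on synthetic stand-in data (hypothetical, not the Fish dataset).
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

model = SVC(kernel='linear', C=1).fit(X_train, y_train)
report = classification_report(y_test, model.predict(X_test))
print(report)  # precision, recall, and F1 per class
```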

Check out the complete Jupyter Notebook.

Finally, we built a model that achieved 93% accuracy (the exact figure will vary from run to run, since the train/test split is random). We can always play around with hyperparameters such as C (regularization strength) and gamma to see if we can get better accuracy.
