Classification using the SVM (Support Vector Machine) Algorithm
In this document, we are going to build a basic classification model in Python using the SVM algorithm.
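First of all, when do we use classification? Classification is used to predict a categorical target variable, that is, one whose value comes from a fixed set of classes. The target can have two classes (binary classification) or more than two (multiclass classification).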
Some popular classification algorithms are Decision Trees, Support Vector Machines, K-Nearest Neighbours, and Random Forests.
Introduction to Support Vector Machine: an SVM separates the classes of the output variable by drawing a classifier, called a hyperplane, through the feature space. In a 2D space the hyperplane is a line; in a 3D space it is a plane.
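To make the hyperplane idea concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel on a small hypothetical 2D dataset (the points are made up for illustration and are not from the Fish data). For a linear kernel, the fitted hyperplane can be read from the coef_ and intercept_ attributes as w · x + b = 0, which in 2D is a line.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters in 2D (not the Fish data)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# With a linear kernel the fitted hyperplane is w . x + b = 0 (a line in 2D)
w = clf.coef_[0]
b = clf.intercept_[0]
print(f"hyperplane: {w[0]:.3f}*x1 + {w[1]:.3f}*x2 + {b:.3f} = 0")

# The sign of w . x + b tells which side of the line a point falls on;
# this is the same quantity that decision_function returns
scores = X @ w + b
print(np.sign(scores))
print(clf.predict(np.array([[2.0, 2.0], [5.0, 6.0]])))
```

Points with a positive score are assigned to the second class in clf.classes_ (here 1), and points with a negative score to the first (here 0).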
In the above figure, hyperplane H3 is the best (optimal) classifier because it has the maximum distance, called the margin, from the support vectors, which are the data points of each class closest to the hyperplane.
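The margin can be computed from a fitted linear model. This sketch uses made-up 2D points and a large C (an assumption chosen so the fit approximates a hard margin). It reads the support vectors from support_vectors_ and relies on the standard result that, for a linear SVM, the support vectors lie on w · x + b = +1 or -1, so each one is 1/||w|| away from the hyperplane and the full margin width is 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters in 2D
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C tolerates almost no margin violations (close to a hard margin)
clf = SVC(kernel="linear", C=1000.0).fit(X, y)
w = clf.coef_[0]
b = clf.intercept_[0]
w_norm = np.linalg.norm(w)

# The support vectors are the training points closest to the hyperplane
print("support vectors:\n", clf.support_vectors_)

# Each support vector lies on w . x + b = +1 or -1, so its distance to the
# hyperplane is 1 / ||w||; the margin between the two classes is 2 / ||w||
distances = np.abs(clf.support_vectors_ @ w + b) / w_norm
print("distance of each support vector to the hyperplane:", distances)
print("margin width (2 / ||w||):", 2 / w_norm)
```

Only the support vectors determine the hyperplane: moving any other point (without crossing the margin) leaves the fitted model unchanged.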
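A kernel is a transformation applied to the existing features so that a classifier can be drawn for data points that are not linearly separable in the original feature space. In practice, the kernel function computes similarities between points as if they had been mapped into the transformed space, without building the new features explicitly.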
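One way to see what a kernel buys us is to compare three fits on data that no straight line can separate. This sketch uses scikit-learn's make_circles toy generator (two concentric rings, chosen here for illustration, not the Fish data) and reports training accuracy only, as a rough illustration: a linear SVM on the raw features, a linear SVM after adding a hand-built feature x1² + x2², and an SVM with the RBF kernel, which applies a non-linear mapping implicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in 2D separates inner from outer
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# 1) Linear SVM on the raw features (training accuracy, illustration only)
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# 2) Explicit transformation: add x1^2 + x2^2 (squared distance from the
#    origin) as a third feature; in this 3D space a plane separates the rings
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
lifted_acc = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

# 3) RBF kernel: a non-linear mapping applied implicitly by the kernel,
#    without constructing new feature columns by hand
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y)

print(f"linear on raw features:    {linear_acc:.2f}")
print(f"linear on lifted features: {lifted_acc:.2f}")
print(f"RBF kernel:                {rbf_acc:.2f}")
```

The hand-built feature works here only because we know the rings are centred on the origin; a kernel such as RBF avoids having to guess the right transformation.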
Here, we are going to use the Fish dataset from Kaggle. I have downloaded the dataset and added it to my GitHub repository for easy access.
The Fish dataset has 7 columns: Species, Weight, Length1, Length2, Length3, Height, and Width. Our aim is to predict ‘Species’ from the remaining six features. Species is a categorical variable with seven values: ‘Bream’, ‘Roach’, ‘Whitefish’, ‘Parkki’, ‘Perch’, ‘Pike’, and ‘Smelt’.
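Before modeling, it can help to confirm that the loaded data matches this description. The helper below, summarize_fish, is a hypothetical name used for illustration: it checks that the seven expected columns are present and that every feature column is numeric, then returns the number of rows per species, which shows whether some classes are much rarer than others.

```python
import pandas as pd

# Column names as listed in the dataset description
EXPECTED_COLUMNS = ["Species", "Weight", "Length1", "Length2",
                    "Length3", "Height", "Width"]


def summarize_fish(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: validate the Fish layout, return rows per species."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Every column except the target must be numeric for SVC to use it directly
    features = df.drop(columns=["Species"])
    non_numeric = [c for c in features.columns
                   if not pd.api.types.is_numeric_dtype(features[c])]
    if non_numeric:
        raise ValueError(f"non-numeric feature columns: {non_numeric}")
    # value_counts lists classes from most to least frequent
    return df["Species"].value_counts()
```

Once the fish DataFrame has been loaded with pd.read_csv, calling print(summarize_fish(fish)) shows the class counts.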
Python Code for implementing SVM:
# Install the necessary libraries (scikit-learn is the package that provides sklearn)
pip install pandas matplotlib seaborn scikit-learn

# Import the necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Create a DataFrame for the Fish dataset
dataset_url = "https://raw.githubusercontent.com/harika-bonthu/02-linear-regression-fish/master/datasets_229906_491820_Fish.csv"
fish = pd.read_csv(dataset_url)

# Define the features and the target variable
# X -> features, y -> target
X = fish.drop(['Species'], axis='columns')
y = fish.Species

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train a linear SVM classifier using the fit method
svm_model_linear = SVC(kernel='linear', C=1)
svm_model_linear.fit(X_train, y_train)

# Predict the species for the test set
svm_pred = svm_model_linear.predict(X_test)

# Model accuracy on X_test: the score method predicts the outputs for X_test
# and compares them with y_test to compute the accuracy
accuracy = svm_model_linear.score(X_test, y_test)
print(accuracy)
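A single accuracy number hides which species get confused with each other. The sketch below is one possible extension, not part of the original notebook: it adds stratify=y and a fixed random_state to train_test_split, then reports a confusion matrix and per-class precision and recall using scikit-learn's metrics module. To keep it runnable without a download, it uses the Iris dataset bundled with scikit-learn as a stand-in, and evaluate_linear_svm is a hypothetical helper name.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def evaluate_linear_svm(X, y, random_state=42):
    """Hypothetical helper: fit a linear SVC on an 80/20 split, return metrics."""
    # stratify=y keeps each class's share the same in both splits;
    # a fixed random_state makes the split (and the accuracy) reproducible
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    model = SVC(kernel="linear", C=1).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Same value that model.score(X_test, y_test) would return
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy, confusion_matrix(y_test, y_pred), classification_report(y_test, y_pred)


# Stand-in data: Iris ships with scikit-learn, so no download is needed.
# For the Fish data, pass X = fish.drop(['Species'], axis='columns') and y = fish.Species.
X, y = load_iris(return_X_y=True)
accuracy, cm, report = evaluate_linear_svm(X, y)
print(f"accuracy: {accuracy:.3f}")
print(cm)
print(report)
```

In the confusion matrix, rows are true classes and columns are predicted classes, so off-diagonal entries show which classes the model mixes up.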
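Finally, we built a model that achieved 93% accuracy on the test set. Because train_test_split shuffles the data randomly, the exact accuracy can change from run to run unless its random_state parameter is fixed. We can also experiment with hyperparameters such as C (the regularization parameter) and gamma (the kernel coefficient, which affects the 'rbf', 'poly', and 'sigmoid' kernels but not the linear one) to see whether we can get better accuracy.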
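A minimal sketch of that tuning with scikit-learn's GridSearchCV, assuming 5-fold cross-validation and a small grid of C and gamma values picked for illustration. It also standardizes the features inside a Pipeline, a common choice for SVMs because they are sensitive to feature scale (an addition not in the original code). The Iris dataset again stands in for the Fish data so the example runs offline.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Standardize each feature, then fit the SVM; inside GridSearchCV the scaler
# is re-fitted on the training folds only, so no test-fold data leaks in
pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# gamma has no effect on the linear kernel, so it is searched only with "rbf"
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10, 100]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10, 100],
     "svc__gamma": ["scale", 0.01, 0.1, 1]},
]

# Stand-in data; for the Fish data, fit on X_train, y_train from the split above
X, y = load_iris(return_X_y=True)
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, search.best_estimator_ is the whole pipeline refitted on all the data passed to fit with the best parameters, so it can be evaluated on a held-out test set just like svm_model_linear.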