Thursday, 8 September 2022

Support Vector Machine - An Introduction



Description

  • SVM is a supervised learning model.
  • SVM (Support Vector Machine) is used for classification.
  • SVR (Support Vector Regression) is used for regression.
  • SVM is widely preferred because it can produce high accuracy with relatively little computation power.
  • The objective of the SVM algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points.
  • Sequential minimal optimization (SMO) is the most widely used algorithm for training an SVM, but the SVM can also be trained with other algorithms such as coordinate descent.
    • Sequential minimal optimization
      • Breaks the large quadratic programming problem into smaller sub-problems and solves them.
      • Quadratic programming solves mathematical optimization problems whose objective is a quadratic function.
      • Example quadratic function: f(x) = ax² + bx + c
    • Coordinate descent
      • Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function.
  • Performs very well with a limited amount of data.
  • Uses the kernel trick to support non-linear data.
  • Commonly used for text and image classification.
  • Good-to-know fact: SVM was originally developed by Vladimir Vapnik and Alexey Chervonenkis in the 1960s, with the modern soft-margin form published in the 1990s.
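The kernel trick mentioned above can be seen in a small sketch: on concentric-circle data that no straight line can separate, an RBF kernel succeeds where a linear kernel fails. The dataset and parameters here are illustrative choices, not from the original post.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: two classes that are not linearly separable in 2-D
x, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# A linear kernel struggles; the RBF kernel separates the classes cleanly
linear_acc = SVC(kernel='linear').fit(x_train, y_train).score(x_test, y_test)
rbf_acc = SVC(kernel='rbf').fit(x_train, y_train).score(x_test, y_test)
print(f"linear: {linear_acc:.2f}, rbf: {rbf_acc:.2f}")
```

The RBF kernel implicitly maps the points into a higher-dimensional space where a separating hyperplane exists, which is exactly the transformation described in the Algorithm section below.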

Algorithm
  • SVM works by transforming the data into higher dimensions so that the data points can be classified.
  • Hyperplane
    • The separator found between classes, which forms the decision boundary.
    • The data is transformed so that a hyperplane separating the classes can be created.
    • A kernel function is the mathematical function used for this data transformation, commonly referred to as the kernel trick.
    • The best decision boundary is called the hyperplane (or hyperline in two dimensions).
  • Support Vectors
    • The data points from which the margins are calculated and maximized.
    • The data points closest to the hyperplane on either side are called support vectors.
    • In Fig1 below, the filled red and filled blue points are data points of two different classes.
  • Margins
    • Two additional lines, parallel to the hyperplane.
    • The margin is used to make the classification decision.
    • Fig3 below shows the two types of margins formed by the data points.
    • Hard/small margin: a decision boundary where all training data points are correctly classified, which can lead to overfitting at test time.
    • Soft/large margin: a decision boundary where the loss function maximizes the margin while allowing some misclassifications.
    • The SVM has a regularization parameter called C, which controls the trade-off between misclassification and margin maximization.
  • The figure at the top of the blog shows the original data points, the decision boundary being created, the data being transformed, and the hyperplane being created.
  • SVR
    • SVR uses the same principles as SVM, applied to regression (compare linear regression, y = ax + b).
    • The idea here is to minimize the error rate.
    • As in SVM, a line (again called the hyperplane) is fitted, with the support vectors around the decision boundary and margins defining the curve/line.
    • New data points are predicted from the fitted curve.
    • Unlike linear regression, SVR works on non-linear data.
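The SVR idea above can be sketched with scikit-learn's SVR on a noisy sine curve; the dataset and the C/epsilon values are illustrative assumptions, not from the original post. The epsilon parameter defines the tube around the fitted curve within which errors are tolerated.

```python
import numpy as np
from sklearn.svm import SVR

# Non-linear target: y = sin(x) with a little noise
rng = np.random.RandomState(0)
x = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(x).ravel() + 0.1 * rng.randn(200)

# epsilon defines the margin of tolerance around the regression curve:
# points inside the epsilon-tube contribute no loss
svr = SVR(kernel='rbf', C=10, epsilon=0.1)
svr.fit(x, y)
print(svr.predict([[1.5]]))  # should be close to sin(1.5) ≈ 0.997
```

Because the RBF kernel handles the non-linearity, SVR tracks the sine shape where plain linear regression could only fit a straight line.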


Visualization

Fig1: First graph shows the different possible margins, second graph shows the optimal hyperplane with maximum margin

Fig2: Different hyperplanes after data transformation

Fig3: Hard/Small Margin and Soft/Large Margin

Fig4: The classification hyperplane changes for different gamma values
Fig5: Different kernel functions



Implementation


A minimal example (using the Iris dataset here for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load a sample dataset and split it into train and test sets
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Train a linear-kernel SVM classifier and report test accuracy
clf = SVC(kernel='linear')
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print(accuracy_score(y_test, y_pred))
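A fitted classifier also exposes the support vectors discussed in the Algorithm section. A short self-contained sketch (again assuming the Iris dataset for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)
clf = SVC(kernel='linear').fit(x, y)

# The fitted model exposes the support vectors it found
print(clf.support_vectors_.shape)  # rows are the support vectors
print(clf.n_support_)              # support vector count per class
```

Only these points determine the decision boundary, which is why SVM is memory efficient: the rest of the training data can be discarded after fitting.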


Hyper Parameters

  • kernel
    • Transforms the data into higher dimensions so that SVM can create hyperplanes to solve non-linear problems.
    • Fig5 above illustrates a few kernel values such as linear, rbf, polynomial, and sigmoid.
    • The linear kernel function is used when the data is linearly separable. For non-linear data, the other functions can be used.
  • C (Regularization)
    • Regularization parameter that manages the trade-off between maximizing the margin and misclassification.
    • A lower C value creates a larger margin, allowing more misclassifications in the training dataset; a higher C value creates a smaller margin that minimizes the training error but can overfit.
    • Fig3 shows the variation for different C values, creating small/large margins.
  • gamma
    • Gamma manages how far the influence of a single training data point reaches.
    • A small gamma value means even points far from the boundary influence it, giving a smoother decision boundary.
    • A large gamma value means only points close to the boundary are considered, which can cause overfitting.
    • Fig4 shows how different gamma values make a difference in the classification.
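Since C and gamma interact, they are usually tuned together. A common approach (sketched here with an assumed grid and the Iris dataset, not values from the original post) is a cross-validated grid search:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)

# Search over C (regularization strength) and gamma (RBF kernel width)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(x, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

This exhaustive search is also why hyperparameter tuning is listed as computationally expensive in the Drawbacks section: every (C, gamma) pair is trained and validated cv times.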


Advantages

  • Works efficiently in higher dimensions.
  • Effective with a small number of samples.
  • Memory efficient.
  • Regularization helps avoid overfitting.
  • Handles non-linear data using the kernel trick.
  • Variance tolerant: not strongly impacted by small changes in the data.
  • Solves both classification and regression problems.
  • Works on text and image data.
  • Makes no assumptions about the data distribution.
  • Not much influenced by outliers.


Drawbacks

  • Not suitable when the number of samples is very high.
  • Does not perform well when classes overlap.
  • Choosing an appropriate kernel is difficult.
  • Challenging to understand and interpret.
  • Large datasets take a long time to train.
  • Tuning the hyperparameters requires high computation.
  • Feature scaling is important.


Applications

  • Face or object detection
  • Face or object recognition
  • Face expression classification
  • Text categorization
  • Image or data classification
  • Handwriting recognition
  • Bioinformatics
  • Protein fold detection
  • Texture classification
  • Speech recognition
  • Disease Diagnosis 


References

https://www.svm-tutorial.com/2017/02/svms-overview-support-vector-machines/

https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47

https://scikit-learn.org/stable/_images/sphx_glr_plot_iris_svc_001.png

https://web.iitd.ac.in/~sumeet/tr-98-14.pdf

https://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf
