Thursday, 8 September 2022

Support Vector Machine - An Introduction



Description

  • SVM is a supervised learning model.
  • SVM (Support Vector Machine) is used for classification.
  • SVR (Support Vector Regression) is used for regression.
  • SVM is widely preferred because it can produce high accuracy with relatively little computation power.
  • The objective of the SVM algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points.
  • Sequential minimal optimization (SMO) is the most widely used algorithm for training an SVM, but the SVM can also be trained with other algorithms such as coordinate descent.
    • Sequential minimal optimization
      • Breaks the large quadratic programming problem into smaller sub-problems and solves them.
      • Quadratic programming solves mathematical optimization problems whose objective is a quadratic function.
      • Example quadratic function: f(x) = ax² + bx + c
    • Coordinate descent
      • Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function.
  • Performs very well with a limited amount of data.
  • Uses the kernel trick to support non-linear data.
  • Commonly used for text and image classification.
  • Good-to-know fact: SVM was originally developed by Vladimir Vapnik and Alexey Chervonenkis in the 1960s, with the modern soft-margin form published in the 1990s.
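The kernel trick mentioned above can be seen in a small sketch: on concentric-circle data that no straight line can separate, an RBF kernel succeeds where a linear kernel fails. The dataset and parameters here are illustrative choices, not from the original post.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: two classes that are not linearly separable in 2-D
x, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# A linear kernel struggles; the RBF kernel separates the classes cleanly
linear_acc = SVC(kernel='linear').fit(x_train, y_train).score(x_test, y_test)
rbf_acc = SVC(kernel='rbf').fit(x_train, y_train).score(x_test, y_test)
print(f"linear: {linear_acc:.2f}, rbf: {rbf_acc:.2f}")
```

The RBF kernel implicitly maps the points into a higher-dimensional space where a separating hyperplane exists, which is exactly the transformation described in the Algorithm section below.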

Algorithm
  • SVM works by transforming the data into higher dimensions so that the data points can be classified.
  • Hyperplane
    • The separator found between classes, which forms the decision boundary.
    • The data is transformed so that a hyperplane separating the classes can be created.
    • A kernel function is the mathematical function used for this data transformation, commonly referred to as the kernel trick.
    • The best decision boundary is called the hyperplane (or hyperline in two dimensions).
  • Support Vectors
    • The data points from which the margins are calculated and maximized.
    • The data points closest to the hyperplane on either side are called support vectors.
    • In Fig1 below, the filled red and filled blue points are data points of two different classes.
  • Margins
    • Two additional lines, parallel to the hyperplane.
    • The margin is used to make the classification decision.
    • Fig3 below shows the two types of margins formed by the data points.
    • Hard/small margin: a decision boundary where all training data points are correctly classified, which can lead to overfitting at test time.
    • Soft/large margin: a decision boundary where the loss function maximizes the margin while allowing some misclassifications.
    • The SVM has a regularization parameter called C, which controls the trade-off between misclassification and margin maximization.
  • The figure at the top of the blog shows the original data points, the decision boundary being created, the data being transformed, and the hyperplane being created.
  • SVR
    • SVR uses the same principles as SVM, applied to regression (compare linear regression, y = ax + b).
    • The idea here is to minimize the error rate.
    • As in SVM, a line (again called the hyperplane) is fitted, with the support vectors around the decision boundary and margins defining the curve/line.
    • New data points are predicted from the fitted curve.
    • Unlike linear regression, SVR works on non-linear data.
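The SVR idea above can be sketched with scikit-learn's SVR on a noisy sine curve; the dataset and the C/epsilon values are illustrative assumptions, not from the original post. The epsilon parameter defines the tube around the fitted curve within which errors are tolerated.

```python
import numpy as np
from sklearn.svm import SVR

# Non-linear target: y = sin(x) with a little noise
rng = np.random.RandomState(0)
x = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(x).ravel() + 0.1 * rng.randn(200)

# epsilon defines the margin of tolerance around the regression curve:
# points inside the epsilon-tube contribute no loss
svr = SVR(kernel='rbf', C=10, epsilon=0.1)
svr.fit(x, y)
print(svr.predict([[1.5]]))  # should be close to sin(1.5) ≈ 0.997
```

Because the RBF kernel handles the non-linearity, SVR tracks the sine shape where plain linear regression could only fit a straight line.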


Visualization

Fig1: First graph shows the different possible margins, second graph shows the optimal hyperplane with maximum margin

Fig2: Different hyperplanes after data transformation

Fig3: Hard/Small Margin and Soft/Large Margin

Fig4: The classification hyperplane changes for different gamma values
Fig5: Different kernel functions



Implementation


A minimal example (using the Iris dataset here for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load a sample dataset and split it into train and test sets
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Train a linear-kernel SVM classifier and report test accuracy
clf = SVC(kernel='linear')
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print(accuracy_score(y_test, y_pred))
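A fitted classifier also exposes the support vectors discussed in the Algorithm section. A short self-contained sketch (again assuming the Iris dataset for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)
clf = SVC(kernel='linear').fit(x, y)

# The fitted model exposes the support vectors it found
print(clf.support_vectors_.shape)  # rows are the support vectors
print(clf.n_support_)              # support vector count per class
```

Only these points determine the decision boundary, which is why SVM is memory efficient: the rest of the training data can be discarded after fitting.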


Hyper Parameters

  • kernel
    • Transforms the data into higher dimensions so that SVM can create hyperplanes to solve non-linear problems.
    • Fig5 above illustrates a few kernel values such as linear, rbf, polynomial, and sigmoid.
    • The linear kernel function is used when the data is linearly separable. For non-linear data, the other functions can be used.
  • C (Regularization)
    • Regularization parameter that manages the trade-off between maximizing the margin and misclassification.
    • A lower C value creates a larger margin, allowing more misclassifications in the training dataset; a higher C value creates a smaller margin that minimizes the training error but can overfit.
    • Fig3 shows the variation for different C values, creating small/large margins.
  • gamma
    • Gamma manages how far the influence of a single training data point reaches.
    • A small gamma value means even points far from the boundary influence it, giving a smoother decision boundary.
    • A large gamma value means only points close to the boundary are considered, which can cause overfitting.
    • Fig4 shows how different gamma values make a difference in the classification.
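Since C and gamma interact, they are usually tuned together. A common approach (sketched here with an assumed grid and the Iris dataset, not values from the original post) is a cross-validated grid search:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)

# Search over C (regularization strength) and gamma (RBF kernel width)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(x, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

This exhaustive search is also why hyperparameter tuning is listed as computationally expensive in the Drawbacks section: every (C, gamma) pair is trained and validated cv times.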


Advantages

  • Works efficiently in higher dimensions.
  • Effective with a small number of samples.
  • Memory efficient.
  • Regularization helps avoid overfitting.
  • Handles non-linear data using the kernel trick.
  • Variance tolerant: not strongly impacted by small changes in the data.
  • Solves both classification and regression problems.
  • Works on text and image data.
  • Makes no assumptions about the data distribution.
  • Not much influenced by outliers.


Drawbacks

  • Not suitable when the number of samples is very high.
  • Does not perform well when classes overlap.
  • Choosing an appropriate kernel is difficult.
  • Challenging to understand and interpret.
  • Large datasets take a long time to train.
  • Tuning the hyperparameters requires high computation.
  • Feature scaling is important.


Applications

  • Face or object detection
  • Face or object recognition
  • Face expression classification
  • Text categorization
  • Image or data classification
  • Handwriting recognition
  • Bioinformatics
  • Protein fold detection
  • Texture classification
  • Speech recognition
  • Disease Diagnosis 


References

https://www.svm-tutorial.com/2017/02/svms-overview-support-vector-machines/

https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47

https://scikit-learn.org/stable/_images/sphx_glr_plot_iris_svc_001.png

https://web.iitd.ac.in/~sumeet/tr-98-14.pdf

https://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf
