Description
- SVM is a supervised learning model.
- SVM (Support Vector Machine) is used for classification.
- SVR (Support Vector Regression) is used for regression.
- SVM is widely preferred because it produces good accuracy with comparatively little computation.
- The objective of the SVM algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly separates the data points of the classes.
- Sequential minimal optimization (SMO) is the most widely used algorithm to train an SVM, but an SVM can also be trained with other algorithms such as coordinate descent.
- Sequential minimal optimization
  - Breaks the large quadratic programming problem into the smallest possible sub-problems and solves them one at a time.
  - Quadratic programming means mathematical optimization of a quadratic objective function.
  - Example quadratic function: f(x) = ax² + bx + c
- Coordinate descent
  - An optimization algorithm that successively minimizes along one coordinate direction at a time to find the minimum of a function.
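As a toy sketch of the idea (not the actual SVM solver), coordinate descent on a simple coupled quadratic f(x, y) = x² + y² + xy − 4x − 5y minimizes along one coordinate at a time; the function and starting point here are made up purely for illustration:

```python
# Toy coordinate descent on f(x, y) = x**2 + y**2 + x*y - 4*x - 5*y.
# Each step minimizes exactly along one coordinate while the other is fixed:
#   df/dx = 2x + y - 4 = 0  ->  x = (4 - y) / 2
#   df/dy = 2y + x - 5 = 0  ->  y = (5 - x) / 2
def coordinate_descent(steps=50):
    x, y = 0.0, 0.0                # arbitrary starting point
    for _ in range(steps):
        x = (4.0 - y) / 2.0        # exact 1-D minimizer in x, y held fixed
        y = (5.0 - x) / 2.0        # exact 1-D minimizer in y, x held fixed
    return x, y

print(coordinate_descent())        # converges to the true minimum (1.0, 2.0)
```

Each sweep shrinks the error by a constant factor, so the iterates converge to the joint minimum even though every individual step only looks at one variable.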
- Performs very well with a limited amount of data.
- Uses the kernel trick to support non-linear data.
- Commonly used for text and image classification.
- Good-to-know fact: the foundations of SVM were developed by Vladimir Vapnik and colleagues, with the original ideas dating to the 1960s–70s and the modern kernel-based form introduced in the 1990s.
- SVM works by transforming the data into a higher-dimensional space so that the data points can be separated.
- Hyperplane
  - The separator found between the classes, creating the decision boundary.
  - The data is transformed so that a separating hyperplane can be found between the classes.
  - A kernel function is the mathematical function used for this data transformation; this is usually referred to as the kernel trick.
  - The best decision boundary is called the hyperplane (in two dimensions it is simply a line).
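The idea behind the kernel trick can be illustrated with an explicit feature map on a made-up one-dimensional dataset: no single threshold on the line separates the classes, but mapping phi(x) = (x, x²) lifts the points into 2-D where a horizontal line does. (Real kernels compute inner products in such spaces implicitly instead of transforming the data explicitly.)

```python
import numpy as np

# 1-D points: class 0 near the origin, class 1 further out on both sides.
# Not separable by one threshold on the line, but the feature map
# phi(x) = (x, x**2) lifts them to 2-D where x2 = 2.0 separates them.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

phi = np.column_stack([x, x**2])   # explicit transformation to 2-D
separable = np.all(phi[y == 1, 1] > 2.0) and np.all(phi[y == 0, 1] < 2.0)
print(separable)  # True: the lifted data is split by the line x2 = 2.0
```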
- Support Vectors
  - The data points from which the margins are calculated and maximized.
  - The data points closest to the hyperplane, on either side of it, are called support vectors.
  - In Fig1 below, the filled red and filled blue points are the data points of two different classes.
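With scikit-learn, the support vectors of a fitted model are exposed through the `support_vectors_` attribute; a small sketch on synthetic blobs (the dataset and its parameters are illustrative, not taken from the figures above):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs; only the few points nearest the boundary
# become support vectors and define the margin.
X, y = make_blobs(n_samples=40, centers=2, random_state=6)
clf = SVC(kernel='linear', C=1000).fit(X, y)

print(clf.support_vectors_)        # coordinates of the support vectors
print(len(clf.support_vectors_))   # far fewer than the 40 training points
```

Everything except these few points could be removed from the training set without changing the fitted decision boundary.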
- Margins
  - Two additional lines, parallel to the hyperplane, passing through the nearest data points on each side.
  - The margin is what the classifier maximizes when choosing the decision boundary.
  - In Fig3 below there are two types of margins drawn from the data points.
  - Hard/Small: a decision boundary where all the data points are classified correctly, which can lead to overfitting at test time.
  - Soft/Large: a decision boundary where the loss function maximizes the margin while allowing some misclassifications.
  - The SVM has a regularization parameter called C, which controls the trade-off between misclassification and margin maximization.
- The figure at the top of the blog shows the original data points, then the data being transformed, and finally the hyperplane and decision boundary being created.
- SVR
  - SVR uses the same principles as SVM, combined with the idea of linear regression (y = ax + b).
  - The idea here is to minimize the error rate.
  - As with SVM, a best-fit line (again called the hyperplane) is found, with the support vectors lying on the margins around it, defining the curve/line.
  - New data points are placed on the fitted curve to make predictions.
  - Unlike linear regression, SVR also works on non-linear data, through the kernel trick.
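A minimal SVR sketch on a noisy sine curve (the dataset and hyperparameter values are illustrative assumptions): the `epsilon` parameter defines a tube around the fit inside which errors are ignored, and points outside the tube become the support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve: clearly non-linear, so a plain linear fit would do poorly.
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# RBF-kernel SVR; epsilon sets the width of the error-insensitive tube.
svr = SVR(kernel='rbf', C=100, epsilon=0.1).fit(X, y)
y_fit = svr.predict(X)
print(np.mean((y_fit - y) ** 2))   # small mean squared error on the curve
```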
Visualization
[Image] Fig1: The first graph shows the different possible margins; the second shows the optimal hyperplane with maximum margin.
[Image] Fig2: Different hyperplanes after data transformation.
[Image] Fig3: Hard/Small margin and Soft/Large margin.
[Image] Fig4: The classification hyperplane changes for different gamma values.
[Image] Fig5: Different kernel functions.
Implementation
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Any labelled dataset works here; iris is used as a stand-in.
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

clf = SVC(kernel='linear')    # linear-kernel SVM classifier
clf.fit(x_train, y_train)     # train on the training split
y_pred = clf.predict(x_test)  # predict on unseen data
print(accuracy_score(y_test, y_pred))
```
Hyper Parameters
- kernel
  - Transforms the data into a higher-dimensional space so that SVM can create hyperplanes that solve non-linear problems.
  - Fig5 above shows a few kernel values, such as linear, rbf, polynomial and sigmoid.
  - The linear kernel is used when the data is linearly separable; for non-linear data the other kernel functions can be used.
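A quick comparison of the built-in kernels on scikit-learn's two-moons toy data (the dataset and noise level are chosen only for illustration): the interleaving half-moons are not linearly separable, so the linear kernel lags behind the kernels that can bend the boundary.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Interleaving half-moons are not linearly separable.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

scores = {}
for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    clf = SVC(kernel=kernel).fit(X, y)
    scores[kernel] = clf.score(X, y)      # training accuracy
    print(kernel, scores[kernel])
```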
- C (Regularization)
  - Regularization parameter that manages the trade-off between maximizing the margin and misclassification.
  - A lower C value creates a larger, softer margin that tolerates misclassifications in the training dataset; a higher C value creates a smaller, harder margin that minimizes the training error.
  - Fig3 shows the variation for different C values, creating small/large margins.
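A small sketch of the C trade-off on overlapping synthetic blobs (the dataset parameters are illustrative): with a low C the margin is wide and many points sit inside it, so the model keeps many support vectors; with a high C the margin tightens and fewer points remain support vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs: no hyperplane classifies everything correctly,
# so C decides how heavily margin violations are penalized.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

n_sv = {}
for C in (0.01, 1, 100):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    n_sv[C] = int(clf.n_support_.sum())   # total support-vector count
    print('C =', C, '-> support vectors:', n_sv[C])
```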
- gamma
  - Gamma controls how far the influence of a single training data point reaches.
  - A small gamma value means a far reach: many points influence the boundary, giving a smoother hyperplane.
  - A large gamma value means a close reach: the boundary bends around individual points, which can cause overfitting.
  - Fig4 shows how different gamma values change the classification.
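A sketch of gamma's effect with the RBF kernel (the dataset and values are illustrative): a large gamma lets the boundary wrap around individual points, driving training accuracy toward 100%, which is exactly the overfitting risk described above.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=1)

acc = {}
for gamma in (0.01, 1, 100):
    clf = SVC(kernel='rbf', gamma=gamma).fit(X, y)
    acc[gamma] = clf.score(X, y)          # training accuracy
    print('gamma =', gamma, '-> training accuracy:', acc[gamma])
```

The high training accuracy at gamma = 100 would not carry over to held-out data; a validation split is needed to pick gamma in practice.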
Advantages
- Works efficiently in high dimensions.
- Effective with a small number of samples.
- Memory efficient, since only the support vectors are kept.
- Regularization helps avoid overfitting.
- Handles non-linear data using the kernel trick.
- Tolerant of variance: small changes in the data do not greatly change the model.
- Solves both classification and regression problems.
- Works on text and image data.
- Makes no distributional assumptions about the data.
- Outliers have little influence on the model.
Drawbacks
- Not suitable when the number of samples is very high.
- Does not perform well when classes overlap.
- Choosing an appropriate kernel is difficult.
- The resulting model is challenging to understand and interpret.
- Large datasets take a long time to train.
- Tuning the hyperparameters requires heavy computation.
- Feature scaling is important.
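To illustrate the feature-scaling point, a sketch comparing an SVM with and without standardization on scikit-learn's breast-cancer dataset (the dataset choice is an assumption for illustration): its features span very different ranges, which hurts the RBF kernel's distance computations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same model, with and without per-feature standardization.
raw = SVC().fit(X_tr, y_tr)
scaled = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr)
print('unscaled:', raw.score(X_te, y_te))
print('scaled:  ', scaled.score(X_te, y_te))
```

Wrapping the scaler and classifier in a pipeline also ensures the scaling statistics are learned from the training split only.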
Applications
- Face or object detection
- Face or object recognition
- Face expression classification
- Text categorization
- Image or data classification
- Handwriting recognition
- Bioinformatics
- Protein fold detection
- Texture classification
- Speech recognition
- Disease Diagnosis
References
- https://www.svm-tutorial.com/2017/02/svms-overview-support-vector-machines/
- https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
- https://scikit-learn.org/stable/_images/sphx_glr_plot_iris_svc_001.png
- https://web.iitd.ac.in/~sumeet/tr-98-14.pdf
- https://web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf







