import numpy as np

class SVM:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        self.lr = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        y_ = np.where(y <= 0, -1, 1)  # map labels to -1 / +1
        self.w = np.zeros(n_features)
        self.b = 0

        # Gradient descent on the regularized hinge loss
        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    # Sample is outside the margin: only the regularizer contributes
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # Sample violates the margin: the hinge-loss term also contributes
                    self.w -= self.lr * (2 * self.lambda_param * self.w - np.dot(x_i, y_[idx]))
                    self.b -= self.lr * y_[idx]

    def predict(self, X):
        linear_output = np.dot(X, self.w) - self.b
        return np.sign(linear_output)
Support Vector Machine implementation from scratch with Python
What are Support Vector Machines (SVMs)?
Support Vector Machines (SVMs) are a type of machine learning algorithm used for classification and regression analysis. In particular, linear SVMs are used for binary classification problems where the goal is to separate two classes by a hyperplane.
The hyperplane divides the feature space into two regions (in two dimensions it is simply a line). The SVM algorithm tries to find the hyperplane that maximizes the margin, which is the distance between the hyperplane and the closest points from each class. The points closest to the hyperplane are called support vectors and play a crucial role in the algorithm’s optimization process.
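As a quick illustration of what the margin measures, here is a minimal sketch (the hyperplane parameters and points below are made-up values, not taken from the model trained later in this article) that computes each point's perpendicular distance to a hyperplane defined by w · x − b = 0:
import numpy as np

# Hypothetical hyperplane parameters and a few illustrative points (not from the trained model)
w = np.array([2.0, -1.0])   # weight vector (normal to the hyperplane)
b = 0.5                     # bias term
X = np.array([[1.0, 1.5], [3.0, -0.5], [-1.0, 2.0]])

# Perpendicular distance of each point to the hyperplane w·x - b = 0
distances = np.abs(X @ w - b) / np.linalg.norm(w)
print(distances)  # the smallest of these distances is the (geometric) margin of this hyperplane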
In linear SVMs, the hyperplane is defined by a linear function of the input features. The algorithm tries to find the optimal values of the coefficients of this function, called weights, that maximize the margin. This optimization problem can be formulated as a quadratic programming problem, which can be efficiently solved using standard optimization techniques.
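For reference, the hard-margin form of that quadratic program can be written as follows (using the same w · x − b sign convention as the code in this article); the from-scratch implementation here actually minimizes a soft-margin variant that replaces the hard constraint with a hinge-loss penalty plus L2 regularization:

\min_{w, b} \; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i \,(w \cdot x_i - b) \ge 1, \quad i = 1, \dots, n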
In addition to finding the optimal hyperplane, SVMs can also handle non-linearly separable data by using a kernel trick. This technique maps the input features into a higher-dimensional space, where they might become linearly separable. The SVM algorithm then finds the optimal hyperplane in this transformed feature space, which corresponds to a non-linear decision boundary in the original feature space.
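For comparison only (this is separate from the from-scratch linear SVM in this article), a kernelized SVM can be sketched in a few lines with scikit-learn's SVC; the dataset and parameter values below are arbitrary choices for illustration:
from sklearn.svm import SVC
from sklearn.datasets import make_moons

# Toy data that is not linearly separable (illustrative only)
X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=42)

# An RBF kernel implicitly maps the inputs into a higher-dimensional space
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_moons, y_moons)
print("Training accuracy:", clf.score(X_moons, y_moons))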
Linear SVMs have been widely used in many applications, including text classification, image classification, and bioinformatics. They have the advantage of being computationally efficient and easy to interpret. However, they may not perform well on datasets that are highly non-linearly separable, where non-linear (kernel) SVMs may be a better choice.
SVM Code Implementation Using Python
The SVM class shown at the top of this article is an example implementation of a Support Vector Machine (SVM) classifier that uses gradient descent for optimization.
Key Concepts:
- Margin: SVM aims to maximize the margin (the distance between the hyperplane and the closest samples from each class).
- Regularization: The regularization parameter (lambda_param) controls the trade-off between maximizing the margin and minimizing classification error.
- Gradient Descent: This is used to iteratively optimize the weights and bias; a short sketch of the objective being minimized follows this list.
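Concretely, the fit method above performs per-sample (sub)gradient steps on a regularized hinge loss. A minimal sketch of that objective, written with the same w, b, and lambda_param names as the class (this helper is for illustration only and is not part of the class), looks like this:
import numpy as np

def hinge_objective(w, b, X, y, lambda_param):
    # Regularized hinge loss: lambda * ||w||^2 + average of max(0, 1 - y * (X·w - b))
    margins = y * (X @ w - b)
    return lambda_param * np.dot(w, w) + np.mean(np.maximum(0.0, 1.0 - margins))

When a sample satisfies y_i (w · x_i − b) ≥ 1, the hinge term contributes no gradient and only the regularizer updates w (the if branch); otherwise the hinge term adds −y_i x_i to the gradient for w and y_i for b (the else branch). With that objective in mind, here is how the model can be trained and evaluated on synthetic data: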
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assuming the SVM class is defined above and already imported

# Generate synthetic data with 2 classes
X, y = datasets.make_blobs(n_samples=100, centers=2, random_state=42)
y = np.where(y == 0, -1, 1)  # Convert labels to -1 and 1 instead of 0 and 1

# Split the data into training and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate and train the SVM model
svm = SVM()  # You can add parameters here like learning_rate, lambda_param, etc.
svm.fit(X_train, y_train)

# Make predictions on the test set
y_pred = svm.predict(X_test)

# Evaluate the model by calculating accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 1.0
In this case, since the data is simple and linearly separable, the SVM classifier achieves perfect accuracy (100%). For more complex data, you may see lower accuracy depending on the dataset’s difficulty.
A key thing to note: we can fine-tune hyperparameters such as the learning rate (learning_rate), regularization strength (lambda_param), and the number of iterations (n_iters) to improve performance.
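As one possible (hypothetical) way to do that tuning, a simple grid search over those three hyperparameters might look like the sketch below; the value grids are arbitrary, and in practice you would score candidates on a separate validation split or with cross-validation rather than on the test set:
from itertools import product
from sklearn.metrics import accuracy_score

best_acc, best_params = 0.0, None
for lr, lam, iters in product([0.0001, 0.001, 0.01], [0.001, 0.01, 0.1], [500, 1000, 2000]):
    model = SVM(learning_rate=lr, lambda_param=lam, n_iters=iters)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    if acc > best_acc:
        best_acc, best_params = acc, (lr, lam, iters)

print("Best accuracy:", best_acc)
print("Best (learning_rate, lambda_param, n_iters):", best_params)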
Visualization
import matplotlib.pyplot as plt

# Plot the training data
plt.figure(figsize=(8, 6))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='bwr', marker='o', label='Training data')

# Plot the decision boundary
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))

# Compute the decision function values on the grid
Z = np.dot(np.c_[xx.ravel(), yy.ravel()], svm.w) - svm.b
Z = Z.reshape(xx.shape)

# Plot the contour of the decision boundary
plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='k')

# Plot the support vectors (samples closest to the decision boundary)
support_vectors = np.where(np.abs(np.dot(X_train, svm.w) - svm.b) <= 1)[0]
plt.scatter(X_train[support_vectors, 0], X_train[support_vectors, 1], c='yellow', marker='x', s=100, label='Support Vectors')

# Show plot with appropriate labels and legend
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('SVM Classifier with Decision Boundary and Support Vectors')
plt.legend(loc='best')
plt.show()
The resultant plot shows:
- Blue and red dots representing the two different classes.
- A black line representing the decision boundary that separates the classes.
- Yellow ‘X’ markers for the support vectors, which are the critical data points closest to the decision boundary.
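As a small optional extension, the margin boundaries at w · x − b = ±1 (the same ±1 threshold used above to pick the support vectors) can be added to the plot; this line assumes it is placed in the plotting code above, before plt.show():
# Draw the margin boundaries w·x - b = +1 and -1 as dashed gray lines
plt.contour(xx, yy, Z, levels=[-1, 1], linewidths=1, linestyles='dashed', colors='gray')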