Chapter 12: Machine Learning Algorithms
Introduction to Machine Learning
Machine Learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. ML algorithms build mathematical models based on training data to make predictions or decisions.
This chapter covers fundamental machine learning algorithms across different categories: supervised learning, unsupervised learning, and neural networks. Understanding these algorithms is essential for building intelligent systems, analyzing data, and developing AI applications.
Machine Learning Categories
1. Supervised Learning
Algorithms learn from labeled training data to make predictions. The model is trained on input-output pairs and learns to map inputs to outputs.
- Classification: Predict discrete categories (e.g., spam/not spam)
- Regression: Predict continuous values (e.g., house prices)
2. Unsupervised Learning
Algorithms find patterns in data without labeled examples. The model discovers hidden structures in the data.
- Clustering: Group similar data points
- Dimensionality Reduction: Reduce number of features
3. Neural Networks
These algorithms are inspired by biological neural networks. Deep learning stacks multiple layers of neurons to learn complex patterns.
Supervised Learning Algorithms
Linear Regression
Predicts a continuous target variable using a linear relationship between features and target. The algorithm finds the best-fit line through the data points.
# Simple linear regression: y = mx + b
def linear_regression(X, y):
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(y) / n

    # Calculate slope (m)
    numerator = sum((X[i] - mean_x) * (y[i] - mean_y) for i in range(n))
    denominator = sum((X[i] - mean_x) ** 2 for i in range(n))
    m = numerator / denominator

    # Calculate intercept (b)
    b = mean_y - m * mean_x
    return m, b

# Predict function
def predict(x, m, b):
    return m * x + b
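A quick usage sketch (the sample values below are purely illustrative):

# Fit on a small illustrative dataset that is roughly y = 2x
X = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 8.1, 9.9]
m, b = linear_regression(X, y)
print(f"slope={m:.2f}, intercept={b:.2f}")   # slope close to 2
print(predict(6, m, b))                      # extrapolate to x = 6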
Logistic Regression
Logistic regression is used for binary classification. It applies the logistic (sigmoid) function to map predictions to probabilities between 0 and 1.
import numpy as np

def sigmoid(z):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-z))

def logistic_regression_predict(X, weights, bias):
    """Predict probabilities using logistic regression"""
    z = np.dot(X, weights) + bias
    return sigmoid(z)

def cost_function(y_true, y_pred):
    """Binary cross-entropy loss"""
    m = len(y_true)
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    cost = -(1/m) * np.sum(
        y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    )
    return cost
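The weights themselves are usually learned by gradient descent on this loss; a minimal training sketch (the learning rate and iteration count are arbitrary example values, not ones from this chapter):

def logistic_regression_train(X, y, learning_rate=0.1, n_iterations=1000):
    """Fit weights and bias by batch gradient descent on the cross-entropy loss"""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(n_iterations):
        y_pred = logistic_regression_predict(X, weights, bias)
        # Gradient of the binary cross-entropy with respect to weights and bias
        error = y_pred - y
        weights -= learning_rate * np.dot(X.T, error) / n_samples
        bias -= learning_rate * np.mean(error)
    return weights, bias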
Decision Trees
A tree-like model that makes decisions by splitting data based on feature values. Each internal node tests a feature against a threshold, and each leaf holds a predicted outcome.
from collections import Counter

class DecisionNode:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.value = value  # For leaf nodes

def build_decision_tree(X, y, max_depth=10):
    """Build decision tree using recursive splitting (X and y are NumPy arrays)"""
    # Stop when the depth limit is reached or the node is pure
    if max_depth == 0 or len(set(y)) == 1:
        # Majority class at this leaf
        return DecisionNode(value=Counter(y).most_common(1)[0][0])
    best_feature, best_threshold = find_best_split(X, y)
    left_indices = X[:, best_feature] < best_threshold
    right_indices = ~left_indices
    left = build_decision_tree(X[left_indices], y[left_indices], max_depth - 1)
    right = build_decision_tree(X[right_indices], y[right_indices], max_depth - 1)
    return DecisionNode(best_feature, best_threshold, left, right)
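The helper find_best_split is not defined above; one possible implementation, sketched here, scores candidate thresholds by Gini impurity (this splitting criterion is an assumption for illustration, entropy or other criteria work as well):

import numpy as np

def gini_impurity(y):
    """Gini impurity of a set of labels (assumes non-negative integer class labels)"""
    counts = np.bincount(y)
    probs = counts / len(y)
    return 1.0 - np.sum(probs ** 2)

def find_best_split(X, y):
    """Return the (feature, threshold) pair with the lowest weighted Gini impurity"""
    best_feature, best_threshold, best_score = None, None, float("inf")
    n_samples, n_features = X.shape
    for feature in range(n_features):
        for threshold in np.unique(X[:, feature]):
            left = y[X[:, feature] < threshold]
            right = y[X[:, feature] >= threshold]
            if len(left) == 0 or len(right) == 0:
                continue  # Skip splits that leave one side empty
            score = (len(left) * gini_impurity(left) +
                     len(right) * gini_impurity(right)) / n_samples
            if score < best_score:
                best_feature, best_threshold, best_score = feature, threshold, score
    return best_feature, best_threshold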
K-Nearest Neighbors (KNN)
A simple, instance-based learning algorithm. Classifies data points based on the majority class of their k nearest neighbors.
import numpy as np
from collections import Counter

def euclidean_distance(point1, point2):
    """Calculate Euclidean distance"""
    return np.sqrt(np.sum((point1 - point2) ** 2))

def knn_predict(X_train, y_train, X_test, k=3):
    """K-Nearest Neighbors prediction"""
    predictions = []
    for test_point in X_test:
        # Calculate distances to all training points
        distances = [euclidean_distance(test_point, train_point)
                     for train_point in X_train]
        # Get k nearest neighbors
        k_indices = np.argsort(distances)[:k]
        k_nearest_labels = [y_train[i] for i in k_indices]
        # Majority vote
        most_common = Counter(k_nearest_labels).most_common(1)[0][0]
        predictions.append(most_common)
    return predictions
Support Vector Machines (SVM)
Finds the optimal hyperplane that separates classes with maximum margin. Effective for both linear and non-linear classification.
from sklearn import svm

# Linear SVM
def train_svm(X_train, y_train, kernel='linear'):
    """Train Support Vector Machine"""
    model = svm.SVC(kernel=kernel)
    model.fit(X_train, y_train)
    return model

# The algorithm finds the hyperplane that maximizes the margin
# between classes, using support vectors (data points closest to the boundary).
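A quick usage sketch, assuming X_train, y_train, and X_test are NumPy arrays of features and labels (the rbf kernel here is just one non-linear option):

model = train_svm(X_train, y_train, kernel='rbf')  # non-linear kernel
predictions = model.predict(X_test)                # class labels for unseen data
support_vectors = model.support_vectors_           # the points that define the margin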
Unsupervised Learning Algorithms
K-Means Clustering
Partitions data into k clusters by minimizing the sum of squared distances between data points and cluster centroids.
import numpy as np

def kmeans(X, k, max_iters=100):
    """K-Means clustering algorithm"""
    n_samples, n_features = X.shape
    # Initialize centroids randomly
    centroids = X[np.random.choice(n_samples, k, replace=False)]
    for _ in range(max_iters):
        # Assign points to nearest centroid
        distances = np.sqrt(((X - centroids[:, np.newaxis]) ** 2).sum(axis=2))
        labels = np.argmin(distances, axis=0)
        # Update centroids
        new_centroids = np.array([X[labels == i].mean(axis=0)
                                  for i in range(k)])
        # Check convergence
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, labels
Principal Component Analysis (PCA)
Reduces dimensionality by finding principal components (directions of maximum variance) in the data.
import numpy as np

def pca(X, n_components=2):
    """Principal Component Analysis"""
    # Center the data
    X_centered = X - np.mean(X, axis=0)
    # Calculate covariance matrix
    cov_matrix = np.cov(X_centered.T)
    # Eigenvalue decomposition (eigh, since the covariance matrix is symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    # Sort by eigenvalues in descending order
    idx = eigenvalues.argsort()[::-1]
    eigenvectors = eigenvectors[:, idx]
    # Select top n_components
    components = eigenvectors[:, :n_components]
    # Transform data
    X_transformed = X_centered.dot(components)
    return X_transformed, components
Neural Networks
Perceptron
The simplest neural network: a single artificial neuron that performs binary classification.
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        """Train the perceptron"""
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iterations):
            for idx, x_i in enumerate(X):
                linear_output = np.dot(x_i, self.weights) + self.bias
                y_predicted = self.activation(linear_output)
                # Update weights
                update = self.learning_rate * (y[idx] - y_predicted)
                self.weights += update * x_i
                self.bias += update

    def activation(self, x):
        """Step activation function"""
        return np.where(x >= 0, 1, 0)

    def predict(self, X):
        """Make predictions"""
        linear_output = np.dot(X, self.weights) + self.bias
        return self.activation(linear_output)
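A quick usage sketch on the logical AND function, which is linearly separable (labels are 0/1, matching the step activation and update rule above):

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND truth table

p = Perceptron(learning_rate=0.1, n_iterations=10)
p.fit(X, y)
print(p.predict(X))  # expected: [0 0 0 1] once the weights converge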
Multi-Layer Perceptron (MLP)
A feedforward neural network with multiple layers. Uses backpropagation for training.
import numpy as np

def sigmoid(x):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-np.clip(x, -250, 250)))

def sigmoid_derivative(s):
    """Derivative of sigmoid, expressed in terms of the sigmoid output s = sigmoid(x)"""
    return s * (1 - s)

class MLP:
    def __init__(self, layers):
        """Initialize multi-layer perceptron"""
        self.layers = layers
        self.weights = []
        self.biases = []
        # Initialize weights and biases
        for i in range(len(layers) - 1):
            w = np.random.randn(layers[i], layers[i+1]) * 0.1
            b = np.zeros((1, layers[i+1]))
            self.weights.append(w)
            self.biases.append(b)

    def forward(self, X):
        """Forward propagation"""
        activations = [X]
        for i in range(len(self.weights)):
            z = np.dot(activations[-1], self.weights[i]) + self.biases[i]
            a = sigmoid(z)
            activations.append(a)
        return activations

    def backward(self, activations, y):
        """Backward propagation"""
        m = y.shape[0]
        gradients_w = []
        gradients_b = []
        # Output layer error (sigmoid output with cross-entropy loss)
        error = activations[-1] - y
        for i in reversed(range(len(self.weights))):
            grad_w = np.dot(activations[i].T, error) / m
            grad_b = np.sum(error, axis=0, keepdims=True) / m
            gradients_w.insert(0, grad_w)
            gradients_b.insert(0, grad_b)
            if i > 0:
                # Propagate the error back through the weights and the sigmoid
                error = np.dot(error, self.weights[i].T) * sigmoid_derivative(activations[i])
        return gradients_w, gradients_b
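The class above computes gradients but never applies them; a minimal training loop using plain gradient descent could look like this (the learning rate, epoch count, and layer sizes are illustrative choices, not prescribed values):

def train(mlp, X, y, learning_rate=0.5, epochs=1000):
    """Train an MLP by repeatedly running forward and backward passes"""
    for _ in range(epochs):
        activations = mlp.forward(X)
        gradients_w, gradients_b = mlp.backward(activations, y)
        # Gradient descent update on every layer
        for i in range(len(mlp.weights)):
            mlp.weights[i] -= learning_rate * gradients_w[i]
            mlp.biases[i] -= learning_rate * gradients_b[i]

# Example: a 2-4-1 network on a small binary-classification dataset,
# where X has shape (n_samples, 2) and y has shape (n_samples, 1) with 0/1 labels
mlp = MLP([2, 4, 1])
train(mlp, X, y, learning_rate=0.5, epochs=1000)
probabilities = mlp.forward(X)[-1]  # output of the final sigmoid layer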
Evaluation Metrics
Classification Metrics
- Accuracy: (TP + TN) / (TP + TN + FP + FN)
- Precision: TP / (TP + FP)
- Recall: TP / (TP + FN)
- F1-Score: 2 × (Precision × Recall) / (Precision + Recall)
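A minimal sketch that computes these classification metrics directly from binary 0/1 labels:

import numpy as np

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary 0/1 labels"""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return accuracy, precision, recall, f1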
Regression Metrics
- Mean Squared Error (MSE): Average of squared differences
- Mean Absolute Error (MAE): Average of absolute differences
- R² Score: Coefficient of determination
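The regression metrics can be computed just as directly; a short NumPy sketch:

import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, and the R² score"""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1 - ss_res / ss_tot
    return mse, mae, r2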
Best Practices
- Data Preprocessing: Normalize features, handle missing values
- Train/Test Split: Separate data for training and evaluation
- Cross-Validation: Use k-fold cross-validation for robust evaluation (see the sketch after this list)
- Feature Engineering: Create meaningful features from raw data
- Regularization: Prevent overfitting with L1/L2 regularization
- Hyperparameter Tuning: Optimize algorithm parameters
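As an illustration of the train/test split and cross-validation items above, a short scikit-learn sketch, assuming X and y hold the features and labels (the test size, fold count, and model are example choices):

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

# Hold out 20% of the data for final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5-fold cross-validation on the training set for a more robust estimate
model = SVC(kernel='rbf')
scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Final fit and evaluation on the untouched test set
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")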
Real-World Applications
- Image Recognition: Convolutional Neural Networks (CNNs)
- Natural Language Processing: Recurrent Neural Networks (RNNs), Transformers
- Recommendation Systems: Collaborative filtering, matrix factorization
- Fraud Detection: Anomaly detection, classification
- Medical Diagnosis: Classification, pattern recognition
- Autonomous Vehicles: Computer vision, reinforcement learning
What's Next?
Machine learning is a vast field. Continue learning:
- Chapter 11: Encryption Algorithms - Secure data with cryptography
- Deep Learning - Advanced neural network architectures
- Reinforcement Learning - Learning through interaction
- Natural Language Processing - Text analysis and generation