Mastering Algorithms

Chapter 12: Machine Learning Algorithms

Introduction to Machine Learning

Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. ML algorithms build mathematical models from training data in order to make predictions or decisions.

This chapter covers fundamental machine learning algorithms across three categories: supervised learning, unsupervised learning, and neural networks. Understanding these algorithms is essential for building intelligent systems, analyzing data, and developing AI applications.

Machine Learning Categories

1. Supervised Learning

Algorithms learn from labeled training data to make predictions. The model is trained on input-output pairs and learns to map inputs to outputs.

  • Classification: Predict discrete categories (e.g., spam/not spam)
  • Regression: Predict continuous values (e.g., house prices)

2. Unsupervised Learning

Algorithms find patterns in data without labeled examples. The model discovers hidden structures in the data.

  • Clustering: Group similar data points
  • Dimensionality Reduction: Reduce number of features

3. Neural Networks

Algorithms inspired by biological neural networks. Deep learning uses multiple layers to learn complex patterns.

Supervised Learning Algorithms

Linear Regression

Predicts a continuous target variable using a linear relationship between features and target. The algorithm finds the best-fit line through the data points.


# Simple linear regression: y = mx + b
def linear_regression(X, y):
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(y) / n
    
    # Calculate slope (m)
    numerator = sum((X[i] - mean_x) * (y[i] - mean_y) for i in range(n))
    denominator = sum((X[i] - mean_x) ** 2 for i in range(n))
    m = numerator / denominator
    
    # Calculate intercept (b)
    b = mean_y - m * mean_x
    
    return m, b

# Predict function
def predict(x, m, b):
    return m * x + b
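
A quick usage sketch, with illustrative points that lie roughly on the line y = 2x + 1:

# Example: fit and predict on a small dataset
X = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 9.0, 10.8]

m, b = linear_regression(X, y)
print(m, b)              # m ≈ 1.95, b ≈ 1.15
print(predict(6, m, b))  # ≈ 12.85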

Logistic Regression

Used for binary classification. Applies the logistic (sigmoid) function to map predictions to probabilities between 0 and 1.


import numpy as np

def sigmoid(z):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-z))

def logistic_regression_predict(X, weights, bias):
    """Predict probabilities using logistic regression"""
    z = np.dot(X, weights) + bias
    return sigmoid(z)

def cost_function(y_true, y_pred):
    """Binary cross-entropy loss"""
    m = len(y_true)
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    cost = -(1/m) * np.sum(
        y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    )
    return cost
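
The functions above only evaluate a fixed set of weights; a minimal batch gradient-descent training loop, sketched here with illustrative hyperparameters:

def train_logistic_regression(X, y, learning_rate=0.1, n_iterations=1000):
    """Fit weights and bias by gradient descent on the cross-entropy loss"""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    
    for _ in range(n_iterations):
        y_pred = logistic_regression_predict(X, weights, bias)
        # Gradient of binary cross-entropy with a sigmoid output
        dw = np.dot(X.T, (y_pred - y)) / n_samples
        db = np.sum(y_pred - y) / n_samples
        weights -= learning_rate * dw
        bias -= learning_rate * db
    
    return weights, bias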

Decision Trees

A tree-like model that makes decisions by splitting data based on feature values. Each node represents a decision, and leaves represent outcomes.


import numpy as np
from collections import Counter

class DecisionNode:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.value = value  # For leaf nodes

def build_decision_tree(X, y, max_depth=10):
    """Build decision tree using recursive splitting"""
    # Stop when the depth limit is reached or the node is pure
    if max_depth == 0 or len(set(y)) == 1:
        return DecisionNode(value=Counter(y).most_common(1)[0][0])
    
    best_feature, best_threshold = find_best_split(X, y)
    left_indices = X[:, best_feature] < best_threshold
    right_indices = ~left_indices
    
    # If the split cannot separate the samples, stop and make a leaf
    if not left_indices.any() or not right_indices.any():
        return DecisionNode(value=Counter(y).most_common(1)[0][0])
    
    left = build_decision_tree(X[left_indices], y[left_indices], max_depth - 1)
    right = build_decision_tree(X[right_indices], y[right_indices], max_depth - 1)
    
    return DecisionNode(best_feature, best_threshold, left, right)
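
The helper find_best_split is used above but not defined; a minimal sketch, assuming a Gini-impurity criterion with candidate thresholds taken from the observed feature values:

import numpy as np
from collections import Counter

def gini(y):
    """Gini impurity of a set of labels"""
    counts = Counter(y)
    n = len(y)
    return 1 - sum((c / n) ** 2 for c in counts.values())

def find_best_split(X, y):
    """Return the (feature, threshold) pair with the lowest weighted impurity"""
    n_samples, n_features = X.shape
    best_feature, best_threshold, best_score = 0, X[0, 0], float('inf')
    
    for feature in range(n_features):
        for threshold in np.unique(X[:, feature]):
            left = X[:, feature] < threshold
            right = ~left
            if not left.any() or not right.any():
                continue
            # Weighted impurity of the two child nodes
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n_samples
            if score < best_score:
                best_feature, best_threshold, best_score = feature, threshold, score
    
    return best_feature, best_threshold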

K-Nearest Neighbors (KNN)

A simple, instance-based learning algorithm. Classifies data points based on the majority class of their k nearest neighbors.


import numpy as np
from collections import Counter

def euclidean_distance(point1, point2):
    """Calculate Euclidean distance"""
    return np.sqrt(np.sum((point1 - point2) ** 2))

def knn_predict(X_train, y_train, X_test, k=3):
    """K-Nearest Neighbors prediction"""
    predictions = []
    
    for test_point in X_test:
        # Calculate distances to all training points
        distances = [euclidean_distance(test_point, train_point) 
                     for train_point in X_train]
        
        # Get k nearest neighbors
        k_indices = np.argsort(distances)[:k]
        k_nearest_labels = [y_train[i] for i in k_indices]
        
        # Majority vote
        most_common = Counter(k_nearest_labels).most_common(1)[0][0]
        predictions.append(most_common)
    
    return predictions
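
A quick usage sketch on illustrative, made-up points:

import numpy as np

# Two small clusters in 2D
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = [0, 0, 0, 1, 1, 1]
X_test = np.array([[1.5, 1.5], [8.5, 8.5]])

print(knn_predict(X_train, y_train, X_test, k=3))  # Expected: [0, 1]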

Support Vector Machines (SVM)

Finds the optimal hyperplane that separates classes with maximum margin. Effective for both linear and non-linear classification (the latter via kernel functions).


from sklearn import svm

# Linear SVM
def train_svm(X_train, y_train, kernel='linear'):
    """Train Support Vector Machine"""
    model = svm.SVC(kernel=kernel)
    model.fit(X_train, y_train)
    return model

# The algorithm finds the hyperplane that maximizes the margin
# between classes, using support vectors (data points closest to the boundary)
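
For data that is not linearly separable, the same helper can be given a non-linear kernel; scikit-learn's SVC supports, for example, the RBF kernel. A brief sketch on illustrative ring-shaped data:

import numpy as np

# Class 1 inside a radius of 1, class 0 outside
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (np.linalg.norm(X_train, axis=1) < 1).astype(int)

rbf_model = train_svm(X_train, y_train, kernel='rbf')
print(rbf_model.predict([[0.1, 0.1], [2.0, 2.0]]))  # Expected: [1 0]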

Unsupervised Learning Algorithms

K-Means Clustering

Partitions data into k clusters by minimizing the sum of squared distances between data points and cluster centroids.


import numpy as np

def kmeans(X, k, max_iters=100):
    """K-Means clustering algorithm"""
    n_samples, n_features = X.shape
    
    # Initialize centroids randomly
    centroids = X[np.random.choice(n_samples, k, replace=False)]
    
    for _ in range(max_iters):
        # Assign points to nearest centroid
        distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
        labels = np.argmin(distances, axis=0)
        
        # Update centroids (keep the old centroid if a cluster becomes empty)
        new_centroids = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                  else centroids[i]
                                  for i in range(k)])
        
        # Check convergence
        if np.allclose(centroids, new_centroids):
            break
        
        centroids = new_centroids
    
    return centroids, labels
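
A quick usage sketch on illustrative 2D data:

import numpy as np

np.random.seed(0)
# Two synthetic blobs centered near (0, 0) and (5, 5)
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])

centroids, labels = kmeans(X, k=2)
print(centroids)  # Typically close to [0, 0] and [5, 5], in some order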

Principal Component Analysis (PCA)

Reduces dimensionality by finding principal components (directions of maximum variance) in the data.


import numpy as np

def pca(X, n_components=2):
    """Principal Component Analysis"""
    # Center the data
    X_centered = X - np.mean(X, axis=0)
    
    # Calculate covariance matrix
    cov_matrix = np.cov(X_centered.T)
    
    # Eigendecomposition (eigh suits the symmetric covariance matrix and returns real values)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    
    # Sort by eigenvalues
    idx = eigenvalues.argsort()[::-1]
    eigenvectors = eigenvectors[:, idx]
    
    # Select top n_components
    components = eigenvectors[:, :n_components]
    
    # Transform data
    X_transformed = X_centered.dot(components)
    
    return X_transformed, components
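
A short usage sketch on illustrative correlated data:

import numpy as np

np.random.seed(0)
# Synthetic 3D data in which the third feature is nearly a multiple of the first
X = np.random.randn(100, 3)
X[:, 2] = 2 * X[:, 0] + 0.1 * np.random.randn(100)

X_2d, components = pca(X, n_components=2)
print(X_2d.shape)  # (100, 2)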

Neural Networks

Perceptron

The simplest neural network: a single neuron that can perform binary classification on linearly separable data.


import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
    
    def fit(self, X, y):
        """Train the perceptron"""
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        
        for _ in range(self.n_iterations):
            for idx, x_i in enumerate(X):
                linear_output = np.dot(x_i, self.weights) + self.bias
                y_predicted = self.activation(linear_output)
                
                # Update weights
                update = self.learning_rate * (y[idx] - y_predicted)
                self.weights += update * x_i
                self.bias += update
    
    def activation(self, x):
        """Step activation function"""
        return np.where(x >= 0, 1, 0)
    
    def predict(self, X):
        """Make predictions"""
        linear_output = np.dot(X, self.weights) + self.bias
        return self.activation(linear_output)
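
A short usage sketch, training the perceptron on the logical AND function (which is linearly separable):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND truth table

p = Perceptron(learning_rate=0.1, n_iterations=100)
p.fit(X, y)
print(p.predict(X))  # Expected: [0 0 0 1]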

Multi-Layer Perceptron (MLP)

A feedforward neural network with multiple layers. Uses backpropagation for training.


import numpy as np

def sigmoid(x):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-np.clip(x, -250, 250)))

def sigmoid_derivative(x):
    """Derivative of sigmoid"""
    s = sigmoid(x)
    return s * (1 - s)

class MLP:
    def __init__(self, layers):
        """Initialize multi-layer perceptron"""
        self.layers = layers
        self.weights = []
        self.biases = []
        
        # Initialize weights and biases
        for i in range(len(layers) - 1):
            w = np.random.randn(layers[i], layers[i+1]) * 0.1
            b = np.zeros((1, layers[i+1]))
            self.weights.append(w)
            self.biases.append(b)
    
    def forward(self, X):
        """Forward propagation"""
        activations = [X]
        for i in range(len(self.weights)):
            z = np.dot(activations[-1], self.weights[i]) + self.biases[i]
            a = sigmoid(z)
            activations.append(a)
        return activations
    
    def backward(self, activations, y):
        """Backward propagation"""
        m = y.shape[0]
        gradients_w = []
        gradients_b = []
        
        # Output layer error
        error = activations[-1] - y
        
        for i in reversed(range(len(self.weights))):
            grad_w = np.dot(activations[i].T, error) / m
            grad_b = np.sum(error, axis=0, keepdims=True) / m
            
            gradients_w.insert(0, grad_w)
            gradients_b.insert(0, grad_b)
            
            if i > 0:
                # activations[i] is already sigmoid(z), so the derivative is a * (1 - a)
                error = np.dot(error, self.weights[i].T) * activations[i] * (1 - activations[i])
        
        return gradients_w, gradients_b
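
The class above computes gradients but never applies them; a minimal gradient-descent training loop, sketched here on the XOR problem with illustrative hyperparameters (convergence depends on the random initialization):

import numpy as np

# XOR is not linearly separable, so it needs the hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

mlp = MLP(layers=[2, 4, 1])
learning_rate = 1.0

for epoch in range(10000):
    activations = mlp.forward(X)
    grads_w, grads_b = mlp.backward(activations, y)
    # Gradient descent step on every layer
    for i in range(len(mlp.weights)):
        mlp.weights[i] -= learning_rate * grads_w[i]
        mlp.biases[i] -= learning_rate * grads_b[i]

print(mlp.forward(X)[-1].round(2))  # Typically approaches [[0], [1], [1], [0]]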

Evaluation Metrics

Classification Metrics

  • Accuracy: (TP + TN) / (TP + TN + FP + FN)
  • Precision: TP / (TP + FP)
  • Recall: TP / (TP + FN)
  • F1-Score: 2 × (Precision × Recall) / (Precision + Recall)

Regression Metrics

  • Mean Squared Error (MSE): Average of squared differences
  • Mean Absolute Error (MAE): Average of absolute differences
  • R² Score: Coefficient of determination
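
The formulas above translate directly into code; a minimal sketch for binary 0/1 labels and continuous predictions:

import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 labels"""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

def regression_metrics(y_true, y_pred):
    """MSE, MAE, and R² score"""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1 - ss_res / ss_tot
    return mse, mae, r2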

Best Practices

  • Data Preprocessing: Normalize features, handle missing values
  • Train/Test Split: Separate data for training and evaluation
  • Cross-Validation: Use k-fold cross-validation for robust evaluation
  • Feature Engineering: Create meaningful features from raw data
  • Regularization: Prevent overfitting with L1/L2 regularization
  • Hyperparameter Tuning: Optimize algorithm parameters
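
Several of these practices are one-liners with scikit-learn; a brief sketch of a train/test split, feature scaling, and k-fold cross-validation on illustrative data:

import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Illustrative data: 200 samples, 4 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 20% of the data for final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize features using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 5-fold cross-validation on the training data
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train_scaled, y_train, cv=5)
print(scores.mean())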

Real-World Applications

  • Image Recognition: Convolutional Neural Networks (CNNs)
  • Natural Language Processing: Recurrent Neural Networks (RNNs), Transformers
  • Recommendation Systems: Collaborative filtering, matrix factorization
  • Fraud Detection: Anomaly detection, classification
  • Medical Diagnosis: Classification, pattern recognition
  • Autonomous Vehicles: Computer vision, reinforcement learning

What's Next?

Machine learning is a vast field. Continue learning:

  • Chapter 11: Encryption Algorithms - Secure data with cryptography
  • Deep Learning - Advanced neural network architectures
  • Reinforcement Learning - Learning through interaction
  • Natural Language Processing - Text analysis and generation