Linear Regression

📈 Predicting Continuous Values with the Simplest ML Algorithm

Linear Regression is one of the foundational methods of machine learning and statistics. It models the linear relationship between input features and a continuous output variable.

What is Linear Regression?

Linear Regression fits a straight line through data points to model the relationship between variables. Given input features (X) and target values (y), it finds the best-fit line that minimizes prediction errors.

$$y = mx + b$$

where $m$ = slope, $b$ = intercept, $x$ = input, $y$ = output
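For a single feature, the least-squares slope and intercept have a simple closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. A minimal sketch, assuming a small made-up dataset that lies exactly on y = 2x + 1:

```python
import numpy as np

# Illustrative data (made up for this sketch): y is exactly 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Closed-form least-squares estimates for one feature
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

print(m, b)  # 2.0 1.0
```

With noisy data the recovered values would only approximate the true slope and intercept.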

For a single example with n features (for instance, one house in a price-prediction dataset):

$$\mathbf{x} = [x_1, x_2, x_3, \ldots, x_n]$$

$$y = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b$$

For m houses (the whole dataset):

$$y_1 = w_1x_{11} + w_2x_{12} + \cdots + w_nx_{1n} + b$$

$$y_2 = w_1x_{21} + w_2x_{22} + \cdots + w_nx_{2n} + b$$

$$\vdots$$

$$y_m = w_1x_{m1} + w_2x_{m2} + \cdots + w_nx_{mn} + b$$

Matrix form:

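$$\mathbf{y} = \mathbf{X}\mathbf{w} + b$$

where $\mathbf{X}$ is the $m \times n$ feature matrix (one row per house), $\mathbf{w}$ is the vector of $n$ weights, and the scalar $b$ is added to every row.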
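The matrix form maps directly onto a single NumPy expression. A small sketch, assuming three hypothetical houses with two features each and illustrative (not fitted) weights:

```python
import numpy as np

# Hypothetical data: m = 3 houses, n = 2 features per house
X = np.array([[1.0, 2.0],
              [1.5, 3.0],
              [2.0, 3.0]])
w = np.array([50.0, 10.0])   # one weight per feature (illustrative values)
b = 20.0                     # scalar bias, broadcast to every row

y_hat = X @ w + b            # shape (3,): one prediction per house
print(y_hat)                 # [ 90. 125. 150.]
```

Each entry of `y_hat` equals the row-by-row sum written out in the equations above, for example 50·1.0 + 10·2.0 + 20 = 90 for the first house.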

Assumptions of Linear Regression

  • Linearity: The relationship between X and y is linear
  • Independence: Observations are independent of each other
  • Homoscedasticity: Constant variance of residuals
  • Normality: Residuals are normally distributed
  • No Multicollinearity: Predictors are not highly correlated
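Some of these assumptions can be checked roughly from the data and the residuals. The sketch below is only an informal illustration on synthetic data (the data, the 50/50 split of fitted values, and the variable names are all assumptions made for this example); formal diagnostic tests exist but are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): two independent predictors
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.5, size=200)

# Fit by least squares with an explicit intercept column
X_aug = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
fitted = X_aug @ coef
residuals = y - fitted

# Multicollinearity: pairwise correlation between the predictors
corr = np.corrcoef(X, rowvar=False)

# Homoscedasticity (rough check): residual spread for low vs. high fitted values
order = np.argsort(fitted)
low, high = residuals[order[:100]], residuals[order[100:]]

print("predictor correlation:", corr[0, 1])
print("residual std (low / high fitted):", low.std(), high.std())
```

A predictor correlation near ±1, or clearly different residual spreads in the two halves, would suggest that the corresponding assumption deserves a closer look.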

How Linear Regression Works

Method of Least Squares

$$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
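This formula follows from setting the gradient of the squared error to zero. A brief derivation, assuming $\mathbf{X}$ carries a leading column of ones (so that $\mathbf{w}$ absorbs the bias $b$) and that $\mathbf{X}^T\mathbf{X}$ is invertible:

$$E(\mathbf{w}) = \frac{1}{2}\lVert \mathbf{X}\mathbf{w} - \mathbf{y} \rVert^2$$

$$\nabla_{\mathbf{w}} E = \mathbf{X}^T(\mathbf{X}\mathbf{w} - \mathbf{y}) = 0$$

$$\mathbf{X}^T\mathbf{X}\mathbf{w} = \mathbf{X}^T\mathbf{y} \quad\Rightarrow\quad \mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$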

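import numpy as np

def fit(X, y):
    # Prepend a column of ones so the first coefficient acts as the intercept b
    X = np.insert(X, 0, 1, axis=1)
    XT_X = X.T @ X
    XT_y = X.T @ y
    # Solve (X^T X) betas = X^T y directly; more stable than forming the inverse
    betas = np.linalg.solve(XT_X, XT_y)
    # Return (intercept, feature weights)
    return betas[0], betas[1:]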
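To see the closed-form solution in action, the sketch below applies it to noise-free data generated from known coefficients. The helper name `fit_normal_equation` and the data are hypothetical, chosen only for this example.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Hypothetical helper: solve the normal equation with an intercept column."""
    X_aug = np.insert(X, 0, 1, axis=1)
    betas = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return betas[0], betas[1:]

# Made-up data generated from y = 4 + 2*x1 - 3*x2 (no noise)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 4.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

intercept, coefs = fit_normal_equation(X, y)
print(intercept, coefs)  # approximately 4.0 and [ 2. -3.]
```

Because the data contain no noise, the recovered intercept and weights match the generating values up to floating-point error.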

Gradient Descent Method

Cost Function (half the sum of squared errors, which has the same minimizer as the Mean Squared Error):

$$E = \frac{1}{2}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2$$

Deriving the Gradient:

Start with the error contributed by a single example $i$, $E_i = \frac{1}{2}(\hat{y}_i - y_i)^2$, and differentiate with respect to one weight $w_j$:

$$\frac{\partial E_i}{\partial w_j} = \frac{\partial}{\partial w_j}\left[\frac{1}{2}(\hat{y}_i - y_i)^2\right]$$

Apply the chain rule:

$$\frac{\partial E_i}{\partial w_j} = (\hat{y}_i - y_i) \cdot \frac{\partial \hat{y}_i}{\partial w_j}$$

Since $\hat{y}_i = w_1x_{i1} + w_2x_{i2} + \cdots + w_jx_{ij} + \cdots + w_nx_{in} + b$, only the term $w_jx_{ij}$ depends on $w_j$:

$$\frac{\partial E_i}{\partial w_j} = (\hat{y}_i - y_i) \cdot x_{ij}$$

Similarly, for the bias term:

$$\frac{\partial E_i}{\partial b} = (\hat{y}_i - y_i)$$

Weight and Bias Updates (with learning rate $\alpha$):

$$w_j^{new} = w_j^{old} - \alpha(\hat{y}_i - y_i)x_{ij}$$

$$b^{new} = b^{old} - \alpha(\hat{y}_i - y_i)$$
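To make the update rules concrete, here is one update step worked through on a single training example. All numbers are illustrative and chosen so the arithmetic is easy to follow.

```python
import numpy as np

# One training example with two features (illustrative values)
x = np.array([1.0, 2.0])
y = 5.0

w = np.array([0.5, 0.5])  # current weights
b = 0.0                   # current bias
alpha = 0.1               # learning rate

y_hat = w @ x + b         # 0.5*1 + 0.5*2 + 0 = 1.5
error = y_hat - y         # 1.5 - 5 = -3.5

w_new = w - alpha * error * x   # [0.5 + 0.35, 0.5 + 0.70] = [0.85, 1.20]
b_new = b - alpha * error       # 0 + 0.35 = 0.35

print(w_new, b_new)
```

After the step the prediction for this example is 0.85·1 + 1.20·2 + 0.35 = 3.6, which is closer to the target 5 than the original 1.5.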

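import numpy as np

def fit(X, y, epochs, learning_rate):
    n_samples, n_features = X.shape
    # Prepend a column of ones so weights[0] plays the role of the bias b
    X_aug = np.insert(X, 0, 1, axis=1)
    # Small random initial weights (bias included)
    weights = np.random.randn(n_features + 1) * 0.01

    for epoch in range(epochs):
        # Stochastic updates: adjust the weights after every single example
        for i in range(n_samples):
            y_pred = np.dot(X_aug[i], weights)
            error = y_pred - y[i]
            # Gradient of 0.5 * error**2 with respect to the weights
            gradient = error * X_aug[i]
            weights -= learning_rate * gradient
    # weights[0] is the bias, weights[1:] are the feature weights
    return weights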
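A quick way to sanity-check a gradient descent implementation is to run it on data with known coefficients. The sketch below assumes a hypothetical helper `fit_sgd` that mirrors the per-sample loop, synthetic noise-free data, and hand-picked values for `epochs` and `learning_rate`; other datasets will usually need different settings.

```python
import numpy as np

def fit_sgd(X, y, epochs, learning_rate, seed=0):
    """Hypothetical helper mirroring the per-sample update loop."""
    rng = np.random.default_rng(seed)
    X_aug = np.insert(X, 0, 1, axis=1)           # column of ones for the bias
    weights = rng.normal(scale=0.01, size=X_aug.shape[1])
    for _ in range(epochs):
        for i in range(X_aug.shape[0]):
            error = X_aug[i] @ weights - y[i]
            weights -= learning_rate * error * X_aug[i]
    return weights

# Synthetic, noise-free data from y = 1 + 2*x1 - 3*x2 (assumed for illustration)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

weights = fit_sgd(X, y, epochs=50, learning_rate=0.01)
print(weights)  # close to [1.0, 2.0, -3.0]
```

The features here are already on a similar scale. With features of very different magnitudes, the same learning rate may diverge or converge slowly, which is why inputs are commonly standardized before gradient descent.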

Implementation with scikit-learn

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
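The snippet above assumes that `X_train`, `X_test`, and `y_train` already exist. A fuller sketch, using synthetic data made up for this example and scikit-learn's `train_test_split` to create the split:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y = 1.5*x1 - 0.5*x2 + 4 + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + 4.0 + rng.normal(scale=0.1, size=100)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(model.coef_, model.intercept_)   # close to [1.5, -0.5] and 4.0
print(model.score(X_test, y_test))     # R² on the held-out rows
```

After fitting, the learned weights are available as `model.coef_` and the bias as `model.intercept_`.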

Key Metrics

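  • R² Score: Proportion of variance explained. It is at most 1, and it can be negative on held-out data when the model does worse than always predicting the mean
  • RMSE: Root Mean Squared Error, in the same units as the target; penalizes large errors more heavily
  • MAE: Mean Absolute Error, in the same units as the target; less sensitive to outliers than RMSE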
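All three metrics can be computed in a few lines from the true and predicted values. A small sketch with made-up numbers, chosen so the results are easy to verify by hand:

```python
import numpy as np

# Illustrative values (made up for this sketch)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

residuals = y_true - y_pred                      # [1, 0, -1, 0]

mae = np.mean(np.abs(residuals))                 # (1 + 0 + 1 + 0) / 4 = 0.5
rmse = np.sqrt(np.mean(residuals ** 2))          # sqrt(0.5), about 0.707

ss_res = np.sum(residuals ** 2)                  # 2.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # mean is 6: 9 + 1 + 1 + 9 = 20
r2 = 1 - ss_res / ss_tot                         # 1 - 2/20 = 0.9

print(r2, rmse, mae)
```

scikit-learn provides equivalent functions in `sklearn.metrics` (`r2_score`, `mean_squared_error`, `mean_absolute_error`).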

Advantages & Disadvantages

✓ Advantages: Fast to train, computationally cheap, interpretable coefficients

✗ Disadvantages: Sensitive to outliers, Assumes linearity

Ready to explore other algorithms? Check out Logistic Regression for classification tasks!