Linear Regression
📈 Predicting Continuous Values with the Simplest ML Algorithm
Linear Regression is one of the foundational methods of machine learning and statistics. It models a linear relationship between input features and a continuous output variable.
What is Linear Regression?
Linear Regression fits a straight line through data points to model the relationship between variables. Given input features (X) and target values (y), it finds the best-fit line that minimizes prediction errors.
$$y = mx + b$$
where $m$ = slope, $b$ = intercept, $x$ = input, $y$ = output
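As a small illustration of the slope-intercept form, the sketch below recovers $m$ and $b$ from points that lie exactly on a line. It uses NumPy's `np.polyfit` with degree 1; the data values are made up for the example.

```python
import numpy as np

# Made-up points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit returns [slope, intercept]
m, b = np.polyfit(x, y, deg=1)
print(f"slope={m:.3f}, intercept={b:.3f}")
```

Because the points are noise-free, the fitted slope and intercept match the generating values up to floating-point error.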
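For a single example, such as a house described by n features:
$$\mathbf{x} = [x_1, x_2, x_3, \ldots, x_n]$$
$$y = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b$$
where $w_1, \ldots, w_n$ are the weights and $b$ is the bias (intercept)
For m houses (dataset):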
$$y_1 = w_1x_{11} + w_2x_{12} + \cdots + w_nx_{1n} + b$$
$$y_2 = w_1x_{21} + w_2x_{22} + \cdots + w_nx_{2n} + b$$
$$\vdots$$
$$y_m = w_1x_{m1} + w_2x_{m2} + \cdots + w_nx_{mn} + b$$
Matrix form:
$$\mathbf{y} = \mathbf{X}\mathbf{w} + b$$
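To make the matrix form concrete, here is a minimal sketch that computes predictions for a tiny dataset in two ways: as a single matrix-vector product and row by row. The feature values, weights, and bias are illustrative assumptions, not fitted values.

```python
import numpy as np

# Hypothetical dataset: m = 3 houses, n = 2 features each
X = np.array([[1.0, 3.0],
              [1.5, 4.0],
              [2.0, 5.0]])
w = np.array([100.0, 10.0])   # illustrative weights, one per feature
b = 50.0                      # illustrative bias

# Matrix form: one matrix-vector product gives all m predictions
y_matrix = X @ w + b

# Row-by-row form, matching the expanded equations
y_rows = np.array([sum(w[j] * X[i, j] for j in range(X.shape[1])) + b
                   for i in range(X.shape[0])])

print(y_matrix)
print(np.allclose(y_matrix, y_rows))
```

Both forms give the same predictions; the matrix form is simply the compact (and faster) way to write them.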
Assumptions of Linear Regression
- Linearity: The relationship between X and y is linear
- Independence: Observations are independent of each other
- Homoscedasticity: Constant variance of residuals
- Normality: Residuals are normally distributed
- No Multicollinearity: Predictors are not highly correlated
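Some of these assumptions can be checked numerically. As one example, the sketch below screens for multicollinearity by looking at pairwise correlations between predictors. The synthetic data and the 0.9 threshold are assumptions chosen for illustration; in practice the cutoff is a judgment call.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: x2 is nearly a copy of x1, x3 is independent
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations between predictors (columns)
corr = np.corrcoef(X, rowvar=False)

# Flag pairs whose absolute correlation exceeds an illustrative threshold
threshold = 0.9
n_features = X.shape[1]
flagged = [(i, j)
           for i in range(n_features)
           for j in range(i + 1, n_features)
           if abs(corr[i, j]) > threshold]
print(flagged)
```

Only the pair of nearly identical columns is flagged. Pairwise correlation does not catch every form of multicollinearity, so treat this as a first screen rather than a complete diagnostic.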
How Linear Regression Works
Method of Least Squares
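$$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
where $\mathbf{X}$ is augmented with a leading column of ones, so the first entry of $\mathbf{w}$ is the intercept $b$; the formula requires $\mathbf{X}^T\mathbf{X}$ to be invertible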
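import numpy as np

def fit(X, y):
    # Prepend a column of ones so the first coefficient is the intercept
    X = np.insert(X, 0, 1, axis=1)
    XT_X = np.dot(X.T, X)
    # Assumes X^T X is invertible (no perfectly collinear features)
    XT_X_inv = np.linalg.inv(XT_X)
    XT_y = np.dot(X.T, y)
    betas = np.dot(XT_X_inv, XT_y)
    # Return (intercept, weights)
    return betas[0], betas[1:]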
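Forming an explicit inverse can be numerically fragile when features are nearly collinear. A hedged alternative, not part of the original code, is to let `np.linalg.lstsq` solve the same least-squares problem directly. The function name `fit_lstsq` and the synthetic data are assumptions for this sketch.

```python
import numpy as np

def fit_lstsq(X, y):
    # Same augmentation as in fit: a leading column of ones for the intercept
    X_aug = np.insert(X, 0, 1, axis=1)
    # Least-squares solve without forming an explicit inverse
    betas, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return betas[0], betas[1:]

# Noise-free synthetic data generated from y = 3 + 2*x1 - 1*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + X @ np.array([2.0, -1.0])

intercept, weights = fit_lstsq(X, y)
print(intercept, weights)
```

On well-conditioned data both approaches should agree; `lstsq` also returns a minimum-norm solution when $\mathbf{X}^T\mathbf{X}$ is singular, where the explicit inverse would fail.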
Gradient Descent Method
Cost Function (half the sum of squared errors, which has the same minimizer as the Mean Squared Error):
$$E = \frac{1}{2}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2$$
Deriving the Gradient:
Consider the contribution of a single sample $i$, $E_i = \frac{1}{2}(\hat{y}_i - y_i)^2$, and differentiate with respect to a weight $w_j$:
$$\frac{\partial E_i}{\partial w_j} = \frac{\partial}{\partial w_j}\left[\frac{1}{2}(\hat{y}_i - y_i)^2\right]$$
Apply the chain rule:
$$\frac{\partial E_i}{\partial w_j} = (\hat{y}_i - y_i) \cdot \frac{\partial \hat{y}_i}{\partial w_j}$$
Since $\hat{y}_i = w_1x_{i1} + w_2x_{i2} + \cdots + w_jx_{ij} + \cdots + w_nx_{in} + b$, only the term $w_jx_{ij}$ depends on $w_j$:
$$\frac{\partial E_i}{\partial w_j} = (\hat{y}_i - y_i) \cdot x_{ij}$$
Similarly, for the bias term:
$$\frac{\partial E_i}{\partial b} = (\hat{y}_i - y_i)$$
Weight and Bias Updates (one sample at a time, with learning rate $\alpha$):
$$w_j^{new} = w_j^{old} - \alpha(\hat{y}_i - y_i)x_{ij}$$
$$b^{new} = b^{old} - \alpha(\hat{y}_i - y_i)$$
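One way to sanity-check the derived gradients is to compare them with central finite differences on a single sample. The sketch below does that; the helper names `sample_loss` and `analytic_gradient` and all numeric values are assumptions made for the illustration.

```python
import numpy as np

def sample_loss(w, b, x, y):
    # Half squared error for one sample
    y_hat = np.dot(w, x) + b
    return 0.5 * (y_hat - y) ** 2

def analytic_gradient(w, b, x, y):
    # Gradients from the derivation: (y_hat - y) * x and (y_hat - y)
    error = np.dot(w, x) + b - y
    return error * x, error

# Arbitrary illustrative values
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([1.0, 2.0, 3.0])
y = 4.0

grad_w, grad_b = analytic_gradient(w, b, x, y)

# Central finite differences as an independent check
eps = 1e-6
num_grad_w = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = eps
    num_grad_w[j] = (sample_loss(w + step, b, x, y)
                     - sample_loss(w - step, b, x, y)) / (2 * eps)
num_grad_b = (sample_loss(w, b + eps, x, y)
              - sample_loss(w, b - eps, x, y)) / (2 * eps)

print(np.allclose(grad_w, num_grad_w, atol=1e-6))
print(abs(grad_b - num_grad_b) < 1e-6)
```

Here the prediction is 4.6 and the error is 0.6, so the analytic gradient is 0.6 times each input and 0.6 for the bias; the numerical estimates agree to within the step-size tolerance.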
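import numpy as np

def fit(X, y, epochs, learning_rate):
    n_samples, n_features = X.shape
    # Prepend a column of ones so weights[0] acts as the bias b
    X_aug = np.insert(X, 0, 1, axis=1)
    # Small random initial weights
    weights = np.random.randn(n_features + 1) * 0.01
    for epoch in range(epochs):
        # Update after every sample (stochastic gradient descent)
        for i in range(n_samples):
            y_pred = np.dot(X_aug[i], weights)
            error = y_pred - y[i]
            # Per-sample gradient: (y_pred - y_i) * x_i
            gradient = error * X_aug[i]
            weights -= learning_rate * gradient
    return weights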
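The loop above updates the weights after every sample. A common variant, shown here only as a sketch, computes the gradient over the whole dataset in one vectorized step. Note two assumptions that differ from the code above: the gradient is averaged over samples (so the learning rate does not depend on dataset size), and the weights start at zero. The name `fit_batch` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_batch(X, y, epochs, learning_rate):
    n_samples, n_features = X.shape
    X_aug = np.insert(X, 0, 1, axis=1)             # column of ones for the bias
    weights = np.zeros(n_features + 1)
    for _ in range(epochs):
        errors = X_aug @ weights - y               # residuals for all samples
        gradient = X_aug.T @ errors / n_samples    # averaged gradient
        weights -= learning_rate * gradient
    return weights

# Noise-free synthetic data from y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -3.0])

weights = fit_batch(X, y, epochs=2000, learning_rate=0.1)
print(weights)   # weights[0] is the bias
```

With standardized features and a modest learning rate, the result should approach the generating coefficients; poorly scaled features usually need a smaller learning rate or feature scaling first.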
Implementation with scikit-learn
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
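The snippet above assumes `X_train`, `y_train`, and `X_test` already exist. For a version that runs end to end, here is a sketch that builds synthetic data and splits it with `train_test_split`; the data-generating coefficients, noise level, and split size are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data from y = 4 + 3*x1 - 2*x2 plus small Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 4.0 + X @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(model.intercept_, model.coef_)   # fitted bias and weights
print(model.score(X_test, y_test))     # R^2 on the held-out split
```

The fitted `intercept_` and `coef_` should land close to the generating values, and `score` reports R² on data the model did not see during fitting.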
Key Metrics
- R² Score: Proportion of variance explained (at most 1; can be negative on held-out data when the model does worse than predicting the mean)
- RMSE: Root Mean Squared Error
- MAE: Mean Absolute Error
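The three metrics above are short enough to write out directly. The sketch below implements them with NumPy on made-up targets and predictions; the function names are chosen for the example, and scikit-learn ships equivalents in `sklearn.metrics`.

```python
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Made-up targets and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

print(r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

For these values the residual sum of squares is 2 and the total sum of squares is 20, giving R² = 0.9, RMSE = √0.5 ≈ 0.707, and MAE = 0.5. RMSE penalizes large errors more heavily than MAE because errors are squared before averaging.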
Advantages & Disadvantages
✓ Advantages: Fast to train, cheap to evaluate, coefficients are directly interpretable
✗ Disadvantages: Sensitive to outliers (squared errors amplify large residuals), assumes a linear relationship
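The sensitivity to outliers is easy to see on a toy example. The sketch below fits a line to ten points that lie exactly on $y = 2x + 1$, then corrupts a single target and fits again. The data and the size of the corruption are made up for illustration.

```python
import numpy as np

# Ten points lying exactly on y = 2x + 1
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Corrupt a single target value with a large error
y_outlier = y.copy()
y_outlier[-1] += 50.0
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, deg=1)

print(slope_clean, slope_outlier)
```

One corrupted point out of ten more than doubles the fitted slope here, because the squared-error objective weights that single large residual heavily.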
Ready to explore other algorithms? Check out Logistic Regression for classification tasks!