Building a Simple Linear Regression Model in Python from Scratch



Linear Regression is a foundational algorithm in Machine Learning, used to predict a continuous target variable based on one or more predictor variables. While libraries like scikit-learn provide convenient implementations, understanding the underlying mechanics by building a model from scratch is invaluable. This guide will walk you through a step-by-step implementation of Simple Linear Regression in Python, focusing on the mathematical principles involved.

1. Understanding the Mathematical Foundation

Simple Linear Regression aims to find the best-fitting line that describes the relationship between a single predictor variable (X) and a target variable (Y). The equation of this line is:

Y = β₀ + β₁X + ε

Where:

  • Y is the target variable (the value we want to predict).
  • X is the predictor variable.
  • β₀ is the y-intercept (the value of Y when X is 0).
  • β₁ is the slope (the change in Y for a one-unit change in X).
  • ε is the error term, capturing the random deviation of the observed Y from the line.

Our goal is to find the optimal values for β₀ and β₁ that minimize the sum of squared errors (SSE) between the predicted and actual values.
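
Setting the partial derivatives of the SSE with respect to β₀ and β₁ to zero yields the familiar closed-form least-squares estimates:

β₁ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)²

β₀ = Ȳ − β₁X̄

where X̄ and Ȳ are the sample means. These two formulas are exactly what the implementation below computes.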

2. Implementing Linear Regression from Scratch

Let's implement the algorithm in Python. We'll start by defining a function to calculate the coefficients (β₀ and β₁).

import numpy as np

def linear_regression(X, Y):
    """
    Calculates the coefficients (beta_0, beta_1) for simple linear regression.

    Args:
        X: NumPy array of predictor variables.
        Y: NumPy array of target variables.

    Returns:
        A tuple containing beta_0 (intercept) and beta_1 (slope).
    """

    n = len(X)

    # Calculate the mean of X and Y
    X_mean = np.mean(X)
    Y_mean = np.mean(Y)

    # Calculate the numerator and denominator for beta_1
    numerator = np.sum((X - X_mean) * (Y - Y_mean))
    denominator = np.sum((X - X_mean)**2)

    # Calculate beta_1 (slope)
    beta_1 = numerator / denominator

    # Calculate beta_0 (intercept)
    beta_0 = Y_mean - beta_1 * X_mean

    return beta_0, beta_1
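
One edge case worth noting: if every value in X is identical, the denominator Σ(Xᵢ − X̄)² is zero and the slope is undefined. The function above does not guard against this; a minimal check you could bolt on, sketched here as a hypothetical helper (it reuses the np import from above):

def check_variance(X):
    """Hypothetical helper, not part of the algorithm above: raises a clear
    error when X has no variance, i.e. the slope is undefined."""
    if np.all(X == X[0]):
        raise ValueError("All X values are identical; the slope is undefined.")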
    

3. Making Predictions

Now that we have the coefficients, we can use them to make predictions.

def predict(X, beta_0, beta_1):
    """
    Predicts the target variable Y for given predictor variables X.

    Args:
        X: NumPy array of predictor variables.
        beta_0: The y-intercept.
        beta_1: The slope.

    Returns:
        A NumPy array of predicted values.
    """
    return beta_0 + beta_1 * X
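
Because NumPy broadcasts arithmetic element-wise, this one-liner works for a single value or a whole array of inputs. For example, with hypothetical coefficients β₀ = 1.0 and β₁ = 2.0:

# Hypothetical coefficients, just to illustrate broadcasting
predict(np.array([1.5, 2.5]), 1.0, 2.0)  # -> array([4., 6.])
predict(10.0, 1.0, 2.0)                  # -> 21.0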
    

4. Example Usage

Let's test our implementation with some sample data.

# Sample data
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])

# Calculate the coefficients
beta_0, beta_1 = linear_regression(X, Y)

print(f"Intercept (beta_0): {beta_0}")
print(f"Slope (beta_1): {beta_1}")

# Make predictions
X_test = np.array([6, 7, 8])
Y_pred = predict(X_test, beta_0, beta_1)

print(f"Predictions for X_test: {Y_pred}")
    

5. Evaluating the Model (Optional)

To assess the performance of our model, we can calculate metrics like the Mean Squared Error (MSE).

def mean_squared_error(Y_true, Y_pred):
    """
    Calculates the Mean Squared Error (MSE).

    Args:
        Y_true: NumPy array of actual values.
        Y_pred: NumPy array of predicted values.

    Returns:
        The MSE value.
    """
    return np.mean((Y_true - Y_pred)**2)

# Calculate MSE on the training data (this overwrites the test predictions in Y_pred)
Y_pred = predict(X, beta_0, beta_1)
mse = mean_squared_error(Y, Y_pred)
print(f"Mean Squared Error: {mse}")
    

Conclusion

By building a Simple Linear Regression model from scratch, you've gained a deeper understanding of the underlying mathematical principles and the implementation details. While libraries like scikit-learn offer more advanced features and optimizations, this exercise provides a solid foundation for exploring more complex machine learning algorithms. Try the model on different datasets and examine how well a single straight line captures each relationship.

Key Takeaways:

  • Understanding the mathematical foundation is crucial.
  • Building from scratch reinforces learning.
  • This provides a stepping stone to more complex models.
