14 Apr 2023 - Raviteja Gullapalli

Foundations of Machine Learning: Understanding Regression

Regression is one of the foundational concepts in machine learning. In its classic form, it predicts a continuous target variable from input features. In this article, we'll explore the basics of linear regression and its close relative, logistic regression, which applies the same linear modeling idea to classification, and see how these models form the basis for more advanced machine learning algorithms.

What is Regression?

At its core, regression is a method for modeling the relationship between a dependent variable (also called the target or outcome) and one or more independent variables (features or predictors). In simpler terms, regression allows us to make predictions based on the relationship we discover between the input and the output.

Linear Regression

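Linear regression models the target as a linear function of the input variables. It assumes that each one-unit change in an input variable (independent) shifts the expected value of the target variable (dependent) by a fixed amount, given by that input's coefficient. This describes an association in the data, not necessarily a causal effect. The equation for simple linear regression, which has a single input x, is: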

y = β0 + β1x + ε

Where:

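y is the observed output (the model's prediction, written ŷ, is β0 + β1x without the error term)
β0 is the intercept (the expected value of y when x = 0)
β1 is the slope (how much y is expected to change for each one-unit change in x)
ε is the error term (the part of y that the linear relationship does not explain; after fitting, it is estimated by the residual, the difference between the actual and predicted values)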
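For simple linear regression, the ordinary least squares (OLS) estimates of β0 and β1 have a well-known closed form: β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β0 = ȳ − β1x̄. The sketch below computes them with NumPy. The helper name `fit_simple_ols` is hypothetical, and this is the textbook formula rather than a claim about how scikit-learn computes its fit internally.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals for y ≈ beta0 + beta1 * x.

    Hypothetical helper for illustration; assumes 1-D arrays of equal length
    and that x is not constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means (x_mean, y_mean)
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Usage: noise-free data generated from y = 4 + 3x recovers the parameters exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 4 + 3 * x
beta0, beta1 = fit_simple_ols(x, y)
print(beta0, beta1)  # prints 4.0 3.0
```

With noisy data (ε ≠ 0), the same formula returns estimates near, but not exactly equal to, the true β0 and β1.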

Let’s look at an example using Python's scikit-learn library to implement linear regression.

Example: Linear Regression in Python

# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generating some synthetic data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Creating the model and training it
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# Making predictions
y_pred = lin_reg.predict(X_test)

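# Visualizing the results (sort by X so the line is drawn from left to right)
order = np.argsort(X_test[:, 0])
plt.scatter(X_test, y_test, color="blue", label="Actual")
plt.plot(X_test[order], y_pred[order], color="red", label="Predicted")
plt.title("Linear Regression Example")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()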

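# Display the learned parameters (index into the arrays to print plain numbers,
# since y has shape (n, 1): intercept_ has shape (1,) and coef_ has shape (1, 1))
print(f"Intercept: {lin_reg.intercept_[0]:.3f}")
print(f"Coefficient: {lin_reg.coef_[0, 0]:.3f}")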

This code generates synthetic data from the line y = 4 + 3x plus random noise, trains the model, and plots the predicted values against the actual values on the test set. The red line is the fitted regression line, and the blue dots are the actual test data points. The printed intercept and coefficient should come out roughly equal to the true values of 4 and 3.
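The plot is a visual check; a common numeric check is the mean squared error (MSE) and the coefficient of determination (R²) on the test set. The sketch below regenerates the same synthetic data as the example above so that it runs on its own, and uses scikit-learn's `mean_squared_error` and `r2_score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same synthetic data as the example above: y = 4 + 3x plus standard normal noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lin_reg = LinearRegression().fit(X_train, y_train)
y_pred = lin_reg.predict(X_test)

# MSE: average squared difference between actual and predicted values (lower is better)
mse = mean_squared_error(y_test, y_pred)
# R^2: share of the variance in y_test explained by the model (1.0 is a perfect fit)
r2 = r2_score(y_test, y_pred)
print(f"Test MSE: {mse:.3f}")
print(f"Test R^2: {r2:.3f}")

# The same MSE computed by hand, to show what the metric measures
mse_manual = np.mean((y_test - y_pred) ** 2)
```

Because the noise added to the data has variance 1, an MSE far above 1 on this data would suggest something is wrong with the fit; R² can even be negative for a model that does worse than predicting the mean.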

Logistic Regression

Logistic regression, despite its name, is used for classification rather than for predicting a continuous value; in its basic form it handles binary classification. It predicts the probability that a given input belongs to a specific class (e.g., "Yes" or "No", "Spam" or "Not Spam"). This probability lies between 0 and 1 and is converted into a class label by applying a threshold, most commonly 0.5.

The logistic function (or sigmoid function) is used to model the probability:

P(y=1 | x) = 1 / (1 + e^-(β0 + β1x))

Here, e is the base of the natural logarithm, and β0 + β1x in the exponent is a linear combination of the input features, just as in linear regression. The output is the probability of the event happening (class 1), and 1 minus this value is the probability of class 0.
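The formula above can be evaluated directly. The sketch below implements the sigmoid, computes P(y=1 | x) and P(y=0 | x) for a single feature, and applies a 0.5 threshold to get class labels. The coefficients β0 = −1 and β1 = 2 are assumed for illustration, not fitted to any data, and `predict_proba_1d` is a hypothetical helper name.

```python
import numpy as np


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba_1d(x, beta0, beta1):
    """P(y=1 | x) for a single-feature logistic model (hypothetical helper)."""
    return sigmoid(beta0 + beta1 * x)


# Illustrative coefficients (assumed, not fitted to any data)
beta0, beta1 = -1.0, 2.0
x = np.array([-1.0, 0.5, 2.0])

p1 = predict_proba_1d(x, beta0, beta1)  # probability of class 1
p0 = 1.0 - p1                           # probability of class 0
labels = (p1 >= 0.5).astype(int)        # threshold at 0.5 to get class labels
print(p1, labels)
```

Because sigmoid(z) ≥ 0.5 exactly when z ≥ 0, thresholding at 0.5 is the same as checking the sign of β0 + β1x; with these assumed coefficients the decision boundary is x = 0.5. For very large negative z, `np.exp(-z)` can overflow and emit a warning; `scipy.special.expit` is a numerically careful implementation of the same function.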

Example: Logistic Regression in Python

# Importing libraries
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generating synthetic binary classification data
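# n_redundant=0 is required here: the default of 2 informative + 2 redundant
# features would not fit in n_features=2 and make_classification raises a ValueError
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=42)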

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating the model and training it
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Making predictions
y_pred = log_reg.predict(X_test)

# Calculating accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

In this example, we use logistic regression to classify data into two classes. The accuracy_score function returns the fraction of test samples that are classified correctly, a value between 0 and 1 (so 0.90 would mean 90%). This is a basic demonstration of how logistic regression can be used for binary classification tasks.
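The example above only looks at hard class labels. The sketch below, which rebuilds the same data and split so it runs on its own, uses `predict_proba` to inspect the underlying probabilities, checks that thresholding P(y=1 | x) at 0.5 reproduces `predict()` for this binary model, and confirms that `accuracy_score` is simply the mean of correct predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data as the example above; n_redundant=0 is needed when n_features=2
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

log_reg = LogisticRegression().fit(X_train, y_train)

# predict_proba returns one column per class, ordered as log_reg.classes_ ([0, 1] here)
proba = log_reg.predict_proba(X_test)
p_class1 = proba[:, 1]
print(p_class1[:5].round(3))

# For a binary model, predict() outputs class 1 when P(y=1 | x) exceeds 0.5
# (equivalently, when the linear score is positive)
y_pred = log_reg.predict(X_test)
y_pred_from_proba = (p_class1 > 0.5).astype(int)

# accuracy_score is the fraction (between 0 and 1) of correct predictions
accuracy = accuracy_score(y_test, y_pred)
accuracy_manual = np.mean(y_pred == y_test)
print(f"Accuracy: {accuracy:.2f} ({accuracy:.0%} of test samples)")
```

Looking at probabilities rather than labels is useful when a threshold other than 0.5 fits the task better, for example when missing a positive case is more costly than a false alarm.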

Key Concepts: Hypothesis Testing in Regression

In both linear and logistic regression, it is useful to know whether each input variable is genuinely associated with the target or whether its estimated coefficient could plausibly be due to chance. This is where hypothesis testing comes into play. For each feature, the test compares two hypotheses:

Null Hypothesis (H0): There is no relationship between the feature and the dependent variable (the feature's coefficient is 0).
Alternative Hypothesis (H1): There is a relationship between the feature and the dependent variable (the coefficient is not 0).

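A p-value is calculated for each feature: the probability of obtaining a coefficient estimate at least as far from 0 as the observed one if the null hypothesis were true. A low p-value (below a chosen significance level, conventionally 0.05) lets us reject the null hypothesis and call the feature statistically significant. Statistical significance alone does not mean the effect is large or that the feature greatly improves predictions.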
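scikit-learn's LinearRegression and LogisticRegression do not report p-values. The sketch below computes them by hand for an ordinary least squares fit: each coefficient is divided by its standard error to get a t-statistic, which is compared against a t distribution with n − p degrees of freedom (n samples, p fitted parameters including the intercept). The data and the helper name `ols_p_values` are hypothetical. Libraries such as statsmodels report these values directly (for example through the `pvalues` attribute of a fitted OLS model); for logistic regression the analogous Wald test uses a normal approximation and is not shown here.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Two-sided p-values for H0: coefficient = 0 in an OLS fit with an intercept.

    Hypothetical helper; assumes X is (n, k) with full column rank and n > k + 1.
    Returns (coefficients, p_values), intercept first.
    """
    n = X.shape[0]
    # Design matrix with a leading column of ones for the intercept beta0
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    # Unbiased estimate of the variance of the error term epsilon
    sigma2 = residuals @ residuals / (n - p)
    # Standard errors from the diagonal of sigma^2 * (A^T A)^{-1}
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t_stats = coef / se
    # Two-sided p-value: probability of a |t| at least this large if H0 were true
    p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)
    return coef, p_values


# Synthetic data (assumed): x1 drives y, x2 is pure noise unrelated to y
rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 2, n)
x2 = rng.uniform(0, 2, n)
y = 4 + 3 * x1 + rng.normal(0, 1, n)

coef, p_values = ols_p_values(np.column_stack([x1, x2]), y)
for name, c, pv in zip(["intercept", "x1", "x2"], coef, p_values):
    print(f"{name}: coefficient={c:.3f}, p-value={pv:.3g}")
```

The p-value for x1 should be tiny, since its true coefficient is 3. Even a pure-noise feature like x2 falls below 0.05 by chance about 5% of the time, which is why 0.05 is a convention rather than a guarantee.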

Conclusion

Regression, both linear and logistic, is a fundamental concept in machine learning. It serves as the building block for more complex algorithms and helps us understand the relationships within data. Mastering these basics is essential for tackling advanced topics in AI and machine learning.


Mind of Machines Series : Foundations of Machine Learning: Understanding Regression
