Friday, 14 April 2023

  • Mind of Machines Series: Foundations of Machine Learning: Understanding Regression

    14 Apr 2023 - Raviteja Gullapalli




    Regression is one of the foundational concepts in machine learning. In its classic form, it predicts a continuous target variable from input features. In this article, we'll explore the basics of linear regression and of logistic regression, its counterpart for classification, and see how these models form the basis for more advanced machine learning algorithms.

    What is Regression?

    At its core, regression is a method for modeling the relationship between a dependent variable (also called the target or outcome) and one or more independent variables (features or predictors). In simpler terms, regression allows us to make predictions based on the relationship we discover between the input and the output.

    Linear Regression

    Linear regression models a linear relationship between the input variables and the output. It assumes that a change in an input (independent) variable is associated with a proportional change in the target (dependent) variable. The equation for simple linear regression is:

    y = β0 + β1x + ε
    

    Where:

    • y is the observed output (the target we want to predict)
    • β0 is the intercept (the value of y when x = 0)
    • β1 is the slope (how much y changes for each unit change in x)
    • ε is the error term (the part of y that the line β0 + β1x does not explain)
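
    To make the roles of β0, β1, and ε concrete, here is a minimal sketch that estimates the intercept and slope with the closed-form least-squares formulas for a single feature. The data, the random seed, and the variable names (beta0, beta1, residuals) are illustrative assumptions, not part of any library.

```python
import numpy as np

# Illustrative synthetic data: y = 4 + 3x plus Gaussian noise (assumed values)
rng = np.random.default_rng(0)
x = 2 * rng.random(100)
y = 4 + 3 * x + rng.normal(size=100)

# Closed-form ordinary least squares estimates for one feature
x_mean, y_mean = x.mean(), y.mean()
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta0 = y_mean - beta1 * x_mean

# Residuals estimate the error term: actual values minus fitted values
residuals = y - (beta0 + beta1 * x)

print(f"Intercept (beta0): {beta0:.3f}")
print(f"Slope (beta1): {beta1:.3f}")
print(f"Mean residual: {residuals.mean():.3e}")
```

    The estimates should land near the values used to generate the data, and the residuals of a least-squares fit with an intercept average out to zero up to floating-point error.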

    Let’s look at an example using Python's scikit-learn library to implement linear regression.

    Example: Linear Regression in Python

    # Importing libraries
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    
    # Generating some synthetic data
    np.random.seed(0)
    X = 2 * np.random.rand(100, 1)
    y = 4 + 3 * X + np.random.randn(100, 1)
    
    # Splitting the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    
    # Creating the model and training it
    lin_reg = LinearRegression()
    lin_reg.fit(X_train, y_train)
    
    # Making predictions
    y_pred = lin_reg.predict(X_test)
    
    # Visualizing the results
    plt.scatter(X_test, y_test, color="blue")
    plt.plot(X_test, y_pred, color="red")
    plt.title("Linear Regression Example")
    plt.xlabel("X")
    plt.ylabel("y")
    plt.show()
    
    # Display the learned parameters
    print(f"Intercept: {lin_reg.intercept_[0]:.3f}")
    print(f"Coefficient: {lin_reg.coef_[0][0]:.3f}")
    

    This code demonstrates a simple linear regression model: we generate synthetic data, train the model on 80% of it, and then plot the predictions against the held-out test points. The red line in the plot is the fitted regression line, while the blue dots are the actual test data. Because the data were generated as y = 4 + 3x plus noise, the learned intercept and coefficient should come out close to 4 and 3.
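
    A plot shows the fit visually, but it helps to put a number on it as well. The sketch below rebuilds the same synthetic setup and scores the test predictions with mean squared error and R², using scikit-learn's mean_squared_error and r2_score. The names model, mse, and r2 are just illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same synthetic setup as the example above: y = 4 + 3x + noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X[:, 0] + np.random.randn(100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE: average squared gap between actual and predicted values (lower is better)
mse = mean_squared_error(y_test, y_pred)
# R^2: share of the variance in y explained by the model (1.0 is a perfect fit)
r2 = r2_score(y_test, y_pred)

print(f"Test MSE: {mse:.3f}")
print(f"Test R^2: {r2:.3f}")
```

    Since the noise added to the data has unit variance, a test MSE in the neighbourhood of 1 is about as good as any model can do here.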

    Logistic Regression

    Logistic regression, despite its name, is used for binary classification rather than regression. It predicts the probability that a given input belongs to a specific class (e.g., "Yes" or "No", "Spam" or "Not Spam"). The output of logistic regression is a probability value between 0 and 1, which can be converted into a class label by applying a threshold (commonly 0.5).

    The logistic function (or sigmoid function) is used to model the probability:

    P(y=1 | x) = 1 / (1 + e^-(β0 + β1x))
    

    Where e is the base of the natural logarithm, and the expression inside the exponent is a linear combination of the input features. The output is the probability of the event happening (class 1), and 1 minus this value is the probability of class 0.
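
    The formula above can be sketched in a few lines of NumPy. The coefficients beta0 and beta1 below are hypothetical values picked only for illustration, and sigmoid is a helper defined here, not a library function.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only to illustrate the formula
beta0, beta1 = -1.0, 2.0

x = np.array([-2.0, 0.0, 0.5, 3.0])
p_class1 = sigmoid(beta0 + beta1 * x)   # P(y=1 | x)
p_class0 = 1.0 - p_class1               # P(y=0 | x)

# Convert probabilities to class labels with a 0.5 threshold
labels = (p_class1 >= 0.5).astype(int)

for xi, p, label in zip(x, p_class1, labels):
    print(f"x = {xi:+.1f}  P(y=1|x) = {p:.3f}  label = {label}")
```

    Note that x = 0.5 makes the linear part exactly 0, so the probability is exactly 0.5: this is the decision boundary for these coefficients.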

    Example: Logistic Regression in Python

    # Importing libraries
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    
    # Generating synthetic binary classification data
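    # With only 2 features, both must be informative: the default n_redundant=2 would raise a ValueError
    X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=42)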
    
    # Splitting data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Creating the model and training it
    log_reg = LogisticRegression()
    log_reg.fit(X_train, y_train)
    
    # Making predictions
    y_pred = log_reg.predict(X_test)
    
    # Calculating accuracy
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.2f}")
    

    In this example, we use logistic regression to classify data into two classes. The accuracy_score function returns the fraction of correct predictions (a value between 0 and 1). This is a basic demonstration of how logistic regression can be used for binary classification tasks.
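
    The predict method hides the probabilities that logistic regression actually produces. The sketch below, on a similar synthetic dataset, reads them with predict_proba and applies the threshold by hand. The 0.8 cut-off and the names proba_class1, labels_default, and labels_strict are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary data with two informative features
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
    n_classes=2, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

log_reg = LogisticRegression().fit(X_train, y_train)

# predict_proba returns one column per class; column 1 holds P(y=1 | x)
proba_class1 = log_reg.predict_proba(X_test)[:, 1]

# A 0.5 threshold corresponds to what predict() does for a binary problem
labels_default = (proba_class1 > 0.5).astype(int)

# A stricter, arbitrarily chosen threshold predicts class 1 less often
labels_strict = (proba_class1 > 0.8).astype(int)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, labels_default)
print(cm)
print(f"Predicted class 1 at 0.5: {labels_default.sum()}, at 0.8: {labels_strict.sum()}")
```

    Raising the threshold trades missed positives for fewer false alarms, which is a decision that depends on the application rather than on the model.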

    Key Concepts: Hypothesis Testing in Regression

    In both linear and logistic regression, understanding the significance of the input variables is crucial. This is where hypothesis testing comes into play. Each coefficient is tested against a pair of hypotheses:

    • Null Hypothesis (H0): States that there is no relationship between the independent and dependent variables (the coefficient of the feature is 0).
    • Alternative Hypothesis (H1): States that there is a relationship between the independent and dependent variables (the coefficient is not 0).

    A p-value is calculated for each feature: the probability of seeing a coefficient estimate at least this far from 0 if the null hypothesis were true. A low p-value (typically < 0.05) is evidence against the null hypothesis and suggests that the feature has a statistically significant relationship with the target.
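
    The scikit-learn estimators used earlier do not report p-values, so the sketch below assumes SciPy is available and uses scipy.stats.linregress, which tests the null hypothesis that the slope is 0 for a single feature. The data and the names x_related and x_unrelated are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# One feature that drives y, and one that has nothing to do with it
x_related = rng.normal(size=n)
x_unrelated = rng.normal(size=n)
y = 4 + 3 * x_related + rng.normal(size=n)

# linregress fits y = intercept + slope * x and tests H0: slope = 0
res_related = stats.linregress(x_related, y)
res_unrelated = stats.linregress(x_unrelated, y)

alpha = 0.05
for name, res in [("x_related", res_related), ("x_unrelated", res_unrelated)]:
    decision = "reject H0" if res.pvalue < alpha else "fail to reject H0"
    print(f"{name}: slope = {res.slope:.3f}, p-value = {res.pvalue:.3g} -> {decision}")
```

    The related feature should produce a tiny p-value. The unrelated one will usually, though not always, produce a large one: at a 0.05 threshold, about 1 in 20 truly unrelated features is flagged as significant by chance alone.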

    Conclusion

    Regression, both linear and logistic, is a fundamental concept in machine learning. It serves as the building block for more complex algorithms and helps us understand the relationships within data. Mastering these basics is essential for tackling advanced topics in AI and machine learning.
