In supervised machine learning, tasks generally fall into one of two categories: classification or regression. While both involve learning from labeled data, the nature of the prediction they produce is fundamentally different. Classification models predict discrete categories, such as whether an email is spam or not, while regression models predict continuous numerical values, like the price of a house.

Understanding the differences between these two task types is essential for choosing the right algorithm and evaluation method for your problem. Below is a breakdown of both approaches, with real-world examples and Python code.
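The discrete-versus-continuous distinction can be checked directly on a target array. This minimal sketch uses scikit-learn's `type_of_target` helper (from `sklearn.utils.multiclass`) on small, made-up label lists; the variable names and values are illustrative assumptions, not data from this article.

```python
from sklearn.utils.multiclass import type_of_target

# Made-up targets, for illustration only
spam_labels = [0, 1, 1, 0, 1]                   # 0 = not spam, 1 = spam
animal_labels = ["cat", "dog", "bird", "cat"]   # three possible classes
house_prices = [123456.0, 250000.5, 98000.25]   # real-valued amounts

print(type_of_target(spam_labels))    # binary      -> classification
print(type_of_target(animal_labels))  # multiclass  -> classification
print(type_of_target(house_prices))   # continuous  -> regression
```

Note that integer-valued floats such as `[1.0, 2.0]` are still treated as discrete labels; it is the presence of non-integral values that makes a target "continuous".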

Key Differences Between Classification and Regression

| Aspect | Classification | Regression |
|---|---|---|
| Output type | Categories/labels | Continuous values (real numbers) |
| Goal | Predict a class (discrete output) | Predict a numerical value (continuous output) |
| Examples | Spam or not spam, disease detection | Predicting house prices, stock prices |
| Common algorithms | Logistic Regression (a classifier despite its name), Decision Trees, SVM | Linear Regression, Random Forest Regressor |
| Typical loss functions | Cross-entropy (log loss), hinge loss (SVM) | Mean Squared Error, Mean Absolute Error |
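To make the loss functions in the table concrete, here is a minimal NumPy sketch computing binary cross-entropy for predicted probabilities and MSE/MAE for predicted numbers. The function names `cross_entropy`, `mse`, and `mae` are hypothetical helpers written for this illustration, and the sample values are made up.

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy (log loss): penalizes confident wrong probabilities."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return np.mean(np.abs(y_true - y_pred))

# Classification: true labels vs. predicted probability of class 1
labels = np.array([1, 0, 1])
probs = np.array([0.9, 0.2, 0.7])
print("Cross-entropy:", cross_entropy(labels, probs))  # about 0.228

# Regression: true values vs. predicted values
values = np.array([3.0, 5.0, 2.5])
preds = np.array([2.5, 5.0, 3.0])
print("MSE:", mse(values, preds))  # about 0.1667
print("MAE:", mae(values, preds))  # about 0.3333
```

The classification loss works on probabilities of a class, while the regression losses work on the distance between two real numbers, which mirrors the discrete/continuous split in the table.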

1. Classification – Full Explanation + Code

Classification involves predicting a specific category or label from a set of predefined classes. The output is discrete: for example, determining whether an email is spam or not, or assigning a label such as 0 or 1.
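The following sketch shows what "discrete output" looks like in practice with scikit-learn's `LogisticRegression`. The single feature (a hypothetical count of suspicious words per email) and the six training points are invented for illustration. `predict` returns a class label, while `predict_proba` exposes the underlying probability that is thresholded to produce that label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature: number of suspicious words in each email
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = not spam, 1 = spam

clf = LogisticRegression()
clf.fit(X, y)

new_emails = np.array([[1], [9]])
labels = clf.predict(new_emails)              # discrete class labels
probs = clf.predict_proba(new_emails)[:, 1]   # probability of class 1 (spam)

print(labels)  # [0 1]
print(probs)   # low for the first email, high for the second
```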

Example Use Case: Classify whether a breast tumor is malignant or benign (a binary task, just like spam detection)

Python Code (Binary Classification)

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target # target is 0 (malignant) or 1 (benign)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train classifier
model = RandomForestClassifier(random_state=42)  # fixed seed for reproducible results
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
print("Classification Accuracy:", accuracy_score(y_test, y_pred))

Output:

Classification Accuracy: 0.9561 (example)
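Accuracy alone does not show which class the errors fall into, and in a medical setting a missed malignant tumor (class 0) is usually more costly than a false alarm. This sketch reuses the same dataset, split, and model as above (with a fixed `random_state`, an assumption added for reproducibility) and adds scikit-learn's `confusion_matrix` and `classification_report`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Same data and split as in the example above
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows = true class, columns = predicted class (0 = malignant, 1 = benign)
cm = confusion_matrix(y_test, y_pred)
accuracy = cm.trace() / cm.sum()  # correct predictions / all predictions

print(cm)
print("Accuracy from confusion matrix:", accuracy)
# Per-class precision, recall, and F1-score
print(classification_report(y_test, y_pred, target_names=data.target_names))
```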

2. Regression – Full Explanation + Code

Regression is the process of predicting a continuous numeric value from input features. For instance, a model might estimate a house's price (say, $123,456) from its size, location, and other attributes.
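Here is a minimal sketch of a continuous prediction using scikit-learn's `LinearRegression`. The sizes and prices are invented so that they lie exactly on the line price = 2 × size + 0.5; real housing data is noisier. The key point is that the model can output a value (4.0) that never appeared among the training targets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: size (thousands of sq ft) vs. price (in $100,000s)
sizes = np.array([[0.5], [1.0], [1.5], [2.0], [2.5]])
prices = np.array([1.5, 2.5, 3.5, 4.5, 5.5])  # exactly 2 * size + 0.5

reg = LinearRegression()
reg.fit(sizes, prices)

print("Slope:", reg.coef_[0])        # about 2.0
print("Intercept:", reg.intercept_)  # about 0.5

# A house of 1,750 sq ft gets a price between the training values
prediction = reg.predict(np.array([[1.75]]))
print("Predicted price:", prediction)  # about [4.0], i.e. roughly $400,000
```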

Example Use Case: Predict house prices

Python Code (Regression)

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Load dataset
data = fetch_california_housing()
X, y = data.data, data.target # target is median house value in units of $100,000
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train regressor
model = LinearRegression()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Output:

Mean Squared Error: 0.5559 (approximately)
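MSE is expressed in squared target units, which makes it hard to interpret on its own. This sketch uses small made-up arrays to compute the other common regression metrics with scikit-learn: MAE (listed in the table above), RMSE (the square root of MSE, taken here with `np.sqrt`, which puts the error back in the target's units), and R², the fraction of the target's variance the model explains.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true and predicted values, for illustration only
y_true = np.array([2.0, 3.0, 4.0, 5.0])
y_pred = np.array([2.5, 2.5, 4.0, 6.0])

mae = mean_absolute_error(y_true, y_pred)  # average absolute error
mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # same units as the target
r2 = r2_score(y_true, y_pred)              # 1.0 would be a perfect fit

print("MAE:", mae)    # 0.5
print("MSE:", mse)    # 0.375
print("RMSE:", rmse)  # about 0.612
print("R^2:", r2)     # 0.7
```

For the California housing target, which is in units of $100,000, an RMSE of 0.7 would correspond to a typical error on the order of $70,000.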