In supervised machine learning, tasks generally fall into one of two categories: classification or regression. While both involve learning from labeled data, the nature of the prediction they produce is fundamentally different. Classification models predict discrete categories, such as whether an email is spam or not, while regression models predict continuous numerical values, like the price of a house.
Understanding how they differ is essential for selecting the right algorithm and evaluation method for your problem. Here’s a breakdown of both approaches with real-world examples and Python code.
Key Differences Between Classification and Regression (Tabular Comparison):
Aspect | Classification | Regression |
Output type | Categories/labels | Continuous values (real numbers) |
Goal | Predict a class (discrete output) | Predict a numerical value (continuous) |
Examples | Spam or not spam, disease detection | Predicting house price, stock price |
Algorithms | Logistic Regression, Decision Trees, SVM | Linear Regression, Random Forest Regressor |
Loss function | Cross-entropy (also called log loss) | Mean Squared Error, Mean Absolute Error |
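To make the loss-function row concrete, here is a minimal sketch that computes binary cross-entropy, mean squared error, and mean absolute error by hand with only the Python standard library. The function names and the sample values are illustrative assumptions, not part of any library.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Average log loss for 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def mse(y_true, y_pred):
    """Mean of squared differences; penalizes large errors heavily."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of absolute differences; less sensitive to outliers than MSE."""
    return sum(abs(y - yh) for y, yh in zip(y_true, y_pred)) / len(y_true)

# Hypothetical classification data: true labels and predicted P(class = 1)
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.6, 0.4]
bce = binary_cross_entropy(labels, probs)

# Hypothetical regression data: true and predicted values
actual = [2.0, 3.5, 1.2]
predicted = [2.5, 3.0, 1.0]
reg_mse = mse(actual, predicted)
reg_mae = mae(actual, predicted)

print(f"Cross-entropy: {bce:.4f}")
print(f"MSE: {reg_mse:.4f}, MAE: {reg_mae:.4f}")
```

Cross-entropy grows quickly when the model assigns a low probability to the correct class, while MSE and MAE measure how far a numeric prediction lies from the true value.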
1. Classification – Full Explanation + Code
Classification involves predicting a specific category or label from a set of predefined classes. The output is discrete: for example, determining whether an email is spam or not, or assigning a label such as 0 or 1.
Example Use Case: Classify whether a tumor is malignant or benign (using scikit-learn's built-in breast cancer dataset)
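Many classifiers arrive at a discrete label by first estimating a probability and then applying a cutoff. The sketch below illustrates that step under simple assumptions: the raw scores are made up, and the 0.5 threshold and the helper names `sigmoid` and `to_label` are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def to_label(probability, threshold=0.5):
    """Turn a probability into a discrete class label (0 or 1)."""
    return 1 if probability >= threshold else 0

raw_scores = [-2.0, 0.3, 4.1]  # hypothetical model scores
probabilities = [sigmoid(z) for z in raw_scores]
labels = [to_label(p) for p in probabilities]

print(labels)  # [0, 1, 1]
```

Raising the threshold makes the model more conservative about predicting class 1, which matters when one kind of error is costlier than the other.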
Python Code (Binary Classification)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target # target is 0 (malignant) or 1 (benign)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train classifier
model = RandomForestClassifier(random_state=42)  # fixed seed so results are reproducible
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
print("Classification Accuracy:", accuracy_score(y_test, y_pred))
Output:
Classification Accuracy: 0.9561 (example; the exact value depends on the random seed and library version)
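Accuracy alone can hide which kind of mistakes a classifier makes. As a rough sketch, the code below counts true and false positives and negatives and derives precision and recall in plain Python. The two label lists are made-up stand-ins for `y_test` and `y_pred`, and the helper name `confusion_counts` is hypothetical; scikit-learn's `confusion_matrix`, `precision_score`, and `recall_score` compute the same quantities.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Hypothetical true labels and predictions
true_labels = [1, 1, 1, 0, 0, 1, 0, 1]
pred_labels = [1, 0, 1, 0, 1, 1, 0, 1]

tp, fp, tn, fn = confusion_counts(true_labels, pred_labels)
accuracy = (tp + tn) / len(true_labels)
precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f}")
```

In a medical setting such as tumor screening, recall for the class of interest is often watched closely, because a missed case is usually more costly than a false alarm.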
2. Regression – Full Explanation + Code
Regression is the process of predicting a continuous numeric value from input features. For instance, a regression model might estimate the price of a house (say, $123,456) from its size, location, and other attributes.
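For a single input feature, fitting a straight line by ordinary least squares has a simple closed form. The following is a minimal sketch of that idea, not a replacement for a library implementation; the house sizes and prices are invented, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: size in square meters, price in thousands of dollars
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
estimate = slope * 90 + intercept  # predicted price for a 90 m^2 house

print(f"slope={slope:.2f}, intercept={intercept:.2f}, estimate={estimate:.1f}")
```

The prediction can be any real number, not one of a fixed set of labels, which is what distinguishes regression from classification.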
Example Use Case: Predict house prices
Python Code (Regression)
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Load dataset
data = fetch_california_housing()
X, y = data.data, data.target # target is the median house value in units of $100,000
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train regressor
model = LinearRegression()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
Output:
Mean Squared Error: 0.5559 (example; the exact value may differ slightly across library versions)
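Because MSE is expressed in squared target units, it can be hard to read directly. Two common companions are the root mean squared error (RMSE), which is back in the target's own units, and R², the fraction of the target's variance the model explains. A sketch with made-up values standing in for `y_test` and `y_pred` (the helper names `rmse` and `r_squared` are hypothetical):

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared error, in the target's units."""
    squared = [(y - yh) ** 2 for y, yh in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Made-up true and predicted values
true_values = [1.0, 2.0, 3.0, 4.0]
pred_values = [1.5, 2.0, 2.5, 4.0]

error = rmse(true_values, pred_values)
score = r_squared(true_values, pred_values)

print(f"RMSE={error:.4f}, R^2={score:.2f}")
```

For the California housing target, which is measured in units of $100,000, an RMSE of 0.7 would correspond to a typical error of roughly $70,000.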