{"id":1739,"date":"2025-07-28T12:25:17","date_gmt":"2025-07-28T12:25:17","guid":{"rendered":"https:\/\/www.cmarix.com\/qanda\/?p=1739"},"modified":"2026-02-05T12:00:28","modified_gmt":"2026-02-05T12:00:28","slug":"ml-model-evaluation-metrics","status":"publish","type":"post","link":"https:\/\/www.cmarix.com\/qanda\/ml-model-evaluation-metrics\/","title":{"rendered":"How do Confusion Matrix, Precision, Recall, and F1-score Help Evaluate ML Model Performance?"},"content":{"rendered":"\n<p>It is important to evaluate machine learning models to make sure it performs as intended, and to identify classification problems. The most commonly used metrics for such evaluation include: Confusion Matrix, Precision Recall, and F1-score.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Explanation of Key Concepts<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Confusion Matrix<\/h3>\n\n\n\n<p>This matrix is a tabular representation of the number of correct and incorrect predictions. This helps visualize performance as:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Confusion Matrix<\/strong><\/td><td><strong>Predicted Positive<\/strong><strong><\/strong><\/td><td><strong>Predicted Negative<\/strong><\/td><\/tr><tr><td><br><strong>Actual Positive<\/strong><\/td><td>True Positive (TP)<\/td><td>False Negative (FN)<\/td><\/tr><tr><td><strong>Actual Negative<\/strong><\/td><td>False Positive (FP)<\/td><td>True Negative (TN)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Precision<\/h3>\n\n\n\n<p>Precision = TP \/ (TP + FP)<\/p>\n\n\n\n<p>Precision tells you how many of the positively predicted instances are actually positive.<\/p>\n\n\n\n<p>Useful when false positives are costly (e.g., spam filters).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Recall (Sensitivity)<\/h3>\n\n\n\n<p>Recall = TP \/ (TP + FN)<\/p>\n\n\n\n<p>Recall shows the number of actual positives that were correctly 
predicted.<\/p>\n\n\n\n<p>Important when false negatives are costly (e.g., cancer detection).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">F1-Score<\/h3>\n\n\n\n<p>F1 = 2 \u00d7 (Precision \u00d7 Recall) \/ (Precision + Recall)<\/p>\n\n\n\n<p>The F1-score is the harmonic mean of precision and recall.<\/p>\n\n\n\n<p>Best used when you need to balance precision and recall.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Steps to Evaluate Model Performance<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Train your classification model (e.g., logistic regression, random forest).<\/li>\n\n\n\n<li>Make predictions on the test data.<\/li>\n\n\n\n<li>Calculate the confusion matrix to obtain TP, FP, FN, and TN.<\/li>\n\n\n\n<li>Compute precision, recall, and F1-score from the confusion matrix.<\/li>\n\n\n\n<li>Analyze the results and tune your model if needed.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Code Example (Binary Classification)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Use Case: Breast Cancer Detection<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>from sklearn.datasets import load_breast_cancer\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay\nimport matplotlib.pyplot as plt\n\n# Load dataset\ndata = load_breast_cancer()\nX, y = data.data, data.target  # 0 = malignant, 1 = benign\n\n# Split data\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Train classifier (fixed seed so results are reproducible)\nmodel = RandomForestClassifier(random_state=42)\nmodel.fit(X_train, y_train)\n\n# Predict\ny_pred = model.predict(X_test)\n\n# Confusion Matrix\ncm = confusion_matrix(y_test, y_pred)\nprint(\"Confusion Matrix:\\n\", cm)\n\n# Display Confusion Matrix\ndisp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=data.target_names)\ndisp.plot()\nplt.show()\n\n# Precision, Recall, F1-score\nreport = classification_report(y_test, y_pred, 
target_names=data.target_names)\nprint(\"Classification Report:\\n\", report)<\/code><\/pre>\n\n\n\n<p><strong>Sample Output:<\/strong><\/p>\n\n\n\n<p><strong>Confusion Matrix:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;&#91;39  3]\n&#91; 2 70]]<\/code><\/pre>\n\n\n\n<p><strong>Classification Report:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>              precision    recall  f1-score   support\n\n   malignant       0.95      0.93      0.94        42\n      benign       0.96      0.97      0.96        72\n\n    accuracy                           0.95       114\n   macro avg       0.95      0.95      0.95       114\nweighted avg       0.95      0.95      0.95       114<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The confusion matrix helps visualize how well your classifier is performing.<\/li>\n\n\n\n<li>Precision is crucial when false positives are risky (e.g., spam filters).<\/li>\n\n\n\n<li>Recall is essential when missing positive cases is risky (e.g., medical diagnosis).<\/li>\n\n\n\n<li>F1-score gives a balanced view of model performance, especially on imbalanced datasets.<\/li>\n<\/ul>\n\n\n\n<p>Together, these metrics provide a comprehensive evaluation of your machine learning model\u2019s effectiveness.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Evaluating machine learning models is essential to confirm that they perform as intended and to diagnose where a classifier goes wrong. The most commonly used metrics for evaluating classification models are the Confusion Matrix, Precision, Recall, and F1-score. Explanation of Key Concepts Confusion Matrix This matrix is a tabular representation of the number of correct and incorrect predictions. 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":1741,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[156,160],"tags":[],"class_list":["post-1739","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-ai-ml"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/posts\/1739","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/comments?post=1739"}],"version-history":[{"count":7,"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/posts\/1739\/revisions"}],"predecessor-version":[{"id":1748,"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/posts\/1739\/revisions\/1748"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/media\/1741"}],"wp:attachment":[{"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/media?parent=1739"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/categories?post=1739"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.cmarix.com\/qanda\/wp-json\/wp\/v2\/tags?post=1739"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}