I agree. This is true some of the time. But sometimes class imbalance can make the F1 score look higher than it should be. For example, I was dealing with a binary classification problem (0 or 1) and got a binary F1 score of 81.01%. But the ROC AUC curve and the confusion matrix don't show that my result is as good as it seems: https://datapane.com/sygnals/reports/analyze_Coal_XGBClassifier/.

My macro F1 score is .405, though. So you might say I should use the macro F1 score instead. But I think it is still good to double-check the plots to understand your results and make sure the metric you are using is a good one.
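To illustrate how the two numbers can diverge, here is a minimal sketch on made-up data (not my actual dataset): 90 positives, 10 negatives, and a hypothetical model that predicts 1 for everything. The helper `f1_for_class` is written by hand for illustration; the binary score here is the F1 of class 1 and the macro score is the unweighted mean of the per-class F1 scores, which is how I understand scikit-learn's `f1_score` with `average="binary"` and `average="macro"` to behave. Note that a macro score of roughly half the binary score, as in my case, would be consistent with the F1 for class 0 being close to zero.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Return 0.0 when the class is never present and never predicted.
    return 2 * tp / denom if denom else 0.0


# Made-up imbalanced data: class 1 is the majority.
y_true = [1] * 90 + [0] * 10
# Hypothetical model that always predicts the majority class.
y_pred = [1] * 100

f1_binary = f1_for_class(y_true, y_pred, 1)
f1_macro = (f1_for_class(y_true, y_pred, 0) + f1_binary) / 2

print(f"binary F1: {f1_binary:.3f}")
print(f"macro F1:  {f1_macro:.3f}")
```

The binary score looks strong even though the model never identifies a single negative, which is exactly the kind of thing the confusion matrix makes obvious.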

Written by

Data scientist. I share a little bit of goodness every day through articles and daily data science tips: https://mathdatasimplified.com/
