I agree. This is true some of the time, but class imbalance can also make the F1 score look higher than it should. For example, I was working on a binary classification problem (0 or 1) and got a binary F1 score of 81.01%, yet the ROC AUC curve and the confusion matrix don't show that my result is as good as it seems: https://datapane.com/sygnals/reports/analyze_Coal_XGBClassifier/.
My macro F1 score is 0.405, though. So you might say I should use the macro F1 score instead. But I think it is still worth double-checking the graphs to understand your results and make sure the metric you are using actually fits your problem.
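To illustrate the gap between binary and macro F1 on imbalanced data, here is a minimal sketch with made-up numbers (not my actual dataset): a degenerate classifier that always predicts the majority class still gets a high binary F1, while the macro F1 exposes the failure on the minority class.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical imbalanced labels: 80 positives, 20 negatives
y_true = [1] * 80 + [0] * 20

# A degenerate classifier that predicts the majority class (1) every time
y_pred = [1] * 100

# Binary F1 only scores the positive class, so it looks strong here
binary_f1 = f1_score(y_true, y_pred, average="binary")  # ~0.89

# Macro F1 averages per-class F1, so the minority class (F1 = 0) drags it down
macro_f1 = f1_score(y_true, y_pred, average="macro")  # ~0.44

# The confusion matrix makes the problem obvious: all 20 negatives misclassified
print(confusion_matrix(y_true, y_pred))
```

This is why the two numbers can disagree so sharply, and why looking at the confusion matrix alongside whatever scalar metric you report is a good sanity check.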