Nov 12, 2022

Model Evaluation

Predictive Model Accuracy for Low-frequency Events


Model evaluation theory and practice can be vastly improved in many application contexts.

While modeling low-frequency events is undoubtedly possible, evaluating such models is challenging. A rare false positive or false negative prediction might be disastrous, yet still leave the model at close to 100% accuracy. Depending on the context, false positives might be acceptable while false negatives are not. For example, in predictive maintenance applications, a model saying that a machine requires service while it is operating properly (a false positive) is not a disaster. On the other hand, imagine the consequences of an overly confident model producing false negatives in aerospace or military technology contexts, where many lives depend on the model.
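To make the accuracy paradox concrete, here is a minimal sketch (toy numbers of my choosing, assuming scikit-learn; not data from the article). A degenerate model that never predicts the rare event still reports near-perfect accuracy while catching nothing:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical data: 100,000 samples with 100 true positives (0.1% event rate).
y_true = [1] * 100 + [0] * 99_900

# A degenerate "always negative" model misses every rare event...
y_pred = [0] * 100_000

print(accuracy_score(y_true, y_pred))  # 0.999 -- looks nearly perfect
print(recall_score(y_true, y_pred))    # 0.0   -- yet catches zero events
```

This is exactly why accuracy alone says little about rare-event models: the metric is dominated by the overwhelming majority class.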

Such requirements are sometimes called "yes-means-yes" tests, i.e., we do not want false positives. Conversely, "no-means-no" tests mean that we want to ensure there are no false negatives.

Each evaluation method can offer valuable insight, especially in combination (precision-recall, #ROC, #AUC, and #probabilistic extensions for error understanding). However, these methods often lack the direct interpretability of their threshold-based counterparts.
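As a sketch of how such threshold-free metrics are computed in practice (assuming scikit-learn and synthetic scores I made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Hypothetical scores for an imbalanced problem (1% positives):
# positives tend to score higher, but the distributions overlap.
y_true = np.concatenate([np.ones(100), np.zeros(9_900)])
scores = np.concatenate([rng.normal(0.7, 0.2, 100),
                         rng.normal(0.3, 0.2, 9_900)])

# Threshold-free summaries over all possible operating points.
print("ROC AUC:          ", roc_auc_score(y_true, scores))
print("Average precision:", average_precision_score(y_true, scores))
```

A single AUC number summarizes every threshold at once, which is precisely why it is harder to read off "how many false alarms will I get in operation?" than from a confusion matrix at one fixed threshold.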

The F1 score is well-suited to this discussion since it is the harmonic mean of precision and recall: F1 = 2 · (precision · recall) / (precision + recall). The #harmonicmean penalizes unequal and low precision and recall values, so F1 is dominated by the weaker of the two.
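A small sketch (toy values, not from the article) of how the harmonic mean drags F1 toward the weaker component:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Balanced, high values: harmonic and arithmetic means agree.
print(f1(0.9, 0.9))    # 0.9

# Highly unequal values: F1 collapses toward the low recall,
# while the arithmetic mean would still report a flattering 0.5.
print(f1(0.99, 0.01))  # ~0.0198
```

This is the desired behavior for rare-event problems: a model cannot hide poor recall behind excellent precision, or vice versa.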

Model evaluation metrics and many instructive examples can be found in this insightful publication by Zeya LT. #modeling #AI #ML #F1 #precision #recall #modelevaluation #evaluationmetrics

https://towardsdatascience.com/essential-things-you-need-to-know-about-f1-score-dbd973bf1a3


