Boosted decision trees in the era of new physics: a smuon analysis case study
A. Cornell et al.
Machine learning algorithms are increasingly popular in particle physics analyses, where they are used for their ability to solve difficult classification and regression problems. Although these tools are powerful, they are often under- or mis-utilised. In the following, we investigate the use of gradient boosting techniques as applied to a generic particle physics problem. We use as an example a Beyond the Standard Model smuon collider analysis that applies to both current and future hadron colliders, and we compare our results to a traditional cut-and-count approach. In particular, we interrogate the use of metrics in imbalanced datasets, which are characteristic of high energy physics problems, and offer an alternative to the widely used area under the curve (AUC) metric through a novel use of the F-score metric. We present an in-depth comparison of feature selection and investigation using principal component analysis, Shapley values, and feature permutation methods, in a way that we hope will be widely applicable to future particle physics analyses. Moreover, we show that a machine learning model can extend the 95% confidence level exclusions obtained in a traditional cut-and-count analysis, while potentially bypassing the need for complicated feature selections. Finally, we discuss the possibility of constructing a general machine learning model that can be applied to probe a two-dimensional mass plane.
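As a minimal illustration of why the choice of metric matters on imbalanced datasets, the following Python sketch evaluates a single classifier working point using both ROC-style rates and the standard F-beta score. The event counts are hypothetical and chosen only for illustration, and the function names `f_beta` and `true_and_false_positive_rates` are assumptions of this sketch. It shows only the textbook definition of the F-score, not the specific use of it developed in the analysis.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """Standard F-beta score from confusion-matrix counts.

    Returns 0.0 when the score is undefined (no positives predicted or present).
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    if denom == 0:
        return 0.0
    return (1 + b2) * tp / denom


def true_and_false_positive_rates(tp, fp, fn, tn):
    """True and false positive rates, i.e. one point on the ROC curve."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical, heavily imbalanced sample: 100 signal events
# against 100000 background events (illustrative numbers only).
tp, fn = 80, 20        # signal events accepted / rejected
fp, tn = 1000, 99000   # background events accepted / rejected

tpr, fpr = true_and_false_positive_rates(tp, fp, fn, tn)
precision = tp / (tp + fp)
f1 = f_beta(tp, fp, fn, beta=1.0)

print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
print(f"precision = {precision:.3f}, F1 = {f1:.3f}")
```

With these assumed counts the working point looks strong in ROC terms (80% signal efficiency at a 1% false positive rate), yet only about 7% of the selected events are signal and the F1 score is roughly 0.14. Because the false positive rate is normalised to the large background sample, ROC-based metrics can hide this contamination, whereas the F-score depends directly on the precision.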