
Expt(3)

arXiv:2011.08280

ATLAS Collab.

Measurements of Higgs Bosons Decaying to Bottom Quarks from Vector Boson Fusion Production with the ATLAS Experiment at √s = 13 TeV

The sensitivity of the analysis is boosted by using an ANN to divide events into regions of varying signal-to-background composition. This type of multivariate classifier can be constructed such that the network is penalized for learning a feature of the dataset [47, 48]. Because the Higgs boson signal is extracted from a fit to m_bb, the ANN used in this analysis is penalized for learning the m_bb distribution. With this construction, the non-resonant background shape is independent of the classifier score and therefore the same in each region, which reduces the number of free parameters in the fit.
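The penalty term can be made concrete with a minimal numpy sketch of the loss pieces involved: a binary cross-entropy for the classifier and a categorical cross-entropy for an adversary that tries to predict the m_bb bin. This is an illustration only, not the ATLAS implementation; the function names, array shapes, and the choice of three m_bb bins are assumptions for the example.

```python
import numpy as np

def bce(y_true, p):
    # Binary cross-entropy: classifier loss for signal (1) vs background (0).
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def cce(mbb_onehot, q):
    # Categorical cross-entropy: adversary loss for predicting the m_bb bin,
    # given one-hot true bins and predicted bin probabilities q.
    q = np.clip(q, 1e-7, 1.0)
    return -np.mean(np.sum(mbb_onehot * np.log(q), axis=1))

def combined_loss(y_true, p, mbb_onehot, q, lam):
    # The classifier is rewarded when the adversary fails to infer m_bb:
    # L = L_classifier - lambda * L_adversary.
    return bce(y_true, p) - lam * cce(mbb_onehot, q)
```

With lam = 0 the combined loss reduces to the plain classifier loss; increasing lam trades classification power for independence from m_bb.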

The ANN consists of a classifier and an adversary. The classifier's role is to determine whether an event is signal- or background-like. The adversary's role is to predict the value of m_bb in terms of a binned m_bb distribution. The two are then combined such that the overall network discriminates between signal and background but is penalized if the m_bb value is learned, i.e. if it can accurately determine the m_bb bin. To achieve this, a three-step training procedure is used. First, the classifier is pre-trained with a binary cross-entropy loss while the adversary parameters are kept frozen. Next, the adversary is pre-trained with a categorical cross-entropy loss while the classifier parameters are kept fixed. Third, the classifier and adversary are trained together with a combined loss function, L. The combined training proceeds in two sub-steps for each epoch: first the classifier is trained with L = L_classifier − λ L_adversary, keeping the adversary weights frozen, and then the adversary is trained with loss function L = L_adversary. The configurable parameter λ controls to what extent the adversary impacts the overall loss function, i.e. how much the network is penalized for learning m_bb. As a post-training step, the classifier scores are scaled to quantiles of the signal MC distribution, with output values ranging from 0 to 1.
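The post-training quantile scaling amounts to evaluating the empirical CDF of the signal MC score distribution at each raw classifier score, so that signal events populate [0, 1] roughly uniformly. A minimal numpy sketch, with the function name and arrays chosen for illustration only:

```python
import numpy as np

def scale_to_signal_quantiles(scores, signal_scores):
    # Map each raw classifier score to its quantile within the signal MC
    # score distribution (empirical CDF), yielding values in [0, 1].
    order = np.sort(signal_scores)
    ranks = np.searchsorted(order, scores, side="right")
    return ranks / len(order)
```

Scores below the lowest signal MC score map to 0 and scores above the highest map to 1; applied to the signal sample itself, the transformed scores are uniform by construction.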