LHC physics dataset for unsupervised New Physics detection at 40 MHz
Ekaterina Govorkova, Ema Puljak, Thea Aarrestad, Maurizio Pierini, Kinga Anna Woźniak, Jennifer Ngadiuba
In particle detectors at the Large Hadron Collider, proton-proton collisions occur at a rate of 40 megahertz, producing tens of terabytes of data every second. This data rate is reduced to a sustainable level by a real-time event filtering system that decides, for each collision event, whether it should be kept for further analysis or discarded. We introduce a dataset of proton-proton collision events that emulates a typical data stream collected by such a real-time processing system, pre-filtered by requiring the presence of at least one electron or muon. This dataset could be used to develop novel event selection strategies and to assess their sensitivity to new phenomena. In particular, by publishing this dataset we intend to stimulate a community-based effort towards the design of novel algorithms for unsupervised New Physics detection, tailored to the bandwidth, latency and computational resource constraints of the real-time event selection system of a typical particle detector.