# Method

This section outlines the considerations involved in combining the datasets and defining the statistical test.

## Toolchain

Figure: The software toolchain used in [110]. Other generators and interfaces may be substituted for, e.g., FeynRules or Herwig 7.

Contur exploits three important developments to survey existing measurements and set limits on new physics.

- SM predictions for differential and exclusive, or semi-exclusive, final states are made using sophisticated calculational software, often embedded in Monte Carlo generators capable of simulating full, realistic final states [101]. These generators now incorporate matrix-elements for higher-order processes matched to logarithmic parton showers, and successful models of soft physics such as hadronisation and the underlying event. They are also capable of importing new physics models into this framework, thus allowing the rapid prediction of their impact on a wide variety of final states simultaneously. In this paper we make extensive use of these capabilities within Herwig 7 [93][81].
- As the search for many of the favoured BSM scenarios has been unsuccessful, there has been a move toward “simplified models” of new physics [66][52], which aim to be as generic as possible and which provide a framework for interpreting BSM signatures with a minimal number of new particles, interactions and model assumptions. The philosophy is similar to an “effective Lagrangian” approach in which effective anomalous couplings are introduced to describe new physics, but is more powerful, as such simplified models also include new particles, and thus can remain useful up to and beyond the scale of new physics, a region potentially probed by LHC measurements.
- The precision measurements from the LHC have mostly been made in a manner which minimises their model-dependence. That is, they are defined in terms of final-state signatures in fiducial regions well-matched to the acceptance of the detector. Many such measurements are readily available for analysis and comparison in the Rivet library [100].

These three developments together make it possible to efficiently bring the power of a very wide range of data to bear on the search for new physics. While such a generic approach is unlikely to compete in terms of speed and sensitivity with a search optimised for a specific theory, the breadth of potential signatures and models which can be covered makes it a powerful complementary approach. [1] On the one hand, any theory seeking to explain a new signature or anomaly in the data may predict a BSM signal in other final states, which should be checked against data this way. On the other hand, if no BSM physics emerges, a model-independent and systematic approach becomes mandatory to exclude new physics models or narrow down the corresponding model parameter space.

[1] Limits from existing searches can sometimes be applied to new models, for example by accessing archived versions of the original analysis code and detector simulation via the RECAST [146] project, or by independent implementations of experimental searches; see, for example, Refs. [141][166][210][228][84].

## Strategy

The current approach considers BSM models in the light of existing measurements which have already been shown to agree with SM expectations. Thus this is inherently an exercise in limit-setting rather than discovery. The assumption is that a generic, measurement-based approach such as this will not be competitive in terms of sensitivity, or speed of discovery, with a dedicated search for a specific BSM final-state signature. However, it will have the advantage of breadth of coverage, and will make a valuable contribution to physics at the energy frontier whether or not new signatures are discovered at the LHC.

In the case of a new discovery, many models will be put forward to explain the data (as was seen for example [181] after the 750 GeV diphoton anomaly reported by ATLAS and CMS at the end of 2015 and start of 2016 [140][139]). Checking these models for consistency with existing measurements will be vital for unravelling whatever the data might be telling us. Models designed to explain one signature may have somewhat unexpected consequences in different final states, some of which have already been precisely measured.

If it should turn out that no BSM signatures are in the end confirmed at the LHC, Contur offers potentially the broadest and most generic constraint on new physics, and motivates the precise model-independent measurements over a wide range of final states, giving the best chance of an indirect pointer to the eventual scale of new physics.

## Dynamical data selection

We define a procedure to combine exclusion limits from different measured distributions. The data used for comparison come in the form of histograms (or 2D scatter plots), which at present do not carry information about the correlations between uncertainties, although in several cases detailed information is made available in the experimental papers. There are highly correlated uncertainties in several measurements, for example on the integrated luminosity, or the energy scale of jet measurements. In some cases these are dominant. Including correlations would be a highly complex process, since as well as correlations within a single dataset, there are also common systematic uncertainties between different results, which are generally not provided by the experiments. There are also overlaps between event samples used in many different measurements, which lead to non-trivial correlations in the statistical uncertainties. To avoid spuriously high exclusion rates from counting what might be the same exclusion several times against different datasets, we take the following approach:

- Divide the measurements into groups that have no overlap in the event samples used, and hence no statistical correlation between them. These measurements are grouped by, crudely, different final states, different experiments, and different beam energies (see Table [tab:Rivet]).
- Scan within each group for the most significant deviation between BSM+SM and SM. This is done distribution-by-distribution and bin-by-bin within distributions. Use only the most significant deviation, and disregard the rest. Although the selection of the most significant deviation sounds intuitively suspect, in this case it is a conservative approach, since we are setting limits, and discarding the less-significant bins simply reduces sensitivity. The use of a single bin from each measurement removes the dominant effect of highly correlated systematic uncertainties within a single measurement. Where a number of statistically independent measurements exist *within* a group, their likelihoods may be combined to give a single likelihood ratio from the group.
- Combine the likelihood ratios of the different groups to give a single exclusion limit.
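
The three steps above can be sketched as follows. This is a minimal illustration rather than Contur's actual implementation: the `Bin` record, the group labels and histogram names are hypothetical, and each bin is assumed to carry a test-statistic value `q` equal to \(-2\ln\) of its likelihood ratio, so that larger `q` means a more significant BSM+SM versus SM deviation and independent groups combine by summing `q`. The optional combination of independent measurements within a group is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    group: str      # hypothetical label: experiment, beam energy, final state
    histogram: str  # hypothetical name of the measured distribution
    index: int      # bin number within that distribution
    q: float        # assumed per-bin statistic, -2 ln(likelihood ratio)


def most_significant_per_group(bins):
    """Keep only the single most significant bin in each non-overlapping group."""
    best = {}
    for b in bins:
        if b.group not in best or b.q > best[b.group].q:
            best[b.group] = b
    return best


def combined_statistic(bins):
    """Sum the selected statistics, i.e. multiply independent likelihood ratios."""
    return sum(b.q for b in most_significant_per_group(bins).values())


bins = [
    Bin("ATLAS_7TeV_jets", "dijet_mass", 3, 0.5),
    Bin("ATLAS_7TeV_jets", "jet_pt", 7, 1.5),
    Bin("CMS_8TeV_dilepton", "dilepton_mass", 2, 0.25),
    Bin("CMS_8TeV_dilepton", "dilepton_pt", 5, 2.5),
]

selected = most_significant_per_group(bins)
total_q = combined_statistic(bins)
```

Discarding all but one bin per group can only lower `total_q` relative to using every bin, which is why the selection is conservative for limit-setting.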

## Statistical Method

The question we wish to ask of any given BSM proposal is *‘at what significance do existing measurements, which agree with the SM, already exclude this?’* For all the measurements considered, comparisons to SM calculations have shown consistency between them and the data. Thus as a starting point, we take the data as our “null signal”, and we superpose onto them the contribution from the BSM scenario under consideration. The uncertainties on the data will define the allowed space for these extra BSM contributions.

Several statistical methods are available in Contur. Since unfolded measurements generally have reasonably high statistics, a simple \(\chi^2\) method is appropriate and is used for most of these results, for speed and simplicity. However, this has been validated against the more sophisticated likelihood method described below.

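Taking each bin of each distribution considered as a separate statistic to be tested, a likelihood function for each bin can be constructed as follows,

\[
L(\mu, b, \sigma_{b}, s) = \frac{(\mu s + b)^{n}}{n!}\, e^{-(\mu s + b)} \times \frac{1}{\sqrt{2\pi}\,\sigma_{b}} \exp\left(-\frac{(m-b)^{2}}{2\sigma_{b}^{2}}\right) \times \frac{(\tau s)^{k}}{k!}\, e^{-\tau s} \tag{1}
\]

where the three factors are: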

- A Poisson event count. The measurements considered are differential cross-section measurements, hence they are multiplied by the integrated luminosity taken from the experimental paper behind each analysis, to convert to an event count in each bin (and subsequently to the additional events that the new physics would have added to the measurement made). The statistic in each tested bin then comprises:
  - \(s\), the parameter defining the BSM signal event count;
  - \(b\), the parameter defining the background event count;
  - \(n\), the observed event count;
  - \(\mu\), the signal strength parameter modulating the strength of the signal hypothesis tested; thus \(\mu=0\) corresponds to the background-only hypothesis and \(\mu=1\) to the full signal strength hypothesis.
- A convolution with a Gaussian defining the distribution of the background count, where the following additional components are identified:
  - \(m\), the background count. The expectation value of this count, which is used to construct the test, is taken as the central value of the measured data point.
  - \(\sigma_{b}\), the uncertainty in the background event count, taken from the data as the 1\(\sigma\) error on a Gaussian. The statistical and systematic uncertainties are combined in quadrature; typically the systematic uncertainty dominates.
- An additional Poisson term describing the Monte Carlo error on the simulated BSM signal count, with \(k\) being the actual number of generated BSM events. The expectation value of \(k\) is related to \(s\) by a factor \(\tau\), which is the ratio of the generated MC luminosity to the experimental luminosity.
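
The three factors described above can be written down directly as a log-likelihood for one bin. The following is a sketch under stated assumptions, not Contur's code: the function names are hypothetical, and `math.lgamma` is used in place of the factorial so that the non-integer "counts" obtained from cross section times luminosity are handled.

```python
import math


def log_poisson(count, mean):
    """ln of the Poisson probability; lgamma(count + 1) generalises ln(count!)."""
    return count * math.log(mean) - mean - math.lgamma(count + 1.0)


def log_gauss(x, mean, sigma):
    """ln of a Gaussian density with the given mean and width."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def bin_log_likelihood(mu, b, s, n, m, sigma_b, k, tau):
    """ln L for one bin: Poisson(n | mu*s + b) * Gauss(m | b, sigma_b) * Poisson(k | tau*s)."""
    return (
        log_poisson(n, mu * s + b)
        + log_gauss(m, b, sigma_b)
        + log_poisson(k, tau * s)
    )


# Illustrative numbers only: s = 10 signal events on b = 100 background events,
# 5 events of background uncertainty, ten times more MC than data luminosity.
s, b, sigma_b, tau = 10.0, 100.0, 5.0, 10.0
n, m, k = s + b, b, tau * s  # counts set to their expectation for mu = 1

lnL_at_1 = bin_log_likelihood(1.0, b, s, n, m, sigma_b, k, tau)
lnL_at_half = bin_log_likelihood(0.5, b, s, n, m, sigma_b, k, tau)
```

With the counts set to their \(\mu=1\) expectations, the log-likelihood peaks at \(\mu=1\) when \(b\) and \(s\) are held fixed, so `lnL_at_1` exceeds `lnL_at_half`.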

This likelihood is then used to construct a test statistic based on the profile likelihood ratio, following the arguments laid out in Ref. [145]. In particular, the \(\tilde{q}_{\mu}\) test statistic is constructed. This enables the setting of a one-sided upper limit on the confidence in the strength parameter hypothesis, \(\mu\), which is desirable since, in the situation that the observed strength parameter exceeds the tested hypothesis, agreement with the hypothesis should not diminish. In addition, this construction places a lower limit on the strength parameter, where any observed fluctuations below the background-only hypothesis are said to agree with the background-only hypothesis [3]. The required information is then the sampling distribution of this test statistic. This can be evaluated either using the so-called Asimov data set to build an approximate distribution of the considered test statistic, or explicitly using multiple Monte Carlo ‘toy model’ tests [4].

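The information needed to build the approximate sampling distributions is contained in the covariance matrix composed of the second derivatives, with respect to the parameters (\(\mu\), \(b\) and \(s\)), of the log of the likelihood given in equation (1). They are as follows:

\[
\begin{aligned}
\frac{\partial^{2}\ln L}{\partial\mu^{2}} &= -\frac{n s^{2}}{(\mu s+b)^{2}}, &
\frac{\partial^{2}\ln L}{\partial\mu\,\partial b} &= -\frac{n s}{(\mu s+b)^{2}}, \\
\frac{\partial^{2}\ln L}{\partial b^{2}} &= -\frac{n}{(\mu s+b)^{2}} - \frac{1}{\sigma_{b}^{2}}, &
\frac{\partial^{2}\ln L}{\partial\mu\,\partial s} &= \frac{n b}{(\mu s+b)^{2}} - 1, \\
\frac{\partial^{2}\ln L}{\partial s^{2}} &= -\frac{n \mu^{2}}{(\mu s+b)^{2}} - \frac{k}{s^{2}}, &
\frac{\partial^{2}\ln L}{\partial b\,\partial s} &= -\frac{n \mu}{(\mu s+b)^{2}}.
\end{aligned} \tag{2}
\]

These are arranged in the inverse covariance matrix as follows:

\[
V^{-1} = -
\begin{pmatrix}
\dfrac{\partial^{2}\ln L}{\partial\mu^{2}} & \dfrac{\partial^{2}\ln L}{\partial\mu\,\partial b} & \dfrac{\partial^{2}\ln L}{\partial\mu\,\partial s} \\
\dfrac{\partial^{2}\ln L}{\partial\mu\,\partial b} & \dfrac{\partial^{2}\ln L}{\partial b^{2}} & \dfrac{\partial^{2}\ln L}{\partial b\,\partial s} \\
\dfrac{\partial^{2}\ln L}{\partial\mu\,\partial s} & \dfrac{\partial^{2}\ln L}{\partial b\,\partial s} & \dfrac{\partial^{2}\ln L}{\partial s^{2}}
\end{pmatrix} \tag{3}
\]

The variance of \(\mu\) is extracted from the inverse of the matrix given in (3) as

\[
\sigma_{\mu}^{2} = V_{\mu\mu} = \frac{\left(V^{-1}\right)_{bb}\left(V^{-1}\right)_{ss} - \left(V^{-1}\right)_{bs}^{2}}{\det V^{-1}} \tag{4}
\]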

In order to evaluate this, the counting parameters (\(n, m\) and \(k\)) are evaluated at their Asimov values, following arguments detailed in Ref. [145]. These are taken as follows,

- \(n_{A} = E[n] = \mu' s + b\). The total count under the assumed signal strength, \(\mu'\), which for the purposes of this argument is equal to 1.
- \(m_{A}=E[m] = b\). The background count is defined as following a Gaussian distribution with a mean of \(b\).
- \(k_{A} = E[k] = \tau s\). The signal count is defined as following a Poisson distribution with a mean of \(\tau s\).
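
As a numerical illustration of this step, the sketch below builds the matrix of negative second derivatives of \(\ln L\) with respect to \((\mu, b, s)\) at the Asimov values and reads off the variance of \(\mu\) from its inverse. It is a hedged example, not Contur's implementation: the function names are hypothetical, the derivatives are those obtained from a likelihood made of Poisson(\(n\,|\,\mu s+b\)), Gauss(\(m\,|\,b,\sigma_b\)) and Poisson(\(k\,|\,\tau s\)) factors, and the closed form used as a cross-check, \(\sigma_\mu^2 = (\mu' s + b + \sigma_b^2 + \mu'^2 s/\tau)/s^2\), follows algebraically from that same matrix.

```python
def asimov_counts(mu_prime, s, b, tau):
    """Asimov values (n_A, m_A, k_A) for an assumed signal strength mu_prime."""
    return mu_prime * s + b, b, tau * s


def information_matrix(mu, b, s, n, k, sigma_b):
    """Negative second derivatives of ln L, ordered as (mu, b, s)."""
    r = n / (mu * s + b) ** 2
    return [
        [r * s ** 2, r * s, 1.0 - r * b],
        [r * s, r + 1.0 / sigma_b ** 2, r * mu],
        [1.0 - r * b, r * mu, r * mu ** 2 + k / s ** 2],
    ]


def variance_of_mu(mu_prime, s, b, sigma_b, tau):
    """(mu, mu) element of the inverse information matrix at the Asimov values."""
    n_a, _m_a, k_a = asimov_counts(mu_prime, s, b, tau)
    (a, p, c), (_, d, e), (_, _, f) = information_matrix(mu_prime, b, s, n_a, k_a, sigma_b)
    det = a * (d * f - e * e) - p * (p * f - c * e) + c * (p * e - c * d)
    return (d * f - e * e) / det  # cofactor of the (mu, mu) entry over the determinant


# Illustrative numbers only.
s, b, sigma_b, tau = 10.0, 100.0, 5.0, 10.0
var_mu = variance_of_mu(1.0, s, b, sigma_b, tau)
closed_form = (s + b + sigma_b ** 2 + s / tau) / s ** 2
```

The closed form makes the three sources of uncertainty visible: the Poisson fluctuation of the total count, the background uncertainty, and the Monte Carlo statistical error on the signal, which vanishes as \(\tau\) grows.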

Using this data set, the variance of the strength parameter, \(\mu\), under the assumption of a hypothesised value, \(\mu'\), can be found. This is then taken to define the distribution of the \(\tilde{q}_{\mu}\) statistic, and consequently the size of the test corresponding to the observed value of the count. The size of the test can be quoted as a \(p\)-value, or equivalently as the confidence level, which is the complement of the size of the test (one minus the \(p\)-value). As is conventional in the particle physics community, the final measure of statistical agreement is presented in terms of what is known as the CL\(_{s}\) method [196][234]. Then, for a given distribution, CL\(_{s}\) can be evaluated separately for each bin, where the bin with the largest CL\(_{s}\) value (and correspondingly smallest \(p_{s+b}\) value) is taken to represent the sensitivity measure used to evaluate each distribution, a process outlined in the section on dynamical data selection.
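
To make the last step concrete, the sketch below turns a variance \(\sigma_\mu^2\) into \(p_{s+b}\) and a CL\(_{s}\)-style exclusion for one bin, and then picks the most sensitive bin of a distribution. It rests on assumptions not spelled out in the text: the standard asymptotic (Gaussian) approximation for the test statistic, a fitted signal strength of zero because the data are taken to equal the background expectation, and hypothetical function names. Here `cls_ratio` denotes \(p_{s+b}/(1-p_b)\) and `exclusion` its complement, which is the quantity that is largest for the most sensitive bin.

```python
import math


def normal_upper_tail(x):
    """1 - Phi(x) for a standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def asymptotic_cls(mu, sigma_mu):
    """Return (p_sb, cls_ratio, exclusion) assuming a fitted strength of zero.

    With mu_hat = 0 the test statistic is q = (mu / sigma_mu)**2.
    """
    sqrt_q = mu / sigma_mu
    p_sb = normal_upper_tail(sqrt_q)
    # 1 - p_b = Phi(sqrt_q - mu / sigma_mu), which is Phi(0) = 0.5 in this setup.
    one_minus_p_b = 1.0 - normal_upper_tail(sqrt_q - mu / sigma_mu)
    cls_ratio = p_sb / one_minus_p_b
    return p_sb, cls_ratio, 1.0 - cls_ratio


def most_sensitive_bin(sigma_mus, mu=1.0):
    """Index and exclusion of the bin with the largest exclusion."""
    exclusions = [asymptotic_cls(mu, sig)[2] for sig in sigma_mus]
    best = max(range(len(exclusions)), key=exclusions.__getitem__)
    return best, exclusions[best]


# Illustrative per-bin values of sigma_mu for one distribution.
best_index, best_exclusion = most_sensitive_bin([2.0, 0.5, 1.0])
```

In this approximation the bin with the smallest \(\sigma_\mu\) always gives the smallest \(p_{s+b}\) and the largest exclusion, which matches the selection rule described above.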

Armed with a list of selected sensitive distributions with minimal correlations, a total combined CL\(_{s}\) across all considered channels can then be constructed from the product of the likelihoods. This leaves the core of the methodology presented here unchanged; the effect is simply to extend the covariance matrix. The overall result gives a probability, for each tested parameter set, that the observed counts \(n_{i}\), across all the measurement bins considered, are compatible with the full signal strength hypothesis.

Finally it is noted that this methodology has been designed to simply profile BSM contributions against data taken. This can be extended to incorporate a separate background simulation or include correlation between bins where available.

[3] This is not unexpected; the construction up to this point has been designed to look at smoothly falling, well-measured processes at energies that the LHC is designed to probe. This is, however, a result that should be monitored when considering different models.

[4] For the cases considered here the results were found to be equivalent, implying that the tested parameter space values fall into the asymptotic, or large sample, limit, and so the Asimov approach is used.

## Limitations

Most of the limitations come from the fact that (at present) Contur assumes the data are identically equal to the SM. This is an assumption that is reasonable for distributions where the uncertainties on the SM prediction are not larger than the uncertainties on the data. It is also the assumption made in the control regions of many searches, where the background evaluation is “data driven”.

Because of this, and because of the way correlations are treated, Contur as currently implemented is best adapted to identifying kinematic features (mass peaks, kinematic edges) and will be less sensitive to smooth deviations in normalisation. In particular, since we currently take the data to be identically equal to the SM expectation, we will be insensitive to a signal which might in principle arise as the cumulative effect of a number of statistically insignificant deviations across a range of experimental measurements. To do this properly requires an extensive evaluation of the theoretical uncertainties on the SM predictions for each channel. (An extension of the method planned for future work.)

Additionally, in low-statistics regions, outlying events in the tails of the data will not lead to a weakening of the limit, as would be the case in a search. However, measurements unfolded to the particle level are typically performed in bins with a requirement of a minimum number of events in any given bin, reducing the impact of this effect (and also weakening the exclusion limits). Our limits focus on the impact of high-precision measurements on the BSM model, in which systematic uncertainties typically dominate.

For these reasons, the limits derived are described as expected limits, and could be seen as delineating regions where the measurements are sensitive and deviations are disfavoured. In regions where the confidence level is high, they do represent a real exclusion.