The considerations involved in combining the datasets and defining the test are outlined here.


The approach taken is to consider simplified BSM models in the light of existing measurements which have already been shown to agree with SM expectations. Thus this is inherently an exercise in limit-setting rather than discovery. The assumption is that a generic, measurement-based approach such as this will not be competitive in terms of sensitivity, or speed of discovery, with a dedicated search for a specific BSM final-state signature. However, it will have the advantage of breadth of coverage, and will make a valuable contribution to physics at the energy frontier whether or not new signatures are discovered at the LHC. In the case of a new discovery, many models will be put forward to explain the data, as has already been seen, for example, :cite:PhysRevLett.116.150001 after the 750 GeV diphoton anomaly reported by ATLAS and CMS at the end of 2015 and start of 2016 [32][34]. Checking these models for consistency with existing measurements will be vital for unravelling whatever the data might be telling us. As will be shown in subsequent sections, models designed to explain one signature may have somewhat unexpected consequences in different final states, some of which have already been precisely measured. If it should turn out that no BSM signatures are in the end confirmed at the LHC, this approach potentially offers the broadest and most generic constraints on new physics, and motivates the most precise possible model-independent measurements over a wide range of final states, giving the best chance of an indirect pointer to the eventual scale of new physics.

Dynamical data selection

Starting with the measurements under consideration, we define a procedure to combine exclusion limits from different measured distributions. The data used for comparison come in the form of histograms, which do not carry information about the correlations between uncertainties, even though in several cases detailed information is made available in the experimental papers. There are highly correlated uncertainties in several measurements, for example on the integrated luminosity, or the energy scale of jet measurements. In some cases these are dominant. Including correlations would be a highly complex process since, as well as correlations within a single dataset, there are also common systematic uncertainties between different results, which are generally not provided by the experiments. There are also overlaps between event samples used in many different measurements, which lead to non-trivial correlations in the statistical uncertainties. To avoid spuriously high exclusion rates from counting what might be the same exclusion several times against different datasets, we take the following approach:

  1. Divide the measurements into groups that have no overlap in the event samples used, and hence no statistical correlation between them. These measurements are grouped by, crudely, different final states, different experiments, and different beam energies (see Table [tab:Rivet]).
  2. Scan within each group for the most significant deviation between BSM+SM and SM. This is done distribution-by-distribution and bin-by-bin within distributions. Use only the most significant deviation, and disregard the rest. Although selecting the most significant deviation sounds intuitively suspect, in this case it is a conservative approach, since we are setting limits, and discarding the less-significant bins simply reduces sensitivity. The use of a single bin from each measurement removes the dominant effect of highly correlated systematic uncertainties within a single measurement. Where a number of statistically independent measurements exist within a group, their likelihoods may be combined to give a single likelihood ratio from the group.
  3. Combine the likelihood ratios of the different groups to give a single exclusion limit.
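The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the results: the group and distribution names are invented, and `q` stands for a hypothetical per-bin test statistic (minus twice the log of the likelihood ratio), so that a larger `q` means a more significant deviation. For simplicity the sketch keeps exactly one bin per group and does not combine several statistically independent measurements inside a group.

```python
def combine_most_significant(bins):
    """Keep the most significant bin in each group, then combine the groups.

    `bins` is an iterable of (group, distribution, bin_index, q) tuples.
    `q` is a hypothetical per-bin test statistic, -2 ln(likelihood ratio);
    larger q means a larger deviation between BSM+SM and SM.
    """
    best = {}
    for group, distribution, bin_index, q in bins:
        # Step 2: within a group, retain only the single most significant bin.
        if group not in best or q > best[group][2]:
            best[group] = (distribution, bin_index, q)
    # Step 3: groups share no events, so their likelihood ratios multiply,
    # which is the same as adding the q values.
    q_total = sum(entry[2] for entry in best.values())
    return best, q_total


# Invented example input: two groups, three distributions, four bins.
example = [
    ("ATLAS_7_JETS", "dijet_mass", 3, 1.2),
    ("ATLAS_7_JETS", "dijet_mass", 4, 4.5),
    ("ATLAS_7_JETS", "inclusive_pt", 1, 2.0),
    ("CMS_7_EW", "z_pt", 7, 0.8),
]
best, q_total = combine_most_significant(example)
```

Discarding all but one bin per group can only lower `q_total` relative to using every bin, which is why the selection is conservative when setting limits.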

Statistical Method

The question we wish to ask of any given BSM proposal is ‘at what significance do existing measurements, which agree with the SM, already exclude this?’ For all the measurements considered, comparisons to SM calculations have shown consistency between them and the data. Thus, as a starting point, we take the data as our “null signal”, and we superpose onto them the contribution from the BSM scenario under consideration. The uncertainties on the data then define the allowed space for these extra BSM contributions.

Taking each bin of each distribution considered as a separate statistic to be tested, a likelihood function for each bin can be constructed as follows,

\[L(\mu, b, \sigma_{b}, s) = \frac{(\mu s + b)^{n}}{n!} \exp\big(-(\mu s + b)\big) \times \frac{1}{\sqrt{2 \pi}\, \sigma_{b}} \exp\left(-\frac{(m - b)^{2}}{2 \sigma_{b}^{2}}\right) \times \frac{(\tau s)^{k}}{k!}\exp\big(-\tau s\big)\,, \tag{1}\]

where the three factors are:

  • A Poisson event count. The measurements considered are differential cross sections, so each is multiplied by the integrated luminosity, taken from the experimental paper behind the analysis, to convert it to an event count in each bin (and likewise to obtain the additional events that the new physics would have added to the measurement). The statistic in each tested bin then comprises:
    • \(s\), the parameter defining the BSM signal event count.
    • \(b\), the parameter defining the background event count.
    • \(n\), the observed event count.
    • \(\mu\), the signal strength parameter modulating the strength of the signal hypothesis tested, so that \(\mu=0\) corresponds to the background-only hypothesis and \(\mu=1\) to the full signal strength hypothesis.
  • A convolution with a Gaussian defining the distribution of the background count, where the following additional components are identified:
    • \(m\), the background count. The expectation value of this count, which is used to construct the test, is taken as the central value of the measured data point.
    • \(\sigma_{b}\), the uncertainty on the background event count, taken from the data as the 1\(\sigma\) error on a Gaussian. The uncertainties are taken as the combination of statistical and systematic uncertainties in quadrature; typically the systematic uncertainty dominates.
  • An additional Poisson term describing the Monte Carlo error on the simulated BSM signal count with \(k\) being the actual number of generated BSM events. The expectation value of \(k\) is related to \(s\) by a factor \(\tau\), which is the ratio of the generated MC luminosity to the experimental luminosity.
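As a concrete reading of equation (1), the following minimal sketch evaluates the logarithm of the single-bin likelihood as the sum of its three factors. The function name and argument order are illustrative choices, not taken from any existing code; the symbols follow the definitions above.

```python
import math


def log_likelihood(mu, b, s, n, m, sigma_b, k, tau):
    """Log of the single-bin likelihood in equation (1).

    Requires mu * s + b > 0, s > 0, tau > 0 and sigma_b > 0.
    """
    lam = mu * s + b
    # Poisson term for the observed count n with mean mu * s + b.
    ln_poisson_n = n * math.log(lam) - lam - math.lgamma(n + 1)
    # Gaussian term constraining the background b around the measurement m.
    ln_gauss_m = (-0.5 * ((m - b) / sigma_b) ** 2
                  - math.log(math.sqrt(2.0 * math.pi) * sigma_b))
    # Poisson term for the k generated MC signal events with mean tau * s.
    ln_poisson_k = k * math.log(tau * s) - tau * s - math.lgamma(k + 1)
    return ln_poisson_n + ln_gauss_m + ln_poisson_k
```

Working with the logarithm avoids overflow in the factorials for large counts and gives directly the quantity whose second derivatives are needed later.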

This likelihood is then used to construct a test statistic based on the profile likelihood ratio, following the arguments laid out in Ref. [36]. In particular, the \(\tilde{q}_{\mu}\) test statistic is constructed. This enables the setting of a one-sided upper limit on the confidence in the strength parameter hypothesis, \(\mu\), which is desirable since, when the observed strength parameter exceeds the tested hypothesis, agreement with the hypothesis should not diminish. In addition, this construction places a lower limit on the strength parameter, where any observed fluctuations below the background-only hypothesis are said to agree with the background-only hypothesis [3]. The required information is then the sampling distribution of this test statistic. This can be evaluated either using the so-called Asimov data set to build an approximate distribution of the considered test statistic, or explicitly using multiple Monte Carlo ‘toy model’ tests [4].

The information needed to build the approximate sampling distributions is contained in the inverse covariance matrix, which is composed of the expectation values of the negated second derivatives, with respect to the parameters (\(\mu, b\) and \(s\)), of the log of the likelihood given in equation (1). The second derivatives are as follows:

\[\begin{aligned}
\mu \mu : &\quad \frac{\partial^2 \ln L}{\partial \mu^2} = \frac{-n s^2}{(\mu s + b)^2} \\
b b : &\quad \frac{\partial^2 \ln L}{\partial b^2} = \frac{-n}{(\mu s + b)^2} - \frac{1}{\sigma_b^2} \\
s s : &\quad \frac{\partial^2 \ln L}{\partial s^2} = \frac{-n \mu^2}{(\mu s + b)^2} - \frac{k}{s^2} \\
\mu s = s \mu : &\quad \frac{\partial^2 \ln L}{\partial \mu \, \partial s} = \frac{n b}{(\mu s + b)^2} - 1 \\
\mu b = b \mu : &\quad \frac{\partial^2 \ln L}{\partial \mu \, \partial b} = \frac{-n s}{(\mu s + b)^2} \\
b s = s b : &\quad \frac{\partial^2 \ln L}{\partial s \, \partial b} = \frac{-n \mu}{(\mu s + b)^2}.
\end{aligned} \tag{2}\]

These are arranged in the inverse covariance matrix as follows:

\[V^{-1} = - E \begin{bmatrix} \mu\mu & \mu s & \mu b \\ s \mu & s s & s b \\ b \mu & b s & b b \end{bmatrix} \tag{3}\]

The variance of \(\mu\) is extracted from the inverse of the matrix given in (3) as:

\[\sigma_\mu^{2} = V_{\mu,\mu}\]

In order to evaluate this, the counting parameters (\(n, m\) and \(k\)) are evaluated at their Asimov values, following arguments detailed in Ref. [36]. These are taken as follows,

  • \(n_{A} = E[n] = \mu' s + b\). The total count under the assumed signal strength, \(\mu'\), which for the purposes of this argument is equal to 1.
  • \(m_{A}=E[m] = b\). The background count is defined as following a Gaussian distribution with a mean of \(b\).
  • \(k_{A} = E[k] = \tau s\). The signal count is defined as following a Poisson distribution with a mean of \(\tau s\).
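The chain from the second derivatives, through the matrix in (3), to \(\sigma_\mu^2\) can be sketched numerically as follows. This is a minimal illustration, not the code behind the results. It assumes that the derivatives are evaluated at \(\mu = \mu'\), which the text does not state explicitly, and the function name is an illustrative choice. Note that \(m\) does not appear in any second derivative, so \(m_A\) is not needed here.

```python
def asimov_mu_variance(s, b, sigma_b, tau, mu_prime=1.0):
    """Variance of mu from the inverse of V^{-1}, using Asimov counts.

    Assumption (not stated in the text): the second derivatives are
    evaluated at mu = mu_prime.
    """
    mu = mu_prime
    lam = mu * s + b
    n_a = mu_prime * s + b   # n_A = E[n]
    k_a = tau * s            # k_A = E[k]
    # Negated second derivatives of ln L with n -> n_A and k -> k_A.
    f_mm = n_a * s ** 2 / lam ** 2
    f_ss = n_a * mu ** 2 / lam ** 2 + k_a / s ** 2
    f_bb = n_a / lam ** 2 + 1.0 / sigma_b ** 2
    f_ms = 1.0 - n_a * b / lam ** 2
    f_mb = n_a * s / lam ** 2
    f_sb = n_a * mu / lam ** 2
    # Determinant of the symmetric 3x3 matrix, parameters ordered (mu, s, b).
    det = (f_mm * (f_ss * f_bb - f_sb ** 2)
           - f_ms * (f_ms * f_bb - f_sb * f_mb)
           + f_mb * (f_ms * f_sb - f_ss * f_mb))
    # (mu, mu) element of the inverse: its cofactor divided by the determinant.
    return (f_ss * f_bb - f_sb ** 2) / det
```

When \(\sigma_b\) is very small and \(\tau\) very large, \(b\) and \(s\) are effectively fixed and the result tends to \((\mu' s + b)/s^2\), the variance from the Poisson term alone. Larger background or Monte Carlo uncertainties can only increase it.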

Using this data set, the variance of the strength parameter, \(\mu\), under the assumption of a hypothesised value, \(\mu'\), can be found. This is then taken to define the distribution of the \(\tilde{q}_{\mu}\) statistic, and consequently the size of the test corresponding to the observed value of the count. The size of the test can be quoted as a \(p\)-value, or equivalently as a confidence level, which is the complement of the size of the test. As is conventional in the particle physics community, the final measure of statistical agreement is presented in terms of the CL\(_{s}\) method [47][56]. For a given distribution, CL\(_{s}\) can then be evaluated separately for each bin, and the bin with the largest CL\(_{s}\) value (and correspondingly the smallest \(p_{s+b}\) value) is taken to represent the sensitivity of that distribution, a process outlined in section [sec:selec].
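For orientation, the standard large-sample expressions for the two \(p\)-values entering CL\(_{s}\) can be written down in a few lines. The sketch below is an assumption-laden illustration rather than the procedure used here: it takes as given the square root of the observed test statistic and the square root of its Asimov value, uses the asymptotic forms that hold while the observed value does not exceed the Asimov one, and does not reproduce how those two inputs are obtained from \(\sigma_\mu\). The function names are illustrative.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cls_asymptotic(sqrt_q_obs, sqrt_q_asimov):
    """Asymptotic p_{s+b}, 1 - p_b and their ratio.

    Assumes sqrt_q_obs <= sqrt_q_asimov, the regime in which these
    simple expressions apply.
    """
    p_sb = 1.0 - normal_cdf(sqrt_q_obs)
    one_minus_p_b = normal_cdf(sqrt_q_asimov - sqrt_q_obs)
    return p_sb, one_minus_p_b, p_sb / one_minus_p_b
```

The ratio returned is small when the signal hypothesis is strongly disfavoured. If the quoted CL\(_{s}\) is meant to grow with the strength of the exclusion, as the pairing of the largest CL\(_{s}\) with the smallest \(p_{s+b}\) suggests, the number quoted would be one minus this ratio; that reading is an assumption.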

Armed with a list of selected sensitive distributions with minimal correlations, a total combined CL\(_{s}\) across all considered channels can then be constructed from the product of the likelihoods. This leaves the core of the methodology presented here unchanged; the effect is simply to extend the covariance matrix. The overall result gives a probability, for each tested parameter set, that the observed counts \(n_{i}\), across all the measurement bins considered, are compatible with the full signal strength hypothesis.

Finally it is noted that this methodology has been designed to simply profile BSM contributions against data taken. This can be extended to incorporate a separate background simulation or include correlation between bins where available.


To be useful in our approach, measurements must be made in as model-independent a fashion as possible. Cross sections should be measured in a kinematic region closely matching the detector acceptance (commonly called ‘fiducial cross sections’) to avoid extrapolation into unmeasured regions, since such extrapolations must always make theoretical assumptions, usually that the SM is valid. The measurements should generally be made in terms of observable final-state particles (e.g. leptons, photons) or objects constructed from such particles (e.g. hadronic jets, missing energy) rather than assumed intermediate states (\(W, Z, H\), top). Finally, differential measurements are most useful, as features in the shapes of distributions are a more sensitive test than simple event rates, especially when there are highly correlated systematic experimental uncertainties, such as those on the integrated luminosity or the jet energy scale.

The measurements we consider fall into five loose and independent classes.

  1. Jets: event topologies with any number of jets but no missing energy, leptons, or photons. In this category there are important measurements from both ATLAS and CMS, many of which have existing analyses. We make use of the highest integrated-luminosity inclusive [13][30], dijet [9][11] and three-jet [14] measurements made in 7 TeV collisions, as well as the jet mass measurement from CMS [29]. Unfortunately results from 8 TeV collisions are rarer, and the only one we can use currently is the four-jet measurement from ATLAS [12].
  2. Electroweak: events with leptons, with or without missing energy or photons. The high-statistics \(W+\)jet and \(Z+\)jet measurements from ATLAS [15][6] and CMS [49], :cite:Khachatryan:2014zya are used. We also use the ATLAS \(ZZ\) and \(W/Z+\gamma\) analyses [4][7], the former of which includes \(E_T^{\rm miss}\), via the \(Z \rightarrow \nu\bar{\nu}\) measurement.
  3. Missing energy, possibly with jets but no leptons or photons. This channel could in principle provide powerful constraints, and has been used in searches (see for example [8]). Unfortunately however, there are currently no fully-corrected particle-level distributions available in this category.
  4. Isolated photons, with or without missing energy, but no leptons. Here we make use of the inclusive [10], diphoton [5] and photon-plus-jet [2] measurements, where available. We also made a new routine for the CMS photon-plus-jet measurement [31].
  5. Signatures specifically based on top quark or Higgs candidates. Most such measurements to date have been made at the ‘parton’ level (that is, corrected using SM MC back to the top or Higgs before decay), and many of them are extrapolated to \(4\pi\) phase space. Both steps increase the model dependence and make them unsuitable for this approach. Recently, however, fiducial, differential, particle-level measurements have begun to appear :cite:Aad:2015hna, :cite:Khachatryan:2016gxp. These are potentially very powerful in excluding some models, but will in principle overlap with the previous categories depending on decay mode. We leave the inclusion of such measurements for future work.

The choice of which measurements are actually included at this stage is driven mainly by the availability of particle-level differential fiducial cross sections implemented in Rivet.

[3] This is not unexpected: the construction up to this point has been designed to look at smoothly falling, well-measured processes at energies that the LHC is designed to probe. This is, however, a result that should be monitored when considering different models.
[4] For the cases considered here the results were found to be equivalent, implying that the tested parameter space values fall into the asymptotic, or large sample, limit, and so the Asimov approach is used.