### Model

Our aim was to estimate a treatment effect \({\theta }_{1}\) in a target subgroup of patients within a trial (e.g. the younger children in ODYSSEY), while borrowing information obtained from a larger subgroup of patients within the same trial. Suppose that data from the target subgroup provide an estimate \({y}_{1}\) of \({\theta }_{1}\) with standard error \({\sigma }_{1}\), and we assume:

$${y}_{1}\sim N\left({\theta }_{1},{\sigma }_{1}^{2}\right)$$

(1)

Suppose also that data from the larger subgroup (e.g. the older children in ODYSSEY) provide an estimate \({y}_{0}\) of the treatment effect \({\theta }_{0}\) in the larger subgroup, with standard error \({\sigma }_{0}\), and we assume:

$${y}_{0}\sim N\left({\theta }_{0},{\sigma }_{0}^{2}\right)$$

(2)

We introduce an interaction parameter \(\delta\) to describe the relationship between treatment effects in the two subgroups: \({\theta }_{1}={\theta }_{0}+\delta\). Elicitation can be used to obtain opinions about likely values for \(\delta\), which represents an interaction between treatment and subgroup. In the ODYSSEY trial, the treatment effect of interest is a risk difference and \(\delta\) is a difference in risk differences. We assume a normal distribution for \(\delta\) and use elicited opinion to inform choice of the standard deviation \({\sigma }_{\delta }\):

$$\delta \sim N\left(0,{\sigma }_{\delta }^{2}\right)$$

(3)

We choose a mean of 0 for the distribution of \(\delta\), rather than using expert opinion to inform this assumption, because we want clinical opinion to inform the weight given to data from the larger subgroup but not to alter the location of the estimate for the smaller subgroup directly. We specify a flat normal prior for \({\theta }_{0}\), since we do not want to introduce any prior belief about the magnitude of the treatment effect:

$${\theta }_{0}\sim N\left(0,{10}^{6}\right)$$

(4)

Under a framework proposed by Hobbs et al., the resulting prior for \({\theta }_{1}\) is a location commensurate prior with commensurability parameter \({\sigma }_{\delta }\) [9]. A commensurate prior is a prior distribution describing the extent to which a parameter in a new study varies around the corresponding parameter in a previous study or studies. Larger values of \({\sigma }_{\delta }\) represent greater uncertainty about the magnitude of the interaction and correspond to the larger subgroup of patients contributing less information to the target subgroup analysis. A value of \({\sigma }_{\delta }=0\) represents certainty that \({\theta }_{1}={\theta }_{0}\) and would result in the two subgroups being pooled.

In a combined analysis using data sets from both subgroups, the treatment effect in the target subgroup is estimated as follows:

$${\theta }_{1}|{y}_{1},{y}_{0},{\sigma }_{\delta }\sim N\left(\frac{{y}_{1}/{\sigma }_{1}^{2}+{y}_{0}/\left({\sigma }_{0}^{2}+{\sigma }_{\delta }^{2}\right)}{1/{\sigma }_{1}^{2}+1/\left({\sigma }_{0}^{2}+{\sigma }_{\delta }^{2}\right)},\frac{1}{1/{\sigma }_{1}^{2}+1/\left({\sigma }_{0}^{2}+{\sigma }_{\delta }^{2}\right)}\right)$$

(5)
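
As an illustration of equation (5), the following Python sketch computes the posterior mean and variance of \({\theta }_{1}\) by precision weighting. The function name and the numerical inputs are hypothetical and are not taken from ODYSSEY; the standard errors are treated as fixed and known, as in the model.

```python
def posterior_theta1(y1, se1, y0, se0, sd_delta):
    """Posterior mean and variance of theta_1 under equation (5).

    y1, se1  : estimate and standard error in the target subgroup
    y0, se0  : estimate and standard error in the larger subgroup
    sd_delta : prior standard deviation of the interaction delta
    """
    precision_target = 1.0 / se1**2
    # Borrowed data are down-weighted by the extra variance of delta.
    precision_borrowed = 1.0 / (se0**2 + sd_delta**2)
    variance = 1.0 / (precision_target + precision_borrowed)
    mean = variance * (y1 * precision_target + y0 * precision_borrowed)
    return mean, variance


# Hypothetical risk differences and standard errors (illustration only).
mean, variance = posterior_theta1(y1=-0.02, se1=0.08, y0=-0.05, se0=0.03, sd_delta=0.05)
print(f"posterior mean {mean:.4f}, posterior SD {variance ** 0.5:.4f}")
```

As \({\sigma }_{\delta }\) grows, the posterior approaches the estimate from the target subgroup alone; with \({\sigma }_{\delta }=0\) it reduces to an inverse-variance pooled estimate.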

We note that \({\sigma }_{1}^{2}\) and \({\sigma }_{0}^{2}\) are assumed fixed and known, as estimated from the data, and no allowance is made for their uncertainty. The motivation for choosing a simple model and assuming normal distributions is that the variance of \(\delta\) has a direct correspondence to the relative weights allocated to the two subgroups of patients in the analysis. This simplifies communicating to clinicians how uncertainty about \(\delta\) affects the results of the Bayesian analysis. In this analysis, the relative weight given to the larger subgroup in estimation of the treatment effect in the target subgroup is:

$$\frac{1}{{\sigma }_{0}^{2}+{\sigma }_{\delta }^{2}}/\left(\frac{1}{{\sigma }_{1}^{2}}+\frac{1}{{\sigma }_{0}^{2}+{\sigma }_{\delta }^{2}}\right)$$

(6)

Derivations of formulae (5) and (6) are provided in supplementary material (Additional file 1), together with a mathematical description of the model.
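
Expression (6) simplifies algebraically to \({\sigma }_{1}^{2}/\left({\sigma }_{1}^{2}+{\sigma }_{0}^{2}+{\sigma }_{\delta }^{2}\right)\), so a chosen weight can be converted back to a value of \({\sigma }_{\delta }\). The Python sketch below shows both directions of this correspondence; the function names and example values are hypothetical.

```python
import math


def weight_larger_subgroup(se1, se0, sd_delta):
    """Relative weight given to the larger subgroup, expression (6)."""
    precision_borrowed = 1.0 / (se0**2 + sd_delta**2)
    return precision_borrowed / (1.0 / se1**2 + precision_borrowed)


def sd_delta_for_weight(weight, se1, se0):
    """Invert expression (6): the sd of delta implied by a chosen weight."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    var_delta = se1**2 * (1.0 - weight) / weight - se0**2
    if var_delta < 0.0:
        # The largest attainable weight occurs at sd_delta = 0.
        raise ValueError("weight exceeds the maximum attainable with sd_delta = 0")
    return math.sqrt(var_delta)


# Hypothetical standard errors (illustration only).
w = weight_larger_subgroup(se1=0.08, se0=0.03, sd_delta=0.05)
print(f"weight {w:.3f} corresponds to sd_delta {sd_delta_for_weight(w, 0.08, 0.03):.3f}")
```

A weight of 0 corresponds to an infinitely large \({\sigma }_{\delta }\), which is why the inverse function excludes it.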

### Elicitation methods

Expert opinion was sought on the interaction parameter \(\delta\) representing the difference between the risk difference in the younger children weighing < 14 kg and the risk difference in the older children weighing ≥ 14 kg.

A pilot elicitation study was carried out in which four methods were implemented and evaluated by five experts with paediatric or statistical expertise (authors AB, DF, DMG, CLM and AT). Feedback indicated that experts found it helpful to be asked the same question in multiple ways and then to be asked to moderate their answers, because this clarified their thought process. This approach also allowed us to check and discuss the coherence of the experts’ statements about their uncertainty [10]. We therefore included three methods in our subsequent elicitation exercise, chosen because they were well understood. The fourth method, which involved eliciting an estimate together with an inter-quartile range, was excluded because it was not easily communicated or understood.

In the final elicitation procedure, thirteen experts practising as paediatric HIV clinicians (including eleven ODYSSEY trial investigators and one member of the ODYSSEY Endpoint Review Committee) were asked to provide initial answers under each of two methods (stages 1 and 2 below) and were then asked to moderate their answers by providing an answer under a third method (stage 3). The stages of the elicitation procedure were identical across experts.

#### Stage 1

Experts were asked to assume that data were available from a very large trial comparing DTG with SOC regimens in older children weighing ≥ 14 kg, in which the failure rate by 96 weeks in the SOC arm was 18% and the difference in failure rates was estimated as 5% in favour of the DTG arm. They were asked to suppose that the trial was so large that sampling variability was close to zero and the observed estimate was very close to the true treatment effect in older children. This assumption was discussed to ensure that experts understood that our focus was on uncertainty arising from imperfect knowledge rather than uncertainty arising from sampling variation [8]. We note that the assumed values were hypothetical.

Opinion was elicited on the risk difference in younger children recruited weighing < 14 kg (aged under approximately 3 years, but all older than 4 weeks), rather than directly on the difference in risk differences. To elicit a range for the experts’ uncertainty about the risk difference in younger children, we asked them to consider what size of true difference in younger children would surprise them, first in the direction of more extreme risk differences favouring DTG and then in the opposite direction. These values formed their uncertainty range. Eliciting uncertainty ranges provides a mean and variance for each expert’s probability distribution, under the assumption of normality. People have been shown to perform better when assessing intervals rather than variances, and it is preferable not to elicit a mean value first, to avoid anchoring effects [8, 10].

Next, experts were asked to assign a probability to their chosen uncertainty range, to represent how likely they believed it was that the true risk difference in younger children was included. They were asked to think about placing 100 counters either inside or on either side of the range, and were given an opportunity to use physical counters or draw their distribution to help visualise their probability beliefs, following a “bins and chips” approach to assigning probabilities to intervals (Figure S2) [11, 12].

#### Stage 2

The second stage was identical to the first stage, except for the risk difference assumed in older children. Here, experts were asked to assume that failure rates of 18% by 96 weeks had been observed in both arms in a very large trial comparing DTG to SOC in children recruited weighing ≥ 14 kg.

#### Stage 3

Experts were asked to consider the weight allocated to the data from older children if conclusions about the risk difference in younger children were based on a combined analysis of both subgroups. They were informed that the data from older children would receive approximately 90% of the weight if weightings were based on sample sizes alone or 0% of the weight if these data were ignored. They were then asked how much weight they would like allocated to the data from older children. Experts were not required to provide a rationale for their beliefs, but comments on rationale were documented if mentioned. An Excel spreadsheet was provided (Figure S1 in Additional file 1) to illustrate the correspondence between weights allocated to the data from older children and beliefs about the uncertainty range for the treatment effect in younger children. Choices for the weight, assumed risk difference in older children and level of uncertainty could be altered within the spreadsheet. Experts were asked to consider their initial range choices and think about the correspondence between weight in combined analysis and uncertainty ranges, and then make a final choice of weight to be allocated to the older children’s data.
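
The figures presented to experts in Stage 3 can be reproduced approximately with the Python sketch below. It assumes, purely for illustration, equal allocation between arms, a failure rate of 18% in both arms (the Stage 2 scenario), the planned sample sizes of 700 and 80 children, and a normal distribution for \(\delta\) centred on an assumed risk difference in older children that is known without sampling error. These assumptions and all function names are illustrative; the spreadsheet in Figure S1 may have been implemented differently.

```python
import math
from statistics import NormalDist

N_OLDER, N_YOUNGER = 700, 80   # planned subgroup sizes stated in the text
FAILURE_RATE = 0.18            # assumed in both arms (illustrative assumption)


def se_risk_difference(n_total, rate=FAILURE_RATE):
    """Standard error of a risk difference with two equal-sized arms."""
    n_per_arm = n_total / 2
    return math.sqrt(2 * rate * (1 - rate) / n_per_arm)


def range_for_weight(weight, rd_older, coverage, se1, se0):
    """Central uncertainty range for the risk difference in younger
    children implied by a chosen weight on the older children's data."""
    var_delta = se1**2 * (1 - weight) / weight - se0**2
    if var_delta < 0:
        raise ValueError("weight exceeds the sample-size-based maximum")
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    half_width = z * math.sqrt(var_delta)
    return rd_older - half_width, rd_older + half_width


se1 = se_risk_difference(N_YOUNGER)
se0 = se_risk_difference(N_OLDER)

# Weight on older children if weighting reflects sample size alone (sd_delta = 0).
max_weight = se1**2 / (se1**2 + se0**2)
print(round(max_weight, 3))  # 0.897, i.e. approximately 90%

# Range implied by a 50% weight, a risk difference of -0.05 in older
# children and 90% coverage (all three values are hypothetical choices).
low, high = range_for_weight(0.5, rd_older=-0.05, coverage=0.90, se1=se1, se0=se0)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions, choosing a smaller weight widens the implied range, which is the correspondence experts were asked to consider before making their final choice.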

### Elicitation process

Paediatricians with expertise in HIV treatment and management, nearly all of whom were researchers enrolling children in ODYSSEY and/or other paediatric HIV studies, were invited to participate in the elicitations. As it is recommended that opinion-based prior distributions represent a breadth of opinions [13], we invited experts from several different countries. The chosen experts all had sufficient experience and knowledge to provide valid and reliable descriptions of the quantities of interest [12]. Conducting elicitations remotely would have been possible, but face-to-face elicitations were preferred because experts can find the elicitation process difficult and it is useful to have a facilitator on hand to answer questions [8, 14]. We therefore carried out the elicitation exercise alongside an international PENTA-ID Network meeting [15] in May 2019. Recruitment was still ongoing at that time and elicitations assumed that the trial would include 700 children weighing ≥ 14 kg and 80 children weighing < 14 kg. The elicitations were carried out before results from the older children were available so that opinions would not be influenced by knowledge of the results. This avoided using the data from older children twice in the Bayesian analysis. Thirteen experts participated in the elicitation exercise; twelve were interviewed face-to-face by RT in 1:1 meetings and one was interviewed by telephone, since a face-to-face meeting was not possible.

### Analysis of elicitation results

We mapped each expert’s chosen uncertainty range \(\left(a,b\right)\) and corresponding probability *p* to a normal distribution. The following values were calculated for the mean \(\mu\) and standard deviation \(\sigma\) of the expert’s probability distribution:

$$\mu =\frac{a+b}{2}$$

(7)

$$\sigma =\frac{b-a}{2{\Phi }^{-1}\left(\frac{p+1}{2}\right)}$$

(8)

where \(\Phi\) is the cumulative distribution function of the standard normal distribution. We present means and inter-quartile ranges from the fitted distributions rather than the original ranges chosen, in order that distributions are comparable across experts who assigned different probabilities to their ranges.
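
The mapping from an elicited range and probability to a normal distribution can be sketched in Python as follows. The function names and the example range are hypothetical; only the Python standard library is used.

```python
from statistics import NormalDist


def fit_normal(a, b, p):
    """Mean and sd of the normal distribution that places probability p
    symmetrically on the elicited range (a, b)."""
    if not a < b:
        raise ValueError("range must satisfy a < b")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf((p + 1) / 2)   # inverse standard normal CDF
    mu = (a + b) / 2
    sigma = (b - a) / (2 * z)
    return mu, sigma


def interquartile_range(mu, sigma):
    """25th and 75th percentiles of the fitted distribution."""
    z75 = NormalDist().inv_cdf(0.75)
    return mu - z75 * sigma, mu + z75 * sigma


# Hypothetical expert: range -0.10 to 0.00 held with probability 0.95.
mu, sigma = fit_normal(a=-0.10, b=0.00, p=0.95)
q25, q75 = interquartile_range(mu, sigma)
print(f"mean {mu:.3f}, sd {sigma:.4f}, IQR {q25:.3f} to {q75:.3f}")
```

Reporting the inter-quartile range of the fitted distribution puts experts who attached, say, 80% and 95% to their ranges on a common scale.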

The relative weight to be allocated to evidence from the ≥ 14 kg children in Bayesian analysis of the < 14 kg children was derived from the clinical opinions elicited. To pool opinions across experts, we used the median of the weights chosen. This method of pooling was chosen in order that the pooled weight represents the opinion of a ‘typical’ expert and is not influenced by extreme opinions.
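
A brief Python illustration of why the median was preferred for pooling is given below. The weights are hypothetical values for thirteen experts and are not the elicited ODYSSEY results.

```python
from statistics import mean, median

# Hypothetical weights (as proportions) for thirteen experts.
weights = [0.30, 0.40, 0.40, 0.50, 0.50, 0.50, 0.50,
           0.60, 0.60, 0.60, 0.70, 0.70, 0.95]

pooled_weight = median(weights)

# Replace the most extreme opinion with its opposite extreme:
# the median is unchanged, whereas the mean shifts.
with_outlier = weights[:-1] + [0.0]
print(pooled_weight, median(with_outlier))
print(round(mean(weights), 3), round(mean(with_outlier), 3))
```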

### Planned Bayesian analysis in the ODYSSEY trial

A Bayesian analysis will be reported for the children weighing < 14 kg, alongside frequentist analyses of the children weighing < 14 kg and of the whole trial population (< 14 kg and ≥ 14 kg). The analysis is expected to be performed in early 2022. The Statistical Analysis Plan was written in advance of conducting the elicitations and specifies that if at least 80% of the experts chose weights within a 30% absolute range, the Bayesian analysis will be reported as the primary analysis; otherwise, the Bayesian analysis will be reported as a secondary analysis and the frequentist analysis of the < 14 kg children will be reported as the primary analysis of these data. The range threshold of 30% was chosen on the basis of how much variation in opinion among the clinical experts was considered acceptable.
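
One possible reading of this pre-specified rule is sketched in Python below. It assumes that the rule is met when some interval 30 percentage points wide, with inclusive end points, contains the weights of at least 80% of experts; this interpretation, the function names and the example weights are assumptions rather than details taken from the Statistical Analysis Plan.

```python
def max_share_within_range(weights_pct, width_pct=30):
    """Largest proportion of experts whose weights (in percentage points)
    fall inside any single interval of the given width."""
    if not weights_pct:
        raise ValueError("at least one weight is required")
    ordered = sorted(weights_pct)
    best = 0
    start = 0
    for end, upper in enumerate(ordered):
        # Shrink the window until it spans no more than width_pct.
        while upper - ordered[start] > width_pct:
            start += 1
        best = max(best, end - start + 1)
    return best / len(ordered)


def bayesian_is_primary(weights_pct, width_pct=30, min_share=0.80):
    """True if the assumed agreement criterion is met."""
    return max_share_within_range(weights_pct, width_pct) >= min_share


# Hypothetical weights for thirteen experts (not the elicited values).
example = [30, 40, 40, 50, 50, 50, 50, 60, 60, 60, 70, 70, 95]
print(max_share_within_range(example), bayesian_is_primary(example))
```

In this hypothetical example, eleven of the thirteen weights lie between 40% and 70%, so the criterion would be met.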