Inference of synergy/antagonism between anticancer drugs from the pooled analysis of clinical trials

Background: Drug interactions can have a significant impact on the response to combination therapy in anticancer treatment. In some instances these interactions can be anticipated from pre-clinical models; in general, however, anticipating drug interactions in the clinical context is a challenging task. Methods: Here we propose the pooled analysis of clinical trials as a means to investigate drug interactions in anticancer therapy. To this end we collected 1,163 Phase II clinical trials reporting response data on 53,745 subjects. Results: We provide statistical definitions of drug combinations resulting in clinical synergy and antagonism and identify drug combinations in each group. We also quantify the possibility of inferring interactions between three or more drugs from parameters characterizing the action of single agents and two-drug combinations. Conclusions: Our analysis provides a statistical methodology to track the performance of drug combinations in anticancer therapy and to quantify drug interactions in the clinical context.


I. INTRODUCTION
In these supplementary methods we provide a more detailed description of the statistical methods used in the main manuscript.

II. ONE COMBINATION ONE TRIAL
In this section we discuss the simplest case of one agent combination tested in one trial of sample size $N$. The outcome is the number of patients $n$ that responded to treatment. Under the assumption that all patients are equivalent, the probability of obtaining $n$ responses after treating $N$ patients is given by the binomial distribution

$$P(n|p,N) = \binom{N}{n} p^n (1-p)^{N-n}, \quad (1)$$

where $p$ is the probability that a patient responds to treatment. $p$ is an unknown parameter that we will estimate from the available data. To do so we use Bayes' theorem,

$$P(p|n,N)\, P(n,N) = P(n|p,N)\, P(p,N), \quad (2)$$

which reads: the probability of the model parameter $p$ given the data, times the probability of the data, equals the probability of the data given the model parameter $p$, times the probability of the model parameter. $P(p,N)$ is called the prior distribution and $P(p|n,N)$ the posterior distribution. In general we can assume that $p$ and $N$ are independent and, therefore, $P(p,N) = P(p)P(N)$, and (2) can be rewritten as

$$P(p|n,N) = \frac{1}{Z(n,N)}\, P(n|p,N)\, P(p), \quad (3)$$

where $Z(n,N) = P(n,N)/P(N)$. $Z(n,N)$ can also be derived from the normalization of $P(p|n,N)$, $\int_0^1 dp\, P(p|n,N) = 1$, obtaining

$$Z(n,N) = \int_0^1 dp\, P(n|p,N)\, P(p). \quad (4)$$

The prior distribution of $p$ for the binomial model is a beta distribution (2; 3),

$$P(p) = \mathrm{Be}(p;\bar\alpha,\bar\beta) = \frac{p^{\bar\alpha-1}(1-p)^{\bar\beta-1}}{B(\bar\alpha,\bar\beta)}, \quad (5)$$

where $\mathrm{Be}(p;\bar\alpha,\bar\beta)$ denotes the beta distribution, $B(\bar\alpha,\bar\beta)$ denotes the beta function, and $\bar\alpha$ and $\bar\beta$ are called hyperparameters. Using symmetry arguments it can be shown that the correct choice of hyperparameters is $\bar\alpha \ll 1$ and $\bar\beta \ll 1$ (1). From (3) and (5) we obtain that the posterior distribution follows a beta distribution as well, $P(p|n,N) = \mathrm{Be}(p;\alpha,\beta)$ (6), where

$$\alpha = \bar\alpha + n \approx n, \quad (7)$$

$$\beta = \bar\beta + N - n \approx N - n, \quad (8)$$

and the approximations follow after imposing $\bar\alpha \ll 1$ and $\bar\beta \ll 1$. From the posterior distribution we can calculate different quantities; for example, the posterior estimate of the mean of $p$ is $\langle p \rangle = \alpha/(\alpha+\beta) \approx n/N$.
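The posterior update above can be illustrated in a few lines of Python. This is a minimal sketch using scipy; the trial counts (12 responses out of 40 patients) and the helper name `posterior` are hypothetical:

```python
from scipy import stats

def posterior(n, N, a_bar=1e-2, b_bar=1e-2):
    """Beta posterior over the response probability p after observing
    n responses in N patients, with hyperparameters a_bar, b_bar << 1."""
    alpha = a_bar + n       # alpha = a_bar + n ~ n
    beta = b_bar + N - n    # beta  = b_bar + N - n ~ N - n
    return stats.beta(alpha, beta)

post = posterior(n=12, N=40)
print(post.mean())          # posterior mean, close to n/N = 0.3
print(post.interval(0.95))  # 95% credible interval for p
```

Because the beta prior is conjugate to the binomial likelihood, the full posterior is available in closed form, and any summary (mean, mode, credible interval) follows directly from `scipy.stats.beta`.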

III. ONE COMBINATION MULTIPLE TRIALS
In this section we discuss the case of one agent combination that has been tested in one or more trials of sample sizes $N_1, N_2, \ldots, N_T$, where $T$ is the number of trials. The outcome is the number of patients $n_i$ that responded to treatment in trial $i = 1, 2, \ldots, T$. In practice the clinical trials may have been conducted using different doses and/or treatment schedules and in different cancer subtypes. Therefore, the assumption that all patients are equivalent may not represent the real scenario. To deal with this case we assume that within each trial all patients are equivalent, but different trials may be characterized by different response rates. Specifically, we assume there are $G$ groups of trials, each characterized by its own probability of response $p_k$, $k = 1, \ldots, G$, and that each trial belongs to one of these groups. We denote by $g_i$ the group to which trial $i$ belongs, $i = 1, \ldots, T$. We note that $G$ and the $g_i$ are unknown parameters that will be estimated from the data. The probability to observe the data given this model is

$$P(n|p,g,N) = \prod_{i=1}^{T} \binom{N_i}{n_i}\, p_{g_i}^{n_i} (1-p_{g_i})^{N_i-n_i}. \quad (9)$$

Again, using Bayes' theorem, we write

$$P(p,g|n,N)\, P(n,N) = P(n|p,g,N)\, P(p,g,N). \quad (10)$$

Assuming that the response rates, trial groups and sample sizes are independent we obtain

$$P(p,g,N) = P(p)P(g)P(N). \quad (11)$$

The prior distribution of the response rates is, as explained above, given by a beta distribution, now across multiple groups,

$$P(p) = \prod_{k=1}^{G} \mathrm{Be}(p_k;\bar\alpha,\bar\beta). \quad (12)$$

The prior distribution of $g$ can be generated assuming a multinomial model (2), where a trial belongs to group $k$ with probability $\pi_k$, resulting in

$$P(g) = \int d\pi\, P(\pi) \prod_{i=1}^{T} \pi_{g_i}, \quad (13)$$

where $P(\pi)$ is the prior distribution of $\pi$. The prior distribution of the multinomial model is the Dirichlet distribution (2), the generalization of the beta distribution,

$$P(\pi) = \mathrm{Dir}(\pi;\bar\gamma) = \frac{1}{B(\bar\gamma)} \prod_{k=1}^{G} \pi_k^{\bar\gamma-1}, \quad (14)$$

where

$$B(\bar\gamma) = \frac{\Gamma(\bar\gamma)^G}{\Gamma(G\bar\gamma)} \quad (15)$$

is the generalized beta function. Once again, using symmetry arguments, it can be shown that the correct choice of hyperparameters is $\bar\gamma \ll 1$ (1).
Putting all together, from equations (10)-(15) we obtain the prior and posterior distributions. We can integrate the posterior (18) over the $p$ and $\pi$ variables to obtain the marginal posterior distribution for $g$,

$$P(g|n,N) = \frac{1}{Z}\, \frac{\Gamma(G\bar\gamma)}{\Gamma(G\bar\gamma + T)} \prod_{k=1}^{G} \frac{\Gamma(\bar\gamma + T_k)}{\Gamma(\bar\gamma)}\, \frac{B(\bar\alpha + n_k,\, \bar\beta + N_k - n_k)}{B(\bar\alpha,\bar\beta)}, \quad (22)$$

where $T_k$ is the number of trials in group $k$, $n_k = \sum_{i:\, g_i = k} n_i$, $N_k = \sum_{i:\, g_i = k} N_i$, and $Z$ is a normalization constant. Using equation (22) we can discriminate between different clusterings of the trials, including the case where all trials belong to one group, and select the one with the highest probability. The hyperparameters $\bar\alpha$, $\bar\beta$ and $\bar\gamma$ should be chosen as small as possible, taking into account that too small values may compromise the numerical accuracy when computing the $\Gamma$ function. We used $\bar\alpha = \bar\beta = \bar\gamma = 1 \times 10^{-2}$. When $T$ and $G$ are not too large, we can evaluate equation (22) for all possible clusterings of the $T$ trials into $G$ groups and select the clustering with the largest posterior probability $P(g|n,N)$.

As a case study, we analyzed the case of two trials with response probabilities $p_1$ and $p_2$ and sample sizes $N_1$ and $N_2$. At each simulation of the trials, we generated values of $n_1$ and $n_2$ from the binomial distributions $P(n_1|p_1,N_1)$ and $P(n_2|p_2,N_2)$. Then, using equation (22), we determined whether the trials fall into one or two different groups. For the case $N_1 = N_2$ and $p_2 = 1 - p_1$ ($p_2 - p_1 = 1 - 2p_1$), Supplementary Figure 1 shows the fraction of times the variational method predicted that the two trials are statistically different in 1,000 trial simulations. When $p_2 - p_1 > 0.3$, in most cases the variational method correctly predicts that $n_1$ and $n_2$ are generated by different distributions. In contrast, when $p_2 - p_1 < 0.1$, in most cases the variational method fails to detect that $n_1$ and $n_2$ are generated from different distributions. This is, however, expected, since for $p_2 - p_1 < 0.1$ the distributions of $n_1$ and $n_2$ are quite close to each other. The transition between these two extreme regimes is smoother or sharper depending on the sample sizes, being sharper the larger the sample sizes.
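The exhaustive evaluation over clusterings can be sketched as follows. This is a sketch, assuming the standard beta-binomial/Dirichlet-multinomial form of the marginal posterior up to a constant; the function names (`log_score`, `best_clustering`) are illustrative, and enumerating all partitions is only feasible for small $T$:

```python
import itertools
import numpy as np
from scipy.special import betaln, gammaln

A_BAR, B_BAR, G_BAR = 1e-2, 1e-2, 1e-2  # hyperparameters

def log_score(groups, n, N):
    """Log marginal posterior (up to a g-independent constant) of one
    clustering; `groups` is a tuple of group labels, one per trial."""
    n, N = np.asarray(n), np.asarray(N)
    labels = sorted(set(groups))
    G = len(labels)
    score = gammaln(G * G_BAR) - gammaln(G * G_BAR + len(n))
    for k in labels:
        mask = np.array([g == k for g in groups])
        nk, Nk = n[mask].sum(), N[mask].sum()
        score += betaln(A_BAR + nk, B_BAR + Nk - nk) - betaln(A_BAR, B_BAR)
        score += gammaln(G_BAR + mask.sum()) - gammaln(G_BAR)
    return score

def best_clustering(n, N):
    """Score every partition of the T trials and return the best one."""
    T = len(n)
    candidates = set()
    for labels in itertools.product(range(T), repeat=T):
        # canonical relabeling so e.g. (0, 1) and (1, 0) count once
        seen, canon = {}, []
        for g in labels:
            seen.setdefault(g, len(seen))
            canon.append(seen[g])
        candidates.add(tuple(canon))
    return max(candidates, key=lambda g: log_score(g, n, N))
```

For instance, `best_clustering([20, 80], [100, 100])` assigns the two trials to different groups, while `best_clustering([50, 52], [100, 100])` pools them into one, matching the behavior described for widely separated versus nearly equal response rates.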

IV. 2-AGENTS APPROXIMATION
In this section we describe the numerical implementation of the 2-agent approximation. The 2-agent approximation is given by

$$p_c = \sum_{i=1}^{N_a} s_{ci} h_i + \sum_{i<j} s_{ci} s_{cj} J_{ij}, \quad 1 \le c \le C, \quad (23)$$

where $C$ is the number of combinations, $N_a$ is the number of agents, $s_{ci}$ is the agent-to-combination matrix ($s_{ci} = 1$ if agent $i$ is used in combination $c$ and $s_{ci} = 0$ otherwise), the $h_i$ quantify the response of single agents and the parameters $J_{ij}$ the interactions between agents $i$ and $j$. We can write (23) in a more compact form after introducing the following notation. We create a list with all the $h$ and $J$ parameters, arranged such that element $k$ contains the parameter $h_k$ for $1 \le k \le N_a$ and a $J$ for $N_a + 1 \le k \le N_p$, where $N_p$ is the total number of $h$s and $J$s. In the most general scenario there are $N_a(N_a-1)/2$ $J$s, one for each pair of agents, and $N_p = N_a + N_a(N_a-1)/2$. However, the 2-agent approximation can only constrain a given $J_{ij}$ if there is at least one combination containing both agents $i$ and $j$. The $J$s that cannot be constrained are removed. We denote by $N_J$ the number of $J$s that are not removed, resulting in $N_p = N_a + N_J$. We also denote by $i_k$ and $j_k$ the indices $i$ and $j$ associated with the $k$-th element of the parameter list, $N_a + 1 \le k \le N_p$. Using this notation we can write (23) as

$$\sum_{k=1}^{N_p} A_{ck} x_k = b_c, \quad 1 \le c \le C, \quad (24)$$

where $x_k = h_k$ for $1 \le k \le N_a$, $x_k = J_{i_k j_k}$ for $N_a + 1 \le k \le N_p$, $b_c = p_c$, and

$$A_{ck} = \begin{cases} s_{ck}, & 1 \le k \le N_a, \\ s_{c i_k} s_{c j_k}, & N_a + 1 \le k \le N_p. \end{cases} \quad (25)$$

The system of equations (24) has $C$ equations and $N_p$ variables. In the problem considered in the main manuscript $N_p > C$. Taking into account that we do not know the precise values of the $p_c$ but rather their distributions, we double the number of equations, obtaining a system of $2C$ equations and $N_p$ variables. In the problem considered in the main manuscript $2C > N_p$, resulting in more equations than variables. Finally, we take into account that the response rates take values between 0 and 1, imposing constraints on the $h$s and $J$s. In particular, since the $h$s represent the response rates of single-agent combinations, they take values between 0 and 1, i.e. $0 \le x_k \le 1$ for $1 \le k \le N_a$.
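The bookkeeping above, building the parameter list, removing the unconstrained $J$s, and assembling the matrix $A$ of equation (24), can be sketched as follows (the function name `build_design` and the toy matrix in the usage note are illustrative):

```python
import numpy as np

def build_design(s):
    """Build the design matrix A of eq. (24) from the agent-to-combination
    matrix s (shape C x Na, entries 0/1). Only pairs (i, j) appearing
    together in at least one combination keep their J; the rest are removed."""
    C, Na = s.shape
    pairs = [(i, j) for i in range(Na) for j in range(i + 1, Na)
             if np.any(s[:, i] * s[:, j])]        # constrainable Js
    Np = Na + len(pairs)
    A = np.zeros((C, Np))
    A[:, :Na] = s                                 # h columns: A_ck = s_ck
    for k, (i, j) in enumerate(pairs):
        A[:, Na + k] = s[:, i] * s[:, j]          # J columns: s_ci * s_cj
    return A, pairs
```

For example, with three agents and combinations {1}, {2}, {1, 2} (so `s = [[1,0,0],[0,1,0],[1,1,0]]`), only $J_{12}$ is constrainable; $J_{13}$ and $J_{23}$ are removed, leaving $N_p = 3 + 1 = 4$ columns.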
Similarly, for every combination of two agents we have $0 \le h_i + h_j + J_{ij} \le 1$. This constraint can be taken into account by introducing auxiliary slack variables $y_k$, $0 \le y_k \le 1$, and the auxiliary equations

$$h_{i_k} + h_{j_k} + J_{i_k j_k} + y_k = 1, \quad N_a + 1 \le k \le N_p.$$

After adding this constraint we augment our system of equations to

$$\sum_{k=1}^{N_v} A_{ck} x_k = b_c, \quad 1 \le c \le 2C + N_J, \quad (30)$$

where $N_v = N_p + N_J$, the auxiliary rows are $A_{(2C+m)\,l} = \delta_{l i_k} + \delta_{l j_k} + \delta_{l k} + \delta_{l (N_p+m)}$ with $k = N_a + m$ and $b_{2C+m} = 1$ for $1 \le m \le N_J$, and $\delta_{ij} = 1$ when $i = j$ and 0 otherwise.
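Putting the pieces together, the bounded, overdetermined system can be solved by constrained least squares. Below is a minimal sketch for a hypothetical two-agent case: the response rates 0.2, 0.3 and 0.6 are made up for illustration, and scipy's `lsq_linear` stands in for whatever bounded solver is used in the main text:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Hypothetical data: agent 1 alone responds at 0.2, agent 2 alone at 0.3,
# and the pair at 0.6. Variables are x = [h1, h2, J12, y12], where y12 is
# the slack variable enforcing 0 <= h1 + h2 + J12 <= 1.
A = np.array([
    [1.0, 0.0, 0.0, 0.0],  # h1                   = p(agent 1 alone)
    [0.0, 1.0, 0.0, 0.0],  # h2                   = p(agent 2 alone)
    [1.0, 1.0, 1.0, 0.0],  # h1 + h2 + J12        = p(pair)      (eq. 24)
    [1.0, 1.0, 1.0, 1.0],  # h1 + h2 + J12 + y12  = 1            (auxiliary)
])
b = np.array([0.2, 0.3, 0.6, 1.0])

lb = [0.0, 0.0, -np.inf, 0.0]  # hs and slack bounded in [0, 1]
ub = [1.0, 1.0,  np.inf, 1.0]
res = lsq_linear(A, b, bounds=(lb, ub))
print(res.x)  # close to [0.2, 0.3, 0.1, 0.4]: a positive interaction term
```

In this toy system the solution is exact, so the least-squares residual is essentially zero; with real trial data the system is overdetermined ($2C + N_J$ equations, $N_v$ variables) and the solver returns the bounded best fit instead.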