Elucidating vaccine efficacy using a correlate of protection, demographics, and logistic regression
BMC Medical Research Methodology volume 24, Article number: 101 (2024)
Abstract
Background
Vaccine efficacy (VE) assessed in a randomized controlled clinical trial can be affected by demographic, clinical, and other subject-specific characteristics evaluated as baseline covariates. Understanding the effect of covariates on efficacy is key to decisions by vaccine developers and public health authorities.
Methods
This work evaluates the impact of including correlate of protection (CoP) data in logistic regression on its performance in identifying statistically and clinically significant covariates in settings typical for a vaccine phase 3 trial. The proposed approach uses CoP data and covariate data as predictors of clinical outcome (diseased versus nondiseased) and is compared to logistic regression (without CoP data) to relate vaccination status and covariate data to clinical outcome.
Results
Clinical trial simulations, in which the true relationship between CoP data and clinical outcome probability is a sigmoid function, show that use of CoP data increases the positive predictive value for detection of a covariate effect. If the true relationship is characterized by a decreasing convex function, use of CoP data does not substantially change positive or negative predictive value. In either scenario, vaccine efficacy is estimated more precisely (i.e., confidence intervals are narrower) in covariate-defined subgroups if CoP data are used, implying that using CoP data increases the ability to determine clinical significance of baseline covariate effects on efficacy.
Conclusions
This study proposes and evaluates a novel approach for assessing baseline demographic covariates potentially affecting VE. Results show that the proposed approach can sensitively and specifically identify potentially important covariates and provides a method for evaluating their likely clinical significance in terms of predicted impact on vaccine efficacy. It shows further that inclusion of CoP data can enable more precise VE estimation, thus enhancing study power and/or efficiency and providing even better information to support health policy and development decisions.
Background
This work introduces a novel use of immune response biomarkers to help identify baseline covariates affecting vaccine efficacy (VE). VE is defined as a proportional reduction in risk of disease occurrence for vaccinated subjects, compared to placebo control subjects, and is often assessed by counting disease cases and noncases in randomized controlled clinical trials [1]. Baseline covariates refer to demographic and clinical characteristics (e.g., age, gender, race and ethnicity, or prevaccination serostatus) or other information (e.g., time and site of enrollment) collected from subjects before the time of randomization (i.e., random assignment to active vaccination versus placebo arm). A randomized controlled trial can be used to estimate VE even if the primary analysis does not consider baseline covariates because, due to randomization, measured and unmeasured covariates will, on average, be balanced between the vaccinated and control groups. However, VE may be affected by baseline covariates (for example, it can vary with age) and understanding the impact of covariates on VE is key to making informed decisions, not only in the development of safe and effective vaccines, but also in public health considerations postlicensure.
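As a minimal illustration of the definition above, case-count VE can be computed as one minus the ratio of attack rates in the two arms. The function name and the counts below are hypothetical and are not taken from the trial simulations in this paper.

```python
def ve_case_count(cases_vacc, n_vacc, cases_ctrl, n_ctrl):
    """Case-count VE: 1 - relative risk (vaccinated vs. control attack rate)."""
    if n_vacc <= 0 or n_ctrl <= 0:
        raise ValueError("group sizes must be positive")
    if cases_ctrl == 0:
        raise ValueError("VE is undefined when the control group has no cases")
    risk_vacc = cases_vacc / n_vacc
    risk_ctrl = cases_ctrl / n_ctrl
    return 1.0 - risk_vacc / risk_ctrl

# Hypothetical trial with 2:1 randomization: 60 cases among 10,000 vaccinated
# subjects and 100 cases among 5,000 control subjects.
ve = ve_case_count(60, 10_000, 100, 5_000)
print(f"VE = {ve:.0%}")
```

Applying the same function within covariate-defined subgroups gives subgroup VE, at the cost of fewer cases per estimate.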
Statistical significance of covariate effects on binary clinical outcome (e.g., diseased versus nondiseased) is typically [2,3,4,5] evaluated using multiple (also referred to as “multivariable”) logistic regression, which (as used here) involves incorporating multiple explanatory variables (predictors) in the regression analysis for prediction of a single binary outcome. Clinical significance of a covariate effect on VE, a notion different from that of statistical significance, can be assessed by comparing estimated relative risks (RR) between the vaccinated and control subjects in covariate-defined subgroups. An estimated effect can be large in magnitude yet not prove statistically significant (e.g., due to variability, trial size, etc.), or can prove highly significant statistically, yet not clinically (e.g., when the effect size is too small to have a measurable impact on public health).
Efficacy trials often measure subjects’ immune response postvaccination (immunogenicity) as an exploratory endpoint in addition to assessing the trial’s primary clinical endpoint(s). An immunogenicity biomarker which reliably predicts protection is termed a correlate of protection (CoP) [6]. The first formal method to validate an immunogenicity biomarker as a CoP using data from a phase 3 trial was proposed by Prentice [7], who introduced the following criteria:

Criterion of vaccine efficacy: demonstrating vaccine effect on the clinical endpoint (e.g., occurrence of disease or infection) evaluated using case-counting

Criterion of vaccine immunogenicity: demonstrating vaccine effect on the immunogenicity biomarker

Criterion of a correlate of risk (CoR): demonstrating that the immunogenicity biomarker correlates with the clinical endpoint

Criterion of a CoP: demonstrating that the probability of the clinical endpoint is conditionally independent of vaccination status when conditioned on the immunogenicity biomarker (indicating that the full vaccine effect is mediated by immunogenicity)
The Prentice framework has gained widespread adoption [8,9,10,11] due to its simplicity. In this work, the term “CoP” is used for a biomarker that meets all four Prentice criteria [7] and fully mediates the vaccine effect. It has been shown that CoP-based VE prediction is more precise than case-count-based VE estimation [12].
The work presented here is motivated by the need to assess a natural extension of that result: inclusion of CoP data could increase efficiency in finding covariate effects and comparing VE between subgroups (by reducing the width of the CoP-based confidence interval compared to case-counting). Given that immunogenicity is a CoP, VE can differ between subpopulations in three ways: either (i) immunogenicity measurements are distributed significantly differently across subpopulations, or (ii) the subpopulations differ in the relationship between immunogenicity and clinical outcome, or (iii) they differ in both the immunogenicity distributions and the relationship. Because the assessment of (i) is typically done using existing methods [13], the work here focuses on the harder problem of assessing (ii) and (iii).
The aim of this work is to compare two kinds of logistic regression models in terms of their ability to identify and estimate covariate effects on VE. Specifically, we compare (1) the “typical” approach, which evaluates the effects of baseline covariates and vaccination status on clinical outcome (disease status), to (2) the proposed approach (referred to as “CoP-based”) which assesses the effects of baseline covariates and CoP on clinical outcome. Both approaches enable the estimation of RR (vaccinated versus control subjects) and VE in subgroups of interest.
This paper is organized as follows. Sect. "Methods" describes the modeling assumptions and the typical and CoP-based logistic regression approaches used in our analysis. Sect. "Simulation Study" evaluates the properties of these approaches (their relative ability to identify baseline covariates that impact VE) using many (5000) simulated vaccine clinical trials. Sect. "Example Analysis of a Single Hypothetical Vaccine Clinical Trial Dataset" illustrates their application to one kind of typical vaccine clinical trial (using simulated data from one representative trial). Sects. "Discussion" and "Conclusions" summarize key findings and (respectively) the implications for efficacy-based decisions.
Methods
Data collection and assumptions
Because they are often collected in, e.g., a randomized controlled phase 3 vaccine clinical trial, assume that the following data are available for each subject: disease status (diseased or nondiseased), vaccination status (vaccinated or control), immunogenicity biomarker value (assumed to be a CoP), and a set of baseline covariates. Disease status is a binary variable, with value 1 in diseased and 0 in nondiseased subjects, an indicator of clinical outcome set to 1 if the disease is diagnosed (by formal trial endpoint criteria) at any time during the fixed duration of the trial’s observation period. Vaccination status is a binary variable indicating treatment, with value 1 in vaccinated and 0 in control (assumed here to be placebo) subjects. The immunogenicity biomarker is a continuous variable, typically lognormally distributed (within properly defined subgroups) and typically increased by an efficacious prophylactic treatment (vaccination). Baseline covariates can be binary, categorical (ordinal or without ordering), or continuous; they are determined at baseline, prior to randomization (and vaccination).
Let \({T}_{i}^{\text{vaccinated}}\), \({T}_{j}^{\text{control}}\) be the log immunogenicity biomarker measurement (also referred to as logtiter, since neutralizing antibody titer is often used) for \(i\)th vaccinated subject (\(i=1,\dots , N\)) and \(j\)th control subject (\(j=1,\dots , M\)), respectively. \(N\) and \(M\) are the number of subjects in the vaccinated and control groups, respectively. Let \({VS}_{i}^{\text{vaccinated}}\), \({VS}_{j}^{\text{control}}\) be vaccination status, and \({DS}_{i}^{\text{vaccinated}}\), \({DS}_{j}^{\text{control}}\) be disease status for \(i\)th vaccinated subject and \(j\)th control subject. Let \({C}_{i,k}^{\text{vaccinated}}\), \({C}_{j,k}^{\text{control}}\) be the covariate value for \(i\)th vaccinated subject, \(j\)th control subject and \(k\)th baseline covariate variable (\(k=1,\dots , K\)). \(K\) represents the total number of collected baseline covariates. For a given set of \(L\) independent variables \({x}_{1}, {x}_{2}, \dots , {x}_{L}\), the logodds of disease (\(y\)) can be estimated by logistic regression, using a linear predictor (\(lp\)) as:
\[y=lp, \tag{1}\]
where \(lp={\beta }_{0}+\sum_{l=1}^{L}{\beta }_{l}{x}_{l}\).
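Alternatively, a logistic model involving an interaction term (with coefficient \({\beta }_{\mathrm{1,2}}\)) between independent variables \({x}_{1}\) and \({x}_{2}\) may be described as:
\[y={\beta }_{0}+\sum_{l=1}^{L}{\beta }_{l}{x}_{l}+{\beta }_{\mathrm{1,2}}{x}_{1}{x}_{2}. \tag{2}\]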
The probability of disease (PoD), \(p\), is
\[p=\frac{1}{1+{e}^{-y}}. \tag{3}\]
If one of the independent variables in the logistic model is logtiter, the probability of disease will be referred to as \(PoD(T)\) or a PoD curve, a function of logtiter (as well as, potentially, other independent variables).
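The mapping from a linear predictor to a probability of disease can be sketched in a few lines. This is a minimal illustration for two predictors with an optional interaction term; the function names and coefficient values are hypothetical and are not estimates from the paper.

```python
import math

def linear_predictor(x1, x2, b0, b1, b2, b12=0.0):
    """Log-odds of disease for two predictors; b12 is the interaction coefficient."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

def pod(lp):
    """Probability of disease: inverse-logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical coefficients: x1 = logtiter, x2 = a binary baseline covariate.
coef = dict(b0=-2.0, b1=-0.8, b2=0.5, b12=0.2)
for logtiter in (0.0, 2.0, 4.0):
    p0 = pod(linear_predictor(logtiter, 0, **coef))
    p1 = pod(linear_predictor(logtiter, 1, **coef))
    print(f"T={logtiter:.1f}  PoD(x2=0)={p0:.4f}  PoD(x2=1)={p1:.4f}")
```

With a negative logtiter coefficient, the PoD curve decreases as logtiter increases, and a nonzero interaction coefficient makes the slope differ between the two covariate levels.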
Both approaches described below, i.e., typical (not involving logtiter) and CoP-based (involving logtiter), can be used to evaluate statistical and clinical significance of covariate effects on clinical outcome as follows:

To assess statistical significance of covariate effects, the test for the presence of an effect is deemed positive if either the covariate effect or the interaction effect proves significant (e.g., at statistical significance level \(\alpha =0.05\)).

Clinical significance of any covariate effect depends on the application; it might be ascertained by comparing the relative health impact between subpopulations, defined with respect to the covariate of interest, using the VE difference associated with subpopulations in question. Thus, to assess clinical significance of a covariate effect associated with specific subpopulations, VE is estimated and compared across covariate-defined subgroups.
The models below are assumed to include all potentially clinically meaningful covariates, following the concept of a full covariate modeling approach [14].
Typical approach
Independent variables used to predict disease status in the typical approach are vaccination status and baseline covariate(s) of interest. Logodds of disease is given by Eqs. 1 and 2, with \({x}_{1}=VS\), and \({x}_{2},{x}_{3},\dots ,{x}_{L}={C}_{1},{C}_{2},\dots ,{C}_{K}\).
For illustration, in Sects. "Simulation Study" and "Example Analysis of a Single Hypothetical Vaccine Clinical Trial Dataset", only one baseline covariate, \({C}_{1}\), is considered. The following models are fitted (i.e., parameters are estimated to maximize posterior likelihood for a given dataset):
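a model not involving interaction between the independent variables (derived from Eq. 1),
\[y={\beta }_{0}+{\beta }_{1}VS+{\beta }_{2}{C}_{1}, \tag{4}\]
a model involving interaction between the independent variables (derived from Eq. 2),
\[y={\beta }_{0}+{\beta }_{1}VS+{\beta }_{2}{C}_{1}+{\beta }_{\mathrm{1,2}}VS\cdot {C}_{1}. \tag{5}\]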
VE can be estimated for each of the models above, using RR as:
\[VE=1-RR=1-\frac{{p}^{\text{vaccinated}}}{{p}^{\text{control}}}, \tag{6}\]
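where \({p}^{\text{vaccinated}}\), \({p}^{\text{control}}\) are expected values for each of the two populations, expressed as:
\[{p}^{\text{vaccinated}}=\frac{1}{N}\sum_{i=1}^{N}{p}_{i}^{\text{vaccinated}}, \tag{7}\]
\[{p}^{\text{control}}=\frac{1}{M}\sum_{j=1}^{M}{p}_{j}^{\text{control}}, \tag{8}\]
with \({p}_{i}^{\text{vaccinated}}\) and \({p}_{j}^{\text{control}}\) denoting the probability of disease (Eq. 3) predicted by the fitted model for the \(i\)th vaccinated and \(j\)th control subject.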
For a given set of data, the 95% confidence interval (CI) associated with estimated VE needs to account for the uncertainty regarding the \({\beta }_{0}, {\beta }_{1}, {\beta }_{2}, {\beta }_{\mathrm{1,2}}, \dots , {\beta }_{L}\) parameters and variability in the observed data. This can be done via parametric resampling of the posterior distribution for parameters and bootstrapping the observed data in the vaccinated and control groups. The bootstrap resampling of observed data is performed on subjects: each time a subject is selected, all his/her characteristics (covariate values) are used in the estimation of VE.
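The combination of parameter resampling and subject-level bootstrapping described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are hypothetical, the model is the typical-approach interaction model with one binary covariate, and parameter uncertainty is approximated by a multivariate normal distribution centered at the maximum-likelihood estimate with the inverse Hessian as covariance.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic fit by Newton-Raphson; returns (beta, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(hessian)

def design(vs, c1):
    """Design matrix for the interaction model: intercept, VS, C1, VS*C1."""
    return np.column_stack([np.ones_like(vs), vs, c1, vs * c1])

def ve_from_beta(beta, c1_vacc, c1_ctrl):
    """VE = 1 - mean predicted risk (vaccinated) / mean predicted risk (control)."""
    p_v = 1.0 / (1.0 + np.exp(-design(np.ones_like(c1_vacc), c1_vacc) @ beta))
    p_c = 1.0 / (1.0 + np.exp(-design(np.zeros_like(c1_ctrl), c1_ctrl) @ beta))
    return 1.0 - p_v.mean() / p_c.mean()

rng = np.random.default_rng(1)
# Hypothetical data: 4000 vaccinated, 2000 control subjects, binary covariate C1.
vs = np.r_[np.ones(4000), np.zeros(2000)]
c1 = rng.binomial(1, 0.5, vs.size).astype(float)
true_lp = -2.0 - 1.5 * vs + 0.3 * c1
ds = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_lp))).astype(float)

beta_hat, cov_hat = fit_logistic(design(vs, c1), ds)
ve_hat = ve_from_beta(beta_hat, c1[vs == 1], c1[vs == 0])

draws = []
for _ in range(1000):
    beta_star = rng.multivariate_normal(beta_hat, cov_hat)  # parameter uncertainty
    cv = rng.choice(c1[vs == 1], size=4000, replace=True)   # resample vaccinated subjects
    cc = rng.choice(c1[vs == 0], size=2000, replace=True)   # resample control subjects
    draws.append(ve_from_beta(beta_star, cv, cc))
ci_low, ci_high = np.percentile(draws, [2.5, 97.5])
print(f"VE = {ve_hat:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
```

Resampling whole subjects keeps each subject's covariate values together, which matters once more than one covariate (or logtiter) enters the model.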
CoP-based approach
Several approaches have been proposed to model the relationship between the CoP and probability of disease [12, 15, 16]. In this paper, a logistic model is used for the PoD curve estimation (see Fig. 1 for comparison between logistic model and other models [12, 15, 16]).
Independent variables used to predict disease status in the CoP-based approach include (as above) the baseline covariate(s) of interest, and now also include titer (usually in the form of logtiter). Logodds of disease is given by Eqs. 1 and 2, with \({x}_{1}=T\), and \({x}_{2},{x}_{3},\dots ,{x}_{L}={C}_{1},{C}_{2},\dots ,{C}_{K}\).
Several models can be considered, when one baseline covariate, \({C}_{1}\), is evaluated (here, again as above, for illustration):
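a model with linear term for titer, not involving interaction between the independent variables,
\[y={\beta }_{0}+{\beta }_{1}T+{\beta }_{2}{C}_{1}, \tag{9}\]
a model with linear term for titer, involving interaction between the independent variables,
\[y={\beta }_{0}+{\beta }_{1}T+{\beta }_{2}{C}_{1}+{\beta }_{\mathrm{1,2}}T\cdot {C}_{1}, \tag{10}\]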
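a model with quadratic term for titer, not involving interaction between the independent variables,
\[y={\beta }_{0}+{\beta }_{1}T+{\beta }_{2}{C}_{1}+{\beta }_{3}{T}^{2}, \tag{11}\]
a model with quadratic term for titer, involving interaction between the independent variables,
\[y={\beta }_{0}+{\beta }_{1}T+{\beta }_{2}{C}_{1}+{\beta }_{\mathrm{1,2}}T\cdot {C}_{1}+{\beta }_{3}{T}^{2}. \tag{12}\]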
If the assumption of logtiter being a CoP is met (according to the Prentice framework), the effect of logtiter (linear or quadratic) is significant (among other conditions). Here, “significant” means that the coefficient involving logtiter is different from 0 at a prespecified level of statistical significance (here we adopt \(\alpha =0.05\)). The quadratic term is used here as an illustration of a more general, nonlinear relationship: in general, an unrealistically large amount of data is likely to be required to distinguish the curvature of different models in this context. The importance of logistic models with the nonlinear effect of logtiter was highlighted by Callegaro and Tibaldi, 2019 [17], who demonstrated that lack of fit of a model (e.g., when using a linear effect of logtiter in the context of high VE) leads to substantial loss in power to meet Prentice criteria. Although these learnings are applicable primarily to the CoP assessment (an objective different from ours), the CoP-based approach to evaluate baseline covariate effects (proposed here) is analogous to evaluating Prentice criterion four (i.e., evaluating the effect of vaccination status when controlling for logtiter).
CoP-based VE can be determined for each of the models above using Eq. 6. The 95% CI is calculated as described above; logtiter is treated as any other covariate. Accuracy and precision of CoP-based VE, as well as the coverage probability of the respective confidence intervals, are evaluated in Sect. "Simulation Study".
A similar approach for predicting VE (without covariates) was described by Coudeville et al., 2010 [16], who used a different functional form in representing the PoD curve (Fig. 1C), and prevaccination and postvaccination immune marker measurements in the vaccinated subjects (instead of immune marker data postvaccination in the vaccinated and control groups used here).
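The CoP-based VE calculation can be illustrated by averaging a PoD curve over the logtiter values observed in each arm. This is a minimal sketch assuming a logistic PoD curve with a linear logtiter term and no covariates; the PoD parameters and the logtiter distributions are hypothetical.

```python
import math
import random

def pod(logtiter, b0, b1):
    """Logistic PoD curve: probability of disease as a function of logtiter."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * logtiter)))

def cop_based_ve(titers_vacc, titers_ctrl, b0, b1):
    """VE = 1 - (mean PoD over vaccinated titers) / (mean PoD over control titers)."""
    p_vacc = sum(pod(t, b0, b1) for t in titers_vacc) / len(titers_vacc)
    p_ctrl = sum(pod(t, b0, b1) for t in titers_ctrl) / len(titers_ctrl)
    return 1.0 - p_vacc / p_ctrl

random.seed(0)
# Hypothetical logtiter distributions: vaccination shifts the mean upward.
titers_vacc = [random.gauss(7.0, 1.5) for _ in range(10_000)]
titers_ctrl = [random.gauss(4.0, 1.5) for _ in range(5_000)]
ve = cop_based_ve(titers_vacc, titers_ctrl, b0=0.0, b1=-1.0)
print(f"CoP-based VE = {ve:.2f}")
```

Because vaccination status does not enter the PoD curve, the whole vaccine effect in this sketch comes from the shift in the logtiter distribution, which mirrors the full-mediation assumption.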
Simulation Study
Overview of the simulation process
To test relative performance of the typical and the CoP-based approach in identifying impactful baseline covariates, four steps were performed:

Step 1: Assumed true values were assigned (i) to PoD curve parameters, and (ii) to logtiter distribution parameters for all covariate-defined subgroups of the vaccinated and control group.

Step 2: Logtiter data and baseline covariate data were generated for all vaccinated and control in silico subjects using random sampling from true distributions. Disease status was assigned to each subject randomly using the probability of disease defined by the true PoD curve.

Step 3: To evaluate statistical significance of a baseline covariate, p-values associated with estimated regression coefficients in Eqs. 4, 5 (typical approach), and 9, 10 (CoP-based approach, with linear term for logtiter) were used as described in Sect. "Data collection and assumptions".

Step 4: To evaluate clinical significance, the best-fitting model for each of the two approaches was selected, using the Akaike Information Criterion (AIC), from Eqs. 4 or 5 (covariate model for the typical approach), and from Eqs. 9, 10, 11, or 12 (structural and covariate model for the CoP-based approach). The selected model was used to estimate VE as described in Sects. "Typical approach" and "CoP-based approach". (By “structural model” we mean the linear, quadratic, or other form of dependence of probability of disease on logtiter.)
Steps 2 to 4 were repeated 5000 times to yield 5000 sets of data and corresponding results of covariate analysis (i.e., p-values of fitted coefficients in Step 3 and VE estimates in covariate-defined subgroups in Step 4), which were compared to the “truth” (values implied by the assigned model and parameter values).
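Step 2 of the process above (data generation for one in silico trial) can be sketched as follows. This is a minimal sketch, not the simulation code used for the paper: the logistic PoD form, the size of the logtiter shift for older subjects, the logtiter means and standard deviation, and the proportion of younger subjects are all hypothetical choices made for illustration.

```python
import math
import random

def true_pod(logtiter, younger, b0=-1.0, b1=-0.7, shift=1.5):
    """Hypothetical logistic PoD; older subjects (younger=0) need `shift` more logtiter."""
    effective_titer = logtiter - shift * (1 - younger)
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * effective_titer)))

def simulate_trial(n_vacc, n_ctrl, rng):
    """Draw covariate, logtiter and disease status for every in silico subject."""
    subjects = []
    for vaccinated, n in ((1, n_vacc), (0, n_ctrl)):
        for _ in range(n):
            younger = 1 if rng.random() < 0.75 else 0
            mean_titer = 7.0 if vaccinated else 4.0   # same in both age groups
            logtiter = rng.gauss(mean_titer, 1.5)
            diseased = 1 if rng.random() < true_pod(logtiter, younger) else 0
            subjects.append((vaccinated, younger, logtiter, diseased))
    return subjects

rng = random.Random(2024)
trial = simulate_trial(10_000, 5_000, rng)
cases = sum(s[3] for s in trial)
print(f"{len(trial)} subjects, {cases} disease cases")
```

Repeating `simulate_trial` with fresh random draws and fitting the candidate models to each dataset gives the distribution of p-values and VE estimates that is compared with the assigned truth.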
Alternatively, the assessment of statistical significance of baseline covariate(s) may be performed using the best-fitting model selected in Step 4. Even when numerous covariates and nonlinear logtiter dependence are considered, it can be advantageous (in terms of finding the most parsimonious model) to take the stepwise approach proposed above: use a linear term for logtiter in Step 3 for statistical significance assessment, and proceed to Step 4 to find the best-fitting structural model (e.g., potentially quadratic dependence on logtiter) for CoP-based VE estimation only if any covariate(s) are significant.
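The AIC-based selection in Step 4 amounts to computing AIC for each fitted candidate and keeping the model with the smallest value. A minimal sketch follows; the log-likelihood values and parameter counts are hypothetical placeholders, not results from the simulated trials.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*log-likelihood (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def select_by_aic(candidates):
    """candidates: dict name -> (maximized log-likelihood, number of parameters)."""
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical maximized log-likelihoods for four CoP-based candidate models.
candidates = {
    "linear, no interaction": (-980.4, 3),
    "linear, interaction": (-975.1, 4),
    "quadratic, no interaction": (-979.9, 4),
    "quadratic, interaction": (-974.8, 5),
}
best, scores = select_by_aic(candidates)
print(best, round(scores[best], 1))
```

In this made-up example the extra quadratic parameter improves the likelihood too little to offset its AIC penalty, so the linear model with interaction is retained.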
Data generation and parameter estimation
Five thousand datasets representing typical phase 3 vaccine efficacy trials were simulated for each of the four scenarios, numbered i through iv, defined by parameter values shown in Table 1, and illustrated in Figs. 1, 2 and 3. For ease of understanding and computational efficiency, only one binary baseline covariate (e.g., age group, for illustration) was considered. Difference in VE across subgroups (80% in younger, 50% in older), in simulated scenarios ii and iv, was driven by the difference in the PoD curves (Fig. 2). Simulated immunogenicity distributions of the vaccinated and control subjects were the same in younger and older subgroups in all scenarios (Fig. 3).
The true PoD curve was represented by a logistic function in scenarios i and ii:
and by a Hill function in scenarios iii and iv:
where \(A\) represents the age group, with \(A=1\) for younger participants and \(A=0\) for older participants. In these forms, the \({\beta }_{\mathrm{1,2}}\) and \(k\) parameters, respectively, represent a shift in logtiter required to provide a given protection for older subjects (versus that for younger).
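Each simulated dataset was fitted with six logistic models (derived from Eqs. 4, 5, 9, 10, 11, 12, respectively):
\[y={\beta }_{0}+{\beta }_{1}VS+{\beta }_{2}A, \tag{15}\]
\[y={\beta }_{0}+{\beta }_{1}VS+{\beta }_{2}A+{\beta }_{\mathrm{1,2}}VS\cdot A, \tag{16}\]
\[y={\beta }_{0}+{\beta }_{1}T+{\beta }_{2}A, \tag{17}\]
\[y={\beta }_{0}+{\beta }_{1}T+{\beta }_{2}A+{\beta }_{\mathrm{1,2}}T\cdot A, \tag{18}\]
\[y={\beta }_{0}+{\beta }_{1}T+{\beta }_{2}A+{\beta }_{3}{T}^{2}, \tag{19}\]
\[y={\beta }_{0}+{\beta }_{1}T+{\beta }_{2}A+{\beta }_{\mathrm{1,2}}T\cdot A+{\beta }_{3}{T}^{2}. \tag{20}\]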
Assessment of age group effect
Age group, indicated by marker \(A\) (with values of \({A}_{i}^{\text{vaccinated}}\), \({A}_{j}^{\text{control}}\) for \(i\)th vaccinated subject, and \(j\)th control subject), was termed a significant covariate in the typical approach if the p-value associated with \({\beta }_{2}\) (being different from 0) in Eq. 15 or \({\beta }_{\mathrm{1,2}}\) (similarly) in Eq. 16 was less than \(0.05\). In the CoP-based approach, the effect of age group was deemed significant if the p-value associated with \({\beta }_{2}\) in Eq. 17 or \({\beta }_{\mathrm{1,2}}\) in Eq. 18 was less than \(0.05\). (Issues of p-value correction for multiple testing are not addressed here.)
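The decision rule above can be sketched as a two-sided Wald test on each coefficient followed by an "either term significant" check. The normal approximation and the coefficient estimates and standard errors below are assumptions made for illustration; they are not values from the paper.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0, using a normal approximation."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))

def covariate_effect_detected(p_main, p_interaction, alpha=0.05):
    """Positive test if the covariate term OR the interaction term is significant."""
    return p_main < alpha or p_interaction < alpha

# Hypothetical (estimate, standard error) pairs from two fitted models.
p_beta2 = wald_p_value(0.35, 0.25)    # covariate term, model without interaction
p_beta12 = wald_p_value(0.90, 0.30)   # interaction term, model with interaction
print(f"p(beta2)={p_beta2:.4f}  p(beta12)={p_beta12:.4f}  "
      f"detected={covariate_effect_detected(p_beta2, p_beta12)}")
```

Because the rule is an "or" over two tests, its false positive rate can exceed the nominal level of each individual test, which is the multiple-testing issue the text sets aside.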
Table 2 summarizes true positive rates (percentage of simulated trials in which the effect of age group was correctly identified as significant in scenarios ii, iv), true negative rates (percentage of simulated trials in which the effect of age group was correctly identified as not significant in scenarios i, iii), positive predictive values (probability that age group impacts the outcome, when its effect was found significant), negative predictive values (probability that age group does not impact the outcome, when its effect was not found significant), and areas under the ROC curve (AUC).
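The relationship between these metrics can be made explicit with a small calculation. The sketch below assumes that scenarios with and without a true age effect contribute equally many simulated trials (a prevalence of 0.5); the rates used are hypothetical and are not the values reported in Table 2.

```python
def predictive_values(tpr, tnr, prevalence=0.5):
    """PPV and NPV from true positive/negative rates and the share of 'effect' trials."""
    tp = tpr * prevalence
    fn = (1.0 - tpr) * prevalence
    tn = tnr * (1.0 - prevalence)
    fp = (1.0 - tnr) * (1.0 - prevalence)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Hypothetical rates, not the values reported in Table 2.
ppv, npv = predictive_values(tpr=0.90, tnr=0.95)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

A different assumed share of trials with a true covariate effect changes PPV and NPV even when the true positive and true negative rates stay the same.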
The typical approach provided true positive rates, false positive rates, negative predictive values, and AUCs that appear to be very similar to those from the CoP-based approach. When the true PoD curve was a Hill function, inclusion of the CoP predictor in the logistic regression increased positive predictive value of covariate effect detection by 2.6% from 91.8% to 94.4% (5000 simulated trials with 15,000 subjects, ~200 disease cases, Table 2), an effect size with potentially substantial impact (cf. Sect. "Discussion").
The CoP-based logistic regression with linear term for logtiter showed (Table 2) good performance of covariate effects assessment based on the p-value, even when the true relationship was a Hill function (i.e., when logodds is a nonlinear function of logtiter). Thus, for the scenarios investigated here, the model selection step (to determine the best-fitting structural model, as done in Sect. "Accuracy and precision of VE estimation") is not necessary for detection of covariate effects using the criterion of statistical significance.
Accuracy and precision of VE estimation
VE and its CI were estimated using the selected best-fitting model (Eq. 15 or 16 for the typical approach; Eqs. 17, 18, 19, or 20 for the CoP-based approach; Table 3) and compared to case-counting-based estimation of VE and its CI. The simulated trials for which the selected model does not match the simulated model are generally consistent between the CoP-based and the typical approach (Table 3) and can be understood in terms of data from those trials (Supplementary Material, Figure S1). Distributions of the VE point estimates for the 5000 simulated trials of each scenario are summarized in Fig. 4.
In the CoP-based approach, the logistic PoD curve fit (based on the selected model), combined with immunogenicity data, produced accurate point estimates of VE (Fig. 4) and well-calibrated CI (details in Supplementary Material, Tables S2–S4) for all four scenarios.
In both age-defined subgroups, the VE estimation by the CoP-based approach was more precise when compared to VE estimated by the typical approach (which was more precise than case-counting in the absence of age effect, i.e., scenarios i and iii; Fig. 4 and additional details in Supplementary Material, Figure S2). The VE estimate by the CoP-based method was closer to the (simulated) “truth” than that of the typical approach for most simulated trials for these scenarios (Fig. 4). Further, Fig. 5 shows that over 90% of the time the CoP-based CI on that estimate was narrower than that obtained by the typical approach.
Implications of skipping the model selection step for accuracy and precision of VE estimation are discussed in Supplementary Material (Figures S3–S6). Even if model selection is not performed (e.g., due to computation time), accurate CoP-based estimates (one for each scenario i–iv) of VE are obtained (i.e., using only the Eq. 20 model to fit the simulated trials, Supplementary Material, Figure S6). However, precision of such estimation can be lower than that obtained using models resulting from the model selection step. If the logistic regression uses an incorrect representation of covariate effects (and, in the case of the CoP-based approach, of the structural model), then the result can be biased; if the model is not misspecified then VE predicted by typical logistic regression can be nearly identical to that obtained from case-counting.
Example analysis of a single hypothetical vaccine clinical trial dataset
To illustrate the proposed data analysis, all the methods used in the simulation study were applied to a single simulated dataset of a vaccine clinical trial (randomly selected from 5000 simulated datasets of scenario iv), which used a Hill function as the underlying true PoD curve, and the age group effect on the PoD curve leading to true VE of 50% in older and 80% in younger subjects. The simulation was of 15,000 subjects: 10,000 vaccinated, 5000 control; 3760 older, 11,240 younger; 191 diseased, 14,809 nondiseased. As shown in Fig. 6, respective immunogenicity distributions for the older and younger subjects were very similar, as there was no age effect on the simulated immunogenicity in the model used for simulation.
Table 4 shows results of age group effect assessment based on the statistical criteria. Age group was a significant predictor in all six fitted models. Table 5 reports VE estimated using the selected model for typical logistic regression (Eq. 16), and that for CoP-based logistic regression (Eq. 20), enabling assessment of clinical significance of the age group effect. CoP-based VEs were closer to true VEs and had narrower CIs than those obtained by the typical logistic regression or by case-counting. PoD curve estimates (Eq. 20) and immunogenicity data used for CoP-based VE estimation in older and younger subgroups are visualized in Fig. 6.
Even though both methods correctly identify age group as a significant factor affecting VE, the CoP-based approach correctly indicates that the vaccine is efficacious in the older group, whereas the typical approach incorrectly suggests potentially negative vaccine efficacy (Tables 4 and 5). Thus, use of the typical approach could result in health authority hesitancy to license or recommend the use of this vaccine in older subjects or in a requirement for additional clinical evidence (e.g., results of an additional phase 3 study).
In contrast, if the CoP-based method is applied, a significant vaccine-induced protection in older subjects, 60% (95% CI, 44% to 70%), is estimated (appropriately).
Discussion
Logistic regression can be reliably used to detect the effect of a binary covariate on VE. In the simulated trials of 15,000 subjects with a Hill curve as the true PoD (~200 disease cases), inclusion of the CoP predictor in the logistic regression increased positive predictive value of covariate effect detection by 2.6%, compared to the typical approach (PPV, i.e., probability that the true model involves the covariate when our test indicates the covariate should be included: 94.4% versus 91.8%). Thus, with a Hill curve as the true PoD, the typical approach was 46% more likely (than CoP-based) to falsely indicate that there is a difference in subgroups (100% minus PPV, i.e., probability of the positive test to be a false positive is 8.2% for the typical approach versus 5.6% for the CoP-based method). In other words, when the test for a covariate effect (based on statistical significance) was positive, it was more likely to be correct when using the CoP-based approach. The difference in this performance was still present (although smaller) when the true PoD curve was logistic (this was tested using a linear term for logtiter).
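The arithmetic behind the "46% more likely" statement can be checked directly from the two reported PPVs; only the variable names below are introduced for illustration.

```python
ppv_typical = 0.918   # reported PPV, typical approach (Hill curve as true PoD)
ppv_cop = 0.944       # reported PPV, CoP-based approach (Hill curve as true PoD)

false_discovery_typical = 1.0 - ppv_typical   # share of positive tests that are false
false_discovery_cop = 1.0 - ppv_cop
relative_increase = false_discovery_typical / false_discovery_cop - 1.0

print(f"absolute PPV gain: {ppv_cop - ppv_typical:.1%}")
print(f"relative increase in false positives among positive tests: {relative_increase:.0%}")
```

The 2.6% figure is an absolute difference in PPV, whereas the 46% figure is the relative difference in the complementary quantity (8.2% versus 5.6%).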
And, even when the typical approach detects the presence of a covariate effect, the ability of the CoP-based approach to reduce the width of VE confidence interval and to detect clinically significant effects represents a strong advantage in understanding the degree to which baseline covariates impact VE. Simulation results show that using (case-counting or) the typical logistic regression approach in the presence of a covariate effect could result in VE being underestimated enough to stop development of an efficacious vaccine. Simulations also showed that vaccine efficacy in covariate-defined subgroups was estimated accurately and more precisely if CoP data were used in the logistic regression. Phase 3 studies to evaluate vaccine efficacy are typically powered for overall case-count VE as a primary endpoint, and our analysis shows that the resulting confidence intervals of case-count VE in covariate-defined subgroups are wider than those from CoP-based methods, potentially too wide to demonstrate that VE in subgroups is significantly different from zero. Even when the covariate effect is detected, the wider CIs can result in standard methods failing to identify clinically (and even statistically) significant VE differences between subgroups (e.g., due to overlapping CIs in subgroups of interest). The loss in information (leading to the loss in precision) of the standard methods compared to the CoP-based methods is due to not incorporating the biomarker data (when a predictive biomarker exists): even use of dichotomized (“absolute CoP”) information would be likely to improve predictions (i.e., it is not just the use of a continuous versus binary predictor).
The work presented assumes immunogenicity is measured in all trial participants. In the frequent case that only a subset of nondiseased subjects is assayed for immunogenicity values, weighted logistic regression [19, 20] accounting for the case-cohort design can be used. Further research should examine how the case-cohort trial design interacts with approaches described here.
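One common way to build weights for such a design is inverse-probability weighting, sketched below. This is an assumption-laden illustration rather than the method of the cited references: it assumes all diseased subjects are assayed and that nondiseased subjects are assayed by simple random sampling with a known fraction; the function name and the mini-cohort are hypothetical.

```python
def case_cohort_weights(diseased, assayed, sampling_fraction):
    """Inverse-probability weights: cases count once, assayed non-cases count 1/f."""
    weights = []
    for d, a in zip(diseased, assayed):
        if not a:
            weights.append(0.0)   # no biomarker value: excluded from the fit
        elif d:
            weights.append(1.0)   # all cases assumed to be assayed
        else:
            weights.append(1.0 / sampling_fraction)
    return weights

# Hypothetical mini-cohort: 2 cases (both assayed), 8 non-cases of which 2 were assayed.
diseased = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
assayed = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
w = case_cohort_weights(diseased, assayed, sampling_fraction=2 / 8)
print(w, sum(w))
```

The weights sum to the full cohort size, so each assayed non-case stands in for the non-cases that were not assayed when the weighted likelihood is maximized.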
The proposed method further assumes immunogenicity data are correlated with protection, meeting Prentice criteria [7], and fully mediating the vaccine effect. In case of lack of full mediation of vaccine-induced protection through the immune response biomarker, the CoP-based approach can still be applied, and vaccination status should be included in the CoP-based logistic model (vaccination status can be added as a predictor, in addition to immune biomarker and covariate data) to account for the residual effect of the vaccine. To the extent that the biomarker is sufficiently predictive in populations of interest (despite not being fully mediating), the conclusions of work presented here could be expected to hold even without adding vaccination status as an additional predictor. While it may be reasonable in some cases to expect many of the conclusions of the work presented here to hold even with that additional predictor, future work should evaluate the implications of this modification on accuracy and precision of VE estimation and on covariate effect assessment.
Conclusions
Inclusion of CoP data in logistic regression models provides a new method to identify baseline covariates affecting VE, offering a way to determine, sensitively and specifically, the impact of demographic, clinical, and other subject-specific characteristics on protective efficacy of a vaccine. This approach has potential to increase the precision of efficacy estimation, thus enabling increased precision and/or power in clinical trials, with concomitant enhancement of the decisions they inform.
Availability of data and materials
The datasets and the code supporting the conclusions of this article are available in the GitHub repository, https://github.com/MSDLLCpapers/simvaxpmx. A package implementing the presented methods is available in the Comprehensive R Archive Network (CRAN) repository, https://cran.r-project.org/web/packages/vaxpmx/index.html.
Abbreviations
AIC: Akaike Information Criterion
AUC: Area under the ROC curve
CI: Confidence interval
CoP: Correlate of protection
CoR: Correlate of risk
PoD: Probability of disease
PPV: Positive predictive value
ROC: Receiver operating characteristic
RR: Relative risk
VE: Vaccine efficacy
References
Halloran ME, Longini IM, Struchiner CJ. Design and Analysis of Vaccine Studies. New York: Springer; 2010. p. 1–18.
Tartof SY, Slezak JM, Fischer H, et al. Effectiveness of mRNA BNT162b2 COVID-19 vaccine up to 6 months in a large integrated health system in the USA: a retrospective cohort study. Lancet. 2021;398:1407–16.
Piché-Renaud PP, Swayze S, Buchan SA, et al. COVID-19 Vaccine Effectiveness Against Omicron Infection and Hospitalization. Pediatrics. 2023;151(4):e2022059513.
Blanquart F, Abad C, Ambroise J, et al. Temporal, age, and geographical variation in vaccine efficacy against infection by the Delta and Omicron variants in the community in France, December 2021 to March 2022. Int J Infect Dis. 2023;133:89–96.
Deputy NP, Deckert J, Chard AN, et al. Vaccine Effectiveness of JYNNEOS against Mpox Disease in the United States. New Engl J Med. 2023;388:2434–43.
Plotkin SA, Orenstein WA, Offit PA, Edwards KM. Plotkin’s Vaccines. Amsterdam: Elsevier; 2017.
Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med. 1989;8(4):431–40.
Black S, Nicolay U, Vesikari T, et al. Hemagglutination inhibition antibody titers as a correlate of protection for inactivated influenza vaccines in children. Pediatr Infect Dis J. 2011;30:1081–5.
Habib MA, Prymula R, Carryn S, et al. Correlation of protection against varicella in a randomized Phase III varicella-containing vaccine efficacy trial in healthy infants. Vaccine. 2021;39:3445–54.
Salje H, Alera MT, Chua MN, et al. Evaluation of the extended efficacy of the Dengvaxia vaccine against symptomatic and subclinical dengue infection. Nat Med. 2021;27:1395–400.
Danier J, Callegaro A, Soni J, et al. Association Between Hemagglutination Inhibition Antibody Titers and Protection Against Reverse-Transcription Polymerase Chain Reaction-Confirmed Influenza Illness in Children 6–35 Months of Age: Statistical Evaluation of a Correlate of Protection. Open Forum Infect Dis. 2022;9(2):ofab477.
Dudasova J, Laube R, Valiathan C, et al. A method to estimate probability of disease and vaccine efficacy from clinical trial immunogenicity data. NPJ Vaccines. 2021;6(1):133.
Genser B, Cooper PJ, Yazdanbakhsh M, Barreto ML, Rodrigues LC. A guide to modern statistical analysis of immunological data. BMC Immunol. 2007;8:27.
Xu XS, Yuan M, Zhu H, et al. Full covariate modelling approach in population pharmacokinetics: understanding the underlying hypothesis tests and implications of multiplicity. Br J Clin Pharmacol. 2018;84(7):1525–34.
Dunning AJ. A model for immunological correlates of protection. Stat Med. 2006;25(9):1485–97.
Coudeville L, Andre P, Bailleux F, Weber F, Plotkin SA. A new approach to estimate vaccine efficacy based on immunogenicity data applied to influenza vaccines administered by the intradermal or intramuscular routes. Hum Vaccin. 2010;6(10):841–8.
Callegaro A, Tibaldi F. Assessing correlates of protection in vaccine trials: statistical solutions in the context of high vaccine efficacy. BMC Med Res Methodol. 2019;19:47.
Dunning AJ, Kensler J, Coudeville L, Bailleux F. Some extensions in continuous models for immunological correlates of protection. BMC Med Res Methodol. 2015;15:107.
Breslow NE, Lumley T, Ballantyne CM, Chambless LE, Kulich M. Using the whole cohort in the analysis of case-cohort data. Am J Epidemiol. 2009;169(11):1398–405.
Noma H, Tanaka S. Analysis of case-cohort designs with binary outcomes: Improving efficiency using whole-cohort auxiliary information. Stat Methods Med Res. 2017;26(2):691–706.
Acknowledgements
The authors gratefully acknowledge the support, review, and input from Ferdous Gheyas, Larissa Wenning, Julie A. Stone, BethAnn G. Coller, and Alexander D. Becker.
Author information
Julie Dudášová: MSD Czech Republic, Svornosti 3321/2, 150 00 Prague 5, Czech Republic.
Zdeněk Valenta: Dept. of Statistical Modelling, Institute of Computer Science of the Czech Academy of Sciences, Pod Vodárenskou věží 271/2 182 00 Prague 8, Czech Republic.
Jeffrey R. Sachs: Merck & Co., Inc., 126 East Lincoln Avenue, P.O. Box 2000, Rahway, NJ 07065, USA.
Funding
This work was funded by Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA.
Contributions
Conceptualization: J.D.; Methodology: J.D., Z.V., and J.R.S.; Software: J.D.; Validation: J.D.; Investigation: J.D.; Writing, original draft preparation: J.D.; Writing, review, and editing: J.D., Z.V., and J.R.S.; Visualization: J.D.; Supervision: Z.V. and J.R.S.; Project administration and funding acquisition: J.R.S. The authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
J.D. is an employee of MSD Czech Republic s.r.o., Prague, Czech Republic, and J.R.S. is an employee of Merck Sharp & Dohme LLC, subsidiaries of Merck & Co., Inc., Rahway, NJ, USA. J.R.S. owns stock in Merck & Co., Inc., Rahway, NJ, USA. Z.V. has no conflict of interest to declare.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Dudášová, J., Valenta, Z. & Sachs, J.R. Elucidating vaccine efficacy using a correlate of protection, demographics, and logistic regression. BMC Med Res Methodol 24, 101 (2024). https://doi.org/10.1186/s12874-024-02197-3
DOI: https://doi.org/10.1186/s12874-024-02197-3