 Technical advance
Group testing can improve the cost-efficiency of prospective-retrospective biomarker studies
BMC Medical Research Methodology volume 21, Article number: 55 (2021)
Abstract
Background
Cancer treatment is increasingly dependent on biomarkers for prognostication and treatment selection. Potential biomarkers are frequently evaluated in prospective-retrospective studies in which biomarkers are measured retrospectively on archived specimens after completion of prospective clinical trials. In light of the high costs of some assays, random sampling designs have been proposed that measure biomarkers for a random subsample of subjects selected on the basis of observed outcome and possibly other variables. Compared with a standard design that measures biomarkers on all subjects, a random sampling design can be cost-efficient in the sense of reducing the cost of the study substantially while achieving a reasonable level of precision.
Methods
For a biomarker that indicates the presence of some molecular alteration (e.g., mutation in a gene), we explore the use of a group testing strategy, which involves physically pooling specimens across subjects and assaying pooled samples for the presence of the molecular alteration of interest, for further improvement in cost-efficiency beyond random sampling. We propose simple and general approaches to estimating the prognostic and predictive values of biomarkers with group testing, and conduct simulation studies to validate the proposed estimation procedures and to assess the cost-efficiency of the group testing design in comparison to the standard and random sampling designs.
Results
Simulation results show that the proposed estimation procedures perform well in realistic settings and that a group testing design can have considerably higher cost-efficiency than a random sampling design.
Conclusions
Group testing can be used to improve the cost-efficiency of biomarker studies.
Background
Biomarkers and biomarker studies
A biomarker is “a characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention” [1]. Biomarkers play increasingly important roles in the treatment of cancer and other disease conditions [2,3,4]. A biomarker is said to be prognostic if it is associated with clinical outcomes in the absence of therapy or in the setting of some therapy that most patients are likely to receive (e.g., standard of care). A biomarker is said to be predictive if it is related to the effect of one treatment versus another. A predictive biomarker must be prognostic for at least one of the two treatments being compared. On the other hand, a prognostic biomarker does not need to be predictive. Both types of biomarker are of great interest in contemporary clinical research and practice.
The prognostic or predictive value of a biomarker can be evaluated in a variety of study settings with varying levels of evidence [5]. The highest level of evidence is attained by a fully prospective clinical study in which patients are prospectively enrolled, treated, and followed for clinical outcomes, with specimens collected at baseline and assayed in real time for marker values. Such a study can be very expensive and may take many years to complete. By the time the study is completed, the biomarker may have become obsolete. A practical alternative to this fully prospective approach is a two-phase prospective-retrospective (PR) clinical study, which differs from a fully prospective study in that baseline specimens are archived after collection and assayed later for specific biomarkers [5]. This PR approach can save a great deal of time for biomarker researchers by allowing them to focus their efforts on assaying archived specimens from completed clinical trials. This approach has been used successfully to validate KRAS as a predictive biomarker in colorectal cancer [6, 7] and is now commonly adopted for biomarker studies [5, 8].
PR studies are time-efficient but can be rather costly due to the high costs of some molecular assays such as next generation sequencing [9]. To improve the cost-efficiency of PR studies, random sampling (RS) designs have been proposed that measure biomarkers for a random subsample of subjects selected on the basis of observed outcome and possibly other variables. Examples of RS designs include the case-cohort and nested case-control designs [10, 11]. If the outcome of interest is an infrequent event, it is generally advisable to oversample cases (i.e., subjects who had the event) for biomarker measurement. The RS design has the potential to be cost-efficient in the sense of attaining a higher level of precision on a per-assay basis than the standard design (for example, using 50% of the assays to produce 60% of the precision as compared to the standard design). On the other hand, it does not make use of all available specimens, raising questions about the possibility of further improvement.
Group testing
In this article, we explore the use of group testing (GT) to further improve the cost-efficiency of PR studies (beyond the RS design) when the biomarker of interest indicates the presence of some molecular alteration (e.g., mutation in a gene). GT refers to the practice of physically pooling specimens across subjects and assaying pooled samples for the presence of the molecular alteration in the pool. For an assay with negligible error, a positive test result for a pooled sample would indicate that the molecular alteration is present in one or more subjects in the pool, while a negative test result would indicate the contrary. Since its introduction by Dorfman [12] as a cost-efficient way of screening for syphilis, GT has been applied to many different areas of biomedical research including virology [13,14,15], genetics [16,17,18,19], drug development [20], and most recently Covid-19 [21,22,23]. In particular, the feasibility and performance of GT for detecting mutations in tumor have been investigated with promising results [17,18,19]. Possible motivations for GT include cost-efficiency, statistical efficiency [24, 25], limited availability of specimens, and confidentiality concerns [26]. Some authors have considered the use of GT in retrospective epidemiologic studies [25, 27], but the potential utility of GT in PR biomarker studies seems largely unnoticed.
This article provides a statistical investigation of the potential utility of GT to improve the cost-efficiency of PR biomarker studies beyond that achieved by the RS design. Efficiency comparisons will be made with or without adjusting for the number of assays required. We will consider a simple yet common situation with a dichotomous outcome, where GT is performed on a dichotomous biomarker in an outcome-dependent fashion, under the assumption that assay error is negligible. We extend the methods in References [25, 27] to this situation and develop simple procedures for estimating the prognostic or predictive value of a biomarker measured by GT. The main ideas are described in the text with technical details provided in an online supplement. Simulation studies are conducted to evaluate the proposed estimation procedures as well as the statistical efficiency and cost-efficiency of the GT design in comparison to the standard design and the RS design.
Study setting
The ideas will be illustrated using the ECOG-ACRIN Cancer Research Group trial E1900 (NCT00049517), a randomized clinical trial comparing high-dose (HD) daunorubicin (90 mg/m^{2}) with standard-dose (SD) daunorubicin (45 mg/m^{2}) for patients 17–60 years of age with de novo untreated acute myeloid leukemia [28]. A total of 657 patients were randomized in a 1:1 ratio and followed for a median of 80.1 months. The trial demonstrated significant benefits of HD versus SD with respect to overall survival (hazard ratio 0.74; 95% CI 0.61–0.89; P = 0.001) and complete remission (odds ratio 1.79; 95% CI 1.27–2.52; P = 0.001). For illustration, we will use complete remission as the dichotomous outcome of interest, even though it was not the primary outcome of the trial. The trial had several biomarkers of interest, including the FLT3-ITD internal tandem duplication variant and mutation in DNMT3A, both of which were present in 24% of the trial participants. Figure 1 shows the observed complete remission rates for HD and SD in each biomarker subgroup. Both biomarkers were assayed using PCR amplification and bidirectional Sanger sequencing [29]. Because such assays have near-perfect sensitivity and specificity [30], we will focus on perfect assays in the main text and present estimation methods and simulation results for less-than-perfect assays in the online supplement.
Methods
Evaluating a prognostic biomarker
Evaluation of a prognostic biomarker, say X, usually focuses on its association with an outcome variable, say Y, for a given treatment, which is fixed in this section and therefore suppressed from the notation. We assume that X is a binary indicator of some molecular alteration (e.g., mutation); so X = 1 if the alteration is present and 0 otherwise. A patient is said to be “marker-positive” if X = 1 and “marker-negative” if X = 0. For simplicity, we assume here that Y is also binary (0 or 1) with Y = 1 representing treatment response (e.g., complete remission). A patient with Y = 1 is said to be a responder. In this setting, the association between X and Y may be assessed by comparing the marker-specific response rates p_{1} and p_{0}, where p_{x} = P(Y = 1 | X = x), x = 0, 1. Common measures of association include the log-odds ratio log[p_{1}(1 − p_{0})/{p_{0}(1 − p_{1})}], the log-ratio log(p_{1}/p_{0}), and the difference p_{1} − p_{0} [31, 32]. Each of these can be written as g(p_{1}) − g(p_{0}), where g is, respectively, the logit function, the log function, or the identity function.
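To make the common form g(p_{1}) − g(p_{0}) concrete, the following minimal Python sketch evaluates all three association measures for a pair of response rates. The function names and the numerical response rates are illustrative assumptions, not values from the E1900 trial.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Link functions corresponding to the three association measures in the text.
LINKS = {
    "log-odds ratio": logit,
    "log-ratio": math.log,
    "difference": lambda p: p,
}


def association_measures(p1, p0):
    """Return g(p1) - g(p0) for each link function g."""
    return {name: g(p1) - g(p0) for name, g in LINKS.items()}


# Hypothetical response rates for marker-positive and marker-negative patients.
measures = association_measures(p1=0.5, p0=0.6)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

A negative value on any of the three scales indicates a lower response rate among marker-positive patients in this hypothetical example.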
Suppose a clinical trial has been completed to yield outcome data for a random sample of n subjects (either in a one-arm trial or in one arm of a multi-arm trial), together with archived specimens available for biomarker studies. As shown in Fig. 2a, a standard PR study of the biomarker X would entail assaying all specimens of individual subjects and measuring the biomarker for each individual subject. From such data it is straightforward to estimate p_{1} (p_{0}) as the proportion of responders among the marker-positive (marker-negative) subjects, which can then be substituted into any measure of association. For illustration, the upper portion of Table 1 shows point estimates and standard errors of the three association measures mentioned earlier for the two biomarkers (FLT3-ITD and DNMT3A) in the two treatment groups (HD and SD) of the E1900 trial.
Under the RS design, subjects are selected randomly, typically in an outcome-dependent manner, for measurement of X, as illustrated in Fig. 2b. Let n_{1} (n_{0}) denote the total number of responders (nonresponders) in the trial, and let m_{1} (m_{0}) denote the number of responders (nonresponders) to be selected for measurement of X. If treatment response is rare (i.e., n_{1} is very small), it is common to select all responders (i.e., m_{1} = n_{1}) and a comparable number of nonresponders. Similar considerations apply to the opposite situation where treatment nonresponse is rare and n_{0} is very small. The RS design permits direct estimation of the prevalence of marker-positives among responders and nonresponders, formally defined as the conditional probabilities q_{y} = P(X = 1 | Y = y), y = 0, 1. Specifically, q_{1} (q_{0}) is estimated by the proportion of marker-positives among the m_{1} responders (m_{0} nonresponders) selected for biomarker measurement. These estimates alone are sufficient for estimating the odds ratio for X and Y. For other measures of association, Bayes’ theorem can be used to obtain estimates of p_{1} and p_{0}, which can then be substituted into any measure of association. These and other technical details are provided in the online supplement.
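The Bayes' theorem step can be sketched as follows. This is a plug-in illustration under the assumptions stated in the text (outcome-stratified random selection, negligible assay error); the function name `rs_estimates` and all counts are hypothetical, and the online supplement remains the authoritative description of the estimators.

```python
import math


def rs_estimates(n1, n0, m1, m0, pos1, pos0):
    """Plug-in estimates under outcome-dependent random sampling.

    n1, n0     : responders / nonresponders in the whole trial
    m1, m0     : responders / nonresponders selected for assay
    pos1, pos0 : marker-positives among the m1 / m0 assayed subjects
    """
    q1 = pos1 / m1           # estimate of P(X = 1 | Y = 1)
    q0 = pos0 / m0           # estimate of P(X = 1 | Y = 0)
    pi = n1 / (n1 + n0)      # overall response rate, from the full trial data
    # Bayes' theorem: P(Y = 1 | X = x) is proportional to P(X = x | Y = 1) P(Y = 1).
    p1 = q1 * pi / (q1 * pi + q0 * (1 - pi))
    p0 = (1 - q1) * pi / ((1 - q1) * pi + (1 - q0) * (1 - pi))
    # The log-odds ratio needs only q1 and q0.
    log_or = math.log(q1 / (1 - q1)) - math.log(q0 / (1 - q0))
    return {"q1": q1, "q0": q0, "p1": p1, "p0": p0, "log_or": log_or}


# Hypothetical counts: 180 responders, 150 nonresponders, 80 assayed per stratum.
est = rs_estimates(n1=180, n0=150, m1=80, m0=80, pos1=16, pos0=24)
print(est)
```

In this example the log-odds ratio computed from q_{1} and q_{0} coincides with the one computed from the Bayes-derived p_{1} and p_{0}, which reflects the symmetry of the odds ratio noted in the text.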
The GT design is a generalization of the RS design which allows more subjects to be assayed, though not necessarily on an individual basis. Figure 2c gives an example GT design for the same PR study with the same numbers of assays for responders (m_{1}) and nonresponders (m_{0}) as required by the RS design in Fig. 2b. Compared to the RS design, the GT design allows assaying twice as many responders and nonresponders with the potential to produce more information. In general, the GT design is a stratified (by outcome) pooling design, and the pool sizes (i.e., number of subjects in a pool) for responders and nonresponders may or may not be the same. If the pool size is equal to 1 in both strata, the GT design reduces to the RS design. In each stratum of the GT design, the marker prevalence q_{y} can be estimated with pooled assay data using a maximum likelihood approach [20]. These estimates can be used in the same manner as in the RS design to estimate any measure of association between X and Y.
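As a rough illustration of prevalence estimation from pooled data, the sketch below covers the simplest case only: a perfect assay and m pools of equal size k within a stratum. In that case the number of positive pools is binomial with success probability 1 − (1 − q)^{k}, and the maximum likelihood estimate has the closed form 1 − (1 − z/m)^{1/k}, where z is the number of positive pools. The function names are hypothetical; mixed pool sizes or imperfect assays would require numerical maximization as described in the supplement.

```python
def pool_results(statuses, k):
    """Pooled assay results for consecutive pools of size k (perfect assay).

    A pool tests positive (1) if at least one member is marker-positive.
    Trailing subjects that do not fill a complete pool are ignored here.
    """
    n_pools = len(statuses) // k
    return [int(any(statuses[i * k:(i + 1) * k])) for i in range(n_pools)]


def gt_prevalence_mle(z, m, k):
    """Closed-form MLE of prevalence q from z positive pools out of m pools of size k."""
    return 1.0 - (1.0 - z / m) ** (1.0 / k)


# Hypothetical marker statuses for six subjects, pooled in pairs.
pools = pool_results([0, 1, 0, 0, 1, 1], k=2)
print(pools)

# Hypothetical stratum: 36 of 80 pools of size 2 test positive.
q_hat = gt_prevalence_mle(z=36, m=80, k=2)
print(round(q_hat, 4))
```

With k = 1 the estimator reduces to the simple proportion z/m, matching the statement that the RS design is the special case of the GT design with pool size 1.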
These designs are compared in a simulation study mimicking the E1900 trial. A separate simulation experiment is conducted for each combination of treatment group (HD or SD) and biomarker (FLT3-ITD or DNMT3A). Each experiment consists of 10,000 replicate trials in which T is fixed, X is generated randomly with P(X = 1) ≈ 0.24 (observed proportion), Y is generated conditionally on (T, X) according to the observed proportions in Fig. 1, and the sample size is the same as the actual size of the treatment group (327 for HD; 330 for SD). Each simulated trial is used to assess the prognostic value of X under the standard, RS and GT designs. The RS design is implemented in two versions which assay approximately one half (RS2) or one third (RS3) of the trial participants and which attempt to assay equal numbers of responders and nonresponders to the extent possible. Accordingly, the GT design is also implemented in two versions which match the RS designs in the number of assays and which attempt to use a group size of 2 (GT2) or 3 (GT3) to the extent possible.
Evaluating a predictive biomarker
We now consider the problem of evaluating a predictive biomarker for choosing between an experimental treatment (T = 1) and a standard treatment (T = 0) in a randomized clinical trial. Let X and Y be defined as in the last section and note that T is independent of X by randomization. The predictive value of X can be quantified by the T × X interaction in a regression model relating Y to (T, X). For a binary Y, such a regression model may be specified as
$$g\{P(Y = 1 \mid T, X)\} = \beta_{0} + \beta_{T} T + \beta_{X} X + \beta_{TX} T X, \tag{1}$$
where g is a specified link function which is commonly chosen to be the logit, log or identity function. For any link function, the interaction coefficient β_{TX} can be interpreted as a “difference in difference”:
$$\beta_{TX} = \{g(p_{11}) - g(p_{10})\} - \{g(p_{01}) - g(p_{00})\}, \tag{2}$$
where p_{tx} = P(Y = 1 | T = t, X = x), t, x = 0, 1.
Suppose a randomized clinical trial has been completed to produce treatment and outcome data on a random sample of n subjects, together with archived specimens available for measurement of X. A standard PR biomarker study would simply measure X for each individual subject in the trial, which requires a total of n assays. The resulting data can be used to fit model (1) and estimate β_{TX} using standard software. Alternatively, one can estimate each p_{tx} as the proportion of responders among subjects in the T = t treatment group with marker status X = x, and substitute these estimates into Eq. (2) to estimate β_{TX}. These two approaches are generally equivalent. The lower portion of Table 1 shows the results (point estimates and standard errors) of estimating β_{TX} for the aforementioned three link functions in the E1900 trial.
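The plug-in approach can be sketched in a few lines of Python: estimate each p_{tx} as a cell-specific response proportion, then form the difference in difference {g(p_{11}) − g(p_{10})} − {g(p_{01}) − g(p_{00})} for the chosen link g. The cell counts below are hypothetical and the helper names are assumptions made for illustration.

```python
import math


def cell_rates(responders, totals):
    """Response proportion in each (t, x) cell."""
    return {cell: responders[cell] / totals[cell] for cell in totals}


def interaction(p, g):
    """Difference in difference of g(p_tx); keys of p are (t, x) pairs."""
    return (g(p[(1, 1)]) - g(p[(1, 0)])) - (g(p[(0, 1)]) - g(p[(0, 0)]))


# Hypothetical counts indexed by (treatment t, marker status x).
totals = {(1, 1): 80, (1, 0): 250, (0, 1): 80, (0, 0): 250}
responders = {(1, 1): 48, (1, 0): 175, (0, 1): 32, (0, 0): 150}

p = cell_rates(responders, totals)
beta_identity = interaction(p, lambda v: v)
beta_log = interaction(p, math.log)
beta_logit = interaction(p, lambda v: math.log(v / (1.0 - v)))
print(beta_identity, beta_log, beta_logit)
```

Because the model with an interaction term is saturated for binary T and X, this plug-in estimate is expected to agree with the fitted interaction coefficient from standard regression software, as the text notes.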
The RS design involves random selection of subjects for measurement of X, which may be stratified on treatment and outcome; this can be illustrated with two copies of Fig. 2b, one for each treatment group. Let n_{ty} denote the total number of subjects available in the (T = t, Y = y) stratum, and let m_{ty} denote the number of subjects to be selected for measurement of X in the same stratum. Conventional wisdom suggests that the m_{ty}’s should be made comparable to each other, which may require oversampling subjects in small strata. The RS design permits direct estimation of the prevalence of marker-positives in each treatment-outcome stratum, formally defined as the conditional probabilities q_{ty} = P(X = 1 | T = t, Y = y), t, y = 0, 1. Specifically, each q_{ty} is estimated by the proportion of marker-positives among the m_{ty} subjects in the (T = t, Y = y) stratum who are selected for biomarker measurement. For the logit link, these estimates suffice for estimating β_{TX}. For other link functions, Bayes’ theorem can be used to combine these estimates of the q_{ty}’s with the fully observed treatment and outcome data to estimate all p_{tx}’s and hence β_{TX}.
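One way to see why the stratum-specific prevalences suffice under the logit link is the symmetry of the odds ratio within each treatment group. The short derivation below is a sketch based on the definitions of p_{tx} and q_{ty} given in the text; it is not quoted from the online supplement. Within treatment group t, the odds ratio relating X and Y is the same whether one conditions on X or on Y:

$$\operatorname{logit}(p_{t1}) - \operatorname{logit}(p_{t0}) = \operatorname{logit}(q_{t1}) - \operatorname{logit}(q_{t0}), \qquad t = 0, 1.$$

Taking the difference between the two treatment groups then gives

$$\beta_{TX} = \{\operatorname{logit}(q_{11}) - \operatorname{logit}(q_{10})\} - \{\operatorname{logit}(q_{01}) - \operatorname{logit}(q_{00})\},$$

where the first subscript of q indexes treatment and the second indexes outcome. No such identity holds for the log or identity link, which is why those links require the additional Bayes' theorem step.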
A GT design in this context is essentially a stratified (by treatment and outcome) pooling design and can be thought of as two copies of Fig. 2c, one for each treatment group. Compared with an RS design with the same number of assays (m_{ty}) in each treatment-outcome stratum, a GT design with pool size 2 allows twice as many subjects to be assayed (though not on an individual basis) in an attempt to produce more information. In general, a GT design may prescribe pooling in some or all treatment-outcome strata, and the pool size may or may not vary across strata. The RS design can be seen as a special type of GT design in which the pool size is equal to 1 in each stratum. In each treatment-outcome stratum of a general GT design, the marker prevalence q_{ty} can be estimated with pooled assay data using a maximum likelihood approach [20]. These estimates can be used in the same manner as in the RS design to estimate β_{TX} for any link function.
These designs are compared via simulation in the setting of the E1900 trial, with a separate simulation experiment for each biomarker (FLT3-ITD or DNMT3A). Each experiment consists of 10,000 replicate trials in which T and X are independently generated with P(T = 1) = 0.5 and P(X = 1) ≈ 0.24, Y is generated conditionally on (T, X) according to the observed proportions in Fig. 1, and the sample size is the same as the actual size of the trial (657). Each simulated trial is used to assess the predictive value of X under the standard, RS and GT designs. The RS design is implemented in two versions which assay approximately one half (RS2) or one third (RS3) of the trial participants and which attempt to perform the same number of assays in each stratum defined by (T, Y). Accordingly, the GT design is also implemented in two versions which match the RS designs in the number of assays and which attempt to use a group size of 2 (GT2) or 3 (GT3) in each stratum.
Measures of performance
The performance of various designs is assessed in terms of relative efficiency and relative cost-efficiency, both of which are relative to the standard design, for estimating the association/interaction measure of interest. The relative efficiency of a nonstandard design is defined as the ratio of the estimation variance for the standard design to that for the nonstandard design in question. A GT2 design with a relative efficiency of 0.85, for example, retains 85% of the information (i.e., precision) with half of the assays required by the standard design. The relative cost-efficiency of a nonstandard design is defined as its relative efficiency multiplied by the ratio of the number of assays for the standard design to that for the nonstandard design in question. For example, a GT3 design with a relative cost-efficiency of 2 yields twice as much information as does the standard design on a per-assay basis.
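The two performance measures are simple ratios, as the following Python sketch shows. The function names, variances and assay counts are hypothetical and chosen only to mirror the GT2 example in the text.

```python
def relative_efficiency(var_standard, var_design):
    """Variance under the standard design divided by variance under the design in question."""
    return var_standard / var_design


def relative_cost_efficiency(rel_eff, assays_standard, assays_design):
    """Relative efficiency scaled by the ratio of assay counts (standard over design)."""
    return rel_eff * assays_standard / assays_design


# Hypothetical variances giving a relative efficiency of 0.85.
rel_eff = relative_efficiency(var_standard=0.085, var_design=0.100)

# A design using half of the 330 assays required by the standard design.
rel_cost_eff = relative_cost_efficiency(rel_eff, assays_standard=330, assays_design=165)
print(round(rel_eff, 2), round(rel_cost_eff, 2))
```

Under these assumed inputs, a design that keeps 85% of the precision with half of the assays has a relative cost-efficiency of 1.7.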
Choosing a pool size
Implementing the GT design requires choosing a pool size for each pooling stratum (based on outcome and possibly treatment). While we do not attempt to answer this question in full in this article, we provide some statistical insights here on how to choose a pool size to maximize cost-efficiency. As we explain in the online supplement, the statistical efficiency for estimating an association/interaction measure depends on the amount of available information (known in statistics as Fisher information) about the prevalence of the biomarker in each pooling stratum. Assuming that a fixed number of assays has been allocated to a given stratum with sufficient subjects/samples for all realistic pool sizes, the question then becomes how to choose a pool size to maximize the Fisher information about marker prevalence in a single pooled assay result. This per-assay Fisher information can be calculated analytically as a function of the true prevalence for each candidate pool size. This information, together with a preliminary estimate of the stratum-specific marker prevalence, provides a starting point for choosing a stratum-specific pool size, which can then be validated or revised on the basis of other considerations such as number of subjects, sample availability, pooling feasibility, and assay performance.
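A minimal sketch of this calculation is given below. It assumes a perfect assay, so that a pool of size k tests positive with probability θ = 1 − (1 − q)^{k}, and applies the standard Bernoulli Fisher information formula (dθ/dq)^{2} / {θ(1 − θ)}. The formula is derived here from that assumption and may be written differently in the online supplement; the function names and the candidate set of pool sizes are illustrative.

```python
def per_assay_information(q, k):
    """Fisher information about prevalence q carried by one pooled assay of size k.

    Assumes a perfect assay and 0 < q < 1.
    """
    s = 1.0 - q
    theta = 1.0 - s ** k              # probability that the pool tests positive
    dtheta_dq = k * s ** (k - 1)      # derivative of theta with respect to q
    return dtheta_dq ** 2 / (theta * (1.0 - theta))


def best_pool_size(q, candidates=(1, 2, 3, 4)):
    """Candidate pool size with the largest per-assay Fisher information at prevalence q."""
    return max(candidates, key=lambda k: per_assay_information(q, k))


# Preliminary prevalence guesses (hypothetical) and the pool size each one suggests.
for q in (0.8, 0.55, 0.42, 0.2):
    print(q, best_pool_size(q))
```

The suggested pool size is only a starting point, since it ignores the practical considerations listed above, such as sample availability and assay performance for pooled samples.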
Results
Evaluating a prognostic biomarker
Simulation results for evaluating a prognostic biomarker are shown in Table 2. As expected, all five designs yield nearly unbiased estimates of association measures (results not shown). For the RS and GT designs, Table 2 presents simulation results of relative efficiency and relative cost-efficiency. The RS and GT designs are expected to have relative efficiency less than 1 because they use fewer assays than the standard design. Comparing RS and GT designs with the same number of assays, the GT design is clearly and substantially more efficient than the RS design. For studying DNMT3A in the SD group, the GT3 design achieves 70–71% of the precision level of the standard design while requiring only one third of the assays, and is more than twice as efficient as the RS3 design with the same number of assays. The other scenarios follow the same pattern with slightly different numbers. In Table 2, the relative cost-efficiency ranges between 0.94 and 1.27 for the RS designs, indicating that the RS designs are either similar or superior to the standard design in cost-efficiency. It is worth noting that the GT designs attain much higher levels of relative cost-efficiency (1.65–1.79 for GT2; 1.94–2.40 for GT3). In summary, the results in Table 2 indicate that RS and GT designs are usually cost-efficient as compared to the standard design, and that GT designs can achieve much higher cost-efficiency than RS designs.
Evaluating a predictive biomarker
Simulation results for evaluating a predictive biomarker are shown in Table 3. As in the case of evaluating a prognostic biomarker, estimation bias is negligible for each interaction measure in each design (results not shown). Therefore, our comparison of designs is focused on (cost-)efficiency. For the RS and GT designs, Table 3 presents simulation results of relative efficiency and relative cost-efficiency. In this setting, the RS designs are largely similar in cost-efficiency to the standard design, with relative cost-efficiency ranging from 0.89 to 1.13. In contrast, the GT designs are highly competitive in terms of relative cost-efficiency (1.72–1.75 for GT2; 2.10–2.25 for GT3). Thus, the simulation results in Table 3 demonstrate that GT designs are much more cost-efficient than the standard and RS designs for estimating an interaction measure. This can be an important advantage when the cost of a biomarker study is driven by the cost of assays.
Choosing a pool size
Figure 3 shows the per-assay Fisher information as a function of the true prevalence for four different pool sizes (1 through 4); the specific formula for any pool size is provided in the online supplement. Under the previously stated assumptions, Fig. 3 suggests that the optimal pool size among the four pool sizes with maximal cost-efficiency is 1 (i.e., no pooling) if the true prevalence of the biomarker is above 0.67 in the given stratum, 2 if the prevalence is between 0.48 and 0.66, 3 if the prevalence is between 0.37 and 0.47, and 4 or more if the prevalence is below 0.37.
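The first two cut-points can be checked by hand under the perfect-assay assumption. The derivation below is a sketch based on the standard Bernoulli Fisher information and is not quoted from the online supplement. Writing q for the prevalence and s = 1 − q, a pool of size k tests positive with probability 1 − s^{k}, so the per-assay Fisher information about q is

$$I_{k}(q) = \frac{\{k\, s^{k-1}\}^{2}}{(1 - s^{k})\, s^{k}} = \frac{k^{2} s^{k-2}}{1 - s^{k}}.$$

Setting I_{1}(q) = I_{2}(q) gives

$$\frac{1}{s(1 - s)} = \frac{4}{1 - s^{2}} \;\Longrightarrow\; 1 + s = 4s \;\Longrightarrow\; s = \tfrac{1}{3}, \quad q = \tfrac{2}{3} \approx 0.67,$$

and setting I_{2}(q) = I_{3}(q) gives

$$4(1 + s + s^{2}) = 9s(1 + s) \;\Longrightarrow\; 5s^{2} + 5s - 4 = 0 \;\Longrightarrow\; s \approx 0.525, \quad q \approx 0.475.$$

Both values agree with the ranges read from Fig. 3 up to rounding.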
Discussion
To the best of our knowledge, this work is the first attempt to explore the use of GT in PR biomarker studies. Our simulations and theoretical calculations have demonstrated that the GT design can be highly cost-efficient compared to both the standard design and the RS design, at least in some situations. Higher cost-efficiency translates into more efficient use of resources, which is desirable even as assay costs decline owing to technological advances.
We have assumed in the main text that assay error is negligible. While this assumption may be reasonable for some assays (such as the PCR-based assay employed in the E1900 trial), many assays have less-than-perfect accuracy, which should be incorporated in statistical estimation. In the online supplement, we provide estimation methods to account for possible misclassification and report an additional simulation study on the performance of GT designs when the assay is subject to misclassification. The additional simulation results indicate that GT designs generally achieve higher cost-efficiency than the standard and RS designs, consistent with the results in Tables 2 and 3.
An additional complication in the GT design is the well-known dilution effect, which may result in decreased sensitivity for pooled samples [33]. The magnitude of the dilution effect depends on assay specifics and may be expected to increase with pool size [34]. This issue has been considered by several authors in different contexts. For example, McMahan et al. [35] proposed a mechanistic modeling approach in which pool testing error rates are estimated from a rich set of low-level assay data; Hung and Swallow [36] and Zhang et al. [37] postulated that the pool testing error rates are known functions of the pool size and the number of diseased individuals in the pool. Further research is warranted on how to incorporate the dilution effect in GT designs of PR biomarker studies.
We have assumed in this article that the biomarker is a binary indicator of the presence of some molecular alteration. If this is not the case, the relationship between a pooled assay result and individual assay results may become more complicated and more difficult to deal with in statistical estimation. For some continuous biomarkers, a pooled assay result may be plausibly assumed to be a (weighted) average of individual assay results, possibly with a random measurement error [38]. Novel statistical methods are needed to analyze GT designs with biomarkers that do not follow the pool-individual relationship assumed here.
Other areas of future research include development of statistical methods for GT designs with non-binary outcomes such as censored survival outcomes, which are commonly encountered in oncology trials, and optimization of GT designs for various combinations of outcomes and biomarkers.
Conclusions
It has been demonstrated that group testing can substantially improve the cost-efficiency of prospective-retrospective biomarker studies. Further research is warranted to investigate the performance of the GT design in a wider range of real-world applications and to extend the statistical methods developed here to a greater variety of estimation problems.
Availability of data and materials
We do not have the permission to distribute the E1900 trial data used in this article. However, the summary statistics needed to reproduce our simulation results are shown in Fig. 1.
Abbreviations
GT: Group testing
HD: High-dose
PR: Prospective-retrospective
RS: Random sampling
SD: Standard-dose
References
 1.
FDANIH Biomarker Working Group. BEST (Biomarkers, EndpointS, and other Tools) resource. Silver Spring: US FDA; 2016.
 2.
Kalia M. Biomarkers for personalized oncology: recent advances and future challenges. Metabolism. 2015;64(3):S16–21.
 3.
Badve S, Kumar GL. Predictive biomarkers in oncology: applications in precision medicine. Switzerland: Springer; 2018.
 4.
Rabbee N. Biomarker analysis in clinical trials with R. Boca Raton: Chapman & Hall/CRC; 2020.
 5.
Simon R, Paik S, Hayes DF. Use of archived specimens in evaluation of prognostic and predictive biomarkers. J Natl Cancer Inst. 2009;101(21):1446–52.
 6.
Amado RG, Wolf M, Peeters M, et al. Wildtype KRAS is required for panitumumab efficacy in patients with metastatic colorectal cancer. J Clin Oncol. 2008;26(10):1626–34.
 7.
Karapetis CS, KhambataFord S, Jonker DJ, et al. Kras mutations and benefit from cetuximab in advanced colorectal cancer. N Engl J Med. 2008;359(17):1757–65.
 8.
Matsui S, Buyse M, Simon R. Design and analysis of clinical trials for predictive medicine. Boca Raton: Chapman and Hall/CRC; 2015.
 9.
Marino P, Touzani R, Perrier L, et al. Cost of cancer diagnosis using nextgeneration sequencing targeted gene panels in routine practice: a nationwide French study. Eur J Hum Genet. 2018;26(3):314–23.
 10.
Prentice R. A casecohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73(1):1–11.
 11.
Langholz B, Thomas D. Nested casecontrol and casecohort methods of sampling from a cohort: a critical comparison. Am J Epidemiol. 1990;131(1):169–76.
 12.
Dorfman R. The detection of defective members of large populations. Ann Math Stat. 1943;14(4):436–40.
 13.
Emmanuel JC, Bassett MT, Smith HJ, Jacobs JA. Pooling of sera for human immunodeficiency virus (HIV) testing: an economic method for use in developing countries. Am J Clin Pathol. 1988;41(5):582–5.
 14.
Cardoso M, Koerner K, Kubanek B. Minipool screening by nucleic acid testing for hepatitis B virus, hepatitis C virus, and HIV: preliminary results. Transfusion. 1998;38(10):905–7.
 15.
Van TT, Miller J, Warchauer DM, et al. Pooling nasopharyngeal/throat swab specimens to increase testing capacity for influenza viruses by PCR. J Clin Microbiol. 2012;50(3):891–6.
 16.
Gastwirth JL. The efficiency of pooling in the detection of rare mutations. Am J Hum Genet. 2000;67(4):1036–9.
 17.
Pearson JV, Huentelman MJ, Halperin RF, et al. Identification of the genetic basis for complex disorders by use of poolingbased genomewide singlenucleotidepolymorphism association studies. Am J Hum Genet. 2007;80:126–39.
 18.
Futschik A, Schlotterer C. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics. 2010;186:207–18.
 19.
Anand S, Mangano E, Barizzone N, et al. Next generation sequencing of pooled samples: guideline for variants’ filtering. Sci Rep. 2016;6:33735.
 20.
Xie M, Tatsuoka K, Sacks J, Young SS. Group testing with blockers and synergism. J Am Stat Assoc. 2001;96(453):92–102.
 21.
Eberhardt JN, Breuckmann NP, Eberhardt CS. Multistage group testing improves efficiency of largescale COVID19 screening. J Clin Virol. 2020;128:104382.
 22.
Ellenberg J. Five people. One test. This is how you get there: New York Times; 2020. https://www.nytimes.com/2020/05/07/opinion/coronavirusgrouptesting.html
 23.
Broadfoot M. Coronavirus test shortages trigger a new strategy: group screening: Scientific American; 2020. https://www.scientificamerican.com/article/coronavirustestshortagestrigger anewstrategygroupscreening2/
 24.
Tu XM, Litvak E, Pagano M. On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: application to HIV screening. Biometrika. 1995;82(2):287–97.
 25.
Liu A, Liu C, Zhang Z, Albert PS. Optimality of group testing in the presence of misclassification. Biometrika. 2012;99(1):245–51.
 26.
Gastwirth JL, Hammick PA. Estimation of the prevalence of a rare disease, preserving the anonymity of the subjects by group testing: application to estimating the prevalence of AIDS antibodies in blood donors. J Stat Plan Inference. 1989;22(1):15–27.
 27.
Zhang Z, Liu A, Lyles RH, Mukherjee B. Logistic regression analysis of biomarker data subject to pooling and dichotomization. Stat Med. 2012;31(22):2473–84.
 28.
Luskin MR, Lee JW, Fernandez HF, et al. Benefit of high-dose daunorubicin in AML induction extends across cytogenetic and molecular groups. Blood. 2016;127(12):1551–8.
 29.
Patel JP, Gönen M, Figueroa ME, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N Engl J Med. 2012;366(12):1079–89.
 30.
Rosenthal SH, Gerasimova A, Ma C, et al. Analytical validation and performance characteristics of a 48-gene next-generation sequencing panel for detecting potentially actionable genomic alterations in myeloid neoplasm. bioRxiv. 2020. https://doi.org/10.1101/2020.11.30.403634.
 31.
Agresti A. Categorical data analysis. 3rd ed. Hoboken: Wiley; 2013.
 32.
Lui KJ. Binary data analysis of randomized clinical trials with noncompliance. New York: Wiley; 2011.
 33.
Cutler DJ, Jensen JD. Commentary: to pool, or not to pool? Genetics. 2010;186:41–3.
 34.
Zhang Z, Liu C, Kim S, Liu A. Prevalence estimation subject to misclassification: the mis-substitution bias and some remedies. Stat Med. 2014;33(25):4482–500.
 35.
McMahan CS, Tebbs JM, Bilder CR. Regression models for group testing data with pool dilution effects. Biostatistics. 2013;14(2):284–98.
 36.
Hung M, Swallow W. Robustness of group testing in the estimation of proportions. Biometrics. 1999;55:231–7.
 37.
Zhang W, Liu A, Li Q, Albert PS. Nonparametric estimation of distributions and diagnostic accuracy based on group-tested results with differential misclassification. Biometrics. 2020;76(4):1147–56.
 38.
Zhang Z, Albert PS. Binary regression analysis with pooled exposure measurements: a regression calibration approach. Biometrics. 2011;67(2):636–45.
Acknowledgements
This research was supported in part by the intramural research program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health. This manuscript was prepared using data from Dataset NCT00049517-D1 from the NCTN/NCORP Data Archive of the National Cancer Institute’s (NCI’s) National Clinical Trials Network (NCTN). Data were originally collected from clinical trial NCT number NCT00049517 “A Phase III Trial in Adult Acute Myeloid Leukemia: Daunorubicin Dose-Intensification Prior to Risk-Allocated Autologous Stem Cell Transplantation”. All analyses and conclusions in this manuscript are the sole responsibility of the authors and do not necessarily reflect the opinions or views of the clinical trial investigators, the NCTN, the NCORP or the NCI.
Funding
Research of W. Zhang was partially supported by the National Natural Science Foundation of China (Grant No. 12001522), which played no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript. Open Access funding provided by the National Institutes of Health (NIH).
Author information
Contributions
AL conceived and initiated this research. ZZ and WZ developed statistical methodology. WZ implemented the methodology and conducted simulation studies. ZZ wrote the first draft of the manuscript. All authors provided comments on early versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Permission to use the E1900 trial data was granted by the National Cancer Institute’s National Clinical Trials Network.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Zhang, W., Zhang, Z., Krushkal, J. et al. Group testing can improve the cost-efficiency of prospective-retrospective biomarker studies. BMC Med Res Methodol 21, 55 (2021). https://doi.org/10.1186/s12874-021-01239-4
Keywords
 Biomarker study design
 Cost-efficiency
 Group testing
 Pooling
 Two-phase sampling