Group testing can improve the cost-efficiency of prospective-retrospective biomarker studies

Abstract

Background

Cancer treatment is increasingly dependent on biomarkers for prognostication and treatment selection. Potential biomarkers are frequently evaluated in prospective-retrospective studies in which biomarkers are measured retrospectively on archived specimens after completion of prospective clinical trials. In light of the high costs of some assays, random sampling designs have been proposed that measure biomarkers for a random sub-sample of subjects selected on the basis of observed outcome and possibly other variables. Compared with a standard design that measures biomarkers on all subjects, a random sampling design can be cost-efficient in the sense of reducing the cost of the study substantially while achieving a reasonable level of precision.

Methods

For a biomarker that indicates the presence of some molecular alteration (e.g., mutation in a gene), we explore the use of a group testing strategy, which involves physically pooling specimens across subjects and assaying pooled samples for the presence of the molecular alteration of interest, for further improvement in cost-efficiency beyond random sampling. We propose simple and general approaches to estimating the prognostic and predictive values of biomarkers with group testing, and conduct simulation studies to validate the proposed estimation procedures and to assess the cost-efficiency of the group testing design in comparison to the standard and random sampling designs.

Results

Simulation results show that the proposed estimation procedures perform well in realistic settings and that a group testing design can have considerably higher cost-efficiency than a random sampling design.

Conclusions

Group testing can be used to improve the cost-efficiency of biomarker studies.

Background

Biomarkers and biomarker studies

A biomarker is “a characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention” [1]. Biomarkers play increasingly important roles in the treatment of cancer and other disease conditions [2,3,4]. A biomarker is said to be prognostic if it is associated with clinical outcomes in the absence of therapy or in the setting of some therapy that most patients are likely to receive (e.g., standard of care). A biomarker is said to be predictive if it is related to the effect of one treatment versus another. A predictive biomarker must be prognostic for at least one of the two treatments being compared. On the other hand, a prognostic biomarker does not need to be predictive. Both types of biomarker are of great interest in contemporary clinical research and practice.

The prognostic or predictive value of a biomarker can be evaluated in a variety of study settings with varying levels of evidence [5]. The highest level of evidence is attained by a fully prospective clinical study in which patients are prospectively enrolled, treated, and followed for clinical outcomes, with specimens collected at baseline and assayed in real time for marker values. Such a study can be highly expensive and may take many years to complete. By the time the study is completed, the biomarker may have become obsolete. A practical alternative to this fully prospective approach is a two-phase prospective-retrospective (P-R) clinical study which differs from a fully prospective study in that baseline specimens are archived after collection and assayed later for specific biomarkers [5]. This P-R approach can save a great deal of time for biomarker researchers by allowing them to focus their efforts on assaying archived specimens from completed clinical trials. This approach has been used successfully to validate KRAS as a predictive biomarker in colorectal cancer [6, 7] and is now commonly adopted for biomarker studies [5, 8].

P-R studies are time-efficient but can be rather costly due to the high costs of some molecular assays such as next generation sequencing [9]. To improve the cost-efficiency of P-R studies, random sampling (RS) designs have been proposed that measure biomarkers for a random sub-sample of subjects selected on the basis of observed outcome and possibly other variables. Examples of RS designs include the case-cohort and nested case-control designs [10, 11]. If the outcome of interest is an infrequent event, it is generally advisable to over-sample cases (i.e., subjects who had the event) for biomarker measurement. The RS design has the potential to be cost-efficient in the sense of attaining a higher level of precision on a per-assay basis than the standard design (for example, using 50% of the assays to produce 60% of the precision as compared to the standard design). On the other hand, it does not make use of all available specimens, raising questions about the possibility of further improvement.

Group testing

In this article, we explore the use of group testing (GT) to further improve the cost-efficiency of P-R studies (beyond the RS design) when the biomarker of interest indicates the presence of some molecular alteration (e.g., mutation in a gene). GT refers to the practice of physically pooling specimens across subjects and assaying pooled samples for the presence of the molecular alteration in the pool. For an assay with negligible error, a positive test result for a pooled sample would indicate that the molecular alteration is present in one or more subjects in the pool, while a negative test result would indicate the contrary. Since its introduction by Dorfman [12] as a cost-efficient way of screening for syphilis, GT has been applied to many different areas of biomedical research including virology [13,14,15], genetics [16,17,18,19], drug development [20], and most recently COVID-19 [21,22,23]. In particular, the feasibility and performance of GT for detecting mutations in tumors have been investigated with promising results [17,18,19]. Possible motivations for GT include cost-efficiency, statistical efficiency [24, 25], limited availability of specimens, and confidentiality concerns [26]. Some authors have considered the use of GT in retrospective epidemiologic studies [25, 27], but the potential utility of GT in P-R biomarker studies seems largely unnoticed.

This article provides a statistical investigation of the potential utility of GT to improve the cost-efficiency of P-R biomarker studies beyond that achieved by the RS design. Efficiency comparisons will be made both with and without adjusting for the number of assays required. We will consider a simple yet common situation with a dichotomous outcome, where GT is performed on a dichotomous biomarker in an outcome-dependent fashion, under the assumption that assay error is negligible. We extend the methods in References [25, 27] to this situation and develop simple procedures for estimating the prognostic or predictive value of a biomarker measured by GT. The main ideas are described in the text with technical details provided in an online supplement. Simulation studies are conducted to evaluate the proposed estimation procedures as well as the statistical efficiency and cost-efficiency of the GT design in comparison to the standard design and the RS design.

Study setting

The ideas will be illustrated using the ECOG-ACRIN Cancer Research Group trial E1900 (NCT00049517), a randomized clinical trial comparing high-dose (HD) daunorubicin (90 mg/m²) with standard-dose (SD) daunorubicin (45 mg/m²) for patients 17–60 years of age with de novo untreated acute myeloid leukemia [28]. A total of 657 patients were randomized in a 1:1 ratio and followed for a median of 80.1 months. The trial demonstrated significant benefits of HD versus SD with respect to overall survival (hazard ratio 0.74; 95% CI 0.61–0.89; P = 0.001) and complete remission (odds ratio 1.79; 95% CI 1.27–2.52; P = 0.001). For illustration, we will use complete remission as the dichotomous outcome of interest, even though it was not the primary outcome of the trial. The trial had several biomarkers of interest, including the FLT3-ITD internal tandem duplication variant and mutation in DNMT3A, both of which were present in 24% of the trial participants. Figure 1 shows the observed complete remission rates for HD and SD in each biomarker sub-group. Both biomarkers were assayed using PCR amplification and bidirectional Sanger sequencing [29]. Because such assays have near-perfect sensitivity and specificity [30], we will focus on perfect assays in the main text and present estimation methods and simulation results for less-than-perfect assays in the online supplement.

Fig. 1 Observed complete remission rates by treatment and marker status for two biomarkers (FLT3-ITD and DNMT3A) in the E1900 trial

Methods

Evaluating a prognostic biomarker

Evaluation of a prognostic biomarker, say X, usually focuses on its association with an outcome variable, say Y, for a given treatment, which is fixed in this section and therefore suppressed from the notation. We assume that X is a binary indicator of some molecular alteration (e.g., mutation); so X = 1 if the alteration is present and 0 otherwise. A patient is said to be “marker-positive” if X = 1 and “marker-negative” if X = 0. For simplicity, we assume here that Y is also binary (0 or 1) with Y = 1 representing treatment response (e.g., complete remission). A patient with Y = 1 is said to be a responder. In this setting, the association between X and Y may be assessed by comparing the marker-specific response rates p1 and p0, where px = P(Y = 1| X = x), x = 0, 1. Common measures of association include the log-odds ratio log[p1(1 − p0)/{p0(1 − p1)}], the log-ratio log(p1/p0), and the difference p1 − p0 [31, 32]. Each of these can be written as g(p1) − g(p0), where g is, respectively, the logit function, the log function, or the identity function.
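
The three association measures above share the form g(p1) − g(p0), which can be illustrated with a minimal Python sketch. The function names and the response rates used in the usage note are hypothetical and are not taken from the E1900 trial.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# The three link functions g named in the text.
LINKS = {
    "logit": logit,                # gives the log-odds ratio
    "log": math.log,               # gives the log-ratio
    "identity": lambda p: p,       # gives the difference
}


def association(p1, p0, link="logit"):
    """Association measure g(p1) - g(p0) for marker-specific response rates."""
    g = LINKS[link]
    return g(p1) - g(p0)
```

For example, with hypothetical response rates p1 = 0.6 and p0 = 0.4, `association(0.6, 0.4, "identity")` is the difference 0.2 and `association(0.6, 0.4, "logit")` is the log-odds ratio log(2.25).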

Suppose a clinical trial has been completed to yield outcome data for a random sample of n subjects (either in a one-arm trial or in one arm of a multi-arm trial), together with archived specimens available for biomarker studies. As shown in Fig. 2a, a standard P-R study of the biomarker X would entail assaying all specimens of individual subjects and measuring the biomarker for each individual subject. From such data it is straightforward to estimate p1 (p0) as the proportion of responders among the marker-positive (negative) subjects, which can then be substituted into any measure of association. For illustration, the upper portion of Table 1 shows point estimates and standard errors of the three association measures mentioned earlier for the two biomarkers (FLT3-ITD and DNMT3A) in the two treatment groups (HD and SD) of the E1900 trial.

Fig. 2 Schematics for the standard (a), random sampling (b) and group testing (c) designs for evaluating a prognostic biomarker

Table 1 Point estimates (standard errors) of various measures of association and interaction in the E1900 trial

Under the RS design, subjects are selected randomly, typically in an outcome-dependent manner, for measurement of X, as illustrated in Fig. 2b. Let n1 (n0) denote the total number of responders (non-responders) in the trial, and let m1 (m0) denote the number of responders (non-responders) to be selected for measurement of X. If treatment response is rare (i.e., n1 is very small), it is common to select all responders (i.e., m1 = n1) and a comparable number of non-responders. Similar considerations apply to the opposite situation where treatment non-response is rare and n0 is very small. The RS design permits direct estimation of the prevalence of marker-positives among responders and non-responders, formally defined as the conditional probabilities qy = P(X = 1| Y = y), y = 0, 1. Specifically, q1 (q0) is estimated by the proportion of marker-positives among the m1 responders (m0 non-responders) selected for biomarker measurement. These estimates alone are sufficient for estimating the odds ratio for X and Y. For other measures of association, Bayes’ theorem can be used to obtain estimates of p1 and p0, which can then be substituted into any measure of association. These and other technical details are provided in the online supplement.
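
The Bayes' theorem step can be sketched as follows. This is an illustrative reading of the paragraph above, not the procedure from the online supplement: it assumes that P(Y = 1) is estimated by the observed response rate n1/(n1 + n0) in the full trial, and all function names are hypothetical.

```python
import math


def response_rates_from_prevalences(q1, q0, n1, n0):
    """Convert marker prevalences q_y = P(X=1 | Y=y) into response rates
    p_x = P(Y=1 | X=x) using Bayes' theorem.

    Assumption: P(Y=1) is estimated by n1 / (n1 + n0), the response rate
    observed among all trial subjects.
    """
    pi = n1 / (n1 + n0)
    # P(Y=1 | X=1) = P(X=1 | Y=1) P(Y=1) / P(X=1)
    p1 = q1 * pi / (q1 * pi + q0 * (1.0 - pi))
    # P(Y=1 | X=0) = P(X=0 | Y=1) P(Y=1) / P(X=0)
    p0 = (1.0 - q1) * pi / ((1.0 - q1) * pi + (1.0 - q0) * (1.0 - pi))
    return p1, p0


def log_odds_ratio_from_prevalences(q1, q0):
    """Log-odds ratio for X and Y, which needs only q1 and q0 because the
    odds ratio is symmetric in X and Y."""
    return math.log(q1 / (1.0 - q1)) - math.log(q0 / (1.0 - q0))
```

The returned pair (p1, p0) can then be substituted into any measure of association, whereas the log-odds ratio is available directly from q1 and q0.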

The GT design is a generalization of the RS design which allows more subjects to be assayed, though not necessarily on an individual basis. Figure 2c gives an example GT design for the same P-R study with the same numbers of assays for responders (m1) and non-responders (m0) as required by the RS design in Fig. 2b. Compared to the RS design, the GT design allows assaying twice as many responders and non-responders with the potential to produce more information. In general, the GT design is a stratified (by outcome) pooling design, and the pool sizes (i.e., number of subjects in a pool) for responders and non-responders may or may not be the same. If the pool size is equal to 1 in both strata, the GT design reduces to the RS design. In each stratum of the GT design, the marker prevalence qy can be estimated with pooled assay data using a maximum likelihood approach [20]. These estimates can be used in the same manner as in the RS design to estimate any measure of association between X and Y.
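
For intuition about the maximum likelihood step, consider the simplest case, in which all pools in a stratum have the same size k and the assay is perfect. A pool then tests negative with probability (1 − q)^k, and the maximum likelihood estimate of q has the well-known closed form 1 − (1 − fraction of positive pools)^(1/k). The sketch below implements only this special case, which we assume (but cannot confirm from the text) is consistent with the approach cited above; the function name is hypothetical.

```python
def pooled_prevalence_mle(positive_pools, total_pools, pool_size):
    """Maximum likelihood estimate of prevalence q from pooled assay results.

    Assumptions: every pool contains exactly `pool_size` subjects and the
    assay has no error, so a pool is negative with probability (1 - q)**k.
    """
    if total_pools <= 0 or pool_size < 1:
        raise ValueError("total_pools must be positive and pool_size >= 1")
    if not 0 <= positive_pools <= total_pools:
        raise ValueError("positive_pools must lie between 0 and total_pools")
    fraction_negative = 1.0 - positive_pools / total_pools
    return 1.0 - fraction_negative ** (1.0 / pool_size)
```

With a pool size of 1 the estimate reduces to the proportion of marker-positives among the subjects assayed, matching the RS design.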

These designs are compared in a simulation study mimicking the E1900 trial. A separate simulation experiment is conducted for each combination of treatment group (HD or SD) and biomarker (FLT3-ITD or DNMT3A). Each experiment consists of 10,000 replicate trials in which T is fixed, X is generated randomly with P(X = 1) ≈ 0.24 (observed proportion), Y is generated conditionally on (T, X) according to the observed proportions in Fig. 1, and the sample size is the same as the actual size of the treatment group (327 for HD; 330 for SD). Each simulated trial is used to assess the prognostic value of X under the standard, RS and GT designs. The RS design is implemented in two versions which assay approximately one half (RS-2) or one third (RS-3) of the trial participants and which attempt to assay equal numbers of responders and non-responders to the extent possible. Accordingly, the GT design is also implemented in two versions which match the RS designs in the number of assays and which attempt to use a group size of 2 (GT-2) or 3 (GT-3) to the extent possible.

Evaluating a predictive biomarker

We now consider the problem of evaluating a predictive biomarker for choosing between an experimental treatment (T = 1) and a standard treatment (T = 0) in a randomized clinical trial. Let X and Y be defined as in the last section and note that T is independent of X by randomization. The predictive value of X can be quantified by the TX interaction in a regression model relating Y to (T, X). For a binary Y, such a regression model may be specified as

$$ g\left\{P\left(Y=1|T,X\right)\right\}={\upbeta}_1+{\upbeta}_TT+{\upbeta}_XX+{\upbeta}_{TX} TX, $$
(1)

where g is a specified link function which is commonly chosen to be the logit, log or identity function. For any link function, the interaction coefficient βTX can be interpreted as a “difference in difference”:

$$ {\upbeta}_{TX}=\left\{g\left({p}_{11}\right)-g\left({p}_{10}\right)\right\}-\left\{g\left({p}_{01}\right)-g\left({p}_{00}\right)\right\}, $$
(2)

where ptx = P(Y = 1| T = t, X = x), t, x = 0, 1.

Suppose a randomized clinical trial has been completed to produce treatment and outcome data on a random sample of n subjects, together with archived specimens available for measurement of X. A standard P-R biomarker study would simply measure X for each individual subject in the trial, which requires a total of n assays. The resulting data can be used to fit model (1) and estimate βTX using standard software. Alternatively, one can estimate each ptx as the proportion of responders among subjects in the T = t treatment group with marker status X = x, and substitute these estimates into Eq. (2) to estimate βTX. These two approaches are generally equivalent. The lower portion of Table 1 shows the results (point estimates and standard errors) of estimating βTX for the aforementioned three link functions in the E1900 trial.
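
The plug-in approach based on Eq. (2) can be sketched in a few lines of Python. The function name and the layout of the input (a nested mapping indexed as `p[t][x]`) are assumptions made for illustration, and the rates in the usage note are hypothetical.

```python
import math

# Link functions g considered in the text.
LINKS = {
    "logit": lambda p: math.log(p / (1.0 - p)),
    "log": math.log,
    "identity": lambda p: p,
}


def interaction(p, link="logit"):
    """Difference-in-difference estimate of the TX interaction coefficient.

    `p[t][x]` holds the response rate P(Y=1 | T=t, X=x) for t, x in {0, 1}.
    """
    g = LINKS[link]
    effect_of_x_under_t1 = g(p[1][1]) - g(p[1][0])
    effect_of_x_under_t0 = g(p[0][1]) - g(p[0][0])
    return effect_of_x_under_t1 - effect_of_x_under_t0
```

For instance, with hypothetical rates `p = {1: {1: 0.6, 0: 0.7}, 0: {1: 0.4, 0: 0.6}}`, the identity link gives (0.6 − 0.7) − (0.4 − 0.6) = 0.1.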

The RS design involves random selection of subjects for measurement of X, which may be stratified on treatment and outcome; this can be illustrated with two copies of Fig. 2b, one for each treatment group. Let nty denote the total number of subjects available in the (T = t, Y = y) stratum, and let mty denote the number of subjects to be selected for measurement of X in the same stratum. Conventional wisdom suggests that the mty's should be made comparable to each other, which may require over-sampling subjects in small strata. The RS design permits direct estimation of the prevalence of marker-positives in each treatment-outcome stratum, formally defined as the conditional probabilities qty = P(X = 1| T = t, Y = y), t, y = 0, 1. Specifically, each qty is estimated by the proportion of marker-positives among the mty subjects in the (T = t, Y = y) stratum who are selected for biomarker measurement. For the logit link, these estimates suffice for estimating βTX. For other link functions, Bayes' theorem can be used to combine these estimates of the qty's with the fully observed treatment and outcome data to estimate all the ptx's and hence βTX.
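
To see why the prevalences alone suffice under the logit link, recall that the odds ratio between X and Y is symmetric in its two arguments. The following short derivation is our own illustration of this standard fact rather than a formula quoted from the online supplement. Within each treatment group,

$$ \mathrm{logit}\left({p}_{t1}\right)-\mathrm{logit}\left({p}_{t0}\right)=\mathrm{logit}\left({q}_{t1}\right)-\mathrm{logit}\left({q}_{t0}\right),\kern1em t=0,1, $$

so that, by Eq. (2) with g taken to be the logit function,

$$ {\upbeta}_{TX}=\left\{\mathrm{logit}\left({q}_{11}\right)-\mathrm{logit}\left({q}_{10}\right)\right\}-\left\{\mathrm{logit}\left({q}_{01}\right)-\mathrm{logit}\left({q}_{00}\right)\right\}. $$

Note that the second subscript of p refers to marker status x, whereas the second subscript of q refers to outcome y.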

A GT design in this context is essentially a stratified (by treatment and outcome) pooling design and can be thought of as two copies of Fig. 2c, one for each treatment group. Compared with an RS design with the same number of assays (mty) in each treatment-outcome stratum, a GT design with pool size 2 allows twice as many subjects to be assayed (though not on an individual basis) in an attempt to produce more information. In general, a GT design may prescribe pooling in some or all treatment-outcome strata, and the pool size may or may not vary across strata. The RS design can be seen as a special type of GT design in which the pool size is equal to 1 in each stratum. In each treatment-outcome stratum of a general GT design, the marker prevalence qty can be estimated with pooled assay data using a maximum likelihood approach [20]. These estimates can be used in the same manner as in the RS design to estimate βTX for any link function.

These designs are compared via simulation in the setting of the E1900 trial, with a separate simulation experiment for each biomarker (FLT3-ITD or DNMT3A). Each experiment consists of 10,000 replicate trials in which T and X are independently generated with P(T = 1) = 0.5 and P(X = 1) ≈ 0.24, Y is generated conditionally on (T, X) according to the observed proportions in Fig. 1, and the sample size is the same as the actual size of the trial (657). Each simulated trial is used to assess the predictive value of X under the standard, RS and GT designs. The RS design is implemented in two versions which assay approximately one half (RS-2) or one third (RS-3) of the trial participants and which attempt to perform the same number of assays in each stratum defined by (T, Y). Accordingly, the GT design is also implemented in two versions which match the RS designs in the number of assays and which attempt to use a group size of 2 (GT-2) or 3 (GT-3) in each stratum.

Measures of performance

The performance of various designs is assessed in terms of relative efficiency and relative cost-efficiency, both of which are relative to the standard design, for estimating the association/interaction measure of interest. The relative efficiency of a non-standard design is defined as the ratio of the estimation variance for the standard design to that for the non-standard design in question. A GT-2 design with a relative efficiency of 0.85, for example, retains 85% of the information (i.e., precision) with half of the assays required by the standard design. The relative cost-efficiency of a non-standard design is defined as its relative efficiency multiplied by the ratio of the number of assays for the standard design to that for the non-standard design in question. For example, a GT-3 design with a relative cost-efficiency of 2 yields twice as much information as does the standard design on a per-assay basis.
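
These two definitions translate directly into code. The sketch below uses hypothetical function names, and the numbers in the usage note are illustrative rather than simulation results.

```python
def relative_efficiency(var_standard, var_design):
    """Ratio of the estimation variance under the standard design to that
    under the non-standard design (values below 1 mean a loss of precision)."""
    return var_standard / var_design


def relative_cost_efficiency(rel_eff, assays_standard, assays_design):
    """Relative efficiency rescaled to a per-assay basis."""
    return rel_eff * assays_standard / assays_design
```

For example, a design with a relative efficiency of 0.85 that uses 165 assays where the standard design uses 330 has a relative cost-efficiency of 0.85 × 330/165 = 1.7.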

Choosing a pool size

Implementing the GT design requires choosing a pool size for each pooling stratum (based on outcome and possibly treatment). While we do not attempt to answer this question in full in this article, we provide some statistical insights here on how to choose a pool size to maximize cost-efficiency. As we explain in the online supplement, the statistical efficiency for estimating an association/interaction measure depends on the amount of available information (known in statistics as Fisher information) about the prevalence of the biomarker in each pooling stratum. Assuming that a fixed number of assays has been allocated to a given stratum with sufficient subjects/samples for all realistic pool sizes, the question then becomes how to choose a pool size to maximize the Fisher information about marker prevalence in a single pooled assay result. This per-assay Fisher information can be calculated analytically as a function of the true prevalence for each candidate pool size. This information, together with a preliminary estimate of the stratum-specific marker prevalence, provides a starting point for choosing a stratum-specific pool size, which can then be validated or revised on the basis of other considerations such as number of subjects, sample availability, pooling feasibility, and assay performance.

Results

Evaluating a prognostic biomarker

Simulation results for evaluating a prognostic biomarker are shown in Table 2. As expected, all five designs yield nearly unbiased estimates of association measures (results not shown). For the RS and GT designs, Table 2 presents simulation results of relative efficiency and relative cost-efficiency. The RS and GT designs are expected to have relative efficiency less than 1 because they use fewer assays than the standard design. Comparing RS and GT designs with the same number of assays, the GT design is clearly and substantially more efficient than the RS design. For studying DNMT3A in the SD group, the GT-3 design achieves 70–71% of the precision level of the standard design while requiring only one third of the assays, and is more than twice as efficient as the RS-3 design with the same number of assays. The other scenarios follow the same pattern with slightly different numbers. In Table 2, the relative cost-efficiency ranges between 0.94 and 1.27 for the RS designs, indicating that the RS designs are either similar or superior to the standard design in cost-efficiency. It is worth noting that the GT designs attain much higher levels of relative cost-efficiency (1.65–1.79 for GT-2; 1.94–2.40 for GT-3). In summary, the results in Table 2 indicate that RS and GT designs are usually cost-efficient as compared to the standard design, and that GT designs can achieve much higher cost-efficiency than RS designs.

Table 2 Simulation results for evaluating a prognostic biomarker in the setting of the E1900 trial

Evaluating a predictive biomarker

Simulation results for evaluating a predictive biomarker are shown in Table 3. As in the case of evaluating a prognostic biomarker, estimation bias is negligible for each interaction measure in each design (results not shown). Therefore, our comparison of designs is focused on (cost-)efficiency. For the RS and GT designs, Table 3 presents simulation results of relative efficiency and relative cost-efficiency. In this setting, the RS designs are largely similar in cost-efficiency to the standard design, with relative cost-efficiency ranging from 0.89 to 1.13. In contrast, the GT designs are highly competitive in terms of relative cost-efficiency (1.72–1.75 for GT-2; 2.10–2.25 for GT-3). Thus, the simulation results in Table 3 demonstrate that GT designs are much more cost-efficient than the standard and RS designs for estimating an interaction measure. This can be an important advantage when the cost of a biomarker study is driven by the cost of assays.

Table 3 Simulation results for evaluating a predictive biomarker in the setting of the E1900 trial

Choosing a pool size

Figure 3 shows the per-assay Fisher information as a function of the true prevalence for four different pool sizes (1 through 4); the specific formula for any pool size is provided in the online supplement. Under the previously stated assumptions, Fig. 3 suggests that the pool size that maximizes cost-efficiency is 1 (i.e., no pooling) if the true prevalence of the biomarker in the given stratum is above 0.67, 2 if the prevalence is between 0.48 and 0.66, 3 if the prevalence is between 0.37 and 0.47, and 4 or more if the prevalence is below 0.37.
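
A curve of this kind can be computed with a short Python sketch. The formula used here is our own derivation for a perfect assay, not a copy of the supplement's formula: a pool of size k tests positive with probability 1 − (1 − q)^k, so a single pooled result carries Fisher information k²(1 − q)^(k − 2)/{1 − (1 − q)^k} about the prevalence q. We assume this agrees with the formula in the online supplement; the function names are hypothetical.

```python
def fisher_info_per_assay(q, k):
    """Fisher information about prevalence q (0 < q < 1) in one pooled assay
    result of pool size k, assuming a perfect assay.

    The pooled result is Bernoulli with success probability 1 - (1 - q)**k.
    """
    r = 1.0 - q  # probability that a single subject is marker-negative
    return k ** 2 * r ** (k - 2) / (1.0 - r ** k)


def best_pool_size(q, candidates=(1, 2, 3, 4)):
    """Candidate pool size with the largest per-assay Fisher information."""
    return max(candidates, key=lambda k: fisher_info_per_assay(q, k))
```

Under this formula, pool sizes 1 and 2 are equally informative at a prevalence of 2/3, and `best_pool_size` returns 1, 2, 3 and 4 at prevalences of 0.8, 0.55, 0.42 and 0.24, respectively, in line with the ranges stated above.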

Fig. 3 Fisher information in a pooled assay (of size 1 through 4) about biomarker prevalence as a function of the true prevalence

Discussion

To the best of our knowledge, this work is the first attempt to explore the use of GT in P-R biomarker studies. Our simulations and theoretical calculations have demonstrated that the GT design can be highly cost-efficient compared to both the standard design and the RS design, at least in some situations. Higher cost-efficiency translates into more efficient use of resources, which is desirable even as assay costs decline owing to technological advances.

We have assumed in the main text that assay error is negligible. While this assumption may be reasonable for some assays (such as the PCR-based assay employed in the E1900 trial), many assays have less-than-perfect accuracy, which should be incorporated in statistical estimation. In the online supplement, we provide estimation methods to account for possible misclassification and report an additional simulation study on the performance of GT designs when the assay is subject to misclassification. The additional simulation results indicate that GT designs generally achieve higher cost-efficiency than the standard and RS designs, consistent with the results in Tables 2 and 3.

An additional complication in the GT design is the well-known dilution effect, which may result in decreased sensitivity for pooled samples [33]. The magnitude of the dilution effect depends on assay specifics and may be expected to increase with pool size [34]. This issue has been considered by several authors in different contexts. For example, McMahan et al. [35] proposed a mechanistic modeling approach in which pool testing error rates are estimated from a rich set of low-level assay data; Hung and Swallow [36] and Zhang et al. [37] postulated that the pool testing error rates are known functions of the pool size and the number of diseased individuals in the pool. Further research is warranted on how to incorporate the dilution effect in GT designs of P-R biomarker studies.

We have assumed in this article that the biomarker is a binary indicator of the presence of some molecular alteration. If this is not the case, the relationship between a pooled assay result and individual assay results may become more complicated and more difficult to deal with in statistical estimation. For some continuous biomarkers, a pooled assay result may be plausibly assumed to be a (weighted) average of individual assay results, possibly with a random measurement error [38]. Novel statistical methods are needed to analyze GT designs with biomarkers that do not follow the pool-individual relationship assumed here.

Other areas of future research include development of statistical methods for GT designs with non-binary outcomes such as censored survival outcomes, which are commonly encountered in oncology trials, and optimization of GT designs for various combinations of outcomes and biomarkers.

Conclusions

It has been demonstrated that group testing can substantially improve the cost-efficiency of prospective-retrospective biomarker studies. Further research is warranted to investigate the performance of the GT design in a wider range of real-world applications and to extend the statistical methods developed here to a greater variety of estimation problems.

Availability of data and materials

We do not have permission to distribute the E1900 trial data used in this article. However, the summary statistics needed to reproduce our simulation results are shown in Fig. 1.

Abbreviations

GT: Group testing

HD: High-dose

P-R: Prospective-retrospective

RS: Random sampling

SD: Standard-dose

References

  1. 1.

    FDA-NIH Biomarker Working Group. BEST (Biomarkers, EndpointS, and other Tools) resource. Silver Spring: US FDA; 2016.

    Google Scholar 

  2. 2.

    Kalia M. Biomarkers for personalized oncology: recent advances and future challenges. Metabolism. 2015;64(3):S16–21.

    CAS  Article  Google Scholar 

  3. 3.

    Badve S, Kumar GL. Predictive biomarkers in oncology: applications in precision medicine. Switzerland: Springer; 2018.

    Google Scholar 

  4. 4.

    Rabbee N. Biomarker analysis in clinical trials with R. Boca Raton: Chapman & Hall/CRC; 2020.

    Book  Google Scholar 

  5. 5.

    Simon R, Paik S, Hayes DF. Use of archived specimens in evaluation of prognostic and predictive biomarkers. J Natl Cancer Inst. 2009;101(21):1446–52.

    Article  Google Scholar 

  6. 6.

    Amado RG, Wolf M, Peeters M, et al. Wild-type KRAS is required for panitumumab efficacy in patients with metastatic colorectal cancer. J Clin Oncol. 2008;26(10):1626–34.

    CAS  Article  Google Scholar 

  7. 7.

    Karapetis CS, Khambata-Ford S, Jonker DJ, et al. K-ras mutations and benefit from cetuximab in advanced colorectal cancer. N Engl J Med. 2008;359(17):1757–65.

    CAS  Article  Google Scholar 

  8. 8.

    Matsui S, Buyse M, Simon R. Design and analysis of clinical trials for predictive medicine. Boca Raton: Chapman and Hall/CRC; 2015.

    Book  Google Scholar 

  9. 9.

    Marino P, Touzani R, Perrier L, et al. Cost of cancer diagnosis using next-generation sequencing targeted gene panels in routine practice: a nationwide French study. Eur J Hum Genet. 2018;26(3):314–23.

    CAS  Article  Google Scholar 

  10. 10.

    Prentice R. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73(1):1–11.

    Article  Google Scholar 

  11. 11.

    Langholz B, Thomas D. Nested case-control and case-cohort methods of sampling from a cohort: a critical comparison. Am J Epidemiol. 1990;131(1):169–76.

    CAS  Article  Google Scholar 

  12. 12.

    Dorfman R. The detection of defective members of large populations. Ann Math Stat. 1943;14(4):436–40.

    Article  Google Scholar 

  13. 13.

    Emmanuel JC, Bassett MT, Smith HJ, Jacobs JA. Pooling of sera for human immunodeficiency virus (HIV) testing: an economic method for use in developing countries. Am J Clin Pathol. 1988;41(5):582–5.

    CAS  Article  Google Scholar 

  14. 14.

    Cardoso M, Koerner K, Kubanek B. Mini-pool screening by nucleic acid testing for hepatitis B virus, hepatitis C virus, and HIV: preliminary results. Transfusion. 1998;38(10):905–7.

    CAS  Article  Google Scholar 

  15. 15.

    Van TT, Miller J, Warchauer DM, et al. Pooling nasopharyngeal/throat swab specimens to increase testing capacity for influenza viruses by PCR. J Clin Microbiol. 2012;50(3):891–6.

    Article  Google Scholar 

  16. 16.

    Gastwirth JL. The efficiency of pooling in the detection of rare mutations. Am J Hum Genet. 2000;67(4):1036–9.

    CAS  Article  Google Scholar 

  17. 17.

    Pearson JV, Huentelman MJ, Halperin RF, et al. Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies. Am J Hum Genet. 2007;80:126–39.

    CAS  Article  Google Scholar 

  18. Futschik A, Schlötterer C. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics. 2010;186:207–18.

  19. Anand S, Mangano E, Barizzone N, et al. Next generation sequencing of pooled samples: guideline for variants’ filtering. Sci Rep. 2016;6:33735.

  20. Xie M, Tatsuoka K, Sacks J, Young SS. Group testing with blockers and synergism. J Am Stat Assoc. 2001;96(453):92–102.

  21. Eberhardt JN, Breuckmann NP, Eberhardt CS. Multi-stage group testing improves efficiency of large-scale COVID-19 screening. J Clin Virol. 2020;128:104382.

  22. Ellenberg J. Five people. One test. This is how you get there: New York Times; 2020. https://www.nytimes.com/2020/05/07/opinion/coronavirus-group-testing.html

  23. Broadfoot M. Coronavirus test shortages trigger a new strategy: group screening: Scientific American; 2020. https://www.scientificamerican.com/article/coronavirus-test-shortages-trigger-a-new-strategy-group-screening2/

  24. Tu XM, Litvak E, Pagano M. On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: application to HIV screening. Biometrika. 1995;82(2):287–97.

  25. Liu A, Liu C, Zhang Z, Albert PS. Optimality of group testing in the presence of misclassification. Biometrika. 2012;99(1):245–51.

  26. Gastwirth JL, Hammick PA. Estimation of the prevalence of a rare disease, preserving the anonymity of the subjects by group testing: application to estimating the prevalence of AIDS antibodies in blood donors. J Stat Plan Inference. 1989;22(1):15–27.

  27. Zhang Z, Liu A, Lyles RH, Mukherjee B. Logistic regression analysis of biomarker data subject to pooling and dichotomization. Stat Med. 2012;31(22):2473–84.

  28. Luskin MR, Lee JW, Fernandez HF, et al. Benefit of high-dose daunorubicin in AML induction extends across cytogenetic and molecular groups. Blood. 2016;127(12):1551–8.

  29. Patel JP, Gönen M, Figueroa ME, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N Engl J Med. 2012;366(12):1079–89.

  30. Rosenthal SH, Gerasimova A, Ma C, et al. Analytical validation and performance characteristics of a 48-gene next-generation sequencing panel for detecting potentially actionable genomic alterations in myeloid neoplasm. bioRxiv. 2020. https://doi.org/10.1101/2020.11.30.403634.

  31. Agresti A. Categorical data analysis. 3rd ed. Hoboken: Wiley; 2013.

  32. Lui KJ. Binary data analysis of randomized clinical trials with noncompliance. New York: Wiley; 2011.

  33. Cutler DJ, Jensen JD. Commentary: to pool, or not to pool? Genetics. 2010;186:41–3.

  34. Zhang Z, Liu C, Kim S, Liu A. Prevalence estimation subject to misclassification: the mis-substitution bias and some remedies. Stat Med. 2014;33(25):4482–500.

  35. McMahan CS, Tebbs JM, Bilder CR. Regression models for group testing data with pool dilution effects. Biostatistics. 2013;14(2):284–98.

  36. Hung M, Swallow W. Robustness of group testing in the estimation of proportions. Biometrics. 1999;55:231–7.

  37. Zhang W, Liu A, Li Q, Albert PS. Nonparametric estimation of distributions and diagnostic accuracy based on group-tested results with differential misclassification. Biometrics. 2020;76(4):1147–56.

  38. Zhang Z, Albert PS. Binary regression analysis with pooled exposure measurements: a regression calibration approach. Biometrics. 2011;67(2):636–45.

Acknowledgements

This research was supported in part by the intramural research program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health. This manuscript was prepared using data from Dataset NCT00049517-D1 from the NCTN/NCORP Data Archive of the National Cancer Institute’s (NCI’s) National Clinical Trials Network (NCTN). Data were originally collected from clinical trial NCT number NCT00049517 “A Phase III Trial in Adult Acute Myeloid Leukemia: Daunorubicin Dose-Intensification Prior to Risk-Allocated Autologous Stem Cell Transplantation”. All analyses and conclusions in this manuscript are the sole responsibility of the authors and do not necessarily reflect the opinions or views of the clinical trial investigators, the NCTN, the NCORP or the NCI.

Funding

Research of W. Zhang was partially supported by the National Natural Science Foundation of China (Grant No. 12001522), which played no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript. Open Access funding provided by the National Institutes of Health (NIH).

Author information

Contributions

AL conceived and initiated this research. ZZ and WZ developed statistical methodology. WZ implemented the methodology and conducted simulation studies. ZZ wrote the first draft of the manuscript. All authors provided comments on early versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhiwei Zhang.

Ethics declarations

Ethics approval and consent to participate

Permission to use the E1900 trial data was granted by the National Cancer Institute’s National Clinical Trials Network.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zhang, W., Zhang, Z., Krushkal, J. et al. Group testing can improve the cost-efficiency of prospective-retrospective biomarker studies. BMC Med Res Methodol 21, 55 (2021). https://doi.org/10.1186/s12874-021-01239-4

Keywords

  • Biomarker study design
  • Cost-efficiency
  • Group testing
  • Pooling
  • Two-phase sampling