Psychometric properties of measures of substance use: a systematic review and meta-analysis of reliability, validity and diagnostic test accuracy

Santos, Glenn-Milo; Strathdee, Steffanie A.; El-Bassel, Nabila; Patel, Poonam; Subramanian, Divya; Horyniak, Danielle; Cook, Ryan R.; McCullagh, Charlotte; Marotta, Phillip; Choksi, Foram; Kang, Brian; Allen, Isabel; Shoptaw, Steven

doi:10.1186/s12874-020-00963-7

Research article
Open access
Published: 07 May 2020

BMC Medical Research Methodology volume 20, Article number: 106 (2020)

Abstract

Background

Synthesis of psychometric properties of substance use measures to identify patterns of use and substance use disorders remains limited. To address this gap, we sought to systematically evaluate the psychometric properties of measures to detect substance use and misuse.

Methods

We conducted a systematic review and meta-analysis of literature on measures of substance classes associated with HIV risk (heroin, methamphetamine, cocaine, ecstasy, alcohol) that were published in English before June 2016 that reported at least one of the following psychometric outcomes of interest: internal consistency (alpha), test-retest/inter-rater reliability (kappa), sensitivity, specificity, positive predictive value, and negative predictive value. We used meta-analytic techniques to generate pooled summary estimates for these outcomes using random effects and hierarchical logistic regression models.

Results

Findings across 387 papers revealed that, overall, 65% of pooled estimates for alpha were in the range of fair-to-excellent; 44% of estimates for kappa were in the range of fair-to-excellent. In addition, 69, 97, 37 and 96% of pooled estimates for sensitivity, specificity, positive predictive value, and negative predictive value, respectively, were in the range of moderate-to-excellent.

Conclusion

We conclude that many substance use measures had pooled summary estimates that were at the fair/moderate-to-excellent range across different psychometric outcomes. Most scales were conducted in English, within the United States, highlighting the need to test and validate these measures in more diverse settings. Additionally, the majority of studies had high risk of bias, indicating a need for more studies with higher methodological quality.

Background

Substance use, including illicit drug use and alcohol, is prevalent worldwide with about 5% of adults using illicit substances [1] and 40% of adults consuming alcohol, in the past year [2]. Moreover, the number of people with drug use disorders was estimated at 62 million, while the number of individuals with alcohol use disorders was estimated at 100.4 million in 2016 [3]. Substance use disorders are associated with substantial morbidity and mortality globally. Illicit drug use disorders were attributed to 20 million disability-adjusted life years (DALYs) lost [4] while alcohol use disorders were attributed to 85 million DALYs lost in 2012 [5]. Specific classes of substances also play an important role in HIV risk, including needle sharing, and sexual risk behaviors, and have been linked to HIV incidence [6,7,8] [6, 9,10,11] [12,13,14,15]. Among people living with HIV (PLWH), substance use disorders may lead to less optimal HIV care outcomes because of their associations with lower likelihood of being linked to HIV care, retained in care, receiving antiretroviral therapy (ART), having high ART adherence and lower likelihood of having an undetectable HIV viral load [9, 10, 16,17,18].

Given the role of substance use in the global burden of disease and the overlap between use of specific substances and HIV, it is important for clinicians and researchers to have tools with high reliability, validity, and diagnostic accuracy [19]. Yet too few use measures with known psychometric properties when assessing substance use. Currently, there are a myriad of standardized questionnaires used to screen for substance use and misuse that require patients to self-report patterns of use and substance-related problems. Examples such as the Alcohol Use Disorders Identification Test and the Drug Use Disorders Identification Test [20, 21] provide scores that correspond with severity of substance use and related problems. There remain no biological measures that define a substance use disorder; existing biological measures are considered to be indirect correlates of use disorders [22]. Examples include alcohol biomarkers like Carbohydrate-Deficient Transferrin (CDT) and Gamma Glutamyl Transferase (GGT), which are used to screen for alcohol dependence and heavy drinking, respectively [22]. There is a great need to evaluate the psychometric performance of these measures and markers across studies in settings of HIV to elucidate the overall validity, reliability, and diagnostic accuracy.

One approach to informing the use of psychometric measures in research and clinical care is to pool the psychometric characteristics of measures across studies using meta-analytic techniques, which generate summary estimates of the validity, reliability, and diagnostic accuracy of different questionnaires [23,24,25,26,27]. However, synthesis of psychometric properties of substance use measures to identify patterns of use and substance use disorders remains limited, with few exceptions [21, 28, 29]. One meta-analysis focused on the accuracy of self-reported assessments to diagnose alcohol and cannabis use disorders found that instruments had a pooled sensitivity of 0.88 and a pooled specificity of 0.90 among pediatric emergency department patients [28]. Another meta-analysis observed that studies with single questions to identify alcohol use disorders in primary care had a pooled sensitivity of 0.54 and a pooled specificity of 0.87, while two-question measures had a pooled sensitivity of 0.87 and a pooled specificity of 0.80 [29]. More commonly, however, reviews on substance use measures present psychometric data in a descriptive fashion [19, 30, 31]. Therefore, more rigorous efforts to systematically pool the psychometric properties of substance use measures are needed to establish the overall performance and accuracy of these tools and point toward their utility in future research.

To address these gaps, we conducted a systematic review and meta-analysis of literature to identify studies that have reported validity and reliability of substance use measures and pooled these measures using meta-analytic techniques. For the purposes of this review, we targeted our search for measures of substance classes previously associated with HIV risk. Specifically, we focused our review on measures for the following: alcohol, methamphetamine and amphetamine, cocaine, heroin, and ecstasy, regardless of whether the study was conducted among a population at high risk for HIV. Additionally, we included measures that evaluated substance use in general (i.e., measures that did not differentiate between classes of substances) as long as those measures were inclusive of our targeted substance classes. This study’s review question is: What are the summary reliability and validity (as measured by alpha and kappa coefficients) and diagnostic accuracy (as measured by sensitivity, specificity, positive predictive value, and negative predictive value) of various substance and alcohol measures used to screen for use and use disorders?

Methods

Search strategy

We conducted a systematic review of studies published prior to June 2016 on substance use measures indexed in electronic databases including PubMed, PsycINFO, and EMBASE. We developed Boolean search terms to capture substance use measures that have been previously associated with HIV risk, in consultation with a reference librarian from the University of California San Francisco with a master’s degree in library and information science (MLIS). The following substance classes were included: alcohol, methamphetamine and amphetamine, cocaine, heroin, and 3,4-methylenedioxy-methamphetamine (MDMA; “ecstasy”). Because the focus of this study was to pool psychometric properties of measures, we also included search terms related to validity, reliability, and diagnostic accuracy (i.e., alpha, kappa, sensitivity, specificity, positive predictive value, negative predictive value). Search terms included MeSH headings related to our research question, general terms related to substance use and psychometric properties of interest, as well as specific terms referencing the names of well-known substance use measures. The search terms used are provided in the appendix. This review was registered in PROSPERO, the International Prospective Register of Systematic Reviews (study number: CRD42017058813).

Primary outcomes

We aimed to estimate the pooled summary estimates for the following psychometric outcomes: Cronbach’s alpha, kappa, sensitivity, specificity, positive predictive value, and negative predictive value. We recognize that there are a number of measure characteristics that relate to validity [32]. However, to focus our review and facilitate the feasibility of completing this study, we have decided to restrict the scope of our validity measures to Cronbach’s alpha. Descriptions for these outcomes are provided below:

Psychometric Outcome	Description
Cronbach’s alpha	measure of internal consistency, that is, how closely correlated a set of scale items are, as a group.
Kappa	measure of inter-rater agreement or inter-rater reliability for qualitative (categorical) items which takes into account the possibility of the agreement occurring by chance.
Sensitivity	measure of a test's or scale's ability to correctly detect patients who truly have the condition (i.e., the proportion of people who screen positive for substance use disorders according to the scale, among those who truly have substance use disorders based on an established standard (“gold standard”) such as meeting diagnostic criteria for a disorder).
Specificity	measure of a test's or scale's ability to correctly identify patients without the condition (i.e., the proportion of people who screen negative for substance use disorders according to the scale, among those who truly do not have substance use disorders based on an established standard such as meeting diagnostic criteria for a disorder).
Positive predictive value (PPV)	the probability that a person with a positive screening result actually has the disorder (i.e., the proportion of people who meet diagnostic criteria for a substance use disorder among those who screened positive for the disorder on a scale).
Negative predictive value (NPV)	the probability that a person with a negative screening result actually does not have the disorder (i.e., the proportion of people who do not meet diagnostic criteria for a substance use disorder among those who screened negative for a substance use disorder on a scale).
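
As an illustration of the definitions in the table above, the following is a minimal sketch of how each outcome can be computed for a single study. It is not the authors' code (the analyses in this review were run in STATA); the function names, argument layout, and example counts are hypothetical and chosen only for clarity.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table of scale result vs. reference standard.

    tp: screened positive, has disorder     fp: screened positive, no disorder
    fn: screened negative, has disorder     tn: screened negative, no disorder
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives among those with the disorder
        "specificity": tn / (tn + fp),  # negatives among those without it
        "ppv": tp / (tp + fp),          # disorder present among screen-positives
        "npv": tn / (tn + fn),          # disorder absent among screen-negatives
    }


def cohens_kappa(both_yes, only_first, only_second, both_no):
    """Chance-corrected agreement between two binary ratings (Cohen's kappa)."""
    n = both_yes + only_first + only_second + both_no
    observed = (both_yes + both_no) / n
    # Agreement expected by chance, from each rating's marginal totals.
    expected = (
        (both_yes + only_first) * (both_yes + only_second)
        + (only_second + both_no) * (only_first + both_no)
    ) / n**2
    return (observed - expected) / (1 - expected)


def cronbach_alpha(items):
    """Internal consistency; `items` holds one list of scores per scale item,
    with respondents in the same order in every list."""
    k = len(items)
    sum_item_variances = sum(variance(scores) for scores in items)
    totals = [sum(person) for person in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical counts: 100 people with the disorder, 200 without.
result = diagnostic_accuracy(tp=80, fp=30, fn=20, tn=170)
```

In this made-up example, sensitivity is 80/100 and specificity is 170/200, while PPV (80/110) and NPV (170/190) depend on how many screen-positives and screen-negatives there are in the sample.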

Eligibility criteria

We searched for relevant publications that met all of the following inclusion criteria: 1) studies that reported one or more of the psychometric outcomes of interest; 2) studies that examined one or more substance use measures related to our substance classes of interest (i.e., alcohol, methamphetamine and amphetamine, cocaine, heroin, and ecstasy) or for substance use in general (i.e., some measures do not differentiate between multiple substances or assess classes of substances all together); 3) publication written in English (note: studies that administered measures that were not in English were eligible as long as the publication was written in English).

We excluded publications using the following exclusion criteria: 1) reporting insufficient information on reliability, validity and diagnostic accuracy for substance use measures/assessments (i.e., no numeric information on our psychometric outcomes, sample size); 2) articles that provide psychometric data for a measure/assessment that is not related to substance use (e.g., a study on internal consistency data on a depression scale among substance users); 3) articles and/or secondary data analyses that report reliability and validity data from a primary outcome paper that was already included in the review; 4) reviews, commentaries, case report studies and other publications with insufficient reporting of data; 5) substance use measures/assessments that focus on aspects other than actual substance consumption, dependence or substance use disorder (e.g., a study reporting validity of a self-efficacy scale for resisting substance use; a study that examines the underlying mechanisms of substance use among those who already have a substance use disorder); and 6) studies with psychometric properties that focus on substance classes outside the scope of our review (e.g., marijuana or tobacco).

Screening procedures

All citations (including their titles and abstracts) captured by the search strategy were imported into Covidence.org (Melbourne, Victoria), which allowed research team members to independently review and screen citations using a centralized, online database. Each title/abstract was screened by two members of a team comprising master-, doctoral-, and post-doctoral-level researchers trained in the study protocol (co-authors PP, DH, RC, DS, CM, PM, and FC), and citations that were coded as eligible by both reviewers were moved to the full-text review phase. The same process was then repeated for full-text articles. In the event of discrepancies between reviewers in either the title and abstract phase or the full-text phase, a third team member (GMS) reviewed the relevant documents and helped reconcile the differences. Articles that were deemed eligible in the full-text review stage were included in the data extraction phase described below.

Data extraction

Team members extracted data on the psychometric properties, scale and study characteristics, sample size, study sample characteristics/co-factors of interest (country where the study was conducted, number of sites, language in which the scale was administered, gender of participants included), cut-offs used, comparison measure/gold standard used, and other information relevant to the study, including information on study quality [33]. Some papers reported multiple data points for psychometric outcomes from different study populations (e.g., disaggregated data by sex or different research sites). These data points were extracted as separate records only if the paper did not provide a single overall measure for the psychometric outcomes for the entire study sample, consistent with other analyses [24].

Assessment of bias risk

For studies reporting diagnostic measures (e.g., sensitivity and specificity), reviewers rated study quality using the Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies, QUADAS-2, guidelines [33], which includes quality rating questions on the study’s patient selection, index test, reference standard, and flow and timing. For studies that did not include diagnostic accuracy measures, only relevant domains of QUADAS-2 were assessed, as appropriate (i.e., rating regarding the reference standard was not conducted). All extracted data were entered into an electronic questionnaire programmed in Qualtrics, and checked by another researcher (conducted by the same co-authors who screened citations, as well as co-author BK) to verify accuracy.

Data analyses

We calculated separate pooled summary estimates for each of the 37 substance use measures and also fitted separate models for each of the six psychometric outcomes for validity, reliability, and accuracy. For alpha, kappa, PPV and NPV, we pooled data across studies using DerSimonian-Laird random effects models, implemented in STATA version 13 (College Station, TX) [34]. Random effects meta-analysis models, as opposed to fixed-effects models, are preferred for pooling data from diagnostic accuracy tests since heterogeneity is presumed to exist across these studies [35]. Random effects models, which are considered the default models used in meta-analyses for diagnostic accuracy tests, synthesize the psychometric outcomes from separate studies into a weighted average effect size (pooled summary estimate), using inverse variance weighting, based on sample size, while taking into account the extent of the variability of the effect sizes observed in separate studies [35]. Additionally, for sensitivity and specificity, we used hierarchical logistic regression models, implemented using the metandi command in STATA, to account for the correlation between the two measures (i.e., trade-off between sensitivity and specificity) [36,37,38]. Since metandi requires a minimum of four observations to conduct a meta-analysis, we pooled measures with fewer than four records for sensitivity and specificity outcomes using the random effects models described for other outcomes, and noted this alternate approach in the results, as appropriate.
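
To make the random effects pooling described above concrete, here is a minimal sketch of the DerSimonian-Laird estimator. It is an illustration only, not a reproduction of the STATA routines used in this review. The function names are hypothetical, and the binomial within-study variance p(1 - p)/n used for proportions such as PPV or NPV is an assumption; the text does not state how within-study variances were derived.

```python
import math


def proportion_variance(p, n):
    """Assumed within-study variance of a proportion p from n participants."""
    return p * (1 - p) / n


def dersimonian_laird(estimates, variances):
    """Pool study estimates with DerSimonian-Laird random effects weights.

    Returns (pooled estimate, 95% confidence interval, tau squared).
    """
    # Step 1: fixed-effect (inverse variance) pooled estimate.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Step 2: Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Step 3: re-weight each study by 1 / (within-study + between-study variance).
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical example: three studies reporting NPV with their sample sizes.
npvs = [0.92, 0.95, 0.88]
sizes = [150, 400, 90]
pooled, ci, tau2 = dersimonian_laird(
    npvs, [proportion_variance(p, n) for p, n in zip(npvs, sizes)]
)
```

When the studies disagree by more than sampling error would explain, tau squared is positive and the weights become more equal across studies, which widens the confidence interval relative to a fixed-effect analysis.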

Classification and evaluation of pooled estimates

Qualitatively, pooled summary estimates for alpha and kappa were classified as “excellent” for estimates that were > 0.89, “good” for estimates that were between 0.85–0.89, “moderate” for estimates that were between 0.80–0.84, “fair” for estimates that were between 0.75–0.79, or “unsatisfactory” for estimates below 0.75, consistent with other studies [24, 39].

Pooled summary estimates for sensitivity, specificity, positive predictive value and negative predictive value were classified as “excellent” for estimates that were > 0.89, “good” for estimates that were between 0.8–0.89, “moderate” for estimates that were between 0.6–0.79, and “low” for estimates that were < 0.6 [24, 40].
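
The two classification schemes above can be written as simple threshold functions. This is a minimal sketch with hypothetical function names. Because the stated bands leave small gaps (for example, between 0.84 and 0.85), it assumes that estimates are first rounded to two decimal places, which is how they are reported in this review.

```python
def classify_reliability(estimate):
    """Qualitative band for a pooled alpha or kappa estimate."""
    x = round(estimate, 2)  # assumption: bands apply to two-decimal values
    if x > 0.89:
        return "excellent"
    if x >= 0.85:
        return "good"
    if x >= 0.80:
        return "moderate"
    if x >= 0.75:
        return "fair"
    return "unsatisfactory"


def classify_accuracy(estimate):
    """Qualitative band for pooled sensitivity, specificity, PPV, or NPV."""
    x = round(estimate, 2)  # assumption: bands apply to two-decimal values
    if x > 0.89:
        return "excellent"
    if x >= 0.80:
        return "good"
    if x >= 0.60:
        return "moderate"
    return "low"
```

For example, under these rules a pooled alpha of 0.94 is labeled "excellent" and a pooled PPV of 0.50 is labeled "low".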

For each pooled psychometric summary estimate, we calculated the I² statistic, which represents the percentage of total variation across studies that is attributable to heterogeneity rather than chance. We interpreted I² values of 25%, 50%, and 75% as benchmarks for low, moderate, and high heterogeneity, respectively [41]. We did not use standard meta-analysis tests for publication bias given the limitations of these tests for diagnostic test accuracy studies and due to the characteristics of our psychometric outcomes (e.g., truncated measures cannot fall below zero) [42]. As indicated in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, using these tests is inappropriate because they will likely lead to a high false-positive rate for publication bias [35].
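
For readers unfamiliar with I², the sketch below shows the standard calculation from Cochran's Q, I² = max(0, (Q - df) / Q) × 100, where df is the number of studies minus one. The function name and the example inputs are hypothetical, and the within-study variances are assumed to be known.

```python
def i_squared(estimates, variances):
    """I-squared (in percent) from study estimates and within-study variances."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0  # identical estimates: no observed heterogeneity
    return max(0.0, (q - df) / q) * 100


# Two hypothetical, precisely estimated studies that disagree strongly.
heterogeneous = i_squared([0.5, 0.9], [0.001, 0.001])
# Three hypothetical studies with identical estimates.
homogeneous = i_squared([0.8, 0.8, 0.8], [0.01, 0.02, 0.03])
```

The first example yields an I² close to 99%, while the second yields 0%.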

Results

Screening and study inclusion

Study screening and inclusion are summarized in Fig. 1. In brief, in the identification stage, we identified 7555 references in the initial search, of which 208 were excluded as duplicates. In the title and abstract review phase, reviewers excluded 5854 studies that were deemed ineligible. Full-text reviews were conducted for 1493 articles that were deemed eligible from title and abstract review. Of the full-text reviewed articles, 1105 studies were excluded for not meeting eligibility criteria. The most common reasons for exclusion were: scales or measures that were outside the scope of review (n = 386), lack of psychometric data on scales of interest (n = 140), lab or methods papers that were outside the scope of the review (n = 130), non-English language publications (n = 110), duplicate study (n = 98), and psychometric outcomes that were outside the scope of review (n = 79). In total, there were 387 unique studies included in the data extraction phase containing sufficient data on the outcomes for 37 scales (Table 1).

Table 1 Substance use Measures/Scales identified in Systematic Review and Meta-analyzed

Study characteristics

Table 2 presents characteristics of the studies included in this meta-analysis. As mentioned, studies published in English were included in this review, regardless of the language in which the scales were administered. Among the 387 studies included, the most common language in which the scale/measure was administered was English (63%), followed by Spanish (9%), French (5%), Portuguese (3%), and Chinese (2%). A large proportion of studies were conducted in the United States (40%). The median sample size was 286 (range = 9–50,049). The vast majority of studies (83%; n = 323) included both men and women. Additionally, 11% (n = 42) of the studies included samples comprising only men, while 5% (n = 20) included samples comprising only women. Most studies were published after 1999 (66%), with studies published between 2000 and 2009 accounting for 38% (n = 148) of the studies meta-analyzed, and studies published between 2010 and 2017 accounting for 28% (n = 110). Most studies involved a single study site (61%), while 39% were multi-site studies. Additionally, 72% of the studies involved convenience samples, 20% included random or probability-based samples, and 7% had other or unclear sampling strategies.

Table 2 Pooled Summary Estimates

Assessment of bias in study quality

The risk of bias in the four QUADAS-2 domains for each study included in this meta-analysis is presented in Supplementary Table 1. The distribution of the QUADAS-2 domains across all included studies is summarized in Fig. 2. Of the studies included, 58% had a low risk of bias in the patient selection domain, 57% had a low risk of bias in the index test domain, 48% had a low risk of bias in the reference standard domain, and 72% had a low risk of bias in the flow and timing domain. Overall, only 16% of studies had a low risk of bias across all four of these QUADAS-2 domains.

Pooled summary estimates: overall findings

The pooled summary estimates of psychometric properties of substance use measures (which are described in Table 1) are quantitatively and qualitatively summarized in Tables 2 and 3, respectively. Overall, 65% of pooled estimates for alpha were in the range of fair-to-excellent; 44% of estimates for kappa were in the range of fair-to-excellent. In addition, 69, 97, 37 and 96% of pooled estimates for sensitivity, specificity, positive predictive value, and negative predictive value, respectively, were in the range of moderate-to-excellent (Fig. 3).

Table 3 Qualitative Interpretation of Pooled Estimates

Self-reported measures for which all pooled estimates were fair/moderate or better include the following: Alcohol Dependence Scale; Addiction Severity Index (ASI); ASI subscale for Alcohol; ASSIST; the Composite International Diagnostic Interview, including the original version, as well as version 2.1 and version 3; Drug Abuse Screen Test - 10 item scale; Drug Use Disorders Identification Test; Problem Oriented Screening Instrument for Teenagers; Severity of Dependence scale; Timeline Followback; and Chemical Use, Abuse, and Dependence. Biomarkers for which all pooled estimates were fair/moderate or better include the following: Ethyl glucuronide; Phosphatidylethanol test; and the combined use of Carbohydrate deficient transferrin and Mean corpuscular volume. In general, we also observed high heterogeneity between studies for most pooled estimates.

Pooled summary estimates, by substance use measure

The pooled estimates and 95% confidence intervals for alpha, kappa, sensitivity, specificity, positive predictive value, and negative predictive value are shown in Table 2. Below we summarize the results of the pooled summary estimates alphabetically for each of the 37 substance use measures, grouping self-reported measures and biomarkers separately. The list of references for the studies meta-analyzed for each scale/measure is presented in Supplementary Table 2.

Self-reported measures

Alcohol dependence scale (ADS)

The pooled alpha estimate for ADS (3 data points) was excellent: 0.90 (95%CI = 0.80–0.99) and there was high heterogeneity between studies (I² 98.9%). The pooled sensitivity estimate for ADS (2 data points) was excellent: 0.95 (95%CI = 0.90–1.00) and there was low heterogeneity between studies (I² 0%). The pooled specificity estimate (2 data points) was moderate: 0.64 (95%CI = 0.52–0.77) and there was moderate heterogeneity between studies (I² 60.1%). There was insufficient data to calculate the pooled PPV and NPV estimates for ADS.

Addiction Severity Index (ASI)

The pooled alpha estimate for ASI (3 data points) was moderate: 0.84 (95%CI = 0.81–0.87) and there was moderate heterogeneity between studies (I² 38.5%). There was insufficient data to calculate pooled kappa, sensitivity, specificity, PPV, and NPV estimates.

Addiction severity index-alcohol (alcohol sub-scale; ASI-A)

The pooled alpha estimate for ASI-A (18 data points) was fair: 0.77 (95%CI = 0.73–0.81) and there was high heterogeneity between studies (I² 94.3%). The pooled sensitivity estimate for ASI-A (6 data points) was good: 0.83 (95%CI = 0.67–0.92) and there was high heterogeneity between studies (I² 87.6%). The pooled specificity estimate for ASI-A (6 data points) was moderate: 0.79 (95%CI = 0.67–0.88) and there was high heterogeneity between studies (I² 91.2%). There was insufficient data to calculate pooled kappa, PPV and NPV estimates for ASI-A.

Addiction severity index-drugs (drugs sub-scale; ASI-D)

The pooled alpha estimate for ASI-D (16 data points) was unsatisfactory: 0.68 (95%CI = 0.63–0.74) and there was high heterogeneity between studies (I² 95.6%). The pooled sensitivity estimate (5 data points) was good: 0.86 (95%CI = 0.83–0.89) and there was moderate heterogeneity between studies (I² 62.5%). The pooled specificity estimate (5 data points) was good: 0.85 (95%CI = 0.77–0.91) and there was high heterogeneity between studies (I² 86%). There was insufficient data to calculate the pooled kappa, PPV and NPV estimates.

The alcohol, smoking, and substance involvement screening test (ASSIST)

The pooled alpha estimate (7 data points) was good: 0.85 (95%CI = 0.80–0.91) and there was high heterogeneity between studies (I² 94%). The pooled sensitivity estimate (2 data points) was good: 0.83 (95%CI = 0.80–0.87) and there was low heterogeneity between studies (I² 0%). The pooled specificity estimate (2 data points) was moderate: 0.73 (95%CI = 0.57–0.88) and there was high heterogeneity between studies (I² 91%). There was insufficient data to calculate the pooled estimate for kappa, PPV, and NPV.

Alcohol use disorders identification test (AUDIT)

The pooled alpha estimate for AUDIT (80 data points) was good: 0.85 (95%CI = 0.83–0.87) and there was high heterogeneity between studies (I² 98%). The pooled kappa estimate for AUDIT (4 data points) was unsatisfactory: 0.46 (95%CI = 0.25–0.67) and there was high heterogeneity between studies (I² 99%). The pooled sensitivity estimate for AUDIT (135 data points) was good: 0.86 (95%CI = 0.84–0.88) and there was high heterogeneity between studies (I² 97%). The pooled specificity estimate for AUDIT (135 data points) was good: 0.87 (95%CI = 0.85–0.89) and there was high heterogeneity between studies (I² 99%). The pooled PPV estimate for AUDIT (65 data points) was moderate: 0.61 (95%CI = 0.51–0.71) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate for AUDIT (54 data points) was excellent: 0.94 (95%CI = 0.93–0.95) and there was high heterogeneity between studies (I² 96%).

Alcohol use disorders identification Test-3 (AUDIT-3)

Alpha cannot be calculated for AUDIT-3 because it is a single-item measure. There was insufficient data to calculate the pooled estimate for kappa. The pooled sensitivity estimate for AUDIT-3 (22 data points) was good: 0.84 (95%CI = 0.80–0.88) and there was high heterogeneity between studies (I² 90%). The pooled specificity estimate for AUDIT-3 (22 data points) was good: 0.84 (95%CI = 0.75–0.90) and there was high heterogeneity between studies (I² 99%). The pooled PPV estimate for AUDIT-3 (9 data points) was moderate: 0.63 (95%CI = 0.49–0.77) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate (7 data points) was excellent: 0.94 (95%CI = 0.90–0.98) and there was high heterogeneity between studies (I² 95%).

Alcohol use disorders identification test-C (AUDIT-C)

The pooled alpha estimate for AUDIT-C (20 data points) was fair: 0.75 (95%CI = 0.70–0.80) and there was high heterogeneity between studies (I² 99%). The pooled kappa estimate for AUDIT-C (2 data points) was unsatisfactory: 0.41 (95%CI = 0.39–0.43) and there was low heterogeneity between studies (I² 0%). The pooled sensitivity estimate for AUDIT-C (45 data points) was good: 0.87 (95%CI = 0.84–0.90) and there was high heterogeneity between studies (I² 99%). The pooled specificity estimate for AUDIT-C (45 data points) was good: 0.84 (95%CI = 0.81–0.87) and there was high heterogeneity between studies (I² 99%). The pooled PPV estimate for AUDIT-C (22 data points) was low: 0.50 (95%CI = 0.39–0.60) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate for AUDIT-C (19 data points) was good: 0.88 (95%CI = 0.83–0.92) and there was high heterogeneity between studies (I² 99%).

Brief Michigan alcoholism screening test (B-MAST)

There was insufficient data to calculate the pooled estimate for B-MAST’s alpha and kappa. The pooled sensitivity estimate for B-MAST (21 data points) was low: 0.50 (95%CI = 0.38–0.62) and there was high heterogeneity between studies (I² 99%). The pooled specificity estimate for B-MAST (21 data points) was excellent: 0.97 (95%CI = 0.96–0.98) and there was high heterogeneity between studies (I² 97%). The pooled PPV estimate for B-MAST (3 data points) was moderate: 0.65 (95%CI = 0.38–0.93) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate for B-MAST (2 data points) was excellent: 0.90 (95%CI = 0.87–0.94) and there was moderate heterogeneity between studies (I² 33%).

Cut down, annoyed, guilty, eye-opener (CAGE)

The pooled alpha estimate for CAGE (22 data points) was unsatisfactory: 0.70 (95%CI = 0.65–0.75) and there was high heterogeneity between studies (I² 98%). The pooled kappa estimate for CAGE (3 data points) was unsatisfactory: 0.57 (95%CI = 0.34–0.81) and there was high heterogeneity between studies (I² 97%). The pooled sensitivity estimate for CAGE (139 data points) was moderate: 0.70 (95%CI = 0.66–0.74) and there was high heterogeneity between studies (I² 98%). The pooled specificity estimate for CAGE (139 data points) was excellent: 0.90 (95%CI = 0.88–0.91) and there was high heterogeneity between studies (I² 99%). The pooled PPV estimate for CAGE (61 data points) was low: 0.51 (95%CI = 0.45–0.58) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate for CAGE (39 data points) was excellent: 0.91 (95%CI = 0.88–0.93) and there was high heterogeneity between studies (I² 97%).

Composite international diagnostic interview (CIDI), original version, version 2.1 and version 3

Alpha coefficients are not calculated for CIDI. The pooled kappa estimate for the original version of CIDI (2 data points) was moderate: 0.82 (95%CI = 0.61–1.02) and there was high heterogeneity between studies (I² 78%). There was insufficient data to calculate the pooled estimate for sensitivity, specificity, PPV, and NPV for the original CIDI.

The pooled sensitivity estimate for CIDI version 2.1 (3 data points) was moderate: 0.75 (95%CI = 0.69–0.81) and there was low heterogeneity between studies (I² 0.0%). The pooled specificity estimate for CIDI version 2.1 (3 data points) was good: 0.84 (95%CI = 0.69–1.00) and there was high heterogeneity between studies (I² 98.7%). There was insufficient data to calculate the pooled estimate for kappa, PPV, and NPV for CIDI version 2.1.

The pooled sensitivity estimate for CIDI version 3 (4 data points) was excellent: 0.91 (95%CI = 0.82–1.00) and there was moderate heterogeneity between studies (I² 48.1%). The pooled specificity estimate for CIDI version 3 (4 data points) was excellent: 0.99 (95%CI = 0.98–1.00) and there was low heterogeneity between studies (I² 0.0%). The pooled PPV estimate for CIDI version 3 (4 data points) was excellent: 0.91 (95%CI = 0.87–0.96) and there was low heterogeneity between studies (I² 0.0%). The pooled NPV estimate for CIDI version 3 (4 data points) was excellent: 0.99 (95%CI = 0.98–1.00) and there was low heterogeneity between studies (I² 0.0%). There was insufficient data to calculate the pooled estimate for kappa for CIDI version 3.

Car, relax, alone, forget, friends, trouble (CRAFFT)

The pooled alpha estimate for CRAFFT (6 data points) was unsatisfactory: 0.69 (95%CI = 0.64–0.74) and there was high heterogeneity between studies (I² 83%). There was insufficient data to calculate the pooled estimate for kappa for CRAFFT. The pooled sensitivity estimate for CRAFFT (10 data points) was good: 0.90 (95%CI = 0.84–0.94) and there was high heterogeneity between studies (I² 97%). The pooled specificity estimate for CRAFFT (10 data points) was moderate: 0.76 (95%CI = 0.68–0.83) and there was high heterogeneity between studies (I² 97%). The pooled PPV estimate for CRAFFT (8 data points) was low: 0.57 (95%CI = 0.34–0.80) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate for CRAFFT (8 data points) was good: 0.86 (95%CI = 0.45–1.00) and there was high heterogeneity between studies (I² 99%).

Drug Abuse screen test (DAST)

The pooled alpha estimate for DAST (6 data points) was excellent: 0.94 (95%CI = 0.93–0.95) and there was low heterogeneity between studies (I² 0%). The pooled kappa estimate for DAST (2 data points) was moderate: 0.83 (95%CI = 0.58–1.00) and there was high heterogeneity between studies (I² 98%). The pooled sensitivity estimate for DAST (7 data points) was good: 0.85 (95%CI = 0.74–0.92) and there was high heterogeneity between studies (I² 89%). The pooled specificity estimate for DAST (7 data points) was good: 0.84 (95%CI = 0.68–0.93) and there was high heterogeneity between studies (I² 97%). The pooled PPV estimate for DAST (5 data points) was low: 0.51 (95%CI = 0.32–0.70) and there was high heterogeneity between studies (I² 98%). The pooled NPV estimate for DAST (4 data points) was excellent: 0.95 (95%CI = 0.89–1.00) and there was high heterogeneity between studies (I² 81%).

Drug Abuse screen test - 10-item version (DAST-10)

The pooled alpha estimate for DAST-10 (6 data points) was fair: 0.79 (95%CI = 0.68–0.89) and there was high heterogeneity between studies (I² 98%). There was insufficient data to calculate the pooled estimate for kappa for DAST-10. The pooled sensitivity estimate for DAST-10 (6 data points) was excellent: 0.90 (95%CI = 0.75–0.97) and there was high heterogeneity between studies (I² 95%). The pooled specificity estimate for DAST-10 (6 data points) was good: 0.82 (95%CI = 0.72–0.89) and there was high heterogeneity between studies (I² 92%). The pooled PPV estimate for DAST-10 (4 data points) was good: 0.80 (95%CI = 0.70–0.91) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate for DAST-10 (4 data points) was good: 0.86 (95%CI = 0.81–0.91) and there was moderate heterogeneity between studies (I² 40%).

Drug use disorders identification test (DUDIT)

The pooled alpha estimate for DUDIT (15 data points) was excellent: 0.92 (95%CI = 0.90–0.95) and there was high heterogeneity between studies (I² 96%). There was insufficient data to calculate the pooled kappa estimate for DUDIT. The pooled sensitivity estimate for DUDIT (12 data points) was excellent: 0.93 (95%CI = 0.89–0.96) and there was high heterogeneity between studies (I² 76%). The pooled specificity estimate for DUDIT (12 data points) was moderate: 0.79 (95%CI = 0.67–0.87) and there was high heterogeneity between studies (I² 96%). The pooled PPV estimate (5 data points) was moderate: 0.61 (95%CI = 0.34–0.87) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate (5 data points) was excellent: 0.92 (95%CI = 0.82–1.00) and there was high heterogeneity between studies (I² 78%).

Michigan alcohol screening test (MAST)

The pooled alpha estimate for MAST (8 data points) was moderate: 0.82 (95%CI = 0.78–0.86) and there was high heterogeneity between studies (I² 83%). The pooled kappa estimate for MAST (4 data points) was unsatisfactory: 0.69 (95%CI = 0.58–0.81) and there was high heterogeneity between studies (I² 88%). The pooled sensitivity estimate for MAST (12 data points) was moderate: 0.70 (95%CI = 0.58–0.80) and there was high heterogeneity between studies (I² 95%). The pooled specificity estimate for MAST (12 data points) was good: 0.85 (95%CI = 0.77–0.91) and there was high heterogeneity between studies (I² 97%). The pooled PPV estimate for MAST (9 data points) was low: 0.51 (95%CI = 0.30–0.71) and there was high heterogeneity between studies (I² 98%). The pooled NPV estimate for MAST (6 data points) was good: 0.88 (95%CI = 0.82–0.94) and there was high heterogeneity between studies (I² 92%).

Problem oriented screening instrument for teenagers (POSIT)

The pooled alpha estimate for POSIT (2 data points) was good: 0.86 (95%CI = 0.73–0.98) and there was high heterogeneity between studies (I² 94%). The pooled sensitivity estimate for POSIT (3 data points) was good: 0.84 (95%CI = 0.72–0.96) and there was high heterogeneity between studies (I² 90%). The pooled specificity estimate for POSIT (3 data points) was good: 0.82 (95%CI = 0.75–0.90) and there was high heterogeneity between studies (I² 88%). There was insufficient data to calculate the pooled kappa, PPV, and NPV estimates for POSIT.

Self-administered alcoholism screening test (SAAST)

The pooled alpha estimate for SAAST (2 data points) was good: 0.89 (95%CI = 0.79–0.99) and there was high heterogeneity between studies (I² 95%). The pooled sensitivity estimate for SAAST (7 data points) was low: 0.52 (95%CI = 0.33–0.71) and there was high heterogeneity between studies (I² 98%). The pooled specificity estimate (7 data points) was good: 0.83 (95%CI = 0.76–0.90) and there was high heterogeneity between studies (I² 98%). The pooled PPV estimate for SAAST (6 data points) was low: 0.32 (95%CI = 0.22–0.42) and there was high heterogeneity between studies (I² 95%). The pooled NPV estimate for SAAST (6 data points) was excellent: 0.92 (95%CI = 0.89–0.95) and there was high heterogeneity between studies (I² 92%). There was insufficient data to calculate the pooled kappa estimates for SAAST.

Semi-structured assessment for drug dependence and alcoholism (SSADDA)

There are no alpha coefficients associated with semi-structured assessments such as SSADDA. The pooled kappa estimate for SSADDA (8 data points) was moderate: 0.84 (95%CI = 0.77–0.91) and there was high heterogeneity between studies (I² 97%). There was insufficient data to calculate the pooled sensitivity, specificity, PPV and NPV estimates for SSADDA.

Severity of dependence (SDS)

The pooled alpha estimate for SDS (6 data points) was good: 0.86 (95%CI = 0.78–0.93) and there was high heterogeneity between studies (I² 95%). The pooled sensitivity estimate for SDS (6 data points) was good: 0.83 (95%CI = 0.76–0.90) and there was high heterogeneity between studies (I² 77%). The pooled specificity estimate (6 data points) was good: 0.84 (95%CI = 0.78–0.89) and there was moderate heterogeneity between studies (I² 44%). The pooled PPV estimate for SDS (3 data points) was good: 0.90 (95%CI = 0.86–0.94) and there was low heterogeneity between studies (I² 0%). The pooled NPV estimate for SDS (3 data points) was good: 0.83 (95%CI = 0.76–0.89) and there was low heterogeneity between studies (I² 3.5%). There was insufficient data to calculate the pooled kappa estimate for SDS.

Tolerance-annoyance cut down eye opener (T-ACE)

The pooled alpha estimate for T-ACE (2 data points) was unsatisfactory: 0.50 (95%CI = 0.47–0.52) and there was high heterogeneity between studies (I² 29%). The pooled sensitivity estimate for T-ACE (8 data points) was good: 0.83 (95%CI = 0.74–0.92) and there was high heterogeneity between studies (I² 96%). The pooled specificity estimate for T-ACE (8 data points) was moderate: 0.72 (95%CI = 0.65–0.79) and there was high heterogeneity between studies (I² 98%). The pooled PPV estimate for T-ACE (6 data points) was low: 0.35 (95%CI = 0.25–0.45) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate for T-ACE (2 data points) was good: 0.87 (95%CI = 0.62–1.00) and there was high heterogeneity between studies (I² 97%). There was insufficient data to calculate the pooled estimate for kappa for T-ACE.

Timeline Followback (TLFB)

There are no alpha coefficients associated with TLFB. The pooled kappa estimate for TLFB (3 data points) was good: 0.86 (95%CI = 0.81–0.91) and there was high heterogeneity between studies (I² 88%). The pooled sensitivity estimate for TLFB (4 data points) was moderate: 0.80 (95%CI = 0.73–0.87) and there was moderate heterogeneity between studies (I² 63%). The pooled specificity estimate for TLFB (3 data points) was excellent: 0.97 (95%CI = 0.95–0.99) and there was low heterogeneity between studies (I² 0%). There was insufficient data to calculate the pooled estimate for PPV and NPV for TLFB.

Tolerance, worried, eye-opener, amnesia, cut down (TWEAK)

The pooled alpha estimate for TWEAK (3 data points) was unsatisfactory: 0.62 (95%CI = 0.55–0.69) and there was high heterogeneity between studies (I² 86%). The pooled sensitivity estimate for TWEAK (36 data points) was good: 0.85 (95%CI = 0.80–0.89) and there was high heterogeneity between studies (I² 96%). The pooled specificity estimate for TWEAK (36 data points) was good: 0.86 (95%CI = 0.82–0.90) and there was high heterogeneity between studies (I² 99%). The pooled PPV estimate for TWEAK (5 data points) was low: 0.43 (95%CI = 0.26–0.61) and there was high heterogeneity between studies (I² 99%). The pooled NPV estimate for TWEAK (2 data points) was good: 0.88 (95%CI = 0.70–1.00) and there was high heterogeneity between studies (I² 95%). There was insufficient data to calculate the pooled estimate for kappa for TWEAK.

The chemical use, Abuse, and dependence (CUAD)

The pooled alpha estimate for CUAD (3 data points) was excellent: 0.96 (95%CI = 0.94–0.98) and there was high heterogeneity between studies (I² 95%). There was insufficient data to calculate the pooled estimate for kappa, sensitivity, specificity, PPV, and NPV for CUAD.

Biomarkers

Alanine transaminase (ALT)

The pooled sensitivity estimate for ALT (32 data points) was low: 0.32 (95%CI = 0.24–0.40) and there was high heterogeneity between studies (I² 96.1%). The pooled specificity estimate for ALT (32 data points) was good: 0.88 (95%CI = 0.83–0.92) and there was high heterogeneity between studies (I² 95.8%). The pooled PPV estimate for ALT (7 data points) was low: 0.37 (95%CI = 0.18–0.56) and there was high heterogeneity between studies (I² 96.1%). The pooled NPV estimate for ALT (4 data points) was moderate: 0.63 (95%CI = 0.42–0.85) and there was high heterogeneity between studies (I² 97.5%).

Aspartate transaminase (AST)

The pooled sensitivity estimate for AST (33 data points) was low: 0.48 (95%CI = 0.40–0.55) and there was high heterogeneity between studies (I² 97%). The pooled specificity estimate for AST (33 data points) was good: 0.86 (95%CI = 0.81–0.90) and there was high heterogeneity between studies (I² 97%). The pooled PPV estimate for AST (8 data points) was low: 0.42 (95%CI = 0.27–0.57) and there was high heterogeneity between studies (I² 93%). The pooled NPV estimate for AST (6 data points) was moderate: 0.69 (95%CI = 0.55–0.83) and there was high heterogeneity between studies (I² 95%).

Aspartate transaminase, alanine transaminase ratio (AST/ALT ratio)

The pooled sensitivity estimate for AST/ALT ratio (6 data points) was low: 0.34 (95%CI = 0.22–0.46) and there was high heterogeneity between studies (I² 96%). The pooled specificity estimate (4 data points) was moderate: 0.73 (95%CI = 0.52–0.94) and there was high heterogeneity between studies (I² 98%). There was insufficient data to calculate the pooled estimate for PPV and NPV.

Blood alcohol concentration (BAC)

The pooled sensitivity estimate for BAC (5 data points) was moderate: 0.64 (95%CI = 0.59–0.69) and there was moderate heterogeneity between studies (I² 44%). The pooled specificity estimate for BAC (5 data points) was moderate: 0.80 (95%CI = 0.72–0.87) and there was high heterogeneity between studies (I² 93%). The pooled PPV estimate for BAC (3 data points) was low: 0.60 (95%CI = 0.15–1.00) and there was high heterogeneity between studies (I² 98%). The pooled NPV estimate for BAC (3 data points) was moderate: 0.69 (95%CI = 0.52–0.86) and there was high heterogeneity between studies (I² 93%).

Carbohydrate deficient transferrin (CDT)

There are no alpha and kappa coefficients associated with biomarkers such as CDT. The pooled sensitivity estimate for CDT (8 data points) was low: 0.59 (95%CI = 0.43–0.73) and there was high heterogeneity between studies (I² 97%). The pooled specificity estimate for CDT (8 data points) was excellent: 0.96 (95%CI = 0.93–0.98) and there was moderate heterogeneity between studies (I² 72%). The pooled PPV estimate for CDT (6 data points) was good: 0.85 (95%CI = 0.74–0.97) and there was high heterogeneity between studies (I² 76%). The pooled NPV estimate for CDT (6 data points) was moderate: 0.79 (95%CI = 0.73–0.85) and there was high heterogeneity between studies (I² 96%).

Carbohydrate deficient transferrin-tech (CDTech)

There are no alpha and kappa coefficients associated with biomarkers such as CDTech. The pooled sensitivity estimate for CDTech (41 data points) was low: 0.54 (95%CI = 0.45–0.62) and there was high heterogeneity between studies (I² 99%). The pooled specificity estimate for CDTech (41 data points) was good: 0.89 (95%CI = 0.88–0.91) and there was high heterogeneity between studies (I² 88%). The pooled PPV estimate for CDTech (12 data points) was low: 0.52 (95%CI = 0.37–0.67) and there was high heterogeneity between studies (I² 95%). The pooled NPV estimate for CDTech (8 data points) was moderate: 0.80 (95%CI = 0.61–0.98) and there was high heterogeneity between studies (I² 99%).

Carbohydrate deficient transferrin with mean corpuscular volume (CDT with MCV)

There are no alpha and kappa coefficients associated with biomarkers such as CDT and MCV. The pooled sensitivity estimate for CDT with MCV (8 data points) was moderate: 0.74 (95%CI = 0.60–0.88) and there was high heterogeneity between studies (I² 98%). The pooled specificity estimate for CDT with MCV (4 data points) was excellent: 0.93 (95%CI = 0.91–0.95) and there was low heterogeneity between studies (I² 0%). The pooled PPV estimate for CDT with MCV (4 data points) was moderate: 0.74 (95%CI = 0.51–0.97) and there was high heterogeneity between studies (I² 98%). The pooled NPV estimate for CDT with MCV (4 data points) was excellent: 0.92 (95%CI = 0.83–1.00) and there was high heterogeneity between studies (I² 95%).

Gamma-Glutamyl Transferase (GGT)

There are no alpha and kappa coefficients associated with biomarkers such as GGT. The pooled sensitivity estimate for GGT (76 data points) was low: 0.57 (95%CI = 0.50–0.64) and there was high heterogeneity between studies (I² 99%). The pooled specificity estimate for GGT (76 data points) was good: 0.83 (95%CI = 0.78–0.86) and there was high heterogeneity between studies (I² 98%). The pooled PPV estimate for GGT (30 data points) was low: 0.43 (95%CI = 0.35–0.51) and there was high heterogeneity between studies (I² 97%). The pooled NPV estimate for GGT (23 data points) was good: 0.82 (95%CI = 0.70–0.94) and there was high heterogeneity between studies (I² 99%).

Gamma-Glutamyl Transferase with mean corpuscular volume (GGT with MCV)

There are no alpha and kappa coefficients associated with biomarkers such as GGT and MCV. The pooled sensitivity estimate for GGT with MCV (10 data points) was moderate: 0.64 (95%CI = 0.38–0.84) and there was high heterogeneity between studies (I² 99%). The pooled specificity estimate for GGT with MCV (10 data points) was good: 0.87 (95%CI = 0.76–0.93) and there was high heterogeneity between studies (I² 97%). The pooled PPV estimate for GGT with MCV (6 data points) was low: 0.47 (95%CI = 0.28–0.66) and there was high heterogeneity between studies (I² 98%). The pooled NPV estimate for GGT with MCV (6 data points) was good: 0.88 (95%CI = 0.81–0.95) and there was high heterogeneity between studies (I² 94%).

Ethyl glucuronide (EtG)

There are no alpha and kappa coefficients associated with biomarkers such as EtG. The pooled sensitivity estimate for EtG (6 data points) was good: 0.83 (95%CI = 0.61–0.94) and there was high heterogeneity between studies (I² 91%). The pooled specificity estimate for EtG (6 data points) was excellent: 0.95 (95%CI = 0.90–0.98) and there was high heterogeneity between studies (I² 66%). The pooled PPV estimate for EtG (2 data points) was moderate: 0.61 (95%CI = 0.39–0.84) and there was moderate heterogeneity between studies (I² 58%). The pooled NPV estimate for EtG (2 data points) was good: 0.86 (95%CI = 0.78–0.94) and there was moderate heterogeneity between studies (I² 60%).

Mean corpuscular volume (MCV)

There are no alpha and kappa coefficients associated with biomarkers such as MCV. The pooled sensitivity estimate for MCV (55 data points) was low: 0.39 (95%CI = 0.33–0.45) and there was high heterogeneity between studies (I² 97%). The pooled specificity estimate for MCV (55 data points) was excellent: 0.91 (95%CI = 0.88–0.93) and there was high heterogeneity between studies (I² 98%). The pooled PPV estimate for MCV (28 data points) was low: 0.48 (95%CI = 0.36–0.59) and there was high heterogeneity between studies (I² 98%). The pooled NPV estimate for MCV (22 data points) was moderate: 0.79 (95%CI = 0.73–0.86) and there was high heterogeneity between studies (I² 99%).

Percent carbohydrate deficient transferrin (%CDT)

The pooled sensitivity estimate for %CDT (40 data points) was low: 0.56 (95%CI = 0.47–0.65) and there was high heterogeneity between studies (I² 98.2%). The pooled specificity estimate for %CDT (40 data points) was excellent: 0.91 (95%CI = 0.88–0.94) and there was high heterogeneity between studies (I² 97%). The pooled PPV estimate for %CDT (13 data points) was low: 0.58 (95%CI = 0.38–0.78) and there was high heterogeneity between studies (I² 98.5%). The pooled NPV estimate for %CDT (13 data points) was good: 0.85 (95%CI = 0.78–0.92) and there was high heterogeneity between studies (I² 97.6%).

Phosphatidylethanol (PEth)

There are no alpha and kappa coefficients associated with biomarkers such as PEth. The pooled sensitivity estimate for PEth (7 data points) was good: 0.87 (95%CI = 0.79–0.96) and there was high heterogeneity between studies (I² 94%). The pooled specificity estimate for PEth (4 data points) was excellent: 0.94 (95%CI = 0.91–0.97) and there was moderate heterogeneity between studies (I² 31%). There was insufficient data to calculate the pooled estimate for PPV and NPV for PEth.
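
For readers less familiar with how a pooled estimate and its I² statistic arise, the following is a minimal sketch of random-effects pooling using the DerSimonian-Laird estimator. It is illustrative only and is not the exact model used in this review, which also relied on hierarchical logistic regression; the function name `pool_random_effects` and the input values are hypothetical.

```python
import math

def pool_random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooling (illustrative sketch only)."""
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0        # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # I² as a percentage
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical study-level sensitivities and their variances (not data from this review)
result = pool_random_effects([0.62, 0.75, 0.48, 0.81], [0.002, 0.004, 0.003, 0.005])
print(result)
```

This sketch pools on the raw proportion scale, so a confidence limit can fall outside the 0–1 range when an estimate is close to the boundary; pooling on a transformed scale (for example, the logit) avoids this.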

Discussion

In this systematic review and meta-analysis, we identified 387 unique papers that have published data on the validity, reliability and diagnostic accuracy of 37 scales for substance classes that are associated with HIV risk. Based on the meta-analyzable data available, we observed that fourteen of the thirty-seven measures/scales (38%) had pooled estimates that all consistently met criteria for acceptability (i.e., ranging from fair/moderate to excellent). These included the following self-reported measures:

Alcohol Dependence Scale
Addiction Severity Index (ASI)
ASI subscale for Alcohol
ASSIST
Composite International Diagnostic Interview (original version, version 2.1, and version 3)
Drug Abuse Screen Test - 10 item scale
Drug Use Disorders Identification Test
Problem Oriented Screening Instrument for Teenagers
Severity of Dependence scale
Timeline Followback
Chemical Use, Abuse, and Dependence

Biomarkers that had all pooled estimates that were fair/moderate or better include the following:

Ethyl glucuronide
Phosphatidylethanol test
The combined use of Carbohydrate deficient transferrin and Mean corpuscular volume

Taken together, our findings highlight the availability of a promising range of tools for researchers and practitioners when assessing substance use, particularly those working with classes of substances associated with HIV risk, such as heroin, methamphetamine, cocaine, ecstasy, and alcohol. Nevertheless, further research is needed to determine why some substance use measures do not consistently have acceptable psychometric properties across different studies.

Overall, while most of the self-reported scales had acceptable validity, most did not have acceptable reliability: 65% of pooled estimates for alpha were in the range of fair-to-excellent, though only 44% of estimates for kappa were in the range of fair-to-excellent. Moreover, the scales we identified and meta-analyzed were generally better at correctly classifying individuals who were not using substances or were not problematic users, both among those truly without these conditions (specificity: 97% of summary estimates) and among those classified by the scale as not having the condition (negative predictive value: 96%). In contrast to specificity and negative predictive value estimates, fewer scales had pooled estimates for sensitivity and positive predictive value that were in the fair-to-excellent range (69 and 37%, respectively). These findings may have implications for the application of these measures in different settings. For example, in the criminal justice system, it may be better to use measures that have high specificity and negative predictive value if the priority is to avoid false-positive results. However, in health settings, it may be preferable to use measures with better sensitivity and positive predictive value to better capture individuals who may require further assessment for substance use disorders and treatment referrals, as appropriate.
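
One reason a measure can combine high specificity with a low positive predictive value is that predictive values depend on the prevalence of the condition in the sample. The following minimal sketch applies Bayes' rule to illustrate this. The sensitivity and specificity values are the pooled CAGE estimates reported above, whereas the two prevalence values and the function name `predictive_values` are hypothetical assumptions chosen for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence.

    Assumes 0 < prevalence < 1 and that the test is not degenerate,
    so that neither denominator is zero.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Pooled CAGE sensitivity (0.70) and specificity (0.90); prevalence values are hypothetical
for prevalence in (0.10, 0.50):
    ppv, npv = predictive_values(0.70, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With the same sensitivity and specificity, the positive predictive value is markedly lower in the low-prevalence scenario, which is one plausible contributor to the pattern of results described above.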

Overall, the studies identified in this review had administered scales in English, were conducted within the United States, and were less commonly tested among exclusively-women samples (there were twice as many exclusively-men samples in comparison). These findings highlight the general lack of diversity in terms of language, setting, and study population for the studies reporting validity, reliability, and diagnostic accuracy on substance use measures. Given the high morbidity and mortality associated with substance use globally and for different risk populations, greater effort is needed to further evaluate the psychometric properties of substance use measures in such samples. This study also found that few papers on substance use psychometric properties are “low risk” across all QUADAS-2 domains (16%). This finding highlights the need to further study the validity, reliability, and diagnostic accuracy of substance use measures using studies designed with better methodological rigor to reduce risk of bias.

The present study has several limitations. First, our inclusion criteria may have excluded some potentially relevant studies on the psychometric properties of substance use measures that were not published in English. Hence, although we included measures that were not administered in English as long as they were published in English, our findings may not necessarily be generalizable to the psychometric properties of non-English measures that were not published in English. It should also be noted that our eligibility criteria likely favored the inclusion of studies that were conducted in settings where English proficiency was higher, which is correlated with countries with higher gross national income per capita [43]. Moreover, while our search strategy was developed to identify all the relevant studies, many publications that have calculated our psychometric properties of interest may not have language referencing the specific key words/terms in our strategy in their titles and/or abstracts. In particular, this may occur because the psychometric data of scales may not be considered a “primary outcome” of a study, and thus not be highlighted in the title or abstract (i.e., the relevant data are embedded within the full-text only). Additionally, while we did not specifically seek out studies only among HIV-risk populations, per se, our study did focus on substance classes that have been associated with HIV risk, namely alcohol, stimulants (methamphetamine, amphetamine, cocaine, ecstasy), and heroin. Hence, our search may have missed studies on more general substance use measures that did not explicitly name our targeted substance classes.
Furthermore, we were unable to calculate pooled estimates for some psychometric outcomes of several measures due to lack of published data or insufficient data, including for some widely used assessments previously shown to be valid and reliable, such as the DSM-IV diagnostic modules used in the US National Surveys of Drug Use and Health, the Diagnostic Interview Schedule, and the AUDADIS [44,45,46]. Another limitation in our meta-analysis is related to our narrow definition of validity, which focused on internal validity as measured by Cronbach’s alpha values. We acknowledge that there is a range of other characteristics that examine validity that we did not include in our analysis, such as criterion validity, predictive validity, and other psychometric properties [32]. Further research is needed to fill our gaps in knowledge on the psychometric properties of these substance use measures to enable pooled summary estimate calculations. In addition, we recognize the limitation from pooling alpha and kappa statistics from clinical and epidemiologic/community samples given how these statistical measures are margin-sensitive. Moreover, with respect to the synthesis of data on sensitivity and specificity, we acknowledge that some studies may have used imperfect gold-standards, which may lead to distorted values for the individual estimates for sensitivity and specificity. Therefore, it may be appropriate to refer to results as co-positivity and co-negativity, as suggested by Buck and Gart [47]. Finally, we also recognize that disease spectrum severity and prevalence can affect test performance for sensitivity and specificity [48, 49]. Our results should be interpreted with these limitations in mind.
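
To illustrate what margin sensitivity means for kappa, the following minimal sketch computes Cohen's kappa for two hypothetical 2×2 agreement tables that share the same observed agreement (90%) but differ in their marginal distributions. The cell counts and the function name `cohen_kappa` are assumptions made for illustration and are not data from this review.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a = both ratings positive, b = first positive / second negative,
    c = first negative / second positive, d = both ratings negative.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement expected from the row and column marginals
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical tables: both have 90% observed agreement
balanced = cohen_kappa(45, 5, 5, 45)  # roughly 50% rated positive
skewed = cohen_kappa(5, 5, 5, 85)     # roughly 10% rated positive
print(f"balanced margins: kappa={balanced:.2f}")
print(f"skewed margins:   kappa={skewed:.2f}")
```

Although observed agreement is identical, kappa is considerably lower in the skewed table, which suggests why pooling kappa across clinical and community samples with different base rates should be interpreted with caution.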

To our knowledge, this is the first systematic review and meta-analysis involving the synthesis of psychometric data across different measures of substances that are associated with HIV risk. As mentioned, limited research has been conducted with respect to quantitatively pooling the psychometric characteristics of substance use measures. Our findings highlight the general strengths of many substance use measures with respect to their validity, reliability, and diagnostic accuracy across multiple studies/samples. To facilitate the dissemination of these findings, and provide researchers with a resource to identify validated, reliable, and accurate measures for substance use, we collaborated with members of the HIV Prevention Trials Network (HPTN) Substance Use Scientific Committee to develop a web-based tool that presents the pooled summary estimates reported in this study. The tool, named the “Substance Use Measure Identification (SUMI) Tool”, is available as a free resource on the HPTN website (URL: https://www.hptn.org/researchtools).

Conclusion

In summary, researchers in the field of substance use should endeavor to conduct more validity, reliability, and diagnostic accuracy studies on measures to identify substance use and use disorders among more diverse settings and populations, and with more rigorous study designs. Ultimately, accurate identification of substance users and problematic substance use is a critical step in identifying individuals for substance use treatment and evaluating the effectiveness of treatment strategies. Hence, further evaluation of substance use measures is of great importance not only to the field of substance use research, but also substance use treatment. Given the substantial contribution of substance use to the global burden of disease [5], having robust data on the psychometric properties of substance use measures can help researchers identify the best tools to use in research studies, further enhancing the collection of more valid, reliable, and accurate data to inform evidence-based responses to substance use.

Availability of data and materials

All data used in this meta-analysis have been previously published and are accessible in the literature.

Abbreviations

%CDT: % Carbohydrate deficient transferrin
ADS: Alcohol Dependence Scale
ALT: Alanine transaminase
ART: Antiretroviral therapy
ASI: Addiction Severity Index
ASI-A: Addiction Severity Index-Alcohol (alcohol sub-scale)
ASI-D: Addiction Severity Index-Drugs (drugs sub-scale)
ASSIST: The Alcohol, Smoking, and Substance Involvement Screening Test
AST: Aspartate transaminase
AST/ALT: Aspartate transaminase, Alanine transaminase ratio
AUDADIS: Alcohol Use Disorder and Associated Disabilities Interview Schedule
AUDIT: Alcohol Use Disorders Identification Test
AUDIT-3: Alcohol Use Disorders Identification Test - Question 3
AUDIT-C: Alcohol Use Disorders Identification Test - C
B-MAST: Brief Michigan Alcoholism Screening Test
BAC: Blood alcohol concentration
CAGE: Cut down, Annoyed, Guilty, Eye-opener
CDT: Carbohydrate deficient transferrin
CDTech: CDTech
CDT + MCV: Carbohydrate deficient transferrin + Mean corpuscular volume
CIDI: Composite International Diagnostic Interview
CRAFFT: Car, Relax, Alone, Forget, Friends, Trouble
CUAD: The Chemical Use, Abuse, and Dependence
DALY: Disability-adjusted life year
DAST: Drug Abuse Screen Test
DAST-10: Drug Abuse Screen Test – 10 item
DSM: Diagnostic and Statistical Manual of Mental Disorders
DUDIT: Drug Use Disorders Identification Test
GGT: Gamma-Glutamyl Transferase
GGT + MCV: Gamma-Glutamyl Transferase + Mean corpuscular volume
HIV: Human immunodeficiency virus
EtG: Ethyl glucuronide
MAST: Michigan Alcohol Screening Test
MCV: Mean corpuscular volume
MDMA: 3,4-methylenedioxy-methamphetamine
MeSH: Medical Subject Headings
MLIS: Master’s degree in library and information science
NPV: Negative predictive value
PEth: Phosphatidylethanol
PLWH: People living with HIV
POSIT: Problem Oriented Screening Instrument for Teenagers
PPV: Positive predictive value
QUADAS-2: Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies
SAAST: Self-Administered Alcoholism Screening Test
SSADDA: Semi-Structured Assessment for Drug Dependence and Alcoholism
SDS: Severity of Dependence
T-ACE: Tolerance-Annoyance Cut Down Eye Opener
TLFB: Timeline Followback
TWEAK: Tolerance, Worried, Eye-Opener, Amnesia, Cut down

References

United Nations Office on Drugs and Crime. World Drug Report 2017. Vienna: United Nations Office on Drugs and Crime; 2017. p. 2017.
World Health Organization. Management of Substance Abuse: Alcohol: World Health Organization; 2017. Available from: http://www.who.int/substance_abuse/facts/alcohol/en/.
G. B. D. Disease Injury Incidence Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2017;390(10100):1211–59.
Degenhardt L, Whiteford HA, Ferrari AJ, Baxter AJ, Charlson FJ, Hall WD, et al. Global burden of disease attributable to illicit drug use and dependence: findings from the global burden of disease study 2010. Lancet. 2013;382(9904):1564–74.
GBD 2015 Risk Factors Collaborators. Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet. 2016;388(10053):1659–724.
Shoptaw S, Montgomery B, Williams CT, El-Bassel N, Aramrattana A, Metsch L, et al. Not just the needle: the state of HIV-prevention science among substance users and future directions. J Acquir Immune Defic Syndr. 2013;63(Suppl 2):S174–8.
Rowe C, Santos GM, McFarland W, Wilson EC. Prevalence and correlates of substance use among trans female youth ages 16-24 years in the San Francisco Bay Area. Drug Alcohol Depend. 2015;147:160–6.
Santos GM, Coffin PO, Das M, Matheson T, DeMicco E, Raiford JL, et al. Dose-response associations between number and frequency of substance use and high-risk sexual behaviors among HIV-negative substance-using men who have sex with men (SUMSM) in San Francisco. J Acquir Immune Defic Syndr. 2013;63(4):540–4.
Colfax G, Santos GM, Chu P, Vittinghoff E, Pluddemann A, Kumar S, et al. Amphetamine-group substances and HIV. Lancet. 2010;376(9739):458–74.
Santos GM, Das M, Colfax GN. Interventions for non-injection substance use among US men who have sex with men: what is needed. AIDS Behav. 2011;15(Suppl 1):S51–6.
Strathdee SA, Shoptaw S, Dyer TP, Quan VM, Aramrattana A; Substance Use Scientific Committee of the HIV Prevention Trials Network. Towards combination HIV prevention for injection drug users: addressing addictophobia, apathy and inattention. Curr Opin HIV AIDS. 2012;7(4):320–5.
Ostrow DG, Plankey MW, Cox C, Li X, Shoptaw S, Jacobson LP, et al. Specific sex drug combinations contribute to the majority of recent HIV seroconversions among MSM in the MACS. J Acquir Immune Defic Syndr. 2009;51(3):349–55.
Koblin BA, Husnik MJ, Colfax G, Huang Y, Madison M, Mayer K, et al. Risk factors for HIV infection among men who have sex with men. AIDS. 2006;20(5):731–9.
Kerr T, Shannon K, Ti L, Strathdee S, Hayashi K, Nguyen P, et al. Sex work and HIV incidence among people who inject drugs. AIDS. 2016;30(4):627–34.
Strathdee SA, Galai N, Safaiean M, Celentano DD, Vlahov D, Johnson L, et al. Sex differences in risk factors for HIV seroconversion among injection drug users: a 10-year perspective. Arch Intern Med. 2001;161(10):1281–8.
Hinkin CH, Barclay TR, Castellon SA, Levine AJ, Durvasula RS, Marion SD, et al. Drug use and medication adherence among HIV-1 infected individuals. AIDS Behav. 2007;11(2):185–94.
DeLorenze GN, Weisner C, Tsai AL, Satre DD, Quesenberry CP Jr. Excess mortality among HIV-infected patients diagnosed with substance use dependence or abuse receiving care in a fully integrated medical care program. Alcohol Clin Exp Res. 2011;35(2):203–10.
Chander G, Himelhoch S, Moore RD. Substance abuse and psychiatric disorders in HIV-positive patients: epidemiology and impact on antiretroviral therapy. Drugs. 2006;66(6):769–89.
Dhalla S, Zumbo BD, Poole G. A review of the psychometric properties of the CRAFFT instrument: 1999-2010. Curr Drug Abuse Rev. 2011;4(1):57–64.
Berman AH, Bergman H, Palmstierna T, Schlyter F. Evaluation of the drug use disorders identification test (DUDIT) in criminal justice and detoxification settings and in a Swedish population sample. Eur Addict Res. 2005;11(1):22–31.
Berner MM, Kriston L, Bentele M, Harter M. The alcohol use disorders identification test for detecting at-risk drinking: a systematic review and meta-analysis. J Stud Alcohol Drugs. 2007;68(3):461–73.
Substance Abuse and Mental Health Services Administration (SAMHSA). The Role of Biomarkers in the Treatment of Alcohol Use Disorders. SAMHSA Advisory. 2012;11(2):1–8.
Manea L, Gilbody S, McMillan D. A diagnostic meta-analysis of the patient health Questionnaire-9 (PHQ-9) algorithm scoring method as a screen for depression. Gen Hosp Psychiatry. 2015;37(1):67–75.
Stockings E, Degenhardt L, Lee YY, Mihalopoulos C, Liu A, Hobbs M, et al. Symptom screening scales for detecting major depressive disorder in children and adolescents: a systematic review and meta-analysis of reliability, validity and diagnostic utility. J Affect Disord. 2015;174:447–63.
Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J Gen Pract. 2007;57(535):144–51.
Scaini S, Battaglia M, Beidel DC, Ogliari A. A meta-analysis of the cross-cultural psychometric properties of the social phobia and anxiety inventory for children (SPAI-C). J Anx Disord. 2012;26(1):182–8.
Newton AS, Soleimani A, Kirkland SW, Gokiert RJ. A systematic review of instruments to identify mental health and substance use problems among children in the emergency department. Acad Emerg Med Off J Soc Acad Emerg Med. 2017;24(5):552–68.
Newton AS, Gokiert R, Mabood N, Ata N, Dong K, Ali S, et al. Instruments to detect alcohol and other drug misuse in the emergency department: a systematic review. Pediatrics. 2011;128(1):e180–92.
Mitchell AJ, Bird V, Rizzo M, Hussain S, Meader N. Accuracy of one or two simple questions to identify alcohol-use disorder in primary care: a meta-analysis. Br J Gen Pract. 2014;64(624):e408–18.
Dhalla S, Kopec JA. The CAGE questionnaire for alcohol misuse: a review of reliability and validity studies. Clin Invest Med. 2007;30(1):33–41.
Allen JP, Reinert DF, Volk RJ. The alcohol use disorders identification test: an aid to recognition of alcohol problems in primary care patients. Prev Med. 2001;33(5):428–33.
Boateng GO, Neilands TB, Frongillo EA, Melgar-Quinonez HR, Young SL. Best practices for developing and validating scales for health, social, and behavioral research: a primer. Front Public Health. 2018;6:149.
Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.
Harris RJ, Bradburn MJ, Deeks JJ, Harbord RM, Altman D, Sterne JA. Metan: fixed- and random-effects meta-analysis. Stata J. 2008;8(1):3–28.
Macaskill P, Gatsonis C, Deeks JJ, Harbord RM, Takwoingi Y. Chapter 10: Analysing and Presenting Results. In: Deeks JJ, Bossuyt PM, Gatsonis C, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 1.0; 2010.
Freeman K, Taylor-Phillips S, Connock M, Court R, Tsertsvadze A, Shyangdan D, et al. Test accuracy of drug and antibody assays for predicting response to antitumour necrosis factor treatment in Crohn's disease: a systematic review and meta-analysis. BMJ Open. 2017;7(6):e014581.
Harbord RM. Metandi: meta-analysis of diagnostic accuracy using hierarchical logistic regression. Stata J. 2009;9(2):211–29.
Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JA. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics. 2007;8(2):239–51.
Ponterotto JG, Ruckdeschel DE. An overview of coefficient alpha and a reliability matrix for estimating adequacy of internal consistency coefficients with psychological research measures. Percept Mot Skills. 2007;105(3 Pt 1):997–1014.
Andrews JA, Lewinsohn PM, Hops H, Roberts RE. Psychometric properties of scales for the measurement of psychosocial variables associated with depression in adolescence. Psychol Rep. 1993;73(3 Pt 1):1019–46.
Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60.
Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58(9):882–93.
McCormick C. Countries with Better English Have Better Economies. Harv Bus Rev. 2013;2013(11):1–4.
Substance Abuse and Mental Health Services Administration. Results from the 2013 National Survey on Drug Use and Health: Summary of National Findings. Rockville, MD; 2014. HHS Publication No. (SMA) 14-4863.
Robins LN, Helzer JE, Croughan J, Ratcliff KS. National Institute of Mental Health diagnostic interview schedule. Its history, characteristics, and validity. Arch Gen Psychiatry. 1981;38(4):381–9.
Grant BF, Goldstein RB, Smith SM, Jung J, Zhang H, Chou SP, et al. The alcohol use disorder and associated disabilities interview Schedule-5 (AUDADIS-5): reliability of substance use and psychiatric disorder modules in a general population sample. Drug Alcohol Depend. 2015;148:27–33.
Buck AA, Gart JJ. Comparison of a screening test and a reference test in epidemiologic studies. I. Indices of agreement and their relation to prevalence. Am J Epidemiol. 1966;83(3):586–92.
Schmidt RL, Factor RE. Understanding sources of bias in diagnostic accuracy studies. Arch Pathol Lab Med. 2013;137(4):558–65.
Bentley TG, Catanzaro A, Ganiats TG. Implications of the impact of prevalence on test thresholds and outcomes: lessons from tuberculosis. BMC Res Notes. 2012;5:563.

Acknowledgements

We thank Evans Whitaker, MD, MLIS, of the University of California San Francisco library for his assistance with the development and execution of the search strategy. We also thank the members of the HPTN Substance Use Scientific Committee for their feedback on this project.

Funding

This study was supported by HPTN, which receives its funding from three NIH Institutes: the National Institute of Allergy and Infectious Diseases, the National Institute of Mental Health and the National Institute on Drug Abuse (Grant # UM1 AI068619). No funding bodies had any role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Community Health Systems, University of California San Francisco, 25 Van Ness Avenue, Suite 500, San Francisco, CA, 94102, USA
Glenn-Milo Santos
Center for Public Health Research, San Francisco Department of Public Health, San Francisco, CA, USA
Glenn-Milo Santos, Poonam Patel, Divya Subramanian, Foram Choksi & Brian Kang
Division of Global Public Health, University of California San Diego, San Diego, CA, USA
Steffanie A. Strathdee & Danielle Horyniak
School of Social Work, Columbia University, New York, NY, USA
Nabila El-Bassel, Charlotte McCullagh & Phillip Marotta
Burnet Institute, Melbourne, VIC, Australia
Danielle Horyniak
Monash University, School of Public Health and Preventive Medicine, Melbourne, VIC, Australia
Danielle Horyniak
Department of Family Medicine and Psychiatry and Biobehavioral Sciences, University of California Los Angeles, Los Angeles, CA, USA
Ryan R. Cook & Steven Shoptaw
Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
Isabel Allen

Contributions

GMS performed the data analysis, interpreted the data, and led the preparation of the manuscript. SAS, NE, and SS designed the study with GMS, contributed to data interpretation, and revised the manuscript critically for important intellectual content. PP, DS, DH, RC, CM, PM, FC, and BK performed the systematic search and data extraction, contributed to data interpretation, and revised the manuscript critically for important intellectual content. IA provided input on the data analysis and revised the manuscript critically for important intellectual content. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Glenn-Milo Santos.

Ethics declarations

Ethics approval and consent to participate

This study involved only analysis of data from published scientific literature; we did not collect any primary data.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Additional file 2: Table S1. Characteristics and Risk of Bias Studies Included in Meta-Analyses. Table S2. References of Studies Meta-Analyzed, by Scale.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Santos, GM., Strathdee, S.A., El-Bassel, N. et al. Psychometric properties of measures of substance use: a systematic review and meta-analysis of reliability, validity and diagnostic test accuracy. BMC Med Res Methodol 20, 106 (2020). https://doi.org/10.1186/s12874-020-00963-7

Received: 22 July 2019
Accepted: 30 March 2020
Published: 07 May 2020
DOI: https://doi.org/10.1186/s12874-020-00963-7