COVID-19-related medical research: a meta-research and critical appraisal

Background Since the start of the COVID-19 outbreak, a large number of COVID-19-related papers have been published. However, concerns have been raised about the risks of expedited science. We aimed to review and categorize COVID-19-related medical research and to critically appraise peer-reviewed original articles.
Methods The data sources were PubMed, the Cochrane COVID-19 Study Register, arXiv, medRxiv and bioRxiv, from 01/11/2019 to 01/05/2020. Peer-reviewed publications and preprints related to COVID-19, written in English or Chinese, were included. No limitations were placed on study design. Reviewers screened and categorized studies according to i) publication type, ii) country of publication, and iii) topics covered. Original articles were critically appraised using validated quality assessment tools.
Results Among the 11,452 publications identified, 10,516 met the inclusion criteria, of which 7468 (71.0%) were peer-reviewed articles. Among these, 4190 publications (56.1%) did not include any data or analytics (comprising expert opinion pieces). Overall, the most represented topics were infectious disease (n = 2326, 22.1%), epidemiology (n = 1802, 17.1%), and global health (n = 1602, 15.2%). The top five publishing countries were China (25.8%), United States (22.3%), United Kingdom (8.8%), Italy (8.1%) and India (3.4%). The publication dynamics showed that the exponential growth of COVID-19 peer-reviewed articles was driven mainly by publications without original data (mean 261.5 ± 51.1 articles per week) as compared with original articles (mean 69.3 ± 22.3 articles per week). Original articles including patient data accounted for 713 (9.5%) of peer-reviewed studies. A total of 576 original articles (80.8%) showed intermediate to high risk of bias. Last, except for simulation studies, which mainly used large-scale open data, the median number of patients enrolled was 102 (IQR = 37–337).
Conclusions Since the beginning of the COVID-19 pandemic, the majority of the research output has consisted of publications without original data. Peer-reviewed original articles with data showed a high risk of bias and included limited numbers of patients. Together, these findings underscore the urgent need to strike a balance between the velocity and quality of research, and to consider medical information and clinical applicability cautiously in a pressing, pandemic context.
Systematic review registration https://osf.io/5zjyx/
Supplementary Information The online version contains supplementary material available at 10.1186/s12874-020-01190-w.

List of the topics addressed by all COVID-19-related medical articles
Table 2. MetaQAT tool for simulation studies
Table 3. AXIS tool for cross-sectional studies
Table 5. Newcastle-Ottawa Scale for cohort studies
Table 6. Newcastle-Ottawa Scale for case-control studies
Table 7. QUADAS-2 tool for diagnostic studies
Table 8. QUIPS tool for prognostic studies
Table 9. ROBINS-I tool for non-randomized interventional studies
Table 10. Cochrane Risk-of-Bias (RoB 2) tool for randomized controlled trials

PRISMA checklist items addressed:

Rationale
3. Describe the rationale for the review in the context of what is already known.

Objectives
4. Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes, and study design (PICOS).

METHODS

Protocol and registration
5. Indicate if a review protocol exists, if and where it can be accessed (e.g., Web address), and, if available, provide registration information including registration number.

Eligibility criteria
6. Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale.

Information sources
7. Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched.

Search
8. Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated.

Study selection
9. State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis).

Data collection process
10. Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators.

Data items
11. List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made.

Risk of bias in individual studies
12. Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis.

Search strategy
PubMed and the Cochrane COVID-19 Study Register were used to identify peer-reviewed articles. BioRxiv, medRxiv and arXiv were used to identify preprints. We defined different search strategies for each data source. The study protocol is available at: https://osf.io/5zjyx/.

Risk of bias tools: description
To critically appraise the COVID-19 original articles, we used several tools covering all study designs. The tables below present each tool and how its ratings were ultimately categorized to depict the overall risk of bias for each study.

AXIS for cross-sectional studies
General indication: Questions 5, 6 and 7 do not apply to census studies in theory.

AXIS for cross-sectional studies critical appraisal
Introduction (columns: Items, Answers, Comment)
1) Were the aims/objectives of the study clear?

Newcastle-Ottawa Scale note: columns "Sum" = sum of points of each item: Selection (0 to 4), Comparability (0 to 2) and Outcome (0 to 3).

Items / Answers
1) Representativeness of the exposed cohort

Table 7. QUADAS-2 tool for diagnostic studies
Is there concern that the target condition as defined by the reference standard does not match the review question?
A) Low B) High C) Unclear

Flow and timing
Was there an appropriate interval between index test(s) and reference standard?
Did all patients receive a reference standard?
Did patients receive the same reference standard?
Were all patients included in the analysis?
A) Yes B) No C) Unclear
Could the patient flow have introduced bias?
A) Low B) High C) Unclear

Table 8. QUIPS tool for prognostic studies
Biases / Issues to consider for judging overall rating of "Risk of bias" / Judgement

Study Participation
Goal: To judge the risk of selection bias (YES/NO)

Source of target population
The source population or population of interest is adequately described for key characteristics

Method used to identify population
The sampling frame and recruitment are adequately described, possibly including methods to identify the sample, place of recruitment, and period of recruitment

Inclusion and exclusion criteria
Inclusion and exclusion criteria are adequately described

Adequate study participation
There is adequate participation in the study by eligible individuals

Baseline characteristics
The baseline study sample is adequately described for key characteristics

Summary: Study Participation
The study sample represents the population of interest on key characteristics, sufficient to limit potential bias of the observed relationship between the prognostic factor and outcome

Study Attrition
Goal: To judge the risk of attrition bias

Proportion of baseline sample available for analysis
Response rate is adequate and is > 80%

Attempts to collect information on participants who dropped out
Attempts to collect information on participants who dropped out of the study are described

Reasons and potential impact of subjects lost to follow up
Reasons for loss to follow up are described

Outcome and prognostic factor information on those lost to follow up
Participants lost to follow up are adequately described for key characteristics. There are no important differences between key characteristics and outcomes in participants who completed the study and those who did not.

Summary: Study Attrition
Loss to follow-up is not associated with key characteristics sufficient to limit potential bias to the observed relationship between the prognostic factor and the outcome
OVERALL RISK OF BIAS (low/intermediate/high)

Outcome Measurement
The method of outcome measurement used is valid and reliable to limit misclassification bias

Method and setting of Outcome Measurement
The method and setting of outcome measurement is the same for all study participants

Summary: Outcome Measurement
Outcome of interest is adequately measured in study participants to sufficiently limit potential bias
OVERALL RISK OF BIAS (low/intermediate/high)

Study Confounding
Goal: To judge the risk of bias due to confounding

Important Confounders measured
All important confounders are measured

Definition of the confounding factor
Clear definitions of the important confounders measured are provided

Method and setting of Confounding Measurement
The method and setting of confounding measurement are the same for all study participants

Appropriate accounting for confounding
Important potential confounders are accounted for in the study design. Important potential confounders are accounted for in the analysis.

Summary: Study Confounding
Important potential confounders are appropriately accounted for, limiting potential bias with respect to the relationship between the prognostic factor and the outcome
OVERALL RISK OF BIAS (low/intermediate/high)

Statistical Analysis and Reporting
Goal: To judge the risk of bias related to the statistical analysis and presentation of results

Presentation of analytical strategy
There is sufficient presentation of data to assess the adequacy of the analysis

Model development strategy
The strategy for model building is appropriate and is based on a conceptual framework or model. The selected statistical model is adequate for the design of the study

Reporting of results
There is a description of the association of the prognostic factor and the outcome, including information about the statistical significance. Continuous variables are reported or cut-off points are used. There is no selective reporting of results.

Summary: Statistical Analysis and Reporting
The statistical analysis is appropriate for the design of the study, limiting potential for presentation of invalid or spurious results

Table 9. Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool for nonrandomized interventional studies
General

Cochrane Risk-of-Bias (RoB 2) tool for randomized controlled trials
General indications: RoB 2 is structured into five bias domains: bias arising from the randomization process, bias due to deviations from intended interventions (effect of assignment to intervention and effect of adhering to intervention), bias due to missing outcome data, bias in measurement of the outcome and bias in selection of the reported result. Answers for each item in each domain were: 'Yes', 'Probably Yes', 'Probably No', 'No' and 'No information'. Answers for risk-of-bias judgement in each domain were: 'Low', 'High', 'Some concerns'. Answers for overall risk-of-bias judgement were: 'Low', 'High', 'Some concerns'.
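To make the domain-to-overall roll-up concrete, the following is a minimal sketch of the standard RoB 2 aggregation logic (overall 'High' if any domain is 'High', 'Low' only if every domain is 'Low', 'Some concerns' otherwise, with the option to escalate to 'High' when several domains raise some concerns). The escalation threshold is a judgement call in RoB 2 rather than a fixed rule, so the `some_concerns_threshold` parameter here is an assumption for illustration, not the rule applied in this review.

```python
# Sketch of the standard RoB 2 overall-judgement roll-up.
# The threshold for escalating multiple 'Some concerns' to 'High' is a
# hypothetical parameter: RoB 2 leaves this to assessor judgement.

def overall_rob2_judgement(domains, some_concerns_threshold=3):
    """domains: per-domain judgements, each 'Low', 'High' or 'Some concerns'."""
    assert all(d in {"Low", "High", "Some concerns"} for d in domains)
    concerns = sum(d == "Some concerns" for d in domains)
    if "High" in domains or concerns >= some_concerns_threshold:
        return "High"
    if concerns == 0:
        return "Low"
    return "Some concerns"

# Example: one domain raises some concerns, the other four are low.
print(overall_rob2_judgement(["Low", "Some concerns", "Low", "Low", "Low"]))
# → Some concerns
```

This mirrors how the five domain judgements listed above combine into the overall risk-of-bias judgement for a trial.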

1) Was the allocation sequence random?
'Yes' if a random component was used in the sequence generation process. Examples include computer-generated random numbers; reference to a random number table; coin tossing; shuffling cards or envelopes; throwing dice; or drawing lots. Minimization is generally implemented with a random element (at least when the scores are equal), so an allocation sequence that is generated using minimization should generally be considered to be random. 'No' if no random element was used in generating the allocation sequence or the sequence is predictable. Examples include alternation; methods based on dates (of birth or admission); patient record numbers; allocation decisions made by clinicians or participants; allocation based on the availability of the intervention; or any other systematic or haphazard method. 'No information' if the only information about randomization methods is a statement that the study is randomized.
In some situations a judgement may be made to answer 'Probably no' or 'Probably yes'. For example, in the context of a large trial run by an experienced clinical trials unit, absence of specific information about generation of the randomization sequence, in a paper published in a journal with rigorously enforced word count limits, is likely to result in a response of 'Probably yes' rather than 'No information'. Alternatively, if other (contemporary) trials by the same investigator team have clearly used non-random sequences, it might be reasonable to assume that the current study was done using similar methods.

2) Was the allocation sequence concealed until participants were enrolled and assigned to interventions?
'Yes' if the trial used any form of remote or centrally administered method to allocate interventions to participants, where the process of allocation is controlled by an external unit or organization, independent of the enrolment personnel (e.g. independent central pharmacy, telephone or internet-based randomization service providers). Answer 'Yes' if envelopes or drug containers were used appropriately. Envelopes should be opaque, sequentially numbered, sealed with a tamper-proof seal and opened only after the envelope has been irreversibly assigned to the participant. Drug containers should be sequentially numbered and of identical appearance, and dispensed or administered only after they have been irreversibly assigned to the participant. This level of detail is rarely provided in reports, and a judgement may be required to justify an answer of 'Probably yes' or 'Probably no'. 'No' if there is reason to suspect that the enrolling investigator or the participant had knowledge of the forthcoming allocation.

3) Did baseline differences between intervention groups suggest a problem with the randomization process?
Note that differences that are compatible with chance do not lead to a risk of bias. A small number of differences identified as 'statistically significant' at the conventional 0.05 threshold should usually be considered to be compatible with chance.
'No' if no imbalances are apparent or if any observed imbalances are compatible with chance. 'Yes' if there are imbalances that indicate problems with the randomization process, including: (1) substantial differences between intervention group sizes, compared with the intended allocation ratio; or (2) a substantial excess in statistically significant differences in baseline characteristics between intervention groups, beyond that expected by chance; or (3) imbalance in one or more key prognostic factors, or baseline measures of outcome variables, that is very unlikely to be due to chance and for which the between-group difference is big enough to result in bias in the intervention effect estimate. Also answer 'Yes' if there are other reasons to suspect that the randomization process was problematic: (4) excessive similarity in baseline characteristics that is not compatible with chance.
'No information' when there is no useful baseline information available (e.g. abstracts, or studies that reported only baseline characteristics of participants in the final analysis).
The answer to this question should not influence answers to questions 1.1 or 1.2. For example, if the trial has large baseline imbalances, but authors report adequate randomization methods, questions 1.1 and 1.2 should still be answered on the basis of the reported adequate methods, and any concerns about the imbalance should be raised in the answer to the question 1.3 and reflected in the domain-level risk-of-bias judgement.

Risk-of-bias judgement
Low / High / Some concerns

1) Were participants aware of their assigned intervention during the trial?
If participants are aware of their assigned intervention it is more likely that health-related behaviours will differ between the intervention groups. Blinding participants, most commonly through use of a placebo or sham intervention, may prevent such differences. If participants experienced side effects or toxicities that they knew to be specific to one of the interventions, answer this question 'Yes' or 'Probably yes'.

2) Were carers and people delivering the interventions aware of participants' assigned intervention during the trial?
If carers or people delivering the interventions are aware of the assigned intervention then its implementation, or administration of non-protocol interventions, may differ between the intervention groups. Blinding may prevent such differences. If participants experienced side effects or toxicities that carers or people delivering the interventions knew to be specific to one of the interventions, answer 'Yes' or 'Probably yes'. If randomized allocation was not concealed, then it is likely that carers and people delivering the interventions were aware of participants' assigned intervention during the trial.

3) If Y/PY/NI to 2.1 or 2.2: Were there deviations from the intended intervention that arose because of the trial context?
For the effect of assignment to intervention, this domain assesses problems that arise when changes from assigned intervention that are inconsistent with the trial protocol arose because of the trial context. We use the term trial context to refer to effects of recruitment and engagement activities on trial participants and when trial personnel (carers or people delivering the interventions) undermine the implementation of the trial protocol in ways that would not happen outside the trial. For example, the process of securing informed consent may lead participants subsequently assigned to the comparator group to feel unlucky and therefore seek the experimental intervention, or other interventions that improve their prognosis.
Answer 'Yes' or 'Probably yes' only if there is evidence, or strong reason to believe, that the trial context led to failure to implement the protocol interventions or to implementation of interventions not allowed by the protocol.
Answer 'No' or 'Probably no' if there were changes from assigned intervention that are inconsistent with the trial protocol, such as non-adherence to intervention, but these are consistent with what could occur outside the trial context. Answer 'No' or 'Probably no' for changes to intervention that are consistent with the trial protocol, for example cessation of a drug intervention because of acute toxicity or use of additional interventions whose aim is to treat consequences of one of the intended interventions.
If blinding is compromised because participants report side effects or toxicities that are specific to one of the interventions, answer 'Yes' or 'Probably yes' only if there were changes from assigned intervention that are inconsistent with the trial protocol and arose because of the trial context. The answer 'No information' may be appropriate, because trialists do not always report whether deviations arose because of the trial context.

4) If Y/PY to 2.3: Were these deviations likely to have affected the outcome?
Changes from assigned intervention that are inconsistent with the trial protocol and arose because of the trial context will impact on the intervention effect estimate if they affect the outcome, but not otherwise.

5) If Y/PY/NI to 2.4: Were these deviations from intended intervention balanced between groups?
Changes from assigned intervention that are inconsistent with the trial protocol and arose because of the trial context are more likely to impact on the intervention effect estimate if they are not balanced between the intervention groups.

6) Was an appropriate analysis used to estimate the effect of assignment to intervention?
Both intention-to-treat (ITT) analyses and modified intention-to-treat (mITT) analyses excluding participants with missing outcome data should be considered appropriate. Both naïve 'per-protocol' analyses (excluding trial participants who did not receive their assigned intervention) and 'as treated' analyses (in which trial participants are grouped according to the intervention that they received, rather than according to their assigned intervention) should be considered inappropriate. Analyses excluding eligible trial participants post-randomization should also be considered inappropriate, but post-randomization exclusions of ineligible participants (when eligibility was not confirmed until after randomization, and could not have been influenced by intervention group assignment) can be considered appropriate.

7) If N/PN/NI to 2.6: Was there potential for a substantial impact (on the result) of the failure to analyse participants in the group to which they were randomized?
This question addresses whether the number of participants who were analysed in the wrong intervention group, or excluded from the analysis, was sufficient that there could have been a substantial impact on the result. It is not possible to specify a precise rule: there may be potential for substantial impact even if fewer than 5% of participants were analysed in the wrong group or excluded, if the outcome is rare or if exclusions are strongly related to prognostic factors.

Risk-of-bias judgement
Low / High / Some concerns

1) Were participants aware of their assigned intervention during the trial?
If participants are aware of their assigned intervention it is more likely that health-related behaviours will differ between the intervention groups. Blinding participants, most commonly through use of a placebo or sham intervention, may prevent such differences. If participants experienced side effects or toxicities that they knew to be specific to one of the interventions, answer this question 'Yes' or 'Probably yes'.

2) Were carers and people delivering the interventions aware of participants' assigned intervention during the trial?
If carers or people delivering the interventions are aware of the assigned intervention then its implementation, or administration of non-protocol interventions, may differ between the intervention groups. Blinding may prevent such differences. If participants experienced side effects or toxicities that carers or people delivering the interventions knew to be specific to one of the interventions, answer 'Yes' or 'Probably yes'. If randomized allocation was not concealed, then it is likely that carers and people delivering the interventions were aware of participants' assigned intervention during the trial.

3) [If applicable:] If Y/PY/NI to 2.1 or 2.2: Were important non-protocol interventions balanced across intervention groups?
This question is asked only if the preliminary considerations specify that the assessment will address imbalance of important non-protocol interventions between intervention groups. Important non-protocol interventions are the additional interventions or exposures that: (1) are inconsistent with the trial protocol; (2) trial participants might receive with or after starting their assigned intervention; and (3) are prognostic for the outcome. Risk of bias will be higher if there is imbalance in such interventions between the intervention groups.

4) [If applicable:] Were there failures in implementing the intervention that could have affected the outcome?
This question is asked only if the preliminary considerations specify that the assessment will address failures in implementing the intervention that could have affected the outcome. Risk of bias will be higher if the intervention was not implemented as intended by, for example, the health care professionals delivering care. Answer 'No' or 'Probably no' if implementation of the intervention was successful for most participants.

5) [If applicable:] Was there nonadherence to the assigned intervention regimen that could have affected participants' outcomes?
This question is asked only if the preliminary considerations specify that the assessment will address non-adherence that could have affected participants' outcomes. Non-adherence includes imperfect compliance with a sustained intervention, cessation of intervention, crossovers to the comparator intervention and switches to another active intervention. Consider available information on the proportion of study participants who continued with their assigned intervention throughout follow up, and answer 'Yes' or 'Probably yes' if the proportion who did not adhere is high enough to raise concerns. Answer 'No' for studies of interventions that are administered once, so that imperfect adherence is not possible, and all or most participants received the assigned intervention.

6) If N/PN/NI to 2.3, or Y/PY/NI to 2.4 or 2.5: Was an appropriate analysis used to estimate the effect of adhering to the intervention?
Both naïve 'per-protocol' analyses (excluding trial participants who did not receive their allocated intervention) and 'as treated' analyses (comparing trial participants according to the intervention they actually received) will usually be inappropriate for estimating the effect of adhering to intervention (the 'per-protocol' effect). However, it is possible to use data from a randomized trial to derive an unbiased estimate of the effect of adhering to intervention. Examples of appropriate methods include: (1) instrumental variable analyses to estimate the effect of receiving the assigned intervention in trials in which a single intervention, administered only at baseline and with all-or-nothing adherence, is compared with standard care; and (2) inverse probability weighting to adjust for censoring of participants who cease adherence to their assigned intervention, in trials of sustained treatment strategies. These methods depend on strong assumptions, which should be appropriate and justified if the answer to this question is 'Yes' or 'Probably yes'. It is possible that a paper reports an analysis based on such methods without reporting information on the deviations from intended intervention, but it would be hard to judge such an analysis to be appropriate in the absence of such information.
If an important non-protocol intervention was administered to all participants in one intervention group, adjustments cannot be made to overcome this.
Some examples of analysis strategies that would not be appropriate to estimate the effect of adhering to intervention are (i) 'intention-to-treat (ITT) analysis', (ii) 'per-protocol analysis', (iii) 'as-treated analysis', (iv) 'analysis by treatment received'.

Risk-of-bias judgement
Low / High / Some concerns

Domain 3: Risk of bias due to missing outcome data
1) Were data for this outcome available for all, or nearly all, participants randomized?
The appropriate study population for an analysis of the intention-to-treat effect is all randomized participants. "Nearly all" should be interpreted as meaning that the number of participants with missing outcome data is sufficiently small that their outcomes, whatever they were, could have made no important difference to the estimated effect of intervention. For continuous outcomes, availability of data from 95% of the participants will often be sufficient. For dichotomous outcomes, the proportion required is directly linked to the risk of the event. If the observed number of events is much greater than the number of participants with missing outcome data, the bias would necessarily be small. Only answer 'No information' if the trial report provides no information about the extent of missing outcome data. This situation will usually lead to a judgement that there is a high risk of bias due to missing outcome data. Note that imputed data should be regarded as missing data, and not considered as 'outcome data' in the context of this question.

2) If N/PN/NI to 3.1: Is there evidence that the result was not biased by missing outcome data?
Evidence that the result was not biased by missing outcome data may come from: (1) analysis methods that correct for bias; or (2) sensitivity analyses showing that results are little changed under a range of plausible assumptions about the relationship between missingness in the outcome and its true value. However, imputing the outcome variable, either through methods such as 'last-observation-carried-forward' or via multiple imputation based only on intervention group, should not be assumed to correct for bias due to missing outcome data.

3) If N/PN to 3.2: Could missingness in the outcome depend on its true value?
If loss to follow up, or withdrawal from the study, could be related to participants' health status, then it is possible that missingness in the outcome was influenced by its true value. However, if all missing outcome data occurred for documented reasons that are unrelated to the outcome then the risk of bias due to missing outcome data will be low (for example, failure of a measuring device or interruptions to routine data collection). In time-to-event analyses, participants censored during trial follow-up, for example because they withdrew from the study, should be regarded as having missing outcome data, even though some of their follow up is included in the analysis. Note that such participants may be shown as included in analyses in CONSORT flow diagrams.

4) If Y/PY/NI to 3.3: Is it likely that missingness in the outcome depended on its true value?
This question distinguishes between situations in which (i) missingness in the outcome could depend on its true value (assessed as 'Some concerns') and those in which (ii) it is likely that missingness in the outcome depended on its true value (assessed as 'High risk of bias'). Five reasons for answering 'Yes' are:
1. Differences between intervention groups in the proportions of missing outcome data. If there is a difference between the effects of the experimental and comparator interventions on the outcome, and the missingness in the outcome is influenced by its true value, then the proportions of missing outcome data are likely to differ between intervention groups. Such a difference suggests a risk of bias due to missing outcome data, because the trial result will be sensitive to missingness in the outcome being related to its true value. For time-to-event data, the analogue is that rates of censoring (loss to follow-up) differ between the intervention groups.

2. Reported reasons for missing outcome data provide evidence that missingness in the outcome depends on its true value;
3. Reported reasons for missing outcome data differ between the intervention groups;
4. The circumstances of the trial make it likely that missingness in the outcome depends on its true value. For example, in trials of interventions to treat schizophrenia it is widely understood that continuing symptoms make drop out more likely.
5. In time-to-event analyses, participants' follow up is censored when they stop or change their assigned intervention, for example because of drug toxicity or, in cancer trials, when participants switch to second-line chemotherapy.

Answer 'No' if the analysis accounted for participant characteristics that are likely to explain the relationship between missingness in the outcome and its true value.

Risk-of-bias judgement
Low / High / Some concerns

Domain 4: Risk of bias in measurement of the outcome
1) Was the method of measuring the outcome inappropriate?
This question aims to identify methods of outcome measurement (data collection) that are unsuitable for the outcome they are intended to evaluate. The question does not aim to assess whether the choice of outcome being evaluated was sensible (e.g. because it is a surrogate or proxy for the main outcome of interest). In most circumstances, for pre-specified outcomes, the answer to this question will be 'No' or 'Probably no'. Answer 'Yes' or 'Probably yes' if the method of measuring the outcome is inappropriate, for example because: (1) it is unlikely to be sensitive to plausible intervention effects (e.g. important ranges of outcome values fall outside levels that are detectable using the measurement method); or (2) the measurement instrument has been demonstrated to have poor validity.

2) Could measurement or ascertainment of the outcome have differed between intervention groups?
Comparable methods of outcome measurement (data collection) involve the same measurement methods and thresholds, used at comparable time points. Differences between intervention groups may arise because of 'diagnostic detection bias' in the context of passive collection of outcome data, or if an intervention involves additional visits to a healthcare provider, leading to additional opportunities for outcome events to be identified.

3) If N/PN/NI to 4.1 and 4.2: Were outcome assessors aware of the intervention received by study participants?
Answer 'No' if outcome assessors were blinded to intervention status. For participant-reported outcomes, the outcome assessor is the study participant.

4) If Y/PY/NI to 4.3: Could assessment of the outcome have been influenced by knowledge of intervention received?
Knowledge of the assigned intervention could influence participant-reported outcomes (such as level of pain), observer-reported outcomes involving some judgement, and intervention provider decision outcomes. It is unlikely to influence observer-reported outcomes that do not involve judgement, for example all-cause mortality.

5) If Y/PY/NI to 4.4: Is it likely that assessment of the outcome was influenced by knowledge of intervention received?
This question distinguishes between situations in which (i) knowledge of intervention status could have influenced outcome assessment but there is no reason to believe that it did (assessed as 'Some concerns') and those in which (ii) knowledge of intervention status was likely to have influenced outcome assessment (assessed as 'High risk of bias'). When there are strong beliefs in either the beneficial or the harmful effects of the intervention, it is more likely that the outcome was influenced by knowledge of the intervention received. Examples may include patient-reported symptoms in trials of homeopathy, or assessments of recovery of function by a physiotherapist who delivered the intervention.
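The conditional routing of questions 4.3 to 4.5 above follows a small decision tree: each follow-up question is asked only for particular answers to the preceding one. The sketch below is illustrative only (it is not part of the published tool), reading "N/PN/NI to 4.1 and 4.2" as requiring that answer for both questions; the response codes follow the tool's abbreviations (Y/PY = yes/probably yes, N/PN = no/probably no, NI = no information).

```python
# Illustrative sketch of the conditional routing for the Domain 4
# signalling questions 4.1-4.5 (not part of the published RoB 2 tool).

NOISH = {"N", "PN", "NI"}   # no / probably no / no information
YESISH = {"Y", "PY", "NI"}  # yes / probably yes / no information

def ask_43(q41: str, q42: str) -> bool:
    """4.3 is asked only if 4.1 AND 4.2 are both answered N/PN/NI."""
    return q41 in NOISH and q42 in NOISH

def ask_44(q43: str) -> bool:
    """4.4 is asked only if 4.3 is answered Y/PY/NI."""
    return q43 in YESISH

def ask_45(q44: str) -> bool:
    """4.5 is asked only if 4.4 is answered Y/PY/NI."""
    return q44 in YESISH
```

For example, a trial whose outcome measure was appropriate and applied identically in both arms (4.1 = 'N', 4.2 = 'N') proceeds to 4.3; if assessors were blinded (4.3 = 'N'), questions 4.4 and 4.5 are skipped.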

Risk-of-bias judgement
Low / High / Some concerns

Domain 5: Risk of bias in selection of the reported result
1) Were the data that produced this result analysed in accordance with a pre-specified analysis plan that was finalized before unblinded outcome data were available for analysis?
If the researchers' pre-specified intentions are available in sufficient detail, then planned outcome measurements and analyses can be compared with those presented in the published report(s). To avoid the possibility of selection of the reported result, finalization of the analysis intentions must precede availability of unblinded outcome data to the trial investigators. Changes to analysis plans that were made before unblinded outcome data were available, or that were clearly unrelated to the results (e.g. due to a broken machine making data collection impossible) do not raise concerns about bias in selection of the reported result.
2) Is the numerical result being assessed likely to have been selected, on the basis of the results, from multiple eligible outcome measurements (e.g. scales, definitions, time points) within the outcome domain?
Answer 'Yes' or 'Probably yes' if: There is clear evidence (usually through examination of a trial protocol or statistical analysis plan) that a domain was measured in multiple eligible ways, but data for only one or a subset of measures is fully reported (without justification), and the fully reported result is likely to have been selected on the basis of the results. Selection on the basis of the results can arise from a desire for findings to be newsworthy, sufficiently noteworthy to merit publication, or to confirm a prior hypothesis. For example, trialists who have a preconception, or vested interest in showing, that an experimental intervention is beneficial may be inclined to report outcome measurements selectively that are favourable to the experimental intervention.
Answer 'No' or 'Probably no' if: There is clear evidence (usually through examination of a trial protocol or statistical analysis plan) that all eligible reported results for the outcome domain correspond to all intended outcome measurements. Or There is only one possible way in which the outcome domain can be measured (hence there is no opportunity to select from multiple measures). Or Outcome measurements are inconsistent across different reports on the same trial, but the trialists have provided the reason for the inconsistency and it is not related to the nature of the results.
Answer 'No information' if: Analysis intentions are not available, or the analysis intentions are not reported in sufficient detail to enable an assessment, and there is more than one way in which the outcome domain could have been measured.
3) Is the numerical result being assessed likely to have been selected, on the basis of the results, from multiple eligible analyses of the data?
Answer 'Yes' or 'Probably yes' if: There is clear evidence (usually through examination of a trial protocol or statistical analysis plan) that a measurement was analysed in multiple eligible ways, but data for only one or a subset of analyses is fully reported (without justification), and the fully reported result is likely to have been selected on the basis of the results. Selection on the basis of the results arises from a desire for findings to be newsworthy, sufficiently noteworthy to merit publication, or to confirm a prior hypothesis. For example, trialists who have a preconception or vested interest in showing that an experimental intervention is beneficial may be inclined to selectively report analyses that are favourable to the experimental intervention.
Answer 'No' or 'Probably no' if: There is clear evidence (usually through examination of a trial protocol or statistical analysis plan) that all eligible reported results for the outcome measurement correspond to all intended analyses. Or There is only one possible way in which the outcome measurement can be analysed (hence there is no opportunity to select from multiple analyses). Or Analyses are inconsistent across different reports on the same trial, but the trialists have provided the reason for the inconsistency and it is not related to the nature of the results.
Answer 'No information' if: Analysis intentions are not available, or the analysis intentions are not reported in sufficient detail to enable an assessment, and there is more than one way in which the outcome measurement could have been analysed.

Risk-of-bias judgement
Low / High / Some concerns

Risk-of-bias judgement
Low risk of bias: the study is judged to be at low risk of bias for all domains for this result.
Some concerns: the study is judged to raise some concerns in at least one domain for this result, but not to be at high risk of bias for any domain.
High risk of bias: The study is judged to be at high risk of bias in at least one domain for this result. Or the study is judged to have some concerns for multiple domains in a way that substantially lowers confidence in the result.
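The overall judgement rule above combines the five domain-level judgements deterministically, except for one discretionary step: several 'Some concerns' may be escalated to high risk when they substantially lower confidence in the result. A minimal sketch, with that discretionary escalation made explicit as a flag (the function name and flag are illustrative, not part of the tool):

```python
# Illustrative sketch of the overall RoB 2 risk-of-bias rule for a result,
# combining the five domain-level judgements described above.

def overall_risk_of_bias(domains: list[str],
                         escalate_multiple_concerns: bool = False) -> str:
    """domains: judgements ('Low', 'Some concerns', 'High'), one per domain."""
    # High risk in any single domain makes the overall result high risk.
    if "High" in domains:
        return "High"
    n_concerns = domains.count("Some concerns")
    # Low risk requires low risk in all domains.
    if n_concerns == 0:
        return "Low"
    # 'Some concerns' in multiple domains MAY be judged to substantially
    # lower confidence in the result; the tool leaves this escalation to
    # assessor judgement, modelled here as an explicit flag.
    if n_concerns > 1 and escalate_multiple_concerns:
        return "High"
    return "Some concerns"
```

For example, five 'Low' judgements yield an overall 'Low'; one 'High' in any domain yields 'High' regardless of the others.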