
Table 7 Summary of challenges presented along with possible solutions

From: Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C)

Challenge

Possible solution(s)

Source-specific variability in data availability

• Cluster data sources based on relevant study variables and eliminate those with insufficient data.

• Investigate possible temporal missingness patterns and evidence of missing-not-at-random (MNAR) data.

• Potentially leverage relevant techniques such as multiple imputation and inverse probability weighting to handle remaining missing data.

Unreconciled drug exposure intervals

• Aggregate contiguous drug exposure intervals into single drug eras.

• Residual open-ended intervals may not allow for time-varying analysis and may only be suitable for analysis as point exposures.

Absence of baseline medical history

• Perform a sensitivity analysis to understand the impact of EHR-continuity on the estimand.

• Consider incorporating prognostic factors proximal to the outcome into the model.

Limited availability of out-of-hospital mortality data

• Consider a sensitivity analysis on censoring time for discharged patients.

• Employ competing risk analysis with discharge and in-hospital mortality as competing risks.

Previous medical history carried forward in EHR data

• Calculate the number of events recorded per day throughout the visit for an outcome of interest.

• Determine whether treatment preceded the outcome or whether the recorded outcome is an artifact of carried-forward history.
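
Two of the solutions in the table are mechanical enough to sketch in code: aggregating contiguous drug exposure intervals into drug eras, and counting the events recorded per day during a visit. The Python below is a minimal illustration, not the authors' implementation. The function names `merge_drug_eras` and `events_per_day`, the `max_gap_days` parameter, the use of inclusive day-level dates, and the representation of an open-ended interval as an end date of `None` are all assumptions made for this example.

```python
from collections import Counter
from datetime import date, timedelta


def merge_drug_eras(intervals, max_gap_days=0):
    """Merge exposure intervals for one patient and one drug into eras.

    `intervals` is a list of (start, end) dates with inclusive ends.
    An end of None marks an open-ended interval (assumed convention).
    Returns (eras, open_ended_starts): merged closed eras, plus the start
    dates of open-ended intervals, which are kept apart so they can be
    treated as point exposures.
    """
    closed = sorted((s, e) for s, e in intervals if e is not None)
    open_ended_starts = sorted(s for s, e in intervals if e is None)

    eras = []
    for start, end in closed:
        # Adjacent days count as contiguous; max_gap_days widens the tolerance.
        if eras and start <= eras[-1][1] + timedelta(days=max_gap_days + 1):
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            eras.append((start, end))
    return eras, open_ended_starts


def events_per_day(event_dates, visit_start, visit_end):
    """Count recorded events on each day of a visit (inclusive bounds).

    Days with no recorded events appear with a count of 0, so gaps and
    bursts across the visit are visible.
    """
    counts = Counter(d for d in event_dates if visit_start <= d <= visit_end)
    n_days = (visit_end - visit_start).days + 1
    days = [visit_start + timedelta(days=i) for i in range(n_days)]
    return {day: counts.get(day, 0) for day in days}


if __name__ == "__main__":
    exposures = [
        (date(2021, 1, 1), date(2021, 1, 3)),
        (date(2021, 1, 4), date(2021, 1, 6)),    # contiguous with the first
        (date(2021, 1, 10), date(2021, 1, 12)),  # separate era
        (date(2021, 1, 11), None),               # open-ended interval
    ]
    print(merge_drug_eras(exposures))
    print(events_per_day(
        [date(2021, 1, 1), date(2021, 1, 1), date(2021, 1, 3)],
        date(2021, 1, 1),
        date(2021, 1, 3),
    ))
```

Keeping open-ended intervals out of the merge reflects the table's caution that they may only be suitable for analysis as point exposures. A per-day count that spikes on the first day of a visit is one pattern that might suggest carried-forward history rather than a new event, though interpreting it would still require clinical review.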