Datasets
We will illustrate our proposed measures of irregularity with the following two datasets.
Pre-specified visit times: TARGet Kids!
The TARGet Kids! study enrolls healthy children aged 0–5 years and follows them until age 18, with the aim of investigating the relationship between early life exposures and later health problems including obesity, micronutrient deficiencies, and developmental problems [10]. Well-child visits are recommended at ages 2, 4, 6, 9, 12, 15, 18, and 24 months, and every 12 months thereafter, with vaccinations occurring at ages 2, 4, 6, 12, 15, and 18 months. Parents also bring their children for “sick” visits as needed. Individuals are recruited and enrolled by research assistants who approach them at well-child visits. In general, most well-child visits do not occur earlier than the recommended schedule, because the physician cannot bill for an early visit and vaccinations can only be given once a child reaches a specific age. For example, the Measles, Mumps and Rubella (MMR) vaccine cannot be administered until a child is 12 months old.
No pre-specified visit times: child Systemic Lupus Erythematosus study
The child lupus study was a retrospective inception cohort study of patients who were diagnosed with childhood-onset Systemic Lupus Erythematosus (cSLE) and followed at a single center with a dedicated cSLE clinic. This cohort was followed from childhood into adulthood. Visit dates ranged from January 1st, 1985 to September 30th, 2011. Individuals were followed at least once every 6 months; however, visit frequency depended on the severity of their disease. The primary objective of this study was to assess differences in disease activity trajectories among all cSLE patients.
Measures for quantifying the extent of visit irregularity
The following measures can be used to assess the extent of visit irregularity and help inform the modelling approach for the outcome. They can also help determine whether observed visits can be viewed as repeated measures subject to missingness. The proposed measures are based on techniques used to explore missing data. In a repeated measures design, summarizing missing values begins by recording the percentage of missing values at each pre-specified visit time. In addition, predictors of being observed at a pre-specified visit time can be evaluated using a regression model (e.g. logistic regression). We adapt these concepts to the context of irregular data. We consider two settings: studies whose protocol pre-specifies visit times, and studies whose protocol does not.
Pre-specified visit times
We propose constructing bins around pre-specified visit times. Let the time frame of interest be (0, τ), and let Tj denote the jth pre-specified visit time (j = 1, 2, ..., k). The jth bin is given by the interval (Lj, Rj), where Lj and Rj are chosen to specify the left and right cut-points of the jth bin respectively (Fig. 1). We require that Rj < Lj+1 for all j, where Lj+1 is the left cut-point of bin j + 1, so that bins do not overlap, and that Lj < Tj < Rj. These bins can be used to calculate summary statistics such as the proportions of individuals with 0, 1, and > 1 visits per bin.
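As an illustration of the per-bin summary just described, the following minimal Python sketch counts, for each bin (Lj, Rj), the proportion of individuals with 0, 1, and > 1 visits. It is not the Appendix code: the function name `visit_count_proportions`, the input layout (a dictionary mapping an individual's identifier to a list of visit times), and the toy data are all assumptions made for this example.

```python
from collections import Counter


def visit_count_proportions(visits, bins):
    """Proportion of individuals with 0, 1 and >1 visits in each bin.

    visits: dict mapping individual id -> list of visit times (hypothetical layout)
    bins:   list of (L, R) cut-points, treated as open intervals (L, R)
    """
    # Enforce the non-overlap condition R_j < L_{j+1} from the text.
    for (_, right), (next_left, _) in zip(bins, bins[1:]):
        if not right < next_left:
            raise ValueError("bins must not overlap: need R_j < L_{j+1}")

    n = len(visits)
    summary = []
    for left, right in bins:
        tally = Counter()
        for times in visits.values():
            count = sum(left < t < right for t in times)
            tally["0" if count == 0 else "1" if count == 1 else ">1"] += 1
        summary.append({key: tally[key] / n for key in ("0", "1", ">1")})
    return summary


# Toy data (invented): four individuals, scheduled visits at 2, 4 and 6 months.
visits = {
    "a": [2.1, 4.0, 6.2],
    "b": [1.9, 2.2, 6.0],
    "c": [4.1],
    "d": [2.0, 3.9, 4.3, 5.8],
}
bins = [(1.5, 2.5), (3.5, 4.5), (5.5, 6.5)]
result = visit_count_proportions(visits, bins)
```

In this toy example, the first bin contains one individual with no visit, two with exactly one visit, and one with two visits, so its proportions are 0.25, 0.5, and 0.25.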
Bin widths should be specified according to clinical context as appropriate. For example, the HbA1C blood test measures blood glucose levels from the previous 3 months (levels are known to be stable during this time period [11]), and hence bin widths should not be less than this. Bins can have different widths across the study period to account for known patterns in visit intensity (e.g. more frequent visits in the winter). Another approach to specifying bin widths is to use a percentage of the time gap between consecutive pre-specified visit times. For example, 10% of the gap implies that Lj = Tj - 0.1(Tj - Tj-1) and Rj = Tj + 0.1(Tj+1 - Tj), where Tj-1 and Tj+1 denote the pre-specified visit times immediately before and after Tj. When there is no obvious choice of bin widths, reporting on varying bin widths can be helpful.
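The percentage-of-gap rule can be sketched in Python as follows. The text does not say how to treat the first and last pre-specified visit times, so this sketch assumes that the gap before T1 is measured from the start of the time frame (0) and the gap after Tk is measured to τ. The function name `bins_from_gap_fraction` and its parameters are hypothetical. Requiring the fraction to be below 0.5 guarantees that adjacent bins cannot overlap.

```python
def bins_from_gap_fraction(visit_times, fraction, tau, start=0.0):
    """Cut-points (L_j, R_j) placed a fixed fraction of the neighbouring gaps around T_j.

    Assumption (not from the text): T_0 = start and T_{k+1} = tau are used
    as the neighbours of the first and last pre-specified visit times.
    """
    if not 0 < fraction < 0.5:
        raise ValueError("fraction must lie in (0, 0.5) so adjacent bins cannot overlap")
    padded = [start, *visit_times, tau]
    if any(later <= earlier for earlier, later in zip(padded, padded[1:])):
        raise ValueError("need start < T_1 < ... < T_k < tau")

    bins = []
    for j in range(1, len(padded) - 1):
        left = padded[j] - fraction * (padded[j] - padded[j - 1])
        right = padded[j] + fraction * (padded[j + 1] - padded[j])
        bins.append((left, right))
    return bins


# Illustrative schedule in months with unequal gaps; tau = 15 is an invented end point.
scheduled = [2, 4, 6, 9, 12]
bins = bins_from_gap_fraction(scheduled, fraction=0.1, tau=15)
```

With these inputs the bin around the 6-month visit is approximately (5.8, 6.3): the left side uses 10% of the 2-month gap before it and the right side uses 10% of the 3-month gap after it.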
In perfect repeated measures, all individuals have 1 visit in a bin (regardless of bin width) and no individuals have 0 or > 1 visits per bin. Thus the proportions of individuals with 0 visits and with > 1 visits per bin are both 0, and the proportion of individuals with 1 visit per bin is 1. Figure 2 illustrates the visit timings for a random subset of 20 individuals from a perfect repeated measures simulated dataset with 100 observations and five pre-specified visit times (2, 4, 6, 8, 10 months). As the level of missingness increases, the proportion of individuals with 0 visits per bin increases. As irregularity increases, the proportion of individuals with > 1 visit per bin increases.
The R code for plotting visit patterns for a random subset of individuals, and for plotting the mean proportions of individuals with 0, 1, and > 1 visits per bin, uses the “IrregLong” package on CRAN [12] and is presented in the Appendix.
No pre-specified visit times
We construct adjacent bins across the entire study period. Bin widths can be determined by clinical context or known visiting patterns (e.g. fewer visits later on in follow-up could be accommodated by wider bins). The jth bin is given by the interval (Lj, Rj), where Lj and Rj are chosen to specify the left boundary and right boundary of the jth bin respectively (Fig. 3).
The mean proportions of individuals with 0, 1, and > 1 visits per bin can be computed for a range of bin counts (as the number of bins increases, bin widths decrease). These values can be used to judge the extent of irregularity by assessing whether they are consistent with the values that would arise under repeated measures: the greater the disparity, the greater the extent of irregularity. To evaluate this, the first step is to plot the mean proportions of individuals with 0, 1, and > 1 visits per bin as a function of bin width. The next step is to identify the bin width that yields the largest mean proportion of individuals with 1 visit per bin (in perfect repeated measures, all individuals have 1 visit per bin). At this bin width, determine whether the mean proportion of individuals with 0 visits per bin, or with > 1 visits per bin, is 0. If the mean proportion of individuals with > 1 visit per bin is not 0, this indicates a degree of irregularity. If the mean proportion of individuals with > 1 visit per bin is 0 and the mean proportion of individuals with 0 visits per bin is not 0, this suggests the data can be viewed as repeated measures with missingness. This comparison can be supplemented by identifying the largest bin width at which the mean proportion of individuals with > 1 visit per bin is 0, and examining the mean proportions of individuals with 0 and 1 visits per bin at that width. If, at the largest such bin width, the mean proportion of individuals with 0 visits per bin is 0 and the mean proportion of individuals with 1 visit per bin is not 0, this suggests the data can be treated as repeated measures.
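The first two steps of this procedure can be sketched in Python as follows. This is a simplified illustration, not the IrregLong implementation. It assumes equal-width adjacent bins covering (0, τ), and it treats each bin as half-open, [L, R), so that a visit on a shared boundary is counted once (the text does not specify boundary handling). The function name `mean_proportions` and the toy data are invented for the example.

```python
def mean_proportions(visits, tau, n_bins):
    """Mean (over bins) proportion of individuals with 0, 1 and >1 visits per bin.

    visits: dict mapping individual id -> list of visit times in [0, tau)
    Bins are equal-width and half-open, [L, R): an assumption of this sketch.
    """
    width = tau / n_bins
    totals = [0, 0, 0]  # individual-bin pairs with 0, 1 and >1 visits
    for j in range(n_bins):
        left, right = j * width, (j + 1) * width
        for times in visits.values():
            count = sum(left <= t < right for t in times)
            totals[min(count, 2)] += 1
    denom = len(visits) * n_bins
    return tuple(total / denom for total in totals)


# Toy data (invented): three individuals followed for 12 months.
visits = {
    "a": [1, 4, 7, 10],
    "b": [2, 5, 8, 11],
    "c": [1, 2, 7, 10],
}
tau = 12

# Step 1: mean proportions for several bin counts (bin width = tau / n_bins).
profile = {n: mean_proportions(visits, tau, n) for n in (2, 4, 6, 12)}

# Step 2: the bin count with the largest mean proportion of exactly 1 visit per bin.
best_n = max(profile, key=lambda n: profile[n][1])
p_zero, p_one, p_many = profile[best_n]
```

In this toy example the mean proportion with 1 visit per bin is largest for four bins (width 3 months). At that width the mean proportion with > 1 visits is not 0, which under the criteria above would indicate a degree of irregularity. In practice the profile would be plotted against bin width rather than inspected as a dictionary.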
Censoring
Both left and right censoring should be considered when using bins to explore visit irregularity. Individuals may enter the study after the first pre-specified visit time, and the dataset may be closed before they have the opportunity to attend all the follow-up visits. In cases where censoring is administrative and unlikely to lead to bias, we may wish to measure irregularity separately from censoring. This can be done by specifying an “at-risk” set of individuals for each bin (i.e. individuals who are under follow-up for all times in the bin) and then using only these individuals to estimate the proportions of individuals with 0, 1, and > 1 visits per bin. Individuals who are lost to follow-up (rather than administratively censored) can still be at-risk beyond their last visit. However, individuals should not be included in calculations for bins representing times before they entered the study or after the dataset was closed.
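A minimal Python sketch of the at-risk calculation is given below. It assumes each individual has a study entry time and an administrative censoring time (the time the dataset was closed for that individual). Both field names, `entry` and `exit`, are hypothetical. An individual counts as at-risk for a bin only if the whole bin lies within that window. The last visit time plays no role in defining the at-risk set, so individuals lost to follow-up remain at-risk and contribute bins with 0 visits.

```python
def at_risk_proportions(subjects, bins):
    """Per-bin proportions with 0, 1 and >1 visits among at-risk individuals only.

    subjects: list of dicts with hypothetical keys
        "entry"  - time the individual entered the study
        "exit"   - administrative censoring time (dataset closure)
        "visits" - list of visit times
    bins: list of (L, R) cut-points, treated as open intervals (L, R)
    Returns None for a bin in which nobody is at-risk.
    """
    summary = []
    for left, right in bins:
        # At-risk: under follow-up for all times in the bin.
        at_risk = [s for s in subjects if s["entry"] <= left and s["exit"] >= right]
        if not at_risk:
            summary.append(None)
            continue
        counts = [0, 0, 0]
        for s in at_risk:
            n_visits = sum(left < t < right for t in s["visits"])
            counts[min(n_visits, 2)] += 1
        summary.append(tuple(c / len(at_risk) for c in counts))
    return summary


# Toy data (invented). B enters late; C's data are closed at month 5,
# and C has no visits after month 2.3 but remains at-risk until month 5.
subjects = [
    {"entry": 0, "exit": 7, "visits": [2.0, 4.1, 6.0]},  # A
    {"entry": 3, "exit": 7, "visits": [4.0, 6.2]},       # B
    {"entry": 0, "exit": 5, "visits": [2.1, 2.3]},       # C
]
bins = [(1.5, 2.5), (3.5, 4.5), (5.5, 6.5)]
result = at_risk_proportions(subjects, bins)
```

Here B is excluded from the first bin and C from the third, while C still contributes a 0-visit count to the second bin, so the three bins are summarized over two, three, and two individuals respectively.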