Accuracy and precision of variance components in occupational posture recordings: a simulation study of different data collection strategies

Background Information on exposure variability, expressed as exposure variance components, is of vital use in occupational epidemiology, including informed risk control and efficient study design. While accurate and precise estimates of the variance components are desirable in such cases, very little research has been devoted to understanding the performance of data sampling strategies designed specifically to determine the size and structure of exposure variability. The aim of this study was to investigate the accuracy and precision of estimators of between-subjects, between-days and within-day variance components obtained by sampling strategies differing with respect to number of subjects, total sampling time per subject, number of days per subject and the size of individual sampling periods. Methods Minute-by-minute values of average elevation, percentage time above 90° and percentage time below 15° were calculated in a data set consisting of measurements of right upper arm elevation during four full shifts from each of 23 car mechanics. Based on this parent data, bootstrapping was used to simulate sampling with 80 different combinations of the number of subjects (10, 20), total sampling time per subject (60, 120, 240, 480 minutes), number of days per subject (2, 4), and size of sampling periods (blocks) within days (1, 15, 60, 240 minutes). Accuracy (absence of bias) and precision (prediction intervals) of the variance component estimators were assessed for each simulated sampling strategy. Results Sampling in small blocks within days resulted in essentially unbiased variance components. For a specific total sampling time per subject, and in particular if this time was small, increasing the block size resulted in an increasing bias, primarily of the between-days and the within-days variance components. Prediction intervals were in general wide, and even more so at larger block sizes. Distributing sampling time across more days gave in general more precise variance component estimates, but also reduced accuracy in some cases. Conclusions Variance components estimated from small samples of exposure data within working days may be both inaccurate and imprecise, in particular if sampling is laid out in large consecutive time blocks. In order to estimate variance components with a satisfying accuracy and precision, for instance for arriving at trustworthy power calculations in a planned intervention study, larger samples of data will be required than for estimating an exposure mean value with a corresponding certainty.


Background
In occupational studies, increasing attention is paid to understanding exposure variability expressed as variance components, both for the purpose of focusing surveillance and intervention on appropriate targets [1,2] and in order to design efficient exposure assessment strategies for epidemiologic studies and intervention research [3][4][5]. In the context of biomechanical exposures, Mathiassen et al. [6,7] have proposed that variance components can be used as measures of physical variation in a task, job or occupation, and thus meet the need for variables describing this essential aspect in assessments of risks of developing musculoskeletal disorders. Variance components are derived from statistical random effects models [8], by which the total variance in data is partitioned into estimated variance components associated with different random factors, i.e. sources of variability, in the model. In occupational exposure studies, typical random factors are subjects and working days, and the corresponding variance components are referred to as the between-subjects variance and the between-days (within-subject) variance [1]. Fitting random effects models to exposure data has mainly been practiced in chemical exposure assessment [1,[9][10][11], but a number of studies have used such models for biomechanical exposures as well [3,4,6,[12][13][14]. Since variance components can be the primary exposure or outcome measure of a study, rather than just a tool for obtaining another variable of interest (for instance the precision of an estimated mean exposure), the issue of accuracy (absence of bias) and precision of different sampling strategies for determining variance components needs to be addressed. Few previous studies have been devoted to this issue. Mathiassen et al. [4] used bootstrapping to construct confidence intervals for the between-subjects variance, between-days variance and residual (within-day) variance in electromyography (EMG) data from cyclic assembly work. In another study, Mathiassen et al. [6] used theoretical formulas to construct confidence intervals for variance components of EMG measurements from subjects performing a constrained work task. These studies pointed to a considerable imprecision of the estimated variance components, in particular when exposure samples were few and short. For occupational posture recordings, which are the focus of this paper, it is common to collect data for only smaller parts of a working day (e.g. [15][16][17][18]) even if continuous recordings of a full work shift do occur (e.g. [19][20][21][22][23]).
Standard random effects models assume that the modelled random effects are uncorrelated. Probably, this assumption is often violated in occupational settings since exposures close in time tend to be more similar than exposures far apart in time [24,25]. In a study of upper arm postures in three occupations, Liv et al. [14] showed that the assumption of independent errors was less severely violated if a data sample was distributed among several short time periods during a shift than if the sample comprised fewer or only one longer time period. Due to autocorrelation, the observed variance of a mean exposure estimate was also larger than expected from theoretical formulae based on the posture variance components. Similar effects of autocorrelation can also be expected for sample estimates of variance components based on standard random effects models [14]. To our knowledge, this issue has not been investigated in the occupational exposure literature, and so the accuracy and precision of variance components derived by standard procedures under different data collection scenarios are only superficially understood.
The aim of this study was to investigate and discuss the accuracy and precision of estimators of variance components for upper arm elevation when data are collected using different sampling strategies, and to suggest and apply a bootstrap approach for investigating sampling performance in this context.

Methods
The data used in this study were collected for an epidemiologic study of the relationship between upper arm elevation and shoulder disorders among house painters, car mechanics and machinists [21,26,27]. We chose to limit the present study to data from the car mechanics in order to limit the results of the study to manageable amounts. The data set was intended to consist of measurements of right upper arm elevation for five full shifts in each of 25 car mechanics. Posture data were collected by means of the Abduflex equipment [28], which at a frequency of 1 Hz recorded the angle of the upper arm with respect to the line of gravity in six 15°intervals from 0°to 90°, and a seventh interval for angles above 90°. After exclusion of two subjects with less than four measured working days, the data set comprised measurements from 23 subjects. For subjects with five working days, four days were randomly selected in order to obtain a balanced data set for further processing. Some short periods of missing data within recorded working days occurred, in all 299 minutes out of a total of 126,824 minutes, and they were replaced using linear interpolation [14].
The recordings from the different working days were of different duration, ranging from 240 minutes to 721. Most working days were close to 480 minutes. In a block bootstrap procedure, blocks of 30 minutes of simulated data were added to all days shorter than 480 minutes by resampling from the available data of the same day, until the day contained at least 480 minutes of data. All working days longer than 480 minutes were truncated at 480 minutes. This padding and truncating procedure has been described in detail and validated in a previous paper [14]. The described resulting data set was balanced and consisted of 480 minutes of upper arm elevation recordings from each of four days in each of 23 subjects.
From this data set, we calculated minute-by-minute values of three exposure variables: average elevation, percentage time above 90°(percentage of time spent with an arm elevation larger than 90°), and percentage time below 15°(percentage of time spent with an arm elevation less than 15°). Prior to assessing average angles, each Abduflex interval recording was replaced by the central angle of that interval, except for angles above 90°that were assigned the value 105°. Figure 1 shows an example of the three posture variables during a working day of a car mechanic.

Statistical model
The three "parent" exposure data sets (one data set for each exposure variable) were analyzed using a standard hierarchical random effects model as follows: i=1, 2,.., a j=1, 2, 3, 4 k=1,2,.., n where y ijk is the k:th observation of exposure on the i:th subject on that subject's j:th working day, μ is the true mean exposure of the population, α i is the random effect on exposure of the i:th subject, β j(i) is the random effect on exposure of the j:th day within person i, and ε k(ij) (the error term) is the random error for observation k within day j for person i. The variables α i , β j(i) and ε k(ij) are assumed to be independent and normally distributed with zero mean and variances σ 2 BS (the between-subjects variance) , σ 2 BD (the between-days variance) and σ 2 WD (the within-day variance between 1-minute exposure values), respectively -for all i, j and k. To examine the assumptions of the model, values of α i , β j(i) and ε k(ij) were predicted for each exposure variable [8]. Variance components and mean exposures were estimated using the restricted maximum likelihood method [8]. The autocorrelation function of lag 1, 2, 3, 4, 5, and 10 minutes was estimated for data within each working day and these values were then averaged across working days and subjects for each lag of the autocorrelation function. Autocorrelation describes the similarity of observations separated by a specific time span (the lag) in terms of a standard Pearson product-moment correlation coefficient. Hence, autocorrelation ranges between −1 and 1, where a value of 0 implies no correlation at all, and −1 and 1 show perfect predictability of data ahead in time.

Sampling strategies
By simulation, we investigated a selection of sampling strategies defined by the number of sampled subjects (n s ), the total sampling time collected from each subject (t tot , in minutes), the number of working days per subject (n d ) among which the sampled time periods were distributed, and the size of sampled time blocks within each working day (t b , in minutes). Blocks were dispersed across the day at random, or using a fixed time interval schedule. Table 1 summarizes the investigated sampling strategies. For example, one sampling strategy used n s = 10 subjects, t tot = 120 minutes, n d = 2 days, t b = 15 minutes and random sampling. This implied sampling 120 minutes from each of 10 subjects, evenly allocated to two days. The 60 minutes from each day were sampled in four blocks, each 15 minutes long, and the blocks were randomly allocated within the day, with no overlap.
In the corresponding fixed interval strategy, the first of the four blocks was randomly positioned within the first 105 minute of the working day, i.e. one fourth of the non-sampled time during that day. The remaining three blocks were distributed evenly across the day with a fixed time interval of 105 minutes between each block. All 128 (2x4x2x4x2) possible combinations of the parameter values in Table 1 were considered for each of the three posture variables. However, several of the combinations do not represent conceivable sampling strategies. For example, the block size cannot exceed the total sampling time per subject. In addition, when sampling one block per day, any fixed interval strategy is identical to the corresponding random strategy and therefore redundant. After rejecting impossible and redundant combinations, 80 sampling strategies remained. The performance of all 80 strategies was determined for each of the three posture variables using a bootstrap procedure.

Bootstrapping
To investigate the performance of the 80 sampling strategies, non-parametric bootstrapping was performed on 3-level hierarchical data sets, levels being subjects, days and measurements within days [29]. Five thousand bootstrap runs were carried out for each combination of strategy and posture variable. In Liv et al. [14], 10000 bootstrap runs were used for investigating the precision of mean values, but estimating variance components is more computationally intensive, and therefore we settled for 5000 bootstrap runs that seemed sufficient to ensure stable estimates. In each bootstrap repeat, n s subjects were selected with replacement from the parent data set. Within each selected subject, n d days were then selected without replacement, and from within each selected working day t tot /n d minutes were selected without replacement, using the block size prescribed by the sampling strategy. Thus, each bootstrap run resulted in a simulated data set, the variance components of which were estimated using the random effects model explained above (equation (1)). For each sampling strategy and variable, the accuracy of the estimators of the three variance components was expressed as bias. The bias was calculated by subtracting the mean of the estimates of that variance component across all 5000 bootstrap runs from the corresponding "true" variance component as estimated from the "parent" data set. As a measure of the precision of each variance component, a 90% prediction interval was estimated using the 5-and 95-percentiles of the empirical distribution of the 5000 bootstrap estimates of that variance component. A future estimate of a particular variance component will fall within the prediction interval with a probability of 90%. All simulation and estimation procedures were performed in the program R [30]; restricted maximum likelihood estimates of the variance components were obtained by the function lmer4.

Results
For all three exposure variables, the total variability of the "parent" data set was dominated by the within-day variance component ( Table 2; data also reported in Liv et al. 2011). The autocorrelation at lag 1 ranged between 0.51 and 0.55 for the three exposure variables, which demonstrates that the assumption of independence in the error term of the statistical model (equation 1) was violated.
In Table 3, accuracy and precision of the estimated variance components for percentage time above 90°are presented for the subset of all investigated sampling strategies for which n s = 10 or 20, n d = 2 or 4, t tot = 120 or 480 minutes and t b = 1, 15 or 60 minutes. A table containing results from all investigated sampling strategies and exposure variables is provided as an Additional file 1: Liv_complete_results.pdf.
For strategies with a block size of 1 minute, bias was absent or very small. In general, when block size increased, a stronger negative bias appeared for the within-day variance, and a stronger positive bias for the between-days variance ( Table 3; illustrated in Figure 2). This effect was more pronounced with a smaller total sample size per subject ( Figure 2). The accuracy of the between-subjects variance was not affected by the sampling strategy to the same extent as the variance between and within days.       a, b, c: between-subjects, between-days, and within-day variance. n s , number of subjects; t tot , total sampling time per subject (minutes); n d , number of days per subject; t b , size of sampling blocks (minutes); r, random sampling; f, fixed interval sampling. Bias and prediction intervals are presented relative to the "true" variance components of the parent data set, i.e. σ 2 BS = 3.0, σ 2 BD = 4.0 and σ 2 WD = 164.7 (cf. Table 2). Thus, the result in bold and italics tells that for the sampling strategy (n s , t tot , n d , t b ) = (10, 120, 4, 1) with random distribution of blocks, σ 2 BS was downward biased by 0.2, i.e. its average estimated value was 2.8, and the 90% prediction interval, presented as [−3.0, 4.1], reached from 3.0-3.0 to 3.0 + 4.1, i.e. from 0.0 to 7.1.
As expected, precision increased (i.e. the 90% prediction interval of the variance components narrowed) when more data was sampled (Table 3). At a particular total sampling time, t tot , the prediction interval widened, i.e. the variance components were less precise, for larger block sizes. This effect appears clearly in Figure 2. In general, distributing a particular sampling time across four days (n d = 4) led to better precision for all three variance components than when distributing the same sampling time across two days (n d = 2). However, distributing sampling time across more days also resulted in less accurate variance components when the block size was large. We did not find obvious systematic differences between variance components obtained by random and fixed-interval sampling. Very similar patterns were found for the other two exposure variables.

Discussion
In the present study, different sampling strategies were evaluated with respect to accuracy and precision of variance components for three posture variables for car mechanics. The results showed the consequences of violating theoretical assumptions behind the random effects model; inaccurate results were caused by individual samples being time-dependent within working days (autocorrelation). The present study used a bootstrap method for investigating the performance of sampling strategies that is also applicable in other occupational settings and for other exposure variables.
The present study showed that sampling data in large time blocks may lead to inaccuracy and imprecision in estimates of variance components. This was particularly prominent for strategies where small fractions of working days were sampled. Variance component estimates were particularly biased for strategies with small sample sizes and large block sizes. For percentage time above 90°, negative biases of up to 26% of the size of the "true" within-day variance component and positive biases of up to 1110% of the size of the "true" between-days variance component were observed. However, in many cases the total error in the variance component estimates was dominated by imprecision rather than inaccuracy. For the within-day variance component, which was estimated at 164.7 in the original data set, the 90% prediction interval ranged from 137 to 191 for the sampling strategy giving the best precision (n s = 20, t tot = 480, n d = 4, t b = 1). For the within-day variance component, the median width of the 90% prediction Figure 2 Estimated variance component mean values (error bars: 90% prediction intervals) resulting from simulations of six different sampling strategies assessing percentage time above 90°. All six strategies used random sampling from n s = 10 subjects, approached for n d = 2 days each. Green circles, blue squares and red triangles show within-day, between-days and between-subjects variance, respectively. Unfilled and filled symbols show strategies with t tot = 120 minutes and t tot = 240 minutes, respectively. Red dashed, blue solid and green dotted lines represent the "true" within-day, between-days and between-subjects variance, respectively, according to the parent data set.
interval across all investigated sampling strategies was 89 while the median bias of the variance component was 1.4. Thus, our results suggested that imprecision will often be a more serious problem than inaccuracy for studies of the sizes simulated here. The results further indicated that the sample sizes investigated by us might not be sufficient to retrieve variance components with a satisfying precision. In occupational epidemiology, variance components are required for designing efficient exposure measurement strategies, and when combined with information on costs associated with data collection, they give a basis for deciding on efficient budget allocation [31][32][33][34]. Variance components can guide the selection of targets for interventions to reduce suspected hazardous exposures [2]; they are used in assessments of clinical reliability [35], and they are necessary inputs in conventional power analysis of, for instance, studies addressing exposure differences between groups or effects of an intervention [4,6]. The present paper clearly illustrates that the results of these applications of estimated variance components can be very uncertain, in particular if the estimates have been based on short and continuous exposure samples. This caveat is rarely addressed in the literature. Estimated variances are known to follow a positively skewed distribution; this was apparent even in the present study. Hence, a sample estimate of a variance is more likely too small compared to the true value than too large. Using an estimated variance in a power analysis of a planned intervention study will therefore more often lead to too "optimistic" (small) predictions of the necessary study size than to too "pessimistic". The error may be considerable, as illustrated by the wide prediction intervals on variances in the present study. In order to account for variance estimation uncertainty in power analyses, some authors have suggested to use the 80 th percentile of the expected distribution of variance estimates as an input rather than the actual variance estimate [36]. A particular challenge appears if variance components per se are the exposure variables of interest, for instance in studies of exposure variation [6,7]. Variances are not normally distributed, and a conventional power analysis, which requires data to have this property, is not applicable. Developing power analysis procedures for studies addressing variance components is an interesting issue for further research.
The epidemiologic study, for which the data was originally sampled, attempted a random collection of subjects and working weeks [21,26,27]. The present data is therefore a likely representative sample of car mechanics' exposure to elevated upper arms. The original study also included measurements on house painters and machinists [21,26,27]. The car mechanics, spending on average 4.7% time with the right arm elevated above 90°, worked more with elevated arms than machinists (1.6% time >90°), but less than house painters (8.8% time >90°). The size of exposure variability in the three groups differed in a similar fashion; the car mechanics showed more variability than the machinists did, but less than the house painters did. The autocorrelation function for percentage time above 90°a t lag 1 was 0.31 for machinists and 0.46 for house painters, compared to an autocorrelation of 0.52 for the car mechanics [14]. This implies violation of the assumption of independence in the error term also for machinists and house painters. This leads us to believe that the principal effects of sampling strategy on variance component estimators shown in the present study are relevant also to data collections of other exposure variables and in other occupational groups. The magnitude of these effects, however, probably varies between variables and occupational groups, and our numerical results should therefore be applied outside the group of car mechanics only with great caution. Variability of upper arm elevation has been reported in the literature for other occupations [6,22,37], but the posture variables were different from the ones used in the present study and accuracy and precision of the reported variance components were not explored. Consistent with our findings, estimates of variance components were shown to be associated with considerable imprecision in previous studies on muscle activity during assembly work [3,4] and on posture and electromyography data from short-cycle manual handling [6].
The present study determined exposure variability between and within subjects using a random effects model, which assumes that effects are uncorrelated. Results showed that this assumption was violated since the car mechanics exhibited considerable autocorrelation between measurements within a working day. As demonstrated by David [38], the ordinary sample variance estimator, ∑ i x i − x ð Þ 2 = n−1 ð Þ, underestimates the population variance if observations are not independent and if the sample size is not large. An equivalent effect can be expected on the estimator for within-day variance. This is a likely explanation why within-day variance estimates were inaccurate for sampling strategies with larger block sizes, while they were not for strategies with block size 1, where observations will be (close to) unaffected by autocorrelation. Since variance components are partitions of the total (constant) variance present in the data, a negative bias in the within-day variance estimate propagates to the other variance components, in particular showing up as a positive bias in the between-days variance. When block size increases, the time span between the observations in the sample decreases. Hence, the sample will be more autocorrelated, which leads to a larger bias. We believe that increased autocorrelation explains the occasional larger bias of variance components estimated by strategies where a particular sampling time was distributed across four days rather than two. This will lead to a smaller sampling time per day andif the block size is largeto a more dominant effect of autocorrelation.
Violations of other assumptions of the random effects model (equation 1) than independence also occurred. Visual inspections of plots of predicted values of the random effects and their residuals suggested that the assumption of constant variance across subjects and days was violated in some cases. The residuals also had positively skewed distributions. Although the model in equation (1) makes no assumptions of the distributional form of the random effects, the REML estimators that we used to estimate the variance components assume normal distributions. However, REML estimators are identical to ANOVA estimators when data is balanced [39], and as ANOVA estimators are not based on distributional assumptions we do not consider this to be a problem. We did not transform the exposure data because we could not identify a transformation that improved the fit of the random effects model to any noticeable extent.
More complex random effects models are available that can model time dependence in the error term by incorporating correlation structures according to different time series models, like autoregressive models (AR) or moving average models (MA) [40]. A successful fit of such a model might result in unbiased measures of variability between and within subjects. However, for data sets as large as that used in the present study (23 subjects*4 days*480 minutes = 44160 observations), computations with very large variance and covariance matrices will be involved. Thus, it may not be possible to fit these models. Moreover, identifying a reasonable model of the time dependence of our posture variables is not a trivial matter.
A parametric model for the structure of the arm elevation data was not available so parametric bootstrapping [41] was not an option. While procedures for non-parametric bootstrapping for hierarchical data have mainly been discussed in the context of two-level data sets [29,[42][43][44], a recent paper by Ren et al. [45] addressed non-parametric bootstrapping for data sets with three levels or more. The paper concluded that units at the first level (here subjects) should be selected with replacement while units at the two lower levels (here days and quanta within days) should be selected without replacement; this was the procedure used in the present paper.

Conclusions
If exposure data are autocorrelated within days, which is probably a common case for biomechanical exposures in occupational settings, limited sampling may lead to inaccuracy in estimated variance components, in addition to a large imprecision. Applying a larger total sample size can improve both accuracy and precision, and further improvements can be obtained by distributing samples well across time.
Since inaccuracy and imprecision of variance component estimators is an issue in limited exposure sampling, occupational research and practice addressing variance components for descriptive, epidemiologic or intervention purposes may face the need for studies of a considerable sizelarger than commonly doneif this inaccuracy and imprecision is to be reduced to levels normally pursued in studies addressing mean exposures.
Our findings lead us to the following guidance for sampling strategies addressing variance components: A larger data sample will be required to reach a satisfying precision of variance components than when the purpose is to estimate a mean value with good precision. Distributing a within-day sample across the whole day and in small blocks will lead to variance component estimators with a better accuracy and precision than if the sample is collected in larger time blocks. Increasing the total within-day sample size increases the accuracy and precision of variance component estimators and reduces the adverse effects of sampling in larger blocks. Distributing a certain total sample size across more days may result in more precise but in some cases also less accurate variance component estimators than if the sample is distributed across fewer days.