Optimizing cost-efficiency in mean exposure assessment - cost functions reconsidered

Background Reliable exposure data is a vital concern in medical epidemiology and intervention studies. The present study addresses the needs of the medical researcher to spend monetary resources devoted to exposure assessment with an optimal cost-efficiency, i.e. obtain the best possible statistical performance at a specified budget. A few previous studies have suggested mathematical optimization procedures based on very simple cost models; this study extends the methodology to cover even non-linear cost scenarios. Methods Statistical performance, i.e. efficiency, was assessed in terms of the precision of an exposure mean value, as determined in a hierarchical, nested measurement model with three stages. Total costs were assessed using a corresponding three-stage cost model, allowing costs at each stage to vary non-linearly with the number of measurements according to a power function. Using these models, procedures for identifying the optimally cost-efficient allocation of measurements under a constrained budget were developed, and applied on 225 scenarios combining different sizes of unit costs, cost function exponents, and exposure variance components. Results Explicit mathematical rules for identifying optimal allocation could be developed when cost functions were linear, while non-linear cost functions implied that parts of or the entire optimization procedure had to be carried out using numerical methods. For many of the 225 scenarios, the optimal strategy consisted in measuring on only one occasion from each of as many subjects as allowed by the budget. Significant deviations from this principle occurred if costs for recruiting subjects were large compared to costs for setting up measurement occasions, and, at the same time, the between-subjects to within-subject variance ratio was small. In these cases, non-linearities had a profound influence on the optimal allocation and on the eventual size of the exposure data set. Conclusions The analysis procedures developed in the present study can be used for informed design of exposure assessment strategies, provided that data are available on exposure variability and the costs of collecting and processing data. The present shortage of empirical evidence on costs and appropriate cost functions however impedes general conclusions on optimal exposure measurement strategies in different epidemiologic scenarios.


Background
Reliable exposure assessment is a vital concern in medical epidemiology and intervention research. In occupational as well as public health studies, exposure is often monitored using equipment that allows data to be collected at a high resolution for long periods and on repeated occasions (e.g. [1][2][3][4]). A considerable emphasis has been put on developing and applying methods for analyzing sources of exposure variability in such data, in terms of so-called variance components [5][6][7][8]. As an example, variance components pertaining to, e.g. companies, occupations, subjects, days within subjects, and exposure samples within days have been determined for a large number of airborne, dermal, and biomechanical exposures in working life (e.g. [2,3,[9][10][11][12][13][14][15]). These variance components have been utilized as a remedy for identifying targets for surveillance, intervention and prevention [6,16,17], as well as for designing effective exposure assessment strategies producing information at a desired level of precision. While an extensive literature deals with the consequences of random exposure variability to bias and precision in exposure-outcome relationships [18][19][20][21][22], some attention has also been paid to the use of variance components for estimating sampling needs in studies examining compliance with exposure limits [6], and in studies comparing groups [12] or conditions [13] as in an intervention scenario. In the latter case, the requirement for reliable exposure data can be expressed as a need to obtain estimates of the mean exposure of individuals or groups with a sufficient precision to arrive at a confidence interval of acceptable size, or secure an acceptable statistical power in a specified hypothesis test. Generalized formulae are available for estimating statistical efficiency, i.e. the relationship between the precision of a mean exposure estimate, on the one hand, and, on the other, the size of relevant variance components, and the number of measurements at the corresponding sampling stages [23,24]. The most frequently applied measurement model is hierarchical and random with two or three nested stages, for instance subjects and days within subjects [2,25,26]; subjects, days within subjects and samples within days [12,27]; or groups, subjects within groups, and days within subjects [28]. A few attempts have been made to apply more complicated models, e.g. including crossed (non-nested) components related to the distribution of measurement days among subjects [29] or associated with methodological variance [11]. Also, mixed models including fixed determinants of exposure in addition to random effects are in increasing use [13,[30][31][32][33].
Some studies have been devoted particularly to understanding the effects on the precision of an estimated group mean exposure of allocating measurement efforts in different ways between and within subjects [12], between occupational recordings and data processing [11], or across time within a measurement day [34,35]. This had led to a number of principles for statistically efficient exposure assessment, i.e. measurement strategies that perform well at a specified investment of measurement resources, or, equivalently, yield a specified performance with comparatively small measurement efforts [12,34]. As one trivial conclusion, more data generally leads to better statistical performance, and furthermore, efficiency increases if measurements are allocated to higher sampling stages in the hierarchical model [23].
At the same time, more measurements inevitably imply larger monetary costs. While budget constraints are the pragmatic reality in most exposure assessments, surprisingly few studies have addressed the issue of how to design a measurement strategy so as to give the best possible statistical efficiency at the available monetary resources [36]. This endeavor is not equivalent to addressing statistical efficiency per se, as introduced above, since measurements at different stages may entail different costs. For instance, increasing the number of groups may be considerably more expensive than collecting data from more subjects in an existing group; and the process of identifying and approaching a new subject may be more expensive than achieving more measurements from a subject already in the sample population. Also, different measurement instruments yielding the same exposure variables may imply different costs, in particular if the risk of measurement failures is acknowledged [37]. Of the limited literature devoted to efficiency and cost in data collection, some studies compare a selection of measurement strategies in order to identify the one superior in cost-efficiency [38][39][40][41]. A few studies take on the more challenging task of determining the optimally cost-efficient strategy at a certain budget, on the basis of specified costs for collecting data at different stages, and specified sizes of the corresponding variance components. The general significance of examining cost-efficiency in data collection is illustrated by previous studies appearing in a variety of research areas, including occupational hygiene [38], environmental medicine [39,42,43], clinical chemistry [44], and nutrition [45].
Basically, optimization in the case of exposure assessment strives to identify data collection strategies at the frontier of possible relationships between cost and statistical efficiency (figure 1).

ance Varia
Possible, but less favorable s 2 * Frontier N t ibl Total cost c * Not possible Figure 1 The notion of optimal cost-efficiency. The horizontal axis illustrates the total cost associated with an exposure measurement strategy, and the vertical axis shows the variance of the resulting mean exposure. The frontier curve illustrates the minimal obtainable variance at each level of spending, i.e. the best possible statistical performance, e.g. s 2 μ *, at a particular total cost, e. g. c*. Strategies above the frontier are, in principle, possible, but do not yield an optimal performance. No strategies occur below the frontier.
Previous optimization studies have addressed hierarchical models with two [45][46][47] or three [43,44,47] stages, as well as the optimal allocation of measurements between two alternative yet correlated instruments for data collection [42,48,49]. All these studies have, however, assumed that the price of one measurement unit at each stage is constant, implying that costs increase in a linear fashion at that stage, proportionally to the number of samples. Only in an appendix of the paper by Duan and Mage [42], an empirical example appears of the quite likely case that costs may vary with the number of measurements; for instance that subjects recruited late in a study may require more time for being persuaded, and thus entail larger labor costs, than subjects signing up immediately. Also, in his textbook on sampling strategies, Cochran [47] reports some non-linear cost functions in other areas of data collection, and additional examples appear in Groves [50]. In addition, the cited cost-efficiency studies do not, in general, consider whether the identified optimal strategies are feasible under the constraints dictated by a specified, yet limited budget.
Thus, the present paper is devoted to deriving methods for optimizing exposure assessment strategies, in terms of offering the best possible trade-off between total costs and statistical efficiency. In contrast to previous literature, this study explores optimal cost-efficiency even when cost functions are not linear and budget constraints apply, and the study also identifies alternative optimization procedures in those cases where analytical closed-form solutions cannot be developed.
First, the paper presents a general theoretical model of cost and efficiency when assessing exposure mean values in occupational groups, including some theoretical results based on that model. Then, the general model is simplified, and procedures are derived for identifying optimally cost-efficient exposure assessment strategies, depending on the shapes of cost functions. These results are illustrated by numerical examples. A general discussion on the representativeness and sensitivity of the suggested optimization procedures concludes the paper.

Methods
A framework for cost-efficient exposure assessment Exploring cost-efficiency at an ordinal level only requires a specification of the properties of the mathematical function associating each exposure assessment strategy with its stated statistical objective. If, however, the goal of the cost-efficiency analysis is to compare or optimize strategies in explicit, quantitative terms, specific functional forms need be identified that parameterize objectives and costs. This is a necessary requirement when aiming at the (occasionally more than one) strategy that maximizes efficiency among the large selection of possible assessment strategies entailing a particular cost.
Thus, three major issues must be considered as part of a quantitative analysis of cost-efficient resource consumption: (1) why resources are used, i.e. the objective of collecting data, (2) how much resources are required to fulfil the objective, expressed in terms of unit-costs, and (3) whether the intended strategy for resource consumption is feasible. When examining cost-efficient assessments of group mean exposure we thus need to know (1) the relationship between the group mean and the assessment strategy, as reflected by what is usually referred to as the objective function, (2) the amount of monetary resources required to realise a particular assessment strategy, expressed by the cost function, and (3) the amount of monetary resources at our disposal, as reflected by the budget constraint.

The objective function -precision of the mean
For a hierarchical three-stage balanced data set (subjects, occasions within subject, samples within occasion), the group mean exposure, μ, can be estimated using a "mean of means" approach [23] as: Where x k(ij) is an individual exposure sample, collected from subject i on occasion j; n s is the number of subjects included in the data set; n d is the number of distinct measurement occasions, for instance days, per subject; and n q is the number of samples, or quanta, per measurement occasion. Accordingly, averaging is made across quanta within each occasion, then across occasions within each subject, and finally across subjects.
A general formula for determining the variance of this group mean exposure estimate, s 2 μ , has been proposed and applied by several authors [12,23,44,47]. This objective function takes the form: s 2 BS , s 2 BD , and s 2 WD are the variances between subjects, between measurement occasions within each subject, and between quanta within occasions, respectively. The size of a quantum can be defined as convenient, and previous studies have used quanta of, for instance, one minute [34,51], one work cycle [11,13,52,53], several consecutive work cycles [12,54], and one hour [55]. Thus, equation (1) gives an estimate of the precision of a group mean exposure resulting from a particular measurement strategy in terms of subjects, occasions and quanta, in a setting with known components of exposure variability.

The cost function
While all cost functions suggested in the literature have been linear, the cost associated with collecting n q quanta on each of n d occasions for each of n s subjects can be assessed even in a non-linear case, provided that information is available on the "capability" to recruit subjects, that is, the amount of resources needed for recruiting any specific number of subjects, and the equivalent capabilities for setting up measurement occasions within each subject and collecting quanta within each occasion.
Assume first that these three capabilities are all homogeneous of degree k, in the sense that if all resources are multiplied by a certain factor, x (x > 1), output will increase by x k . This is a common assumption in economics addressing non-linear production capabilities. For example, if k = 1 and resources allocated to the process of recruiting subjects are doubled, then the number of subjects recruited will also double; this is simple proportional linearity. In the case of k = 0.5, doubled recruitment resources would lead to an increase in the number of recruited subjects by a factor 2 1/2 = √ 2 . Assume further that the resources needed for setting up n d measurement occasions, each containing n q quanta, do not depend on the subject from whom data are collected, and the resources needed to collect n q quanta on a particular measurement occasion for a particular subject are independent of occasion and subject.
The first of these two assumed capability properties allows cost functions for recruiting subjects, c s , setting up measurement occasions within each subject, c d , and collecting measurement quanta within each occasion, c q , to be expressed as: c s (n s ) = π s · n α s ; c d (n d ) = π d · n β d ; and c q (n q ) = π q · n γ q , where the π-values are the costs for obtaining one measurement unit at each stage of data collection, socalled unit costs, and a, b and g are parameters, all larger than 0, describing the shape of a power relationship between the number of measurement units and costs.
The relationship between the value(s) of π and the exponents a, b and g can be illustrated by examining the cost functions. If, for instance, a = 1, the cost of recruiting n s subjects is c s (n s ) = π s ⋅n s , i.e. the cost increases in direct proportion to the number of subjects. In this case, π s is the one-unit cost (c s (1) = π s ), as well as the marginal cost of recruiting any additional subject (∂c s /∂n s = π s ). If α ≠ 1, π s is still the one-unit cost, but the marginal cost is now ∂c s ∂n s = π s · α · n α−1 s . Thus, if a > 1, the marginal cost of including an additional subject increases with the number of subjects, while it decreases when 0 <a < 1.
The second capability property assumed above implies that the total cost of collecting a data set including n s subjects each observed for n d occasions, each containing n q quanta can be stated as c s (n s ) + n s c d (n d ) + n s n d c q (n q ), which equals: c(n s , n d , n q ) = π s · n α s + π d · n s · n β d + π q · n s · n d · n γ q (2) This cost function presents a generalisation of previously suggested linear cost functions [43,44,46] by permitting both linear and non-linear relationships between the sample size at different stages of data collection and the cost of obtaining data. With (a, b, g) = (1,1,1), equation (2) takes the customary linear form used in previous studies. Notably, equation (2) only expresses the variable costs associated with measurement; possible fixed costs, which do not depend on the number of samples, need to be added to give the total cost of collecting the data set, but will not affect the optimization procedures developed below [41,43].

The general optimization problem
If a data collection is allowed to consume a total budget R (after possible reduction by fixed costs), combinations of n s , n d and n q that optimize the output, i.e. minimize the resulting variance of the estimated mean exposure, can be retrieved by solving the following optimization problem: with respect to n s , n d , n q ; subject to the constraint: c(n s , n d , n q ) = π s · n α s + π d · n s · n β d + π q · n s · n d · n γ q ≤ R, n s ≥ 1; n d ≥ 1; n q ≥ 1.
Due to the non-linear property of this three-variable equation system, explicit solutions for optimization can be derived only in exceptional cases. Moreover, solutions to a three-variable problem are difficult to illustrate graphically. Therefore, the following analysis will be limited to cases in which the number of quanta, n q , within each measurement occasion is not a choice variable. This situation occurs for instance when exposure is assessed for complete days, or when the within-day schedule of data sampling cannot or should not be manipulated for reasons of logistics or feasibility.

The two-variable reduction
Given a predetermined number of sampled quanta within each measurement occasion, the general optimization problem above is reduced to the two-variable problem of identifying optimal values of n s and n d . This allows graphical illustrations of the problem and its solutions. It also opens for further simplification into one-variable optimisation problems, which in many cases can be solved explicitly, as shown in the results section.
The two-variable problem takes the form: with respect to n s , n d ; subject to the constraint: In these equations, the terms s 2 μWD = s 2 WD /n q and c q = π q · n γ q have been substituted into the three-variable expressions of mean exposure variance (equation (1)) and cost (equation (2)), respectively. This notation emphasizes that the specific variance of an exposure estimate obtained at one measurement occasion, s 2 μWD , and the cost of collecting data within each occasion, c q , are no longer allowed to vary.
In principle, the two-variable problem can be solved by applying constrained optimization techniques, i.e. by employing the problem's Lagrange function (e.g. [56]). As an alternative, the budget constraint, equation (4), can be substituted into the objective function, equation (3), so as to get a new objective function, which expresses the variance s 2 μ (n s , n d ) as a function of only one variable, be it either n s or n d . This approach relies on the prerequisite that any solution to the optimization problem entails that the entire budget R is consumed. In that case, the budget constraint (equation (4)) can be replaced by an equality: Isolating n s or n d from equation (4a), followed by substitution into equation (3), yields a one-variable objective function, s 2 μ (n i ), with i = s or i = d. This function can be examined using standard methodologies for identifying and illustrating possible local minima within a specified choice set. The resulting optimal value of either n s or n d can then be entered into the budget constraint to get the optimal value of the other variable.

The one-variable substitution approach
The core challenge in the substitution approach outlined in the previous section is to identify that exposure assessment strategy in the choice set defined by the budget constraint for which the objective function, i.e. equation (3) with substituted n s or n d , has its minimal value. This can, in principle, be accomplished by determining the derivative of the objective function and finding its roots. Figure 2 illustrates four principally different cases of how the objective variance function may look as a function of invested resources. At the lower boundary of the choice set, all resources are spent on one unit of n i , and at the upper boundary on as many n i as allowed by the budget, n i,max . Thus, if i = s, these two boundaries correspond to allocating as many measurement occasions as possible to one subject, and obtaining measurements at one occasion from as many subjects as possible.
As a general procedure, the optimal n i for a given budget can be found by comparing the performance obtained: (1) at the lower boundary of the choice set, i.e. using n i = 1, (2) at the upper boundary of the choice set, i.e. with n i = n i,max , and (3) entering values of n i , if any, in the interior of the choice set, 1 ≤ n i ≤ n i,max , for which s 2 μ (n i ) = 0. Thus, examining the properties of the objective function, s 2 μ (n i ), at the boundaries of the choice set is an appropriate first step for identifying the optimal allocation of resources. Provided that the objective function has one unique minimum, i.e. that the objective function is convex (I1, E1 and E2 in figure 2), a necessary, but also sufficient, condition for the optimum to be internal Principally different cases of local extremes of the one-variable objective variance function. The boundaries of possible resource investment, i.e. the choice set, are given by n i = 1 and n i = n i,max . In I1, the variance function has a local minimum at n i = n i *; this is an interior optimal solution with minimal variance. For I2, the variance function also has an interior zero derivative, at n i = n i°, but this solution maximizes the variance and is therefore not useful. In cases E1 and E2, the local extreme of the variance function lies below and above the choice set, respectively. In these cases, minimal variance is obtained at the lower (E1) and upper (E2) choice set boundary.
(case I1) is that s 2 μ (n i = 1) < 0 and s 2 μ (n i = n i,max ) > 0. The exact location of the internal minimum can then be retrieved in a second step. The basic shape of the objective function can be determined by examining its second-order derivative. If this derivative is positive, the function is convex; if not it is concave (case I2), and the optimal strategy will be at one of the choice set boundaries.
If a convex objective function does not have an internal minimum, as in cases E1 and E2 in figure 2, the optimal strategy is represented by the boundary of the choice set. In case E1, which occurs if s 2 μ (n i = 1) > 0, the optimal strategy is to set n i = 1, that is, collect data from only one subject (if i = s), or having only one measurement occasion per subject (if i = d). Case E2 is characterized by a decreasing objective function at n i = n max , i.e.s 2 μ (n i = n i,max ) < 0. In this case, if i = s, the best choice will be to measure as many subjects as possible and hence only one occasion per subject, or, if i = d, to collect data for as many occasions as possible from only one subject.

Results
Below, procedures for determining optimal sampling strategies are developed using the one-variable substitution approach described above. Procedures will be stratified according to the sizes of a and b, which determine the shape of the cost function (equation (4a)), and hence the form of the substituted objective function,s 2 μ (n i ). For each combination of a and b, the objective function is examined, and the boundaries of the choice set determined. Procedures for determining whether the objective function is convex (cases I1, E1 and E2 in figure 2) or concave (case I2) are described where needed. For convex functions, explicit rules are, if possible, developed for when (case I1) and when not (cases E1, E2) the optimal measurement allocation occurs within the choice set. Finally, procedures for identifying an optimal sampling strategy inside the choice set (case I1) are described.
In this case, the marginal costs of including another subject or measurement occasion are both independent of the number of previously included subjects and occasions. Thus, the cost function is linear at both of these stages.

Case A; substitution and objective function
With a = b = 1, the budget constraint (equation 4a) can be expressed as: Substituting this expression for n d in equation (3) gives the corresponding objective function: Taking the derivative with respect to n s yields: Setting A = (s 2 BD + s 2 μWD ) · π s · (π d + c q ), equation (7) can be expressed as: (7a) This one-variable objective function is convex in n s , since the derivative of equation (7a) is positive for all n s in the choice set.

Case A; boundaries of the choice set
With a = b = 1, the choice set boundaries in terms of n s are n s = 1 and n s = n s,max = R π s + π d + c q ; the latter obtained by setting n d = 1 in the budget constraint, equation (4a), and solving for n s . At n s = 1, equation (7a) takes the form: Thus, a positive derivative at n s = 1 occurs when: This gives a necessary and sufficient condition that the optimal allocation of measurements is obtained with n s = 1, and hence with n d = n d,max = R − π s (π d + c q ) measurement occasions per subject.
At the other boundary, n s = n s,max = R π s + π d + c q , the derivative of the objective function is: This derivative is negative only when the first term in the numerator is negative, i.e.A − s 2 BS · (π d + c q ) 2 < 0, or rearranged: This is the necessary and sufficient condition for the optimal allocation being to choose the maximal affordable number of subjects, n s,max = R π s + π d + c q , and measure on one occasion for each of these. Notably, condition (9) is independent of the budget R. Also, unless s 2 BS is zero, the condition is always valid if π s = 0, that is if the recruitment of subjects does not lead to any costs. Under case A, this implies that all measurement occasions entail the same cost, π d +c q , irrespective of how they are allocated between subjects. Thus, in this highly simplified case [38,39], the optimal strategy is always to measure on one occasion from each of as many subjects as allowed by the budget.

Case A; optimization inside the choice set
Setting the derivative of the variance function (7a) equal to zero yields: If this optimal value of n s is an interior solution, i.
, the corresponding number of measurement occasions per subject can be obtained by substitution of equation (10) into equation (4a): Thus, in this case the optimal number of measurement occasions per subject does not depend on the budget R.
The explicit solution derived above for the optimal set (n s , n d ) can lead to non-integer values of one or both numbers. Since both are, by nature, discrete, a post-hoc procedure may be necessary in which integer sets of (n s , n d ) close to the mathematically derived solution are entered into the budget constraint (equation (4)) to check that they are affordable, and into the objective function (equation (3)) to evaluate their statistical performance. For instance, if an interior n s determined by equation (10) is not an integer, the nearest larger and smaller integers are identified, and for each of those, at least two associated integer values of n d are determined that are larger and smaller than the value of n d derived by equation (11). The resulting affordable sets of (n s , n d ) are then examined to identify the one resulting in the smallest mean exposure variance. Table 1 summarizes the derived procedures for optimizing cost efficiency in case A, together with procedures for the other cases, as derived below.
Case B entails constant marginal costs in the recruitment of new subjects but either increasing or decreasing marginal costs for organizing measurement occasions.

Case B; substitution and objective function
In case B, the one-variable problem is most easily solved if the objective function is rearranged so that n s is expressed as a function of n d . From the budget constraint, equation (4a), n s is isolated as: The corresponding objective function is: And its derivative: The objective function (equation (13)) is always convex for b ≥ 2. For 1 <b < 2 it is convex if If none of these inequalities are fulfilled, the optimal measurement strategy will correspond to one of the choice set boundaries.

Case B; boundaries of the choice set
The choice set boundaries in terms of n d are n d = 1 and n d = n d,max . The latter is found by setting n s = 1 in the budget constraint, equation (4a), and rearrange  (4), at n s = 1, that is: n d,max is then the largest value of n d for which c(1, n d ) ≤ R. Figure 3 illustrates an example of this procedure, for three different combinations of (π s , π d , c q ) and two different levels of b, which will reappear in the collection of numerical examples.
At n d = 1, the derivative of the objective function, i.e. equation (14), is equal to: which is positive under the following condition: Thus, for parameter sets obeying this inequality, the optimal sample allocation is to measure for one occasion on each of n s,max = R π s + π d + c q subjects.
At the other boundary, n d = n d,max , the sign of the derivative of the objective function must be obtained by entering the numerically determined value of n d,max in equation (14). A negative s 2 μ (n d,max ) is then a necessary and sufficient condition for the optimal measurement strategy to be to choose one subject and measure record from that subject on n d,max occasions.

Case B; optimization inside the choice set
The objective function, equation (13), cannot be minimized using analytical methods, since s 2 μ (n d ) = 0 (cf. equation (14)) does not have a closed-form solution. Thus, a possible interior optimum must be located by entering all values of n d in the interval [1, n d,max ] into the objective function and locate the minimal result. The corresponding optimal value of n s can be found by entering the identified optimal value of n d in equation (12).
In case C, all measurement occasions for a particular subject can be organized at the same cost, while the cost of recruiting additional subjects changes with their numbers.

Case C; substitution and objective function
In case C, the one-variable problem is most easily solved if the objective function is rearranged to express n d as a function of n s . Isolating n d in the budget constraint, equation (4a), gives And hence the objective variance function in terms of n s is: Taking the derivate with respect to n s yields: Setting A = (s 2 BD + s 2 μWD ) · π s · (π d + c q ), this can be expressed as: It is straightforward to verify that this function is convex in n s and, hence, has one unique minimum.

Case C; boundaries of the choice set
The choice set boundaries in this case are n s = 1 and n s = n s,max . The latter is found by setting n d = 1 in the budget constraint, equation (4a), and solving for n s . This leads to the equation: n α s,max + which does not have a closed-form solution. Thus, similar to the determination of n d,max in case B above, n s,max must be determined by entering increasing values of n s in the cost function c(n s , 1) = π s · n α s + (π d + c q ) · n s until reaching the largest value of n s for which c(n s , 1) ≤ R.
At the boundary n s = 1, the derivative of the objective function, c.f. equation 18a, is: A necessary and sufficient condition for choosing n s = 1, and hence n d = R − π s π d + c q (cf. equation (16); if necessary truncated to the nearest smaller integer), is derived by rearranging the inequality s 2 μ (1) > 0 to give: At the other boundary, n s,max , the sign of the derivative of the objective function must be determined numerically by entering the n s,max identified above into equation (18a). If the sign is negative, n s,max is the optimal number of subjects, and each should be recorded for one occasion.

Case C; optimization inside the choice set
In case C, the equation s 2 μ (n s ) = 0 (cf. equation (18a)) has no closed-form solution. Thus, an interior solution to the optimization must be identified by entering all n s in the interval [1, n s,max ] into the objective function, i.e. equation (17), and locate the minimal variance. After having identified the optimal n s , the corresponding n d can be found by solving equation (16).

Case D: a≠1, b≠1
In case D, neither n s nor n d can be expressed as a function of the other on basis of the budget constraint. Thus, a one-variable problem cannot be formulated in explicit terms, and, consequently, no analytical expressions can be developed, neither for the derivative of the objective function, nor for boundary conditions, nor for possible interior solutions. Therefore, the optimal choice of the number of subjects and measurement occasions has to be identified by means of a numerical procedure, such as the following: (1) For n s = 1, the cost function, equation (4), is c(1, n d ) = π s + π d · n β d + n d · c q . In this function, increasing n d -values are entered, up to largest possible value, n d,max , for which c(1, n d ) ≤ R; (2) The values (n s , n d ) = (1, n d,max1 ) are entered into the objective function, equation (3) These two steps are repeated for n s = 2, corresponding to the cost functionc(2, n d ) = π s · 2 α + 2 · (π d · n β d + n d · c q ), thus obtaining the value of s 2 μ (2, n d,max 2 ) (4) Subsequent values of s 2 μ (n s , n d,max ns ) are derived using this same procedure for stepwise increasing n s , until reaching the largest possible n s allowed by the budget.
(5) By inspecting the set of values of s 2 μ (n s , n d,max ns ), which all entail costs as close as possible to the budget constraint R, the combination of n s and n d offering the smallest variance can be identified. Figure 4 illustrates the numerical procedure for identifying the maximal possible value of n d at increasing values of n s , and the resulting variance of the exposure mean. Since the values of n s and n d are discrete, and hence even the corresponding total cost c(n s , n d ), it may happen that the optimal measurement strategy does not consume the entire budget R. For instance, the optimal strategy (n s , n d ) = (5, 12) identified in figure 4 only utilizes 98.3% of the allowed resources.

Numerical examples
Using the procedures developed above, optimal sampling strategies were identified for 225 scenarios representing different combinations of costs and variance components, and different marginal costs of recruiting new subjects and organizing more measurement occasions, as expressed through a and b (table 2). Unit costs π s , π d and c q were selected to illustrate large, medium and small costs of recruiting subjects relative to obtaining measurements on each of them, and the sets of variance components s 2 BS , s 2 BD and s 2 μWD represent large, medium and small between-subjects to within-subject variance ratios. Parameter values were chosen so that the total cost of assessing the exposure of one subject at one occasion (cf. equation (4)) as well as the resulting mean exposure variance (cf. equation (3)) takes the same numerical value (22) in all scenarios. In all scenarios, the budget R was constrained at 500 (arbitrary units). In median, the 225 strategies utilized 97.9% of the allowed budget (5 th -95 th percentile range: 92.3% to 100.0%).
As illustrated in table 2, the optimally cost-efficient strategy in many scenarios is to obtain data on one occasion from as many subjects as possible. In particular, this applies when s 2 BS is "large" relative to s 2 BD and s 2 μWD (table 2c), and even when s 2 BS is similar to (s 2 BD +s 2 μWD ) if π s is also equal to or smaller than (π d +c q ) (table 2b). In these cases, the principle of measuring from as many subjects as possible is valid irrespective of whether cost functions are linear or not, i.e. irrespective of the sizes of a and b.
Considerable deviations from the principle of collecting data from as many subjects as possible do, however, occur; the most extreme examples appearing when s 2 BS is "small" relative to s 2 BD and s 2 μWD and π s is "large" compared to (π d +c q ) and a is "large" (bottom right corner of table 2a). The combination of a "small" variance between subjects and "large" costs associated with recruiting subjects also leads to the optimal sampling strategy being particularly sensitive to non-linearities in costs. Thus, with (s 2 BS , s 2 BD , s 2 μWD ) = (2, 10, 10) and (π s , π d , c q ) = (20, 1, 1), a linear cost function implies an optimal sampling strategy of (n s , n d ) = (13,9) (table 2a), while the deviations of a and b from 1 illustrated in table 2 result in optimal strategies (n s , n d ) ranging from (5,12) to (49,5), and corresponding variances s 2 μ between 0.12 and 0.73. In contrast, with (s 2 BS , s 2 BD , s 2 μWD ) = (20, 1, 1) and (π s , π d , c q ) = (2, 10, 10) (table  2c), the most extreme non-linear cost functions lead to sampling strategies, (n s , n d ) = (24, 1) and (n s , n d ) = (17, 1), which do not deviate much from the optimal strategy in the linear case, (n s , n d ) = (22,1), and only result in moderate differences in variance.
While not illustrated in table 2, a larger total budget leads to a wider occurrence of the optimal strategy being to collect data on one occasion per subject. Thus, with a budget of 500, 135 of the 225 scenarios illustrated in table 2 imply that data should be collected according to this principle; if the budget is increased to 1000, this count increases to 139. However, in 3 cases the optimal strategy changes in the opposite direction, i. e. into collecting data on more than one occasion per subject. This was caused by irregularities due to the effect of n s and n d needing to be integers. With a decreasing budget, one-occasion-per-subject optima get rarer, as expected, but irregularities occur more often.
Even if non-linearities in cost functions may not affect the principle of how to allocate measurements at many combinations of unit costs and variance components, the size of a is always important to the eventual size of the data set, and therefore to the precision of the eventual mean exposure estimate. In contrast, the size of b is only important if the optimal strategy implies, or is close to implying, measurements from more than one occasion per subject, that is when s 2 BS is "small" relative to s 2 BD and s 2 μWD (table 2a), but even when s 2 BS is similar to (s 2 BD +s 2 μWD ) if π s is also larger than (π d +c q ) (table  2b). This is an expected result, since the cost of setting up measurement occasions is independent of b at n d = 1 (cf. equation (4)). Thus, when analyzing whether an intended exposure assessment strategy, constrained by budgets, will lead to a sufficient statistical performance, access to a valid estimate of a is generally more important than knowing the exact size of b.
While the size of b is not always important to size of the optimal data set, the best statistical performance at any specific combination of (s 2 BS , s 2 BD , s 2 μWD ) and (π s , π d , c q ) will always be obtained with small sizes of a and b; in table 2 exemplified by (a, b) = (0.50, 0.50). This is a reasonable result, since small a and b entail small marginal costs of including more subjects and more measurement occasions.
Although not illustrated in table 2, the effects on statistical performance of deviating from the optimal choice of (n s , n d ), but still using the entire budget, were also investigated. In certain cases, deviations did not lead to any particular reduction of performance.  Table 2 Optimal sampling strategies (n s , n d ) and the resulting mean exposure variance s 2 μ (cf. equation (3)) at different combinations of variance components (s 2 BS ,s 2 BD ,s 2 μWD ; sections a-c), unit costs (π s , π d , c q ), and exponents α and β describing the shape of the relationship between costs and number of measurements (cf. equation (4) strategy is to choose (n s , n d ) = (21,9), resulting in a variance of 0.20 (cf . table 2a). However, all strategies with n s in the range between 15 and 32, and corresponding values of n d,maxns ranging from 15 to 4 as allowed by the budget, resulted in variances of 0.22 or less, except for the strategy (30, 4) which gave a variance of 0.23 because it only managed to utilize 92% of the available budget. In other cases, performance was more sensitive to non-optimal choices of (n s , n d ). Again using (s 2 BS , s 2 BD , s 2 μWD ) = (2, 10, 10) and (a, b) = (0.75, 0.75), the optimal strategy with (π s , π d , c q ) = (2, 10, 10) is now (n s , n d ) = (13, 2), resulting in a variance of 0.92 (table 2a). In this case, all strategies allowed by the budget besides the nearest neighbour, (n s , n d ) = (12, 2), gave variances of 1.09 or more, i.e. at least 18% larger than the optimum.

Discussion
As illustrated by the numerical examples in table 2, a large ratio of between-subjects to within-subject variance generally implies that the optimal allocation principle is to collect data on one occasion from as many subjects as allowed by the budget. This also applies when between-subjects and within-subject variances are of similar size, unless the unit cost of recruiting subjects is large relative to that of setting up measurement occasions. In these cases, non-linearity in the cost functions does not influence the optimal allocation principle; only the eventual size of the data set allowed by budgets. However, at a large relative recruitment cost combined with a small between-subjects to within-subject variance ratio, and in particular if the total budget is also small, the optimal sampling strategy may consist in approaching only a few subjects on several occasions each, and the strategy is very sensitive to non-linearities in cost functions. Non-linearities in subject recruitment costs always have a clear influence on the size of the optimal data set, while non-linearities in costs for setting up measurement occasions are important only in cases when the optimal strategy includes multiple measurements per subject.

Statistical model
The present study investigated a hierarchical, nested measurement model with three stages as used in a majority of previous studies of the effects of random measurement error on statistical properties and efficiency in exposure assessment (e.g. [2,12,[26][27][28]). Even though the application exemplified in the paper refers to subjects, measurement occasions within subjects, and measurement units within occasions, the generic results are applicable also to other sources of exposure variability that can be described by a hierarchical model. This includes the case of data processing and analysis adding "post-sampling" costs and also some methodological variance to each collected exposure sample, thus modifying the sizes of c q (equation (4)) and s 2 μWD (equation (3)), respectively. Also, the present study addressed, as most other studies, the case of balanced data sampling, i.e. that the same number of measurement units are collected during each of the same number of occasions from each subject [23]. While the assumption of a balanced, hierarchical model facilitates mathematical derivation of optimal measurement strategies, cost-efficiency needs to be investigated even for more complicated models, for instance designs including crossed components [11,29]. In particular, the effects of unbalancedness, which is probably a very frequent incident in epidemiologic research, need to be addressed in further studies. Unbalancedness has been shown both mathematically [23,57] and empirically [58] to reduce statistical efficiency, and will thus also influence costefficiency.
During the last decade, powerful statistical techniques have been developed to analyse exposure variability and its determinants using so-called mixed-effect modelling [30][31][32][33]59]. While mixed model analyses have Table 2 Optimal sampling strategies (n s , n d ) and the resulting mean exposure variance s 2 μ (cf. equation (3)) at different combinations of variance components (s 2 BS ,s 2 BD ,s 2 μWD ; sections a-c), unit costs (π s , π d , c q ), and exponents α and β describing the shape of the relationship between costs and number of measurements (cf. equation (4) In all cases, the available budget R was set at 500. The determination of the boldface strategy in section a was illustrated in figure 4.
predominantly been used to identify exposure targets for effective prevention and intervention, they also represent a challenging opportunity to develop exposure assessment strategies that are both "cheap" and statistically efficient.
As an example, several occupational studies have proposed or implemented the idea of estimating full-shift job exposures by combining observed or self-reported time proportions of tasks in the job with task exposures from a data base [60][61][62][63][64][65]. In some studies, the task-based estimates appeared easy to obtain and, at the same time, well correlated with "true" job exposures (e.g. [66]), while other studies indicate that task-based procedures can also be grossly inefficient [64,65]. Some attention has been given to developing mathematical principles for assessing the statistical performance of task-based exposure modelling [34,67], but no studies have so-far, to our knowledge, addressed if task-based assessment can, indeed, be costefficient as compared to direct measurement of job exposures, and if so, on which conditions. A similar concern can be raised with respect to other techniques for combining exposure information from different sources into a "hybrid" estimate of some exposure metric [68]. The approach can be statistically informative [68], but might also entail costs to the extent that the trade-off between efficiency and resource consumption is disadvantageous as compared to measuring "true" exposures directly.

Statistical performance criterion
The present study addressed the objective of obtaining a precise estimate of the exposure mean value in a group of subjects (cf. equation (3)), the reason being that precision of the mean is a decisive factor for the usefulness of exposure surveys, and for statistical power in studies comparing conditions and groups. Other measures of statistical performance will, however, be of interest in other types of epidemiologic research, and thus need attention in future cost-efficiency research. A particularly important example is the size of bias and/or precision in a regression of outcome on exposure [19][20][21][22].
Since both bias and precision can, under a number of assumptions, be expressed as mathematical functions of variance components and the number of measurements [18], it might be possible to develop closed-form solutions to the problem of finding optimally cost-efficient measurement strategies, but this has not so-far been pursued. Another example that an exposure assessment strategy may have another purpose than producing a satisfying group exposure mean is standard surveillance of compliance with occupational exposure limits (OEL). First, the assessment focuses on individuals rather than groups, and second, the strategy needs assure that both the individual mean and the probability that single exposure values exceed the OEL is determined with a satisfying certainty [16,17]. Still another relevant measure of statistical performance for several purposes is the size of the standard reliability coefficient (ICC), i.e. the relationship between exposure variability in data sets with and without (random) measurement error [41]. Obviously, both for regression metrics, exceedance, and ICCs, optimally cost-efficient exposure assessment strategies may deviate from those driven by the objective of obtaining precise exposure means, as illustrated by two studies on optimal measurement allocation in reliability studies [69,70].
A particularly challenging situation comes up if the exposure assessment strategy has two simultaneous, yet conflicting objectives. For instance, the researcher may, at the same time, wish to get a precise estimate of a group mean exposure, but also a good estimate of exposure variance components between and within workers. This is a likely scenario if the specific exposure variability of the addressed occupational group is a priori insufficiently known, and the exposure data collection is viewed as an opportunity to get updated data on this variability, together with a documentation of the group mean exposure. Determination of variance components requires, as a minimum, duplicate samples at each stage of the measurement model [5], and this may often not be an optimally cost-efficient strategy if the objective is to get a precise group mean (cf. table 2a-c; cases with n d = 1). Thus, the researcher faces the decision of whether a certain loss in information on the group mean is an acceptable "price" of getting some information on exposure variability. While the numerical tradeoff between these two types of information, conditional on a restricted budget, may be resolved in future research, the final decision of which sampling allocation to prefer is an issue beyond mathematical procedures.

Recruitment capabilities and cost functions
While presenting a novel approach in allowing recruitment capabilities and, as a consequence, the corresponding cost functions to be non-linear, the present study only addressed the case when non-linearities can be expressed using homogeneous functions. This type of non-linear production capabilities is often assumed in economics research, but other types of mathematical relationships may, obviously, be appropriate. Even cost functions that do not follow monotonous mathematical rules may apply, as illustrated by the example in Duan and Mage [42], where the basic shape of the cost function changes with the number of measurements, and by some examples in Cochran's excellent textbook [47]. We claim a strong need to bring forward more empirical evidence to suggest the appropriate shape of cost functions in exposure assessment; and if power relationships are, indeed, supported, to indicate reasonable sizes of the exponents a and b. Hypothetically, the recruitment of subjects could entail increasing marginal costs (a>1), as if additional time has to be devoted to persuading initially reluctant participants, but also decreasing costs (a<1), as if the first subjects are hard to recruit but their skeptic colleagues, taking after them, will then readily participate. Also, both increasing and decreasing marginal costs for organizing measurement occasions can be envisaged, as if a measurement equipment wears down over time and needs to be in place longer to provide a certain amount of data (b >1), or if a subject gets more and more accustomed to measurement preparations and thus less time consuming (b <1). As a tentative conjecture, however, considerable deviations of a from 1 are more likely to occur than deviations of b. In addition to the need for empirical data describing the shape of cost functions, information is also required concerning the size of unit costs for measuring at different stages; very little data has been reported in occupational or environmental epidemiology [37,43]. This stands in a striking contrast to the abundance of data on variance components for a multitude of occupational and environmental exposures, showing that the size of and relationship between exposure variabilities at different stages of measurement, e.g. subjects and occasions within subjects, differ widely between settings and exposure agents [3,[9][10][11]25,71,72].
In the present study, optimization procedures were developed using a total cost model including only variable cost components (equation (4)). Other studies have addressed even fixed costs, i.e. costs that do not depend on the number of measurements [41,43]. While fixed costs are, under a constrained budget, decisive to the resources left for allocating measurements, they cancel out in the course of the mathematical differentiation associated with the optimization procedure, and thus will not affect the eventual optimal allocation strategy [43]. It is, however, important to notice that the optimization procedures in the present paper all refer to budgets where possible fixed costs have already been accounted for.

Analytical vs. numerical optimization
A complete closed-form mathematical solution to costefficiency optimization was possible only when cost functions were linear, i.e. (a, b) = (1, 1), and in this case the allocation algorithms were consistent with previous studies [43,44,46,47]. When either a or b deviated from 1, neither the choice set boundaries nor an internal optimum could be explicitly determined, and if both deviated together, all optimization steps had to be performed using numerical methods. This suggests that explicit, formal expressions defining cost-efficient measurement allocations may only be obtainable if both cost functions and expressions of statistical performance are mathematically very simple. Thus, numerical optimization procedures might be the only alternative if, for instance, the objective (in casu variance) function contains not only nested components [11,29], or if the cost model does not express a straight-forward relationship with the number of measurements [42]. This points to the idea of basing all optimization on numerical methods and ignore explicit solutions even in those cases where they do exist. However, we believe that mathematical expressions as developed in this paper may still be helpful as a screening tool for deciding whether the optimal strategy needs further (numerical) consideration, or whether it is merely situated at the boundary of the choice set, as in those frequent cases where as many subjects as possible should be measured on one occasion each (cf. table 2).

The basic cost model
One important result of the present investigation was that for many combinations of unit costs and variance components, non-linear cost functions did not change the general principle stated by a linear model: to measure from as many subjects as possible on one occasion each (cf. table 2). Thus, under these particular circumstances, the principle of how to optimize exposure assessment was not sensitive to the cost model, even if the eventual size of the data set allowed by budget constraints was influenced by non-linearities in subject recruitment costs. At other combinations of variance components and unit costs, in particular when betweensubject variability was small compared to within-subject variability and subject recruitment costs at the same time were large compared to costs for setting up measurement occasions, non-linearities did, however, strongly affect both the optimal allocation principle and the eventual statistical performance. While, as mentioned above, examples of small between-to within-subject ratios of variance are abundant in the literature, relative sizes of unit costs are largely unknown, and thus we do not consider it justified so-far to form an opinion on the actual occurrence of such sensitive scenarios.

Uncertainties in input parameters
The procedures developed in the present study for identifying optimal exposure assessment strategies, whether analytical or numerical, rely on known values of unit costs, exponents in the cost function, and variance components. However, in a specific epidemiologic study, all of these inputs need be based on estimates associated with some degree of uncertainty. Thus, the derived "optimal" exposure assessment strategy will, in itself, be uncertain. Similar to the issue of cost function sensitivity discussed above, the principle of how to optimize exposure assessment seem, however, to be very robust to changes in unit costs and variance components when between-subject variability is large compared to withinsubject variability and subject recruitment costs are small or similar to costs for setting up measurement occasions (table 2). Even the size of the eventual data set is robust to changes in exposure variability, as long as recruitment costs are small (table 2). If, however, recruitment unit costs are large, both the allocation and size of the optimal strategy is highly sensitive to the size of variance components, especially if recruitment costs accelerate with the number of subjects (a>1).
Even when closed-form solutions are available for estimating the optimal choice of subjects and measurement occasions (equations (10) and (11)), a corresponding analytical expression of the uncertainty of these estimates may not be readily available. Optimization using numerical procedures evidently precludes any explicit mathematical representation of uncertainty. Thus, systematic analyses of the stability of optimized strategies to fluctuations in input variables need to be performed by numerical methods. Different approaches may then be viable, including Monte Carlo procedures (e.g. [73]), which will, however, require estimates of the distributions of input variables; and large-scale resampling from empirical distributions as in bootstrapping [74]. Bootstrapping has been used successfully to address uncertainty in several occupational studies addressing exposure sampling efficiency [27,53,75], and is especially useful in cases when analytical methods are unavailable [12] or when assumptions underlying the analytical models are probably violated [35,54]. Bootstrap-based analysis of uncertainty has also been used successfully in health economics [76]. However, bootstrapping requires access to -preferably large -empirical data sets that can be used to represent the distributions of necessary variables. In the case of cost-efficiency optimization, this implies that extensive data, not available at present, are needed on unit costs, exponents in the cost function, and exposure variance components.

Deviations from the optimal strategy
For pragmatic reasons, exposure assessments in working life will rarely be carried out as planned (e.g. [37]). Thus, an intended optimal strategy may, in effect, be realized by collecting numbers of measurement units at different stages that deviate from the optimal choice, even if the total budget is still consumed. Presumably, the most likely deviations to occur appear in the form of slight departures from a completely balanced data set; for instance that some measurement occasions fail for some subjects but are compensated by more occasions from others. As noted from the numerical examples (table 2), statistical performance seems to be considerably more sensitive to non-optimal strategies at some combinations of variance components, unit costs and cost function exponents than at others. However, this result concerns only non-optimal strategies that are still balanced. The effects of unbalanced reallocations of measurements, which still consume the allowed budget, need to be determined in future studies. When facing scenarios that will be sensitive to deviations from the optimal strategy, we suggest, however, preparing for likely departures by designing an intentional oversampling.

Comparing cost-efficiencies Comparing measurement allocations
Some previous studies on cost-efficient data collection have been devoted to comparing two or more alternative measurement strategies with respect to cost and efficiency, rather than identifying an optimal strategy. Thus, Armstrong compared the properties of two different instruments for retrieving the same exposure data [40,41], while Lemasters et al. [38] and Shukla et al. [39] devoted their studies to comparing different allocations of measurements using the same instrument. In the two latter studies, probably none of the compared strategies were optimal, but they were meant to represent feasible strategies in terms of e.g. logistics and selection constraints. The comparison approach to cost-efficiency analysis is considerably easier to deal with from a mathematical viewpoint than optimization as addressed in the present paper. A mere comparison also allows for both cost and output variance functions that cannot be addressed by analytical optimization procedures. Abstaining from optimization may thus represent a pragmatic level of analysis in cases where the principal objective is to decide for one of a number of possible exposure assessment strategies rather than determining an absolute optimum.

Comparing measurement instruments
While, as mentioned, some previous studies have addressed the issue of comparing the cost-efficiency of two alternative methods for obtaining the same exposure variable(s) [40,41], no attempts have been made on comparing two instruments in terms of their optimal performance under a constrained budget. This is an issue of obvious importance to a researcher or practitioner facing a decision on investments in new equipment or staff. For many occupational and environmental exposures, several alternative measurement instruments are available. For instance, working postures can be recorded using self-reports, observations and direct measurement tools [77,78]; i.e. methods associated with different costs and different statistical performance [79,80]. The procedures developed in the present paper can be used to identify an optimal measurement strategy for each method separately, including the resulting statistical performance, on which basis a comparison can be made. In this case, it is particularly important to acknowledge fixed costs with either method, since they determine the budget left for optimization.

Conclusion
In the present study, we demonstrated that non-linearities in costs functions can have a significant influence on the principle of how to optimally allocate measurements between subjects and occasions within subjects. This happens if costs for recruiting subjects are large compared to costs for setting up measurement occasions, and, at the same time, the between-subjects to within-subject variance ratio is small. If, on the other hand, the between-subjects variance is larger than or similar to the within-subject variance, non-linearities do not, in general, change the supremacy of measuring at one occasion from each of as many subjects as allowed by the budget. This principle applies in particular if the budget is large. Irrespective of the extent of exposure variability, however, non-linear subject recruitment costs will affect the eventual size of the exposure data sample, and hence the precision of the resulting exposure mean value.
We noted a remarkable scarcity of empirical data on appropriate approximations of cost functions in exposure assessment, as well as on the sizes of costs pertaining to different measurement stages, for instance subjects and occasions within subjects.
Thus, in epidemiologic research requiring reliable exposure mean values, we suggest that exposure assessment strategies are discussed a priori, using the procedures developed in the present paper on educated estimates of relevant variance components, unit costs, and cost function shapes. This should lead to informed decisions on measurement strategies that pursue an optimal use of monetary resources, with due consideration as to whether the obtainable statistical performance is sufficient.