Detecting and correcting the bias of unmeasured factors using perturbation analysis: a datamining approach
 WenChung Lee^{1, 2}Email author
DOI: 10.1186/147122881418
© Lee; licensee BioMed Central Ltd. 2014
Received: 8 May 2013
Accepted: 27 January 2014
Published: 5 February 2014
Abstract
Background
The randomized controlled study is the goldstandard research method in biomedicine. In contrast, the validity of a (nonrandomized) observational study is often questioned because of unknown/unmeasured factors, which may have confounding and/or effectmodifying potential.
Methods
In this paper, the author proposes a perturbation test to detect the bias of unmeasured factors and a perturbation adjustment to correct for such bias. The proposed method circumvents the problem of measuring unknowns by collecting the perturbations of unmeasured factors instead. Specifically, a perturbation is a variable that is readily available (or can be measured easily) and is potentially associated, though perhaps only very weakly, with unmeasured factors. The author conducted extensive computer simulations to provide a proof of concept.
Results
Computer simulations show that, as the number of perturbation variables increases from data mining, the power of the perturbation test increased progressively, up to nearly 100%. In addition, after the perturbation adjustment, the bias decreased progressively, down to nearly 0%.
Conclusions
The datamining perturbation analysis described here is recommended for use in detecting and correcting the bias of unmeasured factors in observational studies.
Keywords
Epidemiologic methods Confounding Data mining Effect modification Bias StandardizationBackground
Correcting the bias of a factor with the confounding and effectmodifying potential shown in Figure 1 presents no major challenge. Here, techniques such as standardization should work well [1]. To perform the correction, factors with biasing potential must be identified and measured in the study. However, this is often not possible due to limited knowledge of what these factors might be or, if we have knowledge of them, the cost constraint of actually measuring them.
This paper presents a novel method, termed perturbation analysis, to detect and correct the bias of unmeasured factors. The method circumvents the problem of measuring unknowns by collecting the perturbations of unmeasured factors instead. A perturbation variable (PV) is a variable that is readily available, or can be measured easily, and is potentially associated, though perhaps only very weakly, with U (Figure 1). Note that a PV is associated with E and D only through U (Figure 1). If this is not the case, then the variable by itself is a classical confounder for the ED relationship and can be adjusted for as such.
As an example, E is asbestos, D is lung cancer, and U is smoking status (unmeasured in the study). Then, PV can be anything not known to be associated with asbestos exposure and lung cancer, but may be associated with smoking status (causally or noncausally, directly or indirectly, positively or negatively), such as personality traits, finger color, breath odor, accessibility to convenience stores, internet usage records, driving records, etc. As another example, E is electromagnetic radiation, D is childhood leukemia, but U is utterly unknown (or perhaps nonexistent). Here, we may try virtually any variable.
However, care must be taken not to include any variable that is associated with the collider of the ED association. A collider, the U′ in Figure 1, is an effect/consequence of both E and D [1]. Controlling a collider (or its perturbations) can aggravate the bias instead of reducing it. To avoid this, one can collect only those PVs measured before D occurs. If all the PVs in a study precede D, the causal temporality principle dictates that no PV can be associated with the colliders of the ED association.
The central tenet of the proposed perturbation analysis is to collect a great number of PVs, i.e., hundreds, thousands, or even more. The quickest way to obtain large numbers of admissible PVs is to put in all the questionnaires and laboratory data that has been collected or measured before D occurs. Another possibility is through record linkage of the study subjects to large existing databases, e.g., data pertaining to health insurance, traffic violations, internet usage, etc., where a great number of variables can be found or defined preceding the study outcome [2]. If the subjects in one study are also taking part in genomewide association studies, the wealth of genomic data (thousands or even millions of genetic markers) could then provide yet another rich source for admissible PVs, particularly because genes can be considered to precede any outcome studied. Essentially, the method represents a datamining approach.
Methods
Bias of unmeasured factors
The numerator in [1] represents the total number of subjects who would have contracted the disease if the whole population were exposed, whereas the denominator is the total number of diseased subjects if the whole population were to be unexposed. As such, the index of SRR represents the causal effect of the exposure in the population at large. However, an observational study with U unmeasured does not permit a calculation of the index.
A hypothetical population with positive/negative confounding U
Level of U (i)  Population number (m_{ i })  Positive confounding  Negative confounding  

Exposure prevalence (p_{ i })  Disease risk among the unexposed (${\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  Relative risk (${\mathit{r}}_{\mathit{i}}^{\mathit{e}}/{\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  Exposure prevalence (p_{ i })  Disease risk among the unexposed (${\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  Relative risk (${\mathit{r}}_{\mathit{i}}^{\mathit{e}}/{\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  
1  2,500  0.76  0.6667  1.2632  0.24  0.4737  1.7593 
2  2,500  0.60  0.4000  1.1667  0.40  0.3333  2.4000 
3  2,500  0.40  0.3333  1.2000  0.60  0.2000  2.3333 
4  2,500  0.24  0.1579  1.0556  0.76  0.1667  1.8947 
Total  10,000  Crude relative risk (crude RR) = 1.75  Crude relative risk (crude RR) = 1.53  
Standardized relative risk (SRR) = 1.20  Standardized relative risk (SRR) = 2.06 
A hypothetical population without bias
Level of U (i)  Population number (m_{ i })  U is associated with neither E nor D  U is not associated with E  U is not associated with D  

Exposure prevalence (p_{ i })  Disease risk among the unexposed (${\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  Relative risk (${\mathit{r}}_{\mathit{i}}^{\mathit{e}}/{\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  Exposure prevalence (p_{ i })  Disease risk among the unexposed (${\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  Relative risk (${\mathit{r}}_{\mathit{i}}^{\mathit{e}}/{\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  Exposure prevalence (p_{ i })  Disease risk among the unexposed (${\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  Relative risk (${\mathit{r}}_{\mathit{i}}^{\mathit{e}}/{\mathit{r}}_{\mathit{i}}^{\mathit{u}}$)  
1  3,000  0.40  0.30  1.50  0.40  0.40  1.54  0.80  0.30  1.50 
2  2,500  0.40  0.30  1.50  0.40  0.30  1.52  0.60  0.30  1.50 
3  2,500  0.40  0.30  1.50  0.40  0.25  1.45  0.40  0.30  1.50 
4  2,000  0.40  0.30  1.50  0.40  0.20  1.42  0.20  0.30  1.50 
Total  10,000  Crude relative risk (crude RR) = 1.50  Crude relative risk (crude RR) = 1.50  Crude relative risk (crude RR) = 1.50  
Standardized relative risk (SRR) = 1.50  Standardized relative risk (SRR) = 1.50  Standardized relative risk (SRR) = 1.50 
Note that the above analysis of bias (the presence/absence of bias and its direction, if present) is in agreement with what was predicted from the potentialoutcome model [4, 5].
Effects of the adjustment of a binary perturbation variable
where, at the j th level of the PV, n_{ j } is the number of subjects, ${\mathit{s}}_{\mathit{j}}^{\mathit{e}}$ is the disease risk for an exposed subject [${\mathit{s}}_{\mathit{j}}^{\mathit{e}}=Pr(\mathrm{D}=1\left\mathrm{E}=1,\mathrm{\text{PV}}=\mathit{j})\right.$], and ${\mathit{s}}_{\mathit{j}}^{\mathit{u}}$ is that for an unexposed subject [${\mathit{s}}_{\mathit{j}}^{\mathit{u}}=Pr(\mathrm{D}=1\left\mathrm{E}=0,\mathrm{\text{PV}}=\mathit{j})\right.$].
Theoretical analysis
where $\mathit{\sigma}{\text{'}}_{{\mathrm{\text{EOR}}}^{1},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{e}}}$ and $\mathit{\sigma}{\text{'}}_{\mathrm{\text{EOR}},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{u}}}$ again are weighted covariances (the primes indicate that they do not adopt the same weights as in the previous ${\mathit{\sigma}}_{{\mathrm{\text{EOR}}}^{1},{\mathrm{\text{DRR}}}^{\mathit{e}}}$ and ${\mathit{\sigma}}_{\mathrm{\text{EOR}},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{u}}}$, respectively), f_{PV} is the ‘variance fraction’ of the PV: ${\mathit{f}}_{\mathrm{\text{PV}}}=\frac{\mathrm{\text{variance in the prevalence of PV across different levels of U}}}{\mathrm{total\; variance}}=\frac{{\mathit{\sigma}}_{\mathrm{\text{PV}}}^{2}}{{\mathit{\mu}}_{\mathrm{\text{PV}}}\times \left(1{\mathit{\mu}}_{\mathrm{\text{PV}}}\right)}$, and a and b are two positive constants of less interest.
From [4], we see that adjusting for a PV where f_{PV} = 0 (an uninformative PV) is not useful: E(log adjusted RR) = log crude RR. However, adjusting for a PV with f_{PV} > 0 (an informative PV) will, on average, push the log adjusted RR away from the log crude RR. Moreover, the direction of this movement correctly indicates where the unknown log SRR might be, i.e., in general we have E(log adjusted RR) < log crude RR if SRR < crude RR (positive confounding) and E(log adjusted RR) > log crude RR if SRR > crude RR (negative confounding). On the other hand, if U is creating no bias from the outset (${\mathit{\sigma}}_{{\mathrm{\text{EOR}}}^{1},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{e}}}={\mathit{\sigma}}_{\mathrm{\text{EOR}},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{u}}}=\mathit{\sigma}{\text{'}}_{{\mathrm{\text{EOR}}}^{1},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{e}}}=\mathit{\sigma}{\text{'}}_{\mathrm{\text{EOR}},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{u}}}=0$), there is no need for any further adjustment because the crude RR is already the soughtafter SRR. From [4], we see that in this case, adjusting for a PV (even if f_{PV} > 0) will not perturb the crude RR.
Results
Simulation studies
In situations where the prevalence of PV is distributed as a mixture of beta distributions, the results were basically the same (Additional file 2: Tables S6–S10).
Perturbation analysis using a panel of perturbation variables
As shown in the previous section, adjusting for an informative PV will produce an adjusted RR that is a little closer, on average, to the unknown SRR than the crude RR is. With only one PV, such a minuscule bias reduction may be unremarkable. However, one can construct a powerful perturbation test (described below) to test whether the study at hand is suffering from the bias of unmeasured factors if one can collect large numbers of PVs. Furthermore, one can perform a perturbation adjustment (also described below) to significantly reduce, if not completely eliminate, that bias.
Note that the PVs to be used can be in any measurement scale. For example, for a categorical variable with a total of five levels, one can create a total of four dummy variables as four separate PVs in the perturbation analysis. A continuous variable counts as one PV, but to extract more information, one can categorize and dummycode the variable to input more PVs. Alternatively, one can input the variable itself, along with its square, its cube, and so on. Furthermore, interaction terms (product terms) of any subset of already collected PVs by themselves also count as new PVs. It does not matter if some of the PVs, collected or created, are correlated with one another to some degree, as neither the perturbation test nor the perturbation adjustment needs an independence assumption. Additionally, in order to use the method, one does not need to know anything (parameter or function) related to U, such as L, m_{ i }, p_{ i }, q_{ i }, ${\mathit{r}}_{\mathit{i}}^{\mathit{u}}$, ${\mathit{r}}_{\mathit{i}}^{\mathit{e}}$, ${\mathit{\sigma}}_{{\mathrm{\text{EOR}}}^{1},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{e}}}$, ${\mathit{\sigma}}_{\mathrm{\text{EOR}},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{u}}}$, $\mathit{\sigma}{\text{'}}_{{\mathrm{\text{EOR}}}^{1},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{e}}}$, $\mathit{\sigma}{\text{'}}_{\mathrm{\text{EOR}},\phantom{\rule{0.25em}{0ex}}{\mathrm{\text{DRR}}}^{\mathit{u}}}$, μ_{PV}, ${\mathit{\sigma}}_{\mathrm{\text{PV}}}^{2}$, or f_{PV}, etc.
Perturbation test
where θ_{ k } is the adjusted RR pertaining to the k th PV. From the previous section, we know that under the null hypothesis of no unmeasured confounding, the expected value of the log adjusted RR should equal the log crude RR. Under the alternative, the expected value of the log adjusted RR will be lower (positive confounding) or higher (negative confounding) than the crude RR. Therefore, the value of T should tend to be larger under the alternative hypothesis than under the null hypothesis.
Because the PVs may not be independent of one another, the ordinary chisquare distribution may not be appropriate for T. Here, we resort to permutation analysis to find a critical value for T. To be precise, we fix the vectors (PV_{1}, PV_{2}, …, PV_{ m }) and shuffle the vectors of (E, D) among the study subjects (or vice versa). Such permutations are to be performed many times, with a T value calculated each time. The critical value for a significant level of α is then the (1 − α) × 100 percentile of these permutated T values.
Perturbation adjustment
To correct the bias of unmeasured factors, one may be tempted to adjust for the whole panel of m PVs simultaneously. However, in doing so, one will run into a dimensionality problem. For example, a panel of 20 binary PVs taken together amounts to a supervariable S, with 1,048,676 levels, while in a typical study, the total number of subjects enrolled (n) is far less than that number. Therefore, each subject essentially occupies a different level of S, making adjustments of S impossible.
 1.
Start with individual subjects. Let each subject reside in a distinct cluster so that there are as many clusters as subjects.
 2.
Calculate the distance between any two clusters (for example, the A and the B clusters): ${\mathit{D}}^{\mathrm{A},\mathrm{B}}=\sqrt{\frac{1}{\mathit{m}}\sum _{\mathit{k}=1}^{\mathit{m}}{\left({\mathrm{\text{PV}}}_{\mathit{k}}^{\mathrm{A}}{\mathrm{\text{PV}}}_{\mathit{k}}^{\mathrm{B}}\right)}^{2}},$ where ${\mathrm{\text{PV}}}_{\mathit{k}}^{\mathrm{A}}$ (${\mathrm{\text{PV}}}_{\mathit{k}}^{\mathrm{B}}$) is the k th PV of the subject in the A(B) cluster.
 3.
The two clusters (for example, the C and the D clusters) with the smallest distance between them are merged into one cluster (call this the CD cluster).
 4.
The distance between the newly formed cluster and any other cluster (for example, the E cluster) is calculated as D ^{CD,E} = max(D ^{C,E}, D ^{D,E}), according to the completelinkage criterion [7].
 5.
Repeat Steps 3 and 4 until there are at least a prespecified number of subjects (n _{ c }, for example n _{ c } = 20) in each cluster.
Treating these clusters as different levels of the panel of PVs, we then use formula [3] to calculate an adjusted RR. Note that we assume that U itself does not contain too many levels beyond what the sample size of a study can handle, i.e., we assume $\mathit{L}<\frac{\mathit{n}}{{\mathit{n}}_{\mathit{c}}}$.
Results
Simulation studies
To study the performances of the perturbation test and adjustment, a panel of PVs for the hypothetical population in Table 1 was simulated. As before, the prevalences of the PVs in the four levels of U were assumed to arrive from the beta distributions. The mean prevalences (across the four levels of U), ${\mathit{\mu}}_{{\mathrm{\text{PV}}}_{1}},{\mathit{\mu}}_{{\mathrm{\text{PV}}}_{2}},\dots ,{\mathit{\mu}}_{{\mathrm{\text{PV}}}_{\mathit{m}}}$, were assumed to arrive from a U(0.05,0.95) distribution. The variance fractions, f_{PV} s, were assumed to be constant for the panel of PVs and are examined for f_{PV} = 0.05 and 0.025. A total of 200 subjects (n = 200) were randomly sampled from this population. For a given subject, the values of his/her PV_{1}, PV_{2}, …, PV_{ m } were assumed to be independent of one another and were generated from m Bernoulli distributions according to the prevalence values of their U levels, without regard to their E and D statuses. One thousand simulations were performed for each scenario. The index of operating characteristic was used to measure the performance of the perturbation test. The operating characteristic of a test is its statistical power averaged over a U(0,1)distributed α level; it is a value between 0.5 (no power at all) and 1.0 (highest power possible).
When the prevalences of PVs were distributed as a mixture of beta distributions, the results were basically the same (Additional file 3: Figure S1). Additionally, for the situation where the values of PV_{1}, PV_{2}, …, PV_{ m } within a subject were dependent, and for the situation where the panel of PVs contained a certain proportion of pure noise (PVs that are not associated with U: f_{PV} = 0), definite detection of bias and/or complete removal of bias were also possible, if with an even larger panel of PVs (Additional file 3: Figures S2 and S3).
Discussion
We will now comment on why our perturbation analysis using a panel of PVs should work. The perturbation test proposed in this paper centers on the fact that adjusting for an informative PV (a variable associated with U) will produce a value that is on average larger or lower than the crude RR as necessary to be closer to the unknown SRR. This is true irrespective of whether the association between U and PV is positive or negative. Because the adjustments all point in the same direction, we can calculate a test statistic, as in [5], without worrying that the effects of the positively and negatively associated PVs are being cancelled out. Notably, the perturbation adjustment proposed in this paper is based on distances in high dimension. Hall P, 2005 [8] and [9] studied the geometric properties of highdimension and lowsamplesize data. They showed that under very mild conditions, as the dimension (the number of PVs) approaches infinity, the distance between any two subjects in the same group (at the same level of U) will converge to a certain value, while the distance between any two subjects in different groups (at different levels of U) will converge to another (larger) value. Therefore, by calculating pairwise distances in sufficiently high dimension, the group memberships of the study subjects can be resolved, and U can be reconstructed almost perfectly.
The proposed method relies on collecting as many PVs as possible. This is in contrast to other approaches dealing with unmeasured confounding, such as the methods of negative control [10, 11], the instrumental variable [12, 13], and the latent variable [14], where only one or a few variables are considered. The method is also completely datadriven such that a researcher simply lets the data (consisting of E, D, and a panel of PVs) speak for themselves. This is in contrast to a sensitivity analysis of unmeasured confounding where one needs to specify the sensitivity parameters or assume distributions for them [1, 15, 16].
There is much work to be performed in order to further validate the proposed method. First, this paper is only a proofofconcept study. Further studies are needed to test the methodology with real data. Second, additional work is needed to design an optimal coding scheme to extract maximum information from categorical/continuous PVs and a weighting system to optimally combine the many different PVs in the panel in order to maximize the efficiency of the perturbation analysis. Third, the method is currently discussed only on the SRR using the whole population as the target. It will be worthwhile to develop the corresponding methodology for an SRR with the exposed, unexposed, or completely external population as the target. Finally, casting the present method in a proper regression framework should prove useful for accommodating more than two exposures and other confounders that are measured in the study.
Conclusions
In summary, this study shows that, as the number of PVs increases, the power of the perturbation test increases (progressively up to nearly 100%) and the bias after the perturbation adjustment decreases (progressively down to nearly 0%). Such a datamining approach is recommended for use in detecting and correcting the biases of unmeasured factors in observation studies.
Author’s contributions
WCL developed the methods, carried out the simulations, and drafted the manuscript. He read and approved the final manuscript.
Abbreviations
 CRR:

Confounding risk ratio
 DRR:

Disease risk ratio
 EOR:

Exposure odds ratio
 PV:

Perturbation variable
 RR:

Relative risk
 SRR:

Standardized relative risk.
Declarations
Acknowledgements
This paper is partly supported by grants from National Science Council, Taiwan (NSC 1022628B002036MY3) and National Taiwan University, Taiwan (NTUCESRP102R76228). No additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The author thanks Miss HsiaoYuan Huang for technical supports.
Authors’ Affiliations
References
 Rothman KJ, Greenland S, Lash TL: Modern Epidemiology. 2008, Philadelphia: Lippincott, 3
 Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA: Highdimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009, 20: 512522. 10.1097/EDE.0b013e3181a663cc.PubMed CentralView ArticlePubMed
 Lee WC: Bounding the bias of unmeasured factors with confounding and effectmodifying potentials. Stat Med. 2011, 30: 10071017. 10.1002/sim.4151.View ArticlePubMed
 Vander Weele TJ: The sign of the bias of unmeasured confounding. Biometrics. 2008, 64: 702706. 10.1111/j.15410420.2007.00957.x.View Article
 Chiba Y: The sign of the unmeasured confounding bias under various standard populations. Biom J. 2009, 51: 670676. 10.1002/bimj.200800195.View ArticlePubMed
 Ogburn EL, Vander Weele TJ: On the nondifferential misclassification of a binary confounder. Epidemiology. 2012, 23: 433439. 10.1097/EDE.0b013e31824d1f63.PubMed CentralView ArticlePubMed
 Johnson RA, Wichern DW: Applied Multivariate Statistical Analysis. 1992, New Jersey: PrenticeHall International, 3
 Hall P, Marron JS, Neeman A: Geometric representation of high dimension, low sample size data. J Royal Stat Soc (series B). 2005, 67: 427444. 10.1111/j.14679868.2005.00510.x.View Article
 Ahn J, Marron JS, Muller KM, Chi YY: The highdimension, lowsamplesize geometric representation holds under mild conditions. Biometrika. 2007, 94: 760766. 10.1093/biomet/asm050.View Article
 Lipsitch M, Tchetgen ET, Cohen T: Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology. 2010, 21: 383388. 10.1097/EDE.0b013e3181d61eeb.PubMed CentralView ArticlePubMed
 Flanders WD, Klein M, Darrow LA, Strickland MJ, Sarnat SE, Sarnat JA, Waller LA, Winquist A, Tolbert PE: A method for detection of residual confounding in timeseries and other observational studies. Epidemiology. 2011, 22: 5967. 10.1097/EDE.0b013e3181fdcabe.PubMed CentralView ArticlePubMed
 Martens EP, Pestman WR, de Boer A, Belitser SV, Klungel OH: Instrumental variables: application and limitations. Epidemiology. 2006, 17: 260267. 10.1097/01.ede.0000215160.88317.cb.View ArticlePubMed
 Hernan MA, Robins JM: Instruments for causal inference: an epidemiologist’s dream?. Epidemiology. 2006, 17: 360372. 10.1097/01.ede.0000222409.00878.37.View ArticlePubMed
 Gilthorpe MS, Harrison WJ, Downing A, Roman D, West RM: Multilevel latent class casemix modeling: a novel approach to accommodate patient casemix. BMC Health Serv Res. 2011, 11: 5310.1186/147269631153.PubMed CentralView ArticlePubMed
 Steenland K, Greenland S: Monte Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer. Am J Epidemiol. 2004, 160: 384392. 10.1093/aje/kwh211.View ArticlePubMed
 Vander Weele TJ, Arah OA: Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology. 2011, 22: 4252. 10.1097/EDE.0b013e3181f74493.View Article
 The prepublication history for this paper can be accessed here:http://www.biomedcentral.com/14712288/14/18/prepub
Prepublication history
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.