Factor analysis has been leveraged in epidemiologic research to frame broad and often complex symptom and health outcome patterns through the intercorrelations of observable symptoms and conditions. This analytical approach can be used as an exploratory tool to complement additional analyses or as a tool to understand underlying patterns in data. This study involving a large healthy military population applied exploratory factor analysis to a large dataset of self-reported symptoms, using techniques that are appropriate for binary, ordinal, and potentially incomplete data. Our exploratory analysis yielded insight into the interrelations of many self-reported physical and psychological symptoms obtained through standardized survey methods. While the factor analytic framework provided many intuitive symptom groupings, some aspects of the factor loading matrix warrant further discussion and investigation. Our finding of 14 factors that describe 60 percent of the variance of 89 variables underscores the complex set of constructs included in the Millennium Cohort questionnaire and quantifies a reasonable amount of overlap of these constructs. This assured us that the number and type of questions are appropriately assessing a spectrum of heterogeneous symptoms and conditions while affording an appreciation of the unique and shared variance of these many symptoms. These analyses also identified factors that may be used in more focused epidemiologic studies of specific exposure-outcome relationships.
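As an illustration of how a figure such as "14 factors describe 60 percent of the variance of 89 variables" can be derived from a loading matrix, the following minimal sketch sums squared loadings per factor and divides by the number of variables. It assumes orthogonal factors extracted from a correlation matrix (standardized variables); the function name `variance_explained` and the small example matrix are hypothetical and are not taken from the study.

```python
import numpy as np


def variance_explained(loadings):
    """Proportion of total variance attributed to each factor.

    Assumes orthogonal factors and standardized variables, so that each
    variable contributes one unit of variance and a factor's share is the
    sum of its squared loadings divided by the number of variables.
    """
    loadings = np.asarray(loadings, dtype=float)
    n_vars = loadings.shape[0]
    # Sum of squared loadings per factor (column).
    ss_loadings = (loadings ** 2).sum(axis=0)
    proportion = ss_loadings / n_vars
    return proportion, proportion.cumsum()


# Hypothetical 4-variable, 2-factor loading matrix (illustration only).
example_loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.6],
    [0.2, 0.5],
])

proportion, cumulative = variance_explained(example_loadings)
```

With an oblique rotation the factors are correlated, and this simple partition of variance no longer applies without adjustment.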
The most significant factor in explaining the total variation in symptoms data was the "mental health" factor, which accounted for nearly 19 percent of the total variance. It is noteworthy that nearly all variables related to mental health outcomes loaded on a single factor, with several items from the PCL-C loading most significantly. This phenomenon persisted across multiple models with differing numbers of retained factors so that, from the perspective of factor analysis, the outcomes of depression, anxiety disorder, panic disorder, and PTSD do not represent distinct constructs in this general military population sample.
The fact that almost all the mental health symptoms loaded on a common factor can be interpreted from both clinical and methodological perspectives. A clinical interpretation of this factor highlights the high degree of co-morbidity among mental health disorders. From a methodological perspective, however, these results also suggest inherent problems with the application of factor analysis across several survey instruments specific for individual clinical conditions. It is difficult to rule out the possibility that the structure of the survey influenced the factor analytic results, since many of the mental health questions are located adjacent to one another on the survey. It is possible that the factor structure reflects both underlying clinical phenomena and survey structure, since the factors appear to be organized according to both content and question sequence. Although exploratory factor analysis did not distinguish between many of the mental disorders assessed by the PHQ and the PCL-C, it did identify disordered eating symptoms as constituting a distinct construct (factor 4). This makes sense, given that the PHQ includes a specific disordered eating module and there is less overlap in these symptoms with symptoms of depression and anxiety disorders. "Depressed mood" (factor 12), involving items from two different instruments, also showed some degree of specificity as a distinct construct.
A number of symptoms suggestive of cardiovascular disease characterize factor 2, including chest pain and shortness of breath from the PHQ and SHQ. Overall the frequency of several of these symptoms was low, with the proportion of subjects bothered "a lot" by these symptoms being less than 2 percent as assessed by the PHQ. Most of the symptoms (chest pain, shortness of breath, fainting/dizziness, and heart pounding) loading on Factor 2 are well-recognized somatic symptoms that accompany an anxiety disorder. The likelihood of cardiovascular disease is low for several reasons, including the younger age distribution of this population, the fact that all had to pass the military induction physical in order to serve, and because all were fit enough to be on active duty in the military during initial sampling in October 2000. Muscle pain is not usually associated with anxiety disorder and this analysis suggests that it could be a manifestation of it or another condition associated with anxiety such as fibromyalgia.
Factor 3 comprises "persistent or recurring" symptoms reported on the SHQ commonly associated with viral and bacterial infections of the respiratory and gastrointestinal tract. Earlobe pain possibly was interpreted by respondents to mean earache, which is also frequently associated with respiratory infections due to otitis media or auditory tube dysfunction. The majority of these symptoms were reported infrequently (less than 10 percent of respondents). Recurring viral infections would still be very compatible with these symptom loadings as upper respiratory infections, such as the common cold, typically occur several times in any given year.
Factor 5, "vitality," loaded with symptoms from the SF-36V related to energy and mood. This factor also suggests both a clinical and methodological interpretation, since all four variables that characterize this factor occur in the same section of the survey instrument. However, factor 10, "fatigue," also related to energy level and loaded with items from several different sections of the survey. The fact that two factors, accounting cumulatively for 6.4 percent of the variance, related to energy level, vitality, and fatigue suggests that further research may be needed to understand the importance of these symptoms in military populations. Factor 11, "sleeping problems," also highlights this issue. Furthermore, cross-loadings between factors 10 and 11 could indicate underlying clinical sleep disorders in this population. Previous research has found that Cohort members report an adjusted average sleep time of 6.5 hours per night, which is slightly lower than most recommendations for optimum sleep duration. Over a prolonged period of time such sleep deficits could be manifested in fatigue and lack of energy, among other symptoms, and result in lasting effects on performance.
All five variables from the PHQ modules pertaining to alcohol abuse loaded significantly on factor 6, "problem drinking." The highest loading variables related to drinking and work, driving, or social interactions. The last variable, "drank despite doctor's warnings," had a more modest loading, which may reflect that problem drinking affects many domains before it is addressed by physicians.
Four variables from different instruments loaded on Factor 7. We named Factor 7 "aches and pains," because each of the four variables was designed to assess general myalgia. The four variables from the survey instrument included questions about experiencing bodily pain, pain associated with arms and legs, back pain, or unusual muscle pain. Muscle pain is a common symptom, especially in an active, athletic military population. However, general muscle pain can accompany many other illnesses, such as infectious diseases, autoimmune disorders, and fibromyalgia, as well as other medical conditions, including comorbid psychiatric disorders. It is interesting to note that variables related to arthralgia or other joint-related pain did not load on Factor 7, nor did headache pain.
Factor 8 included variables linked to relationship and responsibility issues from one module of the PHQ. The five variables that loaded on Factor 8 include having difficulties with a spouse or partner, experiencing stress from taking care of family members, feeling as if there were no one to turn to, having little or no sexual desire/pleasure, and experiencing financial problems. With the exception of financial problems, each of the variables is related to human interaction and communication. However, underlying psychological issues and life stressors could also contribute to how these variables group.
The last two factors had the fewest significantly loading variables. Forgetfulness and confusion had significant loadings on Factor 13, "cognitive problems," and were notably grouped together in the survey (SHQ) following distinctly physical symptoms. Confusion also loaded equally on Factor 1, "mental health."
Factor 14 includes headache and severe headache variables and may reflect, at least in part, that headache can be a singularly incapacitating symptom. Severe headache items also loaded on Factor 3, "flu-like symptoms," but only weakly, and may be related to grouping with other variables in the survey instrument that loaded on that factor.
There are several significant limitations to this study. While invited participants were a random weighted sample of the US military, the study population may not be representative of the entire US military population. However, foundational investigations of potential biases in the Millennium Cohort have found the cohort to be representative, with participants who report data reliably [6, 9, 31, 36–41]. Although the Millennium Cohort Study is a longitudinal study, this exploratory analysis is based on a cross-sectional examination of the symptoms reported during a single follow-up period so that temporal associations cannot be established. Furthermore, this analysis does not address potentially significant associations between exposure and demographic variables with factor structure. Future investigations will examine the relationship of deployment histories and other exposure variables with covariance structure and factor scores associated with the current model. All symptoms and diagnoses included in this study are self-reported, and, therefore, are imperfect surrogates for clinical diagnoses [31, 36–38].
Despite these limitations, this study has a number of important strengths. To our knowledge, this is the first study to perform an exploratory factor analysis of this size in a large population-based cohort of US military personnel. The large sample size allowed for the inclusion of rare symptoms while minimizing the risk of biased correlation estimates. Factor analysis is an inherently subjective method, as different accepted criteria for model building may lead to disparate results. However, in order to examine the sensitivity of results to our methodological choices, analyses were repeated using multiple rotation procedures and applying several criteria for determining the number of factors. Although analyses were conducted using pairwise complete data, results from analyses repeated on a subset of the study population with complete data indicate that missing data did not influence our results.
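The kind of sensitivity check described above, comparing a pairwise-complete analysis with a complete-case analysis, can be sketched as follows. This is an illustrative sketch on simulated data, not the procedure used in the study: it uses Pearson correlations as a stand-in for correlation estimates suited to binary and ordinal items, applies only the eigenvalue-greater-than-one rule as one of several possible retention criteria, and all names (`pairwise_corr`, `n_factors_kaiser`, the simulated loadings and missingness rate) are assumptions made for illustration.

```python
import numpy as np


def pairwise_corr(x):
    """Pearson correlation matrix from pairwise-complete observations.

    For each pair of columns, only rows where both values are observed
    (non-NaN) contribute to the estimate.
    """
    n_vars = x.shape[1]
    r = np.eye(n_vars)
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            both = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            r[i, j] = r[j, i] = np.corrcoef(x[both, i], x[both, j])[0, 1]
    return r


def n_factors_kaiser(r):
    """Number of eigenvalues of the correlation matrix greater than 1."""
    eigenvalues = np.linalg.eigvalsh(r)
    return int(np.sum(eigenvalues > 1.0))


# Simulated data (hypothetical): 2 latent factors, 6 items, 2000 respondents.
rng = np.random.default_rng(0)
n_obs = 2000
factors = rng.normal(size=(n_obs, 2))
true_loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
x = factors @ true_loadings.T + rng.normal(scale=0.6, size=(n_obs, 6))

# Delete about 10 percent of values completely at random.
x_missing = x.copy()
x_missing[rng.random(x.shape) < 0.10] = np.nan

# Pairwise-complete analysis uses every available pair of responses.
r_pairwise = pairwise_corr(x_missing)

# Complete-case analysis keeps only respondents with no missing items.
complete_rows = ~np.isnan(x_missing).any(axis=1)
r_complete = np.corrcoef(x_missing[complete_rows], rowvar=False)

max_abs_diff = float(np.max(np.abs(r_pairwise - r_complete)))
k_pairwise = n_factors_kaiser(r_pairwise)
k_complete = n_factors_kaiser(r_complete)
```

If the two correlation matrices differ little and suggest the same number of factors, that is consistent with missing data having limited influence, although agreement in a simulation with values missing completely at random does not guarantee the same in real survey data.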
An important finding of this study was that the majority of the factors appeared to load strongly based on how symptoms were grouped according to location on the survey. Item location, content, and response format are highly correlated with one another on the Millennium Cohort questionnaire, and this may have explained the factor loading that was observed. Thus, a major limitation of our study was that it was not able to differentiate the relative contributions of item content, location, and response format to factor loadings. This was particularly notable for mental health items, in which there was minimal ability to distinguish between individual mental disorders. This finding suggests that factor analysis may have major limitations when applied to surveys that contain several discrete validated instruments that use different response patterns and group questions according to diagnosis or co-locate questions pertaining to each domain as part of a larger survey. Further research is needed to determine how best to apply factor analysis across multiple illness domains. For example, surveys that randomly allocate symptom items across the survey and standardize response patterns could be compared with traditional surveys that include discrete disease-specific modules.
Understanding the full spectrum of symptoms and illness in a population includes investigating the interrelation of many comorbidities. Exploratory factor analysis is one way to study many symptoms and health outcomes comprehensively and to develop insight into the interrelations of symptom and outcome complexes that should be considered for future study. This study demonstrates a robust exploratory factor analysis including binary, ordinal, and some incomplete data to describe 14 factors accounting for 60 percent of the variance of 89 variables. This study also highlighted a complex set of constructs included in the survey instrument and a reasonable amount of overlap among the constructs, and it assured us that the number and type of questions were appropriately assessing a spectrum of heterogeneous symptoms. Results further suggest that additional research is needed to investigate the relationship between factor analytic results and survey structure. Future research may also include the longitudinal examination of stable and evolving comorbidity structures and their relationship with self-reported exposures and health behaviors, as well as demographic and military-specific characteristics.