Psychometric analysis of the Brief Symptom Inventory 18 (BSI-18) in a representative German sample

Background: The BSI-18 contains the three six-item scales somatization, depression, and anxiety as well as the Global Severity Index (GSI), which includes all 18 items. The BSI-18 is the latest and shortest of the multidimensional versions of the Symptom-Checklist 90-R, but its psychometric properties have not yet been sufficiently clarified. Methods: Detailed psychometric analyses were carried out on a representative sample of N = 2516 participants (aged 14–94 years). Results: The internal consistency was good: Somatization α = .82, Depression α = .87, Anxiety α = .84, and GSI α = .93. Confirmatory factor analysis supported the three scales as second-order and GSI as first-order factors. Model fit was good according to the RMSEA, but CFI and TLI were too low. Conclusions: The BSI-18 is a very short, reliable instrument that can be used to assess psychological distress in the general population. However, further studies need to evaluate the usefulness of standardization in clinical samples.

The Brief Symptom Inventory (BSI) with 53 items was developed by Derogatis using factor analysis; it reduces the number of items while maintaining the scale structure of the SCL-90-R (somatization, obsessive-compulsive, interpersonal sensitivity, depression, anxiety, anger-hostility, phobic anxiety, paranoid ideation, and psychoticism). In Germany, the BSI is mainly used for quality management in psychotherapy (e.g. [28]).
To avoid overburdening patients and to provide an easy screening tool, the BSI-18 was developed with a focus on the highest clinical relevance. The BSI-18 contains only the three six-item scales somatization (SOMA), anxiety (ANX), and depression (DEPR), and the global scale Global Severity Index (GSI) (see Table 1). In contrast to the SCL-90-R and the BSI-53, BSI-18 scores are calculated as sum scores. The GSI therefore ranges from 0 to 72 and the three scales from 0 to 24. Application studies have demonstrated that the BSI-18 is a suitable instrument for measuring psychological distress and comorbidities in patients with different mental and somatic illnesses (e.g. [1,4,8,9,10,29,38,39,46,48]). The instrument is also used in longitudinal studies [5,6,37].
Until now, only three studies have addressed the applicability and psychometric properties of the German version of the BSI-18, in patients after renal transplantation [26] and in hospitalized psychosomatic patients [25,49].
Psychometric properties based on a representative sample are not yet available for Germany. Therefore, the aim of this study was to (1) describe psychological distress within the German population, (2) present the reliability, and (3) examine the factorial validity of the BSI-18.
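The sum scoring described above can be sketched in a few lines. This is a minimal illustration, not the official scoring routine: the item-to-scale assignment in `SCALES` is an assumption (it is consistent with item 7 belonging to somatization and item 17 to depression, as reported later in this paper) and should be checked against the test manual. The 0 to 4 rating range per item is inferred from the stated score ranges, and the function name `score_bsi18` is hypothetical.

```python
# Assumed item-to-scale assignment (1-based item numbers); verify against the manual.
SCALES = {
    "SOMA": [1, 4, 7, 10, 13, 16],
    "DEPR": [2, 5, 8, 11, 14, 17],
    "ANX": [3, 6, 9, 12, 15, 18],
}


def score_bsi18(responses):
    """Return sum scores for the three scales and the GSI.

    responses: list of 18 integer ratings (0-4), in item order.
    """
    if len(responses) != 18:
        raise ValueError("expected 18 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each rating must be an integer from 0 to 4")
    # Each scale score is the sum of its six items (range 0-24).
    scores = {
        name: sum(responses[i - 1] for i in items)
        for name, items in SCALES.items()
    }
    # The GSI is the sum of all 18 items (range 0-72).
    scores["GSI"] = sum(responses)
    return scores


# Made-up example respondent, not data from the study.
example = [1, 0, 2, 0, 3, 1, 0, 0, 0, 2, 1, 0, 0, 0, 4, 1, 0, 1]
print(score_bsi18(example))
```

Because every item belongs to exactly one scale, the GSI always equals the sum of the three scale scores.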

Data acquisition
A representative sample of the general population in Germany was surveyed in November/December 2009 by a demography consulting company (USUMA, Berlin). A total of 258 sample points were used (210 in the western part and 48 in the eastern part of Germany). Households and their members were selected via a random-route procedure. The sample was representative of the German population regarding age, gender, and education, as confirmed by comparisons with data from the Federal Statistical Office. Initially, 4091 addresses were selected; 22% had to be dropped as neutral (e.g. persons unknown), and 38% could not be interviewed (e.g., due to illness, holidays, refusal, non-availability). In the end, a total of 2520 persons were included in the sample.

Psychological assessments
Demographic information, the BSI-18, and further psychological assessments were collected in the survey. To investigate validity evidence based on external criteria, the 4-item version of the Patient Health Questionnaire was used to screen for depression and anxiety (PHQ-4; [32,33,34]). All questions refer to the two preceding weeks and are rated as "0 = not at all", "1 = several days", "2 = more than half the days", and "3 = nearly every day". For statistical calculations, the answer category "0" was contrasted with the other three categories.

Statistics
The analyses were carried out using PASW and AMOS. First, a missing data analysis led to the exclusion of four participants because they showed more than the tolerated amount of missing data (tolerated: < 1 item of each scale, < 3 items in total). In the end, a total of 0.09% of the answers were missing, and these were not missing completely at random (Little's MCAR test: chi-square = 550.971, df = 333, p < .0001). Therefore, they were replaced using multiple imputation (MCMC in LISREL 8.15; [35]).
Descriptive statistics, reliability, and discriminant and convergent correlations were estimated. Construct validity was tested using confirmatory factor analysis (CFA).
Using AMOS [31], the respective fit of the two-factor and the three-factor model was tested with CFAs. Because the Mardia test in AMOS indicated a lack of multivariate normality in the data, the asymptotically distribution-free (ADF) estimator was used for model testing [7]. According to Schermelleh-Engel, Moosbrugger, and Müller [47], a good (acceptable) model fit is given with an SB χ²/df index below 2.0 (below 3.0), a Comparative Fit Index (CFI) and a Tucker-Lewis Index (TLI) above .95 (above .90), a Standardized Root Mean Square Residual (SRMR) below .05 (below .10), and a Root Mean Square Error of Approximation (RMSEA) below .05 (below .08).
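The cutoffs above can be written down as a small lookup, which makes the evaluation of a fitted model easy to follow. The sketch below is illustrative only: the function name `rate_fit` and the label "poor" for values that miss the acceptable range are assumptions, and the example applies the cutoffs to the three-factor values reported in the Results (χ² = 355, df = 132, RMSEA = .030, TLI = .48, CFI = .55). Using the plain χ²/df ratio of the reported ADF χ² in place of the SB χ²/df index is also an assumption.

```python
# Cutoffs as quoted in the text from Schermelleh-Engel et al. [47]:
# (threshold for "good", threshold for "acceptable", larger values mean better fit?)
CUTOFFS = {
    "chi2_df": (2.0, 3.0, False),
    "cfi": (0.95, 0.90, True),
    "tli": (0.95, 0.90, True),
    "srmr": (0.05, 0.10, False),
    "rmsea": (0.05, 0.08, False),
}


def rate_fit(index, value):
    """Classify one fit index as 'good', 'acceptable', or 'poor' (label assumed)."""
    good, acceptable, higher_is_better = CUTOFFS[index]
    if higher_is_better:
        if value > good:
            return "good"
        if value > acceptable:
            return "acceptable"
    else:
        if value < good:
            return "good"
        if value < acceptable:
            return "acceptable"
    return "poor"


# Three-factor model values as reported in the Results section.
reported = {"chi2_df": 355 / 132, "rmsea": 0.030, "tli": 0.48, "cfi": 0.55}
ratings = {name: rate_fit(name, value) for name, value in reported.items()}
print(ratings)
```

Under these assumptions the RMSEA falls in the good range and χ²/df in the acceptable range, while CFI and TLI miss the acceptable threshold, which matches the pattern summarized in the abstract.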

Psychological distress, reliability, and convergent validity of the scales
The mean values of the 18 items and the sum scores of the three scales and the GSI had a left-skewed distribution (see Table 1); Table 2 reports gender and age differences. Internal consistency was α = .82 for SOMA, α = .87 for DEPR, α = .84 for ANX, and α = .93 for the GSI. The corrected item-total correlation (discriminatory power) was below .50 only for item no. 7 (nausea or upset stomach). Furthermore, the elimination of item no. 17 (thoughts of ending your life) would increase the reliability of the scale DEPR. The Depression scale of the PHQ correlated highest with DEPR (r = .72), followed by substantial correlations with GSI (r = .71) and ANX (r = .63), and lowest with SOMA (r = .52). The Anxiety scale of the PHQ correlated about equally with GSI (r = .73), ANX (r = .72), and DEPR (r = .71), but lowest with SOMA (r = .48).
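The three item statistics used in this paragraph (Cronbach's α, the corrected item-total correlation, and α if an item is deleted) follow from standard formulas and can be sketched in plain Python. The function names are hypothetical, and the `toy` data are made up for illustration; they are not data from this study, and the authors' own computations were done in PASW.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent, all for the same scale."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows, i):
    """Alpha of the scale after removing item i (0-based index)."""
    return cronbach_alpha([r[:i] + r[i + 1:] for r in rows])


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def corrected_item_total(rows, i):
    """Correlation of item i with the sum of the remaining items."""
    item = [r[i] for r in rows]
    rest = [sum(r) - r[i] for r in rows]
    return pearson(item, rest)


# Toy data: three perfectly consistent items plus one unrelated item (index 3).
toy = [[0, 0, 0, 2], [1, 1, 1, 0], [2, 2, 2, 3], [4, 4, 4, 1]]
print(cronbach_alpha(toy))
print(alpha_if_item_deleted(toy, 3))
print(corrected_item_total(toy, 3))
```

In the toy data, dropping the unrelated item raises α, and that item's corrected item-total correlation is low. This is the same kind of pattern the text reports for items 17 and 7, though here it is produced artificially.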

Factorial validity
Confirmatory factor analysis was used to test the theoretical and empirical structure of the BSI-18. Because the Mardia test in AMOS indicated a lack of multivariate normality in the data, the asymptotically distribution-free (ADF) estimator was used for model testing. The three-factor model (SOMA, DEPR, and ANX) resulted in χ² = 355, df = 132, p < 0.001; RMSEA = .030 [.02–.03]; TLI = .48; CFI = .55 (see Table 1).
Two different models were tested using the ADF method in AMOS; model modifications were not permitted: the theoretical one-factor model (χ

Discussion
Up to now, the BSI-18 has not been used widely in Germany. The psychometric properties and benefits of the instrument were investigated in three samples [25,26,49]. For the present representative sample, the questions concerning reliability and model fit could be answered.
The reliability (Cronbach's α) of the BSI-18 (α-SOMA = .82, α-DEPR = .87, α-ANX = .84, α-GSI = .93) was good to very good and higher than in the US standardization. The reliability in the American norm sample (N = 1134; α-SOMA = .74, α-DEPR = .84, α-ANX = .79, α-GSI = .89; [14]) has to be rated as satisfactory. This suggests that the internal consistency of the scales can be affected by the sampling procedure [41]. The internal consistency of the scale Depression could be increased by eliminating item 17 (thoughts of ending your life). This result is similar to that of other samples, but due to its clinical relevance the item should be retained.
When the two-item Depression and Anxiety scales of the PHQ-4 [30] were used to analyze convergent validity, the results were quite similar to those of Spitzer et al. [49], who used a longer PHQ version. On the one hand, corresponding BSI-18 and PHQ subscales showed the highest correlations; on the other hand, the Anxiety scale of the PHQ-4 correlated similarly with BSI-18 Anxiety and BSI-18 Depression. Non-corresponding scales such as BSI-18 SOMA showed lower correlations. The results of Spitzer et al. [49] and our own results were found in nonclinical samples. Regarding clinical data [25,26], it could be concluded that the BSI-18 is more suitable for psychologically distressed than for non-distressed populations.
Consistent with international [27,40,42,50] and German clinical studies [25,26], the three-scale structure of the BSI-18 showed the best model fit in the confirmatory factor analysis. A particular strength of the present sample is its good age distribution due to representative sampling: young (n = 270, aged 14–24), elderly (n = 440, aged 65–74), and old (n = 252, aged 75–94) participants. The large sample size is a strength but also a limitation: because a large sample can easily lead to statistically significant effects, general conclusions should be drawn with caution. Since the sample was representative of the general population, the results are not readily applicable to highly distressed samples [15]. In turn, the BSI-18 should be applied to different clinical samples to further replicate or refute the factorial structure.
In future research it would be productive to test the stability of the distress construct (test-retest reliability) and to explore connections to other distress questionnaires (convergent validity) or external ratings (criterion validity) [44]. A design with repeated measurements would allow for the comparison of factor structures across time and the determination of possible cohort effects.
The factor analyses were limited to what the available version of the software offered for categorical indicators. This should be seen as a limitation of this study and as a recommendation for future research.

Conclusion
The BSI-18 is a very short, reliable instrument for the assessment of psychological distress. The factorial structure of the instrument is very good when using confirmatory factor analyses as well as the psychometric criteria. Therefore, it is an instrument that can be used to reliably assess psychological distress in clinical samples as well as in the general population. In addition, it can be used in psychotherapy research as well as in quality assurance for psychotherapeutic long-term effects. Taking into account the good internal consistency reliability estimates and the encouraging convergent validity estimates, this preliminary validation is a good step forward in validation studies which are iterative in nature.