Psychometric properties of the IDS-SR30 for the assessment of depressive symptoms in the Spanish population

Background Due to the high prevalence of depression, it is clinically relevant to improve the early identification and assessment of depressive episodes. The main objective of the present study was to examine the psychometric properties of the IDS-SR30 (Self-rated Inventory of Depressive Symptomatology) in a large Spanish sample of depressive patients. Methods This prospective, naturalistic, multicenter, nationwide epidemiological study conducted in Spain included 1595 adult patients (65.3% females) with DSM-IV Major Depressive Disorder (MDD). The IDS-SR30 and the Hamilton Depression Rating Scale (HDRS, 21 items) were administered to the sample. Data were collected during 2 routine visits. The second assessment was carried out 10 ± 2 weeks after the first assessment. Results The IDS-SR30 showed good internal consistency (α = 0.94), and high item-total correlations (≥ 0.50) were found in 70% of the items. The convergent validity coefficient was 0.85. Results of the principal component analysis (PCA) and confirmatory factor analyses (CFA) showed that a three-factor model (labelled mood/cognition, anxiety/somatic and sleep) is adequate for the current sample. Conclusions The Spanish version of the IDS-SR30 seems to be a reliable, valid and useful tool for measuring depressive symptomatology in the Spanish population.


Background
Depression is one of the most common diagnoses, with nearly 17% of adults in the community meeting criteria for major depressive disorder (MDD) during their lifetime [1] and approximately 7% experiencing MDD during a 12-month period [2]. Among primary care patients, depression rates are higher because of the association with chronic disorders [3]. Recent epidemiological studies in Spain report prevalences in primary care settings ranging from 9% to 29% [4][5][6].
Given the high prevalence of this disorder, it is clinically relevant to improve the early identification and assessment of depressive episodes. Reliable, valid and brief instruments for depression screening are therefore needed [7].
There are several accepted clinician-rated and patient self-reported measures of depressive symptoms. The clinician-rated scales most commonly used in both clinical and research contexts are the 17-, 21-, 24-, 28- and 31-item versions of the Hamilton Rating Scale for Depression (HDRS) [8] and the Montgomery-Asberg Depression Rating Scale (MADRS) [9]. The most frequently used self-reports include the 13- and 21-item versions of the Beck Depression Inventory (BDI) [10], the Zung Depression Rating Scale [11], the Carroll Rating Scale (CRS) [12], and the Patient Health Questionnaire-9 (PHQ-9) [13].
New scales for the assessment of depressive symptoms are continuously being designed to overcome the psychometric deficits of existing instruments. Despite the increasing availability of these new tools, the Hamilton Rating Scale for Depression (HDRS), first published in 1960 [8], remains the most widely used instrument in clinical settings. However, some authors have criticized its incomplete assessment of depressive symptoms as well as its psychometric deficits [14,15].
The Inventory of Depressive Symptomatology (IDS) [14,16] is an assessment tool that can be used to screen for depression or to assess its severity, and it is widely used in large national and international multicentre studies and clinical trials [17][18][19][20][21]. The time frame for assessing symptom severity is usually the seven-day period prior to the evaluation. The IDS is available in two versions: a clinician-rated (IDS-C) and a self-report (IDS-SR) scale. Both versions require minimal training and are sensitive to change with medication, psychotherapy, or somatic treatments, making them useful for both research and clinical purposes. These scales have been translated into several languages (including German, French, Italian and Chinese), and psychometric validations have been published for some of these versions [14][22][23][24][25][26].
The IDS was developed to provide equivalent weightings (0-3) for each symptom item, clearly stated anchors that estimate the frequency and severity of symptoms (including all the DSM-IV criteria required to diagnose a major depressive episode), and matched clinician and patient ratings [14,16,24,27]. A 16-item version was later developed and has been examined in adult and young populations [28,29].
The self-rated IDS (IDS-SR30), the IDS clinician version (IDS-C), and the Quick Inventory of Depressive Symptomatology (QIDS) (both clinician and self-rated) include items that rate the nine symptom domains used to define a major depressive episode (DSM-IV criteria). The IDS includes additional items to define melancholic and atypical symptom features, as well as symptoms commonly associated with depression (irritability, anxiety).
During the development of the original version, the item selection of the IDS was aligned with the DSM criteria and other existing depression scales. Furthermore, symptoms of the anxious and melancholic subtypes of depression and other atypical symptoms were included. The selection process was supported by clinical experts and by patients [16]. In its final form, the IDS is composed of 30 items, and the total score ranges from 0 to 84. All items are rated on a scale from 0 (symptom is not present) to 3 (strongest impairment). In the self-rated version (IDS-SR), a cut-off point of 18 or above indicates the presence of clinically relevant depressive symptomatology [14].
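The scoring rule described above can be illustrated with a minimal sketch. Thirty items rated 0-3 would sum to 90 rather than 84, so the sketch assumes that only 28 items count toward the total: one item from the appetite pair (items 11/12) and one from the weight pair (items 13/14). This pairing, and the choice to score the higher rating within each pair, are assumptions made for illustration and are not stated in the text; the function names are hypothetical.

```python
from typing import Sequence

# Assumed item pairs of which only one member counts toward the total
# (appetite decrease/increase and weight decrease/increase).
PAIRED_ITEMS = ((11, 12), (13, 14))
CUTOFF = 18  # cut-off reported in the text for the self-rated version


def ids_sr30_total(responses: Sequence[int]) -> int:
    """Return the total score for 30 responses, each rated 0-3."""
    if len(responses) != 30:
        raise ValueError("expected 30 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    paired = {i for pair in PAIRED_ITEMS for i in pair}
    # Sum the 26 unpaired items (item numbers are 1-based).
    total = sum(r for i, r in enumerate(responses, start=1) if i not in paired)
    # Assumption: within each pair, the higher rating is the one scored.
    total += sum(max(responses[a - 1], responses[b - 1]) for a, b in PAIRED_ITEMS)
    return total


def is_clinically_relevant(total: int) -> bool:
    """Apply the cut-off of 18 or above."""
    return total >= CUTOFF
```

Under these assumptions, a respondent rating every item 3 obtains the maximum of 84 (26 × 3 + 2 × 3).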
There is published evidence of acceptable psychometric properties for the IDS in the evaluation of depressive outpatients [14,24,27] and inpatients [22]. There is also a substantial correlation between the total scores of the IDS-C30, the IDS-SR30 and the 17-item HDRS. IDS scores have been useful in differentiating endogenous from non-endogenous depression [30] and dysthymic disorder from major depressive disorder [16].
The aim of this study is to assess the psychometric properties of the Spanish version of the self-report IDS, using the Hamilton Depression Rating Scale (HDRS) as a gold standard to detect the presence of a single or recurrent major depressive episode and to determine the severity of depressive symptoms.

Study design, sampling and recruitment
In the present study we used the RESIST data set. RESIST was a prospective, naturalistic, multicenter, nationwide epidemiological study. Its main objective was to compare residual symptoms between early and late remission in a large sample of depressive psychiatric outpatients in daily routine practice [31]. A regionally stratified sample of 400 psychiatrists was selected to participate in the study. They were distributed proportionally across Spain's 17 regional communities. Each psychiatrist was asked to recruit 4 or 5 eligible outpatients. Patients were eligible for the study if they were 18 years or older, met DSM-IV criteria for MDD, and had received 6-8 weeks of antidepressant treatment.

Participants
A total of 1870 patients were initially recruited. Of these, 275 were excluded from the final analysis for the following reasons: change of treatment (9.1%), no second assessment (3.6%), and incomplete or missing data (1.9%). The remaining 1595 patients were included in the analysis.

Procedure
Data collection took place from February to June 2009 after approval was received from the Clinical Research Ethics Committee of the Teknon Foundation. Data were collected during 2 routine visits after written consent was obtained from patients. As the RESIST study was originally designed to evaluate the residual symptoms of depression, patients were required to have been treated with an antidepressant for at least 6 weeks so that the sample would include depressive patients in remission at the first assessment. Assessments were carried out after 6-8 weeks of antidepressant treatment and 10 ± 2 weeks after the first assessment. Treatment with any antidepressant was allowed by the study protocol. The antidepressant prescribed and its dose were entirely at the discretion of the psychiatrist. A change of treatment resulted in exclusion from the study. Interviews were conducted by clinical psychiatrists. Self-reports were completed on the same day, during the clinical interview.

Instruments
The case report form at the first assessment was completed by the psychiatrist and included the DSM-IV criteria for a major depressive episode that patients had to meet. It also collected data on sociodemographic characteristics (age, gender, marital status, employment status, educational level, residence environment) and clinical features of the depressive disorder (age at first depressive episode, duration of episode, number of previous episodes, DSM-IV-TR comorbid psychiatric diagnoses and comorbid medical diseases).
The Spanish translation of the Self-rated Inventory of Depressive Symptomatology (IDS-SR30) [14] was obtained from the authors and is available on the website http://www.ids-qids.org. The existing translation was a South American version, which we adapted to the current use of the language in Spain.
The 21-item Hamilton Depression Rating Scale (HDRS) [8] is a 21-question multiple-choice questionnaire that rates the severity of symptoms observed in depression, such as low mood, agitation, anxiety and weight loss. The clinician chooses among the possible responses to each question by interviewing the patient and observing the patient's symptoms. Each item has between 3 and 5 possible responses of increasing severity. The first 17 questions contribute to the total score, and questions 18-21 are recorded to give further information about the depression, such as whether paranoid symptoms are present. The HDRS was used as the gold standard.

Statistical analyses
The Statistical Package for Social Sciences (SPSS) version 19.0 and Mplus v5.1 were used to carry out the statistical analyses.
Factor analyses. We used the first-assessment IDS-SR30 scores for a principal component analysis (PCA). Then, a set of confirmatory factor analyses (CFAs), based on the exploratory result and on the scientific literature, was performed on a new data set (second-assessment scores) to find the best-fitting factor structure for the instrument. Following the common assumption that the ratio of subjects per variable (item) is central to factor analysis, in both study periods we were able to satisfy the minimum ratio of five participants per item recommended by Kass and Tinsley [32].
Firstly, to make our PCA results comparable with those recently reported by Wardenaar [33], we performed Horn's parallel analysis [34] to determine the number of factors to be retained. This Monte Carlo simulation method, considered more replicable than the traditional extraction techniques (Kaiser's criterion or the scree plot), compares the unrotated eigenvalues with eigenvalues from random data with the same number of cases and variables. PROMAX was used for oblique rotation. To assess the suitability of the data for factor analysis, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy [35] was computed. KMO values above 0.90 are considered excellent. Bartlett's test of sphericity [36] was also applied to examine the extent to which the correlation matrix departed from orthogonality.
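The logic of the parallel analysis can be sketched as follows. This is an illustrative NumPy sketch, not the procedure actually run in SPSS: the function name, the number of simulations, the use of normally distributed random data and the 95th-percentile threshold are all assumptions (Horn's original proposal compared against the mean random eigenvalue).

```python
import numpy as np


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Return the number of components to retain and both eigenvalue sets."""
    data = np.asarray(data, dtype=float)
    n_cases, n_vars = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sims, n_vars))
    for s in range(n_sims):
        # Random data with the same number of cases and variables.
        noise = rng.standard_normal((n_cases, n_vars))
        corr = np.corrcoef(noise, rowvar=False)
        simulated[s] = np.sort(np.linalg.eigvalsh(corr))[::-1]
    # Assumed threshold: a percentile of the random eigenvalues per position.
    threshold = np.percentile(simulated, percentile, axis=0)
    # Retain leading components until one fails to exceed the threshold.
    exceeds = observed > threshold
    n_retain = n_vars if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold
```

Components are counted from the largest eigenvalue downward, so a component is retained only if it and every component before it exceed the corresponding random-data eigenvalue.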
For ordinal items with a non-normal distribution, such as those in the inventory, covariances may be expected to underestimate the true strength of the relations among variables. We therefore estimated the models from the matrix of polychoric correlations [38]. The mean- and variance-adjusted weighted least squares (WLSMV) estimator was applied to test the fit of the factor models.
Although a model with a non-significant chi-square estimate is generally considered a good-fitting model, Hu and Bentler recommended combinational rules to evaluate model fit [39]. Therefore, the following indices were analysed (values in parentheses denote goodness-of-fit standards): the Tucker-Lewis or non-normed fit index (TLI ≥ 0.95), the comparative fit index (CFI ≥ 0.95), and the root mean square error of approximation (RMSEA ≤ 0.08). We also report Akaike's Information Criterion (AIC), a relative fit index especially designed for comparing alternative factor models. The model with the smallest AIC value has the best fit. Used together, these indices provide a more conservative and reliable evaluation of the solution. The goodness of fit index (GFI), the adjusted goodness of fit index (AGFI), the relative fit index (RFI), and the normed fit index (NFI) were not used because these indices would likely be affected by the large sample size. Model comparisons were based on a practical improvement in model fit approach (TLI difference ≥ 0.01) [40,41].
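For readers unfamiliar with these indices, the sketch below shows the textbook formulas for RMSEA, CFI and TLI computed from the chi-square statistics of a fitted model and of the baseline (independence) model. It is illustrative only: the function names are hypothetical, the formulas are the standard maximum-likelihood-style definitions, and values reported by Mplus under WLSMV rest on adjusted statistics and may not be reproducible this way.

```python
import math


def fit_indices(chi2, df, chi2_base, df_base, n):
    """RMSEA, CFI and TLI from model and baseline chi-square statistics."""
    # Noncentrality estimates, floored at zero.
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    rmsea = math.sqrt(d_model / (df * (n - 1)))
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    ratio_base = chi2_base / df_base
    tli = (ratio_base - chi2 / df) / (ratio_base - 1.0)
    return {"RMSEA": rmsea, "CFI": cfi, "TLI": tli}


def meets_cutoffs(idx):
    """Apply the goodness-of-fit standards listed in the text."""
    return idx["TLI"] >= 0.95 and idx["CFI"] >= 0.95 and idx["RMSEA"] <= 0.08
```

With hypothetical inputs (model chi-square 150 on 100 degrees of freedom, baseline chi-square 2100 on 100 degrees of freedom, n = 501), `fit_indices` returns a CFI and TLI of 0.975 and an RMSEA of about 0.032, which `meets_cutoffs` accepts.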
Internal consistency. The stratified Cronbach's α coefficient [42] was used to assess reliability in the presence of subscales. Subscale internal consistency was estimated on the best-fitting model identified by CFA.
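A minimal sketch of the two coefficients follows, assuming the usual definition of stratified α as one minus the sum over subscales of the subscale score variance multiplied by (1 − subscale α), divided by the total score variance. The function names and the column-index representation of subscales are illustrative assumptions, and each subscale is assumed to contain at least two items.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a cases-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)


def stratified_alpha(items, strata):
    """Stratified alpha; strata is a list of column-index lists, one per subscale."""
    items = np.asarray(items, dtype=float)
    total_var = items.sum(axis=1).var(ddof=1)
    penalty = 0.0
    for cols in strata:
        sub = items[:, cols]
        sub_var = sub.sum(axis=1).var(ddof=1)
        # Unreliable variance contributed by this subscale.
        penalty += sub_var * (1.0 - cronbach_alpha(sub))
    return 1.0 - penalty / total_var
```

When all items are placed in a single stratum, the stratified coefficient reduces to the ordinary Cronbach's α.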
Construct validity. Pearson product-moment correlations were computed between the IDS-SR30 factors and the HDRS. We used Cohen's criteria [43] to evaluate the substantive significance of the correlations (large correlations are those > 0.50, medium correlations are from 0.30 to 0.49, and small correlations are from 0.10 to 0.29).
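These criteria can be expressed as a small helper. The sketch is illustrative: the function names are hypothetical, the label "negligible" for values below 0.10 is an assumption, and a correlation of exactly 0.50 is treated as large, which is also an assumption because the text leaves that boundary unspecified.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson product-moment correlation between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def cohen_label(r):
    """Classify the size of a correlation by its absolute value."""
    size = abs(r)
    if size >= 0.50:  # assumption: exactly 0.50 counts as large
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label for values below 0.10
```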

Results
Sociodemographic and clinical characteristics of the total sample are presented in Table 1. The final sample comprised 1595 patients, 553 men (34.6%) and 1042 women (65.3%), with a mean age of 47.7 years (range 18-88). Most participants were married (61%) and lived in an urban residence (72%); 45% were employed. The mean IDS-SR30 total score for the full sample was 34.85 (SD = 15) at first assessment and 15.34 (SD = 11) at second assessment. A paired t test indicated that the difference was statistically significant (t(1593) = 53.98, p < 0.001). Descriptive statistics were computed for all items, as shown in Table 2. Each item was examined in terms of mean, standard deviation, skewness, and kurtosis. Univariate values of about 2.0 or more for skewness and 7.0 or more for kurtosis indicate marked non-normality [44]. Accordingly, item 4 at first assessment and items 3, 4, 13/14, 18, and 27 at second assessment had questionable normality based on their skewness values, whereas item 4 at first assessment and items 4 and 27 at second assessment had questionable normality based on their kurtosis values. With the exception of item 4, all items obtained a corrected item-total correlation higher than the rule-of-thumb minimum value of 0.20 [45].
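The corrected item-total correlation mentioned above correlates each item with the sum of the remaining items, so that the item does not inflate its own correlation. A minimal sketch, with a hypothetical function name and no handling of missing data or of the paired items, might look like this:

```python
import numpy as np


def corrected_item_total(items):
    """Correlation of each item with the total score excluding that item."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])
```

Items whose value falls below the 0.20 rule of thumb would be flagged for inspection.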

Principal component analysis (PCA)
The KMO measure produced a coefficient of 0.96, indicative of excellent sampling adequacy. Bartlett's test of sphericity produced a test statistic of 21669.61 (P < 0.0001), indicating that the correlation matrix was unlikely to be an identity matrix and was therefore suitable for factor analysis. According to Horn's parallel analysis, three components were extracted. Factor loadings (after rotation) are displayed in Table 3. Tabachnick and Fidell [46] suggest that, in exploratory factor analysis, an item forms part of a factor if its loading on that factor is at least 0.32 and at least 0.10 greater than its loadings on the other factors. Three items (items 3, 9, and 29) did not meet these criteria. Given that excluding these three items was unlikely to yield a significant improvement in model fit in the subsequent CFAs, all further analyses were carried out on the full IDS-SR30. The present three-factor solution (labelled Mood/Cognition, Anxiety/Somatic, and Sleep), which accounted for 49.87% of the variance, was more similar to the three-factor structure reported by Rush [14] than to the three-factor structure recently reported by Wardenaar [33]: there were divergences (different primary loading) in 5 items (items 3, 15, 23, 29, and 30) and 8 items (items 3, 6, 7, 11/12, 13/14, 24, 29, and 30), respectively.
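The item-assignment rule attributed to Tabachnick and Fidell can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, loadings are compared by absolute value, and at least two factors are required.

```python
import numpy as np


def assign_items(loadings, min_loading=0.32, min_gap=0.10):
    """Return, per item, the index of its factor or None if the rule fails."""
    # Assumption: the sign of a loading is ignored when comparing sizes.
    magnitudes = np.abs(np.asarray(loadings, dtype=float))
    assignments = []
    for row in magnitudes:
        order = np.argsort(row)[::-1]
        primary, runner_up = row[order[0]], row[order[1]]
        ok = primary >= min_loading and (primary - runner_up) >= min_gap
        assignments.append(int(order[0]) if ok else None)
    return assignments
```

For a hypothetical item loading 0.40, 0.35 and 0.02 on the three factors, the rule fails because the primary loading exceeds the second by only 0.05, so the item is treated as cross-loading.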

Confirmatory factor analyses (CFAs)
Fit statistics for the factor models are shown in Table 4. Although all models showed RMSEA values that are considered acceptable (< 0.08), the one-factor and two-factor solutions failed to provide good fit when the other indices were taken into account. Considering the findings collectively, all three-factor models seem adequate for the current sample. However, following Hu and Bentler's guidelines for retaining a hypothesized model, the factor structure posited by Wardenaar [33] was identified as the best-fitting model.

Internal consistency
Prior to examining the internal consistency and convergent validity of the IDS-SR30, the three items that had cross-loadings (9, 24, and 30) were assigned to the subscale on which they had the highest factor loading in the CFA in order to compute the subscale scores. Thus, items 9 and 30 were assigned to the Mood/Cognition subscale, whereas item 24 was assigned to the Anxiety/Somatic subscale. The results of the reliability analysis are shown in Table 5. The stratified overall Cronbach's coefficient was 0.94 at both the first and second assessments. Cronbach's α coefficient ranged from 0.54 to 0.93 in the three subscales derived from Wardenaar's factor model. If item 4 were deleted, the Cronbach's α of the Sleep subscale would be 0.73 and 0.64 at first and second assessment, respectively.

Convergent validity
The IDS-SR30 would show adequate convergent validity if the total and subscale scores correlated significantly with the HDRS. As can be seen in Table 6, the correlations between Mood/Cognition, Anxiety/Somatic, Sleep, the IDS-SR30 total score and the HDRS were all significant and large at both assessments.

Discussion
The main conclusion of our study is that the Spanish version of the IDS-SR has good psychometric properties and is a useful tool for evaluating depressive symptomatology in the Spanish population.
The rationale for and psychometric properties of the IDS (clinician and self-rated versions) have been discussed previously [14,16]. Both versions of the scale attempt to address limitations of the HDRS and MADRS, which do not cover all of the diagnostic symptom criteria for Major Depressive Disorder (MDD) or for depressive subtypes (e.g. melancholic or atypical features).
Good evidence of the correspondence between individual items and the total IDS-SR30 score was found. All items were consistent with the scale except for the item that measures hypersomnia, which showed no correlation with the total score. Internal consistency of the Spanish-language version of the IDS-SR30 was good (α = 0.94). Other adaptations and validity studies have also reported on the internal consistency of this instrument: (α = 0.77) [14], (α = 0.79) [22], (0.83) [46], (α = 0.92) [24], (α = 0.93) [25]. In our study, the correlations between the IDS-SR (total and subscale scores) and the HDRS at both assessments were strong (0.85). These results were similar to previous findings [14,16,47] and indicate good convergent validity. The present study used CFA to identify the best-fitting structural model of the IDS-SR. The unidimensional structure was not supported by the exploratory or confirmatory analyses. Although an initial PCA indicated that our 3-factor solution was more similar to the 3-factor structure reported by Rush [14], the CFA identified the Wardenaar factor structure [33] as the best-fitting model of the IDS-SR30.
Our results have some implications. On the one hand, the IDS-SR30 does not seem to be a unidimensional measure of depression severity. On the other hand, the items that measure atypical features (hypersomnia, appetite and weight increase, interpersonal sensitivity and leaden paralysis) and some of the items that characterize endogenous/melancholic depression (diurnal mood variation, appetite and weight decrease) seem to be the most psychometrically problematic items. Further analyses are needed to identify a different and more complete factorial structure that includes depressive subtypes. The heterogeneity of patients suffering from depressive disorders adds complexity to the study and design of well-suited instruments for measuring depressive symptomatology.
A strength of the present study is the large and representative sample, which makes the results generalizable to depressive Spanish outpatients. Second, all the analyses were conducted in patients who met MDD criteria and had received 6-8 weeks of antidepressant treatment, which allowed all degrees of severity to be represented. Third, a 3-factor model emerged from the statistical analysis, as recommended by the authors of the original version [14]. This allows the results to be compared with those of similar studies. The main limitation of this study is that we did not administer the questionnaire in primary care. Further analyses in other settings should reveal whether our results can be generalized to depressive inpatients or to primary care settings.

Conclusions
In conclusion, our findings from this first validation study indicate that the Spanish version of the IDS-SR has highly acceptable psychometric properties, corroborating the results of the original English version [14,48] and of other studies and translations. The IDS-SR30 is a

Conflict of interests
The RESIST project was made possible by an unrestricted grant from Almirall Spain. The sponsor had no role in the design and conduct of the study, the analysis, the writing of the report or the decision to submit the paper for publication. MR has received grant/research support from Almirall. SA is an employee of the Almirall Medical Department. The rest of the authors report no competing interests.