Measurement properties of translated versions of neck-specific questionnaires: a systematic review

Abstract

Background

Several disease-specific questionnaires to measure pain and disability in patients with neck pain have been translated. However, a simple translation of the original version does not guarantee similar measurement properties. The objective of this study is to critically appraise the quality of the translation process, cross-cultural validation and the measurement properties of translated versions of neck-specific questionnaires.

Methods

Bibliographic databases were searched for articles concerning the translation or evaluation of the measurement properties of a translated version of a neck-specific questionnaire. The methodological quality of the selected studies and the results of the measurement properties were critically appraised and rated using the COSMIN checklist and criteria for measurement properties.

Results

The search strategy resulted in a total of 3641 unique hits, of which 27 articles, evaluating 6 different questionnaires in 15 different languages, were included in this study. Generally the methodological quality of the translation process is poor and none of the included studies performed a cross-cultural adaptation. A substantial amount of information regarding the measurement properties of translated versions of the different neck-specific questionnaires is lacking. Moreover, the evidence for the quality of measurement properties of the translated versions is mostly limited or assessed in studies of poor methodological quality.

Conclusions

Until results from high quality studies are available, we advise using the Catalan, Dutch, English, Iranian, Korean, Spanish and Turkish versions of the NDI, the Chinese version of the NPQ, and the Finnish, German and Italian versions of the NPDS. The Greek NDI needs cross-cultural validation and there is no methodologically sound information for the Swedish NDI. For all other languages we advise translating the original version of the NDI.

Background

Several disease-specific questionnaires have been developed to measure pain and disability in patients with neck pain (e.g. Neck Disability Index (NDI), Neck Pain and Disability Scale (NPDS)) [1, 2]. To make them suitable for use in other languages, several of these neck-specific questionnaires have been translated. However, a simple translation of the original version does not guarantee similar measurement properties, because differences in cultural context have to be taken into account as well [3, 4].

Previous reviews of neck-specific questionnaires have not paid sufficient attention to possible differences in performance, caused by differences in cultural context, and combine the results of studies that evaluate measurement properties of different language versions of the same questionnaire [5, 6]. This may lead to inconsistent results for measurement properties, as was demonstrated in a recent review of the cross-cultural adaptations of the McGill Pain Questionnaire [7].

Since it is possible that the measurement properties of neck-specific questionnaires vary between different nationalities, we decided to evaluate them per language. This reduces inconsistency in results due to cultural differences and also facilitates a choice for the best questionnaire per language. The measurement properties of original versions of the different neck-specific questionnaires were evaluated in a separate systematic review. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted)

The purpose of this study is to critically appraise the quality of the translation process, cross-cultural validation and the measurement properties of translated versions of neck-specific questionnaires.

Methods

Search strategy

We searched the following computerised bibliographic databases: Medline (1966 to July 2010), EMbase (1974 to July 2010), CINAHL (1981 to July 2010), and PsycINFO (1806 to July 2010). We used the index terms "neck", "neck pain", and "neck injuries/injury" in combination with "research measurement", "questionnaire", "outcome assessment", "psychometry", "reliability", "validity", and derivatives of these terms. The full search strategy used in each database is available upon request from the corresponding author. Reference lists were screened to identify additional relevant studies.

Selection criteria

A study was included if it was a full text original article (e.g. not an abstract, review or editorial), published in English, concerning the translation or evaluation of the measurement properties of a translated version of a neck-specific questionnaire. The questionnaire had to be self-reported, evaluating pain and/or disability, and specifically developed or adapted for patients with neck pain.

For inclusion, neck pain had to be the main complaint of the study population. Accompanying complaints (e.g. low back pain or shoulder pain) were no reason for exclusion, as long as the main focus was neck pain. Studies considering study populations with a specific neck disorder (e.g. neurological disorder, rheumatological disorder, malignancy, infection, or fracture) were excluded, except for patients with cervical radiculopathy or whiplash associated disorder (WAD).

Two reviewers (JMS, APV) independently assessed the titles, abstracts, and reference lists of studies retrieved by the literature search. In case of disagreement between the two reviewers, there was discussion to reach consensus. If necessary, a third reviewer (HCV) made the decision regarding inclusion of the article.

Measurement properties

The measurement properties are divided over three domains: reliability, validity, and responsiveness [8]. In addition, the interpretability is described.

Reliability

Reliability is defined as the extent to which scores for patients who have not changed are the same for repeated measurement under several conditions: e.g. using different sets of items from the same questionnaire (internal consistency); over time (test-retest); by different persons on the same occasion (inter-rater); or by the same persons on different occasions (intra-rater) [8].

Reliability contains the following measurement properties:

  • Internal consistency: The interrelatedness among the items in a questionnaire, expressed by Cronbach's α or Kuder-Richardson Formula 20 (KR-20) [8, 9].

  • Measurement error: The systematic and random error of a patient's score that is not attributed to true changes in the construct to be measured, expressed by the standard error of measurement (SEM) [8, 10]. The SEM can be converted into the smallest detectable change (SDC) [10]. Changes exceeding the SDC can be labeled as change beyond measurement error [10]. Another approach is to calculate the limits of agreement (LoA) [11]. For determining the adequacy of measurement error the SDC and/or LoA is related to the minimal important change (MIC) [12].

  • Reliability: The proportion of the total variance in the measurements which is due to 'true' differences between patients [8]. This aspect is reflected by the Intraclass Correlation Coefficient (ICC) or Cohen's Kappa [8, 13].
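
As a rough illustration of the statistics named above, the sketch below computes Cronbach's α from a matrix of item scores and converts an SEM into an SDC. The function names are hypothetical. The formulas SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM are the commonly used definitions and are assumed here; they are not taken from the included studies.

```python
import math
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total (sum) score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def sem_from_icc(sd, icc):
    """Standard error of measurement from the score SD and the ICC (assumed formula)."""
    return sd * math.sqrt(1 - icc)


def sdc_from_sem(sem):
    """Smallest detectable change for an individual patient (assumed 95% level)."""
    return 1.96 * math.sqrt(2) * sem
```

With these definitions, a questionnaire with a large SEM yields an SDC that can exceed the MIC, in which case changes of clinically important size cannot be distinguished from measurement error.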

Validity

Validity is the extent to which a questionnaire measures the construct it is supposed to measure and contains the following measurement properties [8]:

  • Content validity: The degree to which the content of a questionnaire is an adequate reflection of the construct to be measured [8]. Important aspects are whether all items are relevant for the construct, aim, and target population and if no important items are missing (comprehensiveness) [14].

  • Criterion validity: The extent to which scores on an instrument are an adequate reflection of a gold standard [8]. Since a real gold standard for health status questionnaires is not available, [14] we will not evaluate criterion validity.

  • Construct validity is divided into three aspects:

    • Cross-cultural validity: The degree to which the performance of the items on a translated or culturally adapted instrument is an adequate reflection of the performance of the items of the original version of the instrument [8]. This is assessed by means of multi-group factor analysis or differential item functioning using data from a population that completed the questionnaire in the original language, as well as data from a population that completed the questionnaire in the new language.

    • Structural validity: The degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured [8]. Factor analysis should be performed to confirm the number of subscales present in a questionnaire [14].

    • Hypothesis testing: The degree to which a particular measure relates to other measures in a way one would expect if it is validly measuring the supposed construct, i.e. in accordance with predefined hypotheses about the correlation or differences between the measures [8].

Responsiveness

Responsiveness is the ability of an instrument to detect change over time in the construct to be measured [8]. Responsiveness is considered an aspect of validity, in a longitudinal context [14]. Therefore, the same standards apply as for validity: the correlation between change scores of two measures should be in accordance with predefined hypotheses [14]. Another approach is to consider the measurement instrument as a diagnostic test to distinguish improved and non-improved patients. The responsiveness of the instrument is then expressed as the area under the receiver operating characteristic curve (AUC) [14].
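
The AUC approach described above can be sketched as follows. This is a minimal illustration, assuming that patients have already been classified as improved or non-improved by an external anchor (for example a global rating of change); the function name is hypothetical. It uses the fact that the AUC equals the probability that a randomly chosen improved patient has a larger change score than a randomly chosen non-improved patient, with ties counted as one half.

```python
def auc_from_change_scores(improved, not_improved):
    """AUC for distinguishing improved from non-improved patients by change score."""
    wins = 0.0
    for a in improved:
        for b in not_improved:
            if a > b:
                wins += 1.0   # improved patient changed more: concordant pair
            elif a == b:
                wins += 0.5   # tie counts as half
    return wins / (len(improved) * len(not_improved))
```

An AUC of 0.5 means the change scores do no better than chance at separating the two groups, and 1.0 means perfect separation.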

Interpretability

Interpretability is the degree to which one can assign qualitative meaning to quantitative scores [8]. This means that investigators should provide information about clinically meaningful differences in scores between subgroups, floor and ceiling effects, and the MIC [14]. Interpretability is not a measurement property, but an important characteristic of a measurement instrument [8].

Quality assessment

Assessment of the methodological quality of the selected studies was carried out using the COSMIN checklist [9]. The COSMIN checklist consists of nine boxes with methodological standards for how each measurement property should be assessed. Each item was scored on a 4-point rating scale (i.e. "poor", "fair", "good", or "excellent", see http://www.cosmin.nl). An overall score for the methodological quality of a study was determined by taking the lowest rating of any of the items in a box. The methodological quality of a study was evaluated per measurement property. Special attention was paid to the methodological quality of the translation process and cross-cultural validation. The COSMIN box concerning this measurement property is presented in Table 1.

Table 1 Methodological criteria for the translation process and cross-cultural validation [9]

Data extraction and assessment of (methodological) quality were performed by two reviewers (JMS, CBT) independently. In case of disagreement between the two reviewers, there was discussion in order to reach consensus. If necessary, a third reviewer (HCV) made the decision.
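
The "lowest rating counts" rule for scoring a COSMIN box, as described above, can be sketched as follows. The function and constant names are hypothetical, and the item ratings in any call are illustrative rather than taken from an included study.

```python
# Ratings ordered from worst to best, as on the 4-point COSMIN scale.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings):
    """Overall methodological quality of one COSMIN box: the lowest item rating."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    return min(item_ratings, key=RATING_ORDER.index)
```

Under this rule a single item rated "poor" makes the whole box "poor", regardless of how the remaining items were rated.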

Best evidence synthesis - levels of evidence

To determine the overall quality of the measurement properties of the different questionnaires we synthesized the different studies per language by combining their results, adjusted for methodological quality of the studies and the consistency of their results. The possible overall rating for a measurement property is "positive", "indeterminate", or "negative", accompanied by levels of evidence, similar to those proposed by the Cochrane Back Review Group (see Table 2) [15, 16].

Table 2 Levels of evidence for the overall quality of the measurement property [16]

To assess whether the results of the measurement properties were positive, negative, or indeterminate, we used criteria based on Terwee et al. (see Table 3) [17].

Table 3 Quality criteria for measurement properties (based on Terwee et al. [17])
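
Since the content of Table 3 is not reproduced here, the following sketch shows one way such criteria could be applied to individual results. The cut-off values (0.70 for ICC, 0.70 to 0.95 for Cronbach's α, SDC smaller than MIC, and at least 75% of hypotheses confirmed) are assumptions based on commonly cited criteria and may differ in detail from the table; all function names are hypothetical.

```python
def rate_reliability(icc):
    """Assumed criterion: ICC of at least 0.70 is positive."""
    return "positive" if icc >= 0.70 else "negative"


def rate_internal_consistency(alpha, unidimensional):
    """Alpha is only interpretable for a unidimensional (sub)scale."""
    if not unidimensional:
        return "indeterminate"
    # Assumed criterion: alpha between 0.70 and 0.95.
    return "positive" if 0.70 <= alpha <= 0.95 else "negative"


def rate_measurement_error(sdc, mic=None):
    """Without a defined MIC the SDC cannot be judged."""
    if mic is None:
        return "indeterminate"
    # Assumed criterion: the SDC must be smaller than the MIC.
    return "positive" if sdc < mic else "negative"


def rate_hypothesis_testing(confirmed, total):
    """Assumed criterion: at least 75% of predefined hypotheses confirmed."""
    if total == 0:
        return "indeterminate"
    return "positive" if confirmed / total >= 0.75 else "negative"
```

For example, an SDC of 10.5 with an MIC of 3.5 would be rated negative under this sketch, and an SDC without a defined MIC would be rated indeterminate.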

Results

The search strategy resulted in a total of 3641 unique hits, of which 119 articles were selected based on their title and abstract. The full text assessment resulted in exclusion of another 68 articles. Reference checking did not result in additional articles. Twenty-four articles concerned original versions of neck-specific questionnaires, which were evaluated in a separate systematic review. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted) Finally, 27 articles on translated questionnaires, evaluating 6 different questionnaires in 15 different languages, were included in this study (see Figure 1).

Figure 1 Flowchart of search and selection.

The general characteristics of these studies are presented in Table 4. None of the included studies performed a cross-cultural validation (Table 1, items 14 and 15), i.e. no studies performed multi-group factor analysis or differential item functioning. Therefore, we were only able to rate the methodological quality of the translation process (Table 1, items 4-11). The methodological quality of the studies is presented in Table 5 for each measurement property, arranged per language. Generally the methodological quality of the studies was poor to fair. The synthesis of the results per questionnaire and their accompanying level of evidence is presented in Table 6 for each language. For each questionnaire, except for the Iranian NPDS and Spanish NDI, at least half of the information regarding measurement properties is lacking. Moreover, the evidence for the quality of measurement properties is mostly limited, due to methodological shortcomings of the included studies.

Table 4 General information per study
Table 5 Methodological quality of each study per measurement property
Table 6 Quality of the measurement properties per language and questionnaire

Below we will discuss the results for the different questionnaires per language. The results regarding measurement properties from studies of poor methodological quality are not mentioned [18–24].

Catalan

The NDI is the only neck-specific questionnaire that has been translated into Catalan [25]. The NDI was originally designed to measure activities of daily living (ADL) in patients with neck pain [1]. The methodological quality of the translation process is poor [25]. Confirmatory factor analysis showed that the NDI is not unidimensional and there is limited evidence that the NDI has a 2-factor structure [25]. Assuming a 2-factor structure, there is moderate positive evidence for internal consistency: Cronbach's α is 0.70 for "pain and interference with cognitive functioning" and 0.83 for "functional disability" [25]. There is a positive correlation (r = 0.51) between the NDI and the Pain Intensity Index [25].

The available evidence on measurement properties of the Catalan NDI is positive, despite the poor methodological quality of the translation process.

Chinese

The Northwick Park Neck Pain Questionnaire (NPQ) is the only neck-specific questionnaire that has been translated into Chinese [26–28]. The NPQ was originally designed to measure the influence of non-specific neck pain on daily activities [29]. The methodological quality of the translation process is poor [26].

There is strong positive evidence for the reliability of the NPQ (ICC = 0.95) [26]. Hypothesis testing resulted in moderate positive evidence for correlation between the NPQ and instruments measuring pain and physical functioning (r = 0.59-0.75) [26, 27]. Differences in score between subgroups have been reported (e.g. healthy persons vs. neck pain patients, and patients who sought medical consultation vs. those who did not) [26]. The average time needed to fill out the NPQ is 5.5 minutes [26].

The available information on measurement properties of the Chinese NPQ looks promising, despite the poor methodological quality of the translation process.

Dutch

The NDI, NPDS, and Neck Bournemouth Questionnaire (NBQ) have been translated into Dutch [19, 29–31]. The NPDS was originally designed to measure pain and disability in patients with neck pain [2]. The NBQ was originally designed to measure pain, physical functioning, social functioning, and psychological functioning in patients with non-specific neck pain [32]. The translation process of the NDI is not described, so the quality of this process is unknown. The methodological quality of the translation process of the NPDS is fair, [19] and of the NBQ is excellent [30].

There is limited positive evidence for the reliability of the NDI (ICC = 0.90), [31] and for responsiveness (sensitivity = 0.9 and specificity = 0.7 for a clinically important change of 3.5) [29]. There is limited negative evidence for its measurement error (MIC = 3.5 and SDC = 10.5 on a 0-50 scale) [29]. There is limited positive evidence for the reliability of the NBQ (ICC = 0.92) [30]. The result for measurement error of the NBQ is indeterminate, because the MIC is not defined [30]. No floor or ceiling effects have been detected for the NDI or NBQ, and for both questionnaires differences in score between subgroups have been reported (men vs. women) [30, 31].

The lack of information derived from these studies makes it difficult to point out the best available neck-specific questionnaire in Dutch. Based on the information available on the measurement properties of the original version of the NDI and NBQ, we advise using the Dutch NDI. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted)

English

The Copenhagen Neck Functional Disability Scale (CNFDS), originally Danish, is the only neck-specific questionnaire that has been translated into English [33]. The CNFDS was originally designed to measure disability in patients with neck pain [34]. The translation process is not described, so the quality of this process is unknown. There is limited positive evidence for the responsiveness of the CNFDS (AUC = 0.73) [33]. Many neck-specific questionnaires have originally been developed in English. We advise using one of these questionnaires, preferably the NDI. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted)

Finnish

The NDI and NPDS have been translated into Finnish [35]. The methodological quality of the translation process of these questionnaires is poor [35].

There is moderate evidence that the NDI is not one-dimensional and that the NPDS has a 3-factor structure [35]. The result for internal consistency of the NDI is indeterminate, because the authors unjustifiably assume a 1-factor model [35]. There is strong positive evidence for the internal consistency of the NPDS (Cronbach α = 0.82-0.84) [35]. No floor or ceiling effects have been detected for the NDI or NPDS and for both questionnaires differences in score between subgroups have been reported (stable vs. improved patients) [35].

The available information suggests that the Finnish NPDS has better measurement properties than the Finnish NDI.

French

The following neck-specific questionnaires have been translated into French: NDI, [20, 36] NPDS, [20, 36] NBQ, [37] NPQ, [20, 36] and CNFDS [18]. The methodological quality of all these translation processes is poor [18, 36, 37].

There is limited evidence that the NDI has a 2-factor structure [20]. Hypothesis testing showed that the correlation of the NDI with an instrument measuring psychological functioning is somewhat higher (r = 0.55), than with instruments measuring pain (r = 0.48), and physical functioning (r = 0.50) [20]. There is limited evidence that the NPDS has a 3-factor structure [20]. Hypothesis testing showed a positive result for correlation of the NPDS with instruments measuring pain (r = 0.52), and physical functioning (r = 0.63), and a negative result (results slightly below the pre-set criterion of r = 0.5) for correlation with instruments measuring psychological functioning (r = 0.40-0.49) [20]. Hypothesis testing showed a positive result for correlation of the NBQ with an instrument measuring pain and physical functioning (r = 0.61-0.67), and a negative result for correlation with an instrument measuring psychological functioning (r = 0.17-0.25) [37]. There is limited negative evidence for the responsiveness of the NBQ (r = 0.42) [37]. There is limited evidence that the NPQ has a 2-factor structure [20]. Hypothesis testing showed a positive result for correlation of the NPQ with an instrument measuring physical functioning (r = 0.53), and a negative result for correlation with an instrument measuring pain (r = 0.43) [20].

No floor or ceiling effects have been detected for the NDI, NPDS, and NPQ [20, 36]. The average time needed to fill out the NDI, NPDS, and NPQ is 7.4, 6.4, and 7.2 minutes, respectively [36].

The lack of information derived from these studies makes it difficult to point out the best available neck-specific questionnaire in French. Based on the information available on the measurement properties of the original version of the NDI, NPDS, NBQ, NPQ, and CNFDS, we advise developing a high quality translation of the NDI. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted)

German

The NPDS is the only neck-specific questionnaire that has been translated into German [24, 38]. There are two translations of the NPDS in German: one translation process of poor and one of fair methodological quality [24, 38].

Factor analysis provided moderate evidence that the NPDS has a 3-factor structure [38]. The result for internal consistency is indeterminate, [38] because the authors unjustifiably assume a 1-factor model. There is moderate positive evidence for hypothesis testing (>75% of results in accordance with predefined hypotheses) [38]. No floor or ceiling effects have been detected for the NPDS [38].

The available information on measurement properties of the German NPDS looks promising, despite the poor methodological quality of the translation process.

Greek

The NDI is the only neck-specific questionnaire that has been translated into Greek [39]. The methodological quality of the translation process is good [39].

Exploratory factor analysis provided moderate evidence that the NDI does not have a 1-factor structure [39]. The result for internal consistency is indeterminate, [39] because the authors unjustifiably assume a 1-factor model. There is limited negative evidence for responsiveness (r = 0.30 with Global Rating of Change) [39].

Based on the good quality of the translation process and the negative results for unidimensionality and responsiveness, we advise performing a cross-cultural validation of the Greek NDI.

Hindi

The NPDS is the only neck-specific questionnaire that has been translated into Hindi [40]. The methodological quality of the translation process is fair [40].

Hypothesis testing showed a positive result for correlation of the NPDS with an instrument measuring psychological functioning (r = 0.80), and a negative result for correlation with an instrument measuring pain (r = 0.30), and an instrument measuring physical functioning (r = 0.15). The average time needed to fill out the NPDS was 8 minutes [40].

Based on the information derived from this study, we advise developing a high quality translation of the NDI.

Iranian

The NDI and NPDS have been translated into Iranian [41]. The methodological quality of the translation processes is excellent [41].

There is limited positive evidence for the internal consistency (Cronbach alpha = 0.88, assuming a 1-factor structure), reliability (ICC = 0.97), and responsiveness (r = 0.65 for physical functioning and r = 0.70 for pain) of the NDI [41]. Exploratory factor analysis resulted in limited positive evidence for a 4-factor structure of the NPDS [41]. There is limited positive evidence for internal consistency (Cronbach alpha = 0.75-0.94 for the four subscales), and reliability (ICC = 0.97) [41]. There is limited negative evidence for responsiveness of the NPDS, because correlation with change scores on instruments measuring the same constructs was lower than correlation with instruments measuring other constructs [41]. No floor or ceiling effects have been detected for the NDI or NPDS [41].

The Iranian NDI and NPDS both seem to have adequate measurement properties, but we advise using the NDI, based on the negative result for responsiveness of the NPDS and the good measurement properties of the original version of the NDI. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted)

Italian

The NPDS is the only neck-specific questionnaire that has been translated into Italian [42]. The methodological quality of the translation process is poor [42].

There is limited evidence that the NPDS has a 3-factor structure (variance = 63%) [42]. A confirmatory analysis with 4 factors showed a small improvement in variance (67%) [42]. Assuming a 3-factor structure, there is limited positive evidence for internal consistency: Cronbach α was 0.92 for "neck dysfunction related to general activities", 0.86 for "cognitive-behavioral aspects", and 0.89 for "neck dysfunction related to activities of the cervical spine" [42]. There is limited positive evidence for the reliability of the NPDS (r = 0.89-0.93) [42]. The average time needed to fill out the NPDS is 7.5 minutes [42].

The available information on measurement properties of the Italian NPDS looks promising, despite the poor methodological quality of the translation.

Korean

The NDI and NPDS have been translated into Korean [43]. The methodological quality of the translation processes is poor [43].

There is limited positive evidence regarding the internal consistency of the NDI (Cronbach α = 0.92, assuming a 1-factor structure) [43]. No floor or ceiling effects have been detected for the NDI or NPDS and differences in score between subgroups have been reported (neck pain patients vs. healthy persons) [43].

Lack of information makes it difficult to point out whether the Korean NDI or NPDS has the best measurement properties. Based on the information available on the measurement properties of the original version of the NDI and NPDS, we advise using the Korean NDI. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted)

Spanish

The NDI, NPQ, and Core Neck Questionnaire (CNQ) have been translated into Spanish [23, 44]. The CNQ was originally designed to measure outcomes of care in patients with non-specific neck pain [45]. The methodological quality of the translation process of the NPQ is poor, [44] and of the NDI and CNQ is excellent [23].

There is limited positive evidence for a 1-factor structure of the NDI and its internal consistency (Cronbach α = 0.89) [46]. Hypothesis testing showed a positive result for correlation of the NDI with an instrument measuring pain (r = 0.65), and an instrument measuring physical functioning (r = 0.89) [46]. There is limited positive evidence for the responsiveness of the NDI [46]. There is limited negative evidence regarding the reliability of the NPQ (ICC = 0.63) [44]. No floor or ceiling effects have been detected for the NDI, NPQ, or CNQ, and scores across different categories of pain intensity have been reported [23]. The average time needed to fill out the NDI and CNQ is 4.0 and 2.1 minutes, respectively [23].

Based on the available information, we advise using the Spanish NDI.

Swedish

The NDI is the only neck-specific questionnaire that has been translated into Swedish [22]. The methodological quality of the translation process is unknown. No floor or ceiling effects have been detected for the NDI [22].

Based on the lack of information, we advise performing high quality studies to fill in the missing information on the measurement properties of the Swedish NDI.

Turkish

The following neck-specific questionnaires have been translated and evaluated in Turkish: NDI, [47, 48] NPDS, [21, 48] NPQ, [48] and CNFDS [48]. There are two translations of the NDI in Turkish: one translation process was of excellent methodological quality, [47] and one of fair methodological quality [48]. There are two translations of the NPDS as well: one translation process was of poor methodological quality, [21] and one of fair methodological quality [48]. The translation processes of the NPQ and CNFDS are both of fair methodological quality [48].

There is moderate positive evidence for the reliability of the NDI (ICC = 0.86-0.98), [47, 48] and limited positive evidence for hypothesis testing (r = 0.66-0.73 with instruments measuring pain and/or disability) and responsiveness (r = 0.79, with a physician's assessment of health) [47, 48]. There is limited positive evidence for the reliability of the NPDS, NPQ, and CNFDS (ICC = 0.81, 0.85, and 0.84, respectively) and for their responsiveness (r = 0.79, 0.81, and 0.65, respectively, with a physician's assessment of health on a scale of 0 to 100) [48].

The average time needed to fill out the NDI, NPDS, NPQ, and CNFDS is 8.8, 10.2, 8.4, and 6.8 minutes, respectively [48]. All 4 translated questionnaires show promising results, but we advise using the NDI, because of the excellent methodological quality of the translation process and the good measurement properties of the original version. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted)

Discussion

Translated versions of neck-specific questionnaires have been evaluated in 15 different languages. Generally the methodological quality of the translation process is poor, which was mainly due to the fact that the translated version was not pre-tested in the target population. Furthermore, none of the included studies performed a cross-cultural validation. This is necessary to evaluate whether the constructs underlying the original questionnaire are represented adequately by the questionnaire items in the new language. For each questionnaire, except for the Iranian NPDS and Spanish NDI, at least half of the information regarding measurement properties was lacking. Moreover, the evidence for the quality of measurement properties of the translated versions is mostly limited, due to methodological shortcomings of the included studies.

The COSMIN checklist has recently been developed and is based on consensus between experts in the field of health status questionnaires [9]. The COSMIN checklist facilitates a separate judgment of the methodological quality of the included studies and their results. This is in line with the methodology of systematic reviews of clinical trials [15]. The criteria in Table 2 are based on the levels of evidence as previously proposed by the Cochrane Back Review Group [16]. The criteria are originally meant for systematic reviews of clinical trials, but we believe that they are also applicable for reviews on measurement properties of health status questionnaires.

Exclusion of non-English papers may introduce selection bias. However, the leading journals, and as a consequence the most important studies, are published in English. So, research performed in populations with a different native language is generally still published in English. This is illustrated by the large number of articles we retrieved regarding translations of neck-specific questionnaires (see Figure 1). Thus, we argue that the most important translations have been included in our study.

Many studies showed similar methodological shortcomings. Aspects that need to be improved include the assessment of unidimensionality in internal consistency analysis, the use of stable patients and similar test conditions in studies on reliability and measurement error, and the use of predefined hypotheses in studies on construct validity and responsiveness. We do not discuss these flaws here, because we have elaborated on this subject in a separate paper. (Terwee CB, Schellingerhout JM, Verhagen AP, de Vet HC, Koes BW: Assessing the measurement properties of neck disability questionnaires: room for improvement, submitted)
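
As a rough illustration of the internal consistency analysis mentioned above, the sketch below computes Cronbach's alpha from item scores. It is not taken from any of the included studies: the function name `cronbach_alpha` and the example scores are hypothetical, and sample variances are assumed. A high alpha does not by itself demonstrate unidimensionality, which needs a separate check such as factor analysis.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Hypothetical helper for illustration only; uses sample variances.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents (columns of the score matrix).
    item_variances = [variance(column) for column in zip(*scores)]
    # Variance of the total score per respondent (row sums).
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores for three respondents on two items (not study data).
example_scores = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example_scores)
```

With real questionnaire data, each row would hold one patient's item scores, for example the ten items of the NDI.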

We pooled the results per language, which neglects the fact that populations might share the same language but differ in cultural context [3]. However, we think that this did not affect our results: the only inconsistency in results for the same language version was found for the Chinese NPQ, and the populations in the two studies evaluating the Chinese NPQ came from the same region in China and were similar in context [26, 27].

A systematic review of the measurement properties of the original versions of neck-specific questionnaires showed that for each questionnaire, except for the NDI, at least half of the information regarding measurement properties was lacking. The available results were mainly positive, but the evidence was mostly limited. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted) This systematic review of translated questionnaires shows similar findings, except that the results for construct validity and responsiveness are more frequently inconsistent or negative. These inconsistencies are consistent with those found for translations of the McGill Pain Questionnaire [7]. A possible explanation for this difference in results between original questionnaires and their translated counterparts is the poor methodological quality of the translation process and/or lack of cross-cultural validation [3, 4].

A poor translation process and/or lack of cross-cultural validation seem to primarily affect the validity of the questionnaire. This is illustrated by the differences found between the results for structural validity of the translated versions and their original counterparts, and the negative/inconsistent results for hypothesis testing of the translated questionnaires. This is not surprising, as the importance and/or meaning of questionnaire items (e.g. driving, depressed mood) may depend on setting and context. So, a simple translation of the original questionnaire is not sufficient and might affect the underlying constructs. The translation process does not seem to affect the reliability of the questionnaire. This is illustrated by the fact that 95% of the results for internal consistency and reliability are positive, regardless of the methodological quality of the translation process.

A recent review concluded that the translated versions of the NDI into Brazilian-Portuguese, Dutch, French, Korean, and Spanish are of high quality [6]. A possible explanation for discrepancies with our findings is that the methodological quality of the translation process was not taken into account in that review. The same applies to a state-of-the-art review of the NDI, in which a list of available translations is recommended without critical appraisal of the quality of the translation process and cross-cultural validation, or of the quality of the measurement properties [5].

This study evaluates the measurement properties of translated versions of neck-specific questionnaires, thereby providing an overview of their availability and making it possible to choose the best questionnaire for a specific study population. However, it is advisable to use them cautiously, since the evidence is mostly limited and for each of these translations, except for the Spanish NDI, at least half of the information regarding measurement properties is lacking. For clinical research and practice we advise using the following questionnaires: the Catalan, Dutch, English, Iranian, Korean, Spanish and Turkish versions of the NDI, the Chinese version of the NPQ, and the Finnish, German and Italian versions of the NPDS. This advice is based on the available results for the measurement properties of these translations, and in the case of the Dutch, English, and Korean NDI on the measurement properties of the original version. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted) The Greek NDI needs cross-cultural validation and, due to the poor methodological quality of the available study, there is no methodologically sound information on the Swedish NDI. For all other languages it is advisable to first choose the best available original version of the neck-specific questionnaires and then perform a high quality translation of this questionnaire. Our previous systematic review on the original versions of all neck-specific questionnaires showed that the NDI was the best questionnaire. (Schellingerhout JM, Heymans MW, Verhagen AP, De Vet HC, Koes BW, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review, submitted)

For future research we recommend performing high quality studies to fill in the information on the unknown measurement properties.

Conclusion

Translated versions of neck-specific questionnaires have been evaluated in 15 different languages. Generally, the methodological quality of the translation process is poor and none of the included studies performed a cross-cultural validation. A substantial amount of information regarding the measurement properties of translated versions of the different neck-specific questionnaires is still lacking or was assessed in studies of poor methodological quality. As a result, the available evidence on the measurement properties is mostly limited, and it is advisable to use the available translated questionnaires cautiously. For the time being we advise using the following questionnaires in clinical research and practice: the Catalan, Dutch, English, Iranian, Korean, Spanish and Turkish versions of the NDI, the Chinese version of the NPQ, and the Finnish, German and Italian versions of the NPDS. The Greek NDI needs cross-cultural validation and there is no methodologically sound information for the Swedish NDI. Studies of high methodological quality are needed to fill in the unknown measurement properties.

For all other languages we advise translating the original version of the NDI.

References

  1. Vernon H, Mior S: The Neck Disability Index: a study of reliability and validity. J Manipulative Physiol Ther. 1991, 14: 409-415.

  2. Wheeler AH, Goolkasian P, Baird AC, Darden BV: Development of the Neck Pain and Disability Scale. Item analysis, face, and criterion-related validity. Spine. 1999, 24: 1290-1294. 10.1097/00007632-199907010-00004.

  3. Beaton DE, Bombardier C, Guillemin F, Ferraz MB: Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000, 25: 3186-3191. 10.1097/00007632-200012150-00014.

  4. Wang WL, Lee HL, Fetzer SJ: Challenges and strategies of instrument translation. West J Nurs Res. 2006, 28: 310-321. 10.1177/0193945905284712.

  5. Vernon H: The Neck Disability Index: State-of-the-Art, 1991-2008. J Manipulative Physiol Ther. 2008, 31: 491-502. 10.1016/j.jmpt.2008.08.006.

  6. MacDermid JC, Walton DM, Avery S, Blanchard A, Etruw E, McAlpine C, Goldsmith CH: Measurement properties of the neck disability index: a systematic review. J Orthop Sports Phys Ther. 2009, 39: 400-417.

  7. Menezes da Costa L, Maher CG, McAuley JH, Costa LO: Systematic review of cross-cultural adaptations of McGill Pain Questionnaire reveals a paucity of clinimetric testing. J Clin Epidemiol. 2009, 62: 934-943. 10.1016/j.jclinepi.2009.03.019.

  8. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HC: The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010, 63: 737-745. 10.1016/j.jclinepi.2010.02.006.

  9. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HC: The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010, 19: 539-549. 10.1007/s11136-010-9606-8.

  10. de Vet HC, Terwee CB, Knol DL, Bouter LM: When to use agreement versus reliability measures. J Clin Epidemiol. 2006, 59: 1033-1039. 10.1016/j.jclinepi.2005.10.015.

  11. Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986, 1: 307-310.

  12. Terwee CB, Roorda LD, Knol DL, De Boer MR, De Vet HC: Linking measurement error to minimal important change of patient-reported outcomes. J Clin Epidemiol. 2009, 62: 1062-1067. 10.1016/j.jclinepi.2008.10.011.

  13. Streiner DL, Norman GR: Health measurement scales: a practical guide to their development and use. 2003, Oxford: Oxford University Press, 3rd edition.

  14. Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, Bouter LM, de Vet HC: The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: A clarification of its content. BMC Med Res Methodol. 2010, 10: 22. 10.1186/1471-2288-10-22.

  15. Furlan AD, Pennick V, Bombardier C, van Tulder M, Editorial Board CBRG: 2009 updated method guidelines for systematic reviews in the Cochrane Back Review Group. Spine. 2009, 34: 1929-1941. 10.1097/BRS.0b013e3181b1c99f.

  16. van Tulder M, Furlan A, Bombardier C, Bouter L, Editorial Board CBRG: Updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group. Spine. 2003, 28: 1290-1299.

  17. Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, Bouter LM, de Vet HCW: Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007, 60: 34-42. 10.1016/j.jclinepi.2006.03.012.

  18. Forestier R, Francon A, Arroman FS, Bertolino C: French version of the Copenhagen neck functional disability scale. Joint Bone Spine. 2007, 74: 155-159. 10.1016/j.jbspin.2006.03.002.

  19. Jorritsma W, de Vries GE, Geertzen JH, Dijkstra PU, Reneman MF: Neck Pain and Disability Scale and the Neck Disability Index: reproducibility of the Dutch Language Versions. Eur Spine J. 2010, 19: 1695-1701. 10.1007/s00586-010-1406-x.

  20. Wlodyka-Demaille S, Poiraudeau S, Catanzariti JF, Rannou F, Fermanian J, Revel M: The ability to change of three questionnaires for neck pain. Joint Bone Spine. 2004, 71: 317-326. 10.1016/j.jbspin.2003.04.004.

  21. Bicer A, Yazici A, Camdeviren H, Erdogan C: Assessment of pain and disability in patients with chronic neck pain: reliability and construct validity of the Turkish version of the neck pain and disability scale. Disabil Rehabil. 2004, 26: 959-962. 10.1080/09638280410001696755.

  22. Ackelman BH, Lindgren U: Validity and reliability of a modified version of the neck disability index. J Rehabil Med. 2002, 34: 284-287. 10.1080/165019702760390383.

  23. Kovacs FM, Bago J, Royuela A, Seco J, Gimenez S, Muriel A, Abraira V, Martin JL, Pena JL, Gestoso M, et al: Psychometric characteristics of the Spanish version of instruments to measure neck pain disability. BMC Musculoskelet Disord. 2008, 9: 42. 10.1186/1471-2474-9-42.

  24. Bremerich FH, Grob D, Dvorak J, Mannion AF: The neck pain and disability scale: Cross-cultural adaptation into German and evaluation of its psychometric properties in chronic neck pain and C1-2 fusion patients. Spine. 2008, 33: 1018-1027. 10.1097/BRS.0b013e31816c9107.

  25. Nieto R, Miro J, Huguet A: Disability in subacute whiplash patients: usefulness of the neck disability index. Spine. 2008, 33: E630-635. 10.1097/BRS.0b013e31817eb836.

  26. Chiu TT, Lam TH, Hedley AJ: Subjective health measure used on Chinese patients with neck pain in Hong Kong. Spine. 2001, 26: 1884-1889. 10.1097/00007632-200109010-00013.

  27. Lee KC, Chiu TT, Lam TH: Correlation between generic health status and region-specific functional measures on patients with neck pain. Int J Rehabil Res. 2006, 29: 217-220. 10.1097/01.mrr.0000210060.91741.bd.

  28. Leak AM, Cooper J, Dyer S, Williams KA, Turner-Stokes L, Frank AO: The Northwick Park Neck Pain Questionnaire, devised to measure neck pain and disability. Br J Rheumatol. 1994, 33: 469-474. 10.1093/rheumatology/33.5.469.

  29. Pool JJ, Ostelo RW, Hoving JL, Bouter LM, de Vet HC: Minimal clinically important change of the Neck Disability Index and the Numerical Rating Scale for patients with neck pain. Spine. 2007, 32: 3047-3051. 10.1097/BRS.0b013e31815cf75b.

  30. Schmitt MA, de Wijer A, van Genderen FR, van der Graaf Y, Helders PJ, van Meeteren NL: The Neck Bournemouth Questionnaire cross-cultural adaptation into Dutch and evaluation of its psychometric properties in a population with subacute and chronic whiplash associated disorders. Spine. 2009, 34: 2551-2561. 10.1097/BRS.0b013e3181b318c4.

  31. Vos CJ, Verhagen AP, Koes BW: Reliability and responsiveness of the Dutch version of the Neck Disability Index in patients with acute neck pain in general practice. Eur Spine J. 2006, 15: 1729-1736. 10.1007/s00586-006-0119-7.

  32. Bolton JE, Humphreys BK: The Bournemouth Questionnaire: A short-form comprehensive outcome measure. II. Psychometric properties in neck pain patients. J Manipulative Physiol Ther. 2002, 25: 141-148. 10.1067/mmt.2002.123333.

  33. Stewart M, Maher CG, Refshauge KM, Bogduk N, Nicholas M: Responsiveness of pain and disability measures for chronic whiplash. Spine. 2007, 32: 580-585. 10.1097/01.brs.0000256380.71056.6d.

  34. Jordan A, Manniche C, Mosdal C, Hindsberger C: The Copenhagen neck functional disability scale: A study of reliability and validity. J Manipulative Physiol Ther. 1998, 21: 520-527.

  35. Salo P, Ylinen J, Kautiainen H, Arkela-Kautiainen M, Hakkinen A: Reliability and validity of the Finnish version of the neck disability index and the modified neck pain and disability scale. Spine. 2010, 35: 552-556. 10.1097/BRS.0b013e3181b327ff.

  36. Wlodyka-Demaille S, Poiraudeau S, Catanzariti JF, Rannou F, Fermanian J, Revel M: French translation and validation of 3 functional disability scales for neck pain. Arch Phys Med Rehabil. 2002, 83: 376-382. 10.1053/apmr.2002.30623.

  37. Martel J, Dugas C, Lafond D, Descarreaux M: Validation of the French version of the Bournemouth Questionnaire. J Canadian Chiropractic Association. 2009, 53: 102-110.

  38. Scherer M, Blozik E, Himmel W, Laptinskaya D, Kochen MM, Herrmann-Lingen C: Psychometric properties of a German version of the neck pain and disability scale. Eur Spine J. 2008, 17: 922-929. 10.1007/s00586-008-0677-y.

  39. Trouli MN, Vernon HT, Kakavelakis KN, Antonopoulou MD, Paganas AN, Lionis CD: Translation of the Neck Disability Index and validation of the Greek version in a sample of neck pain patients. BMC Musculoskelet Disord. 2008, 9:

  40. Agarwal S, Allison GT, Agarwal A, Singer KP: Reliability and validity of the Hindi version of the Neck Pain and Disability Scale in cervical radiculopathy patients. Disabil Rehabil. 2006, 28: 1405-1411. 10.1080/09638280600641467.

  41. Mousavi SJ, Parnianpour M, Montazeri A, Mehdian H, Karimi A, Abedi M, Ashtiani AA, Mobini B, Hadian MR: Translation and validation study of the Iranian versions of the neck disability index and the neck pain and disability scale. Spine. 2007, 32: E825-E831. 10.1097/BRS.0b013e31815ce6dd.

  42. Monticone M, Baiardi P, Nido N, Righini C, Tomba A, Giovanazzi E: Development of the Italian version of the Neck Pain and Disability Scale, NPDS-I: cross-cultural adaptation, reliability, and validity. Spine. 2008, 33: E429-434. 10.1097/BRS.0b013e318175c2b0.

  43. Lee H, Nicholson LL, Adams RD, Maher CG, Halaki M, Bae SS: Development and psychometric testing of Korean language versions of 4 neck pain and disability questionnaires. Spine. 2006, 31: 1841-1845. 10.1097/01.brs.0000227268.35035.a5.

  44. Gonzalez T, Balsa A, Sainz de Murieta J, Zamorano E, Gonzalez I, Martin-Mola E: Spanish version of the Northwick Park Neck Pain Questionnaire: reliability and validity. Clin Exp Rheumatol. 2001, 19: 41-46.

  45. White P, Lewith G, Prescott P: The core outcomes for neck pain: Validation of a new outcome measure. Spine. 2004, 29: 1923-1930. 10.1097/01.brs.0000137066.50291.da.

  46. Andrade Ortega JA, Delgado Martinez AD, Almecija Ruiz R: Validation of the Spanish version of the Neck Disability Index. Spine. 2010, 35: E114-118. 10.1097/BRS.0b013e3181afea5d.

  47. Aslan E, Karaduman A, Yakut Y, Aras B, Simsek IE, Yagly N: The cultural adaptation, reliability and validity of neck disability index in patients with neck pain: a Turkish version study. Spine. 2008, 33: E362-365. 10.1097/BRS.0b013e31817144e1.

  48. Kose G, Hepguler S, Atamaz F, Oder G: A comparison of four disability scales for Turkish patients with neck pain. J Rehabil Med. 2007, 39: 358-362. 10.2340/16501977-0060.

Author information

Corresponding author

Correspondence to Jasper M Schellingerhout.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JMS carried out the bibliographic search, data extraction and assessment of (methodological) quality, and drafted the manuscript. MWH revised the manuscript. APV carried out the bibliographic search and revised the manuscript. HCV was involved in the bibliographic search, data extraction and assessment of (methodological) quality, and revised the manuscript. BWK revised the manuscript. CBT carried out the data extraction and assessment of (methodological) quality, and revised the manuscript. All authors were involved in designing the study. All authors read and approved the final manuscript.

Martijn W Heymans, Arianne P Verhagen, Henrica C de Vet, Bart W Koes and Caroline B Terwee contributed equally to this work.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Schellingerhout, J.M., Heymans, M.W., Verhagen, A.P. et al. Measurement properties of translated versions of neck-specific questionnaires: a systematic review. BMC Med Res Methodol 11, 87 (2011). https://doi.org/10.1186/1471-2288-11-87

  • DOI: https://doi.org/10.1186/1471-2288-11-87
