- Research article
- Open Access
A systematic review to investigate the measurement properties of goal attainment scaling, towards use in drug trials
BMC Medical Research Methodology volume 16, Article number: 99 (2016)
One of the main challenges for drug evaluation in rare diseases is the often heterogeneous course of these diseases. Traditional outcome measures may not be applicable to all patients when they are in different stages of their disease. For instance, in Duchenne Muscular Dystrophy, the Six Minute Walk Test is often used to evaluate potential new treatments, although this outcome is irrelevant for patients who are already in a wheelchair. A measurement instrument such as Goal Attainment Scaling (GAS) can evaluate the effect of an intervention on an individual basis, and may be able to include patients even when they are in different stages of their disease. It allows patients to set individual goals, together with their treating professional. However, the validity of GAS as a measurement instrument in drug studies has never been systematically reviewed. Therefore, we have performed a systematic review to answer two questions: 1. Has GAS been used as a measurement instrument in drug studies? 2. What is known of the validity, responsiveness and inter- and intra-rater reliability of GAS, particularly in drug trials?
We set up a sensitive search that yielded 3818 abstracts. After careful screening, data-extraction was executed for 58 selected articles.
Of the 58 selected articles, 38 articles described drug studies where GAS was used as an outcome measure, and 20 articles described measurement properties of GAS in other settings. The results show that validity, responsiveness and reliability of GAS in drug studies have hardly been investigated. The quality of the reporting of validity in studies in which GAS was used to evaluate a non-drug intervention also leaves much room for improvement.
We conclude that there is insufficient information to assess the validity of GAS, due to the poor quality of the validity studies. Therefore, we think that GAS needs further validation in drug studies, especially since GAS can be a potential solution when a small heterogeneous patient group is all there is to test a promising new drug.
The protocol has been registered in the PROSPERO international prospective register for systematic reviews, with registration number CRD42014010619. http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42014010619.
One of the main challenges for drug evaluation in rare diseases is the heterogeneous course of these diseases. When a disease course differs from patient to patient, traditional outcome measures may not be applicable to all patients with a certain disease. Trial designs are often limited to patients for whom the outcome measure is relevant, whereas the underlying disease mechanism may be similar in a larger group. This increases the problem of small numbers that already challenges rare disease research.
For example, in Duchenne muscular dystrophy (DMD), new drug trials until recently often used the 6-minute Walk Test (6MWT) as an outcome measure. The 6MWT has been validated as a reliable and feasible outcome measure, and has been recommended as the primary outcome measure in ambulatory DMD patients [1, 2]. However, although the 6MWT may be a relevant outcome measure for boys who are not (yet) dependent on a wheelchair, it is obviously irrelevant for the, usually somewhat older, boys who are. This problem in DMD research has been picked up by patient representatives and researchers from all over the world.
As the DMD example shows, existing measurement instruments use an outcome that is not relevant for all patients, or may not be responsive enough to measure the effect of an intervention in a rare disease. However, the development of disease-specific and patient-relevant outcome measures is hampered by the small number and heterogeneity of patients with a particular rare disease. In their handbook “Measurement in Medicine”, De Vet et al. recommend a minimum number of 50 patients for validation studies.
A measurement instrument that can evaluate the effect of an intervention on an individual basis may help overcome the problem of small, heterogeneous populations. The importance of patient reported outcome measures is widely recognized by pharmaceutical companies and clinical researchers as well as regulators and government agencies such as FDA and NIH.
Goal Attainment Scaling (GAS) is a measurement instrument that is intended for individual evaluation of an intervention. It allows patients to set individual goals, together with their treating professional. The number of goals and the content of these goals may differ per patient, but the attainment of the goals is measured in a standardized way. This makes a standardized evaluation of an intervention possible, even when the patients are all in a different stage of their disease.
Goal Attainment Scaling was first introduced in 1968 by Kiresuk and Sherman, originally for the evaluation of mental health services. It contains a variable number of self-defined goals and very explicit descriptions of five possible levels of goal attainment that are formulated before the intervention, usually in consultation between the patient and the clinician. In the original definition, the levels are quantified on a 5-point scale that ranges from −2 to +2, where −2 = the most unfavorable treatment outcome thought likely, −1 = less than expected level of treatment success, 0 = expected level of treatment success, +1 = more than expected success with treatment, and +2 = best conceivable success with treatment. For each goal the expected level of treatment success and at least two other levels need to be described in such a specific way that an independent observer can assess the outcome.
There is no maximum number of goals that can be set. Each goal can be assigned a weight, according to its importance to patient and/or clinician. From the scores reached after the intervention, a composite goal attainment score is computed using the following formula:
$$T = 50 + \frac{10 \sum_i w_i x_i}{\sqrt{(1 - \rho) \sum_i w_i^2 + \rho \left( \sum_i w_i \right)^2}}$$
where $T$ is the composite score, $w_i$ is the weight assigned to goal $i$, $x_i$ is the original score for goal $i$ ranging from −2 to +2, and $\rho$ is the estimated correlation between goal scores. According to Kiresuk and Sherman, it is safe to assume that the correlation between the goal scores is constant, and can be set at 0.3. The T-score has a mean of 50 and a standard deviation of 10, under the assumptions as proposed by Kiresuk and Sherman.
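As an illustration only, the composite score can be sketched in a few lines of Python. The sketch assumes the commonly cited Kiresuk and Sherman form, T = 50 + 10 Σ(w·x) / √((1 − ρ) Σw² + ρ (Σw)²), with ρ defaulting to 0.3. The function name `gas_t_score` and its parameters are hypothetical and are not taken from any of the reviewed studies.

```python
import math
from typing import Optional, Sequence


def gas_t_score(
    scores: Sequence[int],
    weights: Optional[Sequence[float]] = None,
    rho: float = 0.3,
) -> float:
    """Composite GAS T-score for one patient (illustrative sketch).

    scores  -- attainment level per goal, each in {-2, -1, 0, 1, 2}
    weights -- importance weight per goal (assumed equal if omitted)
    rho     -- assumed constant correlation between goal scores
    """
    if not scores:
        raise ValueError("at least one goal is required")
    if weights is None:
        weights = [1.0] * len(scores)  # assumption: unweighted goals
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    if any(x not in (-2, -1, 0, 1, 2) for x in scores):
        raise ValueError("each score must be an integer from -2 to +2")

    weighted_sum = sum(w * x for w, x in zip(weights, scores))
    denominator = math.sqrt(
        (1 - rho) * sum(w * w for w in weights) + rho * sum(weights) ** 2
    )
    return 50 + 10 * weighted_sum / denominator


if __name__ == "__main__":
    # Three equally weighted goals: one as expected, two slightly better.
    print(round(gas_t_score([0, 1, 1]), 2))
    # Two goals, the first judged twice as important as the second.
    print(round(gas_t_score([1, -1], weights=[2, 1]), 2))
```

When every goal is attained exactly at the expected level (all scores 0), the numerator vanishes and the function returns 50, which matches the stated mean of the T-score.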
Besides mental health and non-medical fields such as education and social service applications, GAS is reportedly used in a few specific medical research areas, such as rehabilitation [8–12] and geriatrics [13–15]. However, the validity of GAS as a measurement instrument in drug studies has never been systematically reviewed. To evaluate the usefulness of GAS in drug studies, we formulated the following three research questions:
Has Goal Attainment Scaling been used as a measurement instrument in drug studies?
What (drug) interventions were evaluated by studies using GAS?
What is known of the validity, responsiveness and inter- and intra-rater reliability of Goal Attainment Scaling in general, and in particular in drug trials?
In this study, we follow the COSMIN guidelines, which are the generally used and accepted standards for the evaluation of measurement properties. This checklist contains standards for evaluating the methodological quality of studies on the measurement properties of health measurement instruments. According to the COSMIN guidelines, a health status measurement instrument can be used when its validity, reliability and responsiveness have been tested and considered adequate. We considered GAS useful when the validity, reliability and responsiveness have been described, tested and found acceptable according to these guidelines.
We conducted a systematic review, according to the PRISMA guidelines.
We set up a sensitive search in Medline, PsychInfo and Embase. We searched for literature from 1968, the year when GAS was introduced by Kiresuk and Sherman, to May 1st, 2015. For the full search strategy, see Additional file 1. Reference lists of relevant review articles were screened for additional papers.
Papers were included in which:
Goal Attainment Scaling met the following criteria:
One or more individual goals were established by the patient or by one or more researchers or practitioners, either with or without input of the patient, prior to the intervention. The goals did not have to be devised by the patient/researcher, as long as the goals were individually chosen per patient.
The scale had to consist of at least three points (e.g. more than just goal attained – goal not attained). At least 2 points on the scale were described precisely and objectively, so that an independent observer would be able to determine whether the patient performs above or below that point.
The study was either a trial in which drugs are evaluated, or a study of any design in which psychometric properties of GAS were evaluated.
The outcome measure was the attainment of goals that had been established before the onset of the intervention.
The goals had been set up individually, i.e. per patient.
The following were excluded:
Trials using an outcome measure called Goal Attainment Scaling, when the outcome measure did not meet our definition of GAS.
Studies in which goal setting was used as an intervention rather than outcome measurement.
Reviews or narratives.
Papers published in languages other than English, French, Dutch, German or Spanish.
Papers published before 1968.
The selection of articles and data-extraction were performed by pairs of independent reviewers. Disagreements were discussed until consensus was reached; if necessary a third reviewer acted as a referee. A standardized data-extraction form was used (see Additional file 2). We divided the included studies into two categories, i.e. drug studies, and non-drug studies in which the measurement properties of GAS were investigated.
We extracted information about the following measurement properties, defined according to the COSMIN guidelines: inter-rater reliability, intra-rater reliability, face validity, content validity, construct validity, and responsiveness. For the full definitions of the measurement properties, see Table 1. We used the quality criteria as proposed by Terwee et al. to evaluate the measurement properties, as also displayed in Table 1. We chose to limit the evaluation of the quality of the measurement properties to the criteria as proposed by Terwee et al., instead of using the full COSMIN guidelines, because the COSMIN guidelines are very detailed and cover several properties that cannot be evaluated for GAS, e.g. internal consistency, measurement error and criterion validity.
The search yielded 3007, 1413, and 1039 abstracts from Medline, Embase and PsychInfo, respectively. After eliminating duplicates, a total of 3818 abstracts remained for screening. In the screening phase, we excluded 3511 articles based on title and abstract, and 249 articles based on the full text. Data-extraction was executed for the remaining 58 articles (see Fig. 1). Of these 58 articles, 38 articles described drug studies in which GAS was used as an outcome measure, and 20 articles described measurement properties of GAS in other settings (Fig. 2).
In Table 2 the characteristics of the articles are presented. Most studies are trials in patients with cerebral palsy or patients with spasticity due to other causes, such as acquired brain trauma or stroke (28 studies). Also, many studies focussed on the geriatric population (15 studies). There were also some studies on autism (three studies), or neurological disorders such as MS (two studies). The remaining studies covered research areas such as family problems, goal setting in adolescent students or behaviour and psychiatric problems.
Most drug studies evaluated an intervention with botulinum toxin (25 studies), mainly in patients with cerebral palsy and spasticity. Baclofen was also evaluated in children with spasticity (three studies). Other drugs that were evaluated were galantamine (three studies), donepezil for Alzheimer’s Disease (two studies), fluvoxamine, trihexyphenidyl, memantine, a phenol nerve block, and linopirdine (one study each).
As is shown in Tables 3 and 4, face validity is reported in one article. This is a drug study that evaluated the use of fluvoxamine in patients who met the criteria for panic disorder with moderate to severe agoraphobia. GAS was used as a primary outcome measure. Both therapists and independent raters who assessed the level of goal attainment after the intervention were asked to rate the relevance of the chosen goals on a scale of 1 to 5 (1 = irrelevant, 5 = very relevant). Therapists only rated the GAS score of patients not treated by themselves. The mean score of the therapists was 4.68 (SD = .51), and the mean score of the independent raters was 4.66 (SD = .52). The researchers concluded that these numbers show that ‘the goal areas were suitably chosen’. The target population of GAS (the patients) was not involved in this evaluation, although such involvement is one of the requirements of the quality criteria that we use. However, it is inherent in the measurement instrument that the patient is involved in the choice of the items. Therefore, we score the quality of the face validity evaluation as ‘good quality’.
Content validity was reported in five studies, of which one was a drug study. Content validity was measured in several ways, as shown in Table 5: by rating the usefulness or importance of the goals [21, 22], by comparing the goal areas with essential components as recommended by position papers in the specific field, and by checking whether the goals were formulated according to the criteria ‘Specific, Measurable, Assignable, Realistic, and Time-related’ (SMART) [24, 25]. In one study, the content validity was reportedly tested by grouping the goals into major categories, and analyzing the content of these categories. However, the study did not report the results of the categorization of the goals. The quality of the content validity evaluation was rated as ‘good quality’ in two studies, ‘intermediate quality’ in two studies and ‘poor quality’ in one study. Authors variously reported a ‘good overall usefulness’ of the goals; stated that all recommended areas were represented in the goals; assessed whether goals were set according to the SMART principle [24, 25] (in this particular study, it was concluded that, even after a refinement process, the quality of the goal statements still differed between the sites); or reported that more than 70 % of the responders rated GAS as a 4 or 5 on a 5-point scale for clinical relevance and importance.
Construct validity was reported in 18 studies, of which six were drug studies (Table 6). In all 18 studies construct validity was assessed by correlations with other instruments measuring a construct similar to the goals that were expected to be set by the patients in each specific research area. Also, t-tests between the placebo and intervention condition, or t-tests between the lowest and highest T-score differences, were used to verify construct validity. In none of the studies was a hypothesis formulated on the expected construct validity outcomes. Therefore, the quality of the construct validity is difficult to evaluate. Of the 18 studies, 14 reported significant correlations with other measurement instruments that were relevant for the research area. The measurement instruments used to establish the construct validity varied considerably, since GAS is used for different research areas. Three studies reported that no significant correlations with other measurement instruments were found [21, 29, 30]. In one study correlations between change scores were measured, but the results were not clearly reported.
Intra- and inter-rater reliability
As can be seen in Tables 3 and 4, intra-rater reliability was not assessed in any of the included studies. Inter-rater reliability was reported in 12 studies, of which two were drug studies. Different methods were used to measure the inter-rater reliability (Table 7). In four studies we rated the quality of the inter-rater reliability as poor, whereas eight studies were rated with ‘good quality’. Eight out of the 12 studies reported an ICC score. Five of those studies reported that the ICC values were all 0.9 and higher [31–35]. Two studies reported ICC values between 0.8 and 0.95 [26, 36]. In one study, the reported ICC was lower than 0.5. The specific calculation for the ICC was reported in one study. Confidence intervals for the ICC values were also reported in one study. Inter-rater reliability was also reported with kappa-values [21, 38], where the values ranged from substantial to almost perfect agreement. Another method that was used was calculating a correlation, which had a value of 0.84. One study reported ‘agreement’ between objective goal setters and the therapists who performed the interventions, and ‘agreement’ between objective goal setters and people who did the intake of the patients before the patients were randomized. The results were an agreement of 43 and 57 % respectively. However, the method used to calculate this agreement was not reported in the article.
Responsiveness was reported in 14 studies, of which two were drug studies (Table 8). None of the studies used measurement properties as advised by Terwee et al. Therefore, it is difficult to evaluate the quality of the responsiveness. In nine of those 14 studies, an effect size of the measured differences was reported [26, 29–31, 33, 39–42]. Of those nine studies, the reported effect size was below 1 in only one study. In five studies, a Relative Efficiency was reported [26, 30, 31, 33, 41]. The relative efficiency of two procedures or measurement instruments is the ratio of their efficiencies. For instance, a comparison can be made between GAS and a regularly used measurement instrument. The Relative Efficiency varied between 3 and 57, but was substantial in most studies, meaning that GAS is more efficient, or needs fewer observations, than other measurement instruments. A Standardized Response Mean was reported in six studies [22, 23, 26, 40–42]. A standardized response mean (SRM) is an effect size index used to measure the responsiveness of scales to clinical change. The SRM is computed by dividing the mean change score by the standard deviation of the change. The SRMs that were reported varied between 1.2 and 3.54. Two studies measured responsiveness with a paired t-test comparing response before and after the intervention, with a significant difference in GAS T-scores in both studies [22, 39]. In one study, the sensitivity, specificity and positive and negative predictive value were calculated based on a group of responders and non-responders. The results were 52, 85, 81 and 60 %, respectively. In another study, responsiveness was reported as the number of patients who showed a change in T-scores of different goal areas. The proportion of patients showing changes on GAS was larger than on other measurement instruments. Change was shown by nine out of 23 patients on the physical goals, 18 out of 23 patients on occupational goals and 12 out of 18 patients on speech goals, whereas only one patient showed change on the Gross Motor Function Measure (GMFM-66).
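The standardized response mean described above (mean change divided by the standard deviation of the change) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `standardized_response_mean` is hypothetical, the sample standard deviation (n − 1 denominator) is assumed because the reviewed studies do not all state which estimator they used, and the example T-scores are invented for demonstration only.

```python
import statistics
from typing import Sequence


def standardized_response_mean(
    before: Sequence[float], after: Sequence[float]
) -> float:
    """SRM = mean of per-patient change / standard deviation of that change.

    Assumes paired measurements (same patients, same order) and uses the
    sample standard deviation (n - 1 in the denominator).
    """
    if len(before) != len(after):
        raise ValueError("before and after must be paired")
    changes = [a - b for b, a in zip(before, after)]
    if len(changes) < 2:
        raise ValueError("at least two patients are needed")
    sd_change = statistics.stdev(changes)
    if sd_change == 0:
        raise ValueError("SRM is undefined when all changes are identical")
    return statistics.mean(changes) / sd_change


if __name__ == "__main__":
    # Invented GAS T-scores for four patients, purely for illustration.
    baseline = [40.0, 45.0, 50.0, 55.0]
    follow_up = [50.0, 60.0, 55.0, 75.0]
    print(round(standardized_response_mean(baseline, follow_up), 2))
```

Because the denominator is the variability of the change itself, an instrument on which nearly all patients change by a similar amount yields a large SRM even when the mean change is modest.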
In this systematic review, we have found 58 articles, of which 38 were drug studies in which GAS was used as an outcome measure. Therefore, we may conclude that GAS has indeed been used in drug studies. Most drug studies that report any information on the validity of GAS used botulinum toxin as an intervention for spasticity, usually in combination with physical or occupational therapy. The generalizability of the results of these validation studies is limited. The validity, responsiveness and reliability of GAS in drug studies have scarcely been studied. In only seven of the 38 drug studies that we found, some validation has been performed. The methods used to validate the measurement instruments often differ from the methods as proposed by COSMIN. The quality of the methods to assess measurement properties varies, and results are often difficult to interpret. We found 20 articles concerning non-drug studies reporting on the validity, responsiveness and inter-rater reliability of GAS. However, also in studies in which GAS was used to evaluate a non-drug intervention, the quality of the validity reports leaves much room for improvement.
In most articles, either drug or non-drug studies, no definition was given of the measurement properties that were assessed, the formulae used for calculation of parameters were not presented, and in some papers the results of the validity check were not reported [26, 31]. Also, none of the included articles describe hypotheses to test construct validity, which makes evaluating the reported results virtually impossible. Therefore, we conclude that the validity and reliability of GAS have not been researched extensively, neither in studies where a drug intervention was evaluated, nor in other studies.
Of all clinimetric characteristics that were investigated, the responsiveness of GAS was investigated most thoroughly. The responsiveness was consistently reported to be very good compared to other measurement instruments, such as the Gross Motor Function Measure (GMFM-66) in the evaluation of children with cerebral palsy, or the Standardized Mini Mental State Examination (SMMSE) for geriatric assessment. However, none of the studies evaluated the responsiveness according to the guidelines as proposed by Terwee et al. Therefore, it is difficult to be conclusive on the responsiveness of GAS, although the reported results suggest we may tentatively be optimistic.
The search of this systematic review was very sensitive, to make sure that no studies on GAS were missed. However, our definition of GAS is rather specific, which excludes studies with an approach that is similar, but not exactly the same. Also, we may have missed studies that did not use similar terminology, but did use an approach similar to GAS.
Our findings are consistent with previous systematic reviews on the measurement properties of GAS. For instance, Steenbeek et al. concluded that, in the setting of pediatric rehabilitation, GAS is a very responsive method for treatment evaluation and individual goal setting, but that sufficient knowledge about its reliability and validity is lacking. Also, in the field of psychogeriatrics, GAS may be considered useful from a theoretical point of view. Geriatric patients are heterogeneous, and GAS may be a useful tool to evaluate geriatric interventions. However, the measurement properties of GAS in geriatrics show mixed results. The evidence is not yet strong enough to state that GAS is an applicable outcome measure in this particular field. In a systematic review on the feasibility of measurement instruments related to goal setting, GAS is considered a helpful tool for setting goals, although it is time-consuming and may be difficult for patients with cognitive impairments. However, the patient-centered nature of GAS makes it easier to focus on meaningful patient-directed treatment goals. Also, according to the results, the scaling of GAS makes it possible to detect very small progress that may be of great significance to the patient, underlining its potential in responsiveness.
A problem in the evaluation of the validity of GAS may be that GAS does not measure one clear construct, since the content of the goals generally differs from patient to patient. One of the possibilities to overcome this inherent problem may be to make an item bank of possible goals that patients would be able to choose from, to make sure that the methodological properties of the goals are known. However, this would be practically very difficult to achieve, since we suspect that for many orphan diseases the patient numbers are smaller, and goals could be more diverse than those of non-orphan disease patients. Another way of approaching the construct validity is to see GAS as a measurement instrument that measures the construct of the attainment of goals. Then, the construct validity could be evaluated by comparing GAS with another measurement instrument that evaluates the attainment of goals, such as the COPM. To our knowledge, this approach has not been considered so far.
The importance and difficulty of goals are often taken into account by assigning weights to the goals (more important goals are assigned a larger weight than less important goals). However, terms such as importance and difficulty are by nature subjective. What is important for one patient may be less important for another. For example, a Duchenne patient may perceive being able to brush his teeth as very important, where someone else may perceive it as trivial. Can this difference in importance be measured objectively? In a study on the reliability of GAS weights, Marson, Wei and Wasserman conclude that assigning weights to the goals of GAS according to the severity of the problem has an acceptable inter-rater reliability when scored by different objective students trained in the use of GAS. This indicates that although importance and difficulty are difficult to measure objectively, objective raters may still score goals similarly. However, more research should be carried out on this topic to answer the question more definitively.
GAS is a measurement instrument with a high potential, especially in rare diseases, but in order to use it in drug studies, more research on its validity is essential. One way of achieving this would be to use GAS as an additional measurement instrument in an ongoing drug trial, to further explore its validity. For GAS to be possibly useful, the effect of the evaluated drug should be objectively measurable in terms of behavior, and it should measure something that is valuable and noticeable for a patient, and cannot be measured otherwise. Also, the drug that is evaluated should have an effect that is clinically relevant. Again, Duchenne Muscular Dystrophy may serve as an example. A potential drug should do more than just improve, for instance, the dystrophin values in muscle biopsies. It should be able to improve something that is valuable for the patient, which can be measured by activities that patients perceive as important, such as brushing teeth or using a computer. GAS may be a useful outcome measure, since it can evaluate a potential drug on a patient level, and is therefore intrinsically clinically relevant.
According to guidelines on Patient Reported Outcomes and Health Related Quality of Life by the FDA and EMA, and open comments on these guidelines by experts, the following qualities are essential: a PRO should be based on a clearly defined framework; patients should be involved in the development of the measurement instrument; PRO claims should be based on and supported by improvement in all domains of a specific disease; an appropriate recall period is necessary when the effects of an intervention are tested; and the test-retest reliability should be assessed, as well as the ability to detect change and the interpretability of the measurement instrument. Finally, an effect found by a PRO measurement instrument can only be valid when found in an RCT.
In general these requirements also apply to GAS, e.g. patient involvement. However, not all of them are applicable to this instrument, such as test-retest reliability. Before GAS can be used in drug trials, more validity research is needed. GAS has not yet been sufficiently validated to be supported by the regulatory agencies, but it may have potential in specific drug trials, especially in rare diseases where there is a lack of validated and responsive outcome measurement instruments.
We conclude that currently there is insufficient information to assess the validity of GAS, due to the poor quality of the validity studies. However, the overall reported good responsiveness of GAS suggests that it may be a valuable measurement instrument. GAS is an outcome measure that is inherently relevant for patients, making it a valuable tool for research in heterogeneous and small samples. Therefore, we think that GAS needs further validation in drug studies, especially since GAS can be a potential solution when only a small heterogeneous patient group is available to test a promising new drug.
ADAS-cog, Alzheimer’s disease assessment scale – cognitive subscale; AHA, assisting hand assessment; AMPS, assessment of motor and process scales; AQoL, assessment of quality of life; ARAT, action research arm test; AUC, Area under the receiver operating characteristics curve; BAD-scale, Barry-Albright Dystonia scale; CBS, Caregiving Burden Scale; CDS, Cardiac depression scale; CES-D, Center for epidemiological studies depression scale; CGI, clinical global impression; CHQ, child health questionnaire; CIBIC-plus, Clinician’s interview based impression of change-plus; COPM, Canadian occupational performance measure; DAD, disability assessment for dementia; DCD Pinch, dynamic computerized dynamometry; FAC, functional ambulation category; FAQ, functional activities questionnaire; FIM, functional independence measure; GAS, goal attainment scaling; GHQ, general health questionnaire; GMFM, gross motor function measure; HADS, hospital anxiety and depression scale; IADL, instrumental activities of daily living; ICC, intraclass correlation coefficient; LASIS, Leeds adult spasticity impact scale; LoA, limits of agreement; MAS, modified Ashworth scale; MAUULF, Melbourne assessment of unilateral upper limb function; MHOQ, Michigan hand outcomes questionnaire; MIC, minimal important change; MMSE, mini-mental state examination; MPQ, McGill pain questionnaire; MTS, Modified Tardieu Scale; NHP, Nottingham health profile; NRS, pain intensity numerical rating scale; OARS IADL, Older Americans resource scale for instrumental activities of daily living; ODQ, Oswestry low back pain disability questionnaire; PAIRS, pain and impairment relationship scale; PDMS-FM, Peabody developmental motor scale – fine motor; PEDI, pediatric evaluation of disability inventory; PET-GAS, psychometrically equivalence tested goal attainment scaling; PSMS, physical self-maintenance scale; QoL, quality of life; QUEST, quality of upper extremity skills test; RR, responsiveness ratio; SDC, smallest detectable change; TSA, Tardieu Spasticity Angle
McDonald CM, Henricson EK, Abresch RT, Florence J, Eagle M, Gappmaier E, et al. The 6-minute walk test and other clinical endpoints in Duchenne muscular dystrophy: reliability, concurrent validity, and minimal clinically important differences from a multicenter study. Muscle Nerve. 2013;48(3):357–68. doi:10.1002/mus.23905.
McDonald CM, Henricson EK, Han JJ, Abresch RT, Nicorici A, Elfring GL, et al. The 6-minute walk test as a new outcome measure in Duchenne muscular dystrophy. Muscle Nerve. 2010;41(4):500–10. doi:10.1002/mus.21544.
Mayhew A, Mazzone ES, Eagle M, Duong T, Ash M, Decostre V, et al. Development of the performance of the upper limb module for Duchenne muscular dystrophy. Dev Med Child Neurol. 2013;55(11):1038–45.
De Vet HC, Terwee CB, Mokkink LB, Knol DL. Measurement in medicine: a practical guide. Cambridge: Cambridge University Press; 2011.
Mendell JR, Csimma C, McDonald CM, Escolar DM, Janis S, Porter JD, et al. Challenges in drug development for muscle disease: a stakeholders’ meeting. Muscle Nerve. 2007;35(1):8–16.
Kiresuk TJ, Sherman RE. Goal attainment scaling: a general method for evaluating comprehensive community mental health programs. Community Ment Health J. 1968;4(6):443–53. doi:10.1007/bf01530764.
Kiresuk TJ, Smith A, Cardillo JE. Goal attainment scaling: applications, theory, and measurement. London: Psychology Press; 2014.
Odding E, Roebroeck ME, Stam HJ. The epidemiology of cerebral palsy: incidence, impairments and risk factors. Disabil Rehabil. 2006;28(4):183–91. doi:10.1080/09638280500158422.
Pandyan AD, Gregoric M, Barnes MP, Wood D, Van Wijck F, Burridge J, et al. Spasticity: clinical perceptions, neurological realities and meaningful measurement. Disabil Rehabil. 2005;27(1–2):2–6.
Steenbeek D, Ketelaar M, Galama K, Gorter JW. Goal attainment scaling in paediatric rehabilitation: a critical review of the literature. Dev Med Child Neurol. 2007;49(7):550–6. doi:10.1111/j.1469-8749.2007.00550.x.
van Kuijk AA, Geurts AC, Bevaart BJ, van Limbeek J. Treatment of upper extremity spasticity in stroke patients by focal neuronal or neuromuscular blockade: a systematic review of the literature. J Rehabil Med. 2002;34(2):51–61.
Wade DT. Goal planning in stroke rehabilitation: evidence. Top Stroke Rehabil. 1999;6(2):37–42. http://dx.doi.org/10.1310/FMYJ-RKG1-YANB-WXRH.
Birks J, Craig D. Galantamine for vascular cognitive impairment. Cochrane Database Syst Rev. 2013;4:CD004746. doi:10.1002/14651858.CD004746.pub2.
Bouwens SF, van Heugten CM, Verhey FR. Review of goal attainment scaling as a useful outcome measure in psychogeriatric patients with cognitive disorders. Dement Geriatr Cogn Disord. 2008;26(6):528–40. doi:10.1159/000178757.
Loy C, Schneider L. Galantamine for Alzheimer’s disease and mild cognitive impairment. Cochrane Database Syst Rev. 2006;1:CD001747. doi:10.1002/14651858.CD001747.pub3.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539–49.
Moher D, Liberati A, Tetzlaff J, Altman DG, Group P. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Int J Surg. 2010;8(5):336–41.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–45.
Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.
De Beurs E, Lange A, Blonk RWB, Koele P, Van Balkom AJLM, Van Dyck R. Goal attainment scaling: an idiosyncratic method to assess treatment effectiveness in agoraphobia. J Psychopathol Behav Assess. 1993;15(4):357–73.
Palisano RJ, Gowland C. Validity of goal attainment scaling in infants with motor delays. Phys Ther. 1993;73(10):651–60.
Stolee P, Awad M, Byrne K, DeForge R, Clements S, Glenny C. A multi-site study of the feasibility and clinical utility of Goal Attainment Scaling in geriatric day hospitals. Disabil Rehabil. 2012;34(20):1716–26. http://dx.doi.org/10.3109/09638288.2012.660600.
Yip AM, Gorman MC, Stadnyk K, Mills WG, MacPherson KM, Rockwood K. A standardized menu for Goal Attainment Scaling in the care of frail elders. Gerontologist. 1998;38(6):735–42.
Turner-Stokes L, Fheodoroff K, Jacinto J, Maisonobe P. Results from the Upper Limb International Spasticity Study-II (ULIS-II): a large, international, prospective cohort study investigating practice and goal attainment following treatment with botulinum toxin A in real-life clinical management. BMJ Open. 2013;3(6). http://dx.doi.org/10.1136/bmjopen-2013-002771.
Turner-Stokes L, Fheodoroff K, Jacinto J, Maisonobe P, Zakine B. Upper limb international spasticity study: rationale and protocol for a large, international, multicentre prospective cohort study investigating management and goal attainment following treatment with botulinum toxin A in real-life clinical practice. BMJ Open. 2013;3(3). http://dx.doi.org/10.1136/bmjopen-2012-002230.
Stolee P, Stadnyk K, Myers AM, Rockwood K. An individualized approach to outcome measurement in geriatric rehabilitation. J Gerontol Ser A Biol Med Sci. 1999;54A(12):M641–M7. http://dx.doi.org/10.1093/gerona/54.12.M641.
Rockwood K, Stolee P, Howard K, Mallery L. Use of Goal Attainment Scaling to measure treatment effects in an anti-dementia drug trial. Neuroepidemiology. 1996;15(6):330–8.
Woodward CA, Santa-Barbara J, Levin S, Epstein NB. The role of goal attainment scaling in evaluating family therapy outcome. Am J Orthopsychiatry. 1978;48(3):464–76.
Cusick A, McIntyre S, Novak I, Lannin N, Lowe K. A comparison of goal attainment scaling and the Canadian Occupational Performance Measure for paediatric rehabilitation research. Pediatr Rehabil. 2006;9(2):149–57.
Gordon JE, Powell C, Rockwood K. Goal attainment scaling as a measure of clinically important change in nursing-home patients. Age Ageing. 1999;28(3):275–81.
Rockwood K, Stolee P, Fox RA. Use of goal attainment scaling in measuring clinically important change in the frail elderly. J Clin Epidemiol. 1993;46(10):1113–8.
Brown DA, Effgen SK, Palisano RJ. Performance following ability-focused physical therapy intervention in individuals with severely limited physical and cognitive abilities. Phys Ther. 1998;78(9):934–47. discussion 48–50.
Rockwood K, Joyce B, Stolee P. Use of goal attainment scaling in measuring clinically important change in cognitive rehabilitation patients. J Clin Epidemiol. 1997;50(5):581–8.
Ruble L, McGrew JH. Teacher and child predictors of achieving IEP goals of children with autism. J Autism Dev Disord. 2013;43(12):2748–63. http://dx.doi.org/10.1007/s10803-013-1884-x.
Ruble L, McGrew JH, Toland MD. Goal attainment scaling as an outcome measure in randomized controlled trials of psychosocial interventions in autism. J Autism Dev Disord. 2012;42(9):1974–83. http://dx.doi.org/10.1007/s10803-012-1446-7.
Ruble LA, McGrew JH, Toland MD, Dalrymple NJ, Jung LA. A randomized controlled trial of COMPASS web-based and face-to-face teacher coaching in autism. J Consult Clin Psychol. 2013;81(3):566–72. http://dx.doi.org/10.1037/a0032003.
Bovend’Eerdt TJ, Dawes H, Izadi H, Wade DT. Agreement between two different scoring procedures for goal attainment scaling is low. J Rehabil Med. 2011;43(1):46–9. http://dx.doi.org/10.2340/16501977-0624.
Steenbeek D, Meester-Delver A, Becher JG, Lankhorst GJ. The effect of botulinum toxin type A treatment of the lower extremity on the level of functional abilities in children with cerebral palsy: evaluation with goal attainment scaling. Clin Rehabil. 2005;19(3):274–82.
Hartman D, Borrie MJ, Davison E, Stolee P. Use of goal attainment scaling in a dementia special care unit. Am J Alzheimers Dis. 1997;12(3):111–6. http://dx.doi.org/10.1177/153331759701200303.
Khan F, Pallant JF, Turner-Stokes L. Use of goal attainment scaling in inpatient rehabilitation for persons with multiple sclerosis. Arch Phys Med Rehabil. 2008;89(4):652–9. http://dx.doi.org/10.1016/j.apmr.2007.09.049.
Rockwood K, Howlett S, Stadnyk K, Carver D, Powell C, Stolee P. Responsiveness of goal attainment scaling in a randomized controlled trial of comprehensive geriatric assessment. J Clin Epidemiol. 2003;56(8):736–43.
Turner-Stokes L, Williams H, Johnson J. Goal attainment scaling: does it provide added value as a person-centred measure for evaluation of outcome in neurorehabilitation following acquired brain injury? J Rehabil Med. 2009;41(7):528–35. http://dx.doi.org/10.2340/16501977-0383.
Turner-Stokes L, Baguley IJ, De Graaff S, Katrak P, Davies L, McCrory P, et al. Goal attainment scaling in the evaluation of treatment of upper limb spasticity with botulinum toxin: a secondary analysis from a double-blind placebo-controlled randomized clinical trial. J Rehabil Med. 2010;42(1):81–9. http://dx.doi.org/10.2340/16501977-0474.
Steenbeek D, Gorter JW, Ketelaar M, Galama K, Lindeman E. Responsiveness of Goal Attainment Scaling in comparison to two standardized measures in outcome evaluation of children with cerebral palsy. Clin Rehabil. 2011;25(12):1128–39. http://dx.doi.org/10.1177/0269215511407220.
Stevens A, Beurskens A, Köke A, van der Weijden T. The use of patient-specific measurement instruments in the process of goal-setting: a systematic review of available instruments and their feasibility. Clin Rehabil. 2013;0269215513490178.
Tennant A. Goal attainment scaling: current methodological challenges. Disabil Rehabil. 2007;29(20–21):1583–8.
Marson SM, Wei G, Wasserman D. A reliability analysis of goal attainment scaling (GAS) weights. Am J Eval. 2009;30(2):203–16.
Bottomley A, Jones D, Claassens L. Patient-reported outcomes: assessment and current perspectives of the guidelines of the food and drug administration and the reflection paper of the European medicines agency. Eur J Cancer. 2009;45(3):347–53.
Mokkink L, Terwee C, Patrick D, Alonso J, Stratford P, Knol D, et al. International consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes: results of the COSMIN study. J Clin Epidemiol.
Rockwood K, Graham JE, Fay S, Investigators A. Goal setting and attainment in Alzheimer's disease patients treated with donepezil. J Neurol Neurosurg Psychiatry. 2002;73(5):500–7.
Ashford S, Turner-Stokes L. Management of shoulder and proximal upper limb spasticity using botulinum toxin and concurrent therapy interventions: a preliminary analysis of goals and outcomes. Disabil Rehabil. 2009;31(3):220–6. http://dx.doi.org/10.1080/09638280801906388.
Barden HL, Baguley IJ, Nott MT, Chapparo C. Dynamic computerised hand dynamometry: Measuring outcomes following upper limb botulinum toxin-A injections in adults with acquired brain injury. J Rehabil Med. 2014;46(4):314–20.
Barden HLH, Baguley IJ, Nott MT, Chapparo C. Measuring spasticity and fine motor control (pinch) change in the hand after botulinum toxin-A injection using dynamic computerized hand dynamometry. Arch Phys Med Rehabil. 2014;95(12):2402–9.
Bonouvrie LA, Becher JG, Vles JSH, Boeschoten K, Soudant D, de Groot V, et al. Intrathecal baclofen treatment in dystonic cerebral palsy: a randomized clinical trial: The IDYS trial. BMC Pediatr. 2013;13(1). http://dx.doi.org/10.1186/1471-2431-13-175.
Borg J, Ward AB, Wissel J, Kulkarni J, Sakel M, Ertzgaard P, et al. Rationale and design of a multicentre, double-blind, prospective, randomized, European and Canadian study: evaluating patient outcomes and costs of managing adults with post-stroke focal spasticity. J Rehabil Med. 2011;43(1):15–22. http://dx.doi.org/10.2340/16501977-0663.
Demetrios M, Gorelik A, Louie J, Brand C, Baguley IJ, Khan F. Outcomes of ambulatory rehabilitation programmes following Botulinum toxin for spasticity in adults with stroke. J Rehabil Med. 2014;46(8):730–7.
Ferrari A, Maoret AR, Muzzini S, Alboresi S, Lombardi F, Sgandurra G, et al. A randomized trial of upper limb botulinum toxin versus placebo injection, combined with physiotherapy, in children with hemiplegia. Res Dev Disabil. 2014;35(10):2505–13.
Fietzek UM, Schroeteler FE, Ceballos-Baumann AO. Goal attainment after treatment of parkinsonian camptocormia with botulinum toxin. Mov Disord. 2009;24(13):2027–8. http://dx.doi.org/10.1002/mds.22676.
Lam K, Lau KK, So KK, Tam CK, Wu YM, Cheung G, et al. Can botulinum toxin decrease carer burden in long term care residents with upper limb spasticity? A randomized controlled study. J Am Med Dir Assoc. 2012;13(5):477–84. http://dx.doi.org/10.1016/j.jamda.2012.03.005.
Lam K, Wong D, Tam CK, Wah SH, Myint MWWJ, Yu TKK, et al. Ultrasound and electrical stimulator-guided obturator nerve block with phenol in the treatment of hip adductor spasticity in long-term care patients: a randomized, triple blind, placebo controlled study. J Am Med Dir Assoc. 2015;16(3):238–46.
Leroi I, Atkinson R, Overshott R. Memantine improves goal attainment and reduces caregiver burden in Parkinson's disease with dementia. Int J Geriatr Psychiatry. 2014;29(9):899–905.
Lowe K, Novak I, Cusick A. Low-dose/high-concentration localized botulinum toxin A improves upper limb movement and function in children with hemiplegic cerebral palsy. Dev Med Child Neurol. 2006;48(3):170–5.
Lowe K, Novak I, Cusick A. Repeat injection of botulinum toxin A is safe and effective for upper limb movement and function in children with cerebral palsy. Dev Med Child Neurol. 2007;49(11):823–9.
Mall V, Heinen F, Siebel A, Bertram C, Hafkemeyer U, Wissel J, et al. Treatment of adductor spasticity with BTX-A in children with CP: a randomized, double-blind, placebo-controlled study. Dev Med Child Neurol. 2006;48(1):10–3.
McCrory P, Turner-Stokes L, Baguley IJ, De Graaff S, Katrak P, Sandanam J, et al. Botulinum toxin A for treatment of upper limb spasticity following stroke: a multi-centre randomized placebo-controlled study of the effects on quality of life and other person-centred outcomes. J Rehabil Med. 2009;41(7):536–44. http://dx.doi.org/10.2340/16501977-0366.
Molenaers G, Fagard K, Van Campenhout A, Desloovere K. Botulinum toxin A treatment of the lower extremities in children with cerebral palsy. J Child Orthop. 2013;7(5):383–7.
Nott MT, Barden HL, Baguley IJ. Goal attainment following upper-limb botulinum toxin-A injections: Are we facilitating achievement of client-centred goals? J Rehabil Med. 2014;46(9):864–8.
Olesch CA, Greaves S, Imms C, Reid SM, Graham HK. Repeat botulinum toxin-A injections in the upper limb of children with hemiplegia: a randomized controlled trial. Dev Med Child Neurol. 2010;52(1):79–86. http://dx.doi.org/10.1111/j.1469-8749.2009.03387.x.
Rice J, Waugh MC. Pilot study on trihexyphenidyl in the treatment of dystonia in children with cerebral palsy. J Child Neurol. 2009;24(2):176–82. http://dx.doi.org/10.1177/0883073808322668.
Rockwood K, Fay S, Song X, MacKnight C, Gorman M. Video-imaging synthesis of treating Alzheimer's disease I. Attainment of treatment goals by people with Alzheimer's disease receiving galantamine: a randomized controlled trial. CMAJ. 2006;174(8):1099–105.
Rockwood K, Fay S, Jarrett P, Asp E. Effect of galantamine on verbal repetition in AD: a secondary analysis of the VISTA trial. Neurology. 2007;68(14):1116–21.
Rockwood K, Fay S, Gorman M, Carver D, Graham JE. The clinical meaningfulness of ADAS-Cog changes in Alzheimer's disease patients treated with donepezil in an open-label trial. BMC Neurol. 2007;7:26.
Rockwood K, Fay S, Gorman M. The ADAS-cog and clinically meaningful change in the VISTA clinical trial of galantamine for Alzheimer's disease. Int J Geriatr Psychiatry. 2010;25(2):191–201. http://dx.doi.org/10.1002/gps.2319.
Russo RN, Crotty M, Miller MD, Murchland S, Flett P, Haan E. Upper-limb botulinum toxin A injection and occupational therapy in children with hemiplegic cerebral palsy identified from a population register: a single-blind, randomized, controlled trial. Pediatrics. 2007;119(5):e1149–58.
Scheinberg A, Hall K, Lam LT, O'Flaherty S. Oral baclofen in children with cerebral palsy: a double-blind crossover pilot study. J Paediatr Child Health. 2006;42(11):715–20.
Schramm A, Ndayisaba J-P, Brinke M, Hecht M, Herrmann C, Huber M, et al. Spasticity treatment with onabotulinumtoxin A: data from a prospective German real-life patient registry. J Neural Transm. 2014. http://dx.doi.org/10.1007/s00702-013-1145-3.
Turner-Stokes L, Ashford S. Serial injection of botulinum toxin for muscle imbalance due to regional spasticity in the upper limb. Disabil Rehabil. 2007;29(23):1806–12.
Wallen MA, O'Flaherty SJ, Waugh MCA. Functional Outcomes of Intramuscular Botulinum Toxin Type A in the Upper Limbs of Children with Cerebral Palsy: A Phase II Trial. Arch Phys Med Rehabil. 2004;85(2):192–200. http://dx.doi.org/10.1016/j.apmr.2003.05.008.
Wallen M, O'Flaherty SJ, Waugh MC. Functional outcomes of intramuscular botulinum toxin type A and occupational therapy in the upper limbs of children with cerebral palsy: a randomized controlled trial. Arch Phys Med Rehabil. 2007;88(1):1–10.
Ward FA, Pulido-Velazquez M. Incentive pricing and cost recovery at the basin scale. J Environ Manage. 2009;90(1):293–313. http://dx.doi.org/10.1016/j.jenvman.2007.09.009.
Ward AB, Wissel J, Borg J, Ertzgaard P, Herrmann C, Kulkarni J, et al. Functional goal achievement in poststroke spasticity patients: The BOTOX® Economic Spasticity Trial (BEST). J Rehabil Med. 2014;46(6):504–13.
Bovend'Eerdt TJ, Dawes H, Sackley C, Izadi H, Wade DT. An integrated motor imagery program to improve functional task performance in neurorehabilitation: a single-blind randomized controlled trial. Arch Phys Med Rehabil. 2010;91(6):939–46.
Fisher K, Hardie RJ. Goal attainment scaling in evaluating a multidisciplinary pain management programme. Clin Rehabil. 2002;16(8):871–7.
Sheldon KM, Elliot AJ. Not all personal goals are personal: Comparing autonomous and controlled reasons for goals as predictors of effort and attainment. Personal Soc Psychol Bull. 1998;24(5):546–57. http://dx.doi.org/10.1177/0146167298245010.
We would like to thank René Spijker for his excellent help with the design of the literature search.
This research was funded by the EU FP7 program: EU FP7 HEALTH.2013.4.2-3 project Advances in Small Trials dEsign for Regulatory Innovation and eXcellence (Asterix): Grant 603160.
Availability of data and materials
The dataset supporting the conclusions of this article is included within the article and its additional files.
CMWG has set up and executed the study, and written the article. MCJ has co-written the article, was second reviewer in the abstract selection process and second reviewer in the data-extraction process, and has helped with the analysis and interpretation. SSW has co-written the article, was second reviewer in the abstract selection process and has helped with the analysis and interpretation. JHL has co-written the article, was second reviewer in the abstract selection process and second reviewer in the data-extraction process, and has designed and supervised the study. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Not applicable, as this study concerns literature only.
GAS search. This additional file is the complete search with all the terms that we used to come to the set of articles that we included. (PDF 354 kb)
Data extraction form GAS. This additional file is the complete data extraction form that we have used for the included articles. (PDF 158 kb)
About this article
Cite this article
Gaasterland, C.M.W., Jansen-van der Weide, M.C., Weinreich, S.S. et al. A systematic review to investigate the measurement properties of goal attainment scaling, towards use in drug trials. BMC Med Res Methodol 16, 99 (2016). https://doi.org/10.1186/s12874-016-0205-4
- Rare diseases
- Goal attainment scaling
- Drug trials
- Systematic review