Responsiveness of different pain measures and recall periods in people undergoing surgery after a period of splinting for basal thumb joint osteoarthritis
BMC Medical Research Methodology volume 22, Article number: 37 (2022)
Basal thumb joint osteoarthritis (OA) is a common painful condition of the hand often treated surgically if non-operative care does not provide sufficient pain relief. Many instruments are available to measure pain for this condition including single item and multidimensional measures. To inform our choice of instrument for the purpose of evaluating the value of surgery for people with thumb OA, the aim of this study was to compare the longitudinal validity and signal to noise ratio of a single item numeric rating scale (NRS) for pain and the Patient-rated Wrist and Hand Evaluation (PRWHE) pain subscale, and to assess if recall period affects longitudinal validity of the NRS pain and reported pain levels.
We invited 52 patients referred for surgical treatment of basal thumb joint OA to participate in this study. All wore a splint for six weeks followed by surgery. Pain during the past day, week, and month and the PRWHE were collected at baseline, operation day, and 3, 6, 9 and 12 months after surgery. Responsiveness was assessed with two methods: 1) using participant-reported global improvement and PRWHE function subscale as external anchors (longitudinal validity) and 2) comparing Standardized Response Means (SRM).
The Spearman’s ρ between PRWHE pain and participant-reported global improvement was better (0.71) compared with NRS past day (0.55), past week (0.62), or past month (0.59). Similar findings were found with PRWHE function as anchor (Pearson’s r for PRWHE pain 0.78; NRS past day 0.68; past week 0.73; past month 0.69). The SRM of PRWHE pain subscale (2.8) and NRS past week (2.9) outperformed pain past day (2.3) and month (2.4). Mean pain was 0.3 points (on a 0 to 10 scale) worse during past week when compared with past day and 0.3 worse during past month than during past week.
All studied pain measures captured the change in pain over time. For clinical trials, we recommend PRWHE pain subscale or NRS past week due to their better signal noise ratio.
Basal thumb joint osteoarthritis is one of the most common painful conditions of the hand typically affecting the quality of life of middle aged and elderly people [1,2,3,4,5]. Anti-inflammatory drugs, intra-articular glucocorticoid injections and various orthoses can be used for pain relief. These treatments, however, may not mitigate symptoms sufficiently, and in those cases surgery – most commonly removal of the trapezoid bone (trapeziectomy) – is used to relieve pain. Although observational data suggests this procedure relieves pain, there are no studies that compare surgery with non-operative treatment .
Since pain is a subjective experience, patient-reported outcomes are used to assess the effect of interventions . A systematic review assessing the outcomes used in basal thumb joint OA identified 101 different outcome measures, most frequently measuring pain and function domains . Pain can be evaluated with various measures. A single item Visual Analog Scale (VAS) and Numeric Rating Scale (NRS) are common methods for measuring OA thumb pain. However, single item instruments can only measure one dimension of pain, such as its intensity, and they cannot capture different aspects of the pain experience such as its fluctuation, frequency and quality . Another potential issue with pain questions is the recall period, which is often defined as current pain, pain during the past 24 h, or past week.
The difference between recalled pain and real time mean pain measurement is called recall bias . Recall bias is usually greater with longer recall periods, such as a week or a month, when compared with the past day . Furthermore, in musculoskeletal pain, studies suggest that the recall period of the question may influence the reported pain levels [11, 12] but it is not clear if this is a universal phenomenon. It is also unclear if the recall period (and related recall bias) can impact the longitudinal validity of the measure, i.e. how well the measure captures the change in the underlying construct.
The Patient-Rated Wrist and Hand Evaluation (PRWHE) is a hand-specific instrument which includes a pain subscale comprising five items; four measure pain intensity (at rest, when doing a task with a repeated wrist movement, when lifting a heavy object and when it is at its worst), and one measures pain frequency. In theory, the benefit of multiple items is that an overall pain score better captures all important aspects of a person’s pain. Furthermore, use of multiple items could improve the signal to noise ratio as the random errors of different responses cancel each other to some extent when the total score is summed from several items. These properties could improve the ability to measure the change. However, the evidence informing whether multi-item PRWHE is better choice for clinical trials compared with single item NRS is limited.
The aim of this study was to determine 1) if change in PRWHE pain subscale correlates better than a single item pain NRS with participant-reported global improvement or hand function following treatment (i.e. compare longitudinal validity of NRS and PRWHE pain); 2) if the PRWHE pain subscale has a better signal to noise ratio than the single item pain NRS; and 3) if the recall period affects the longitudinal validity of single item NRS or 4) self-reported mean pain levels of a single item NRS pain in an observational cohort of participants who underwent sequential splinting then surgical intervention for basal thumb joint OA.
Study design and setting
The study was a multi-center prospective observational cohort study. We recruited participants from one secondary and two tertiary referral centers between August 2017 and November 2018. All participants had been referred for operative treatment for symptomatic basal thumb joint OA that had not responded to non-operative treatment. Helsinki University Hospital Institutional Review Board approved the study protocol before commencement of the study. All methods were carried out in accordance with relevant guidelines and regulations.
In people being treated for basal thumb joint OA, our null-hypotheses were:
The longitudinal validity of the PRWHE pain subscale and a single item pain NRS are not significantly different (measured as the correlation between target instrument and participant-reported global improvement or hand function improvement)
The signal to noise ratio (expressed as a standardized response mean, SRM) of the PRWHE pain subscale is not significantly different from the NRS.
The length of the recall period does not affect the longitudinal validity of the pain NRS (correlation between improvement in pain and participant-reported global improvement or hand function improvement)
Difference in the length of the recall period does not affect the reported level of pain.
All people referred for basal thumb joint OA were screened by a board-certified hand surgeon. If they were dissatisfied with the pain and/or function of the hand, and accepted the risks related to surgery, standard surgical care (trapeziectomy) was offered and they were informed about the study. Recruitment occurred after the decision to operate was made.
Basal thumb (carpometacarpal) joint OA (Eaton-Littler grades 2-4)
> 45 years of age
Scheduled for trapeziectomy
Systemic inflammatory disease
Neurological disease that weakens upper arm muscles
Any operation of the same hand during the past 6 months
Bilateral operation planned
Zig zag deformity (> 45 degrees of thumb metacarpophalangeal (MP) joint hyperextension in rest)
After recruitment, all participants were given an Actimove® Rhizo Forte splint (Essity AB, Stockholm, Sweden), and they were instructed to use it continuously for six weeks. After splinting period, all participants underwent a simple trapeziectomy. The trapezoid bone was exposed using a volar approach and removed in a piecemeal fashion. The space between first metacarpal and scaphoid was carefully examined for any bone remnants before the remaining joint capsule was closed. We did not use any interposition or suspensionplasty. After skin closure, the thumb was immobilized with a cast in palmar abduction/opposition for 3 weeks. At 3 weeks, the cast was removed and participants were provided with a post-operative exercise protocol that they could perform at home daily. They were also instructed to use the orthosis in daily activities as needed for the next three weeks and after that only if needed to control the pain.
The baseline characteristics were recorded after the participants had signed the informed consent. Questionnaires were sent by mail to all participants. The outcome measures were collected at baseline, after 6 weeks of splinting, and then 3 months, 6 months, 9 months and 12 months post-surgery.
Pain was measured by two instruments:
a single item 11-point NRS where 0 = no pain and 10 = worst possible pain. The same question was posed three times with different recall periods: “How would you describe the pain in your thumb during the past day / during the past week / during the past month while doing daily activities, work and/or hobbies?”. The intraclass correlation coefficient (ICC) in NRS with knee OA patients is 0.95, whereas standard error of measurement (SEM) is 0.48 and minimum detectable difference (MDD) 1.33 . The minimal important difference (MID) is 2.0 with chronic musculoskeletal patients .
the PRWHE pain subscale which comprises five items as described above. The four items assessing pain intensity are measured on an 11-point NRS scale where 0 = no pain and 10 = worst possible pain. The recall period is one week. The fifth item assessing pain frequency is measured on an 11-point NRS scale where 0 = never to 10 = always. The pain subscale is the sum of the five items and ranges from 0 (best) to 50 (worst). PRWHE is the same instrument as the PRWE  but the word ‘wrist’ is replaced by ‘hand’. PRWE has been validated in Finnish .
All pain items were in the same order in the questionnaires: day, week and month.
The anchor measures in the assessment of longitudinal validity were
participant-reported global improvement measured by a 5-point Likert scale (much better; little better; no change; little worse; much worse).
the PRWHE function subscale which comprises 10 items rating the difficulty in doing specific tasks during the past week on a NRS scale from 0 = not difficult to 10 = unable to do. The PRWHE function subscale is the sum of the 10 items and ranges from 0 (no disability) to 100 (worst disability). Systematic review found an ICC of 0.85 for PRWHE , SEM was 2 points with various hand surgery patients,  MDD was 11 points, and minimal clinically important difference (MCID) was 11.5 points in patients with distal radius fracture .
Change scores were calculated by subtracting baseline score from the follow-up scores. We changed the direction of the pain scales so that a positive value corresponds with improvement in pain. We used measurements from all time points for the longitudinal validity assessment (50 target measure–anchor measure data pairs at five follow-up points = 250 data pairs), and also for the comparison of different recall periods of the single item NRS (6 time points * 3 question * 50 participants = 900 observations).
To compare the longitudinal validity of the three pain NRS recall periods (past day, week, month) and the PRWHE pain subscale, we calculated Spearman correlations between the target measures and participant-reported global improvement; and Pearson correlations with the target measures and change in PRWHE function subscale. Confidence intervals for the correlations were derived by bootstrapping 1000 samples. We then compared these correlations between measures using the method described by Steiger .
To assess the signal/noise to ratio, SRMs were calculated as change in score / SD of change for all pain outcomes using data from baseline and at 12 months after surgery. We compared the SRMs of the single item pain NRS scores and the PRWHE pain subscale scores using a “modified Jack-knife test” similar to Angst et al. .
We estimated the difference in the reported pain in different recall periods using a linear mixed model. Participant was entered as random factor and the question type (pain during past day, week, or month) and time points as fixed factors. We also fitted another model assessing if the time from recruitment modified the effect (time point*question interaction). Estimated marginal means from the mixed model were used as mean values. In all analyses, p-values less than 0.05 were considered statistically significant.
In total, we recruited 52 participants. Two participants did not attend any of the follow-up visits and thus complete data were available for 50 (96%) participants at all follow-up points. Their demographic data and outcome variables at baseline are shown in Table 1. The study population was predominantly female (75%). Most of the participants (80%) had Eaton-Littler grade 3 or 4 basal thumb joint OA.
The PRWHE pain subscale displayed slightly better longitudinal validity (correlation with both patient-reported global improvement and change in PRWHE function) compared with any of the single item pain NRSs (Table 2). Recall period for the single item NRS did not influence its longitudinal validity (p > 0.05 for all comparisons). The SRM of the PRWHE pain scale (2.8) was similar to the single item pain NRS with a recall period of during the past week (2.9), but better than the NRS with recall periods of during the past day (2.3) and during the past month (2.4) (Table 3).
Effect of NRS recall period
Pain did not change during the 6-week splinting period, but it decreased steadily after surgery. (Fig. 1). The duration of the recall period affected pain NRS values significantly; the longer the recall period, the higher the reported pain (Table 4). The 0.3-point difference between day and week and between week and month was consistent across all time points (Fig. 1). The duration of follow-up did not modify this difference (time*question interaction; day versus week p = 0.8; day versus month p = 0.8).
The PRWHE pain subscale was slightly more sensitive to change, i.e., it had better longitudinal validity compared with a single item pain NRS, but all measures seemed to capture change in pain well. On the other hand, PRWHE pain subscale and NRS past week had best signal to noise ratio translating into greater precision of treatment estimates in clinical trials making them best options for research use. Longer recall periods resulted in small increments in reported pain levels with NRS, but the recall period did not impact the longitudinal validity of the measure.
Sensitivity to detect change in a measured construct can be assessed by 1) comparing change to an external anchor (‘external responsiveness’ or longitudinal validity), i.e. measuring the extent to which change relates to a reference measure of health status; or 2) calculating varying signal to noise ratios (‘internal responsiveness’), that characterize the ability to measure change in relation to the variability of the measurement (SRM in this study) . The PRWHE pain subscale seems to outperform single NRS pain items in external responsiveness and is better than pain during past day and month in internal responsiveness while the benefit of NRS is acceptable longitudinal validity with less response burden.
Our finding agrees with a previous study comparing the Multidimensional Pain Inventory (MPI) with a single item pain NRS in which the MPI showed better responsiveness . On the other hand, a more comprehensive evaluation does not automatically result in better responsiveness: the McGill pain questionnaire, which has 49 items including several items for pain quality and intensity, was less responsive when compared with a single pain VAS in people with low back pain . The PRWHE pain subscale measures pain intensity (4 items) and pain frequency (1 item), and it seems to capture important aspects of thumb pain more comprehensively than a single item pain NRS.
The data do not shed light on the mechanisms regarding the recall period of pain, but the results are in line with other studies assessing recall bias, which have also found lower pain values with shorter recall periods [11, 12]. Evidence also suggests, that as the recall period grows longer, the reported pain values tend to correlate less with the real time pain measurements, indicative of recall bias . This bias likely relates to how people construct the past in their mind – recalled pain seems to correlate best with the worst pain experienced and pain at the end of the period rather than the average pain over the whole time period [24, 25]. Arthritic pain intensity typically fluctuates and it is more likely that a person experiences peaks in longer recall periods compared with shorter periods. Thus, it is plausible that participants reported, on average, slightly higher pain during the past week and past month when compared with past day.
The main limitation of this study is external validity. All participants had been referred to a hand surgical center due to persisting painful basal thumb joint OA and it is unclear if the results can be directly applied to other hand joints or patients with less severe symptoms. However, as pain recall bias is a psychological phenomenon, we don’t see any reason why the results would markedly differ in other OA joints, but further investigation is warranted. We also did not assess how an overall pain value (a single item pain NRS without a specific recall period) performs, and the anchor questions can also be subject to recall bias . Also, the order of NRS pain questions was not randomized and this could impact their relative values.
We also did not record real time pain in order to assess which measurement is least biased from real time values. There is no gold standard for pain measurement and it is unclear if real time pain is more important compared with recalled pain but people seem to put more weight on the recalled pain than currently experienced pain in decision making .
Of the tested measures, we recommend PRWHE pain subscale or NRS past week for clinical trials. The slightly lower longitudinal validity of NRS past week can be weighed against the benefit of having lower response burden, at least in clinical practice when a multiple item questionnaire is not feasible. Since the recall period does not seem to impact the longitudinal validity of the measurement, it is not an important issue within a trial, but the difference in the absolute level caused by recall period may be factored in when absolute values are compared across trials with different pain recall periods.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
- NRS pain:
Numeric pain rating scale
Patient-rated Wrist and Hand Evaluation
Standardized Response Means
Visual Analog Scale
Patient-Rated Wrist Evaluation
Multidimensional Pain Inventory
Intraclass Correlation Coefficient
Standard error of measurement
minimum detectable difference
The minimal important difference
minimal clinically important difference
Armstrong AL, Hunter JB, Davis TRC. The prevalence of degenerative arthritis of the base of the thumb in post-menopausal women. J Hand Surg Am. 1994. https://doi.org/10.1016/0266-7681(94)90085-X.
Van Heest AE, Kallemeier P. Thumb carpal metacarpal arthritis. J Am Acad Orthop Surg. 2008.
Wolf JM, Turkiewicz A, Atroshi I, Englund M. Prevalence of doctor-diagnosed thumb carpometacarpal joint osteoarthritis: an analysis of Swedish health care. Arthritis Care Res. 2014.
Haara MM, Manninen P, Kröger H, Arokoski JPA, Kärkkäinen A, Knekt P, et al. Osteoarthritis of finger joints in Finns aged 30 or over: prevalence, determinants, and association with mortality. Ann Rheum Dis. 2003. https://doi.org/10.1136/ard.62.2.151.
Haugen IK, Bøyesen P, Slatkowsky-Christensen B, Sesseng S, Van Der HD, Kvien TK. Associations between MRI-defined synovitis, bone marrow lesions and structural features and measures of pain and physical function in hand osteoarthritis. Ann Rheum Dis. 2012. https://doi.org/10.1136/annrheumdis-2011-20034.
Wajon A, Vinycomb T, Carr E, Edmunds I, Ada L. Surgery for thumb (trapeziometacarpal joint) osteoarthritis. Cochrane Database Syst Rev. 2015;2017. https://doi.org/10.1002/14651858.CD004631.pub4.
Visser AW, Bøyesen P, Haugen IK, Schoones JW, Van Der HDM, Rosendaal FR, et al. Instruments measuring pain, physical function, or patient’s global assessment in hand osteoarthritis: a systematic literature search. J Rheumatol. 2015;42:2118–34. https://doi.org/10.3899/jrheum.141228.
Marks M, Schoones JW, Kolling C, Herren DB, Goldhahn J, Vliet Vlieland TPM. Outcome measures and their measurement properties for trapeziometacarpal osteoarthritis: a systematic literature review. J Hand Surg Eur. 2013;38:822–38. https://doi.org/10.1177/1753193413488301.
Hawker GA, Mian S, Kendzerska T, French M. Measures of adult pain: visual analog scale for pain (VAS pain), numeric rating scale for pain (NRS pain), McGill pain questionnaire (MPQ), short-form McGill pain questionnaire (SF-MPQ), chronic pain grade scale (CPGS), short Form-36 bodily pain scale. Arthritis Care Res. 2011;63:240–52. https://doi.org/10.1002/acr.20543.
Schoth DE, Radhakrishnan K, Liossi C. A systematic review with subset meta-analysis of studies exploring memory recall biases for pain-related information in adults with chronic pain. Pain Reports. 2020;5:32440609. https://doi.org/10.1097/PR9.0000000000000816.
Giske L, Sandvik L, Røe C. Comparison of daily and weekly retrospectively reported pain intensity in patients with localized and generalized musculoskeletal pain. Eur J Pain. 2010. https://doi.org/10.1016/j.ejpain.2010.02.011.
Broderick JE, Schwartz JE, Vikingstad G, Pribbernow M, Grossman S, Stone AA. The accuracy of pain and fatigue items across different reporting periods. Pain. 2008;139:146–57. https://doi.org/10.1016/j.pain.2008.03.024 Epub 2008 May 1.
Alghadir AH, Anwer S, Iqbal A, Iqbal ZA. Test-retest reliability, validity, and minimum detectable change of visual analog, numerical rating, and verbal rating scales for measurement of osteoarthritic knee pain. J Pain Res. 2018.
Salaffi F, Stancati A, Silvestri CA, Ciapetti A, Grassi W. Minimal clinically important changes in chronic musculoskeletal pain intensity measured on a numerical rating scale. Eur J Pain. 2004.
MacDermid JC, Tottenham V. Responsiveness of the disability of the arm, shoulder, and hand (DASH) and patient-rated wrist/hand evaluation (PRWHE) in evaluating change after hand therapy. J Hand Ther. 2004;17:18–23. https://doi.org/10.1197/j.jht.2003.10.003.
Sandelin H, Jalanko T, Huhtala H, Lassila H, Haapala J, Helkamaa T. Translation and validation of the finnish version of the patient-rated wrist evaluation questionnaire (PRWE) in patients with acute distal radius fracture. Scand J Surg. 2016;105:204–10. https://doi.org/10.1177/1457496915613649 Epub 2015 Oct 26.
Mehta SP, Macdermid JC, Richardson J, Macintyre NJ, Grewal R. A systematic review of the measurement properties of the patient-rated wrist evaluation. J Orthop Sports Phys Ther. 2015.
Finsen V, Hillesund S, Fromreide I. The reliability of remembered pre-operative patient-rated wrist and hand evaluation (PRWHE) scores. Orthop Rev (Pavia). 2018.
Walenkamp MMJ, de Muinck Keizer RJ, Goslings JC, Vos LM, Rosenwasser MP, Schep NWL. The minimum clinically important difference of the patient-rated wrist evaluation score for patients with distal radius fractures. Clin Orthop Relat Res. 2015.
Steiger JH. Tests for comparing elements of a correlation matrix. Psychol Bull. 1980;87:245–51. https://doi.org/10.1037/0033-2909.87.2.245.
Angst F, Verra ML, Lehmann S, Aeschlimann A. Responsiveness of five condition-specific and generic outcome assessment instruments for chronic pain. BMC Med Res Methodol. 2008;8:1–8. https://doi.org/10.1186/1471-2288-8-26.
Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol. 2000;53(5):459–68. https://doi.org/10.1016/s0895-4356(99)00206-1.
Scrimshaw SV, Maher C. Responsiveness of visual analogue and McGill pain scale measures. J Manip Physiol Ther. 2001;24(8):501–4. https://doi.org/10.1067/mmt.2001.118208.
Redelmeier DA, Kahneman D. Patients’ memories of painful medical treatments: real-time and retrospective evaluations of two minimally invasive procedures. Pain. 1996;66:3–8. https://doi.org/10.1016/0304-3959(96)02994-6.
Kahneman D, Fredrickson BL, Schreiber CA, Redelmeier DA. When more pain is preferred to less: adding a better end. Psychol Sci. 1993;4:401–5.
Kamper SJ, Ostelo RW, Knol DL, Maher CG, de Vet HC, Hancock MJ. Global perceived effect scales provided reliable assessments of health transition in people with musculoskeletal disorders, but ratings are strongly influenced by current status. J Clin Epidemiol. 2010;63(7):760–6. https://doi.org/10.1016/j.jclinepi.2009.09.009 Epub 2010 Jan 8.
RB is supported by an Australian National Health and Medical Research Council (NHMRC) Investigator Fellowship (APP1994483).
Ethics approval and consent to participate
The study was performed in accordance with the ethical standards of Declaration of Helsinki. Helsinki University Hospital Institutional Review Board approved the study protocol before commencement of the study. All participants gave their informed consent prior to their inclusion in the study.
Consent for publication
The authors declare that they have no competing interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Pajari, J., Jokihaara, J., Waris, E. et al. Responsiveness of different pain measures and recall periods in people undergoing surgery after a period of splinting for basal thumb joint osteoarthritis. BMC Med Res Methodol 22, 37 (2022). https://doi.org/10.1186/s12874-022-01527-7