The reliability of assigning individuals to cognitive states using the Mini Mental-State Examination: a population-based prospective cohort study
© Marioni et al; licensee BioMed Central Ltd. 2011
Received: 9 November 2010
Accepted: 6 September 2011
Published: 6 September 2011
Previous investigations of the test re-test reliability of the Mini-Mental State Examination (MMSE) have used correlations and statistics such as Cronbach's α to assess consistency. In practice, the MMSE is usually used to group individuals into cognitive states. The reliability of this grouping (the state-based approach) has not been fully explored.
MMSE data were collected on a subset of 2,275 older participants (≥ 65 years) from the population-based Medical Research Council Cognitive Function and Ageing Study. Two measurements taken approximately two months apart were used to investigate three state-based categorisations. Descriptive statistics were used to determine how many people remained in the same cognitive group or went up or down groups. Weighted logistic regression was used to identify predictive characteristics of those who moved group.
The proportion of people who remained in the same MMSE group at screen and follow-up assessment ranged from 58% to 78%. The proportion of individuals who went up one or more groups was roughly equal to the proportion that went down one or more groups; most of the change occurred when measurements were close to the cut-points. There was no consistently significant predictor for changing cognitive group.
A state-based approach to analysing the reliability of the MMSE provided similar results to correlation analyses. State-based models of cognitive change or individual trajectory models using raw scores need multiple waves to help overcome natural variation in MMSE scores and to help identify true cognitive change.
Keywords: MMSE, reliability, test-retest, ageing, elderly
The Mini-Mental State Examination (MMSE) was developed in 1975 as a brief tool to measure global cognitive function. It contains nineteen items on orientation, registration, attention and calculation, recall, language, and praxis, and is scored from 0 to 30. It is primarily used as a screening test for dementia, with scores below 24 commonly used to indicate a cognitive deficit. A 1998 review of the MMSE noted that it has a ceiling effect in young healthy adults and a floor effect in older, severely impaired adults; ceiling and floor effects of the MMSE have also been discussed in detail elsewhere [3–5]. It has also been shown that MMSE scores are affected by age and education.
Despite its intrinsic limitations for measuring subtle change in ability, the MMSE is frequently used to measure cognitive change over time. Several studies have measured change as the difference in two scores [6, 7] whereas others have used data from multiple waves [8, 9]. When monitoring cognitive test scores over time it is desirable to account for natural variation from measurement error and test re-test reliability. Test re-test reliability of the MMSE has been investigated only to a limited extent, despite its potential importance when cut-points are applied to categorise individuals for purposes such as eligibility for medication or care support. Grouping of the MMSE variable is used in policy, with dementia treatment being given to selected subgroups. However, if an individual is assigned to a treatment group based on a single MMSE measure, it is vital to know how reliable such a measure is. This also applies to clinical research, where MMSE cut-points are commonly used to select or reject individuals from a study or treatment regimen.
A review paper of studies analysing MMSE test-retest reliability described moderate to high correlations between measures. However, it is debatable whether these are the most appropriate assessments of agreement. Correlations will measure association but not necessarily agreement. Similarly, reliability as measured by Cronbach's α also relies on the calculation of intercorrelations between the two or more measures being analysed. For example, if everyone in a cohort had a one point increase in MMSE score between baseline and follow-up then the correlation between the two measures would be 1. This would imply association but not agreement. In an approach using MMSE groupings, if all individuals again scored an additional MMSE point between waves, many would remain in the same MMSE group giving better scope to measure agreement.
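The distinction between association and agreement can be sketched with a small, purely hypothetical cohort in which every follow-up score is one point higher than baseline. The scores below are invented for illustration, and the three-group bands (< 18, 18-23, 24-30) are the Tombaugh and McIntyre cut-points described later in this paper; the helper names are assumptions, not part of the study's analysis code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def three_group(score):
    """Map an MMSE score to the three-group bands: < 18, 18-23, 24-30."""
    if score < 18:
        return 1
    if score <= 23:
        return 2
    return 3


def proportion_agree(a, b):
    """Proportion of paired observations that are identical."""
    return sum(1 for u, v in zip(a, b) if u == v) / len(a)


# Hypothetical baseline scores; everyone gains exactly one point at follow-up.
baseline = [12, 17, 20, 23, 25, 27, 28, 29]
follow_up = [s + 1 for s in baseline]

r = pearson(baseline, follow_up)                 # approximately 1
exact = proportion_agree(baseline, follow_up)    # no raw score is unchanged
grouped = proportion_agree(
    [three_group(s) for s in baseline],
    [three_group(s) for s in follow_up],
)                                                # 6 of 8 keep their group

print(f"correlation={r:.3f}, exact agreement={exact:.2f}, "
      f"group agreement={grouped:.2f}")
```

In this toy cohort the correlation is essentially perfect although no raw score agrees, while the only individuals who change group are the two whose baseline score sat immediately below a cut-point.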
A statistical issue to consider when using the MMSE as a screening tool for further assessment of a sub-group of participants is regression to the mean. This phenomenon occurs when there is imperfect correlation between two measures. For example, in a test re-test situation where scores at both testing occasions have the same mean and variance, the group of individuals attaining a particular score at baseline will be expected to average a score that is closer to the mean at re-test. This may account for much of the apparent cognitive decline in people with high initial scores on the MMSE.
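As a hedged illustration of this effect, suppose (an assumption made here for exposition, not a result of the study) that the baseline score X_1 and the re-test score X_2 are bivariate normal with common mean μ, common variance σ², and correlation ρ. The expected re-test score for individuals with baseline score x is then given by the first line below; the second line plugs in hypothetical values.

```latex
\mathbb{E}[X_2 \mid X_1 = x] = \mu + \rho\,(x - \mu), \qquad 0 \le \rho < 1

% Hypothetical values: \mu = 26, \rho = 0.8, x = 30
\mathbb{E}[X_2 \mid X_1 = 30] = 26 + 0.8\,(30 - 26) = 29.2
```

Whenever ρ is below 1, the expected re-test score lies between the observed baseline score and the mean, so high scorers appear to decline and low scorers appear to improve even when no true change has occurred.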
Whilst many studies split MMSE scores into groups before analysing, the short term reliability of these groupings and the potential for misclassification has not been studied in detail. The aim of this study was to investigate the reliability of a single measure of MMSE group, as used in clinical practice, by investigating the reliability of two measures taken a short time apart to minimise the potential for cognitive decline. MMSE groupings were defined using three different criteria and the study was population-based using data on 2,275 individuals from five sites across England and Wales.
Data came from the Medical Research Council Cognitive Function and Ageing Study (MRC CFAS). Briefly, MRC CFAS is a multi-centre study on over 18,000 persons from across six centres in England and Wales; five of the centres have the same standardised design. These centres used a two-phase sampling design with a screening interview followed by an assessment interview. Participants were selected from Family Health Service Authority lists and were stratified by age to include persons aged 65 years and over at the index date for each centre and living within a specified geographical area. The study began in the late 1980s; baseline interviews took place between 1989 and 1993.
In this study, data were used from the five centres with a standardised design: Cambridgeshire, Gwynedd, Newcastle, Nottingham, and Oxford (total n = 13,004). The population under investigation contained individuals who were cognitively assessed at the baseline screening interview or the assessment interview around two months later (n = 2,640; both tests were completed by 2,275 participants). The population invited to the assessment interview was weighted towards those in a potentially frail cognitive state (identified using details from the screen interview, including MMSE scores), although all levels of ability were represented. For full details of the questionnaires used at the screen and assessment waves please see http://www.cfas.ac.uk.
The Mini-Mental State Examination (MMSE) was administered to participants at both the screen and assessment interviews. The version of the MMSE used in this study included serial sevens, but not spelling 'world' backwards. The words to repeat and recall were 'apple, penny, table' at screen, and 'tree, clock, boat' at assessment. Items that could not be answered due to sensory or mobility problems were considered failed; all other items that were not answered were kept as missing data. Incomplete MMSE scores tend to come from individuals who are severely cognitively impaired.
MMSE scores range from 0 to 30 and several definitions have been proposed to categorise these scores into cognitive states. The three definitions used in this paper were suggested by MRC CFAS, Tombaugh and McIntyre, and Folstein et al. The MRC CFAS categorisation was based on the ROC curve findings from Figure One of Stephan et al. 2010, which showed the MMSE to be as accurate as other diagnostic definitions of Mild Cognitive Impairment in predicting future risk of dementia. The graph indicated MMSE groupings as follows: < 18 (severe impairment), 18-22 (moderate impairment), 23-26 (slight impairment), 27-30 (no impairment). Folstein et al., who devised the MMSE, also recommended splitting the MMSE scores into four groups (< 11 severe impairment, 11-20 moderate impairment, 21-26 mild impairment, 27-30 no impairment), while Tombaugh and McIntyre's seminal review reported a trend towards a three group categorisation (< 18 severe impairment, 18-23 mild impairment, 24-30 no impairment).
Interviewers at both screen and assessment had a range of backgrounds, mainly professions allied to medicine. These included psychologists, psychiatrists, registered nurses and others with similar backgrounds. All interviewers received identical training from the CFAS study co-ordinators. Wording, prompting and feedback were all strictly controlled by a combination of training and computer assisted interviewing. Monitoring of the quality and consistency of interviews was carried out to ensure comparability both within and between centres through observation, role play, and analysis of audiotapes of interviews in the field. Interviews took place in the respondents' homes.
MMSE scores were categorised into groups, which were relabelled in ascending order from 1 (low cognition) to 4 (high cognition) (or 1 to 3). Cognitive change was measured by subtracting the assessment group number from the screen group number. This created a scoring range of -3 to 3 (or -2 to 2) where 0 represented no change in group. Descriptive statistics were used to compare the classification performance of each categorisation method.
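The grouping and change score described above can be sketched as follows. This is a minimal illustration rather than the study's own code (the analyses were run in R): the cut-points are those stated in the text for each criterion, the sign convention follows the literal wording (screen group minus assessment group, so a negative value means a higher group at assessment), and the function names and score pairs are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Lowest score of each group above group 1, taken from the bands in the text.
CUT_POINTS = {
    "MRC CFAS": [18, 23, 27],               # < 18, 18-22, 23-26, 27-30
    "Folstein et al.": [11, 21, 27],        # < 11, 11-20, 21-26, 27-30
    "Tombaugh and McIntyre": [18, 24],      # < 18, 18-23, 24-30
}


def mmse_group(score, criterion):
    """Return the group number (1 = lowest cognition) for an MMSE score."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE scores range from 0 to 30")
    return bisect_right(CUT_POINTS[criterion], score) + 1


def group_change(screen_score, assessment_score, criterion):
    """Screen group minus assessment group; 0 means no change in group."""
    return (mmse_group(screen_score, criterion)
            - mmse_group(assessment_score, criterion))


# Hypothetical (screen, assessment) score pairs for four individuals.
pairs = [(23, 24), (28, 27), (17, 19), (26, 26)]

for criterion in CUT_POINTS:
    changes = Counter(group_change(s, a, criterion) for s, a in pairs)
    stayed = changes[0] / len(pairs)
    print(criterion, dict(changes), f"stayed in same group: {stayed:.0%}")
```

The same one-point difference (23 at screen, 24 at assessment) counts as a change of group under the Tombaugh and McIntyre bands but not under the other two, which is why the choice of cut-points affects the apparent reliability.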
To determine whether baseline cognitive score had an effect on cognitive change, weighted logistic regression was used to test for differences between those who changed group and those who did not. Age, sex, and study centre were entered as covariates along with the MMSE score from the screen interview and the duration in months between screen and assessment interviews. Inverse probability weights were calculated using logistic regression: study participation was regressed on age, sex, screening MMSE score, and GMS-AGECAT (Geriatric Mental State-Automated Geriatric Examination for Computer Assisted Taxonomy), which is a computerised diagnostic system that can be used to diagnose dementia. This enabled the cohort under investigation to be back-weighted to the original population-based cohort of 13,004 individuals. All analyses were conducted in R version 2.10.1.
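The idea behind back-weighting can be illustrated with a deliberately simplified sketch. The study estimated participation probabilities with a logistic regression on age, sex, screening MMSE score, and GMS-AGECAT; the code below swaps in a cruder technique, observed participation rates within strata, purely to show how inverse probability weights restore the population total. The stratum labels, counts, and function name are hypothetical.

```python
from collections import Counter


def inverse_probability_weights(population_strata, participant_strata):
    """Weight per stratum = 1 / (participants in stratum / population in stratum).

    Simplification: participation probability is the observed stratum rate,
    not the fitted value from a logistic regression as used in the study.
    """
    population = Counter(population_strata)
    participants = Counter(participant_strata)
    return {
        stratum: population[stratum] / n_participants
        for stratum, n_participants in participants.items()
    }


# Hypothetical cohort: low scorers are over-sampled for the second interview.
population_strata = ["high MMSE"] * 80 + ["low MMSE"] * 20
participant_strata = ["high MMSE"] * 8 + ["low MMSE"] * 10

weights = inverse_probability_weights(population_strata, participant_strata)

# Each participant stands in for `weight` members of the original cohort.
weighted_total = sum(weights[s] for s in participant_strata)
print(weights, weighted_total)
```

In this toy example each high-scoring participant represents ten people and each low-scoring participant represents two, so the weighted sample size equals the original cohort of 100.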
Characteristics of the CFAS analysis population with valid MMSE scores at baseline screen and assessment. Columns: total sample (n = 13,004); analysis sample (n = 2,275). Rows: age group (years); education < 9 years; social class grouping (manual); baseline/screen MMSE < 21; days between screen and assessment*.
Tombaugh and McIntyre categorisation
Classification of MMSE states at screen and assessment waves*. Panels: MMSE at screen by MMSE at assessment under the Tombaugh and McIntyre criteria, the Folstein et al. criteria, and the MRC CFAS criteria.
The proportion of participants classified in the same cognitive group was 66%. A similar proportion of people moved either up (19%) or down (16%) one group with very few moving two or more groups (1%). When comparing the actual changes in cognitive scores as opposed to the changes by group, 95% of people who stayed in the same group at assessment were within three points of their initial MMSE (results not shown). For those who moved up or down one cognitive group, 56% were within three points of their initial MMSE score.
MRC CFAS categorisation
The distribution of change in cognitive category is shown in Table 2. The data were symmetrical about the participants who remained in the same cognitive group (58%). Approximately 40% of the sample went up (21%) or down (18%) one cognitive group whilst ~2% moved by more than one group. The distribution of actual difference in cognitive scores showed that the majority of people who stayed in the same group scored within three points of their initial MMSE score (98%, results not shown). For those who moved up or down one cognitive group, the majority were also within three points of their initial MMSE score (63%).
Logistic regression output
Weighted logistic regression output for no change versus change in cognitive group
Weighted logistic regression odds ratios and 95% confidence intervals

 | Tombaugh and McIntyre | Folstein et al. | MRC CFAS
 | 1.37 (1.01, 1.85) * | 1.16 (0.87, 1.54) | 1.17 (0.89, 1.53)
 | 1.81 (1.32, 2.49)† | 1.22 (0.89, 1.67) | 1.30 (0.96, 1.74)
 | 1.55 (1.08, 2.23) * | 1.34 (0.96, 1.86) | 1.21 (0.88, 1.67)
 | 1.72 (1.11, 2.67) * | 1.12 (0.74, 1.69) | 1.12 (0.76, 1.65)
 | 0.85 (0.41, 1.72) | 0.97 (0.53, 1.77) | 0.56 (0.31, 1.02)
 | 1.01 (0.81, 1.26) | 0.91 (0.74, 1.13) | 0.93 (0.76, 1.13)
 | 1.08 (0.77, 1.51) | 1.30 (0.94, 1.79) | 1.28 (0.94, 1.75)
 | 0.75 (0.53, 1.06) | 0.91 (0.65, 1.28) | 0.96 (0.70, 1.32)
 | 1.16 (0.82, 1.64) | 1.43 (1.01, 2.00) * | 1.39 (1.01, 1.93) *
 | 0.89 (0.63, 1.27) | 1.08 (0.76, 1.52) | 1.10 (0.79, 1.54)
Screen MMSE score | 0.86 (0.84, 0.88)† | 0.99 (0.97, 1.02) | 0.95 (0.93, 0.97)†
Months between screen | 1.05 (0.99, 1.11) | 1.00 (0.95, 1.05) | 1.00 (0.95, 1.05)
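For readers less familiar with this kind of output, the sketch below shows how an odds ratio and a 95% confidence interval are conventionally obtained from a logistic regression coefficient, and how an interval is read against the null value of 1. The Wald interval with z = 1.96 is an assumption here (the paper does not state how its intervals were computed), and the function names and the example coefficient are hypothetical; the intervals checked at the end are quoted from the table above.

```python
from math import exp


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval from a log-odds coefficient and its SE."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def interval_excludes_one(lower, upper):
    """True when the interval lies wholly above or wholly below 1."""
    return lower > 1.0 or upper < 1.0


# Hypothetical coefficient: no association, standard error 0.1.
odds_ratio, lower, upper = odds_ratio_with_ci(0.0, 0.1)
print(f"OR={odds_ratio:.2f} ({lower:.2f}, {upper:.2f})")

# Intervals quoted in the table above.
print(interval_excludes_one(1.01, 1.85))  # interval excludes 1
print(interval_excludes_one(0.87, 1.54))  # interval includes 1
print(interval_excludes_one(0.84, 0.88))  # interval excludes 1
```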
This study investigated the reliability of the Mini-Mental State Examination (MMSE) using three state-based categorisations on 2,275 older persons from a population-based study from five sites across England and Wales. The number of individuals classified in the same state two months after an initial screen assessment varied from 57% (MRC CFAS), to 65% (Folstein et al.), to 78% (Tombaugh and McIntyre). The proportions of participants who went either up or down a single group were similar, with a minimal number moving up or down more than one group. The reliability of state-based groupings is moderate-to-good and similar to statistics obtained from correlation or Cronbach's α analyses.
There was no significant predictor of changing group across all three models although higher original MMSE scores were associated with reduced change in the MRC CFAS and Tombaugh and McIntyre classifications. This inverse association in the former case was very weak whilst in the latter case it is most likely due to the large range of values lying within their non-impaired state (MMSE score between 24 and 30).
The greatest reliability was found using the Tombaugh criteria although this had much to do with their classification method using three cognitive groups as opposed to four. Indeed, there is very little difference between the four-state approaches. The slightly poorer performance of the MRC CFAS classification is most likely due to the use of smaller bands for the cognitive groupings at the higher level of scoring, where most of the data points lie in the general population. This again implies that most of the change occurs around the cut-points, an issue raised by Van Den Hout and Matthews who split cognition into two groups based around a cut-point between 21 and 22 for a two-state illness-death multi-state model.
It is common for MMSE scores less than 18 to be used as an indication of severe impairment in healthy populations. How the MMSE is categorised at its upper levels is more contentious, particularly with regard to attempts to identify individuals with MCI. Recent studies have shown there to be many different definitions of MCI, with progression rates to dementia dependent on which scale has been used. It has been shown that an MMSE group between 23 and 26 performs as well as other, more complex methods of MCI classification in prediction of future dementia. This justifies its place as a valuable tool in the assessment of cognitive ability and highlights the importance of understanding its reliability. It also highlights the usefulness of the MRC CFAS criteria applied in this paper where one of the groups contained MMSE scores between 23 and 26.
The strengths of the investigation include the application of two commonly applied MMSE categorisation models along with the MRC CFAS groupings to a large population-based sample of older persons. In addition to this being the first time that state-based variation of the MMSE has been investigated over a short follow-up period, we also looked at actual variation about scores; most were found to lie within three points of each other. A previous analysis that examined differences by MMSE groupings found a regression to the mean effect. However, the elapsed time between interviews was five years, a period too long to assess test-retest reliability in older people as actual cognitive change is likely to have occurred during this time.
A potential limitation of the study was the duration of time between the cognitive measures and the age of participants in the study. However, the former was not significant in any of the logistic regression models that attempted to identify those who changed group, and in addition a sensitivity analysis using a cut-point of 60 days between screen and assessment showed the same effects. There was some inconsistent evidence of an association between changing group and age, with younger people more likely to move group. This may have an impact upon the frequency of testing required to identify an 'at risk' population of younger participants. A limitation of using MMSE groups for analysing cognitive change in population-based studies is that the MMSE ceiling effect makes it difficult to assess successful cognitive ageing. However, this problem is also present in non state-based MMSE models. Finally, reliable change indices (RCIs) can also be used to assess cognitive change over time whilst adjusting for measurement error, practice effects, and regression to the mean [19, 20]. However, the current analysis is motivated by the assignment of individuals to cognitive groups based on a single MMSE score. Future analyses will use the MRC CFAS state classification to assess longitudinal decline in abilities.
Compared to correlation and Cronbach's α statistics, a state-based approach to analysing the MMSE provides similar estimates of its reliability. However, the large proportion of participants with test re-test scores within three points of each other suggests that a state-based approach to modelling cognitive change using MMSE scores may help avoid bias in the form of regression to the mean. State-based models are therefore an ideal analysis tool when assessing longitudinal cognitive change using the MMSE.
We thank the CFAS population, their families, and carers for their participation. This work was supported by a Medical Research Council project grant (MRC G9901400). Fiona Matthews is funded under MRC programme grant UC_US_A030_0031. Riccardo Marioni is an Alzheimer's Research UK Fellow (ART-RF2010-2). The study is part of the Cambridge and Peterborough CLAHRC.
- Folstein MF, Folstein SE, McHugh PR: "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975, 12 (3): 189-198. 10.1016/0022-3956(75)90026-6.
- Brayne C: The mini-mental state examination, will we be using it in 2001?. Int J Geriatr Psychiatry. 1998, 13 (5): 285-290. 10.1002/(SICI)1099-1166(199805)13:5<285::AID-GPS753>3.0.CO;2-V.
- Crum RM, Anthony JC, Bassett SS, Folstein MF: Population-based norms for the Mini-Mental State Examination by age and educational level. JAMA. 1993, 269 (18): 2386-2391. 10.1001/jama.269.18.2386.
- Franco-Marina F, Garcia-Gonzalez JJ, Wagner-Echeagaray F, Gallo J, Ugalde O, Sanchez-Garcia S, Espinel-Bermudez C, Juarez-Cedillo T, Rodriguez MA, Garcia-Pena C: The Mini-mental State Examination revisited: ceiling and floor effects after score adjustment for educational level in an aging Mexican population. Int Psychogeriatr. 2010, 22 (1): 72-81. 10.1017/S1041610209990822.
- Tombaugh TN, McIntyre NJ: The mini-mental state examination: a comprehensive review. J Am Geriatr Soc. 1992, 40 (9): 922-935.
- Christensen H, Batterham PJ, Mackinnon AJ, Jorm AF, Mack HA, Mather KA, Anstey KJ, Sachdev PS, Easteal S: The association of APOE genotype and cognitive decline in interaction with risk factors in a 65-69 year old community sample. BMC Geriatr. 2008, 8: 14. 10.1186/1471-2318-8-14.
- Yaffe K, Lindquist K, Penninx BW, Simonsick EM, Pahor M, Kritchevsky S, Launer L, Kuller L, Rubin S, Harris T: Inflammatory markers and cognition in well-functioning African-American and white elders. Neurology. 2003, 61 (1): 76-80.
- Laukka EJ, MacDonald SW, Backman L: Contrasting cognitive trajectories of impending death and preclinical dementia in the very old. Neurology. 2006, 66 (6): 833-838. 10.1212/01.wnl.0000203112.12554.f4.
- Muniz-Terrera G, Matthews F, Dening T, Huppert FA, Brayne C: Education and trajectories of cognitive decline over 9 years in very old people: methods and risk analysis. Age Ageing. 2009, 38 (3): 277-282.
- National Institute for Health and Clinical Excellence: Dementia. Supporting people with dementia and their carers in health and social care. London. 2006.
- Bland JM, Altman DG: Regression towards the mean. BMJ. 1994, 308 (6942): 1499.
- MRC CFAS: Cognitive function and dementia in six areas of England and Wales: the distribution of MMSE and prevalence of GMS organicity level in the MRC CFA Study. Psychol Med. 1998, 28 (2): 319-335. 10.1017/S0033291797006272.
- Folstein M: Mini-mental and son. Int J Geriatr Psychiatry. 1998, 13 (5): 290-294.
- Stephan BCM, S G, Brayne C, Bond J, McKeith IG, Matthews FE, MRCCFAS: Optimizing Mild Cognitive Impairment for Discriminating Dementia Risk in the General Older Population. Am J Geriatr Psychiatry. 2010, 18: 662-673. 10.1097/JGP.0b013e3181e0450d.
- R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. 2010, Vienna, Austria, (ISBN) 3-900051-07-0, [http://www.R-project.org]
- van den Hout A, Matthews FE: Multi-state analysis of cognitive ability data: a piecewise-constant model and a Weibull model. Stat Med. 2008, 27 (26): 5440-5455. 10.1002/sim.3360.
- Stephan BC, Brayne C, McKeith IG, Bond J, Matthews FE: Mild cognitive impairment in the older population: Who is missed and does it matter?. Int J Geriatr Psychiatry. 2008, 23 (8): 863-871. 10.1002/gps.2013.
- Matthews FE, Stephan BC, McKeith IG, Bond J, Brayne C: Two-year progression from mild cognitive impairment to dementia: to what extent do different definitions agree?. J Am Geriatr Soc. 2008, 56 (8): 1424-1433. 10.1111/j.1532-5415.2008.01820.x.
- Tombaugh TN: Test-retest reliable coefficients and 5-year change scores for the MMSE and 3MS. Arch Clin Neuropsychol. 2005, 20 (4): 485-503. 10.1016/j.acn.2004.11.004.
- Frerichs RJ, Tuokko HA: A comparison of methods for measuring cognitive change in older adults. Arch Clin Neuropsychol. 2005, 20 (3): 321-333. 10.1016/j.acn.2004.08.002.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/127/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.