- Research article
Selecting optimal screening items for delirium: an application of item response theory
© Yang et al; licensee BioMed Central Ltd. 2013
- Received: 8 June 2012
- Accepted: 27 December 2012
- Published: 22 January 2013
Delirium (acute confusion) is a common, morbid, and costly complication of acute illness in older adults. Yet researchers and clinicians lack short, efficient, and sensitive case identification tools for delirium. Though the Confusion Assessment Method (CAM) is the most widely used algorithm for delirium, the existing assessments that operationalize the CAM algorithm may be too long or complicated for routine clinical use. Item response theory (IRT) models help facilitate the development of short screening tools for use in clinical applications or research studies. This study utilizes IRT to identify a reduced set of optimally performing screening indicators for the four CAM features of delirium.
Older adults were screened for enrollment in a large scale delirium study conducted in Boston-area post-acute facilities (n = 4,598). Trained interviewers conducted a structured delirium assessment that culminated in rating the presence or absence of four features of delirium based on the CAM. A pool of 135 indicators from established cognitive testing and delirium assessment tools was assigned by an expert panel into two indicator sets per CAM feature representing (a) direct interview questions, including cognitive testing, and (b) interviewer observations. We used IRT models to identify the best items to screen for each feature of delirium.
We identified 10 dimensions and chose up to five indicators per dimension. Preference was given to items with peak psychometric information in the latent trait region relevant for screening for delirium. The final set of 48 indicators, derived from 39 items, maintains fidelity to clinical constructs of delirium and maximizes psychometric information relevant for screening.
We identified optimal indicators from a large item pool to screen for delirium. The selected indicators maintain fidelity to clinical constructs of delirium while maximizing psychometric information important for screening. This reduced item set facilitates development of short screening tools suitable for use in clinical applications or research studies. This study represents the first step in the establishment of an item bank for delirium screening with potential questions for clinical researchers to select from and tailor according to their research objectives.
- Confusion assessment method
- Delirium screening
- Factor analysis
- Item response theory
- Item bank
Delirium is a preventable [1, 2] acute confusional disorder. In the US, delirium affects over 2.3 million hospitalized older adults each year at an estimated total annual cost of $152 billion. Recognition of delirium is a prerequisite for developing a coherent treatment program. However, delirium remains under-recognized and is consequently mismanaged in most clinical settings.
Formal diagnostic criteria for delirium were first codified in 1980 in the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, Version 3 (DSM-III). Different definitions have appeared in subsequent DSM versions [7–9]. The first appearance of delirium in the International Classification of Diseases occurred in ICD-10. While the DSM clearly captures the key elements of the delirium syndrome, the DSM criteria themselves can be challenging to apply diagnostically, both in clinical practice and in research settings, particularly for patients who are not communicative. Additionally, the DSM-IV criteria require knowledge of the underlying cause before a diagnosis can be made. In clinical practice, delirium is usually recognized first, and a search for the underlying cause then proceeds. Wide discrepancies in case identification have been reported when different criteria are used [11–13].
There are many methods for research and clinical diagnosis of delirium, operationalizing either the International Classification of Diseases (ICD) or DSM criteria. The most commonly used algorithm for case identification of delirium is the Confusion Assessment Method (CAM). The CAM reduces the nine original DSM-III-R criteria to four key features, requiring the presence of both 1) acute change in mental status with a fluctuating course and 2) inattention, and either 3) disorganized thinking or 4) altered level of consciousness. A recent comprehensive review showed its strong performance characteristics and widespread use. The CAM algorithm has been used in over 1600 publications over the past 14 years, more than 10 times more frequently than the DSM criteria. The recommended interview prior to completion of the CAM is a short cognitive screening tool, including assessment of attention. However, different researchers may operationalize the CAM features differently. To maximize the accuracy and reliability of the CAM, standardized mental status and neuropsychiatric assessments, questionnaires and ratings should be used to assess delirium symptoms. However, because such assessments may require up to 30 minutes for administration and scoring, they are impractical for clinical use and burdensome for research studies. Therefore, reducing the length of screening interviews is an important step in improving case identification. Item response theory is a statistical tool that can help in this process. The goal of our work is to identify the most efficient set of items to determine the presence or absence of each of the CAM features.
Item response theory (IRT) encompasses a set of psychometric tools that, among other things, can help in the selection of optimal test questions to shorten instruments [20–25]. IRT is a statistical framework that relates observed patient data (responses to test items, or diagnostic signs and symptoms) to theoretical (i.e., latent) and presumed continuously distributed constructs. IRT can be considered an extension of classical factor analysis and is a useful tool in test construction because it provides a framework for expressing characteristics of test-takers and test items on a uniform metric. IRT and factor analysis are isomorphic when the factor analysis is performed on a matrix of polychoric correlations and only one latent variable is modeled [26–28]. In this study, the unidimensional factor analysis results are item response theory results, and more globally the multidimensional factor analysis results are multidimensional item response theory results. The ordinal dependent variable approach to factor analysis was described by Birnbaum in Lord and Novick's seminal work on IRT and formalized by Christoffersson and Muthén.
Because unidimensionality is an assumption of IRT, we first assessed the extent to which our data satisfied this assumption before moving on to formal IRT analyses. Expressing test-takers and test items on a uniform metric makes possible the construction of tests for specific uses or specific populations. In many IRT parameter estimation procedures, item parameters are assumed to be fixed and invariant across population subsamples. This is a strength in that tests can be constructed using only some items from a larger bank of items but still produce estimates of person level on the same metric as other tests using different items from the bank.
The item response function for person $i$ and item $j$ can be written as

$$P_j(\theta_i) = P(y_{ij} = 1 \mid \theta_i) = G\left[a_j(\theta_i - b_j)\right]$$

where $G$ is some cumulative probability transformation, usually the inverse logit, although the cumulative normal distribution function is also used. The unobserved variable $\theta$ (e.g., latent level for the CAM feature of inattention) is often assumed to be normally distributed with mean zero and unit variance. The difference between a person's latent trait level ($\theta_i$) and the item difficulty (or item location, or symptom severity level, $b_j$) defines the probability that a person will display a symptom (e.g., "Trouble keeping track of what was being said," for the CAM feature of inattention). $P_j(\theta)$ describes the increasing probability of a randomly chosen patient displaying indicator $y_j$ with increasing values of the latent trait $\theta$.
If an item's symptom severity ($b_j$) exceeds the person's level on the underlying trait ($\theta_i$), the person is less likely than not to express the symptom. The precise probability is modified by the strength of the relationship between the latent trait and the item response, captured with the item discrimination parameter ($a_j$). When logistic regression estimation procedures are used, it is common to include a scaling constant ($D$, typically about 1.7) that multiplies $a_j$ inside $G$, so that the logistic parameters are placed on approximately the same scale as normal ogive parameters.
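As an illustration of the item response function described above, the following minimal Python sketch evaluates the probability of displaying a symptom under a two-parameter logistic model. The function names and the parameter values (`a_j = 1.5`, `b_j = 0.5`) are hypothetical and chosen only for illustration; they are not estimates from this study. The scaling constant `D` defaults to 1, which is an assumption of the sketch.

```python
import math


def inverse_logit(x: float) -> float:
    """Logistic cumulative distribution function, one common choice for G."""
    return 1.0 / (1.0 + math.exp(-x))


def item_response_probability(theta: float, a: float, b: float, D: float = 1.0) -> float:
    """Probability of displaying the symptom: G[D * a * (theta - b)]."""
    return inverse_logit(D * a * (theta - b))


# Hypothetical item parameters (illustration only, not study estimates).
a_j, b_j = 1.5, 0.5

for theta in (-1.0, 0.5, 2.0):
    p = item_response_probability(theta, a_j, b_j)
    print(f"theta={theta:+.1f}  P(symptom)={p:.3f}")
```

When the latent trait level equals the item difficulty the probability is exactly one half; below the difficulty the symptom is less likely than not, and above it the symptom is more likely than not.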
Building tests to suit specific uses can employ the concept of item information. Item information is expressed as $I_j(\theta) = a_j^2 P_j(\theta)[1 - P_j(\theta)]$. The more highly discriminating an item is, the more peaked its information function. Information functions are centered over the item difficulty parameter. Information is analogous to reliability in the sense that it reflects measurement error: the more information at a given trait level, the smaller the standard error there. Local independence, the assumption that an answer to one item is not contingent or statistically dependent upon an answer to a preceding item once the latent trait is taken into account, is an important basic assumption in IRT along with unidimensionality. Because of local independence, item information functions are additive. The curve describing the sum of information over the underlying trait is called a test information curve. Taken together, it is possible to achieve fine control over where and how well a given item set measures a latent trait along the latent trait distribution (subject to the availability of items with the desired parameters).
The goal of this paper was to identify the shortest set of mental status assessment questions and interviewer observations that could be used to efficiently provide relevant information for screening about a patient's level on four CAM diagnostic features. We present our approach to developing an item bank for the future development of screening tools using item response theory and related psychometric methods. The context is the future development of predictive tests for distinguishing persons who satisfy each of the four CAM criteria for delirium. Our substantive goal was to develop a parsimonious set of indicators for each of the four key CAM features of delirium to be considered in further developing brief clinically useful screening measures.
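To make the item and test information functions described above concrete, here is a minimal Python sketch under a two-parameter logistic model. The item parameters are hypothetical, and the helper names (`p_2pl`, `item_information`, `total_information`) are introduced only for this illustration.

```python
import math


def p_2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Item information: a^2 * P(theta) * [1 - P(theta)]."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)


def total_information(theta, items):
    """Test information: sum of item information (relies on local independence)."""
    return sum(item_information(theta, a, b) for a, b in items)


# Hypothetical (discrimination, difficulty) pairs, for illustration only.
items = [(2.0, 0.0), (1.0, 0.0), (2.0, 1.5)]

# Locate each item's information peak on a grid of latent trait values.
grid = [x / 10 for x in range(-30, 31)]
peaks = [max(grid, key=lambda t, a=a, b=b: item_information(t, a, b)) for a, b in items]

for (a, b), peak in zip(items, peaks):
    print(f"a={a}, b={b}: peak at theta={peak}, height={item_information(peak, a, b):.2f}")
print("test information at theta=0:", round(total_information(0.0, items), 3))
```

In this sketch each information function peaks at the item's difficulty, and the more discriminating item has the taller peak, which is the property used to decide where along the latent trait an item set measures best.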
Design and participants
To meet our objective of identifying a small set of indicators for the core features of delirium that would be useful for screening, we began with a conceptual model of the important symptom dimensions of delirium. This was informed by the CAM, specifically the four features of delirium described above. We identified mental status, neuropsychological performance, and delirium symptom assessment instruments that include specific tests that could provide information relevant to those symptom dimensions. We then identified and obtained an existing data source that included relevant assessment tools. We formed a panel of clinical experts to inform the data handling and the statistical and psychometric data analysis. Specific details are described below. The overall construction and evaluation of the item bank is similar to the National Institutes of Health Roadmap Initiative Patient-Reported Outcomes Measurement Information System (PROMIS). The description of the PROMIS psychometric analysis for item banking is found in Reeve et al.
The sample used in this analysis was chosen to provide a high rate of delirium, and was drawn from the screening phase of a randomized controlled trial of a Delirium Abatement Program (DAP). The trial was conducted in eight post-acute care facilities in Massachusetts between 2000 and 2003. During the enrollment period, 6,354 persons were admitted to one of the eight facilities. All assessments were conducted within three days of admission. Of the 4,744 screened, medical records were unavailable for 92, and 54 were excluded due to coma. The final sample included 4,598 subjects from the screening cohort. Of these, 611 (13%) displayed CAM delirium. Patients were only included if they were able to provide assent and their caregivers subsequently provided informed consent. The details of the study have been provided previously by Kiely and colleagues. The Mini-Mental State Examination (MMSE) was used as part of the structured mental status assessment with the Delirium Symptom Interview (DSI), Memorial Delirium Assessment Scale (MDAS), and CAM to identify delirium. Review and approval of this study and protocol was provided by the Institutional Review Boards of the Beth Israel Deaconess Medical Center and Hebrew SeniorLife.
Source items, which included direct patient questions, mental status testing, and observational items, were obtained from a structured delirium assessment [18, 37], which culminated in a rating of the presence or absence of the four core features of CAM delirium. The DAP trial structured delirium assessment included the Mini-Mental State Examination (MMSE), which assesses orientation to year, season, month, day of the week, date, city or town, name of place, and type of place. The DAP screening assessment also included the Digit Span test, which involves asking patients to repeat increasingly long sets of numbers in order forwards and different sets of numbers backwards. The Delirium Symptom Interview (DSI) is a 113-item instrument that includes both questions asked directly of the patient and a series of structured observations. Assessment areas included attention, organization of thought, level of consciousness, disturbance of perception, sleep and psychomotor activity. The total number of items from the source instruments that were considered in the clinical consensus was 119. The description of the clinical consensus process is reported in detail by Huang and colleagues. All items were dichotomous, as described in Huang et al.
Table 1. Summary of results from dimensionality assessment models
Column headings: Single factor model; Number of proposed indicators; Number of modeled indicators; Number of significant eigenvalues; Large secondary loadings
Rows (indicator sets):
- Feature 1 - Acute Change and Fluctuating Course - Direct Interview
- Feature 1 - Acute Change and Fluctuating Course - Observational
- Feature 2 - Inattention - Direct Interview
- Feature 3 - Disorganized Thinking - Direct Interview
- Feature 3 - Disorganized Thinking - Observational
- Feature 4 - Altered Level of Consciousness - Observational §
Expert panel review
Our Clinical Expert Panel (CEP) consisted of one geriatric psychiatrist, one geriatric nurse, one behavioral neurologist, one neuropsychologist, and three internists/geriatricians, all of whom were experts in delirium assessment and familiar with the CAM algorithm. Details regarding the CEP review process are described elsewhere. Briefly, we summarize the stages of the CEP review process most relevant to this study. Stage I began with identifying source items from established instruments. The CEP classified indicators from source items according to relevance for each of four features of delirium as defined by the CAM algorithm (Stage II). Indicators were then sub-classified as reflecting observational data (i.e., a rating of a symptom observed by a trained interviewer) versus direct interview data (i.e., a verbatim response to a directly asked question, including cognitive test questions) (Stage II). Items could be assigned to more than one feature, as implied by the overlapping boxes in Figure 1. For example, the first orientation question "What is the year?" was assigned to both CAM Feature 2, Inattention, and CAM Feature 3, Disorganized thinking.
Exploratory data analysis
At Stage III, eight indicator sets were defined (i.e., indicator sets assigned to each of the four CAM features, separately considering direct interview and observational indicators). We performed exploratory data analysis within indicator sets, including cross-tabulations and data quality assessment (e.g., missing data checking). Item cross-tabulations were carefully examined for voids (empty cells) that might arise from logically dependent response sets. For example, a pair of items with a logical dependency could be an item (a) that assesses whether symptom x was present, followed by an item (b) that assesses the symptom frequency. Linear modeling of such dependent items is not appropriate. When we found logical dependencies, the expert panel restructured the indicators, for example by generating a single ordinal composite to summarize both presence and frequency of a symptom.
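The check for voids and the restructuring into an ordinal composite described above can be sketched as follows. This is a minimal illustration with made-up records and a hypothetical coding scheme (presence coded 0/1, frequency coded 0 when the follow-up question was not asked); it is not the study's actual data handling code.

```python
from collections import Counter

# Hypothetical records: (symptom present, symptom frequency).
# Assumed coding: frequency is 0 (not asked), 1 (sometimes), or 2 (often).
records = [(0, 0), (1, 1), (1, 2), (0, 0), (1, 2), (1, 1), (0, 0)]


def empty_cells(pairs, row_levels, col_levels):
    """Return the cells of the cross-tabulation that contain no observations."""
    table = Counter(pairs)
    return [(r, c) for r in row_levels for c in col_levels if table[(r, c)] == 0]


def ordinal_composite(present, frequency):
    """Collapse presence and frequency into one ordinal indicator (0 = absent)."""
    return 0 if present == 0 else frequency


voids = empty_cells(records, row_levels=(0, 1), col_levels=(0, 1, 2))
composite = [ordinal_composite(p, f) for p, f in records]

print("empty cells:", voids)
print("composite indicator:", composite)
```

The empty cells arise purely from the skip pattern (frequency is never recorded when the symptom is absent), which is the kind of logical dependency that the composite removes.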
We performed an empirical multi-collinearity check for local dependency among pairs of indicators for which correlations could not be estimated. We fit a weighted least-squares factor analysis model using Mplus software (version 5.2, Muthén & Muthén, Los Angeles CA) to the indicator sets and examined error messages indicating sparsely populated cells, potentially due to logical dependencies. We developed an automated iterative algorithm to rank individual indicators according to the number of times correlations could not be estimated, and we dropped the indicators most frequently involved. Every indicator dropped was reviewed by the CEP. The CEP recommended that some indicators be retained if the symptom was clinically important. We then repeated the empirical multi-collinearity checking while forcing the algorithm to retain the flagged items.
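A rough Python sketch of an iterative pruning procedure of this kind is given below. It is an assumption-laden illustration, not the authors' algorithm: here a pairwise correlation is treated as inestimable whenever the 2 × 2 table of two binary indicators has an empty cell, whereas the study relied on Mplus error messages. The function names, the toy data, and the `keep` argument (standing in for indicators the CEP asked to retain) are all hypothetical.

```python
from itertools import combinations


def has_empty_cell(x, y):
    """Assumed criterion: a binary pair is problematic if its 2x2 table has a void."""
    return len(set(zip(x, y))) < 4


def prune_indicators(data, keep=frozenset()):
    """Iteratively drop the indicator involved in the most problematic pairs.

    Indicators named in `keep` are never dropped.
    Returns (remaining indicator names, dropped indicator names).
    """
    remaining = list(data)
    dropped = []
    while True:
        counts = {name: 0 for name in remaining}
        for u, v in combinations(remaining, 2):
            if has_empty_cell(data[u], data[v]):
                counts[u] += 1
                counts[v] += 1
        candidates = [n for n in remaining if counts[n] > 0 and n not in keep]
        if not candidates:
            return remaining, dropped
        worst = max(candidates, key=lambda n: counts[n])
        remaining.remove(worst)
        dropped.append(worst)


# Toy binary indicators; "c" is rare and produces voids with both "a" and "b".
data = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],
    "b": [0, 1, 0, 1, 1, 0, 0, 1],
    "c": [0, 0, 0, 0, 0, 0, 0, 1],
}

print(prune_indicators(data))
print(prune_indicators(data, keep=frozenset({"c"})))
```

In the second call the rare indicator is forced to stay, so the procedure drops the indicators that conflict with it instead, which shows why each forced retention deserves clinical review.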
Assessment of dimensionality
Within each indicator set, we evaluated the assumption of unidimensionality using permuted parallel analysis and factor analysis (exploratory, confirmatory, and bi-factor models). Parallel analysis involves comparing observed and random eigenvalues. Eigenvalues, or latent roots, of a correlation matrix can be interpreted as variances for (latent) variables derived from the observed variables. Random eigenvalues were estimated empirically by randomly assigning indicators to persons and extracting eigenvalues from the resulting correlation matrix. With multiple replications of the random data, this procedure represents a permutation test on the observed eigenvalues. We defined significant eigenvalues as those where the observed eigenvalue exceeds the 97.5th percentile of eigenvalues from the permuted (random) data.
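The permutation logic of parallel analysis can be sketched in Python with NumPy as shown below. This is a simplified illustration rather than the custom Stata macro used in the study: it uses Pearson correlations on simulated binary data instead of the correlations appropriate for categorical indicators, and the function names, number of replications, and simulated data are assumptions made for the example.

```python
import numpy as np


def sorted_eigenvalues(data):
    """Eigenvalues of the indicator correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def permuted_parallel_analysis(data, n_reps=200, percentile=97.5, seed=0):
    """Count leading observed eigenvalues that exceed the permuted-data percentile."""
    rng = np.random.default_rng(seed)
    n_items = data.shape[1]
    observed = sorted_eigenvalues(data)
    permuted = np.empty((n_reps, n_items))
    for r in range(n_reps):
        # Shuffle each indicator independently across persons to break associations.
        shuffled = np.column_stack(
            [rng.permutation(data[:, j]) for j in range(n_items)]
        )
        permuted[r] = sorted_eigenvalues(shuffled)
    threshold = np.percentile(permuted, percentile, axis=0)
    n_significant = 0
    for obs, cut in zip(observed, threshold):
        if obs <= cut:
            break
        n_significant += 1
    return n_significant, observed, threshold


# Simulated example: six binary indicators driven by a single latent trait.
rng = np.random.default_rng(1)
theta = rng.standard_normal(500)
latent_responses = 0.8 * theta[:, None] + 0.6 * rng.standard_normal((500, 6))
data = (latent_responses > 0).astype(int)

m, observed, threshold = permuted_parallel_analysis(data)
print("number of significant eigenvalues:", m)
```

Because the simulated indicators share one latent trait, only the first observed eigenvalue should stand out against the permuted data, which corresponds to the evidence of unidimensionality described above.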
We used the number of significant eigenvalues, m, to set the number of factors to examine in an exploratory factor analysis (EFA). Based on the results of the EFA we specified simple structure confirmatory factor analysis (CFA) and bi-factor analysis (BFA) models. We assigned each item to the single factor in CFA (or the specific factor in BFA) on which it had the largest loading. For example, if the parallel analysis suggested the presence of two significant factors, we examined the EFA solution for two factors. A simple structure CFA model would have two factors, and each indicator would load on the factor for which it had the highest factor loading in the EFA solution. A BFA model would use this same factor loading pattern, but would include a general factor loading on all indicators, and factor correlations would be constrained to zero.
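The assignment rule described above can be written down in a few lines of Python. The loadings below are invented for illustration, the helper names are hypothetical, and the use of the absolute value of the loading is an assumption of this sketch (the text refers simply to the largest loading).

```python
def assign_to_factors(loadings):
    """Map each item to the index of the EFA factor with its largest absolute loading."""
    return {
        item: max(range(len(row)), key=lambda k: abs(row[k]))
        for item, row in loadings.items()
    }


def cfa_pattern(assignment, n_factors):
    """Simple structure: 1 marks a freely estimated loading, 0 a loading fixed to zero."""
    return {
        item: [1 if k == factor else 0 for k in range(n_factors)]
        for item, factor in assignment.items()
    }


def bifactor_pattern(assignment, n_factors):
    """Same specific-factor pattern, plus a general factor (first column) on every item."""
    return {
        item: [1] + row for item, row in cfa_pattern(assignment, n_factors).items()
    }


# Hypothetical two-factor EFA loadings (not study results).
efa_loadings = {
    "item_1": [0.72, 0.10],
    "item_2": [0.65, 0.21],
    "item_3": [0.15, 0.80],
    "item_4": [0.30, 0.55],
}

assignment = assign_to_factors(efa_loadings)
print(assignment)
print(cfa_pattern(assignment, 2))
print(bifactor_pattern(assignment, 2))
```

The resulting patterns only describe which loadings are free; estimating the CFA and BFA models themselves requires dedicated software such as the packages named in this section.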
We considered the preponderance of the evidence in making dimensionality decisions, together with the input of the CEP on the interpretation of secondary factors. Evidence of sufficient unidimensionality included only one significant factor on permuted parallel analysis. Signs of failure of the indicator set to conform to unidimensionality included significant eigenvalues beyond the first, improvement in fit statistics [the comparative fit index (CFI) and root mean square error of approximation (RMSEA)] between a single factor model and an m-factor CFA model, and greater factor loadings on specific factors relative to loadings on general factors in the BFA. When we rejected the assumption of unidimensionality, the indicator sets were split into m sub-sets to achieve sufficiently unidimensional indicator sets for IRT-based data analyses.
All eigenvalues, EFA, CFA, and IRT models were estimated with Mplus software (version 5.2, Muthén & Muthén, Los Angeles CA) using limited information weighted least squares estimation with mean and variance adjustment [43, 51, 52]. Permuted parallel analyses were performed with a custom Stata macro (version 10, Stata Corp, College Station, TX). Bi-factor models were estimated with Gibbons and Hedeker's BIFACTOR software (Center for Health Statistics, University of Illinois at Chicago). Item characteristic curves and information functions were calculated using formulae provided in Baker and Kim and in Thissen and Wainer, and plotted using Stata. All syntax and command files and custom software are available upon request (RNJ).
Final indicator selection
The final results of our analytic steps (Stage V, Figure 1) represent the selection of up to five indicators per CAM feature. We chose a maximum of five because studies have shown that more than four indicators per latent trait are ideal for arriving at a proper solution in latent variable modeling, while diminishing returns are observed with more than five indicators [55, 56].
Characteristics of study participants
- Total [n (%)]
- Age [M (SD)]
- Sex [n (%)]
- Race/Ethnicity [n (%)]
- Delirium Present [n (%)]
- Mini-Mental State Examination Score [M (SD)] (scored 0–30, 30 best)
- Mini-Mental State Examination Score group [n (%)]
  - Severe cognitive impairment (0–17)
  - Cognitive impairment (18–23)
  - No cognitive impairment (24–30)
The clinical expert panel defined CAM feature indicators from source items drawn from the MMSE orientation items, digit span, and DSI. We analyzed the resulting 135 indicators following the psychometric modeling steps described in the methods (multi-collinearity checking, dimensionality determination, IRT). Results are summarized in Table 1. This table lists by CAM feature (column 1) the number of indicators proposed by the clinical expert panel (column 2), the number of indicators remaining after empirical multi-collinearity checking (column 3), the number of significant eigenvalues following permuted parallel analysis (column 4), and the marginal reliability of each feature at $\theta_{50+}$ (column 5). Columns 6–7 summarize model fit statistics and estimates of a single factor model fit to the indicator set, and columns 8 and 9 the model fit statistics for the m-dimensional model. As indicated in Table 1, no indicator set had more than two significant eigenvalues based on the permuted parallel analysis. Column 10 summarizes whether large secondary loadings were observed (secondary factor loading exceeded the common factor loading for a given item) in the BFA. Column 11 reports the final adjudication of the expert panel on the number of retained dimensions. Three indicator sets showed a second factor, and the expert panel agreed with the results. When only one significant eigenvalue was detected, model fit statistics were generally good (CFIs > 0.94 and RMSEAs < 0.05).
The next step was to identify items that provided high information content in a region of the underlying trait assessed by the items. We did this by evaluating the item information at the 50th percentile of the latent trait distribution underlying the indicator set (or sub-set) among those participants who were rated as CAM feature positive. We refer to this level of the latent trait as the 50th percentile ($\theta_{50+}$). An example set of item information curves is shown in Figure 2. This figure plots item information curves for the indicators identified by the Clinical Expert Panel as measures of CAM feature 2 (inattention, direct interview). All indicators are illustrated, but we highlight two for discussion: "List the months of the year backward" (heavy dotted line) and "List the days of the week backward" (solid bolded line). The box and whisker plots beneath the horizontal axis indicate the distributions of posterior estimates of latent trait scores for participants ultimately classified as CAM feature 2 (inattention) positive and negative. Vertical reference lines for key percentiles of the CAM feature positive group are illustrated in the main panel.
This figure illustrates several important points about the analysis of this indicator set. First, the latent trait distributions for the CAM feature positive and negative sub-groups show wide separation. Nevertheless, most of the item difficulty parameters (located where the information functions peak) are above the 75th percentile of the CAM feature positive group. Thus, most of the indicators in this set contribute the most information at very severe levels of the underlying trait. Such items would not be useful for screening purposes, even if the assessed symptoms were pathognomonic of delirium. Our goal is to derive a test information curve tuned for screening purposes. We approach this by choosing the items with the most information at the 50th percentile for our item bank. The two highlighted items provide the most information at the 50th percentile of the latent trait distribution in the feature positive group. This is the area of the latent trait of greatest interest for screening purposes.
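The selection rule described above (rank items by their information at the median latent trait level of the feature positive group, then keep the top five) can be sketched as follows. All numbers are hypothetical: the latent trait scores, the item parameters, and the item labels are invented for illustration, and a two-parameter logistic model is assumed.

```python
import math
import statistics


def item_information(theta, a, b):
    """Two-parameter logistic item information at a given latent trait level."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)


def select_top_items(item_params, theta_target, k=5):
    """Return the k item names with the most information at theta_target."""
    ranked = sorted(
        item_params,
        key=lambda name: item_information(theta_target, *item_params[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical latent trait estimates for participants rated feature positive.
positive_scores = [-0.3, 0.1, 0.27, 0.6, 1.4]
theta_50_plus = statistics.median(positive_scores)

# Hypothetical (discrimination, difficulty) parameters for a candidate item pool.
item_params = {
    "A": (2.5, 0.3),
    "B": (2.0, 0.2),
    "C": (1.0, 0.3),
    "D": (2.5, 2.5),
    "E": (1.5, 0.0),
    "F": (0.8, -1.0),
    "G": (3.0, 3.0),
}

selected = select_top_items(item_params, theta_50_plus)
print("theta_50+ =", theta_50_plus)
print("selected items:", selected)
```

In this toy pool, items D and G are highly discriminating but peak at severe trait levels, so they contribute little information at the screening-relevant level and are not selected, mirroring the reasoning given for Figure 2.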
Source items and indicator IRT parameters for top five indicators identified for each dimension of each CAM feature*
Column headings: Feature and indicator selection (top five indicators); threshold level ($\theta_{50+}$) on latent trait

Feature 1 - Acute Change and Fluctuating Course - Direct interview ($\theta_{50+}$ = −0.20)
- Felt confused during the past day
- Thought you were not really in (name of facility)
- Saw things that were not really there
- Thought things were moving that were not really moving
- Heard things that were not really there

Feature 1 - Acute Change and Fluctuating Course - Observational ($\theta_{50+}$ = 1.17)
- Level of consciousness fluctuated
- Level of attention fluctuated
- Evidence of disturbance of sleep
- Psychomotor activity fluctuated

Feature 2 - Inattention - Direct interview, First Factor ($\theta_{50+}$ = 0.22)
- What is the year? †
- What is the month? †
- What is the day of the week? †
- What type of place is this? †
- What is the name of this place? †

Feature 2 - Inattention - Direct interview, Second Factor ($\theta_{50+}$ = 0.27)
- Days of the week backwards
- Months of the year backwards
- Digit span backwards 3 Numbers ‡
- Digit span backwards 4 Numbers ‡
- Digit span forwards 4 Numbers ‡†

Feature 2 - Inattention - Observational ($\theta_{50+}$ = 0.38)
- Trouble keeping track of what was being said
- Level of attention fluctuated
- Unaware of environment
- Distracted by environmental stimuli
- Staring into space

Feature 3 - Disorganized Thinking - Direct interview ($\theta_{50+}$ = 0.67)
- What type of place is this? †
- What is the year? †
- What is the month? †
- What is the day of the week? †
- What is the name of this place? †

Feature 3 - Disorganized Thinking - Observational §, First Factor ($\theta_{50+}$ = 1.03)
- Unclear or illogical flow of ideas
- Changes the subject suddenly
- Conversation was rambling
- Words or phrases that were disjointed or inappropriate

Feature 4 - Fluctuating Course and Altered Level of Consciousness - Observational, First Factor ($\theta_{50+}$ = 1.99)
- Sleepy, or stuporous, or comatose
- Disturbance of sleep
- Lethargy and sluggishness
- Slowness of motor response
- Expressed a paucity of thoughts

Feature 4 - Fluctuating Course and Altered Level of Consciousness - Observational, Second Factor ($\theta_{50+}$ = −0.14)
- Speech unusually fast or pressured
- Excessive absorption with ordinary objects
- Increased speed of motor response
Of note, we did not pursue IRT modeling for the second observational factor of Feature 3 (disorganized thinking) because only three items loaded on this factor: limited speech, paucity of thoughts, and slow speech. We also did not include the direct interview items of Feature 4 (altered level of consciousness) because the item set was redundant with Feature 2 (inattention-direct interview). For Feature 4 (altered level of consciousness-observational), the second factor showed all items having very low information content at the 50th percentile, so for this feature, we made our decision based on the 75th percentile in the CAM feature positive group.
The marginal reliability estimates for each of the CAM IRT-derived features are shown in Table 1. These estimates are based on the mean standard error of the IRT scores for the items at the 50th percentile of the latent trait distribution for the CAM feature positive group. Most marginal reliability estimates were 0.80 or higher (values closer to 1 indicate higher reliability), suggesting good reliability in the region of the latent trait relevant to screening.
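One common way to relate a standard error at a given latent trait level to a reliability coefficient is sketched below. It assumes a latent trait with unit variance, a standard error equal to the inverse square root of the test information, and reliability computed as one minus the squared standard error. This is an illustrative approximation, not necessarily the exact computation used for Table 1, and the item parameters are hypothetical.

```python
import math


def total_information(theta, items):
    """Sum of two-parameter logistic item information at theta."""
    total = 0.0
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return total


def reliability_at(theta, items):
    """Assumed approximation: 1 - SE^2, with SE = 1 / sqrt(information)."""
    standard_error = 1.0 / math.sqrt(total_information(theta, items))
    return 1.0 - standard_error ** 2


# Five hypothetical items, each with discrimination 2.0 and centered at the target level.
theta_target = 0.27
items = [(2.0, theta_target)] * 5

print("information:", total_information(theta_target, items))
print("reliability:", round(reliability_at(theta_target, items), 3))
```

Under these assumptions, five well-targeted items with a discrimination of 2.0 yield a test information of 5 and a reliability of 0.80 at the target level, which gives a sense of how a short, well-chosen item set can reach the reliability range reported here.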
Through an iterative process pairing a clinical expert panel with psychometric data analysis, we have identified a set of 48 indicators, derived from 39 items that are optimal for screening patients for the four core features of delirium as defined in the CAM algorithm. The symptoms assessed are clinically relevant and optimize psychometric properties for screening. The resulting item pool can be used to develop short form screening instruments for clinical or research use.
A challenge we faced in our item selection procedure is what criteria to use for selecting candidate items that would be optimal for screening. To this end, we generated item information functions for each indicator, and selected indicators that maximized information around the median underlying latent trait level for persons with each CAM feature positive. Some items, even those that are pathognomonic for a particular CAM feature, may have been omitted if they provide most of their information around a level of severity that is not relevant for screening. Our approach leads to measures that maximize measurement precision of underlying latent traits at a level that is important for separating persons who are or who are not classified as demonstrating the CAM feature.
Our goal was to define a set of items from which clinical researchers can construct a short form for the routine screening of delirium, to replace lengthy batteries of mental status, neuropsychological assessment, and observational items. The significance of this work lies in the future establishment of a validated instrument for delirium screening. Our work represents a first step in the development of a more refined delirium screening instrument. The approach used here may be more widely applicable to a broad array of conditions that rely on multi-item assessment batteries for screening. The innovation of the approach we used in this study is the use of IRT to select optimal items for screening that maximize psychometric information at the latent trait level that discriminates between persons who do and do not demonstrate the four core features of delirium described in the CAM algorithm. The items were chosen in an iterative fashion that incorporates an interdisciplinary perspective from both clinical and methodological expertise in measurement research. The novel approach used in this study for case identification in delirium allows the interdisciplinary team to select items based on item information at the 50th percentile for those who screen positive on the specific CAM feature. Ideally, in the near future our analysis will be enhanced by computer assisted bedside interviewing with well characterized item banks and adaptive testing algorithms tuned to distinct purposes (e.g., grading delirium severity, screening for probable delirium).
Several caveats are worthy of discussion. First, our study involved a single, albeit very large, sample of acutely ill elderly patients. Future work will be needed to extend our findings to other samples. Second, the operationalization of the critical theta value for screening could have been incorrect; however, we performed sensitivity analyses demonstrating that using values other than the median among CAM feature positive persons identified similar items. Third, any delirium tool developed from the identified items would need to be validated in an independent cohort. We are actively pursuing this work.
The DSM-IV and ICD-10 are used for diagnosis and coding by trained clinicians. In contrast, the design and purpose of the current study was to identify items for delirium screening based on the four CAM features, which can be done by both clinicians and trained non-clinicians. Therefore, this research may not directly inform diagnosis relying only and strictly on the DSM and ICD.
Another limitation of our analysis is that age, sex, and race/ethnicity were not considered. These factors have been shown to be associated with the differential expression of signs and symptoms in other psychiatric and cognitive disorders, although not necessarily in delirium. Our results assume that the measurement of symptoms of CAM features is invariant across major sociodemographic groups. A future direction for potentially improving the current instrument is to examine measurement bias due to age and gender.
We have identified a candidate set of delirium indicators for the future development of a short assessment for detecting delirium. In health care settings where time and resources are limited, accurate and brief assessments are greatly needed for systematic case-finding of delirium. In research settings, efficient assessment is crucial to reduce participant burden and maximize validity. This study lays the groundwork for the development of short forms for a variety of clinical conditions. Future work is needed to apply this methodology to the development of short-form tools for delirium detection in clinical and research applications, and to validate these new instruments across a broad range of populations and settings.
Presented at the Gerontological Society of America 64th Annual Meeting in Boston, Massachusetts, USA.
This work was funded in part by grants from the National Institute on Aging: R01AG030618 (ERM), K24 AG035075 (ERM), R03AG025262 (RNJ), R01AG17649 (ERM), and P01AG031720 (SKI). Dr. Inouye is supported by the Milton and Shirley F. Levy Family Chair.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/13/8/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.