Delirium is a preventable [1, 2] acute confusional disorder. In the US, delirium affects over 2.3 million hospitalized older adults each year [3] at an estimated total annual cost of $152 billion [4]. Recognition of delirium is a prerequisite for developing a coherent treatment program. However, delirium remains under-recognized and is consequently mismanaged in most clinical settings [5].

Formal diagnostic criteria for delirium were first codified in 1980 in the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III) [6]. Different definitions have appeared in subsequent DSM versions [7–9]. Delirium first appeared in the International Classification of Diseases in ICD-10 [10]. While the DSM clearly captures the key elements of the delirium syndrome, the DSM criteria themselves can be challenging to apply diagnostically, both in clinical practice and in research settings, particularly for patients who are not communicative [11]. Additionally, the DSM-IV criteria require knowledge of the underlying cause before a diagnosis can be made. In clinical practice, delirium is usually recognized first, and a search for the underlying cause then proceeds. Wide discrepancies in case identification have been reported when different criteria are used [11–13].

Many methods exist for research and clinical diagnosis of delirium, each operationalizing either the International Classification of Diseases (ICD) or the DSM criteria [14]. The most commonly used algorithm for case identification of delirium is the Confusion Assessment Method (CAM) [15]. The CAM reduces the nine original DSM-III-R criteria to four key features, requiring the presence of both 1) acute change in mental status with a fluctuating course and 2) inattention, and either 3) disorganized thinking or 4) altered level of consciousness. A recent comprehensive review showed its strong performance characteristics and widespread use [16]. The CAM algorithm has been used in over 1600 publications over the past 14 years, more than 10 times as frequently as the DSM criteria [16]. The recommended interview prior to completion of the CAM is a short cognitive screening tool that includes an assessment of attention [17]. However, different researchers may operationalize the CAM features differently. To maximize the accuracy and reliability of the CAM, standardized mental status and neuropsychiatric assessments, questionnaires and ratings should be used to assess delirium symptoms [18]. Such assessments, however, may require up to 30 minutes for administration and scoring [18], making them impractical for clinical use and burdensome for research studies. Reducing the length of screening interviews is therefore an important step in improving case identification. Item response theory is a statistical tool that can help in this process. The goal of our work is to identify the most efficient set of items to determine the presence or absence of each of the CAM features.
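The CAM decision rule described above (features 1 and 2, plus either feature 3 or feature 4) can be written as a short Boolean function. The following is only a minimal sketch: the class and function names are hypothetical, and it assumes that each feature has already been rated as present or absent, which is exactly the step that the item-level assessments are meant to support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CamFeatures:
    """Presence (True) or absence (False) of each CAM feature.

    Hypothetical container used for illustration only.
    """
    acute_change_fluctuating: bool  # feature 1
    inattention: bool               # feature 2
    disorganized_thinking: bool     # feature 3
    altered_consciousness: bool     # feature 4


def cam_positive(f: CamFeatures) -> bool:
    """Apply the CAM rule: features 1 AND 2, plus feature 3 OR 4."""
    return (
        f.acute_change_fluctuating
        and f.inattention
        and (f.disorganized_thinking or f.altered_consciousness)
    )
```

For example, a patient rated as having acute onset with fluctuating course, inattention, and altered level of consciousness satisfies the rule, whereas a patient without inattention does not, whatever the other ratings are.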

Item response theory (IRT) encompasses a set of psychometric tools that, among other things [19], can help in the selection of optimal test questions to shorten instruments [20–25]. IRT is a statistical framework that relates observed patient data (responses to test items, or diagnostic signs and symptoms) to theoretical (i.e., latent) and presumed continuously distributed constructs. IRT can be considered an extension of classical factor analysis [26] and is a useful tool in test construction because it provides a framework for expressing characteristics of test-takers and test items on a uniform metric. IRT and factor analysis are isomorphic when the factor analysis is performed on a matrix of polychoric correlations and only one latent variable is modeled [26–28]. In this study, the unidimensional factor analysis results are item response theory results, and more globally the multidimensional factor analysis results are multidimensional item response theory results [29]. The ordinal dependent variable approach to factor analysis was described by Birnbaum in Lord and Novick's seminal work on IRT [30] and formalized by Christoffersson [31] and Muthén [32].
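To make the correspondence between factor analysis and IRT concrete, the sketch below converts a standardized factor loading and threshold for a binary item into a discrimination and difficulty parameter. It is an illustration under stated assumptions, not the procedure used in this study: it assumes a single latent variable with unit variance, a probit (normal ogive) link, and standardized loadings; the function name and the example values are hypothetical.

```python
import math
from typing import Tuple


def loading_to_irt(loading: float, threshold: float) -> Tuple[float, float]:
    """Convert a standardized loading and threshold to (a, b).

    Assumes one unit-variance latent variable and a probit link, so that
    a = loading / sqrt(1 - loading^2) and b = threshold / loading.
    """
    if loading == 0.0 or not -1.0 < loading < 1.0:
        raise ValueError("loading must be non-zero and strictly between -1 and 1")
    a = loading / math.sqrt(1.0 - loading ** 2)  # discrimination (probit metric)
    b = threshold / loading                      # difficulty / severity
    return a, b


# Hypothetical item: loading 0.6, threshold 0.3
a, b = loading_to_irt(0.6, 0.3)
```

Under these assumptions, an item with a larger loading has a larger discrimination, and the difficulty is the threshold rescaled by the loading.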

Because unidimensionality is an assumption of IRT [33], we first assessed the extent to which our data satisfied this assumption before moving on to formal IRT analyses. Expressing test-takers and test items on a uniform metric makes possible the construction of tests for specific uses or specific populations. In many IRT parameter estimation procedures, item parameters are assumed to be fixed and invariant across population subsamples [34]. This is a strength in that tests can be constructed using only some items from a larger bank of items but still produce estimates of a person's trait level on the same metric as other tests using different items from the bank.

IRT posits models that relate a person's response ($y_{ij}$) to a person-level trait ($\theta_i$) and item parameters ($a_j$, $b_j$). Let $y_{ij}$ represent person $i$'s response to item $j$, observed as correct (or symptom present) ($y = 1$) or incorrect (or symptom not expressed) ($y = 0$). The probability that a randomly selected person from the population expresses a symptom is

$$P_j(\theta_i) = P(y_{ij} = 1 \mid \theta_i) = G\left[a_j(\theta_i - b_j)\right]$$

where $G$ is some cumulative probability transformation, usually the inverse logit, although the normal cumulative distribution function is also used. The unobserved variable $\theta$ (e.g., latent level for the CAM feature of inattention) is often assumed to be distributed normally with mean zero and unit variance. The difference between a person's latent trait level ($\theta_i$) and the item difficulty (or item location, or symptom severity level, $b_j$) determines the probability that a person will display a symptom (e.g., "Trouble keeping track of what was being said," for the CAM feature of inattention). $P_j(\theta)$ describes the increasing probability of a randomly chosen patient displaying indicator $y_j$ with increasing values of the latent trait $\theta$.

If an item's symptom severity exceeds the person's level on the underlying trait, the person is less likely than not to express the symptom. The precise probability is modified by the strength of the relationship between the latent trait and the item response, captured by the item discrimination parameter ($a_j$). When logistic regression estimation procedures are used, it is common to include a scaling constant ($D$) so that the logistic parameters are placed on approximately the same scale as normal ogive (probit) parameters [35].
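The relationship among trait level, item difficulty, and item discrimination described above can be illustrated with a small function. This is a minimal sketch that assumes the inverse logit is chosen for the cumulative transformation; the function names and parameter values are hypothetical, and the optional scaling constant `D` defaults to 1 (a value of about 1.7 is the conventional choice when approximating the normal ogive).

```python
import math


def inverse_logit(z: float) -> float:
    """Numerically stable inverse logit (logistic function)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def item_response_probability(theta: float, a: float, b: float,
                              D: float = 1.0) -> float:
    """Probability of expressing a symptom given trait level theta.

    a: item discrimination, b: item difficulty (symptom severity),
    D: optional scaling constant (assumed to multiply a * (theta - b)).
    """
    return inverse_logit(D * a * (theta - b))


# Hypothetical item with discrimination 1.5 and severity 0.0
p_at_severity = item_response_probability(theta=0.0, a=1.5, b=0.0)   # exactly 0.5
p_below = item_response_probability(theta=-1.0, a=1.5, b=0.0)        # below 0.5
```

When the trait level equals the item severity the probability is one half; below the severity it falls under one half, and a larger discrimination makes the transition steeper.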

Building tests to suit specific uses can employ the concept of item information [30]. Item information is expressed as $I_j(\theta) = a_j^2 P_j(\theta)[1 - P_j(\theta)]$. The more highly discriminating an item is, the more peaked its information function. Information functions are centered over the item difficulty parameter. Information is analogous to reliability in the sense that it expresses measurement precision: the higher the information, the smaller the measurement error. Due to the assumption of local independence, item information functions are additive. Local independence, an important basic assumption in IRT along with unidimensionality, means that, given the latent trait, an answer to one item is not contingent or statistically dependent upon an answer to another item. The curve describing the sum of information over the underlying trait is called a test information curve. Taken together, these properties make it possible to achieve fine control over where and how well a given item set measures a latent trait along the latent trait distribution (subject to the availability of items with the desired parameters).

The goal of this paper was to identify the shortest set of mental status assessment questions and interviewer observations that could efficiently provide relevant screening information about a patient's level on the four CAM diagnostic features. We present our approach to developing an item bank for the future development of a screening tool using item response theory and related psychometric methods. The context is the future development of predictive tests for distinguishing persons who satisfy each of the four CAM criteria for delirium. Our substantive goal was to develop a parsimonious set of indicators for each of the four key CAM features of delirium to be considered in further developing brief, clinically useful screening measures [15].