Rating scales are commonly used to quantify latent variables in a variety of fields, such as education, psychology, economics, and the health sciences. For example, in clinical studies it is common to use rating scale-based instruments (e.g., questionnaires and observational assessments) to assess intervention outcomes such as symptom severity, activity performance and self-reported health. This typically involves summing individual item scores into total scores that are treated as measures of the variable that the instrument is intended to represent [1]. However, like individual item scores, raw total scores are ordinal. It is therefore not possible to infer what magnitude a given change or difference in scores represents, which in turn precludes sound comparisons. Furthermore, the ordinal nature of raw total scores makes them unsuitable for common calculations such as the mean and other parametric statistics [2, 3].
To overcome issues related to the use of raw scores and to improve the quality assurance of rating scale-based instruments, the Rasch measurement model is recommended over traditional psychometric methods [1, 4]. In addition to being a powerful means to disclose and diagnose anomalies in the measurement process, the Rasch model allows for linear measurement to be accomplished based on ordinal item responses. However, these linear measures, expressed as log-odds units (logits), may be inconvenient for practical use and interpretation since they range from negative to positive values that typically are reported with two or three decimals [5, 6]. Such values differ from the non-negative integer raw scores that users may be familiar with. It can therefore be desirable to transform linear logit locations into more user-friendly ranges that can be used in further applications of the instrument [7].
In this paper, we describe different approaches to transforming Rasch model-derived logits into more user-friendly ranges that preserve their linear properties. We then use empirical data to illustrate and discuss these transformations, and we propose an Excel tool to help researchers and practitioners conduct logit transformations.
The Rasch model
The Rasch measurement model [8] can be used to overcome challenges associated with the use of raw total scores, and to ensure that rating scale-based instruments are of an acceptable standard and appropriate for their intended use [9,10,11]. The Rasch model assumes unidimensionality and local response independence, both of which are assessed by various tests of fit of the data to the model as a means of quality control. When the data exhibit sufficient fit to the model, linear measurement is accomplished. Although issues related to fit are central in Rasch measurement, they are not discussed here; readers are instead referred to other sources [e.g., 9, 12, 13].
The Rasch model estimates the locations of both items and persons on a common latent continuum from less to more of the measured variable. The unit used to locate (or measure) persons and items along the latent continuum is the logit (or log-odds unit) which, in the case of dichotomous item responses, is derived from the natural logarithm (ln) of the ratio of the probability of scoring 1 to the probability of scoring 0 (i.e., the log-odds). The resulting logit represents the difference between the location (e.g., ability) of the person and the location (e.g., difficulty) of the item. Formally, the basic Rasch model for dichotomous item response data takes the form
$$\ln\left(\frac{P_{ni1}}{1-P_{ni1}}\right)=\beta_n-\delta_i,$$
(1)
where Pni1 is the probability of person n scoring 1 (rather than 0) on item i, βn is the location of person n, and δi is the location of item i. The model may also be expressed as
$$P_{ni1}=\frac{\exp\left(\beta_n-\delta_i\right)}{1+\exp\left(\beta_n-\delta_i\right)}.$$
(2)
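To make Eqs. 1 and 2 concrete, the following minimal Python sketch evaluates the dichotomous Rasch probability and checks that its log-odds recover βn − δi. The function names and the person and item locations are illustrative assumptions, not part of any Rasch software.

```python
import math

def rasch_probability(beta: float, delta: float) -> float:
    """Eq. 2: probability that person n scores 1 (rather than 0) on item i."""
    return math.exp(beta - delta) / (1.0 + math.exp(beta - delta))

def log_odds(p: float) -> float:
    """Left-hand side of Eq. 1: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical values: a person at 0.5 logits and an item at -0.5 logits.
p = rasch_probability(beta=0.5, delta=-0.5)
print(round(p, 4))            # 0.7311
print(round(log_odds(p), 4))  # 1.0
```

A person located one logit above an item therefore has log-odds of success equal to 1, and a person located exactly at the item has a probability of 0.5.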
One distinguishing feature of the model is that it yields separate person and item locations that are independent of each other [9]. In addition, the model estimates the precision (standard error, SE) of these locations. The SE is also expressed on the logit scale and provides direct information on the measurement uncertainty associated with individual person and item locations. The SE of measurement is not constant but varies along the measurement continuum; for person locations, it is larger for people with very low or very high scores. The locations of persons who obtain the minimum or maximum possible score on an instrument cannot be estimated, since their levels on the variable are below or above the levels that the instrument represents; their locations and associated SEs are infinite [9, 14]. However, Rasch model estimation software typically sets the total scores of such persons at non-minimum and non-maximum values so that persons with extreme scores can be included in, rather than excluded from, further analyses. For example, the Rasch Unidimensional Measurement Model 2030 (RUMM2030) software derives these extrapolated extreme locations (and associated SEs) using a geometric mean algorithm involving the three lowest and highest item location estimates for the items attempted by persons with minimum or maximum possible scores, respectively [9].
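The dependence of the SE on location can be illustrated with a minimal sketch, assuming dichotomous items whose locations are already known. It uses a plain Newton-Raphson maximum likelihood search for the person location at which the expected raw score equals the observed one, and takes the SE as 1/√(Σ P(1 − P)), the usual information-based SE for this model. The helper `person_location`, the step damping and the item locations are hypothetical; this is not the estimation algorithm of RUMM2030 (or any other package), and it does not extrapolate extreme scores but simply rejects them.

```python
import math

def person_location(raw_score, item_locations, tol=1e-8, max_iter=100):
    """Hypothetical helper: ML person location (logits) and SE for a raw score
    on dichotomous items with known locations. Not RUMM2030's algorithm."""
    n_items = len(item_locations)
    if raw_score <= 0 or raw_score >= n_items:
        # Minimum or maximum score: the likelihood has no finite maximum.
        raise ValueError("extreme raw score: location and SE are infinite")
    beta = math.log(raw_score / (n_items - raw_score))  # starting value
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(d - beta)) for d in item_locations]
        information = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step solving sum(probs) == raw_score, damped to +/-1 logit.
        step = (raw_score - sum(probs)) / information
        step = max(-1.0, min(1.0, step))
        beta += step
        if abs(step) < tol:
            break
    probs = [1.0 / (1.0 + math.exp(d - beta)) for d in item_locations]
    se = 1.0 / math.sqrt(sum(p * (1.0 - p) for p in probs))
    return beta, se

items = [-1.5, -0.5, 0.5, 1.5]  # hypothetical item locations
for r in range(1, 4):
    beta, se = person_location(r, items)
    print(r, round(beta, 2), round(se, 2))
```

With these symmetric hypothetical items, the middle raw score lands at 0 logits with the smallest SE, and the SE grows towards the lower and upper raw scores; raw scores of 0 or 4 raise an error because their locations are infinite.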
For completeness, it should also be noted that the Rasch model can be generalized to the polytomous Rasch model, which is applicable when items have more than two ordered response categories [13, 15,16,17,18]. Two versions of the polytomous Rasch model are commonly distinguished: the rating scale model and the partial credit model [13]. Although these variants are mathematically equivalent [17], the rating scale model assumes that the response categories are the same and function in the same way across all items, whereas the partial credit model does not [13]. The polytomous Rasch model takes the following general form:
$$P_{nix}=\frac{\exp\left(-\tau_{1i}-\tau_{2i}-\dots-\tau_{xi}+x\left(\beta_n-\delta_i\right)\right)}{\sum_{x'=0}^{m_i}\exp\left(-\tau_{1i}-\tau_{2i}-\dots-\tau_{x'i}+x'\left(\beta_n-\delta_i\right)\right)},$$
(3)
where Pnix is the probability of person n scoring x on item i, τxi (x = 1, 2, …, mi) are the thresholds that divide the latent continuum of item i into mi + 1 ordered categories, and x is the observed item score. For x = 0, the sum of thresholds in the exponent is empty and is taken to be zero.
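Eq. 3 can be evaluated directly. The sketch below assumes, as in the equation, that the thresholds are expressed relative to δi, and returns the probability of each score 0, …, mi; with a single threshold at zero it reduces to the dichotomous model in Eq. 2. The function name and example values are illustrative only.

```python
import math

def category_probabilities(beta, delta, thresholds):
    """Eq. 3: probabilities of scores 0..m_i on item i under the polytomous Rasch model.

    thresholds holds tau_1i, ..., tau_mi (relative to delta, as in Eq. 3);
    for x = 0 the threshold sum is empty, i.e. zero."""
    numerators = [
        math.exp(-sum(thresholds[:x]) + x * (beta - delta))
        for x in range(len(thresholds) + 1)
    ]
    total = sum(numerators)  # the denominator of Eq. 3
    return [num / total for num in numerators]

# Hypothetical three-category item (thresholds -1 and +1), person at the item location.
probs = category_probabilities(beta=0.0, delta=0.0, thresholds=[-1.0, 1.0])
print([round(p, 4) for p in probs])  # [0.2119, 0.5761, 0.2119]
```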
As summarized above, given acceptable model fit, the Rasch model enables raw total scores to be linearized into interval logit measures with known measurement uncertainties that are appropriate for, e.g., parametric statistics and comparisons of magnitudes [6, 10, 11, 19]. Because the logit scale is an interval scale, its origin is arbitrary; the mean item location is typically set to zero and used as the origin, so that higher and lower raw total scores are represented by higher (positive) and lower (negative) logit values, respectively. However, the mix of positive and negative values, typically reported to two or three decimal places, can be confusing and abstract, particularly to practitioners and researchers who are used to interpreting a non-negative integer raw total score range.
To facilitate interpretation and use of logits, it is therefore often desirable to transform them into a more user-friendly range that preserves their linear properties without unnecessary loss of information or precision [20]. In some cases, for example with established instruments, it may be desirable to transform the logits into the original integer raw total score range. In other situations, a 0–10 or 0–100 (or any other user-defined) range may be sought. Although logits from any analysis according to the Rasch model can be transformed, a few considerations should be pointed out. Arguably, the main purpose of transforming the logit is to facilitate future use of an instrument in a way that preserves linearity and yields valid estimates of the measurement uncertainty of individual person scores without the need to apply the Rasch model. For the transformed scale (as well as the original logit locations) to be generalizable and useful in a wider context than the data at hand, it is therefore recommended that the transformation be based on complete item response data from an appropriate sample that is representative of the instrument’s intended target population. For generalizability, estimated locations (of persons, items and response category thresholds) and associated SEs should also be as stable as possible, which is achieved with well-targeted and relatively large samples using the final version of the instrument. For example, malfunctioning response categories or differential item functioning that compromises invariant measurement should have been rectified earlier, during the instrument development or revision process.
Transforming the logit
Transforming the logit is relatively straightforward because the transformation is linear; the procedure and mathematics were first described by Wright and Stone [14] and later by, e.g., Smith [21] and Smith, Jr. [20]. The basic logit transformation formula is
$$y=m+sx,$$
(4)
where y is the new transformed value, m is the location factor (= desired minimum – current logit minimum × s), s is the spacing factor (= desired new range / current logit range), and x is the logit measure. The spacing factor preserves the relative sizes of the intervals between logit measures, and the location factor realigns the scale to the desired new minimum. As Eq. 4 shows, logit measures may be transformed into any desired score range, and the defining factor in the transformation is the spacing factor.
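As a minimal sketch of Eq. 4, the function below derives the spacing and location factors from a desired new range and applies them. The name `transform_logits`, the example logit locations and the optional `logit_min`/`logit_max` arguments are hypothetical; the optional arguments are an assumption for cases where the current logit range should come from something other than the supplied values (for example, a full raw-score-to-logit table).

```python
def transform_logits(logits, new_min, new_max, logit_min=None, logit_max=None):
    """Eq. 4, y = m + s*x, mapping [logit_min, logit_max] onto [new_min, new_max].

    By default the current logit range is taken from the supplied logits."""
    lo = min(logits) if logit_min is None else logit_min
    hi = max(logits) if logit_max is None else logit_max
    s = (new_max - new_min) / (hi - lo)  # spacing factor
    m = new_min - lo * s                 # location factor
    return [m + s * x for x in logits], m, s

# Hypothetical raw-score logit locations (including extrapolated extremes).
logits = [-4.25, -1.0, 0.0, 1.5, 3.75]
y, m, s = transform_logits(logits, new_min=0, new_max=100)
print(m, s)  # 53.125 12.5
print(y)     # [0.0, 40.625, 53.125, 71.875, 100.0]
```

Because the mapping is linear, equal logit differences remain equal differences on the new scale: each logit corresponds to 12.5 points here.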
In addition to the logit locations, the associated SEs also need to be transformed to express measurement uncertainty on the new scale. This is done by multiplying the original logit SE by the same spacing factor used to transform the logit locations; the location factor does not enter, because shifting the origin of a scale leaves distances, and hence uncertainties, unchanged. That is,
$${SE}_{y}=s\left({SE}_{x}\right),$$
(5)
where SEy is the new transformed SE, s is the spacing factor and SEx is the original logit SE.
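The following sketch applies Eq. 5 and illustrates why only the spacing factor matters: an approximate 95% CI computed on the transformed scale coincides with the transformed logit CI. The person location, SE and the values of m and s are hypothetical, and s is assumed to be positive.

```python
def transform_se(se_logit, s):
    """Eq. 5: SE on the transformed scale; the location factor m plays no role.
    Assumes a positive spacing factor s."""
    return s * se_logit

# Hypothetical person: 0.5 logits, SE 0.4, transformed with m = 53.125 and s = 12.5.
m, s = 53.125, 12.5
x, se_x = 0.5, 0.4
y, se_y = m + s * x, transform_se(se_x, s)
# An approximate 95% CI on the new scale equals the transformed logit CI.
ci_new = (y - 1.96 * se_y, y + 1.96 * se_y)
ci_from_logits = (m + s * (x - 1.96 * se_x), m + s * (x + 1.96 * se_x))
print(y, se_y)  # 59.375 5.0
```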
As described earlier [14, 20], some considerations need to be made when conducting logit transformations. Specifically, one needs to consider what range to transform into and whether this range is reasonable from a measurement perspective. For example, if logits are transformed into an integer range that is too wide, the transformed scores may suggest a higher level of precision than they actually have, and there will be wide gaps between achievable scores. Conversely, if they are transformed into a range that is too narrow, precision and information may be lost. Wright and Stone [14] outline different transformations with different properties related to such considerations. Again, the key to these transformations is the definition of the spacing factor s, and Wright and Stone [14] suggest three alternatives: the least measurable difference (LMD), the standard error of measurement (SEM) and the least significant difference (LSD). These three transformations are summarized below; details regarding, e.g., their derivations are available elsewhere [14].
The LMD stems from the least observable difference (i.e., one raw score point) and estimates the smallest possible meaningful unit. The LMD spacing factor is therefore defined so that the logit LMD corresponds to at least one integer on the new transformed measure. Wright and Stone [14] suggest 6/L as a working definition of the LMD, implying an LMD spacing factor (sLMD) of
$${s}_{LMD}= \frac{1}{6/L}= \frac{L}{6},$$
(6)
where L is the maximum possible raw total score when the minimum score is set at 0. Wright and Stone [14] point out, however, that a combination of characteristics of the instrument and of the persons may mean that a spacing factor of up to L/4 is needed to guarantee that the logit locations of all raw total scores are transformed into unique integers on the new scale. Conversely, in some cases a factor as low as L/9 may be sufficient to guarantee unique integers. Nevertheless, an LMD spacing factor of L/6 is recommended unless it yields non-unique integers on the new scale, in which case a redefined LMD spacing factor may be considered.
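The recommendation above can be turned into a simple check, sketched here under stated assumptions: given the logit locations of all raw total scores 0, …, L (hypothetical values below), apply s = L/6, round to integers and test whether every raw score maps to a unique integer; if not, try wider spacing. The intermediate divisor 5, the new minimum of 0, the half-up rounding and the helper names `lmd_scores` and `choose_lmd_divisor` are assumptions made for illustration, not part of Wright and Stone's description.

```python
import math

def lmd_scores(raw_score_logits, max_raw_score, divisor=6, new_min=0):
    """Integer scores with spacing factor s = L / divisor (Eq. 6 when divisor = 6)."""
    s = max_raw_score / divisor
    m = new_min - min(raw_score_logits) * s
    # Round half up explicitly; Python's round() rounds halves to even.
    return [math.floor(m + s * x + 0.5) for x in raw_score_logits]

def choose_lmd_divisor(raw_score_logits, max_raw_score, divisors=(6, 5, 4)):
    """Hypothetical helper: first divisor (L/6, then wider spacing down to L/4)
    that maps every raw-score location to a unique integer."""
    for d in divisors:
        scores = lmd_scores(raw_score_logits, max_raw_score, d)
        if len(set(scores)) == len(scores):
            return d, scores
    return None, None  # even L/4 does not give unique integers

# Hypothetical, equally spaced logit locations for raw scores 0..10 (L = 10).
logits = [-2.5 + 0.5 * k for k in range(11)]
d, scores = choose_lmd_divisor(logits, 10)
print(d, scores)  # 5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Here L/6 maps some adjacent raw scores to the same integer, so the sketch falls back to L/5. Real raw-score-to-logit tables are not equally spaced (intervals widen towards the extremes), so the check should be run on the actual table.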
The SEM-based spacing factor relates the transformed scale values to measurement uncertainty. This aids interpretation, since one unit on the new SEM-based measure represents roughly one SE, and hence ±2 units corresponds to an approximate 95% confidence interval (CI). An obvious disadvantage, however, is that such a scale is somewhat less discriminating than one based on the LMD. Wright and Stone [14] relate the SEM to the LMD (SEM = √LMD, and √(6/L) ≈ 2.45/√L) and suggest 2.5/√L as a working value for the SEM, resulting in a spacing factor (sSEM) of
$${s}_{SEM}= \frac{1}{2.5/\sqrt{L}}= \frac{\sqrt{L}}{2.5} .$$
(7)
However, it should be noted that instruments commonly measure with lower precision towards the ends of their ranges, which results in larger SEs for logit locations towards the ends. Therefore, the principle that one step on the new SEM-based integer scale corresponds to one SE will not necessarily hold across the entire range of the scale, with the ends of the scale being the most common exceptions.
Finally, the LSD represents an estimated lower bound on how coarse the new transformed scale can be without loss of valid information. The working value of the LSD has been suggested to be 3.5/√L [14], which corresponds to 1.4 SEM (3.5/2.5 = 1.4) and yields a minimum spacing factor (sLSD) of
$${s}_{LSD}= \frac{1}{3.5/\sqrt{L}}= \frac{\sqrt{L}}{3.5}.$$
(8)
Equations 6, 7 and 8 imply that these spacing factors are ordered relative to one another as sLMD > sSEM > sLSD, provided that the raw total score range is more than 6 (i.e., L > 6).
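The ordering follows directly from Eqs. 6, 7 and 8. The second inequality holds for every L, while the first requires √L > 2.4, so L > 6 is a sufficient condition (and L = 6 also satisfies it):

$$\frac{s_{SEM}}{s_{LSD}}=\frac{\sqrt{L}/2.5}{\sqrt{L}/3.5}=\frac{3.5}{2.5}=1.4>1 \quad \text{for all } L>0,$$

$$\frac{s_{LMD}}{s_{SEM}}=\frac{L/6}{\sqrt{L}/2.5}=\frac{2.5\sqrt{L}}{6}>1 \iff \sqrt{L}>2.4 \iff L>5.76.$$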
Other approaches include transforming the linear logit measures into the same range as that of the original raw total score, or into other ranges such as 0–10 or 0–100. Indeed, this is a common approach in the health sciences [7, 22,23,24,25,26,27,28]. However, regardless of which transformation is considered, we suggest that the properties of the LMD, SEM and LSD transformations make them useful for quality control and benchmarking when deciding on the most appropriate transformation. If, for example, the new transformed measure has a range that exceeds that of the LMD transformation, the transformed scores would give the impression of a level of discrimination beyond their actual precision. Conversely, if the range of the new transformed measure is smaller than that of the LSD transformation, information will be lost because the transformation is too coarse. However, we are unaware of any studies in the health sciences that have taken advantage of and accounted for the properties of the LMD, SEM and LSD spacing factors in their logit transformations.
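To show how the benchmarking suggested above might work in practice, the sketch below compares the spacing factor implied by a user-defined range with sLMD and sLSD (Eqs. 6 and 8). Comparing spacing factors is equivalent to comparing ranges, because each transformed range equals its spacing factor times the current logit range. The helper `benchmark`, its verdict labels and the example instrument (L = 36 with a 10-logit range) are hypothetical.

```python
import math

def benchmark(new_range, logit_range, max_raw_score):
    """Hypothetical helper: compare a user-defined transformation with the LMD,
    SEM and LSD spacing factors (Eqs. 6-8)."""
    L = max_raw_score
    factors = {"LMD": L / 6, "SEM": math.sqrt(L) / 2.5, "LSD": math.sqrt(L) / 3.5}
    s_user = new_range / logit_range
    # Range each benchmark transformation would produce for the same logit range.
    benchmark_ranges = {name: s * logit_range for name, s in factors.items()}
    if s_user > factors["LMD"]:
        verdict = "too fine"    # implied discrimination exceeds actual precision
    elif s_user < factors["LSD"]:
        verdict = "too coarse"  # valid information is lost
    else:
        verdict = "acceptable"  # between the LSD and LMD benchmarks
    return s_user, benchmark_ranges, verdict

# Hypothetical instrument: raw scores 0-36, raw-score logit range of 10 logits.
for new_range in (100, 36, 10):
    print(new_range, benchmark(new_range, logit_range=10.0, max_raw_score=36)[2])
# 100 too fine
# 36 acceptable
# 10 too coarse
```

For this hypothetical instrument, a 0–100 range would overstate discrimination, a 0–10 range would be too coarse, and the original 0–36 raw score range lies between the LSD and LMD benchmarks.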
Next, we present an empirical illustration of the transformations outlined above and show how the LMD, SEM and LSD can be used to enhance interpretation and to serve as benchmarks for user-defined transformations.