Web-based computer adaptive assessment of individual perceptions of job satisfaction for hospital workplace employees

Background: To develop a web-based computer adaptive testing (CAT) application for efficiently collecting data regarding workers' perceptions of job satisfaction, we examined whether a 37-item Job Content Questionnaire (JCQ-37) could evaluate the job satisfaction of individual employees as a single construct. Methods: The JCQ-37 makes data collection via CAT on the internet easy, viable and fast. A Rasch rating scale model was applied to analyze data from 300 randomly selected hospital employees who participated in job-satisfaction surveys in 2008 and 2009 via non-adaptive and computer-adaptive testing, respectively. Results: Of the 37 items on the questionnaire, 24 items fit the model fairly well. Person-separation reliability for the 2008 surveys was 0.88. Measures from both years and item-8 job satisfaction for groups were successfully evaluated through item-by-item analyses using t-tests. Workers aged 26-35 felt that job satisfaction was significantly worse in 2009 than in 2008. Conclusions: The Web-CAT developed in the present paper was shown to be more efficient than traditional computer-based or pen-and-paper assessments at collecting data regarding workers' perceptions of job content.


Background
Many previous studies have reported on the relationships between job satisfaction, psychological distress, psychosocial processes and stress-related biological factors [1][2][3][4][5]. Amati et al. [1] reported that job satisfaction is related to psychological stress affecting cellular immune function and that changes in work satisfaction over time could affect the immunological-inflammatory status of workers. Optimizing the ways in which healthcare providers use institutional services to maximize the likelihood of positive health outcomes is thus urgent and essential [6,7].

Standardized assessments of health status
Within survey or research settings, there are two routinely used forms of standardized health status assessments [8].
(1) A lengthy and structured interview conducted by experts to systematically investigate the presence and nature of each symptom of every disorder (this is often considered the ''gold standard'' in psychiatric diagnosis by researchers [9,10], but it requires significant amounts of time and training to administer).
(2) A rapid assessment instrument that attempts to briefly screen for the most common symptoms of psychiatric disorders by using a cut-off point to identify degrees of impairment based on specific scores (e.g., sleep, the quality-of-life scale [11], the Job Content Questionnaire (JCQ) [12], and the Beck Anxiety and Depression Inventories [13]).
The length and complexity of many fixed-form instruments are problematic and raise concerns about both the burden on respondents and the administration costs [14,15]. Conversely, the shift to shorter fixed-form versions of patient-reported instruments has raised concern over possible resultant losses of precision and reliability [16] as well as insensitivity to clinically meaningful changes [17].
* Correspondence: shihbin.su@msa.hinet.net 6 Institute of Biomedical Engineering, Southern Taiwan University, Tainan, Taiwan Full list of author information is available at the end of the article

CAT reduces the burden on patients and diagnosticians
Studies have shown that computer adaptive testing (CAT) can save time and alleviate the burdens on both examinees (e.g., patients) and test administrators (e.g., diagnosticians), as compared to traditional computer-based or pen-and-paper assessments [18][19][20][21]. CAT, which is based on item response theory (IRT) [21], is a test-administration method that tailors the assessment to the latent-trait level of the examinee. Only items that are neither too hard nor too easy are administered. IRT-based CAT has attracted much attention because of its better control of item exposure and lower cost of item development for medical and healthcare professionals [22,23]. CAT can efficiently collect data from examinees and identify the degree of severity of each symptom of a disorder. Thus, CAT overcomes the shortcomings of the two traditional forms of standardized assessment in clinical settings: the burdens associated with lengthy assessments and the loss of precision and reliability of shorter fixed-form assessments.

Item-by-item questionnaire analyses
Although CAT and the aforementioned lengthy and short assessments are all used to obtain composite scores for measurement, item-by-item analyses are also common in research reports. In item-by-item analyses, perception changes between groups are compared across items. One item (or one composite score) is assessed at a time [22] by traditional one-way ANOVA, by a t-test, or even by Pearson's chi-square test [6]. Recently, item-by-item skewness analysis by a bootstrapping procedure has been reported as effective for identifying quality-of-life concerns of patients [24]. The problem we face when using CAT is how to obtain the specific response of each person to each item, because only individual measures are stored in the CAT module.

Study Objectives
This study aimed to answer two questions: (1) Can a CAT be used via a website to facilitate more efficient response collection for the self-evaluation of job satisfaction by workers? and (2) Is it possible to generate data using the Rasch model (1960) to assess achievement through item-by-item analysis?

Study participants and research instrument
The study was conducted in a 1,200-bed hospital in Taiwan. One-tenth of hospital employees were randomly enrolled for surveys of job satisfaction in September of 2008 and 2009. The self-administered 37-item Job Content Questionnaire (JCQ-37) was delivered on a website via non-adaptive testing (NAT) in 2008, and a CAT assessment with 24 items was provided to workers in 2009.
The response rates were 92.6% and 91.1% for 2008 and 2009, respectively. This study was approved and monitored by the administration units of the hospital.

Instrument selection
(1) Questionnaire
Eight items related to supervisors and coworker-support in the Chinese version of the JCQ (C-JCL) [25] were combined with 29 other items regarding job satisfaction to form the 37-item Job Content Questionnaire (JCQ-37). The questionnaire covered the following six domains: welfare and the environment (measured by eight items), institutional image (measured by five items), intra- and inter-department relationship (measured by seven and five items, respectively) and personal professional learning and working conditions (measured by five and seven items, respectively). For each item, the response was recorded using a four-point Likert scale ranging from 1 (strongly disagree) to 4 (strongly agree).

(2) Rasch analysis
We constructed a user-friendly Web-CAT self-rated questionnaire assessment to help provide hospital services based on individual needs as identified from relevant descriptions of job satisfaction. Construction of a unidimensional assessment to measure job satisfaction was required. The Rasch rating scale model [26,27] and WINSTEPS software [28] were used to examine the 2008 responses to JCQ-37 by workers and to determine whether these responses could form a unidimensional measurement. The items meeting the requirements of the Rasch model (unidimensionality and data-model fit) were the items used to construct the Web-CAT in 2009.

(3) Unidimensionality
Rasch modeling has been reported to be superior to factor analysis for confirming a one-factor structure [29]. Using Rasch analyses to assess unidimensionality has been the subject of much discussion in the literature [30][31][32][33]. Tennant and Pallant [34] and Richard Smith [35] suggested that exploratory factor analysis (EFA), especially using parallel analysis [36], should be undertaken to assess the dimensionality of the study data. Several studies [24,37,38,39] have used principal component analysis (PCA) of the standardized residuals to verify that items fit the assumption of unidimensionality. Certain criteria are suggested to determine whether the standardized residuals conform to unidimensionality: 1) a cutoff at 60% of the variance explained by the Rasch factor and 2) a first eigenvalue of the residuals smaller than 3, with less than 5% of the variance explained by the first contrast [40,41]. Poor-fitting items with a mean square error (MNSQ) beyond the range of 0.5-1.5 were discarded from the questionnaire to guarantee unidimensional interval measures in a logit unit (i.e., log odds) [27,40,42].
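The screening criteria above amount to simple threshold checks. The following Python sketch is illustrative only: the function names and the example MNSQ values are hypothetical, and the fit statistics themselves are assumed to come from Rasch software such as WINSTEPS.

```python
def screen_items(infit_mnsq, low=0.5, high=1.5):
    """Return the ids of items whose Infit MNSQ lies within [low, high]."""
    return [item for item, mnsq in infit_mnsq.items() if low <= mnsq <= high]


def residuals_look_unidimensional(first_eigenvalue, first_contrast_pct,
                                  max_eigenvalue=3.0, max_contrast_pct=5.0):
    """Apply the residual-PCA criteria: eigenvalue < 3 and first contrast < 5%."""
    return (first_eigenvalue < max_eigenvalue
            and first_contrast_pct < max_contrast_pct)


# Hypothetical Infit MNSQ values for four items (not taken from Table 2).
example_mnsq = {"item_a": 0.92, "item_b": 1.63, "item_c": 1.08, "item_d": 0.41}
kept = screen_items(example_mnsq)  # items b and d fall outside 0.5-1.5
```

The 60% cutoff for variance explained by the Rasch factor could be added as a third check in the same style.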

Web-CAT assessment
We designed a CAT questionnaire that complies with rules and criteria for CAT-based testing on the internet (http://www.healthup.org.tw/irt_test4/irt_start.htm).
Based on the person-separation reliability (Rasch_rel, similar to Cronbach's alpha) calculated from the job-satisfaction survey conducted in 2008, the CAT termination rule for the standard error of measurement (SEM) was determined by formula (1) [43]:
$$\mathrm{SEM} = SD_x \times \sqrt{1 - \text{Rasch\_rel}} \qquad (1)$$
where $SD_x$ represents the standard deviation of person measures estimated in 2008. We also defined another termination rule for CAT so that the minimum number of items required for completion of the CAT questionnaire was 10. The initial item was selected according to the overall job-satisfaction level indicated by the examinee's response at the beginning of the CAT questionnaire. Once an examinee had completed three items on the web, the computer updated the estimate of the examinee's satisfaction level (ability) after each subsequent answer. The provisional person measure was estimated by the iterative Newton-Raphson procedure [18,44]; a brief algorithm is presented in Additional file 1. The next item selected was the one providing the most information about the provisional person measure among the remaining unanswered items.
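The estimation and item-selection loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation (which is given in Additional file 1): responses are scored 0-3 rather than 1-4, all function names are hypothetical, the step size and the bound on θ are safeguards added here for extreme response patterns, and the way the two stop rules are combined in `should_stop` is an assumption.

```python
import math


def rsm_probs(theta, delta, taus):
    """Category probabilities (scores 0..len(taus)) under the rating scale model."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(theta, delta, taus):
    """Expected score and its variance; the variance is the item information."""
    probs = rsm_probs(theta, delta, taus)
    expected = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
    return expected, variance


def estimate_theta(responses, deltas, taus, theta=0.0,
                   max_iter=50, tol=1e-4, bound=6.0):
    """Newton-Raphson estimate of theta; responses maps item index -> score."""
    for _ in range(max_iter):
        stats = [expected_and_variance(theta, deltas[i], taus) for i in responses]
        residual = sum(s - e for s, (e, _) in zip(responses.values(), stats))
        information = sum(v for _, v in stats)
        # Assumption: limit each step to 1 logit and keep theta within +/-bound.
        step = max(-1.0, min(1.0, residual / information))
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < tol:
            break
    information = sum(expected_and_variance(theta, deltas[i], taus)[1]
                      for i in responses)
    return theta, 1.0 / math.sqrt(information)  # (estimate, standard error)


def next_item(theta, deltas, taus, answered):
    """Pick the unanswered item with the most information at theta."""
    candidates = [i for i in range(len(deltas)) if i not in answered]
    return max(candidates,
               key=lambda i: expected_and_variance(theta, deltas[i], taus)[1])


def should_stop(n_answered, se, sem_limit=0.68, min_items=10):
    """Assumed combination of the two stop rules: enough items and a small SE."""
    return n_answered >= min_items and se <= sem_limit
```

In use, `estimate_theta` would be called after each answer once three items are complete, followed by `should_stop` and, if the test continues, `next_item`.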

Generation of person responses across items
Only individual measures were stored in the CAT module. Appropriate responses therefore had to be generated for each person and each item so that item-by-item comparisons could be made across years. A standard item-response generation method, as used in previously published papers [24,45,46,47,48], was applied using the Rasch rating scale model. An Excel routine is demonstrated in Additional file 1.
Table 1 compares the demographic characteristics of the study sample in 2008 and 2009. The average age and the mean duration of work tenure were 34 and 8.5 years, respectively. The majority of respondents were female (79%) and only 12-14% were physicians. Chi-square tests showed that gender, occupation, age and work tenure were not significantly different between the two assessment years (p > 0.05).

Unidimensional validity and the identification of concerns
Of the 37 items in the 2008 survey, 24 fit the expectations of the Rasch model well, with an Infit MNSQ range of 0.50-1.50 (shown in Table 2). The most difficult item (i.e., the one least frequently endorsed) was a well-designed hospital-to-worker message delivery system (item 11; 2.73 logits in 2008). In contrast, the easiest item (i.e., the one most frequently endorsed) was always maintaining a happy mood at work (item 33; -0.68 logits in 2008). Person-separation reliability was 0.88 for 2008. The standard deviation and mean of person measures were 1.99 and 2.30, respectively. The termination rule for CAT was thus set at SEM = 0.68 [1.99 × sqrt(1-0.88)] according to formula (1).
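The arithmetic behind this stop threshold can be reproduced in a few lines of Python. The function name is hypothetical. With the rounded inputs reported above (1.99 and 0.88) the product comes to roughly 0.69, so the 0.68 used in the text presumably reflects unrounded values.

```python
import math


def sem_stop_threshold(sd_person, reliability):
    """Standard error of measurement implied by a person SD and a reliability."""
    return sd_person * math.sqrt(1.0 - reliability)


# Rounded values reported for the 2008 survey.
sem = sem_stop_threshold(1.99, 0.88)
```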
The principal components analysis of the residuals demonstrated that the 24-item scale accounted for 52.2% of the raw variance explained by the measures. The first contrast had an eigenvalue of 1.8 (less than 3 [41]) and accounted for 4.2% (less than 5% [40]) of the total variance, suggesting that the 24-item scale can be regarded as substantially unidimensional. A parallel analysis also indicated that the 24-item questionnaire regarding job satisfaction measures a common entity. These findings indicate that these 24 items measured a single construct for job satisfaction. The three intersection parameters (also called the step calibrations [48]) under the Rasch rating scale model for the 24-item questionnaire were set at -4.16, -1.50 and 2.66 logits. These thresholds are congruent with the guidelines proposed by Linacre [49] as follows: (1) average measures advance monotonically within each category, (2) step calibrations advance, (3) step difficulties advance by at least 1.4 logits and (4) step difficulties advance by less than 5.0 logits.
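Guidelines (2) to (4) can be checked directly from the three reported step calibrations. The Python sketch below is illustrative, with a hypothetical function name; guideline (1) needs the average measures per category, which are not given here, so it is left out.

```python
def check_step_calibrations(steps, min_gap=1.4, max_gap=5.0):
    """Check ordering and spacing of adjacent step calibrations (in logits)."""
    gaps = [later - earlier for earlier, later in zip(steps, steps[1:])]
    return {
        "gaps": gaps,
        "ordered": all(g > 0 for g in gaps),              # guideline (2)
        "wide_enough": all(g >= min_gap for g in gaps),   # guideline (3)
        "narrow_enough": all(g < max_gap for g in gaps),  # guideline (4)
    }


# Step calibrations reported for the 24-item questionnaire.
report = check_step_calibrations([-4.16, -1.50, 2.66])
```

The two gaps are about 2.66 and 4.16 logits, so all three checks pass for the reported thresholds.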

Web-CAT performance
Based on the finding of a unidimensional construct in Table 2, we embedded the stop rules of SEM = 0.68 and a minimum item length of 10 into the CAT questionnaire. The Web-CAT is available at http://www.healthup.org.tw/irt_test4/irt_start.htm. Table 3 shows an example of a CAT report: (1) The person measure (θ) begins to be estimated at step 4. The CAT stops at step 10, with a final measure of -1.08 logits, when the SE is equal to or less than the SEM of 0.68. (2) The probabilities corresponding to each item difficulty (δ) are in agreement with formula (2) under the Rasch rating scale model [26]:
$$\ln\left(\frac{P_{nij}}{P_{ni(j-1)}}\right) = \theta_n - \delta_i - \tau_j \qquad (2)$$
where $P_{nij}$ and $P_{ni(j-1)}$ are the probabilities of person n scoring j and j - 1 on item i, $\theta_n$ is the ability of person n, $\delta_i$ is the difficulty of item i, and $\tau_j$ is the j-th step difficulty (see Additional file 1). (3) Outfit MNSQ for CAT was determined by the average of the squared standardized residuals (i.e., the squared difference between the observation and the expected score, divided by the variance; see Additional file 1) across all items. The CAT procedure was terminated once the item length exceeded 10 or the outfit MNSQ exceeded 10. An outfit MNSQ of greater than 2.0 was taken to indicate aberrant responses given by the person [50] (Figure 1). We assumed that aberrant responses (e.g., guessing, inattentiveness, carelessness and coaxing) could be caused by fatigue, misunderstanding, or a poor fit of the examinee for evaluation based on item-response theory [51,52]. Observations with Z-scores beyond ±1.96 were marked with an asterisk (*) to indicate that an unexpected response was given to the specified item (p < .05).
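The person-fit quantities in such a report (Z-residuals and outfit MNSQ) can be computed as in the following Python sketch. It is a hedged illustration rather than the Web-CAT code: scores are coded 0-3 instead of 1-4, the function names are hypothetical, and the two-item example uses the reported step calibrations together with an invented response pattern.

```python
import math


def rsm_probs(theta, delta, taus):
    """Category probabilities (scores 0..len(taus)) under the rating scale model."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def person_fit(theta, responses, deltas, taus, z_limit=1.96):
    """Return (outfit MNSQ, Z-residual per item, items flagged as unexpected)."""
    z_scores = {}
    for item, score in responses.items():
        probs = rsm_probs(theta, deltas[item], taus)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        z_scores[item] = (score - expected) / math.sqrt(variance)
    outfit = sum(z * z for z in z_scores.values()) / len(z_scores)
    flagged = [item for item, z in z_scores.items() if abs(z) > z_limit]
    return outfit, z_scores, flagged


# Invented pattern: a person at -1.08 logits gives the top score on a hard item.
taus = [-4.16, -1.50, 2.66]
deltas = [0.0, 2.73]
outfit, z_scores, flagged = person_fit(-1.08, {0: 2, 1: 3}, deltas, taus)
```

In this example the top score on the hard item produces a large positive Z-residual, so that item is flagged and the outfit MNSQ exceeds 2.0.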

Item difference between years
Taking item 8 (salary and wage levels compared with other hospitals) as an example, we examined differences between 2008 and 2009 with the t-test, as shown in Table 4. In general, the 2008 perceptions had a higher mean score (i.e., more satisfied) than those in 2009, except that participants older than 55 showed no difference on item 8 between years. Other items were analyzed similarly. Due to space constraints, those results are not reported here but are available on request.
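A between-year comparison of a single item can be sketched in Python as follows. The text does not state which t-test variant was used, so the unequal-variance (Welch) form here is an assumption, the function name is hypothetical, and the two small samples are invented rather than taken from Table 4.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard errors of the two sample means.
    sq_se_a = statistics.variance(sample_a) / len(sample_a)
    sq_se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(sq_se_a + sq_se_b)
    df = (sq_se_a + sq_se_b) ** 2 / (
        sq_se_a ** 2 / (len(sample_a) - 1) + sq_se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Invented item-8 responses (1-4 Likert scale) for two survey years.
scores_2008 = [3, 3, 4, 4]
scores_2009 = [2, 2, 3, 3]
t_value, dof = welch_t(scores_2008, scores_2009)
```

The p-value would then be read from a t distribution with `dof` degrees of freedom; the function fails with a division by zero if both samples have zero variance.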

Features
(1) Key findings
The group most worthy of concern for the studied hospital is workers aged 26-35, who had substantially lower job satisfaction in 2009 than in 2008. Female nurses with work tenure beyond 18 years showed the most significant deterioration, whereas workers older than 55 showed no difference, on item 8 (salary and wage levels compared with other hospitals) between 2008 and 2009.
(2) What this study contributes to current knowledge
This study develops a CAT to examine workers' perceptions of job satisfaction and demonstrates its advantages over non-adaptive testing in reducing the burdens associated with lengthy assessments and improving measurement precision.
(3) Implications of the results and suggested actions
There were two major implications: (1) The Web-CAT (especially when adopting a polytomous as opposed to a dichotomous item design) can be used as a tool for hospital workers to measure their perceptions of job satisfaction, and (2) a standard item-response generation method based on the individual measures estimated by CAT can be applied to item-by-item comparisons. An Excel routine is demonstrated in Additional file 1.

Study strengths
(1) Using CAT and the t-test to compare individual differences on measures and items across years
From a management perspective, promotion of the health of workers has emerged as an important issue [53,54]. Many workplaces now routinely conduct job-satisfaction surveys for employees. Using a questionnaire to measure differences between groups and across items over several years is thus necessary. Providers can rapidly obtain input from workers by means of the results of Web-CAT assessments for individual examinees and the t-test for specific items (or composite scores). Such evaluation is useful for individual and group comparison.
(2) Web-CAT saves time and reduces burdens compared with traditional non-adaptive tests
To maximize the likelihood of achieving a desired health promotion outcome, workers are provided with a Web-CAT report that reveals their perceptions of job satisfaction. In contrast to traditional non-adaptive assessment methods, this feature saves time and alleviates burdens on examinees and diagnosticians by immediately transmitting messages. The system can also detect aberrant responses with CAT report cards (Table 3), by outfit MNSQ [47] and by Z-residual scores [18,22,24,27]. Because unexpected responses to items are flagged, diagnosticians are more likely to notice them in the feedback for individual examinees.

(3) Polytomous CAT module developed in this study
Many studies investigating IRT- and CAT-based tests using dichotomous items have evaluated both the efficiency and precision of CAT-based tests in the educational, psychometric and medical fields. However, few studies have examined CAT with polytomous items applied to satisfaction surveys. This study demonstrates a Web-CAT module that interested readers can try at http://www.healthup.org.tw/irt_test4/irt_start.htm.

Study limitations
Because many studies have shown that CAT can save time and alleviate burdens on examinees compared to traditional non-adaptive computer-based or pen-and-paper assessments [18][19][20][21], we did not demonstrate the efficiency and precision of CAT as compared to non-adaptive assessments. Obtaining high-quality examinee feedback from CAT assessments is essential to produce accurate results, and adequate training is required to facilitate an efficient health-promotion system. Without such results and training, it will be extremely difficult for readers to understand the computation of outfit and infit statistics with regard to probability and outfit MNSQ disclosed in Table 3. In this study, the job-satisfaction questionnaire was used as a tool to collect information about workers' perceptions using the CAT feedback system. Accordingly, diagnosticians may need training to interpret the results adequately.

Problems in application and daily use
(1) Applications of CAT
Traditionally, all examinees' responses have to be collected and saved for further analyses, which can be very tedious. In this study, we used the Web-CAT at http://www.healthup.org.tw/irt_test4/irt_start.htm to record the item responses of all examinees. CAT can easily be applied to any kind of questionnaire. The availability and accessibility of information technology and item response theory make CAT implementation simple and easy. Those who are interested in CAT implementation can consult the textbook [42] and the following websites: http://www.eddata.com/resources/publications/EDS_Rasch_Demo.xls (for information on the iteration of person estimation and item calibration), http://www.rasch.org/rmt/rmt34e.htm (for information on the computation of outfit and infit statistics) and http://www.rasch.org/rmt/rmt213a.htm (for information on the method to simulate Rasch data). Other relevant information regarding CAT algorithms, such as the Newton-Raphson method, item information and SE, is shown in Additional file 1.

(2) Generation of person responses across items
When applying CAT, it is impossible to collect responses to all items as in traditional computer-based or pen-and-paper assessments, because each examinee answers only a subset of items. Person responses across all items should therefore be generated statistically if item-by-item analyses across groups are required for comparisons. The standard item-response generation method introduced in previously published papers [24,45,46,47,48] is worth consulting for further reference.
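The generation step can be illustrated with a short Python sketch that draws a response for every person-item pair from the rating scale model probabilities. It stands in for, and is not a copy of, the Excel-VBA routine in Additional file 1: the function names, the seed handling and the example person measures are assumptions made for illustration.

```python
import math
import random


def rsm_probs(theta, delta, taus):
    """Category probabilities (scores 0..len(taus)) under the rating scale model."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_response(theta, delta, taus, rng):
    """Draw one score (0..len(taus)) by inverting the cumulative probabilities."""
    u = rng.random()
    cumulative = 0.0
    probs = rsm_probs(theta, delta, taus)
    for score, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return score
    return len(probs) - 1  # guard against floating-point round-off


def simulate_matrix(thetas, deltas, taus, seed=0):
    """Persons x items matrix of generated responses on the 1-4 Likert scale."""
    rng = random.Random(seed)
    return [[simulate_response(theta, delta, taus, rng) + 1 for delta in deltas]
            for theta in thetas]


# Two hypothetical person measures, three item difficulties, reported thresholds.
matrix = simulate_matrix([-1.08, 2.30], [-0.68, 0.0, 2.73],
                         [-4.16, -1.50, 2.66], seed=1)
```

Because the draws are random, group comparisons built on such data vary from one generated data set to the next; fixing the seed only makes a given run reproducible.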

Conclusion
The outcomes of this study, especially for the item parameters presented in Table 2, imply that the Web-CAT is a useful tool for examining job satisfaction in hospital work sites. Future studies can further investigate the job-satisfaction cut-off point for hospital workers for the purpose of improving job-satisfaction perceptions and promoting mental health in the workplace. A Web-CAT with graphs and animations will be developed by the authors in the near future.

Additional material
Additional file 1: Expected scores obtained by the Rasch model's probability theory. Excel-VBA program for randomly generating Rasch model's expected scores.
List of abbreviations
CAT: computer adaptive testing; EFA: exploratory factor analysis; JCQ: job content questionnaire; IRT: item response theory; MNSQ: mean square error; SEM (also written MSE): standard error of measurement; NAT: non-adaptive testing; PA: parallel analysis; VBA: Visual Basic for Applications