BMC Medical Research Methodology

Background: Database systems have been developed to store data from large medical trials and survey studies. However, a reliable data storage system does not guarantee data entering reliability.


Background
Large survey studies are important for public health policy making and to improve the effectiveness of interventions. Database systems and electronic networks have been developed to render surveys more manageable by providing data storing and analysis [1,2]. Data standardization and accuracy, as well as secure storage are of particular importance in multi-center studies. However, the availability of reliable electronic systems is not enough to guarantee the validity of population-based cross-sectional studies. Indeed, the relevance of a medical survey is largely dependent on two main steps: the quality of data collection in the medical-charts and the fidelity of data transferring from the charts to the electronic system. Any weakness in these two stages will invalidate the study [3][4][5][6][7].
The present study is focused on data-entering reliability. Many techniques have been successfully introduced to reduce transcriptional mistakes, such as combo-boxes, filters that prevent fields from being in logical contradiction to other values, and the involvement of specialized data-managers or of a single data-manager. However, the process of data entering can still represent a problem for data reliability.
In the present study (SESy-Europe project), conducted within the framework of a nationwide Hellenic survey of cancer screening assessment, we set out to evaluate whether a double-blind control of the inserted data has a clear effect on data-management, thus reducing mistakes during data entering. We also evaluated whether the position of an entry in the insertion-sheet has any impact on the occurrence of mistakes. Finally, we investigated whether an electronic identification of high-risk insertions is more sensitive than random control of the questionnaires in identifying data-entering mistakes.

Methods
This study is part of the Screening Evaluation System Europe (SESy-Europe) project, also known as the PACMeR_01.04 project because it is organized by the Panhellenic Association for Continual Medical Research. The SESy-Europe project is a multinational study involving fourteen centres in ten European nations, tailored to the development of a multilanguage database able to bridge European countries in cancer screening monitoring policy.
In this study, the SESy-Europe project used data coming from medical charts (questionnaires) of a Greek survey that aimed at the evaluation of Hellenic cancer preventive and screening practices (PACMeR_02 study). Details of the PACMeR_02 study have already been reported [8,9].
The project was ethically approved by PACMeR's Scientific Committee (protocol number 08_020720) and conformed to the ethical guidelines of the 1975 Declaration of Helsinki.
Data coming from 6446 medical charts (3462 female, 2984 male) and their electronic storing constituted the substrate of the analyses.

Data entering and database
Data storing had been assured by SESy_Europe Database [10,11]. Despite the fact that the database has been tested for data-safety of insertion from multi-centric data-management [12], in this study section all data were inserted by a single data-manager. This has been reported to reduce the inter-data manager errors and facilitate analyses by avoiding data-manager related bias [13].

Study design and blinding
Data insertion was conducted in four chronologically sequential phases. Each phase consisted of three stages: 1) data entering, 2) control applied to inserted data, 3) correction of mistakes.
During phase I, data from the first 325,773 questionnaires were recorded and controlled. In phase II, data from 151,734 questionnaires were recorded and controlled, followed by data from 145,401 and 107,286 questionnaires in phases III and IV, respectively.
The data manager could not progress to the next phase of data entering until all procedures of the previous phase (stages 1, 2 and 3) had been concluded. Details for each stage are provided below.

First stage (data entering)
All data coming from a definite number of medical charts were recorded in an established peripheral unit of the database (Nafpaktos, Greece).

Second stage (controls applied to inserted data)
The recorded data were electronically sent to the Central unit of the database (Ioannina, Greece) and then transferred to an external commission for electronic control (Milan, Italy). At the same time the registered medical charts were sent to the questionnaires' collection center (Ioannina, Greece) and then to the PACMeR archive (Lixouri Hospital, Greece). Neither the data manager operating in the peripheral unit (Nafpaktos), nor the control units (Milan and Lixouri) were aware of each other, thus assuring that the study was blind.
Data entered into the database underwent two types of control: computerized controls (A) and random selection (B). We defined as potential mistakes all medical records flagged by either control. Potential mistakes triggered hand-searching in hard copies to validate the correspondence between the data contained in the medical records and those in the database. Non-corresponding data were considered real mistakes. Conversely, corresponding data were identified as false positives. Lists of potential and real mistakes were thereafter registered for statistical analyses.
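The control stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout (dictionaries keyed by a record ID), the plausibility range used as a filter and the toy values are all hypothetical and are not taken from the SESy-Europe database.

```python
import random


def flag_by_filter(database, field, low, high):
    """Control A (hypothetical filter): flag IDs whose value lies outside [low, high]."""
    return [rid for rid, rec in database.items()
            if not (low <= rec[field] <= high)]


def flag_by_random_selection(database, k, seed=0):
    """Control B: flag k record IDs drawn at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(database), k)


def validate(flagged_ids, database, charts, field):
    """Hand-search step: compare each flagged entry with its paper chart.

    Returns (real_mistakes, false_positives) as two lists of record IDs.
    """
    real, false_positive = [], []
    for rid in flagged_ids:
        if database[rid][field] != charts[rid][field]:
            real.append(rid)            # database and chart disagree
        else:
            false_positive.append(rid)  # flagged, but correctly entered
    return real, false_positive


# Toy data (invented for illustration only).
charts = {1: {"age": 45}, 2: {"age": 102}, 3: {"age": 38}, 4: {"age": 51}}
database = {1: {"age": 45}, 2: {"age": 102}, 3: {"age": 380}, 4: {"age": 15}}

flagged = flag_by_filter(database, "age", 18, 100)
real, false_pos = validate(flagged, database, charts, "age")
print("potential:", flagged, "real:", real, "false positives:", false_pos)
```

In this toy example the filter flags three records as potential mistakes; checking them against the charts separates two real mistakes from one false positive (an unusual but correctly entered value).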

Third stage (corrections of mistakes)
A dedicated operator went to the peripheral unit to present the list of real mistakes to the data manager and discuss the related insertions. The same operator was crucial to assure that the data manager in the peripheral unit could not progress to new insertions, until all the real mistakes registered during controls for the previous phase had been corrected and discussed. The operator was instructed to change the data-base ID code of the peripheral unit prior to any new phase of the study for that purpose. The ID code identifies the peripheral unit and the phase of insertion for each electronic record.

Insertion-sheet
Considering that the position of an entry in the insertion-sheet might influence the rate of mistakes (e.g., data-entering errors in the last insertion field of a long insertion-sheet), we recorded the proportion of real insertion mistakes at the beginning and at the end of the insertion-sheet. To this end, the parameters age and weight, at the 4th and 5th insertion positions respectively, were compared to the parameters age at marriage and age at first sexual intercourse, at insertion positions 114 and 115 respectively.

Outcomes
We set out:
1. To estimate if the double-blind control of the inserted data and the following corrections might have any effect on the data-manager, reducing mistakes during successive phases of data-entering.
2. To investigate if the position in the insertion sheet has any impact on mistakes occurrence during data-entering.
3. To examine differences in sensitivity for detection of data-entering mistakes by comparing the results obtained analyzing randomly selected insertion sheets against those identified by computerized filters for unreliable data.
Analyses were performed in Intercooled Stata 8.2 (Stata Corp, College Station TX, USA) using chi-square, Pearson chi-square and the metareg module. Unless otherwise specified, all statistical tests are two-tailed and statistical significance is set at p < 0.05.  Table 1.

Population and insertions
The number of "potential mistakes" identified by electronic controls (for each parameter analyzed in each phase) and the number of "real mistakes" encountered during the hand-searching check of "potential mistakes" on medical charts are reported in Table 2.

Outcome analyses

Effect of double blind control on data manager
Double-blind control and correction of mistakes were not found to have any benefit on data-entering reliability. The proportion of mistakes in the four phases did not show a statistically significant difference (p = 0.66). On the contrary, meta-regression analysis by phase showed a trend towards an increase in the risk of producing mistakes at each successive phase by 1.07, but this was also far from statistically significant (p = 0.27). These results were confirmed when we calculated the risk ratio for data-entry mistakes in each other phase vs. phase I (RR = 1.0): phase II RR = 1.082, p = 0.74; phase III RR = 1.059, p = 0.76; phase IV RR = 1.277, p = 0.21.
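For readers unfamiliar with the measure, a risk ratio of this kind can be computed as sketched below. This is a minimal illustration, not the Stata procedure used in the study: the function name is hypothetical, the confidence interval uses the common log-based (Katz) approximation, and the counts in the usage line are invented placeholders, since the per-phase counts belong to Table 2 and are not reproduced in the text.

```python
import math


def risk_ratio(m_phase, n_phase, m_ref, n_ref, z=1.96):
    """Risk ratio of mistakes in one phase vs. a reference phase.

    m_* = number of real mistakes, n_* = number of insertions checked.
    Returns (rr, lower, upper), where the interval is the log-based
    approximation with normal quantile z (1.96 for about 95%).
    """
    rr = (m_phase / n_phase) / (m_ref / n_ref)
    se_log = math.sqrt(1 / m_phase - 1 / n_phase + 1 / m_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts, for illustration only (not study data).
rr, lower, upper = risk_ratio(12, 1000, 10, 1000)
print(f"RR = {rr:.2f}, approx. 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that includes 1, as in this invented example, corresponds to a difference between phases that is not statistically significant.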

Position in the insertion-sheet
We found that parameter position in the insertion sheet plays a major role in the occurrence of real mistakes, with the last insertions being statistically associated with a higher rate of mistakes than the insertions at the beginning. This was evident during each phase of the study for any type of control considered (electronic or random selection). The proportion of mistakes observed in the last insertion fields was notably lower for combo-boxes than for numerical values (Table 3).

Random vs. electronic check
When electronic control was compared against the random selection of questionnaires, it was found to be statistically more effective in evidencing real mistakes in two of the three parameters analyzed: "age", 1/800 vs. 11/416 (p < 0.001); "number of children", 9/800 vs. 12/223 (p < 0.001). The filter used for "age at marriage" produced a large number of false positives and displayed a positive trend but did not reach statistical significance (15/424 vs. 5/336, p = 0.080).
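The comparison of two proportions quoted above can be checked with a Pearson chi-square test on the corresponding 2x2 table. The sketch below is an independent re-computation in Python, not the original Stata code; it assumes the uncorrected Pearson statistic (no continuity correction) with one degree of freedom, and the function name is hypothetical. The two proportions are labelled simply "first" and "second" in the order printed in the text, since the test is symmetric in the two groups.

```python
import math


def pearson_chi2_2x2(m1, n1, m2, n2):
    """Uncorrected Pearson chi-square comparing m1/n1 with m2/n2.

    m = real mistakes found, n = entries checked. Returns (chi2, p),
    with p taken from the chi-square distribution with 1 degree of freedom.
    """
    observed = [[m1, n1 - m1], [m2, n2 - m2]]
    rows = [n1, n2]
    cols = [m1 + m2, (n1 - m1) + (n2 - m2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts as quoted in the text (first vs. second proportion).
comparisons = {
    "age": (1, 800, 11, 416),
    "number of children": (9, 800, 12, 223),
    "age at marriage": (15, 424, 5, 336),
}
results = {name: pearson_chi2_2x2(*c) for name, c in comparisons.items()}
for name, (chi2, p) in results.items():
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these assumptions the first two comparisons give p-values below 0.001 and the third gives a p-value of about 0.080, in agreement with the figures reported above.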

Discussion and conclusion
Large research projects offer significant advantages but there is always a problem concerning data collection and processing. It is important to ensure that information is entered into the database consistently and accurately [15,16]. Our study evaluated some methods for controlling data-entering. While modern data-entry technologies have greatly reduced entry errors by use of quality control mechanisms [4], even a small proportion of mistakes can have a great impact on a study's results. Inadvertent random and systemic errors introduced into datasets and their manipulation are well-defined sources of bias in the statistical evaluation of clinical trials. Recently, Marks suggested the elimination of paper from clinical data capture and the use of computers from the start in order to maximize data-reliability [14]. However, elimination of hard copies is usually not possible, and many efforts have therefore been made to reduce data-entering mistakes.
Besides electronic control in data-entering, the consequence of double data entry compared to single entry has been investigated in a double-blind setting, but data entry error rates were not significantly reduced [13]. This result may be explained by the fact that a single data-manager may reduce the inter data-manager bias and, since errors are systematic, they will be more easily identified than in a double data entry setting. The use of a single data-manager is also important from an economic standpoint, since the cost of a single data-manager was notably lower than that of a double-blind control system with double data entry [13].
For all the above reasons, our study was performed by a single data-manager and presents the novelty of testing the impact on data-entry mistakes not only of a double-blind control but also of sequential (by phase) educational sessions. While it was hypothesized that these high-quality controls might reduce the rate of insertion mistakes, our study showed that this combined approach did not seem to be effective, and its use is therefore not recommended. Not only was there no improvement in data-entry reliability, but the double-blind control sessions were also associated with interruptions in the workflow of the data-manager (time and working-hours lost), useless employment of personnel and waste of resources, and consequently increased expenditures. These results might be partially explained by the fact that well-trained and well-monitored data entry staff are not the weakest link in the data management chain [17].
Our study also suggests that the position in the insertion field plays a very important role in the proportion of mistakes. The last positions are associated with more mistakes than the initial ones, especially when numeric fields are considered. This has been attributed to the fatigue of the data-manager when questionnaires have too many entries. These results therefore suggest that to create more effective questionnaires the most important information should be collected in the first fields, the number of insertion-fields per insertion-sheet should be reduced and combo-boxes or text-boxes should be used instead of fields with direct numerical insertion (especially in the last part of the questionnaire).
Furthermore, we found that electronic controls for insertion mistakes are more effective than manual searching of randomly selected medical charts: electronic search is far simpler; it is associated with lower time loss and reduced need of personnel. Its use is therefore recommended in quality-control for data-storing processes.
One limitation of this study is that it was based on a single data manager, thus it is difficult to generalize our conclusions. However, it should be remembered that the decision to use a single data-manager was introduced to improve data-entry reliability by reducing inter data-manager bias [13]. Keeping in mind these limitations, we nevertheless believe that our conclusions are useful and may help guide data-management decisions and improve data-entering reliability.
For random selection of questionnaires we evaluated the number of mistakes encountered vs. the number of questionnaires randomized for each field controlled.