- Research article
- Open Access
Double–blind control of the data manager doesn't have any impact on data entry reliability and should be considered as an avoidable cost
BMC Medical Research Methodology volume 8, Article number: 66 (2008)
Database systems have been developed to store data from large medical trials and survey studies. However, a reliable data storage system does not guarantee data entering reliability.
We aimed to evaluate if double-blind control of the data manager might have any effect on data-reliability. Our secondary aim was to assess the influence of the inserting position in the insertion-sheet on data-entry accuracy and the effectiveness of electronic controls in identifying data-entering mistakes.
A cross-sectional survey and single data-manager data entry.
Data from PACMeR_02 survey, which had been conducted within a framework of the SESy-Europe project (PACMeR_01.4), were used as substrate for this study. We analyzed the electronic storage of 6446 medical charts. We structured data insertion in four sequential phases. After each phase, the data stored in the database were tested in order to detect unreliable entries through both computerized and manual random control. Control was provided in a double blind fashion.
Double-blind control of the data manager didn't improve data entry reliability. Entries near the end of the insertion sheet were correlated with a larger number of mistakes. Data entry monitoring by electronic-control was statistically more effective than hand-searching of randomly selected medical records.
Double-blind control of the data manager should be considered an avoidable cost. Electronic-control for monitoring of data-entry reliability is suggested.
Large survey studies are important for public health policy making and to improve the effectiveness of interventions. Database systems and electronic networks have been developed to render surveys more manageable by providing data storing and analysis [1, 2]. Data standardization and accuracy, as well as secure storage are of particular importance in multi-center studies. However, the availability of reliable electronic systems is not enough to guarantee the validity of population-based cross-sectional studies. Indeed, the relevance of a medical survey is largely dependent on two main steps: the quality of data collection in the medical-charts and the fidelity of data transferring from the charts to the electronic system. Any weakness in these two stages will invalidate the study [3–7].
The present study focuses on data-entry reliability. Many techniques, such as combo-boxes, filters that prevent fields from logically contradicting other values, and the involvement of specialized data-managers or of a single data-manager, have been successfully introduced to reduce transcription mistakes. However, the process of data entry can still pose a problem for data reliability.
In the present study (SESy-Europe project), conducted within the framework of a nationwide Hellenic survey of cancer screening assessment, we set out to evaluate whether double-blind control of the inserted data has a clear effect on data management, thus reducing mistakes during data entry. Furthermore, we evaluated whether the position of a field in the insertion-sheet has any impact on the occurrence of mistakes. Finally, we investigated whether electronic identification of high-risk insertions is more sensitive than random control of the questionnaires in identifying data-entry mistakes.
This study is part of the Screening Evaluation System Europe (SESy-Europe) project, also known as the PACMeR_01.04 project because it is organized by the Panhellenic Association for Continual Medical Research. The SESy-Europe project is a multinational study involving fourteen centres in ten European nations, aimed at developing a multilanguage database able to bridge European countries in cancer screening monitoring policy.
In this study, the SESy-Europe project used data from the medical charts (questionnaires) of a Greek survey that aimed to evaluate Hellenic cancer preventive and screening practices (PACMeR_02 study). Details of the PACMeR_02 study have already been reported [8, 9].
The project was ethically approved by PACMeR's Scientific Committee (protocol number 08_020720) and conformed to the ethical guidelines of the 1975 Declaration of Helsinki.
Data coming from 6446 medical charts (3462 female, 2984 male) and their electronic storing constituted the substrate of the analyses.
Data entering and database
Data storage was provided by the SESy_Europe Database [10, 11]. Although the database has been tested for data safety of insertion under multi-centre data management, in this part of the study all data were inserted by a single data-manager. This has been reported to reduce inter-data-manager errors and to facilitate analyses by avoiding data-manager-related bias.
Study design and blinding
Data insertion was conducted in four chronologically sequential phases. Each phase consisted of three stages: 1) data entry, 2) control applied to the inserted data, 3) correction of mistakes.
During phase I, the first 325,773 insertions were recorded and controlled. A further 151,734 insertions were recorded and controlled in phase II, followed by 145,401 and 107,286 insertions in phases III and IV, respectively.
The data manager could not progress to the next phase of data entry until all procedures of the previous phase (stages 1, 2 and 3) had been concluded. Details of each stage are provided below:
First stage (data entering)
All data coming from a defined number of medical charts were recorded in an established peripheral unit of the database (Nafpaktos, Greece).
Second stage (controls applied to inserted data)
The recorded data were electronically sent to the Central unit of the database (Ioannina, Greece) and then transferred to an external commission for electronic control (Milan, Italy). At the same time the registered medical charts were sent to the questionnaires' collection center (Ioannina, Greece) and then to the PACMeR archive (Lixouri Hospital, Greece). Neither the data manager operating in the peripheral unit (Nafpaktos), nor the control units (Milan and Lixouri) were aware of each other, thus assuring that the study was blind.
Data that entered the data-base underwent the following two analyses:
A. Computerized controls for possible unreliable data (by electronic filters e.g.: height < 140 or > 195 cm, weight < 40 or > 120 kg, age at first parturition < 18 or > 40, BMI < 17 or > 41 etc.), [Milan]. [see additional file 1]
B. Random controls of 200 medical records (randomization by table of random numbers), [Ioannina]
We defined as potential mistakes all medical records flagged either by computerized controls (A) or by random selection (B). Potential mistakes triggered hand-searching in hard copies to validate the correspondence between the data contained in the medical records and those in the database. Non-corresponding data were considered real mistakes. Conversely corresponding data were identified as false positive. Lists of potential and real mistakes were thereafter registered for statistical analyses.
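The two controls and the subsequent validation step can be sketched as follows. This is only a minimal illustration in Python, not the SESy_Europe implementation: the record layout, the field names, the `RANGE_FILTERS` table and all function names are assumptions made for the example. Only the thresholds come from the filter examples listed above, and `random.sample` stands in for the table of random numbers used in the study.

```python
import random

# Plausible ranges from the filter examples in the text.
# Field names are hypothetical; values outside a range are flagged.
RANGE_FILTERS = {
    "height_cm": (140, 195),
    "weight_kg": (40, 120),
    "age_first_parturition": (18, 40),
    "bmi": (17, 41),
}


def electronic_flags(records):
    """Control A: map each record id to the fields that fall outside their range."""
    flagged = {}
    for rec_id, fields in records.items():
        out_of_range = [
            name
            for name, (low, high) in RANGE_FILTERS.items()
            if fields.get(name) is not None and not (low <= fields[name] <= high)
        ]
        if out_of_range:
            flagged[rec_id] = out_of_range
    return flagged


def random_selection(records, k=200, seed=None):
    """Control B: draw k record ids at random (pseudo-random stand-in)."""
    rng = random.Random(seed)
    ids = sorted(records)
    return rng.sample(ids, min(k, len(ids)))


def validate(record_ids, database, charts):
    """Hand-search step: compare stored entries with the paper charts.

    Returns (real_mistakes, false_positives) as lists of record ids.
    """
    real, false_pos = [], []
    for rec_id in record_ids:
        if database[rec_id] != charts[rec_id]:
            real.append(rec_id)       # database and chart disagree
        else:
            false_pos.append(rec_id)  # flagged, but correctly entered
    return real, false_pos


# Toy data (invented for illustration, not study data).
charts = {
    1: {"height_cm": 171, "weight_kg": 70},
    2: {"height_cm": 165, "weight_kg": 58},
    3: {"height_cm": 199, "weight_kg": 95},
}
database = {
    1: {"height_cm": 117, "weight_kg": 70},  # digits transposed at entry
    2: {"height_cm": 165, "weight_kg": 58},
    3: {"height_cm": 199, "weight_kg": 95},  # genuinely tall person
}

flags = electronic_flags(database)
real, false_pos = validate(sorted(flags), database, charts)
```

In this toy example, records 1 and 3 are potential mistakes; after comparison with the charts, record 1 is a real mistake and record 3 is a false positive.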
Third stage (corrections of mistakes)
A dedicated operator went to the peripheral unit to present the list of real mistakes to the data manager and discuss the related insertions. The same operator was crucial to assure that the data manager in the peripheral unit could not progress to new insertions, until all the real mistakes registered during controls for the previous phase had been corrected and discussed. The operator was instructed to change the data-base ID code of the peripheral unit prior to any new phase of the study for that purpose. The ID code identifies the peripheral unit and the phase of insertion for each electronic record.
Considering that the position of an entry in the insertion-sheet might influence the rate of mistakes (e.g., data-entry errors in the last insertion fields of a long insertion-sheet), we recorded the proportion of real insertion mistakes at the beginning and at the end of the insertion sheet. To this end, the parameters age and weight, at the 4th and 5th insertion positions respectively, were compared with the parameters age at marriage and age at first sexual intercourse, at insertion positions 114 and 115 respectively.
We set out:
1. To estimate whether double-blind control of the inserted data and the subsequent corrections have any effect on the data-manager, reducing mistakes during successive phases of data entry.
2. To investigate whether the position in the insertion sheet has any impact on the occurrence of mistakes during data entry.
3. To examine differences in sensitivity for detection of data-entering mistakes by comparing the results obtained analyzing randomly selected insertion sheets against those identified by computerized filters for unreliable data.
Analyses were performed in Intercooled Stata 8.2 (Stata Corp, College Station TX, USA) using chi-square, Pearson chi-square and the metareg module. Unless otherwise specified, all statistical tests are two-tailed and statistical significance is set at p < 0.05.
Population and insertions
PACMeR_02 surveyed 6446 individuals (2984 males, 3462 females); a total of 730,194 insertions were registered in the central table of the database (362,604 for females and 260,304 for males). The exact numbers of insertions per phase and for all analyzed fields are reported in Table 1.
The number of "potential mistakes" identified by electronic controls (for each parameter analyzed in each phase) and the number of "real mistakes" found when the "potential mistakes" were hand-checked against the medical charts are reported in Table 2.
Effect of double blind control on data manager
Double-blind control and correction of mistakes were not found to have any benefit for data-entry reliability. The proportion of mistakes did not differ significantly across the four phases (p = 0.66). On the contrary, meta-regression analysis by phase showed a trend towards a higher risk of mistakes, increasing by 1.07 at each successive phase, but this too was far from statistically significant (p = 0.27). These results were confirmed when we calculated the risk ratio for data-entry mistakes in each phase relative to phase I (RR = 1.0): phase II RR = 1.082, p = 0.74; phase III RR = 1.059, p = 0.76; phase IV RR = 1.277, p = 0.21.
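For readers unfamiliar with the measure, a risk ratio of this kind can be computed as in the sketch below. It is an illustration only: the function name is hypothetical, the confidence interval uses the standard log-scale Wald approximation (an assumption, since the text does not state how intervals were derived), and the counts are placeholders, not the per-phase figures of Tables 1 and 2.

```python
import math


def risk_ratio(mistakes_ref, n_ref, mistakes_cmp, n_cmp, z=1.96):
    """Risk ratio of a comparison phase versus a reference phase.

    Returns (rr, ci_low, ci_high) using a Wald interval on the log scale.
    Both mistake counts must be greater than zero.
    """
    risk_ref = mistakes_ref / n_ref
    risk_cmp = mistakes_cmp / n_cmp
    rr = risk_cmp / risk_ref
    # Standard error of log(RR) for two independent proportions.
    se = math.sqrt(1 / mistakes_cmp - 1 / n_cmp + 1 / mistakes_ref - 1 / n_ref)
    ci_low = math.exp(math.log(rr) - z * se)
    ci_high = math.exp(math.log(rr) + z * se)
    return rr, ci_low, ci_high


# PLACEHOLDER counts, not study data: 50 mistakes in 10,000 insertions
# (reference phase) versus 55 in 10,000 (comparison phase).
rr, ci_low, ci_high = risk_ratio(50, 10_000, 55, 10_000)
```

With these placeholder counts the interval spans 1, the pattern that corresponds to a non-significant risk ratio.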
Position in the insertion-sheet
We found that the position of a parameter in the insertion sheet plays a major role in the occurrence of real mistakes, with the last insertions being statistically associated with a higher rate of mistakes than those at the beginning. This was evident during each phase of the study and for either type of control considered (electronic or random selection). The proportion of mistakes observed in the last insertion fields was notably lower for combo-boxes than for numerical values (Table 3).
Random vs. electronic check
When electronic control was compared with the random selection of questionnaires, it was found to be statistically more effective in detecting real mistakes in two of the three parameters analyzed: "age" (1/800 vs. 11/416, p < 0.001) and "number of children" (9/800 vs. 12/223, p < 0.001). The filter used for "age at marriage" produced a large number of false positives and displayed a positive trend that did not reach statistical significance (15/424 vs. 5/336, p = 0.080).
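The comparisons above can be checked with a short calculation. The sketch below assumes an uncorrected Pearson chi-square test on a 2x2 table with one degree of freedom; the text names the test but does not say whether a continuity correction was applied, so this is an assumption, and the function name is illustrative. The counts are the proportions quoted above, taken in the order in which they appear.

```python
import math


def pearson_chi2_2x2(events_a, total_a, events_b, total_b):
    """Uncorrected Pearson chi-square for two proportions (1 degree of freedom).

    Returns (chi2, p_value).
    """
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Proportions as quoted in the text: (events, total, events, total).
comparisons = {
    "age": (1, 800, 11, 416),
    "number of children": (9, 800, 12, 223),
    "age at marriage": (15, 424, 5, 336),
}

results = {
    name: pearson_chi2_2x2(*counts) for name, counts in comparisons.items()
}
```

Under these assumptions the first two comparisons give p values below 0.001 and the third gives a p value close to 0.08, in line with the figures reported above.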
Discussion and conclusion
Large research projects offer significant advantages, but data collection and processing always pose a problem. It is important to ensure that information is entered into the database consistently and accurately [15, 16]. Our study evaluated some methods for controlling data entry. While modern data-entry technologies have greatly reduced entry errors through quality control mechanisms, even a small proportion of mistakes can have a great impact on a study's results. Inadvertent random and systematic errors introduced into datasets and their manipulation are well-defined sources of bias in the statistical evaluation of clinical trials. Recently, Marks suggested eliminating paper from clinical data capture and using computers from the start in order to maximize data reliability. However, eliminating hard copies is usually not possible, so many efforts have been made to reduce data-entry mistakes.
Besides electronic control of data entry, the effect of double data entry compared with single entry has been investigated in a double-blind setting, but data entry error rates were not significantly reduced. This result may be explained by the fact that a single data-manager reduces inter-data-manager bias and, since that manager's errors are systematic, they are more easily identified than in a double data entry setting. The use of a single data-manager is also important from an economic standpoint, since the cost of a single data-manager is notably lower than that of a double-blind control system with double data entry.
For all the above reasons, our study was performed with a single data-manager. Its novelty is that it tests not only the impact of double-blind control on data-entry mistakes but also that of sequential (by phase) educational sessions. While it was hypothesized that these high-quality controls might reduce the rate of insertion mistakes, our study showed that this combined approach did not seem to be effective, and its use is therefore not recommended. Not only was there no improvement in data-entry reliability, but the double-blind control sessions were also associated with interruptions in the workflow of the data-manager (lost time and working hours), unnecessary employment of personnel, waste of resources and, consequently, increased expenditure. These results might be partially explained by the fact that well-trained and well-monitored data entry staff are not the weakest link in the data management chain.
Our study also suggests that the position in the insertion field plays a very important role in the proportion of mistakes. The last positions are associated with more mistakes than the initial ones, especially when numeric fields are considered. This has been attributed to the fatigue of the data-manager when questionnaires have too many entries. These results therefore suggest that to create more effective questionnaires the most important information should be collected in the first fields, the number of insertion-fields per insertion-sheet should be reduced and combo-boxes or text-boxes should be used instead of fields with direct numerical insertion (especially in the last part of the questionnaire).
Furthermore, we found that electronic controls for insertion mistakes are more effective than manual searching of randomly selected medical charts: electronic search is far simpler; it is associated with lower time loss and reduced need of personnel. Its use is therefore recommended in quality-control for data-storing processes.
One limitation of this study is that it was based on a single data-manager, so it is difficult to generalize our conclusions. However, it should be remembered that the decision to use a single data-manager was made to improve data-entry reliability by reducing inter-data-manager bias. Keeping this limitation in mind, we nevertheless believe that our conclusions are useful and may help guide data-management decisions and improve data-entry reliability.
SESy_Europe task force: Francisco Javier Rivas Flores (San Rafael Hospital, Madrid -Spain-); Hilal Altinoz (SSK Sureyyapasa, Thoracic Disease Center, Istanbul -Turkey-); Marzanna Chojnacka (Maria Sklodowska-Curie Memorial Cancer Center, Warsaw -Poland-); Irini Karentzou (University school of Medicine, Cologne -Germany-); Camelia Colichi (Institute of Oncology, Bucharest -Romania-); Tamara Oxiuzova and Eleni Kanavoura (University school of medicine, Ioannina -Greece-); Berta Adelaide Maia da Silva Alves de Sousa (Portuguese Oncology Institute IPOPFG-EPE, Porto -Portugal-); Diana Ivanova (PACMeR, Athens -Greece-); Mario Dambrosio (Multimedica Hospital, Milan -Italy-).
Lee N, Millman A, Osborne M, Cox J: ABC of medical computing. Storing and managing data on a computer. BMJ. 1995, 311 (7004): 562-565.
Millman A, Lee N, Brooke A: ABC of medical computing. Computers in general practice – I. BMJ. 1995, 311 (7008): 800-802.
Patel PP: Data validation. Clinical data management. Edited by: Rondel RK, Varley SA, Webb CF. 2000, West Sussex: Wiley and Sons
Mullooly JP: The effects of data entry error: an analysis of partial verification. Comput Biomed Res. 1990, 23: 259-267. 10.1016/0010-4809(90)90020-D.
Levitt SH, Aeppli DM, Potish RA, Lee CK, Nierengarten ME: Influences on inferences: effect of errors in data on statistical evaluation. Cancer. 1993, 72: 2075-2082. 10.1002/1097-0142(19931001)72:7<2075::AID-CNCR2820720704>3.0.CO;2-#.
Arndt S, Tyrrell G, Woolson RF, Flaum M, Andreasen NC: Effects of errors in a multicenter medical study: preventing misinterpreted data. J Psychiatr Res. 1994, 28: 447-459. 10.1016/0022-3956(94)90003-5.
Crombie IK, Irving JM: An investigation of data entry methods with a personal computer. Comput Biomed Res. 1986, 19: 543-550. 10.1016/0010-4809(86)90028-5.
Kamposioras K, Casazza G, Mauri D, Velisarios Lakiotis V, Cortinovis I, Xilomenos A, Peponi C, Golfinopoulos V, Milousis A, Kakaridis D, Zacharias G, Karathanasi I, Ferentinos G, Proiskos A: Screening chest radiography: results from a Greek cross-sectional survey. BMC Public Health. 2006, 29: 113. 10.1186/1471-2458-6-113.
Kamposioras K, Mauri D, Golfinopoulos V, Ferentinos G, Zacharias G, Xilomenos A, Polyzos NP, Bristianou M, Chasioti D, Milousis A, Vittoraki A, Koukourakis G, Chatziioannou I, Papadopoulos P: Colorectal cancer screening coverage in Greece. PACMeR 02.01 study collaboration. Int J Colorectal Dis. 2007, 22: 475-81. 10.1007/s00384-006-0186-6.
Mauri D, Kamposioras K, Polyzos NP, Rivas Flores FJ, Altinoz H, Chojnacka M, Karentzou I, Dambrosio M, Colichi C, Oxiuzova T, Kanavoura E, da Silva Alves de Sousa BA, Ivanova D, Mauri J, Karampoiki V, Maragkaki A, Xilomenos A: Rethinking anticancer screening strategies saving lives at front line. Results from SESy_Europe task force. Exp Oncol. 2006, 28 (3): 252-3.
Mauri J, Mauri D, Pazarlis P, Altinoz H, Rivas Flores FJ, Karentzou I, Proiskos A, Lakiotis V, Alevizaki P, Terzoudi A, Dambrosio M, Spiliopoulou A, Alexandropoulou P, Kalogerakis D, Varsami A: PC 3–component database for community-based medical trials. A cost–effective solution both for voluntary associations and for institutions of the "Emerging World". Gazz Med Ital – Arch Sci Med. 2004, 163: 189-194.
Mauri D, Pazarlis P, Mauri J, Altinoz H, Rivas Flores FJ, Karentzou I, Proiskos A, Lakiotis V, Maragkaki A, Terzoudi E, Dambrosio EM, Spiliopoulou A, Varsami A, Alexandropoulou P, Tolis C, Pavlidis N, Vittoraki A: SESy–Europe: a multi–language database dedicated to cancer screening monitoring. J Exp Clin Cancer Res. 2004, 23: 441-445.
Reynolds-Haertle RA, McBride R: Single vs. double data entry in CAST. Control Clin Trials. 1992, 13: 487-494. 10.1016/0197-2456(92)90205-E.
Marks RG: Validating electronic source data in clinical trials. Control Clin Trials. 2004, 25: 437-446. 10.1016/j.cct.2004.07.001.
Los RK, van Ginneken AM, Roukema J, Moll HA, Lei van der J: Why are structured data different? Relating differences in data representation to the rationale of OpenSDE. Med Inform Internet Med. 2005, 30 (4): 267-76.
de Lusignan S: The barriers to clinical coding in general practice: a literature review. Med Inform Internet Med. 2005, 30 (2): 89-97. 10.1080/14639230500298651.
Day S, Fayers P, Harvey D: Double data entry: what value, what price?. Control Clin Trials. 1998, 19 (1): 15-24. 10.1016/S0197-2456(97)00096-2.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/8/66/prepub
The authors declare that they have no competing interests.
DM conceived of the study, and participated in its design and coordination and drafted the manuscript. VK participated in the study design, data collection and drafted the manuscript. JM and GF participated in the study design and were the responsible for the electronic-controls and statistical analyses. KK and GA were responsible for the hand-searching (manual controls). LT participated in study design and coordination and drafted the manuscript. IK participated in the study design and questionnaires collection. CP participated as single data-manager. All authors read and approved the final manuscript.
Mauri, D., Karampoiki, V., Mauri, J. et al. Double–blind control of the data manager doesn't have any impact on data entry reliability and should be considered as an avoidable cost. BMC Med Res Methodol 8, 66 (2008). https://doi.org/10.1186/1471-2288-8-66
- Medical Chart
- Electronic Control
- Double Data Entry
- Potential Mistake
- Thoracic Disease