Crowdsourcing citation-screening in a mixed-studies systematic review: a feasibility study

Background
Crowdsourcing engages the help of large numbers of people in tasks, activities or projects, usually via the internet. One application of crowdsourcing is the screening of citations for inclusion in a systematic review. There is evidence that a ‘Crowd’ of non-specialists can reliably identify quantitative studies, such as randomized controlled trials, through the assessment of study titles and abstracts. In this feasibility study, we investigated crowd performance of an online, topic-based citation-screening task, assessing titles and abstracts for inclusion in a single mixed-studies systematic review.

Methods
This study was embedded within a mixed studies systematic review of maternity care, exploring the effects of training healthcare professionals in intrapartum cardiotocography. Citation-screening was undertaken via Cochrane Crowd, an online citizen science platform enabling volunteers to contribute to a range of tasks identifying evidence in health and healthcare. Contributors were recruited from users registered with Cochrane Crowd. Following completion of task-specific online training, the crowd and the review team independently screened 9546 titles and abstracts. The screening task was subsequently repeated with a new crowd following minor changes to the crowd agreement algorithm based on findings from the first screening task. We assessed the crowd decisions against the review team categorizations (the ‘gold standard’), measuring sensitivity, specificity, time and task engagement.

Results
Seventy-eight crowd contributors completed the first screening task. Sensitivity (the crowd’s ability to correctly identify studies included within the review) was 84% (N = 42/50), and specificity (the crowd’s ability to correctly identify excluded studies) was 99% (N = 9373/9493). Task completion was 33 h for the crowd and 410 h for the review team; mean time to classify each record was 6.06 s for each crowd participant and 3.96 s for review team members. Replicating this task with 85 new contributors and an altered agreement algorithm found 94% sensitivity (N = 48/50) and 98% specificity (N = 9348/9493). Contributors reported positive experiences of the task.

Conclusion
It might be feasible to recruit and train a crowd to accurately perform topic-based citation-screening for mixed studies systematic reviews, though resource expended on the necessary customised training required should be factored in. In the face of long review production times, crowd screening may enable a more time-efficient conduct of reviews, with minimal reduction of citation-screening accuracy, but further research is needed.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12874-021-01271-4.
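As an illustration of how the sensitivity and specificity figures above are derived, the following minimal sketch recomputes the first screening task's values from the counts reported in the Results. The function name `screening_accuracy` and its parameter names are hypothetical and chosen only for this example; the false-negative and false-positive counts are simply the reported denominators minus the reported numerators.

```python
def screening_accuracy(true_pos: int, false_neg: int,
                       true_neg: int, false_pos: int) -> dict:
    """Return sensitivity and specificity as percentages.

    Hypothetical helper for illustration; the 'gold standard' is the
    review team's categorization of each record.
    """
    # Sensitivity: share of included studies the crowd correctly kept.
    sensitivity = 100 * true_pos / (true_pos + false_neg)
    # Specificity: share of excluded studies the crowd correctly rejected.
    specificity = 100 * true_neg / (true_neg + false_pos)
    return {"sensitivity": sensitivity, "specificity": specificity}


# First screening task, using the counts reported in the abstract:
# 42 of 50 included studies identified, 9373 of 9493 excluded studies identified.
first_task = screening_accuracy(
    true_pos=42, false_neg=50 - 42,
    true_neg=9373, false_pos=9493 - 9373,
)

print(f"Sensitivity: {first_task['sensitivity']:.0f}%")
print(f"Specificity: {first_task['specificity']:.0f}%")
```

Rounded to whole percentages, these counts reproduce the 84% sensitivity and 99% specificity reported for the first task.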

It's a little bit of a mouthful, but then review titles often are, as they try to convey the key elements of the topic that the review is going to cover.
That isn't to say the rest are not important or relevant, but for this task, which seeks to make a judgment on records at title and abstract level, and not mistakenly throw anything out that might be relevant, it's safest to keep the inclusion criteria quite broad.

For this task you'll be presented with records that will look a bit like this:

Health professional training for cardiotocography interpretation and management
They will all have a title, and the vast majority will have an abstract as well.
Classify records as Not relevant if they are not about training or cardiotocography.
When you aren't sure, simply select Unsure.
It means educational activities aimed at healthcare professionals that will include one of the following:
• Improve/refresh existing knowledge
• Learn new skills
• Change behaviours/attitudes of health professionals
The delivery of educational activities could take a wide range of formats such as lectures, workshops, online learning courses, or courses aimed at training whole teams.
Cardiotocography monitoring is a type of foetal heart rate monitoring. It is used around the world, and most often for high-risk pregnancies. There are other types of foetal heart rate monitoring but we are specifically interested in cardiotocography.
This type of monitoring measures both the foetus's heart rate and the woman's uterine contractions.
Here we were given only a title to work with. It can be harder to make a firm decision from a title. If you said either Possibly relevant or Unsure, then you did well. The title is promising as it appears to be about teaching nurses how to use foetal monitors. We would want to keep this record and try to get hold of the full article in order to make a final decision on whether it is for inclusion within the review.

Record 9: 19915414 (PMID); 358083229 (Accession number)
The Fetal heart rate collaborative practice project: Situational awareness in electronic fetal monitoring-A Kaiser Permanente perinatal patient safety program initiative
BACKGROUND: Electronic fetal monitoring has historically been interpreted with wide variation between and within disciplines on the obstetric healthcare team. This leads to inconsistent decision making in response to tracing interpretation. PURPOSE: To implement a multidisciplinary electronic fetal monitoring training program, utilizing the best evidence available, enabling standardization of fetal heart rate interpretation to promote patient safety. METHODS: Local multidisciplinary expertise along with an outside consultant collaborated over a series of meetings to create a multimedia instructional electronic fetal monitoring training program. After production was complete, a series of conferences attended by nurses, certified nurse midwives, and physician champions, from each hospital, attended to learn how to facilitate training at their own perinatal units. All healthcare personnel across the Kaiser Permanente perinatal program were trained in NICHD nomenclature, emergency response, interpretation guidelines, and how to create local collaborative practice agreements. Metrics for program effectiveness were measured through program evaluations from attendees, the Safety Attitudes Questionnaire. RESULTS: Program evaluations rendered very positive scores from both physicians and clinicians. Comparing baseline to 4 years later, the perception of safety from the staff has increased over 10% in 5 out of the 6 factors analyzed. SUMMARY: Active participation from all disciplines in this training series has highlighted the importance of teamwork and communication.