This study found that most PLSs (80%) did not end with conclusive messages about the studied interventions. Linguistic analysis found that the PLSs were not engaging for readers and were written in a mostly formal and “cold” style. The average readability level of Cochrane PLSs was slightly above the level recommended for lay audiences in terms of reading age and education, indicating that PLSs may be difficult to read for persons without medical education.
Furthermore, recent PLSs were associated with a higher number of authors and words, higher clout and authenticity tone, and lower SMOG index, indicating they were easier to read. Our results indicate there are some improvements over the years that increase the readability of PLSs. However, further effort is needed to produce PLSs that will be more informative and more readable for the lay audience.
Over the years, the frequency of “non-conclusive” conclusions remained similar. More than half of the PLSs did not provide any opinion about the investigated intervention in their conclusions, or provided an unclear conclusion, thus depriving readers of a final message about the efficacy and safety of an intervention. The aim of Cochrane PLSs is to create health information that patients can understand and use. Readers of PLSs likely seek a simple, clear and conclusive answer to their medical questions. Systematic reviews have been criticized for not providing specific guidance, and for often concluding instead that there is little evidence to answer the question. However, we need to distinguish PLSs that concluded there is “no evidence” from those that provided no opinion or an unclear conclusion. Lack of conclusiveness should not be considered a weakness of a systematic review or its PLS. If the PLS accurately reflects information from a systematic review, the PLS should not be judged as good or bad based on its conclusiveness. By contrast, if a PLS does not contain a clear concluding message, or has no final opinion at all about the studied intervention, it should be considered a poorly written PLS.
A higher proportion of non-conclusive results might also be due to a more reliable and critical approach to research practices and reporting. This may particularly apply to the methodology of Cochrane reviews, as they are considered higher-quality reviews. Understanding inconclusiveness from the perspective of a lay person requires an awareness that methodologically sound systematic reviews are frequently inconclusive, a knowledge that comes with a certain level of scientific and health literacy.
When we compared word counts between the inconclusive and conclusive categories, PLSs with vague conclusions were associated with a higher number of words. Furthermore, PLSs categorized as “unclear” in terms of conclusiveness had the highest average word count, implying that when findings are conflicting or debatable, summaries may need more words to explain the broader, inconsistent reasoning that led the authors to an inexplicit judgment. In general, as in the previous study, we found that Cochrane PLSs were on average shorter than recommended by Cochrane, but we included additional variables and comparisons.
PLSs were found to have a relatively high number of words associated with analytical tone and clout, while levels of emotional tone were low, which is in line with the results of the previous, smaller-scale study. High levels of analytical tone suggest formal, logical, and hierarchical thinking, and these characteristics comply with the recommended form and structure of PLSs [7, 9]. Moreover, the authors of PLSs are mostly scientists, trained to think and write in a formal and logical style, and this style remains noticeable when they write for diverse readers. This phenomenon was also detected in studies with students who were already trained to write formally. When they were asked to write in a less formal science style, their texts revealed a higher LIWC analytic score. Therefore, the higher analytical tone detected in PLSs is likely due to authors who are trained to write formal and logical text structures.
Clout, a linguistic characteristic that implies confidence and expertise of the writer, was relatively high [19, 29]. The higher clout found in PLSs is consistent with the view that the authors of PLSs are experts on the specific topic they write about. The lower authenticity scores found in PLSs may be associated with a more guarded, distanced form of discourse. Comparing the categories of conclusiveness, we found that authenticity was slightly higher in vague, inconclusive summaries. As higher authenticity suggests honest, personal and disclosing characteristics of the text, perhaps authors of summaries with higher authenticity did not want to overstate the efficacy of the described interventions, which resulted in summaries without definite conclusions. Furthermore, the “no evidence” conclusiveness category, which had the lowest authenticity scores, also used the smallest proportion of first-person pronouns, since the authors of those PLSs explicitly stated that their search did not yield eligible studies or RCTs, and thus they could not give personal recommendations.
In general, PLSs had low levels of emotional tone, a LIWC summary variable for which low scores are associated with negative emotional tone, such as anxiety, sadness, and hostility. We assume that the negative emotional tone is caused by the presence of words related to negative emotions, such as pain or disease. We noticed that texts from the Public Health and Health Systems (PH&HS) network had a slightly higher emotional tone than texts from the other Cochrane networks. PLSs from the PH&HS network were associated with negative emotional tone to a lesser extent than those from other networks, probably due to the network's different thematic scope, which ranges from occupational health to global health and interventions related to consumers and communication. Even so, their emotional tone was still far from the neutral, middle level. PH&HS, together with the Children and Families network, also had the highest levels of clout and authenticity, making those two networks the most engaging ones. Future research should examine the reason for this difference, perhaps by using a text mining approach. We also found that Cochrane networks differ in the number of words per review, with the Cancer network and the Musculoskeletal, Oral, Skin and Sensory network having the highest number of words per review.
Most of the analyzed Cochrane PLSs had relatively high readability scores, which may not affect comprehension among journalists, professionals, or audiences with higher education. Yet, high readability scores indicate that PLSs could be difficult to read for the lay population without a university education or specific medical education or training. The readability score was similar across Cochrane Review Networks, as well as across different conclusion categories. In line with the studies that previously analyzed the readability of PLSs, we recommend that PLS authors use readability calculators as a tool in the process of writing PLSs. While we acknowledge the considerable skills and time necessary to write a high-quality PLS, authors could still simplify the language as much as possible. Difficulties with reading PLSs may prevent the public and patients from obtaining the best evidence from research, subsequently hindering the process of making informed decisions.
Our finding of an association between the year of publication and the number of authors is in line with the recent study that found an increasing number of authors in Cochrane reviews. Moreover, the association we detected between the number of authors and the number of words per PLS was consistent with results in other fields. In addition, PLSs with a higher word count showed slightly lower readability scores. One possible explanation is that each author contributes additional text, resulting in longer PLSs, and that longer texts allow explanations that use a higher number of simple, common, everyday words.
We also found that the PLSs written before the PLEACS standards were introduced differed in several dimensions from PLSs published after the PLEACS, although those differences were small. For this analysis we set the year 2014 as a cut-off since PLEACS were published in 2013 [7, 8]. After the introduction of PLEACS, the PLSs had lower readability scores and a slightly higher number of words related to analytic, authentic and emotional tone. However, those differences cannot be considered as causally associated with PLEACS, because we do not know if the writers of the PLSs were following PLEACS. A previous analysis on a large sample showed that PLSs rarely followed PLEACS. Further guidance about writing PLSs, a supplement to the PLEACS, was published in 2019 after our search date, and thus could not influence our results.
Future guidance for writing PLSs should include advice for authors on writing clear conclusions and on using tools that will improve the linguistic characteristics and readability of those summaries. To resolve possible doubts or misunderstandings for lay readers who come across inconclusive PLSs, authors of these PLSs could refer to one of the categories we used for conclusiveness. Authors can explicitly declare that the specific systematic review may not provide a clear answer regarding the analyzed intervention, and recommend further engagement for the lay reader. This engagement could include following future research, or consulting medical specialists for a personalized approach in cases where the reader of the PLS is a patient seeking interventions for a specific health issue.
The findings of our study should be interpreted in view of several limitations. Conclusiveness was evaluated by two authors who independently read the reviews and made judgments about the category of conclusiveness for each review. These categorizations could be considered subjective; however, we did our best to use methods that are associated with minimization of bias. We used a pilot assessment, calibration exercise and consensus with the rest of the authors.
Our sample was large (N = 4360), as this study analyzed all PLSs published until February 2019 that were available to the research team in early autumn 2019, when the analysis was initiated. However, PLSs published after February 2019 might have different characteristics. Therefore, the results of our study cannot be generalized to PLSs published outside of the time frame covered in this study.
The SMOG readability formula was chosen among the available readability formulas because it is recommended and best suited for health care applications. Although a higher readability score indicates that the analyzed text could be difficult to read and consequently difficult to understand, we cannot automatically interpret texts with lower readability scores as more comprehensible to readers. The SMOG does not directly measure specific education needs in a deterministic way and does not take specific reader characteristics into account. Still, the assumption is that lower readability scores are a prerequisite for successful comprehension.
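To illustrate what a SMOG score reflects, the following is a minimal sketch of the commonly cited SMOG grade formula, grade = 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291, where polysyllables are words of three or more syllables. It is not the tool used in this study. The function names `count_syllables` and `smog_grade` are hypothetical, and the vowel-group syllable counter is a crude assumption; dedicated readability calculators use more careful syllable counting and sentence splitting, and the formula was originally defined on samples of 30 sentences, so scores for very short texts are only indicative.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" (as in "made") as silent, except in "-le" and "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def smog_grade(text: str) -> float:
    """Estimate the SMOG grade of a text from its polysyllabic word density."""
    # Naive sentence split on terminal punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllables: words with three or more (estimated) syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    plain = "The drug may help. We do not know yet."
    formal = "Randomized evidence regarding intervention efficacy remains inconclusive."
    print(f"plain:  {smog_grade(plain):.2f}")
    print(f"formal: {smog_grade(formal):.2f}")
```

A text with no polysyllabic words receives the formula's minimum value of 3.1291, and the score rises with the density of long words per sentence. This is why replacing technical terms with short, everyday words lowers the score, although, as noted above, a lower score alone does not guarantee comprehension.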