The study was designed to explore the feasibility of conducting online expert panels and to examine the experimental replicability of their findings. We focused specifically on expert participation, consensus development, agreement across panels, and participant experiences. We also investigated the effects of panel size on participation rates and on satisfaction with the ExpertLens process used to conduct the online panels. Our exploratory study shows that online expert panels may be a practical approach to engaging large and diverse groups of stakeholders in finding consensus on key language issues within an evolving field such as QI. It also supports the results of previous research suggesting that virtual panels may expedite the elicitation process, minimize the burden on participants, allow for larger and more diverse panels, and include geographically distributed participants [8, 9].
Overall, CQI stakeholders demonstrated a strong commitment to improving CQI language, and the study participation rate was high: 66% of participants, none of whom received honoraria, engaged in all phases of the online elicitation. This rate compares favorably both to the 45-50% participation rate typically expected in a traditional Delphi study and to the 49% participation rate in a recent online Delphi with just two question phases.
Moreover, our panelists generally expressed positive attitudes towards an online approach, finding the elicitation process interesting, the online system easy to use, and the discussion component helpful for improving their understanding of the issues and clarifying their positions. Average satisfaction scores were typically at or above "agreed slightly" on positively worded satisfaction items.
Although participation levels did not vary significantly across panels of different sizes, the perception of a two-way information exchange, as measured by the post-completion survey questions, was significantly higher in larger panels than in smaller ones. Therefore, the number of invited participants in online consensus panels may need to be higher than in traditional panels to ensure that a critical mass of participants is achieved not only during the question phases but also during the discussion phases. On the one hand, inviting a larger number of panelists may increase a panel's representativeness and allow for exploring differences not only between, but also within, stakeholder groups. On the other hand, our largest panel (n = 40) was still of a size we considered reasonable for engaging a high percentage of panelists in the discussion; a very large number of panelists might have a deleterious effect on discussion participation.
Finally, our study suggests that the online approach can be used to conduct multiple parallel panels to test the reproducibility of panel conclusions. In this study, the level of agreement between panels was fair as measured by four-way kappa [28, 29], and roughly a quarter of all potential features were judged important by all four panels. Comparing results across panels is crucial when evaluating the potential replicability of panel decisions and indicates the degree of confidence that can be placed in their robustness. By the end of Phase III, all four panels agreed on the status of five out of eleven CQI features: by virtue of answering the same questions twice and discussing their perspectives, all four panels came to agree on the importance of three features to the definition of CQI and on the lack of importance of two others. The data feedback and discussion features of the online system appeared to reduce MAD-M values (i.e., increase the level of agreement) between Phase I and Phase III without forcing participants into consensus.
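To make the MAD-M interpretation concrete: assuming MAD-M denotes the mean absolute deviation of panelists' ratings from the panel median (a common disagreement index in expert-panel studies), lower values indicate tighter within-panel agreement. The sketch below uses hypothetical 9-point-scale ratings, not data from this study:

```python
from statistics import median

def mad_m(ratings):
    """Mean absolute deviation from the median (MAD-M) of panel ratings.

    Lower values indicate stronger within-panel agreement.
    """
    m = median(ratings)
    return sum(abs(r - m) for r in ratings) / len(ratings)

# Hypothetical ratings for one feature by the same panel in two phases.
phase1 = [2, 5, 6, 7, 7, 8, 9, 9]   # wider spread of opinions
phase3 = [6, 7, 7, 7, 8, 8, 8, 9]   # ratings converge after discussion

print(mad_m(phase1))  # 1.625
print(mad_m(phase3))  # 0.75
```

A drop in MAD-M between phases, as in this toy example, reflects convergence of individual ratings toward the panel median without requiring that panelists reach full consensus.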
While our study illustrates the feasibility of conducting online expert panels, it nonetheless has some limitations. In terms of panel size, our results reflect only a modest range of sizes; we did not test extremely small or large panels. Furthermore, we do not know how well our sample represented QI researchers versus QI practitioners, because we can only categorize those who actually signed up to participate; however, our Phase I response rate of 77% does not suggest a high level of bias in this regard. Finally, in terms of achieved participation rates and panel results, the findings may primarily reflect the dedication of CQI stakeholders and may not apply to other topics and applications. Previous studies using this online approach, however, also indicate that the process can help obtain input from large, diverse, and geographically dispersed groups of stakeholders seeking to foster exchange and find consensus on often controversial topics and policy questions. Nonetheless, further experimental research is necessary to validate these findings.