This manuscript has explicitly described a new approach to constructing a Q sample. Methodological issues that arose during the process are now discussed, including strategies to: reduce researcher bias; generate a comprehensive concourse; select the Q sample (size and representation, use of a theoretical framework); constitute a Delphi panel (size and membership); define consensus; and resolve language issues.
Reduction of researcher bias
The potential for researcher bias has been acknowledged in both quantitative and qualitative research and various strategies have been suggested to mitigate this risk [40]. Likewise, researcher bias has been identified as a significant challenge in the process of Q sample construction, with critics suggesting that “if reflexivity is not adequately considered, Q sorting has the inherent risk of turning into a Socratic dialogue, wherein Socrates (the researcher) with great certainty obtains the correct responses from Trasymachus (the respondent)” [18]. In other words, researcher bias may result in the selection of statements that solely represent the view that the researcher expects or seeks to find and could therefore produce misleading results. The combination of the three steps used to construct the Q sample in this study was specifically designed to reduce this risk.
Comprehensiveness of the concourse
Although it is common for concourse statements to be derived from existing literature [16], few studies describe in detail an extensive review process involving a wide range of sources. For the current study, a comprehensive review of the literature was undertaken, incorporating both scholarly and grey literature, to ensure that a large concourse was derived from a broad range of sources and to maximise the diversity of opinions sampled. With the exception of theses and professional websites, each document type contributed a relatively similar number of statements (range 4–10) to the final Q sample. This representation was unintentional, and the Delphi panel were unaware of a statement’s specific origin in their decision-making process. It may not have been necessary to conduct a separate search of theses, as the one thesis considered to be most relevant was identified in the scholarly literature search. Professional websites were not particularly useful sources for identifying statements per se, although they identified some relevant linked articles not identified in other searches, from which statements were extracted. Online discussion forums provided many authentic phrases likely to resonate with OTC codeine misusers, representing a potentially underutilised source for obtaining concourse statements for Q studies.
Q sample size and representation
Similar to the way that R methodology is concerned with ensuring that a representative sample of participants is selected from the target population, in Q methodology the statements forming the Q sample should be representative of the concourse [41]. Stephenson suggests the use of Fisher’s variance design [14] as the most formal way to ensure comprehensiveness of the Q sample, with equal numbers of statements selected from each cell of a theoretically informed two-dimensional matrix. Some Q methodologists, however, advocate for a freer, more creative approach focussing on understanding and representing the statement population as a whole [16]. Fisher’s variance design was not used to structure our Q sample as we were not applying a two-dimensional theory suitable for a matrix design and did not want to force selection of statements to fulfil a predefined quota. Instead, concourse sampling was achieved by thematically grouping and reducing the number of statements using the COM-B model as a theoretical framework, with the final selection of statements decided by the Delphi panel.
The recommended Q sample size of 40–80 statements is based on the balance between providing enough statements to be representative of the concourse while not overtaxing participants [16]. While a number of studies have demonstrated that different Q samples drawn from a single concourse produce similar results [42, 43], further research is required to determine the effect, if any, of Q sample size.
Use of a theoretical framework
The COM-B model was used to add rigour to the sampling process by providing an evidence-based structure with previous application to addiction research [29]. It was specifically chosen as it is an overarching model incorporating multiple theories of addiction, rather than being based on a single theory. The objective was to reduce the likelihood of analytic bias in the identification of themes, to base the themes on existing theory and to lessen the possibility of overlooking theoretically important statements. The COM-B domains and headings provided a useful starting point for the initial sorting of statements, particularly since the concourse was large. However, the COM-B is a broad framework and there was significant overlap between themes, with statements often fitting into more than one category, so it was sometimes difficult to decide where to place them. For example, the statements “I use OTC codeine to overcome personal problems” and “I use OTC codeine because circumstances force me to do so” listed in the Opportunity domain under “Cues in the physical and social environment…” could have been placed in “Needs met by the addictive behaviour” in the domain of Motivation. It was also difficult to distinguish between some of the headings, such as “Beliefs about the positive features of the addictive behaviour” and “Pleasure and satisfaction derived from the addictive behaviour”.
The statements could potentially have been grouped more definitively according to the temporal features of addiction, for example by using concepts that describe the addiction life cycle: (1) initial enactment of the behaviour, (2) development of addiction, (3) attempts at recovery or mitigation and (4) relapse [29]. However, this approach may not have adequately represented the multiple theories of addiction, highlighting the importance of careful consideration of the choice and purpose of the theoretical framework. Overall, despite difficulties in allocating statements using the COM-B model, the statements did fit into one or more of the domains and it provided a useful framework to ensure coverage of the major theoretical aspects of addiction.
Delphi panel size and membership
The final decision on the statements to include in this Q sample was achieved using a Delphi technique with a multidisciplinary panel of addiction experts. Use of this technique aimed to reduce researcher bias in the selection of statements, with decisions being made collectively by experienced addiction experts representing a broad range of disciplines. The Delphi panel also helped to validate the content, representativeness and language of the Q sample. Experts also had the opportunity to comment on and contribute statements that they felt could be important to include.
There is no guiding rule about the number of members required for a Delphi panel [31]. The literature suggests that the size of a panel can range from eight to thousands of participants, with samples on the lower end of the range considered to be acceptable for homogeneous panels [31]. Our Delphi panel could be considered to be relatively homogeneous, with all members having specific knowledge about OTC codeine dependence. A small, fifteen-member panel was therefore recruited, which is similar in size to many other health-related Delphi studies [44,45,46,47,48].
Four of the fifteen experts did not complete Round Two. The time delay of two months between rounds may have contributed to this attrition. Although this response rate of 73% exceeds the suggested 70% requirement to ensure rigour of the Delphi technique [49], a more rapid succession of rounds may have retained the interest of participants and improved retention [44].
Although expertise is difficult to assess [31], the choice of ‘experts’ to comprise the Delphi panel is based on the requirement that panel members have “knowledge and experience with the issues under investigation” [50]. We chose to consider addiction specialists as experts for our Delphi panel, rather than OTC codeine misusers. The purpose was to obtain a broad, external view of misuser beliefs and to incorporate knowledge of the theories of addiction in the decision-making process, rather than focussing on the individual perspectives of misusers. This objective was achieved, as mapping the Q sample against the COM-B confirmed that each of the COM-B domains (and therefore the theories of addiction and the overall concourse) was represented. Codeine misusers themselves also verified that they were able to express their opinions using the Q sample in a subsequent phase of the study.
Deriving consensus
There are no universally accepted criteria for measuring consensus in Delphi studies [34, 51,52,53]. Percent agreement, measures of dispersion and stability of responses have each been applied as measures of panel member agreement using a variety of different cut-offs. Delphi studies also quantify the level of agreement with each individual statement. This is usually reported using the median score rather than the mean, both because of the level of measurement used (Likert-type scales are often categorical rather than continuous) and because the results may not follow a normal distribution [34].
An interquartile range of less than or equal to one was chosen as the measure of panel consensus for our study on the basis that “IQR of 1 or less is found to be a suitable consensus indicator for 4- or 5- unit scales” [34]. However, a number of research reports [34, 38, 39] have made this claim based on the precedent of Raskin [54] and Rayens and Hayn [55], who actually use an interquartile deviation (IQD) of ≤1 as their measure of consensus as opposed to IQR ≤ 1. In addition, neither Raskin nor Rayens and Hayn reported use of a 5-point scale. Paradoxically, IQR ≤ 1 is a more stringent requirement for consensus than IQD ≤ 1, as IQD is half the value of IQR. Other researchers [35, 37] have referenced Linstone and Turoff [32] when suggesting an IQR of 1 to be a good indicator of consensus for 5-point Likert scales. However, this primary source only mentions an “IQR no larger than 2 units on a 10 point scale” [32]. Despite these inconsistencies being identified in the literature, the use of IQR ≤ 1, in combination with the pragmatically chosen median cut-off of ≥4, was adopted for the determination of consensus for our study.
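The difference between the IQR and IQD criteria can be illustrated with a minimal sketch. It assumes a 5-point scale, uses invented ratings from eleven hypothetical panellists (not data from this study), and computes quartiles with the "inclusive" convention of Python's `statistics.quantiles`; the quartile convention actually used in the study is not stated here, and other conventions can give slightly different IQR values. The function name `consensus_summary` and its parameters are illustrative only.

```python
from statistics import median, quantiles


def consensus_summary(ratings, iqr_cutoff=1.0, median_cutoff=4.0):
    """Summarise panel ratings for one statement.

    Consensus is taken here as IQR <= iqr_cutoff together with
    median >= median_cutoff, mirroring the rule described in the text.
    The quartile method ("inclusive") is an assumption of this sketch.
    """
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(ratings)
    return {
        "median": med,
        "iqr": iqr,
        "iqd": iqr / 2,  # interquartile deviation is half the IQR
        "consensus": iqr <= iqr_cutoff and med >= median_cutoff,
    }


# Invented ratings for two hypothetical statements (not study data).
tight = [4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5]
spread = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5]

for label, ratings in (("tight", tight), ("spread", spread)):
    print(label, consensus_summary(ratings))
```

Under these assumptions, the second set of ratings has a median of 4, an IQR of 1.5 and an IQD of 0.75: it would satisfy IQD ≤ 1 but fails IQR ≤ 1, which shows why the IQR criterion is the more stringent of the two.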
The number of rounds required for a Delphi study is not prescribed. Some researchers set the number of rounds in advance and others continue until the desired level of consensus is achieved [44]. Our Delphi study ceased after two rounds on the basis that consensus on 40–80 statements had been achieved and that the resultant Q sample was representative of the COM-B domains. Had appropriate COM-B representation not occurred, additional Delphi round(s) would have been undertaken. Alternative statements would have been selected from the remaining concourse to represent the missing COM-B domain(s). These new statements would have been presented to the panel using the same consensus criteria for statement inclusion as applied in previous rounds.
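The stopping rule described above can be sketched as a simple check. This is an illustration under stated assumptions rather than the procedure as implemented in the study: the function name `delphi_can_stop`, the data layout (pairs of statement text and COM-B domain) and the example counts are hypothetical, and each statement is assigned to a single domain for simplicity even though statements often fitted more than one.

```python
# The three COM-B domains: Capability, Opportunity, Motivation.
COM_B_DOMAINS = ("Capability", "Opportunity", "Motivation")


def delphi_can_stop(accepted, min_size=40, max_size=80, domains=COM_B_DOMAINS):
    """Return (stop, missing_domains) for the statements reaching consensus.

    accepted: list of (statement_text, domain) pairs (hypothetical layout).
    The panel can stop when the number of accepted statements lies within
    the recommended Q sample range and every domain is represented.
    """
    covered = {domain for _, domain in accepted}
    missing = [d for d in domains if d not in covered]
    size_ok = min_size <= len(accepted) <= max_size
    return size_ok and not missing, missing


# Invented example: 46 accepted statements spread over all three domains.
accepted = (
    [(f"statement C{i}", "Capability") for i in range(20)]
    + [(f"statement O{i}", "Opportunity") for i in range(13)]
    + [(f"statement M{i}", "Motivation") for i in range(13)]
)
print(delphi_can_stop(accepted))
```

If a domain were missing, the second element of the result would name it, indicating where alternative statements would need to be drawn from the remaining concourse for a further round.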
Language issues
In traditional survey design, the wording of questions should be closely aligned to the participants’ usual language to maximise comprehensibility [56]. The same principle applies to the wording of Q sample statements [14]. Modification may therefore be required, for example to simplify, clarify, or avoid the possibility of causing offence [56], particularly if the statements are not sourced directly from potential participants.
In this study, the decision was made to reword statements where possible to remove the words ‘addict’ and ‘addiction’, as panel members suggested that these terms could potentially stigmatise codeine misusers. This potential for stigmatisation was supported by existing literature [57, 58]. The choice of replacement words was difficult due to a lack of consistency in addiction diagnostic terminology and the changing nature and continued debate around the lexicon of addiction [59]. ‘Dependence’, as used by The International Classification of Diseases [60], was ultimately chosen as the most suitable replacement word over ‘substance use disorder’, as used by the Diagnostic and Statistical Manual of Mental Disorders [61], as the former implies compulsive use and is more concise. However, this was not done without recognising its limitations, as many of the statements were direct quotes from codeine misusers who referred to themselves as ‘addicts’. This suggested that the term may be a normal part of their vernacular and potentially a suitable choice for a survey attempting to use the language of the participants. In addition, the word dependence has a dual meaning, traditionally referring to the normal physiological adaptations that occur in response to repeated drug administration rather than being associated with compulsive use [62].
A limitation of this study is that the language used was not validated by codeine misusers prior to finalising the Q sample. The statements could potentially have been piloted with codeine misusers after completion of the Delphi component; however, limited access to potential participants precluded this option.
The Delphi panel were provided with written information outlining the task, including the background of the study, the aim and instructions. However, three participants asked for further explanation and clarification about whether their responses should reflect their personal views of dependence or the views likely to be expressed by misusers. This potential ambiguity may have affected the reliability of the panel responses and highlights the importance of providing clear and specific instructions, particularly when using a methodology that participants may be unfamiliar with. In addition, the majority of experts had knowledge of and experience with other types of misusers as well as OTC codeine misusers. This may have led to the inclusion of some views of dependence not specific to OTC codeine. Despite these limitations, the Delphi technique was successfully incorporated into the process of Q sample construction as a mechanism to reduce researcher bias and produce a Q sample suited to codeine misusers.