Should samples be weighted to decrease selection bias in online surveys during the COVID-19 pandemic? Data from seven datasets

Background: Online surveys have triggered a heated debate regarding their scientific validity. Many authors have adopted weighting methods to enhance the quality of online survey findings, while others have found no advantage to this approach. This work aims to compare weighted and unweighted association measures after adjustment for potential confounders, taking into account dataset properties such as the initial gap between the population and the selected sample, the sample size, and the variable types.
Methods: This study assessed seven datasets collected between 2019 and 2021 during the COVID-19 pandemic through online cross-sectional surveys using the snowball sampling technique. Weighting methods were applied to adjust the online sample for the sociodemographic features of the target population.
Results: Despite varying age and gender gaps between weighted and unweighted samples, strong similarities were found for dependent and independent variables. When both methods were applied to the same datasets, the regression analyses showed a high relative difference between methods for some variables and a low difference for others. In terms of absolute impact, the highest impact on the association measure was related to the sample size, followed by the age gap, the gender gap, and finally, the significance of the association between weighted age and the dependent variable.
Conclusion: The results of this analysis of online surveys indicate that weighting methods should be used cautiously, as weighting did not affect the results in some databases, while it did in others. Further research is necessary to define situations in which weighting would be beneficial.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-022-01547-3.


Haddad et al. BMC Medical Research Methodology (2022) 22:63
borders are all arguments in its favor [1]. However, the use of web surveys has triggered a heated debate regarding their scientific validity [2,3]. The main argument against web surveys is the selection bias of the sample, which is not chosen at random: it is a convenience sample rather than a probability sample of the target population [1]. This non-probability selection method is generally problematic because it leads to unequal probabilities of selection. Further bias occurs because specific characteristics (such as age, education, and gender) are under- or over-represented in the gathered sample, which affects the reliability of the results [1]. Even a well-designed sampling plan frequently results in a survey completed by too many women and not enough men, or by too many young people and not enough older individuals. Furthermore, all of these factors may be linked to the health-related variables, attitudes, and behaviors that survey researchers are interested in [4].
Selection bias occurs in studies that use online surveys because such surveys reach only a subgroup of the target population [5]. Only people who are literate, have internet access, and are sufficiently interested in the topic can complete online surveys [5]. For example, when a particular subgroup is reached (and thus overrepresented, such as literate people or those with internet access), selection bias generally increases as the population actually reached becomes less diverse, resulting in biased findings [5,6]. A survey about COVID-19 may attract only a specific subgroup of people interested in the topic. However, during infectious disease outbreaks, a quick online survey is necessary to reach a large number of people in a short time and collect the needed information [7]. Moreover, various types of problems and errors are encountered in data collected online (information bias), raising concerns about the quality and reliability of the resulting scientific information [1].
To overcome biases and improve the quality of online survey findings, many authors have adopted weighting methods [1,4], for example to rectify imbalances between the survey sample and the population by adjusting for demographic characteristics (gender, age, ethnicity, educational background, and geographic area) [4]. Because some factors of interest may not be strongly linked to the demographic weighting variables, weighting methods can only compensate for proportionality, not always for representativeness [8]. Hence, there is considerable debate about weighting methods and their effect on variance during analysis, with some researchers claiming that weighting has little potential for eliminating biases in web surveys [9]. Weighting data increases the variance of estimates [10], and because variance is used to calculate confidence intervals and hypothesis tests, this leads to a loss of precision [10]. Nevertheless, researchers are often willing to accept this loss of precision to obtain unbiased estimates [10].
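The variance cost of weighting described above is commonly summarized by Kish's approximate design effect for unequal weights, which equals 1 plus the squared coefficient of variation of the weights, together with the effective sample size $(\sum w_i)^2 / \sum w_i^2$. The sketch below is an illustration only, not part of the authors' SPSS analysis; it assumes one weight per respondent and ignores clustering or stratification, and the example weights are hypothetical.

```python
from typing import Sequence


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish's effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def kish_design_effect(weights: Sequence[float]) -> float:
    """Approximate variance inflation from unequal weights: n / n_eff = 1 + CV^2 of the weights."""
    return len(weights) / effective_sample_size(weights)


# Hypothetical weights for four respondents (not taken from the study's datasets).
equal = [1.0, 1.0, 1.0, 1.0]
unequal = [0.5, 0.5, 2.0, 2.0]
print(effective_sample_size(equal), kish_design_effect(equal))  # 4.0 1.0
print(round(effective_sample_size(unequal), 3), round(kish_design_effect(unequal), 2))  # 2.941 1.36
```

With the unequal weights, four respondents carry roughly the information of three equally weighted ones, so variances are inflated by about 36%; this is the precision loss the paragraph refers to.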
A direct comparison of unweighted and weighted samples has rarely been performed in the literature [11,12]. From a practical perspective, comparing the two techniques is critical because they may yield different findings regarding the overall strength of the effect, the consistency of outcomes across studies, and the effect of other variables on the association.
Two studies comparing weighted and unweighted estimates from online samples revealed that demographic weighting decreased bias in some situations while substantially increasing it in others [11,12]. Recent research using aggregated data to evaluate racial/ethnic inequities in COVID-19 mortality found that weighted population distributions underestimated the excess burden of COVID-19 among African American and Latino individuals, compared with analyses conducted with an unweighted population [13].
Consequently, this work aims to compare weighted and unweighted association measures after adjustment for potential confounders, taking into account dataset properties such as the initial gap between the population and the selected sample, the sample size, and the variable types.

Databases
This study assessed seven datasets of different sample sizes collected by our team between 2019 and 2021 during the COVID-19 pandemic through online cross-sectional surveys using the snowball sampling technique. All seven datasets consisted of basic demographic variables (including age and gender), major independent variables, and different outcome variables.

Procedure
Identical questions measuring basic demographics were used in each database. Weighting techniques were applied, mainly to account for sociodemographic differences between the online sample and the target population.
The weights [14] were computed as $w_i = p_p / p_s$, where $p_p$ is the population proportion and $p_s$ is the (web) sample proportion.
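A minimal sketch of the weight formula $w_i = p_p / p_s$, assuming that the weighting cells are age-by-gender groups (as described under Data analysis) and that every cell contains at least one respondent. The cell labels, population proportions, and counts are hypothetical, and `cell_weights` is an illustrative helper, not the software the authors used.

```python
from typing import Dict


def cell_weights(population_props: Dict[str, float],
                 sample_counts: Dict[str, int]) -> Dict[str, float]:
    """Return w = p_p / p_s for every weighting cell present in the sample."""
    n = sum(sample_counts.values())
    weights = {}
    for cell, count in sample_counts.items():
        if count == 0:
            # An empty cell has p_s = 0, so the ratio is undefined; such cells must be collapsed first.
            raise ValueError(f"no respondents in cell {cell!r}")
        p_s = count / n                   # web-sample proportion of the cell
        p_p = population_props[cell]      # population proportion of the cell
        weights[cell] = p_p / p_s
    return weights


# Hypothetical age-by-gender cells, for illustration only.
population = {"women <35": 0.25, "women 35+": 0.25, "men <35": 0.25, "men 35+": 0.25}
sample = {"women <35": 120, "women 35+": 40, "men <35": 30, "men 35+": 10}
w = cell_weights(population, sample)
for cell, weight in w.items():
    print(cell, round(weight, 3))
```

Each respondent receives the weight of their cell: the overrepresented group (younger women) is down-weighted below 1, underrepresented groups are up-weighted, and the weighted sample total still equals the number of respondents.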
In each database, a major outcome variable associated with the demographic variables was chosen, in addition to an independent variable. Weighted versus unweighted results were compared in all datasets. Details about each dataset are presented in Table 1.

Data analysis
Data were analyzed using SPSS software version 25. Weighting was performed according to the number of inhabitants by age group and gender, as reported in the latest official Lebanese population estimates [15]. In descriptive statistics, means and standard deviations were reported for continuous variables, and counts and percentages for categorical variables. Associations between dichotomous variables were assessed using odds ratios (ORs), while beta coefficients were used to assess associations between quantitative variables.
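To show how weighting enters an OR, the following sketch computes a crude odds ratio $ad/bc$ from a 2×2 table in which each record contributes its weight (or 1 when unweighted). It is an illustration under stated assumptions: the study reports adjusted ORs from logistic regression in SPSS, whereas this shows only a crude point estimate; confidence intervals would require design-based variance estimation; and the records and weights are hypothetical. A table with an empty cell raises `ZeroDivisionError`.

```python
from typing import Optional, Sequence


def odds_ratio(exposed: Sequence[int], outcome: Sequence[int],
               weights: Optional[Sequence[float]] = None) -> float:
    """Crude odds ratio ad/bc from a 2x2 table; each record adds its weight (1 if unweighted)."""
    if weights is None:
        weights = [1.0] * len(exposed)
    a = b = c = d = 0.0
    for x, y, w in zip(exposed, outcome, weights):
        if x == 1 and y == 1:
            a += w  # exposed, outcome present
        elif x == 1 and y == 0:
            b += w  # exposed, outcome absent
        elif x == 0 and y == 1:
            c += w  # unexposed, outcome present
        elif x == 0 and y == 0:
            d += w  # unexposed, outcome absent
        else:
            raise ValueError("variables must be coded 0/1")
    return (a * d) / (b * c)


# Hypothetical records and weights, for illustration only.
exposed = [1, 1, 1, 0, 0, 0, 1, 0]
outcome = [1, 0, 1, 0, 1, 0, 1, 0]
weights = [0.5, 2.0, 0.5, 1.0, 1.0, 1.0, 0.5, 1.0]
print(odds_ratio(exposed, outcome))           # 9.0
print(odds_ratio(exposed, outcome, weights))  # 2.25
```

The same records give very different ORs depending on whether the weights are applied, which is the kind of discrepancy the study quantifies.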
In each dataset, the relative difference between estimates was calculated to assess the gap between the sample and the population figures. It was measured as the change between weighted and unweighted values relative to the unweighted value:
$$\text{Relative difference} = \frac{\text{unweighted value} - \text{weighted value}}{\text{unweighted value}}$$
A base-10 logarithmic transformation (log10) was applied to variables with non-normal distributions to stabilize their variance. Pearson's correlation coefficient was then used to assess the correlation between the weighted and unweighted values of the variables across all datasets. Finally, multiple regressions comparing weighted and unweighted results were conducted on the primary data of each dataset: multiple linear regression when the dependent variable (DV) was continuous and logistic regression when the DV was dichotomous.
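The relative difference formula and the comparison of weighted and unweighted regression coefficients can be sketched as follows. This is a simplification under stated assumptions: the study fitted multivariable models with covariates in SPSS, whereas this sketch uses a single predictor, computes only the weighted least-squares point estimate (no standard errors), and uses hypothetical data. `relative_difference` raises `ZeroDivisionError` if the unweighted value is 0.

```python
from typing import Optional, Sequence


def relative_difference(unweighted: float, weighted: float) -> float:
    """(unweighted - weighted) / unweighted, as defined in the text; multiply by 100 for a percentage."""
    return (unweighted - weighted) / unweighted


def wls_slope(x: Sequence[float], y: Sequence[float],
              w: Optional[Sequence[float]] = None) -> float:
    """Slope of a one-predictor weighted least-squares fit; w=None gives ordinary least squares."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


# Hypothetical data: four respondents, the last one up-weighted.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 4.0]
beta_unweighted = wls_slope(x, y)                      # approximately 0.7
beta_weighted = wls_slope(x, y, [1.0, 1.0, 1.0, 3.0])  # approximately 0.5
print(f"{100 * relative_difference(beta_unweighted, beta_weighted):.1f}%")  # 28.6%
```

A positive relative difference means the weighted estimate is smaller than the unweighted one; a negative value means weighting increased it.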
Finally, multivariable regression analyses were conducted on secondary data to assess the effect of the gaps between the sample and the population (the independent variables of this secondary analysis) on the adjusted OR or beta coefficient between the independent and dependent variables. In other words, this effect was assessed through the impact of the relative differences in age and gender on the relative change in the adjusted OR or beta coefficient. The presence of a significant association of age, gender, and the independent variable (IV) with the DV, using the weighted and unweighted methods in each dataset, was also taken into account. In all cases, a p-value < 0.05 was considered statistically significant.

Results
Table 2 shows the distribution of age and gender in the seven datasets using the simple unweighted and weighted methods. The proportions differed with regard to age and gender. For example, in the first dataset, a high relative difference was found mainly in participants older than 45 years (250%); a similar result was found in the third dataset for participants younger than 35 years. Similarly, in the fifth dataset, a high relative difference between the unweighted and weighted samples was found, essentially in those aged over 45 years (251.72%). In other subgroups, the relative difference was as low as 3% in dataset 5 and 6.5% in dataset 6.
Table 3 summarizes the description of the dependent variables (DV) and independent variables (IV) using the simple (unweighted) and weighted methods. After weighting on demographic characteristics, the DV and IV showed low relative differences, and their values were very similar between the two methods, whether the variables were continuous or categorical. The bivariate analysis between the independent and dependent variables is presented in Supplementary Table 1.

Correlation between unweighted and weighted values
A strong positive correlation was found between the weighted and unweighted values of gender, age, dependent variables, and independent variables taken together (r = 0.918, p < 0.001) (Fig. 1). Positive correlations were also found between the unweighted and weighted values of age (r = 0.824, p < 0.001), gender (r = 0.780, p = 0.001), and independent variables (r = 1.000, p < 0.001); the correlations for age and gender were lower than that for dependent variables (r = 1.000, p < 0.001).
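For reference, Pearson's r between paired unweighted and weighted values can be computed as below. The paired percentages are hypothetical and do not come from the seven datasets; the study itself computed these coefficients in SPSS.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired percentages (unweighted vs weighted), for illustration only.
unweighted = [60.0, 40.0, 30.0, 70.0]
weighted = [50.0, 50.0, 40.0, 60.0]
print(round(pearson_r(unweighted, weighted), 3))  # 0.894
```

Note that r measures linear agreement, not equality: here the paired values differ by up to 10 percentage points, yet r is close to 0.9, and adding the same constant to every weighted value would leave r unchanged.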

Correlation between relative differences of variables and association measure
The relative difference in the adjusted OR was strongly correlated with the age relative difference (r = 0.863, p = 0.012) and with the sample size (r = -0.891, p = 0.007) (Table 4). No significant association was found between the adjusted OR relative difference and the relative differences in gender or in the independent variable. Table 5 displays the results of the weighted and unweighted multivariable models (linear or logistic regressions), showing discrepancies between models.

Multivariable analysis comparing weighted and unweighted samples
In the first dataset (N = 310), the association of the independent variable (attitude toward COVID-19) with the dependent variable (practice toward COVID-19) remained non-significant (p > 0.05) with both methods. However, the unweighted and weighted values showed a relative difference of 133.33%.
In the second dataset (N = 509), the associations of the independent variables (fear of COVID-19 and financial well-being) with the dependent variables (stress, anxiety, and insomnia) remained significant with both methods for all three dependent variables, except in the model where the dependent variable was anxiety (LAS-10). In that model, the financial well-being scale (IV) was significantly associated with the DV in the unweighted regression (p = 0.02) but not in the weighted regression (p = 0.38), and the weighted beta value was 98% lower than the unweighted beta value.
In the third dataset (N = 202), the association of the independent variable (fear of COVID-19) with the dependent variables (knowledge and practice) was not significant in the unweighted sample; however, a statistically significant association was found in the weighted sample. With the weighted method, the beta value increased for gender, while the beta value for the independent variable decreased by 150%. When the attitude scale was considered as the dependent variable, no significant association was found between the IV and the DV with either method.
In the fourth dataset (N = 2336), the association of the independent variable (preventive measure scale) with the dependent variable (having been diagnosed with COVID-19 or not) was not significant in the unweighted sample; however, a statistically significant association was found in the weighted sample. After weighting, relative differences in the OR ranged from -1% to 1%.
In the fifth dataset (N = 324), the associations of the independent variables (soft skills and emotional intelligence) with the dependent variable (burnout scale) yielded different results. The association was significant for soft skills with both methods, while emotional intelligence remained non-significant with both methods, although its p-value approached significance in the weighted sample. A negative relative difference was found for the independent variable after weighting.
In the sixth dataset (N = 405), the association of the independent variable (knowledge scale) with the dependent variable (stigma discrimination scale) was significant with both methods. Fear of COVID-19 and anxiety remained non-significant with both methods. After weighting, relative differences either decreased or increased.
In the seventh dataset (N = 410), the associations of the major independent variables (fear of COVID-19 and anxiety) with the dependent variable (eating behaviors) were significant with both methods. The boredom scale remained non-significant with both methods. Relative differences varied after weighting.
Table 6 displays the associations of the age, gender, and independent variable gaps (between sample and population), the significance of associations, and the sample size with the relative change in the major association. A larger sample size (Beta = -0.001, p = 0.001), a higher gender gap (Beta = -0.007, p = 0.003), and the presence of a significant association between weighted age and the DV (Beta = -0.221, p = 0.013) were significantly associated with a lower relative change in the major association, whereas a higher age gap (Beta = 0.010, p = 0.005) was significantly associated with a higher relative change. In terms of absolute impact, the highest impact on the association measure was related to sample size, followed by age relative difference, gender relative difference, and finally, the significance of the association between weighted age and the dependent variable.
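The secondary (dataset-level) regression summarized in Table 6 can be sketched as an ordinary least-squares fit of the relative change in the association measure on dataset properties. This is a simplified illustration: the study's model also included the gender gap, the IV gap, and significance indicators, and reported p-values, whereas this sketch uses only two hypothetical predictors (sample size and age relative difference), constructs the outcome from a known linear rule, and returns coefficients only. It assumes NumPy is available.

```python
import numpy as np


def second_stage_ols(relative_change, predictors):
    """OLS coefficients (intercept first) of relative change on dataset-level predictors."""
    y = np.asarray(relative_change, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical secondary data: one row per association measure,
# columns = sample size, age relative difference (%). Not the study's values.
predictors = [[100, 10], [200, 40], [300, 20], [400, 50], [500, 30]]
relative_change = [0.5, 0.7, 0.4, 0.6, 0.3]  # constructed as 0.5 - 0.001*n + 0.01*gap
print(np.round(second_stage_ols(relative_change, predictors), 4))  # recovers the rule's coefficients
```

Because raw coefficients depend on each predictor's units (a beta of -0.001 per participant versus 0.010 per percentage point of age gap), comparing their "absolute impact" requires accounting for the range or scale of each predictor.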

Discussion
Our study compared weighted and unweighted samples from online surveys and assessed the extent to which weighting methods can adjust the web sample to the reference population and how weighting affects the results. Our findings revealed large differences in age and gender distributions between weighted and unweighted samples within the same population; however, the dependent and independent variables were highly similar in terms of relative difference measures. The regression analyses showed a high relative difference between the weighted and unweighted methods in some datasets and for some variables, and a low difference for others; association measures could either increase or decrease after weighting. These discrepancies could be explained by the sample size and the gender relative difference: a larger sample size and a higher gender relative difference were associated with lower relative differences in association measures between the weighted and unweighted methods. However, a high age relative difference was associated with a high relative difference in the association measure. These results indicate that the proportions of sociodemographic variables are adjusted after weighting methods are applied; however, this does not necessarily affect the association between variables.
The impact of weighting was limited in some datasets, while differences were found in others. The discrepancies between weighted and unweighted datasets were significantly affected by the sample size, followed by the age relative difference and the gender relative difference. One possible explanation is that some factors of interest may not be strongly linked to the demographic variables used to compute the weights; in that case, weighting cannot correct the corresponding biases. Consequently, the impact of weighting depends on the variables of interest and on how these variables are related to the sociodemographic variables. As a result, the decision to weight samples should depend on the study objective, design, and type of outcome.
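A small worked example illustrates why the impact of weighting depends on how the outcome relates to the weighting variables. It assumes a hypothetical sample in which young people make up 80% of respondents but 50% of the population, and applies per-stratum weights $w = p_p / p_s$. When the outcome differs by age, weighting moves the estimate; when it does not, weighting changes nothing. Neither case captures respondents who differ from non-respondents within the same age group, a bias that demographic weighting cannot detect or correct.

```python
from typing import Dict, Tuple


def unweighted_and_weighted_mean(sample_n: Dict[str, int],
                                 stratum_mean: Dict[str, float],
                                 pop_share: Dict[str, float]) -> Tuple[float, float]:
    """Compare an unweighted mean with a mean weighted by w = p_p / p_s per stratum."""
    n = sum(sample_n.values())
    unweighted = sum(sample_n[s] * stratum_mean[s] for s in sample_n) / n
    w = {s: pop_share[s] / (sample_n[s] / n) for s in sample_n}  # per-stratum weights
    weighted = (sum(w[s] * sample_n[s] * stratum_mean[s] for s in sample_n)
                / sum(w[s] * sample_n[s] for s in sample_n))
    return unweighted, weighted


# Hypothetical sample: young people overrepresented (80%) relative to a 50/50 population.
sample_n = {"young": 80, "old": 20}
pop_share = {"young": 0.5, "old": 0.5}

# Case 1: outcome prevalence differs by age, so weighting moves the estimate.
related = unweighted_and_weighted_mean(sample_n, {"young": 0.2, "old": 0.6}, pop_share)
# Case 2: outcome prevalence is the same in both age groups, so weighting changes nothing.
unrelated = unweighted_and_weighted_mean(sample_n, {"young": 0.4, "old": 0.4}, pop_share)
print([round(v, 2) for v in related], [round(v, 2) for v in unrelated])  # [0.28, 0.4] [0.4, 0.4]
```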
Our work showed that the initial gap between the sample and the population, the sample size, and the presence of a significant association between some sociodemographic variables and the dependent variable could all affect the association measure, but in different ways: correcting for the age gap would improve association measures, whereas correcting for the gender gap would not. Similarly, other researchers have previously reported that weighting techniques can compensate for proportionality but not always for representativeness, because some factors of interest are not strongly linked to the demographic weighting variables [16]. Thus, adjusting for the proportionate overrepresentation and underrepresentation of specific respondent categories does not imply that the substantive responses of online access panel respondents are equal to those of the general population [16]. In contrast, according to Bethlehem and Stoop, the weighting method requires one or more qualitative auxiliary variables; nevertheless, even when the target variable and the stratification variables are strongly related, the change in the target variable's values appears to be very small [16].
Our results showed that the larger the sample size, the lower the impact of weighting on the association measure; in other words, association measures derived from smaller samples change more when weighting is applied. This finding is consistent with the principle, derived from the theoretical framework of probability sampling, that large sample sizes and high response rates improve the quality of estimates [17]. Similarly, a large-scale study that used 17 samples from online surveys found that a bigger sample size (a lower margin of sampling error around the estimate) is associated with better precision [12]; large-scale online surveys also have the advantage that specific subgroups can be identified [16]. The fundamental assumption is that people who take part in an online survey, whether older single women, less educated people, ethnic minorities, or other usually underrepresented groups, are equivalent to those who do not take part [16], even though people in these groups are hard to reach or unlikely to participate in surveys [16]. However, one study challenged the erroneous idea that larger samples imply more valid replies [18], showing that with non-probability samples, larger samples do not always yield better estimates than smaller ones, and that a larger sample size leads to greater accuracy only with probability samples [18]. Similarly, according to Bryman and Bell, a large sample size does not guarantee precision [19]. Thus, additional studies are required to clarify these findings.
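The link between sample size and the margin of sampling error can be made concrete with the usual approximate 95% margin of error for a proportion, $1.96\sqrt{p(1-p)/n}$. This sketch assumes simple random sampling and $p = 0.5$; the formula quantifies sampling error only and contains no term for selection bias, which is why a larger non-probability sample can be more precise without being less biased.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% CI for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


# Quadrupling the sample size halves the margin of error.
for n in (400, 1600):
    print(n, round(margin_of_error(0.5, n), 4))
```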
Our findings reflect the variability of results reported in the literature on the application of weighting methods in scientific surveys. It is unclear whether online surveys can be made more representative [20]. While weighted samples are expected to be more representative than unweighted ones, this study showed that this is not always the case. As a result, one cannot simply assume that the weighted method will always yield a more accurate estimate for the population studied. Nor can it be concluded that the unweighted technique will always yield more conservative conclusions about sample homogeneity than the weighted method, as previous findings showed that demographic weighting reduced bias in some cases and increased it in others [11,12]. One study compared data from a self-administered online survey with answers collected in face-to-face interviews and found that the results were not significantly affected by weighting on age, gender, or education [8]. Another study compared two datasets collected online and showed that the impact of weighting on the results was very limited [1]. Other findings revealed that non-probability samples differed significantly from probability samples, particularly in attitudes and behaviors, even after being made demographically similar to the target groups [11,21-23].

Limitations
This study has several limitations. The online samples were not compared with data from face-to-face interviews, which could have provided a more reliable reference. The selectivity of the sampling and the inconsistency of the variables used in each survey may have affected the results. Conclusions comparing inequities in weighted and unweighted populations may change depending on the variable of interest. Variables other than demographics were not used for adjustment, which could also have affected the results. Other weighting techniques, such as propensity score weighting, were not applied.

Conclusion
The results of this analysis of online surveys indicate that weighting methods should be used cautiously, as weighting did not affect the results in some datasets, while it did in others. Weighting methods might yield unpredictable results, depending on variable gaps, sample size, and the association between sociodemographic characteristics used for adjustment and dependent variables. Further research is necessary to define situations in which weighting would be beneficial.