
Should samples be weighted to decrease selection bias in online surveys during the COVID-19 pandemic? Data from seven datasets

Abstract

Background

Online surveys have triggered a heated debate regarding their scientific validity. Many authors have adopted weighting methods to enhance the quality of online survey findings, while others found no advantage to this approach. This work aims to compare weighted and unweighted association measures after adjustment for potential confounders, taking into account dataset properties such as the initial gap between the population and the selected sample, the sample size, and the variable types.

Methods

This study assessed seven datasets collected between 2019 and 2021, during the COVID-19 pandemic, through online cross-sectional surveys using the snowball sampling technique. Weighting methods were applied to adjust the online sample to sociodemographic features of the target population.

Results

Despite varying age and gender gaps between weighted and unweighted samples, strong similarities were found for dependent and independent variables. When applied to the same datasets, regression analyses showed a high relative difference between methods for some variables and a low difference for others. In terms of absolute impact, the association measure was most affected by the sample size, followed by the age gap, the gender gap, and finally, the significance of the association between weighted age and the dependent variable.

Conclusion

The results of this analysis of online surveys indicate that weighting methods should be used cautiously, as weighting did not affect the results in some databases, while it did in others. Further research is necessary to define situations in which weighting would be beneficial.


Background

Generally used for marketing purposes, online surveys have recently become a popular data-gathering tool in scientific research [1], and proved particularly helpful during the COVID-19 pandemic. Besides protecting data collectors from infection, cost savings, simplicity of data collection, ease of processing findings, flexibility in questionnaire design, and the ability to contact respondents across national borders are all arguments in their favor [1]. However, the use of web surveys has triggered a heated debate regarding their scientific validity [2, 3].

The main argument against web surveys is the selection bias of the sample, which is not chosen at random: the resulting sample is a convenience sample rather than a probability sample of the target population [1]. This non-probability method of selection is generally problematic, leading to an unequal probability of selection. Bias further occurs when specific characteristics (such as age, education, or gender) are under- or over-represented in the gathered sample, thus impacting the reliability of the results [1]. Even a well-designed sampling plan would frequently result in the survey being completed by too many women and not enough men, or by too many young people and not enough elderly individuals. Furthermore, all these factors might be linked to the health-related variables, attitudes, and behaviors that survey researchers are interested in [4].

Selection bias occurs in studies that use online surveys because they reach only a subgroup of the target population [5]: only literate people, those who have access to the internet, and those sufficiently interested in the topic can complete them [5]. When such a subgroup is overrepresented (for example, literate people or those with internet access), selection bias will generally increase as the reached population becomes less diverse, resulting in biased findings [5, 6]. Sometimes, a survey about COVID-19 would attract only a specific subgroup of people interested in the topic. However, during infectious disease outbreaks, a quick online survey is necessary to reach a large number of people in a short time to collect the needed information [7]. Moreover, various types of problems and errors are encountered in data collected online (information bias), raising concerns about the quality and reliability of the resulting scientific information [1].

To overcome biases and improve the quality of online survey findings, many authors have adopted weighting methods [1, 4], rectifying imbalances between the survey sample and the population by adjusting demographic characteristics (gender, age, ethnicity, educational background, and geographic area) [4]. Because some factors of interest may not have a strong enough link with demographic weighting variables, weighting methods can only compensate for proportionality, not always representativeness [8]. Hence the considerable debate about weighting methods and their effect on variance during analysis, as some researchers claim that weighting has little potential for eliminating biases in web surveys [9]. As variance is used to calculate confidence intervals and hypothesis tests, weighting data raises the variance of estimates [10], leading to a loss of accuracy [10]. Nevertheless, researchers are often willing to accept this inaccuracy to obtain unbiased estimates [10].

A direct comparison of unweighted and weighted samples has rarely been performed in the literature [11, 12]. From a practical perspective, comparing the two techniques is critical because they may provide different findings of the overall impact strength, outcome consistency across studies, and other variables’ effect on the association.

Two studies comparing weighted and unweighted estimates from online samples have revealed that demographic weighting decreased bias in some situations while it substantially increased it in others [11, 12]. Recent research using aggregated data to evaluate racial/ethnic inequities in COVID-19 mortality has found that weighted population distributions underestimated the excess burden of COVID-19 among African American and Latin individuals, compared with analyses conducted with an unweighted population [13].

Consequently, this work aims to compare weighted and unweighted association measures after adjustment for potential confounders, taking into account dataset properties such as the initial gap between the population and the selected sample, the sample size, and the variable types.

Methods

Databases

This study assessed seven datasets of different sample sizes collected by our team between 2019 and 2021 during the COVID-19 pandemic through online cross-sectional surveys using the snowball sampling technique. All seven datasets consisted of basic demographic variables (including age and gender), major independent variables, and different outcome variables.

Procedure

Identical questions measuring basic demographics were used in each database. Weighting techniques were applied and mostly accounted for sociodemographic differences between the online sample and the target population.

The formula for such weights [14] was: wi = pp/ps, where pp is the population proportion and ps is the (web) sample proportion.
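In code, this post-stratification weight can be computed per demographic cell (here age group × gender). The sketch below is purely illustrative: the cell definitions, sample rows, and population proportions are invented, not taken from the study.

```python
import pandas as pd

def poststrat_weights(sample, pop_props, cells=("age_group", "gender")):
    """Post-stratification weights wi = pp/ps per demographic cell.

    sample:    DataFrame with one row per respondent.
    pop_props: dict mapping cell tuples to known population proportions.
    (Names and structure are illustrative assumptions, not from the paper.)
    """
    # ps: proportion of the web sample falling in each age x gender cell
    ps = sample.groupby(list(cells)).size() / len(sample)
    # Each respondent receives the weight of his or her cell
    cell_keys = list(zip(*(sample[c] for c in cells)))
    return [pop_props[k] / ps[k] for k in cell_keys]

# Toy sample: young women are overrepresented (3/4 vs. 1/2 in population)
sample = pd.DataFrame({
    "age_group": ["<35", "<35", ">=35", "<35"],
    "gender":    ["F",   "F",   "M",    "F"],
})
pop_props = {("<35", "F"): 0.50, (">=35", "M"): 0.50}
w = poststrat_weights(sample, pop_props)
# Overrepresented cell is down-weighted (0.5/0.75), the other up-weighted (0.5/0.25)
```

Note that the weights sum back to the sample size, so weighted counts remain on the same scale as unweighted ones.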

In each database, a major outcome variable associated with the demographic variables was chosen, in addition to an independent variable. Weighted versus unweighted results were compared in all datasets. Details about each dataset are presented in Table 1.

Table 1 Description of the seven datasets used

Data analysis

Data were analyzed using SPSS software version 25. Weighting was performed according to the number of inhabitants by age group and gender, as described by the latest official version of the Lebanese population estimates [15]. In descriptive statistics, means and standard deviations were considered for continuous variables, and counts and percentages for categorical variables. Associations involving dichotomous variables were quantified using odds ratios (OR), while beta coefficients served to assess associations between quantitative variables.

In each dataset, the relative difference between estimates was calculated to assess the gap between the sample and population figures, measured as the change between weighted and unweighted values relative to the unweighted value (relative difference = (unweighted value − weighted value) / unweighted value). A base-10 logarithm (Log10) was used to stabilize variation within the values of variables with a non-normal distribution. A further step in the analysis was to compare the values of the variables across all datasets between weighted and unweighted methods using Pearson’s correlation coefficient. Multiple regressions were conducted on the primary data of each dataset, comparing weighted versus unweighted results: multiple linear regressions when the dependent variable (DV) was continuous and logistic regressions when the DV was dichotomous.
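As a minimal illustration of the two transformations just described, the relative difference and the Log10 stabilization could be computed as below (the numbers are invented, not study data):

```python
import numpy as np

def relative_difference(unweighted, weighted):
    """Relative difference = (unweighted - weighted) / unweighted."""
    return (unweighted - weighted) / unweighted

# Hypothetical unweighted and weighted estimates of the same quantity
unw, wgt = 0.30, 0.10
rd = relative_difference(unw, wgt)   # (0.30 - 0.10) / 0.30

# Log10 stabilizes variation for non-normally distributed values
stabilized = np.log10([unw, wgt])
```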

Finally, multivariable regression analyses were conducted on secondary data to assess the effect of the gap in independent variables on the adjusted OR or beta coefficient (between the independent and dependent variables). In other words, this effect was assessed through the impact of the relative difference of age and gender on the relative change in adjusted OR or beta coefficient. The presence of a significant association between age, gender, and independent variable (IV) with the DV, using the weighted and unweighted methods in each dataset, was also taken into account. In all cases, a p-value < 0.05 was considered statistically significant.
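To make the weighted-versus-unweighted comparison concrete for a dichotomous exposure and outcome, the sketch below computes a crude OR with and without weights from a 2×2 table. This is a simplification: the study reported adjusted ORs from multivariable regressions, and the toy data and weights here are invented.

```python
import numpy as np

def odds_ratio(exposed, outcome, weights=None):
    """Crude OR from dichotomous exposure/outcome, optionally weighted.

    With weights=None this is the ordinary (unweighted) OR; passing
    post-stratification weights yields the weighted counterpart, so the
    two estimates can be compared. Illustrative only.
    """
    exposed, outcome = np.asarray(exposed), np.asarray(outcome)
    w = np.ones(len(exposed)) if weights is None else np.asarray(weights)
    # Weighted counts of the four 2x2 table cells
    a = w[(exposed == 1) & (outcome == 1)].sum()
    b = w[(exposed == 1) & (outcome == 0)].sum()
    c = w[(exposed == 0) & (outcome == 1)].sum()
    d = w[(exposed == 0) & (outcome == 0)].sum()
    return (a * d) / (b * c)

exposed = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 1, 0, 1, 0, 0, 0]
or_unw = odds_ratio(exposed, outcome)                              # (3*3)/(1*1)
or_w = odds_ratio(exposed, outcome, weights=[2, 1, 1, 1, 1, 1, 1, 2])
rel_change = (or_unw - or_w) / or_unw  # relative change after weighting
```

The relative change between the two ORs is exactly the quantity the secondary analysis models as its dependent variable.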

Results

Description of age and gender using simple unweighted and weighted methods

Table 2 shows the distribution of age and gender in the seven datasets using simple unweighted and weighted methods. The proportions differed regarding age and gender. For example, in the first dataset, a high relative difference was mainly found in participants older than 45 (250%); a similar result was found in the third dataset for age < 35 years. Similarly, in the fifth dataset, a high relative difference was found between the two groups, essentially in those aged over 45 years (251.72%). In other subgroups, the relative difference could be as low as 3% in the fifth dataset and 6.5% in the sixth dataset.

Table 2 Description of age and gender using simple (unweighted) and weighted methods

Description of variables using the weighted and unweighted methods

Table 3 summarizes the description of dependent variables (DV) and independent variables (IV) using simple (unweighted) and weighted methods. The weighting applied to demographic characteristics showed low relative differences, and the values were very similar between the two groups, whether variables were continuous or categorical. The bivariate analysis between the independent and dependent variables is presented in Supplementary Table 1.

Table 3 Description of the dependent and independent variables used in the databases

Correlation between unweighted and weighted values

A strong positive correlation was found between weighted and unweighted values overall, taking into account gender, age, dependent variables, and independent variables (r = 0.918, p < 0.001) (Fig. 1). Positive correlations were also found between unweighted and weighted values of the dependent variables (r = 1.000, p < 0.001) and independent variables (r = 1.000, p < 0.001), and, although lower, of age (r = 0.824, p < 0.001) and gender (r = 0.780, p = 0.001).

Fig. 1 Correlation between weighted and unweighted data according to age, gender, dependent variables (DV), and independent variables (IV)
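A Pearson correlation of this kind between unweighted and weighted values can be reproduced as follows; the values below are invented for illustration, not taken from Fig. 1.

```python
import numpy as np

# Hypothetical per-variable estimates under each method, one pair per variable
unweighted = np.array([0.42, 0.55, 0.61, 0.30, 0.71])
weighted   = np.array([0.40, 0.57, 0.60, 0.33, 0.69])

# Pearson's r between the two sets of estimates
r = np.corrcoef(unweighted, weighted)[0, 1]
```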

Correlation between relative differences of variables and association measure

Strong correlations were found between the age relative difference (r = 0.863, p = 0.012) and the sample size (r = -0.891, p = 0.007) on the one hand, and the adjusted OR relative difference on the other (Table 4). No significant association was found between the adjusted OR relative difference and the gender or independent variable relative differences.

Table 4 Correlation between Relative differences

Multivariable analysis comparing weighted and unweighted samples

Table 5 displays the results of weighted and unweighted multivariable models (linear or logistic regressions), showing discrepancies between models.

Table 5 Multivariable analysis taking an outcome variable as the dependent variable

In the first dataset (N = 310), the association of the independent variable (attitude toward COVID-19) with the dependent variable (practice toward COVID-19) remained non-significant (p-value > 0.05) with both methods. However, the relative difference between unweighted and weighted values reached 133.33%.

In the second dataset (N = 509), the association of the independent variables (fear of COVID-19 and financial well-being) with the dependent variables (stress, anxiety, and insomnia) remained significant in both methods when considering the three dependent variables, except for the model where the dependent variable was anxiety (LAS-10). In the latter, the financial well-being scale (IV) yielded a significant association in the unweighted regression (p = 0.02) but a non-significant result in the weighted regression (p = 0.38). The weighted beta value was 98% lower than the unweighted beta value.

In the third dataset (N = 202), the association of the independent variable (fear of COVID-19) with the dependent variables (knowledge and practice) was not significant in the unweighted sample, whereas a statistically significant association was found in the weighted sample. A relative increase in the beta value was found for gender with the weighted method, along with a 150% decrease in beta for the independent variable. When the attitude scale was taken as the dependent variable, no significant association was found between the IV and the DV with either method.

In the fourth dataset (N = 2336), the association of the independent variable (preventive measure scale) with the dependent variable (having been diagnosed or not with COVID-19) was not significant in the unweighted sample. However, a statistically significant association was found in the weighted sample. Relative differences in OR varied between -1% and 1% after weighting.

In the fifth dataset (N = 324), the association of the independent variables (soft skills and emotional intelligence) with the dependent variable (burnout scale) yielded different results. It was significant for soft skills with both methods, while emotional intelligence remained non-significant with both, with a p-value approaching significance in the weighted sample. A negative relative difference was found for the independent variable after weighting.

In the sixth dataset (N = 405), the association of the independent variable (knowledge scale) with the dependent variable (stigma discrimination scale) was significant in both methods. The fear of COVID-19 and anxiety remained non-significant when using the two methods. A decrease or increase in the relative difference was found after weighting.

In the seventh dataset (N = 410), the association of the major independent variables (fear of COVID-19 and anxiety) with the dependent variable (eating behaviors) was significant in both methods. The boredom scale remained non-significant when using the two methods. Relative differences varied after weighting.

Secondary data analysis: factors affecting the relative change of major association measures

Table 6 displays the associations of the age, gender, and independent-variable gaps (between sample and population), the significance of associations, and the sample size with the relative change of the major association. The results showed that a larger sample size (Beta = -0.001, p = 0.001), a higher gender gap (Beta = -0.007, p = 0.003), and the presence of a significant association between weighted age and the DV (Beta = -0.221, p = 0.013) significantly decreased the relative change of the major association, whereas a higher age gap (Beta = 0.010, p = 0.005) was significantly associated with a higher relative change. In terms of absolute impact, the association measure was most affected by the sample size, followed by the age relative difference, the gender relative difference, and finally, the significance of the association between weighted age and the dependent variable.

Table 6 Linear regression taking the relative change in the major association as the dependent variable

Discussion

Our study compared weighted and unweighted samples from online surveys, assessing the extent to which weighting methods can adjust web samples to a reference population and how this affects the results. Our findings revealed high variation in age and gender between weighted and unweighted samples within the same population; however, high similarities were found for dependent and independent variables in terms of relative difference measures.

The regression analysis results showed a high relative difference between the weighted and unweighted methods in some datasets and for some variables, while a low difference was found for others; association measures could increase or decrease after weighting. These discrepancies could be explained by the large sample size and the high relative difference in gender, both related to lower relative differences in association measures between weighted and unweighted methods; conversely, a high relative difference in age was associated with a high relative difference in the association measure. These results indicate that the proportions of the sociodemographic variables are adjusted after applying weighting methods, but this does not necessarily affect the association between variables.

The impact of weighting was limited in some datasets, while differences were found in others. The discrepancies between weighted and unweighted databases were significantly affected by the sample size, followed by age relative difference and gender relative difference. A possible explanation could be that when analyzing the use of weights to compensate for the distributions of different variables, some factors of interest may not always have a strong enough link with the demographic variables; thus, the weighting method could not correct any biases. Consequently, the impact of weighting depends on the variables of interest and how these variables are related to the sociodemographic variables. As a result, the decision to weight samples will be based on the study objective, design, and type of outcome.

Our work showed that the initial gap between the sample and the population, the sample size, and the presence of a significant association between some sociodemographic variables and the dependent variable could all impact the association measure, but in different ways: correcting for the age gap would improve association measures, whereas correcting for the gender gap would not. Similarly, other researchers had previously reported that weighting techniques can compensate for proportionality but not always representativeness, because some factors of interest do not always have a strong enough link with the demographic weighting variables [16]. Thus, adjusting for proportionate overrepresentation and underrepresentation of specific respondent categories does not imply that the substantive responses of online access panel respondents are equal to those of the general population [16]. Conversely, according to Bethlehem and Stoop, one or more qualitative auxiliary variables are required for the weighting method; nevertheless, even if the target variable and the stratification variables are strongly related, the change in the target variable's values appears very low [16].

Our results showed that the larger the sample size, the lower the impact on the association measure; in other words, association measures derived from smaller samples are more affected if not weighted. This finding corroborates the principle that large sample sizes and high response rates positively influence the quality of estimates, according to the theoretical framework of probability sampling [17]. Similarly, a large-scale study that used 17 samples from online surveys found that a bigger sample size (with a lower margin of sampling error around the estimate) is associated with a better level of precision [12]; large-scale online surveys also have the advantage that specific subgroups can be identified [16]. The fundamental assumption is that people who engage in an online survey, whether elderly single women, less educated people, ethnic minorities, or other usually underrepresented groups, are equivalent to those who do not [16], even though people who belong to these groups are hard to reach or unlikely to participate in surveys [16]. However, one study addressed the erroneous idea that larger samples imply more valid replies [18], showing that larger samples do not always yield better estimates than smaller ones from non-probability samples; a larger sample size leads to greater accuracy only with probability samples [18]. Similarly, according to Bryman and Bell, precision cannot be guaranteed with a large sample size [19]. Thus, additional studies are required to further depict these findings.

Our findings reinforce the variability of results found in the literature about the application of weighting methods in scientific surveys. It is unclear whether or not online surveys can be made more representative [20]. While weighted samples are expected to be more representative than unweighted ones, this study demonstrated that this is not always the case. As a result, one cannot simply assume that the weighted method will always yield a more accurate estimate of the population studied. Nor can it be concluded that the unweighted technique will always yield more conservative sample homogeneity suggestions than the weighted method, as demonstrated by previous findings showing that demographic weighting reduced bias in some cases and increased it in others [11, 12]. One study compared data from a self-administered online survey with answers collected in face-to-face interviews and found that the results were not significantly affected by weights on age, gender, or education [8]. Another study compared two datasets collected online and showed that the impact of the weighting method on the results was very limited [1]. Other findings revealed that non-probability samples differed significantly from probability samples, particularly in terms of attitudes and behaviors, even after making them demographically similar to target groups [11, 21,22,23].

Limitations

This study has several limitations. The online samples were not compared with face-to-face interviews, which could have provided more reasonable results. The sampling selectivity and the inconsistency of the variables used in each survey may have affected the results. Conclusions comparing inequities in weighted and unweighted populations may change depending on the variable of interest. Variables other than demographics were not taken into account for adjustment, which could also affect the results. Finally, other weighting techniques, such as the propensity score technique, were not applied.

Conclusion

The results of this analysis of online surveys indicate that weighting methods should be used cautiously, as weighting did not affect the results in some datasets, while it did in others. Weighting methods might yield unpredictable results, depending on variable gaps, sample size, and the association between sociodemographic characteristics used for adjustment and dependent variables. Further research is necessary to define situations in which weighting would be beneficial.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Abbreviations

COVID-19: Coronavirus disease 2019
BDS-22: Beirut Distress Scale
LAS-10: Lebanese Anxiety Scale
LIS-18: Lebanese Insomnia Scale
SDS-11: Stigma Discrimination Scale
EDE: Eating Disorder Examination questionnaire
SPSS: Statistical Package for the Social Sciences
DV: Dependent variable
IV: Independent variable
OR: Odds ratio

References

  1. Steinmetz S, Tijdens K, de Pedraza P. Comparing different weighting procedures for volunteer web surveys: lessons to be learned from German and Dutch WageIndicator data. 2009.

  2. Couper MP. Web surveys: a review of issues and approaches. Public Opin Q. 2000;64(4):464–94.

  3. Fricker RD, Schonlau M. Advantages and disadvantages of Internet research surveys: evidence from the literature. Field Methods. 2002;14(4):347–67.

  4. Pew Research Center. How different weighting methods work. 2018. https://www.pewresearch.org/methods/2018/01/26/how-different-weighting-methods-work/. Accessed 9 July 2021.

  5. Eysenbach G, Wyatt J. Using the Internet for surveys and health research. J Med Internet Res. 2002;4(2):e13.

  6. Andrade C. The limitations of online surveys. Indian J Psychol Med. 2020;42(6):575–6.

  7. Geldsetzer P. Use of rapid online surveys to assess people’s perceptions during infectious disease outbreaks: a cross-sectional survey on COVID-19. J Med Internet Res. 2020;22(4):e18790.

  8. Loosveldt G, Sonck N. An evaluation of the weighting procedures for an online access panel survey. Surv Res Methods. 2008;2(2):93–105.

  9. Lee S. Propensity score adjustment as a weighting scheme for volunteer panel web surveys. J Off Stat. 2006;22(2):329.

  10. Sheffel A, Wilson E, Munos M, Zeger S. Methods for analysis of complex survey data: an application using the Tanzanian 2015 demographic and health survey and service provision assessment. J Glob Health. 2019;9(2):020902.

  11. Yeager DS, Krosnick JA, Chang L, Javitz HS, Levendusky MS, Simpser A, et al. Comparing the accuracy of RDD telephone surveys and internet surveys conducted with probability and non-probability samples. Public Opin Q. 2011;75(4):709–47.

  12. Gittelman SH, Thomas RK, Lavrakas PJ, Lange V. Quota controls in survey research: a test of accuracy and intersource reliability in online samples. J Advert Res. 2015;55(4):368–79.

  13. Cowger TL, Davis BA, Etkins OS, Makofane K, Lawrence JA, Bassett MT, et al. Comparison of weighted and unweighted population data to assess inequities in coronavirus disease 2019 deaths by race/ethnicity reported by the US Centers for Disease Control and Prevention. JAMA Netw Open. 2020;3(7):e2016933.

  14. Royal KD. Survey research methods: a guide for creating post-stratification weights to correct for sample bias. Educ Health Prof. 2019;2(1):48.

  15. Central Administration of Statistics. Population statistics. 2020. http://www.cas.gov.lb/index.php/demographic-and-social-en/population-en. Accessed 20 Aug 2021.

  16. Bethlehem J, Stoop I. Online panels: a paradigm theft? In: The challenges of a changing world. 2007. p. 113–31.

  17. Taherdoost H. Sampling methods in research methodology; how to choose a sampling technique for research. 2016.

  18. Couper M. The promises and perils of web surveys. ASC conference: The challenge of the Internet, Latimer, Great Britain. 2001.

  19. Bell E, Bryman A, Harley B. Business research methods. Oxford, United Kingdom: Oxford University Press; 2018.

  20. Taylor H. Does internet research work? Int J Mark Res. 2000;42(1):1–11.

  21. Baim J, Galin M, Frankel MR, Becker R, Agresti J. Sample surveys based on Internet panels: 8 years of learning. Valencia, Spain: Worldwide Readership Symposium; 2009.

  22. Dever JA, Rafferty A, Valliant R. Internet surveys: can statistical adjustments eliminate coverage bias? Surv Res Methods. 2008;2(2):47–60.

  23. Piekarski L, Galin M, Baim J, Frankel M, Augemberg K, Prince S. Internet access panels and public opinion and attitude estimates. 63rd Annual Conference of the American Association for Public Opinion Research, New Orleans, LA. 2008.


Acknowledgements

Not applicable.

Funding

None.

Author information

Authors and Affiliations

Authors

Contributions

PS designed the study; CH drafted the manuscript; CH and PS carried out the analysis and interpreted the results; RZ, AH, MA, KI, and HS assisted in drafting and reviewing the manuscript; HS critically reviewed and edited the article for English language; PS supervised the course of the article. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Chadia Haddad.

Ethics declarations

Ethics approval and consent to participate

The seven datasets were approved by different IRBs: Psychiatric Hospital of the Cross Ethics Committee (HPC-010–2020), Review Board of the American University of Science and Technology, AUST-IRB-20200527–01, Lebanese Hospital Geitawi (2020-IRB-023), Zahraa Hospital Ethics Committee (9/2020), HPC-013–2018, HPC-038–2020, and HPC-018–2020 respectively. All methods were carried out in accordance with relevant guidelines and regulations. Informed consent was obtained from each person on the first page of the questionnaire.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.




Cite this article

Haddad, C., Sacre, H., Zeenny, R.M. et al. Should samples be weighted to decrease selection bias in online surveys during the COVID-19 pandemic? Data from seven datasets. BMC Med Res Methodol 22, 63 (2022). https://doi.org/10.1186/s12874-022-01547-3


Keywords

  • Weighting
  • Online surveys
  • Relative difference
  • Bias
  • COVID-19
  • Pandemic