COVID-19 impact on mental health
BMC Medical Research Methodology volume 22, Article number: 15 (2022)
Abstract
Background
The coronavirus disease 2019 (COVID-19) pandemic has had a significant influence on public mental health. Current efforts focus on alleviating the impacts of the disease on public health and the economy, with the psychological effects due to COVID-19 relatively ignored. In this research, we are interested in exploring the quantitative characterization of the pandemic impact on public mental health by studying an online survey dataset of the United States.
Methods
The analyses are based on a large-scale online mental health-related survey study in the United States, conducted over 12 consecutive weeks from April 23, 2020 to July 21, 2020. We are interested in examining the risk factors that have a significant impact on mental health as well as in their estimated effects over time. We employ the multiple imputation by chained equations (MICE) method to deal with missing values and use logistic regression with the least absolute shrinkage and selection operator (Lasso) method to identify risk factors for mental health.
Results
Our analysis shows that risk predictors for an individual to experience mental health issues include the pandemic situation of the State where the individual resides, age, gender, race, marital status, health conditions, the number of household members, employment status, the level of confidence in future food affordability, availability of health insurance, mortgage status, and whether kids are enrolled in school. The effects of most of the predictors seem to change over time, though the degree varies for different risk factors. The effects of risk factors such as States and gender show noticeable change over time, whereas the factor age exhibits seemingly unchanged effects over time.
Conclusions
The analysis results unveil evidence-based findings to identify the groups who are psychologically vulnerable to the COVID-19 pandemic. This study provides helpful evidence for assisting healthcare providers and policymakers to take steps for mitigating the pandemic effects on public mental health, especially in boosting public health care, improving public confidence in future food conditions, and creating more job opportunities.
Trial registration
This article does not report the results of a health care intervention on human participants.
Background
Since the outbreak of the COVID-19 pandemic, people's lifestyles have changed significantly. However, sufficient resources have not been available to attenuate the pandemic effects on mental health and wellbeing [1]. Various studies have been conducted to investigate how the COVID-19 pandemic may affect people psychologically. For example, Cao et al. [2] conducted a survey on college students in China and showed that more than 24% of the students were experiencing anxiety. Spoorthy et al. [3] investigated the mental health problems faced by healthcare workers during the COVID-19 pandemic.
While those studies provided descriptive results by summarizing the information obtained from the questionnaire, it is unclear how the impact of COVID-19 changes over time; what factors are relevant to describe the impact of the pandemic; and how the severity of the mental health issues is quantitatively associated with the risk factors. In this paper, we examine these questions and aim to provide some quantitative insights. Our explorations are carried out using a large-scale online public survey study conducted by the U.S. Census Bureau [4]. The data include twelve data sets, each collected in a 1-week window over 12 consecutive weeks from April 23, 2020 to July 21, 2020. Different data sets contain the measurements from different participants on the same questions. Among the 12 data sets, the smallest one contains 41,996 subjects and the largest one has 132,961 participants. We treat the survey in each week as an independent study. We are interested in assessing how the effects of the associated risk factors may change over time by applying the same method to each of the 12 data sets separately.
The survey includes multiple questions perceived to be relevant to describing the impact of the pandemic on the public. To quantitatively identify the risk factors for the pandemic's impact on mental health, we use the penalized logistic regression method with the least absolute shrinkage and selection operator (Lasso) penalty [5]. However, a direct application of the Lasso method is not possible due to the presence of missing observations. To handle missing values, we employ the multiple imputation by chained equations (MICE) method (e.g., [6, 7]). Further, survey data commonly involve measurement error due to recall bias, the inability to provide precise descriptions of some answers, and reporting errors. It is imperative to address this issue when preprocessing the data. To this end, we combine the levels of those highly related categorical variables to mitigate the measurement error effects.
Methods
Original survey data
The data used in this project are from phase 1 of the Household Pulse Survey conducted by the U.S. Census Bureau [4] from April 23, 2020 to July 21, 2020 for 12 consecutive weeks, giving rise to 12 data sets, one for each week. The survey aims to study the pandemic impacts on the households across the United States from social and economic perspectives. The survey contains 50 questions ranging from education, employment, food sufficiency, health, housing, social security benefits, household spending, stimulus payments, to transportation. The participants of the survey come from all 50 states plus Washington, D.C., and are aged from 18 to 88. The gender ratio (the ratio of males to females) remains fairly stable, ranging between 0.6 and 0.7 over the 12 weeks. Figure S1 in the Supplementary Material shows the curves of the number of cumulative confirmed cases for all the states, which are grouped into four categories of the severity of the pandemic, derived from the data from the Centers for Disease Control and Prevention [8]. Table 1 lists the state members for each category, together with the total number of participants over the 12 weeks and the corresponding percentage for each category. The majority (72.5%) of the participants of the survey come from the states with a mild pandemic, and the smallest proportion (2.3%) of subjects are from the states with a serious pandemic.
Preprocessing the data to reduce errors
Among the initial 50 questions, nine questions, such as “Where did you get free groceries or free meals” and “How often is the Internet available to children for educational purposes”, are excluded because they are not perceived as sustainable factors affecting mental health. Measurement error is typically involved in survey data. Prior to a formal analysis of the data, we implement a preprocessing procedure to mitigate the measurement error effects by combining questions to create new variables, or collapsing levels of variables to form binary variables.
Information on mental health is collected via four questions concerning anxiety, worry, loss of interest, and feeling down. Each question is a four-level Likert item [9] with values 1, 2, 3 and 4, showing the degree of each aspect for the past 7 days prior to the survey time. In contrast to Twenge and Joiner [10], who combined the measurements of the first two questions (anxiety and worry) to indicate the anxiety level and the last two questions (loss of interest and feeling down) to show the depression level, we define a single binary response to reflect the mental health status of an individual by combining measurements of the four variables. The response variable takes value 1 if the average of the scores of the four variables is greater than 2.5, and 0 otherwise, where the threshold 2.5 is the median value for each question. This binary response gives a synthetic way to indicate the mental health status, which is easier than examining measurements of multiple variables.
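As a minimal sketch of the response definition above, the following Python function averages the four Likert scores and applies the 2.5 threshold. The function name, the argument names, and the choice to return `None` when any item is missing are illustrative assumptions, not part of the survey or of the authors' code.

```python
from statistics import mean


def mental_health_response(anxious, worry, interest, down, threshold=2.5):
    """Return 1 if the mean of the four Likert scores exceeds the threshold, else 0.

    Each score is expected to be 1, 2, 3 or 4. Returning None when any score
    is missing (None) is an assumption made for this sketch only.
    """
    scores = [anxious, worry, interest, down]
    if any(s is None for s in scores):
        return None  # response treated as missing
    if any(s not in (1, 2, 3, 4) for s in scores):
        raise ValueError("each score must be 1, 2, 3 or 4")
    return 1 if mean(scores) > threshold else 0


# An average of exactly 2.5 does not exceed the threshold.
print(mental_health_response(2, 3, 3, 2))
print(mental_health_response(3, 3, 2, 3))
```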
Two variables describe the loss of work: Wrkloss indicates whether an individual in the household has experienced a loss of employment income since March 13, 2020; Expctloss indicates if the individual expects a member in the household to experience a loss of employment income in the next 4 weeks because of the COVID-19 pandemic. These two variables are combined to form a single indicator which is denoted Wrkloss, with value 1 if at least one of these two events happens. Two ordinal variables, Prifoodsuf and Curfoodsuf, are used to describe the food sufficiency status before the pandemic and at present, respectively. The Foodcon.change variable is constructed by comparing the current and the previous food sufficiency status to form a binary variable, taking 1 if the current food sufficiency status is no worse than the food status before the pandemic, and 0 otherwise. Variable Med.delay.notget is combined from two indicator variables Delay (indicating if medical care is delayed) and Notget (indicating if the medical care is not received), taking value 1 if either medical care is delayed or no medical care is received, and 0 otherwise. Predictor Mort.prob is combined from one binary variable and an ordinal variable, taking 1 if a participant did not pay last month’s rent or mortgage or does not have enough confidence in paying the next rent or mortgage payment on time, and 0 otherwise. In addition, three ordinal variables, Emppay, Healins, and Schoolenroll, are modified by collapsing their levels to form binary categories. Emppay has value 1 if the individual gets paid for the time he/she is not working, and 0 otherwise. Healins has value 1 if the individual is currently covered by health insurance, and 0 otherwise. Schoolenroll has value 1 if there is a child in the household enrolled in school, and 0 otherwise. Except for the variables discussed above, the remaining variables are kept in their original form.
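The combination rules described above can be sketched in Python as follows. The helper names are hypothetical, the inputs are assumed to be already coded as 0/1 indicators, and the ordinal food sufficiency codes are assumed to increase as sufficiency worsens; the raw survey coding may differ.

```python
def either_indicator(first, second):
    """Return 1 if at least one of two 0/1 indicators equals 1, else 0.

    Mirrors the rule used for Wrkloss (Wrkloss or Expctloss) and for
    Med.delay.notget (Delay or Notget), assuming 0/1 coded inputs.
    """
    return 1 if first == 1 or second == 1 else 0


def foodcon_change(prior_level, current_level):
    """Return 1 if current food sufficiency is no worse than before the pandemic.

    Assumption for this sketch: a larger ordinal code means worse sufficiency.
    """
    return 1 if current_level <= prior_level else 0


print(either_indicator(0, 1))   # at least one event happened
print(foodcon_change(1, 3))     # status got worse
```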
The final data include the binary response (indicating the mental health status of an individual) and 25 predictors measuring various aspects of individuals. To be specific, nine predictors show basic information: State, Age, Male, Rhispanic, Race, Educ, MS (marital status), Numper (the number of people in the household), and Numkid (the number of people under 18 in the household); five variables concern income and employment: Income, Wrkloss, Anywork, Kindwork, and Emppay; five variables are related to food: Foodcon.change, Freefood, Tspndfood, Tspndprpd, and Foodconf; three variables pertain to health and insurance: Hlthstatus, Healins, and Med.delay.notget; one variable, Mort.prob, is for mortgage and housing; and two variables, Schoolenroll and Ttch_hrs, reflect child education. The variable dictionary for the preprocessed data is shown in Table 2.
Missing observations
In the data sets, 17 covariates together with the response variable have missing observations. To provide a quick and intuitive sense of the missingness proportions for different variables over the 12 data sets, we combine those data sets by individual variable to form a single pooled data set. Then we calculate the missingness proportion for each variable by dividing the number of missing observations in the variable by the total number of subjects in the pooled data set. We display in Fig. 1 the missingness rates for those 17 risk factors and the response variable (mental health status) for the pooled data. The risk factors having the three highest missingness rates are the variables Ttch_hrs, Schoolenroll and Emppay, and the corresponding missingness rates are 76.7%, 66.9% and 60.5%, respectively. Five variables incur higher than 30% missingness proportions, and the missingness proportion for 12 risk factors is larger than 5%. The missingness proportion for the response variable is about 8.6%.
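The pooled missingness calculation described above can be illustrated with a short Python sketch. The data layout (a list of weekly data sets, each a list of dictionaries with `None` marking a missing value) and the toy records are assumptions made for illustration; they are not the survey data.

```python
def missingness_rates(weekly_data):
    """Pool the weekly data sets and return the missingness proportion per variable.

    weekly_data: list of weekly data sets; each data set is a list of dict
    records, where None (or an absent key) marks a missing observation.
    """
    pooled = [row for week in weekly_data for row in week]
    n = len(pooled)
    if n == 0:
        raise ValueError("no records to pool")
    variables = sorted({name for row in pooled for name in row})
    return {v: sum(row.get(v) is None for row in pooled) / n for v in variables}


# Toy example with two "weeks" of two subjects each (hypothetical values).
weeks = [
    [{"Emppay": None, "Age": 30}, {"Emppay": 1, "Age": 41}],
    [{"Emppay": None, "Age": None}, {"Emppay": None, "Age": 55}],
]
print(missingness_rates(weeks))
```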
Missing values present a challenge for data analysis and model fitting. One may perform the so-called complete data analysis by deleting those subjects with missing observations, or the so-called available data analysis by using all available data, and then repeating a standard analysis procedure. Such analyses are easy to implement; however, biased results are expected if the missing completely at random (MCAR) mechanism does not hold. Here we consider a broader setting where missing data do not necessarily follow the MCAR mechanism but follow the missing at random (MAR) mechanism. We employ the MICE method, which is developed under the MAR mechanism and applies to various types of variables, such as continuous, binary, nominal, and ordinal variables, subject to missingness. A detailed discussion on this method was provided by van Buuren et al. [11].
Here we employ the MICE method to accommodate missing observations that are present in both the predictors and the response. Following the suggestion of Allison [12], we choose to do five imputations for the data in each week by employing the same algorithm with different random seeds. The implementation is conducted in R (version 3.6.1) with the R package: Multivariate Imputation by Chained Equation (mice). The details on the R code are presented in the code availability in the Declarations section.
To empirically assess the imputation results, we take the data in week 6 as an example and compare the five imputed data sets to the original data by displaying their distribution using the R function density for the continuous variables; the results are reported in Figure S2 in the Supplementary Material. It is seen that the distributions of the 5 imputed data sets for the three continuous variables, Tspndfood, Tspndprpd, and Ttch_hrs, are fairly similar to that of the original data. Further, in Tables S1, S2, and S3 in the Supplementary Material, we report the proportions of different levels for the categorical variables for both the imputed and original data, showing the similarity in the distributions of the imputed data and of the original data.
Model building and inference
We intend to employ logistic regression with the Lasso penalty to analyze the data that contain a binary response and potentially related predictors or covariates. First, we introduce the basic notation and discuss the method in general terms. For i = 1, …, n, let Y_{i} represent the binary response with value 1 indicating that the mental health problem occurs for subject i and 0 otherwise. Let X_{ij} denote the j-th covariate for subject i, where j = 1, …, p, and p is the number of predictors. Write X_{i} = (X_{i1}, X_{i2}, …, X_{ip})^{T} and let π_{i} = P(Y_{i} = 1 | X_{i}).
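Consider the logistic regression model

\[\log \frac{\pi_i}{1-{\pi}_i}={\beta}_0+{\beta}_1{X}_{i1}+\dots +{\beta}_p{X}_{ip}, \tag{1}\]

where β_{0} is the intercept and β = (β_{1}, …, β_{p})^{T} denotes the vector of regression parameters. Consequently, the log-likelihood function for (β_{0}, β) is given by

\[\ell \left({\beta}_0,\beta \right)=\sum_{i=1}^n\left\{{Y}_i\log {\pi}_i+\left(1-{Y}_i\right)\log \left(1-{\pi}_i\right)\right\}. \tag{2}\]

To select the predictors associated with the dichotomous response, we employ the Lasso method. The Lasso estimates are the values that maximize the penalized log-likelihood function obtained by adding an L_{1} penalty to the expression (2):

\[\ell \left({\beta}_0,\beta \right)-\lambda \sum_{j=1}^p\left|{\beta}_j\right|, \tag{3}\]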
where λ is the tuning parameter. The 10-fold cross-validation is employed to obtain a proper value for the tuning parameter, and the one-standard-error rule [13] is applied to pick the most parsimonious model within one standard error of the minimum cross-validation misclassification rate (e.g., [14]).
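The one-standard-error rule can be illustrated with a small Python sketch. It assumes the cross-validation misclassification rate and its standard error have already been computed for each candidate λ; the function name and the numbers below are hypothetical and serve only to show the selection logic. For the Lasso, a larger λ gives a sparser model, so the most parsimonious eligible model corresponds to the largest eligible λ.

```python
def one_se_lambda(lambdas, cv_mean, cv_se):
    """Pick the largest lambda whose CV error is within one SE of the minimum.

    lambdas, cv_mean, cv_se: equal-length sequences giving, for each candidate
    tuning parameter, the mean cross-validation error and its standard error.
    """
    if not (len(lambdas) == len(cv_mean) == len(cv_se)) or not lambdas:
        raise ValueError("inputs must be non-empty and of equal length")
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    cutoff = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= cutoff]
    return max(eligible)


# Hypothetical cross-validation summary for four candidate values of lambda.
lambdas = [0.001, 0.01, 0.1, 1.0]
cv_mean = [0.31, 0.28, 0.295, 0.40]
cv_se = [0.02, 0.02, 0.02, 0.02]
print(one_se_lambda(lambdas, cv_mean, cv_se))
```

In this toy example the minimum error occurs at λ = 0.01, but λ = 0.1 is still within one standard error of it and is therefore selected.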
Model fitting and variable selection
The Lasso logistic regression is applied to each of the five imputed data sets for each week. The predictors corresponding to the nonzero coefficient estimates are considered the risk factors selected, which may be different across the five imputed data sets for each of the 12 weeks. To explore the full spectrum, we start with two extreme models: the full model, which includes the union of all the risk factors selected by the Lasso logistic regression, and the reduced model, which includes only the common factors selected for all five imputed data sets in any week. The full model includes all the 25 predictors in the original data, and the reduced model contains 11 predictors: Age, Male, MS, Numkid, Wrkloss, Anywork, Foodconf, Hlthstatus, Healins, Med.delay.notget, and Mort.prob. We expect the predictors in the final model to form a set in between the sets of the predictors for the reduced model and the full model. Now, the problem is how to find the final model using the reduced and full models. To this end, we carry out the following four steps.
In Step 1, we fit logistic regression with predictors in the full model and in the reduced model, respectively, to each of the five surrogate data sets for each of the 12 weeks.
In Step 2, the estimates and standard errors of the model coefficients for a given week are obtained using the algorithm described by Allison [12]. To be specific, let M = 5 be the number of surrogate data sets for the original incomplete data. Let β_{j} be the j-th component of the model parameter vector β. For k = 1, …, M, let \({\hat{\beta}}_j^{(k)}\) denote the estimate of the model parameter β_{j} obtained from fitting the k-th surrogate data set in a week and let \({S}_j^{(k)}\) be its associated standard error. Then the point estimate of β_{j} is given by the average of those estimates of β_{j} derived from the M imputed data sets:

\[{\hat{\beta}}_j=\frac{1}{M}\sum_{k=1}^M{\hat{\beta}}_j^{(k)}. \tag{4}\]

To determine the variability associated with \({\hat{\beta}}_j\), one needs to incorporate both the within imputation variance, denoted V_{w}, and the between imputation variance, denoted V_{b}. According to Rubin’s rule [6], the total variance associated with the multiple imputation estimate \({\hat{\beta}}_j\) is given by \(Var\left({\hat{\beta}}_j\right)={V}_w+\left(1+\frac{1}{M}\right){V}_b\), where \({V}_w=\frac{1}{M}\sum_{k=1}^M{\left\{{S}_j^{(k)}\right\}}^2\), and the between imputation variance, \({V}_b=\frac{1}{M-1}\sum_{k=1}^M{\left\{{\hat{\beta}}_j^{(k)}-{\hat{\beta}}_j\right\}}^2\), is inflated by a factor \(1+\frac{1}{M}\). As a result, the standard error associated with \({\hat{\beta}}_j\) is given by \(se\left({\hat{\beta}}_j\right)=\sqrt{Var\left({\hat{\beta}}_j\right)}\), i.e.,

\[se\left({\hat{\beta}}_j\right)=\sqrt{\frac{1}{M}\sum_{k=1}^M{\left\{{S}_j^{(k)}\right\}}^2+\left(1+\frac{1}{M}\right)\frac{1}{M-1}\sum_{k=1}^M{\left\{{\hat{\beta}}_j^{(k)}-{\hat{\beta}}_j\right\}}^2}. \tag{5}\]
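The pooling of the M imputation-specific estimates by Rubin’s rule can be sketched in Python as below. This is an illustrative implementation of the averaging and variance formulas stated in the text, not the authors’ R code; the function name and the numerical inputs are made up for the example.

```python
import math


def pool_rubin(estimates, std_errors):
    """Combine M imputation-specific estimates of one coefficient.

    estimates:  the M point estimates from the imputed data sets.
    std_errors: the M associated standard errors.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    beta_bar = sum(estimates) / m                                  # point estimate
    v_within = sum(s ** 2 for s in std_errors) / m                 # V_w
    v_between = sum((b - beta_bar) ** 2 for b in estimates) / (m - 1)  # V_b
    total_var = v_within + (1 + 1 / m) * v_between
    return beta_bar, math.sqrt(total_var)


# Hypothetical estimates of one coefficient from M = 5 imputed data sets.
est, se = pool_rubin([0.30, 0.32, 0.34, 0.36, 0.38], [0.05] * 5)
print(round(est, 3), round(se, 4))
```

With these inputs, V_w = 0.0025 and V_b = 0.001, so the total variance is 0.0025 + 1.2 × 0.001 = 0.0037.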
We report in Tables S4 and S5 in the Supplementary Material the estimated results of the covariate effects obtained, respectively, from the full and reduced models for the data in 12 weeks, where the covariates marked with an asterisk are statistically significant with p-values smaller than 0.05 for more than 6 weeks. It is found that in addition to those covariates included in the reduced model, fitting the full model also shows that five additional covariates, State, Rhispanic, Race, Numper, and Schoolenroll, are statistically significant for more than 6 weeks’ data. Table S5 shows that almost all the covariates in the reduced model are statistically significant, with all the p-values derived from the data in 12 weeks smaller than 0.05.
Consequently, in Step 3, we take the 11 significant risk factors from the reduced model, and the 5 additional partially significant covariates suggested by fitting the full model, State, Rhispanic, Race, Numper, and Schoolenroll, to form the list of risk factors for the final model.
In Step 4, we construct the final model using the model form (1) to include the variables selected in Step 3 as predictors, where dummy variables are used to express the categorical variables State, Race, MS, Foodconf, and Hlthstatus, which have more than two levels, yielding 28 variables in the model. The final model is then given by

\[\log \frac{\pi }{1-\pi }={\beta}_0+\sum_{j=1}^{28}{\beta}_j{X}_j, \tag{6}\]

where X_{1}, …, X_{28} denote the 28 variables described above, β_{j} are the regression coefficients for j = 0, 1, …, 28, and the subscript i is suppressed in π and the covariates for ease of exposition.
Then, we fit the final logistic model (6) to each of the imputed data sets for each of the 12 weeks; in the same manner as indicated by (4) and (5), we obtain the point estimates of the model parameters and the associated standard errors. To have a visual display, we plot in Fig. 2 the estimates of the coefficients for all the factors in the final model for 12 weeks; to precisely show the estimates, we report in Table 3 the point estimates for the covariate effects obtained from the final model, where we further calculate the average of the 12 estimates for each covariate and report the results in the last column. The associated standard errors and the p-values are deferred to Table S6 in the Supplementary Material. The results suggest that the factors Numper, Healins and Schoolenroll are only significant in some of the 12 weeks, while other factors in the final model are significant in all 12 weeks.
Results
Figure 2 shows that the absolute values of coefficient estimates for some levels of variables Foodconf and Hlthstatus are greater than 1 (in Fig. 2K and L). The coefficient estimates of Med.delay.notget over 12 weeks are between 0.5 and 1 (in Fig. 2N). Other variables have coefficient estimates between −0.5 and 0.5.
To have an overall sense of the estimates of the predictor effects in the final model, we now utilize the averages reported in the last column of Table 3 to estimate the relative change in the odds of having mental issues with one unit increase in a predictor from its baseline while keeping other predictors unchanged, yet leaving the associated variability uncharacterized. Let \({\overline{\hat{\beta}}}_j\) represent the average of those estimates of the covariate effect β_{j} over the 12 weeks for j = 1, …, 28, which is a sensible estimate of β_{j}, because the arithmetic average preserves the consistency if all the estimators obtained for the 12 weeks are consistent for β_{j}. Using \({\overline{\hat{\beta}}}_j\) is advantageous in offering us a single estimate of β_{j} with generally expected smaller variability than those estimates obtained from each of the 12 weeks. If \({\overline{\hat{\beta}}}_j\) is negative, then \(1-{\exp}\left({\overline{\hat{\beta}}}_j\right)\) shows an estimate of the decrease in the odds of having mental issues relative to the baseline; if \({\overline{\hat{\beta}}}_j\) is positive, then \({\exp}\left({\overline{\hat{\beta}}}_j\right)-1\) suggests an estimate of the increase in the odds of having mental issues relative to the baseline.
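The conversion from an averaged coefficient to a relative change in the odds can be sketched in Python as follows. The function name is illustrative; a negative return value corresponds to a decrease in the odds and a positive one to an increase. The two inputs are averaged coefficients quoted in this section (−0.139 for mild pandemic States and 0.352 for Wrkloss).

```python
import math


def odds_change_percent(beta_bar):
    """Relative change in the odds, in percent, for a one-unit increase.

    beta_bar: averaged log-odds coefficient. The result equals
    (exp(beta_bar) - 1) * 100, so it is negative when the odds decrease.
    """
    return (math.exp(beta_bar) - 1.0) * 100.0


# Averaged coefficients taken from the text of this section.
print(round(odds_change_percent(-0.139), 2))  # decrease for mild pandemic States
print(round(odds_change_percent(0.352), 2))   # increase for loss of employment income
```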
To be specific, for the variable State with large daily increases of cases as the baseline, people from mild pandemic States exhibit an estimate of 1 − exp (−0.139) ≈ 13% decrease in the odds of having mental issues; people from the States with moderate daily increases show an estimate of 1 − exp (−0.053) ≈ 5.16% decrease in the odds; people from serious pandemic States are generally associated with an estimate of 1 − exp (−0.039) ≈ 3.82% decrease in the odds.
For Age and Gender, the averages of the estimates over the 12 weeks are −0.030 and −0.228, respectively, implying that one unit increase in Age is associated with an estimate of about 1 − exp (−0.030) ≈ 2.96% decrease in the odds of occurrence of mental health problems; and being a male relative to a female is associated with an estimate of 1 − exp (−0.228) ≈ 20.39% decrease in the odds of having mental health issues. Similarly, the 12-week estimated effects of Rhispanic indicate that being of Hispanic, Latino or Spanish origin is associated with smaller odds of having mental issues than others. The 12-week mean of the coefficient estimates of Rhispanic is −0.172, leading to an estimate of the odds of mental health problem occurrence being reduced by around 1 − exp (−0.172) ≈ 15.80%.
For the variable Race with White as the baseline, the 12-week means of coefficient estimates for Black (Race2) and Asian (Race3) are −0.446 and −0.262, respectively, yielding an estimate of the odds of occurrence of mental health issues for Black and Asian to be, respectively, 1 − exp (−0.446) ≈ 35.98% and 1 − exp (−0.262) ≈ 23.05% less than White.
For MS (marital status) with now married as the baseline, an estimate of the increase in the odds of having mental issues relative to the baseline, is exp(0.206) − 1≈22.88%, exp(0.236) − 1≈26.62%, exp(0.242) − 1≈27.38%, and exp(0.181) − 1≈19.84%, respectively, for people who are widowed (MS2), divorced (MS3), separated (MS4), or never married (MS5).
For predictors Numper and Numkid, the averages of the estimates suggest that the increase of the number of people and kids in the household is associated with the decrease of the odds of having mental issues. Specifically, one person increase in the household is associated with an estimate of 1 − exp (−0.024)≈2.37% decrease in odds, and one more kid in the household is associated with an estimate of 1 − exp (−0.106)≈10.06% decrease in the odds.
For the work-related factors Wrkloss and Anywork, the results shown in the last column in Table 3 indicate that experiencing a loss of employment income since March 13, 2020 is associated with an estimate of exp(0.352) − 1≈42.19% increase in the odds of having mental issues, and doing any work during the last 7 days is associated with an estimate of 1 − exp (−0.141)≈13.15% decrease in the odds.
The 12-week results of Foodconf in Table 3 show that, with being not at all confident in future food affordability as the baseline, an increase in the confidence of food affordability is negatively associated with the odds of having mental issues. On average over the 12 weeks, as shown in the last column in Table 3, the more confident a person is in food affordability, the smaller the odds of having mental issues. For example, a person who is very confident (Foodconf4) in the food affordability for the next four weeks demonstrates an estimate of 1 − exp (−1.348)≈74.02% decrease in the odds of having mental issues, relative to a person who is not at all confident.
With excellent health conditions as the baseline, the estimates of Hlthstatus in Table 3 indicate that the worse the self-evaluated health condition, the larger the odds of having mental issues. Considering the worst level of health condition, poor (Hlthstatus5), as an example, the average of the estimates over the 12 weeks yields that the estimated odds of having mental issues for people in poor health conditions are exp(2.021)≈7.55 times the odds for people in excellent health conditions. For other health-related predictors, Healins and Med.delay.notget, people who are currently covered by health insurance are associated with an estimate of 1 − exp (−0.083)≈7.96% decrease in the odds of mental issues occurrence, and people who do not get medical care or have delayed medical care are associated with an estimate of exp(0.684) − 1≈98.18% increase in the odds.
For Mort.prob and Schoolenroll, people having rental or mortgage problems are associated with an estimate of exp(0.232) − 1≈26.15% increase in the odds of having mental health problems, and people whose household has kids enrolled in school are associated with an estimate of exp(0.109) − 1≈11.52% increase in the odds of having mental issues.
In summary, the factors in the final model associated with a reduction in the odds of having mental health issues include: living in States not having large daily increases of cases, being older, being male, having a Hispanic, Latino or Spanish origin, being non-White, having more people or kids in the household, having a job during the last 7 days, having confidence in future food affordability, and being covered by insurance. The factors in the final model associated with an increase in the odds of having mental issues are: not being married, experiencing loss of job, poor self-evaluations of health conditions, having problems in getting medical care and paying the mortgage, and having kids enrolled in school.
Discussion
In this paper we investigate the impact of the COVID-19 pandemic on public mental health using an online survey data set from the United States. Prior to the analysis, we preprocess the data by combining some levels of certain variables in the hope of ameliorating the effects of the errors that are often induced in survey data, including recall bias, reporting error, uncertainty in providing a precise assessment of the situation, inability to decide a right scale for a question, and inconsistency in the answers to the same question that is phrased differently [15]. In addition, some variables are quite similar or even identical in nature; thus, combining them can help alleviate unwanted noise. Further, we employ multiple imputation to account for the missingness effects, and use the penalized logistic regression with the Lasso penalty to select important risk factors for mental health.
While this study offers us quantitative evidence of how the COVID-19 pandemic can psychologically challenge the public, several limitations need to be pointed out. Firstly, the online survey data were designed to assess the pandemic impact from the social and economic perspectives, and they may not contain enough necessary factors related to mental health issues. In addition, the interaction effects between the predictors are not considered in our analysis, which may restrict the capacity of the model. Secondly, while the choice of M = 5 in our analysis follows the suggestion of Allison [12], it would be interesting to study how the variability may be incurred by setting different values for M.
Thirdly, though it is easy to see that the data exhibit arbitrary missingness patterns, or the so-called intermittent missing data patterns, it is difficult to tell what exactly the underlying missing data mechanism is, as in many other missing data problems [16]. Though the multiple imputation method is useful for handling missing data with the MAR mechanism [16], its performance can be considerably impacted by different proportions of missing values. Efforts to account for missingness effects are not always rewarding. In the presence of excessive missing observations, the multiple imputation method, like any other method, can fail to yield sensible results even if the MAR mechanism is true. In such instances, one needs to be cautious in interpreting the analysis results and be aware of potentially induced biases due to a high proportion of missing information.
Finally, in the analysis, we define the response variable to be binary by combining the information collected from four questions about mental health. While this approach gives a simple way to indicate the mental health status and is similarly taken by other authors (e.g., [10]), it is heuristic, as pointed out by a referee. It is thereby interesting to take the original four categorical variables as outcomes and conduct multivariate analysis to examine how those outcomes are associated with the covariates with missingness effects accommodated. Such analyses would be more sophisticated and require extra care to accommodate the association structures among the multiple response variables. Further, the yielded results may be less intuitive to interpret than those derived from using a single response variable.
Conclusions
The analysis results unveil evidence-based findings to identify the groups who are psychologically vulnerable to the COVID-19 pandemic. This study provides helpful evidence to assist healthcare providers and policymakers to take steps for mitigating the pandemic effects on public mental health, especially in boosting public health care, improving public confidence in future food conditions, and creating more job opportunities.
Availability of data and materials
The data sets analyzed here are available in the Bureau of the Census, Household Pulse Survey Public Use File (PUF) repository [4], https://www.census.gov/programs-surveys/household-pulse-survey/datasets.html.
Abbreviations
COVID-19: Coronavirus disease 2019
MICE: Multiple imputation by chained equations
Lasso: Least absolute shrinkage and selection operator
References
 1.
Taylor S. The psychology of pandemics: Preparing for the next global outbreak of infectious disease. Cambridge Scholars Publishing; 2019.
 2.
Cao W, Fang Z, Hou G, Han M, Xu X, Dong J, et al. The psychological impact of the COVID19 epidemic on college students in China. Psychiatry Research. 2020;287:112934.
 3.
Spoorthy MS, Pratapa SK, Mahant S. Mental health problems faced by healthcare workers due to the COVID-19 pandemic–A review. Asian J Psychiatr. 2020;51:102119.
 4.
United States Census Bureau (USCB). Household pulse survey public use file (PUF) [Internet]. Bureau of the Census; 2020. Available from: https://www.census.gov/programssurveys/householdpulsesurvey/datasets.html Accessed Oct 2020.
 5.
Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Series B Stat Methodol. 1996;58(1):267–88.
 6.
Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.
 7.
Yu LM, Burton A, Rivero-Arias O. Evaluation of software for multiple imputation of semi-continuous data. Stat Methods Med Res. 2007;16(3):243–58.
 8.
Centers for Disease Control and Prevention (CDC). United States COVID-19 cases and deaths by state over time [Internet]. CDC; 2020. Available from: https://data.cdc.gov/Casesurveillance/UnitedStatesCOVID19CasesandDeathsbyStateo/9mfqcb36/data Accessed Oct 2020.
 9.
Joshi A, Kale S, Chandel S, Pal DK. Likert scale: explored and explained. Curr J Appl Sci Technol. 2015;7(4):396–403.
 10.
Twenge JM, Joiner TE. U.S. Census Bureau-assessed prevalence of anxiety and depressive symptoms in 2019 and during the 2020 COVID-19 pandemic. Depress Anxiety. 2020;37(10):954–6.
 11.
van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. J Stat Softw. 2011;45:1–67.
 12.
Allison PD. Multiple imputation for missing data: a cautionary tale. Sociol Methods Res. 2000;28(3):301–9.
 13.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: Data mining, inference, and prediction. 2nd ed. New York: Springer Science & Business Media; 2009. 244 p.
 14.
Krstajic D, Buturovic LJ, Leahy DE, Thomas S. Cross-validation pitfalls when selecting and assessing regression and classification models. J Cheminform. 2014;6(1):1–15.
 15.
Yi GY. Statistical Analysis with Measurement Error or Misclassification: Strategy, Method and Application. New York: Springer; 2017.
 16.
Little RJ, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. Hoboken: Wiley; 2002.
 17.
Cui J, Lu J, Weng Y. R code for COVID 19 Impact on mental health over time [Internet]. GitHub; 2021 [updated 2021 Sep 9; cited 2021 Sep 9]. Available from: https://github.com/JingyuCui639/RcodeforCOVID19ImpactonMentalHealthoverTime
Acknowledgements
The authors thank the reviewers for their helpful comments on the initial submission. The research was partially supported by the grants of the Discovery Grants Program and the Emerging Infectious Disease Modeling Program from the Natural Sciences and Engineering Research Council of Canada. Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs program. The grants provide support to JC, JL and YW to conduct the study.
Code availability
All computations in this study were conducted in R (version 3.6.1), and the R code is posted on GitHub at: https://github.com/JingyuCui639/RcodeforCOVID19ImpactonMentalHealthoverTime [17].
Funding
The research was partially supported by the grants of the Discovery Grants Program and the Emerging Infectious Disease Modeling Program from the Natural Sciences and Engineering Research Council of Canada. Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs program. The grants provide support to JC, JL and YW to conduct the analyses.
Author information
Contributions
JC leads the project; YW identifies the data; JC, JL, and YW jointly analyze the data and prepare the initial draft. Professors WH and GYY offer ideas and discussions for the project; GYY writes the manuscript. All authors have read and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Cui, J., Lu, J., Weng, Y. et al. COVID-19 impact on mental health. BMC Med Res Methodol 22, 15 (2022). https://doi.org/10.1186/s1287402101411w
DOI: https://doi.org/10.1186/s1287402101411w
Keywords
 COVID-19
 Lasso
 logistic regression
 mental health
 missing data
 multiple imputation
 survey data