 Research
 Open access
 Published:
Negative logbinomial model with optimal robust variance to estimate the prevalence ratio, in crosssectional population studies
BMC Medical Research Methodology volume 23, Article number: 219 (2023)
Abstract
Background
Crosssectional studies are useful for the estimation of prevalence of a particular event with concerns in specific populations, as in the case of diseases or other public health interests. Most of these studies have been carried out with binary binomial logistic regression model which estimates OR values that could be overestimated due to the adjustment of the model. Thus, the selection of the best multivariate model for crosssectional studies is a priority to control the overestimation of the associations.
Methods
We compared the precision of the estimates of the prevalence ratio (PR) of the negative Logbinomial model (NLB) with Mantel–Haenszel (MH) and the regression models Cox, LogPoisson, Logbinomial, and the OR of the binary logistic regression in populationbased crosssectional studies. The prevalence from a previous crosssectional study carried out in Colombia about the association of mental health disorders with the consumption of psychoactive substances (e.g., cocaine, marijuana, cigarette, alcohol and risk of consumption of psychoactive substances) were used. The precision of the point estimates of the PR was evaluated for the NLB model with robust variance estimates, controlled with confounding variables, and confidence interval of 95%.
Results
The NLB model adjusted with robust variance showed accuracy in the measurements of crude PRs, standard errors of estimate and its corresponding confidence intervals (95%CI) as well as a high precision of the PR estimate and standard errors of estimate after the adjustment of the model by grouped age compared with the MH PR estimate.
Obtained PRs and 95%CI entre NLB y MH were: cocaine consumption (2.931,IC95%: 0.723–11.889 vs. 2.913, IC95%: 0.786–12.845), marijuana consumption (3.444, IC95%: 1.856–6.391 vs. 3.407, IC95%: 1.848, 6.281), cigarette smoking (2.175,IC95%: 1.493, 3.167 vs. 2.209, IC95%: 1.518–3.214), alcohol consumption (1.243,IC95%: 1.158–1.334 vs. 1.241, IC95%: 1.157–1.332), and risk of consumption of psychoactive substances (1.086, IC95%: 1.047–1.127 vs. 1.086, IC95%: 1.047, 1.126). The NLB model adjusted with robust variance showed mayor precision when increasing the prevalence, then the other models with robust variance with respect to MH.
Conclusions
The NLB model with robust variance was shown as a powerful strategy for the estimation of PRs for crosssectional populationbased studies, as high precision levels were identified for point estimators, standard errors of estimate and its corresponding confidence intervals, after the adjustment of confounding variables. In addition, it does not represent convergence issues for high prevalence cases (as it occur with the Logbinomial model) and could be considered in cases of overdispersion and with greater precision and goodness of fit than the other models with robust variance, as it was shown with the data set of the crosssectional study used in here.
Background
The objective of crosssectional populationbased research is to estimate prevalence of diseases or other events of interest for the human public health such as mortality, rehospitalization, quality of life, psychoactive substances consumption, among some others, and their associated (exposure) factors of these events of interest. Crosssectional studies at the population level are usually designed with probabilistic samples with random selection, inferred or representative of the study population giving as a result unbiased, efficient, consistent, and sufficient estimators of prevalence and its corresponding 95% confidence intervals, with the estimation of standard and relative standard errors as precision indicators of the point estimates of the prevalence of the events of interest [1, 2].
In addition, in the crosssectional studies as there is a construction of explanatory factors of the disease or the event, it should be an adjustment of the confounding factors that could lead to misinterpretations of the results if it is not well adjusted, considering individual factors but also its possible interactions among the variables of the model [3, 4]. For crosssectional studies, binary logistic regression model has been used as a common and frequent strategy for the estimation of odds ratio (OR) and 95%CI as a measurement for the association of the outcome (disease or event) with potential explanatory variables. However, OR value overestimates the associations regarding to the prevalence ratios (PR = p1/p2) estimates, with an increased bias in the case of diseases or events with high prevalence [3,4,5].
Since 2003, Cox constant time model, LogPoisson model, and Logbinomial regression model were proposed for controlling the bias of the overestimation of OR values in the case of binary logistic regression models, giving as a result good accuracies for the estimation of prevalence ratios (PR), although it should be evaluated the mathematical assumptions to validate these models [3, 4]. In the Poisson model one of the assumptions is the equality between the expected value and the estimated variance, in which is frequent to obtain greater variance values compared with the expected values, generating overdispersion of the data (extraPoisson variance) in the model [3]. The overdispersion of the data leads to the underestimation of the standard error coefficients, deriving significant associations of explanatory factors of the event of interest which, do not exist [6]. In the case of the Logbinomial regression model, it has convergence issues when comparing numerical covariables as well as in the cases of high prevalence of the diseases or events [3, 4]. Finally, in the Cox model it is frequent the nonfulfillment of the assumption of proportional risks in the population at a time t of the observation, which is adjusted considering a constant time for the estimation of the prevalence ratios in the crosssectional studies [3, 4].
Thus, although the abovementioned models have been used in the research field for the estimation of PRs in crosssectional studies, there are still some issues in the models that could be improved. The negative Logbinomial regression model (NLB) has shown high mathematical consistency in the application of longitudinal cohort analytical studies and has been used in cases of overdispersion of the Poisson model [6,7,8,9,10]. NLB is proposed in this study as a novel generalized linear model for the estimation of prevalence ratios (PR), as a measure of association and control for confounding categorical and numerical variables in crosssectional populationbased studies as a strategy for controlling the bias obtained in unconditional binary logistic regression models due to the overestimation of OR values. In this study, a comparison of the NLB model with the Mantel–Haenszel (MH) stratification method and the three current models for the estimation of PRs is proposed, using a previous study carried out in Colombia about the consumption of psychoactive substances and mental disorders as a precision and accuracy indicator for the estimates of the PR using the NLB model.
Methods
Study description
The data used to compare the point and interval estimates of the models of the present study were taken from a previous crosssectional populationbased study, with the specific objective of estimating the prevalence of psychoactive substances consumption and its associated factors in a population of 140,000 workers in Colombia, in which a stratified random probability sample of 5810 workers was selected.
The outcomes or events of interest in this study were: lifetime cocaine use, marijuana consumption in the last year, current cigarette consumption, lifetime alcohol consumption risk of consumption of psychoactive substances. The association factor used in the model was depression measured with the Zung test and grouped ages were used as a confounding variable with the following categories: 1) under 25, 2) between 26 and 29, 3) between 30 and 34, and 5) older than and equal to 35 years. It was used as a numerical variable as well.
Negative Logbinomial regression model (NLB)
For the use of the negative Logbinomial regression model, it is important to keep in mind that the probability distribution of the negative binomial discrete random variable X measures the number of trials necessary to obtain rsuccesses, with independent trials and its parameters are r (total number of successes) and p which is the probability of success, therefore the probability of failure is q = 1p and the negative binomial probability distribution is as follows (Eq. 1):
This Poissonderived distribution, adjusted for overdispersion, has an expected value different from the variance, as shown below (Eqs. 2 and 3):
In the construction of the generalized linear model (GLM) of the proposed negative logbinomial model, the three components of the GLM were taken into account: in the random component, the random variable of the dependent variable of the negative binomial model (Y), with exponential family distribution is measured in counts occurring at a time t and is also used in continuous and dichotomous variables evaluating the mathematical assumptions of the model (linearity of the model parameters and independence of the observations of the study subjects), therefore it is applicable in crosssectional studies in the estimation of the prevalence of the disease or dichotomous variables of crosssectional studies. The systematic components of the model which are the explanatory variables or associated factors Xi, numerical or categorical variables with their respective estimators (β_{i}), standard errors and confidence intervals, which are used for the construction of the associative models of the crosssectional studies. Finally, for the link function, in this case was the logarithm per se, which was proposed the name of "negative logbinomial" model for crosssectional studies.
Maximum likelihood was used as the estimation method of the NLB and to compare iteratively reweighted least squares (IRLS), with Fisher scoring, Newton–Raphson and hybrid iterative optimization methods for convergence tolerances estimations or epsilon (ε) < 0.000001 or 1e6 (ε > 0).
NLB model is derived from a compound Poisson distribution with fitted Gamma distribution [8], with the log link g(μ) = ln(μ)=Xi βi, vi=1/α, then Yi⁄Xi =BN(1/α,1/(1+αμi) )as follows (Eq. 4):
In the negative logbinomial model taking as independent variable dummy X_{i}, with the values 0 and 1 (k = 0 and k = 1), the incidence rate ratio (IRR) of the binomial model was taken as the relative risk in analytical cohort studies and thus as the prevalence ratio (PR) in crosssectional studies, as follows (Eqs. 5 and 6) [8,9,10]:
With k = 1, in crosssectional studies, the RP estimator as follow:
The inherent bias of the OR versus the PR
The inherent bias of the OR versus the PR for the NLB model was calculated to estimate the PRs for the five outcomes for the Colombian study.
In analytical epidemiological studies, it has been shown that the odds ratio (OR) calculation is an accurate estimator of the relative risk (RR) for the cases in which the prevalence of the disease is small (p < 10%) and therefore for the prevalence ratio in crosssectional studies [5,6,7,8,9,10]. The PR is measured as the proportion of the prevalence of individuals with disease exposed to a specific factor over the proportion of the prevalence of individuals with disease without exposure (PR = p1/p2). The odds ratio is defined as a ratio of odds (odds = prevalence/(1prevalence) = p/q = p/1p) [7] and the calculation of the OR in crosssectional studies is the odds of disease in exposed compared to the odds of disease in unexposed [7].
In a crosssectional study the prevalence of the disease are taken, in the category of exposure p_{1} and in the category without exposure p_{2}, when evaluating the association with higher prevalence in the exposed than in the nonexposed (risk), it could be seen that \({p}_{2}<{p}_{1}\), thus \({(1p}_{2})> {(1p}_{1})\) and in the case of protector factor, \({p}_{1}<{p}_{2}\), thus \({(1p}_{2})< {(1p}_{1})\) , which is the inherent bias of OR versus PR, as shown in Eqs. 7 and 8 [7].
This inherent bias is controlled when the prevalence \({p}_{1}\) and \({p}_{2}\) belong to small values (p < 10%) and increases with higher prevalence of the disease or event of interest [7].
Estimator precision and confusion equations
Crude prevalence ratios (PR) and standard errors of estimation were considered as the reference measures of the associations (2 × 2 tables) in conjunction with the Mantel–Haenszel (MH) stratification method to control confounding variables.
The precision of the PR estimates of the regression models was measured compared with PR MH reference standard, as described in equation
The indicator of the percentage of confounding effect between the association of depression with the outcomes of legal and illegal psychoactive substances consumption was also measured controlling for the grouped age confounding factor, for both crude PR of the NLB and the other three models, and the reference PR value of MH model as indicated in Eq. 10.
This equation is also used to compare the standard error of estimation of the models with respect to the standard error of MH.
The PR, standard errors of estimation and 95% confidence intervals with and without robust variance adjustment of the NLB model were compared with the three models (Cox timeconstant, LogPoisson and Log binomial) and with the unconditional logistic regression model with and without robust variance adjusted by age, with both grouped and numerical variable. Estimations were performed in STATA 15.0 [11] and SPSS version 25.0 [12]. The BIC Bayesian criterion is also used to select the best model.
Results
Crosssectional study description
In the crosssectional study of mental health and psychoactive substance consumption, a probabilistic, stratified random sample with proportional allocation was designed in 5810 workers in Colombia. The age of the workers varied between 18 and 56 years, with an average of 28.2 ± 7.1 years (median = 27.0 years) and age groups of ≤ 25 years (41.9%), 26 to 29 years (24.0%), 30 to 34 years (14.0%) and 35 and over (20.2%), with a predominance of male gender (93.4%), single marital status (48.8%), followed by married (30.0%).
In this crosssectional study the consumption of psychoactive substances was measured, estimating a lifetime prevalence of cocaine consumption of 1.8% CI 95% (1.4% 2.1%), prevalence of marijuana consumption of 9.6% CI 95%:(8.8% 10.3%), prevalence of cigarette consumption of 21.3% CI 95%: (20.1% 22.4%), lifetime prevalence of alcohol intake of 85.7% CI 95%:(84.8.0% 86.6%) and risk of consumption of psychoactive substances of 96.1% CI 95%: (95.6% 96.6%).
The inherent bias of crude OR versus crude PR was increased the overestimation by higher prevalence of consumptions; for cocaine 1.2%, marijuana 7.9%, cigarette 17.5%, alcohol 133.3% and risk of consumption of psychoactive substances 233.3%, the overestimation of the association of the OR versus the PR being very high in the last two cases due to their high prevalence. The OR and PR estimators of association were different in the 5 outcomes, the overestimation of the OR being greater as the prevalence increases (Table 1).
Significant associations were identified between depressive symptoms with marijuana, cigarette, alcohol use and risk of consumption of psychoactive substances and close to significant differences with cocaine consumption, in the bivariate and multivariate analysis with MH (Table 1 y 2).
The PR, standard errors and 95% CI of the NLB model with robust variance were exactly equal to the raw values (Table 1). In the Cox and Poisson models with robust variance, the same results as NLB were found, although in the binomial regression the difference was in the risk of consuming psychoactive substances, which did not show convergence.
Comparison of the estimation of the PR and its precision with the current models: Cox model with constant time, LogPoisson, and Logbinomial
Comparison of the estimation of the PR and its precision of the Cox models with constant time, LogPoisson, and Logbinomial with adjustment for confounding variables was included considering MH as reference. The results of the association of depressive symptoms with psychoactive substance consumption, controlled for the confounding factor of worker age, were like those obtained in the bivariate analysis, when controlling for age, adjusting for a confounding effect, the confounding effects of the models with respect to MH were similar (Table 2).
The confounding control of the association of depressive symptoms with psychoactive substance use, controlling for age groups, was performed taking as reference the PR of the Mantel and Haenszel (MH) stratification method, comparing with the estimation of the PR of the three models, the differences were less than 1% in the five outcomes, using Eq. 9. The standard errors of estimation of the models showed greater precision when adjusted for robust variance in the three models, showing similar 95% confidence intervals in the five outcomes of psychoactive substance consumption in the three models compared to the 95% CIs using the MH stratification method (Table 2).
Estimation and precision of the PR with the negative Logbinomial model
The estimated prevalence ratio of the association between depressive symptoms with the different prevalence of psychoactive substance consumption in the study with the negative logbinomial model with and without robust variance adjustment were equal to the crude PR (Table 1).
The prevalence ratio of the association of depressive symptoms controlling for confounding variable by grouped age with the five outcomes showed very high precision with respect to the MH PR, measured with the Eq. 10, with very small percentage of difference of the NLB PR respecting to MH method, being for lifetime cocaine consumption 0.61%, for marijuana 1.07%, for cigarette 1.56%, for lifetime alcohol 0.16% and Risk of consumption of psychoactive substances 0%.
The standard errors of estimation of the NLB model showed greater precision when adjusted with robust variance, showing similar standard errors and 95% confidence intervals for the five outcomes, compared with 95% CIs using the MH stratification method. Using the Bayesian BIC indicator, the NLB showed to be the best model with smaller values than the other models with robust variance. (Fig. 1, Table 2).
The standard errors of PR estimation showed high precision in the robust Cox/Poisson and NLB models with respect to MH, for the 5 consumption outcomes (a small difference in cocaine consumption), with the robust binomial model. showed very large differences in prevalence (85.7% and 96.1%), the standard error of estimation being higher in alcohol consumption and no convergence was found to calculate the risk of consumption of psychoactive substances estimator (Fig. 1).
In the estimation of the prevalence ratio of the association of depressive symptoms with the five psychoactive substance consumption outcomes, controlling for confounding variable by age measured with a numerical scale with robust variance adjustment, the estimated PR,
the standard errors of estimation and its 95% CI were very similar among the de Cox/Poisson and NLB and in the prevalence of alcohol consumption, which was high (85.7%), the robust log binomial model showed lower precision with a higher standard error of estimate compared to the other models with robust variance and in the highest prevalence of the 5 outcomes (96.1%), no convergence was found to calculate the estimators, neither when age was adjusted categorically, nor numerically (Tables 2 and 3). being higher as the prevalence increases in the psychoactive substance outcomes of the study (Table 3, Fig. 2).
Discussion
The unconditional binary logistic regression model has been used for the construction of the explanatory associative models of the events of interest in crosssectional studies of different epidemiological investigations [3,4,5]. However, this model estimates ORs generating overestimation of the PRs, this bias being greater as the prevalence of the disease or event of interest increases, as was shown in this study in the association between depressive symptoms and the five prevalence of psychoactive substance consumption and risk. The prevalence of psychoactive substance consumption ranged from 1.8% to 96.1%, showing an increment in the inherent bias in the association between depressive symptoms and psychoactive substance consumption as the prevalence of intake increased.
In estimated prevalence less than 10%, it is expected that the inherent overestimation bias of OR versus PR was minimal in the cases of smaller prevalence of the disease [6]. In this study it was shown with the prevalence of cocaine consumption which was 1.8%, which was associated with depressive symptoms, the estimates were very similar between OR and PR (OR = 3. 337, 95%CI: 0.819,13.600 vs. PR = 3.294, 95%CI: 0.815 vs. 13.315), and when controlling for confounding factor by grouped age with the NLB model with robust variance and with MH (OR = 2.965, 95%CI: 0.725, 12.121 vs. PR = 2.931, 95%CI: 0.723, 11.889) and was very accurate controlling for numerical age as shown in Table 3.
In studies since 2003 and 2008, several explanatory models were proposed for the estimation of the PR and solved the overestimation of the OR of binary logistic regression models [3, 4]. Among the analyzed models, Cox constant time model of regression for proportional bias and two generalized linear models (Poisson regression and Logbinomial regression) were proposed by the authors using applied research examples carried out in STATA version 7.0 and 9.0. Thus, in order to compare our proposed method of NLB, we included these three previous models for the estimation of PRs an 95% CIs with and without robust variance, as a validation for our results with the NLB model. In previous studies, Cox constant time model, LogPoisson regression, and Logbinomial regression were used to estimate PRs, giving the same results as those obtained for the crude PR and 95% CI in 2 × 2 tables and good accuracy in controlling for confounding variables when compared to PRs obtained with the MH stratification method [3, 4].
In this study we found concordant results with the 2003 and 2008 studies, in the crude estimates and in the control of confounding variables with the PR and their 95% CIs, as well as in the respective standard errors, in which the joint conclusion of these studies is that Cox constant time model and LogPoisson regression with robust variance give as a result really accurate estimates for the PRs and its standard errors for the estimations in crosssectional studies [3, 4, 13,14,15,16,17], whereas the Log binomial model without robust variance adjustment showed accurate estimates at intermediate and even small prevalence [17,18,19], as it was visualized with our results for the case of cocaine and marijuana consumption. However, it is important to highlight that for the case of the Logbinomial model adjusted with robust variance in small prevalence < 10%, such as for cocaine and marijuana consumption, the standard errors were larger than those without adjustment, in contrast for what it was expected with the adjustment of robust variance for the three models, including the proposed negative log binomial model additionally, this model showed greater precision with lower estimation standard errors with respect to MH, than the other models with robust variance.
In the previous studies, issues related with the estimations and mathematical models were identified for the three methods. In the Logbinomial generalized linear model, due to nonconvergence in the outcomes with very high prevalence, outcomes in the estimates of the PR and in its standard errors for this GLM model are performed with the maximum likelihood estimation method using iterative methods to reach estimates with convergence tolerances with epsilon (ε) less than 0.000001. In our study, the log binomial model with robust variance estimated a very high standard error compared to high prevalence such as alcohol, compared to the Cox/Poisson and NLB models with robust variance, and without convergence to obtain the PR estimators, for the risk of consumption of psychoactive substances, which has a high prevalence of 96.1%.Also, the nonconvergence in the numerical confounding variables was solved, controlled with the abovementioned estimation and optimization methods, available in the statistical software such as STATA version 15 and later, in high prevalence as in our study that found convergence for the prevalence of alcohol of 85.7%, although this was not the case for very high prevalence, such as the risk of consumption of psychoactive substances of 96.1%, which did not generate convergence to obtain the estimates of the PR [11].
For the overdispersion (extraPoisson variance), the adjustment with robust variance proposed by Lin and Wei [18] was the alternative for the issues in the Poisson model for the crosssectional studies as it was shown previously [3, 4]. However, even after the adjustment of the estimates of the Poisson model with the robust variance, it still shows low efficiency with a higher sampling variability than that required for the model estimators [19]. In our study it was observed that using the LogPoisson model with robust variance it was found an adjustment in the accuracy of the standard errors and therefore to the 95% CI, finding the same results as in the Cox constant time regression model with robust variance adjustment, which were very concordant with the NLB and logbinomial model with robust variance for the estimation of PRs in prevalence that are not high, in crosssectional studies.
The negative Logbinomial (NLB) regression model, which was the GLM proposed model in this study, showed accuracy with respect to the crude values of PR, standard errors and 95% CI, with the adjustment of robust variance for the standard errors’ estimations. In the case of adjustment and control of confounding variable of grouped age, results showed a very high precision with those obtained with the MH stratification method, as well as for the estimations of PR, standard errors and 95% CI with the Cox constant time model, LogPoisson, and Logbinomial regression models when adjusted with robust variance. Very concordant results were obtained for the associations between depressive symptoms with the five psychoactive substances consumption, controlling for the numerical age, for the estimates obtained with the three available models with the adjustment for robust variance.
The NLB model with robust variance adjustment is optimal in epidemiological crosssectional studies through surveys and in some cases with clinical approaches, to estimate the prevalence of a particular disease or event of interest and its associated factors through the multivariate NLB model with robust variance adjustment, without bias of overestimation in the associations and with high accuracy for the estimates. It allows the construction of associative models to identify the groups with the highest risk of the disease or event of interest within a population. The use of NLB model could also be applied for the development of prevention and control programs for specific conditions, and in the future to decrease the prevalence of those particular conditions. Also, it could impact policy makers and decisionmaking for the control or followup of the conditions of interest within a population with high prevalence values.
Conclusions
The negative logbinomial generalized linear model with robust variance is an optimal multivariate model for the construction of explanatory associated factors of disease or binary events of interest in crosssectional studies, generating estimates with very high precision of the prevalence ratio, standard errors of estimation and confidence intervals, when adjusting for categorical and numerical confounding variables. This NLB model is mathematically constructed to identify the variance that could not be explained by the Poisson model in cases of overdispersion, and thus, NLB model is proposed as an alternative for those cases of overdispersion. Finally, the NLB model does not present convergence issues in the estimates of the PR and the standard errors of PR estimation, in large or small prevalence in crosssectional studies. NLB model is proposed as a novel alternative for the analyses of prevalence ratios in crosssectional studies, independent of high or low prevalence of the disease or the event of concern and with greater precision than the other models with robust variance with respect to MH and with the BIC Bayesian indicator, as it was shown in the current data set of this crosssectional study.
Availability of data and materials
The data of the database used is available under request to the authors (basefinal.sav) due to confidentiality issues.
Abbreviations
 CI:

Confidence interval
 GLM:

Generalized linear model.
 IRLS:

Iteratively reweighted least squares
 IRR:

Incidence rate ratio
 MH:

Mantel Haenszel stratification model
 NLB:

Negative Logbinomial regression model
 OR:

Odds ratio
 PR:

Prevalence ratio
References
VelascoMondragón HE, Hernández B. Encuestas transversales Rev Cubana Hig Epidemiol. 2007;45:447–55.
Ibáñez PM. Mentefactos conceptuales como estrategia didácticopedagógica de los conceptos básicos de la teoría de muestreo aplicados en investigación en salud. Rev Ciencias la Salud. 2006;4:62–72.
Barros AJ, Hirakata VN. Alternatives for logistic regression in crosssectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol. 2003;3:21.
Coutinho LMS, Scazufca M, Menezes PR. Methods for estimating prevalence ratios in crosssectional studies. Rev Saude Publica. 2008;42:992–8.
Thompson ML, Myers JE, Kriebel D. Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done? Occup Environ Med. 1998;55:272–7.
Navarro A, Utzet F, Puig P, Caminal J, Martín M. La distribución binomial negativa frente a la de Poisson en el análisis de fenómenos recurrentes. Gac Sanit. 2001;15:447–52.
Szklo M, Nieto FJ. Epidemiología intermedia: conceptos y aplicaciones. Madrid  España: Díaz de Santos; 2003.
Vittinghoff E, Glidden DV, Shiboski SC, McCulloch CE. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. Boston, MA: Springer, US; 2012.
AlcaideDelgado M. Modelo de regresión binomial negativa. Facultad de matemáticas Departamento de estadística e investigación operativa: Universidad de Sevilla; 2015.
Lawless JF. Negative Binomial and Mixed Poisson Regression. Can J Stat / La Rev Can Stat. 1987;15:209–25.
StataCorp. Stata statistical software: release 15. College Station: StataCorp LLC; 2017.
IBM Corp. Released 2017. IBM SPSS Statistics for Windows, Version 25.0. Armonk: IBM Corp.
Kleinbaum DG, Klein M. Survival Analysis: A SelfLearning Text. 3rd edition. New York, NY: Springer New York; 2012.
Huber PJ. The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Statistics. 1967;1(1):221–33.
Wacholder S. Binomial regression in glim: estimating risk ratios and risk differences. Am J Epidemiol. 1986;123:174–84.
Hilbe JM. Modeling Count Data. Cambridge: Cambridge University Press; 2014.
Petersen MR, Deddens JA. A comparison of two methods for estimating prevalence ratios. BMC Med Res Methodol. 2008;8:9.
Lin DY, Wei LJ. The robust Inference for the Cox Proportional Hazards Model. J Am Stat Assoc. 1988;84:1074–8.
Allison P. Logistic regression using the SAS system: theory and application. Cary (NC: SAS Institute; 2001.
Acknowledgements
Special acknowledgements to the researchers in epidemiology and public health from Colombia.
Funding
This study was carried out with own resources.
Author information
Authors and Affiliations
Contributions
MIP, SVN and NOG participated in the study design. MIP processed and analyzed the results of the investigation of the study models and wrote the article. SVN and NOG participated in the construction of the article, in the materials and methods, and in the discussion of the results. MIP and NOG wrote and revised the manuscript. All authors read and approved the final document.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Data from a secondary source of a study conducted in Colombia were used, keeping confidentiality. Study was approved by the Ethics Committee of the Colombian National Police for the study “Segundo estudio de Salud Mental en trabajadores de la Policía Nacional, 2010–2011”. Informed consent and confidentiality letter was signed by the participants for the fulfilment of the study. This study did not represent risk for the participants and was aligned with Helsinki principles.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
IbáñezPinilla, M., VillalbaNiño, S. & OlayaGalán, N.N. Negative logbinomial model with optimal robust variance to estimate the prevalence ratio, in crosssectional population studies. BMC Med Res Methodol 23, 219 (2023). https://doi.org/10.1186/s12874023019991
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874023019991