A major advantage of the odds ratio (OR) is that it can be estimated for all study designs. However, investigators should avoid interpreting odds ratios as approximations of prevalence ratios (PR) when the prevalence of the event of interest is high (greater than 10%). In such situations, the odds ratio lies further from the null value of 1 than the prevalence ratio and therefore overstates the strength of the association. The importance of the differences between the interpretation of the OR and that of the PR/RR, particularly when prevalence is high, has been discussed by others [3, 11, 16].
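A small numerical sketch of this point, using hypothetical prevalences chosen only for illustration: both pairs below have a prevalence ratio of 2, but the odds ratio is close to 2 only when the event is rare.

```python
def odds(p):
    """Convert a probability (prevalence) into odds."""
    return p / (1 - p)

def compare(p_exposed, p_unexposed):
    """Return (PR, OR) for hypothetical prevalences in exposed and unexposed groups."""
    pr = p_exposed / p_unexposed
    or_ = odds(p_exposed) / odds(p_unexposed)
    return pr, or_

# Low prevalence (rare event) versus high prevalence, same PR in both cases
for p1, p0 in [(0.04, 0.02), (0.40, 0.20)]:
    pr, or_ = compare(p1, p0)
    print(f"prevalences {p1:.2f} vs {p0:.2f}: PR = {pr:.3f}, OR = {or_:.3f}")
```

With the rare event the two measures nearly coincide; at 40% versus 20% the OR is 8/3 while the PR stays at 2.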
If the adjusted prevalence ratio (PR) is the measure of interest, logistic regression is one of the approaches that can be used to estimate it [10, 16]. However, the choice of standardization procedure may affect the point estimates and, most importantly, their interpretation. To our knowledge, few reports have discussed the implications of the choice of standardization for the interpretation of the PR in the context of logistic regression. The most recent effort to address this issue was made by Localio and colleagues (2007), who linked the standardization procedure to the question of interest. In contrast to the OR, which in a logistic model without interaction terms does not depend on the values of the other covariates, the PR calculated from a logistic regression depends on the levels at which the covariates in the model are fixed. Thus, a clear interpretation of the PR requires stating the reference values used in the computational procedure.
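To make the dependence on covariate values concrete, the Python sketch below contrasts a PR computed with a covariate fixed at a chosen reference value (conditional standardization) with a PR from marginal standardization, which averages the predicted probabilities over the sample with exposure set to 1 and then to 0. The coefficients `b0`, `b1`, `b2` and the covariate values `z` are hypothetical, not estimates from this study.

```python
import numpy as np

def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical fitted coefficients for logit P(Y=1) = b0 + b1*exposure + b2*z
b0, b1, b2 = -1.0, 0.8, 0.5

# Hypothetical covariate values for the study sample
z = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])

def pr_conditional(z_ref):
    """PR with the covariate fixed at a chosen reference value z_ref."""
    p1 = expit(b0 + b1 + b2 * z_ref)
    p0 = expit(b0 + b2 * z_ref)
    return p1 / p0

def pr_marginal(z_values):
    """Marginal standardization: mean predicted risk with everyone exposed
    divided by mean predicted risk with everyone unexposed."""
    p1 = expit(b0 + b1 + b2 * z_values).mean()
    p0 = expit(b0 + b2 * z_values).mean()
    return p1 / p0

or_ = np.exp(b1)  # the OR is the same for every value of z in this model
print(f"OR = {or_:.3f}")
print(f"PR at z = mean(z): {pr_conditional(z.mean()):.3f}")
print(f"PR at z = 0:       {pr_conditional(0.0):.3f}")
print(f"PR at z = 2:       {pr_conditional(2.0):.3f}")
print(f"marginal PR:       {pr_marginal(z):.3f}")
```

In this hypothetical model every PR is smaller than the OR, and the PR shrinks toward 1 as the reference value of `z` raises the baseline risk, which is why the reference values must be reported alongside the estimate.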
There is also no consensus about the best way to interpret regression coefficients in the context of random effects models. Some authors interpret the fixed regression coefficients as in the usual logistic regression model, conditional on the random effects [27–29]. When the source of heterogeneity is modeled explicitly in a logistic regression with random effects, the fixed regression parameters should be interpreted as the effects of covariates on a typical subject in the study [30, 44]. Thus, to illustrate with our application on the impact of ivermectin on the prevalence of Trichuris infection, the PR estimated from the logistic model with random effects is the ratio of the probability that a given child has Trichuris infection if he/she receives ivermectin to the probability that the same child has Trichuris infection if he/she does not receive treatment. In this way, the PR is adjusted for unobserved individual characteristics.
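The subject-specific reading can be sketched as follows, assuming a random-intercept logistic model logit P(Y=1 | u) = b0 + b1·treated + u with u ~ N(0, σ²). All parameter values are hypothetical and are not the estimates from the ivermectin application. The conditional OR is the same for every subject, but the conditional PR depends on the subject's random effect u, so the "typical subject" (u = 0) must be stated explicitly.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical parameters of a random-intercept logistic model
b0 = -0.5             # intercept
b1 = math.log(0.5)    # treatment log-odds ratio (conditional OR = 0.5)
sigma = 1.0           # standard deviation of the random intercept u

def conditional_pr(u):
    """PR for a subject with random effect u: P(Y=1 | treated, u) / P(Y=1 | untreated, u)."""
    return expit(b0 + b1 + u) / expit(b0 + u)

for label, u in [("typical subject (u = 0)", 0.0),
                 ("lower-risk subject (u = -sigma)", -sigma),
                 ("higher-risk subject (u = +sigma)", sigma)]:
    print(f"{label}: PR = {conditional_pr(u):.3f}")
```

For a protective effect the PR always lies between the conditional OR and 1, moving closer to 1 for subjects with higher baseline risk.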
Alternatively, population-averaged estimates of the regression coefficients can be obtained using the approximate formulae suggested by Zeger and colleagues (1988), which are interpreted in terms of the response averaged over the population. In some situations, however, the subject-specific interpretation is of more interest than the average effect over the population as a whole. Another approach was proposed by Larsen and colleagues (2000), who discussed the interpretation of both fixed and random effects parameters in logistic regression with random effects. They proposed a measure called the median odds ratio (MOR), which expresses the between-cluster variance on the odds ratio scale, to take into account the fact that, in practice, conditioning on the random effects is unrealistic because the random effects are unobservable.
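A minimal sketch of both quantities for a random-intercept logistic model with variance σ². It uses the widely cited Zeger, Liang and Albert approximation β_PA ≈ β_SS / √(1 + c²σ²) with c = 16√3 / (15π), and the usual MOR formula exp(√(2σ²) · Φ⁻¹(0.75)). The input values `beta_ss` and `sigma2` are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical subject-specific log-odds ratio and random-intercept variance
beta_ss = math.log(0.5)
sigma2 = 1.0

# Population-averaged approximation: attenuate the subject-specific coefficient
c = 16 * math.sqrt(3) / (15 * math.pi)
beta_pa = beta_ss / math.sqrt(1 + c**2 * sigma2)

# Median odds ratio computed from the random-intercept variance
mor = math.exp(math.sqrt(2 * sigma2) * NormalDist().inv_cdf(0.75))

print(f"subject-specific OR:      {math.exp(beta_ss):.3f}")
print(f"population-averaged OR ~= {math.exp(beta_pa):.3f}")
print(f"median odds ratio:        {mor:.3f}")
```

The population-averaged coefficient is pulled toward 0, so the population-averaged OR is closer to 1 than the subject-specific OR; the MOR depends only on σ² and equals 1 when there is no between-cluster variation.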
Confidence intervals for the PR estimated by logistic regression should be obtained with appropriate approaches, such as the delta method or the bootstrap. Other methods discussed in the literature, such as the substitution method, have been shown to have theoretical limitations that lead to unsatisfactory statistical performance [10, 16]. The use of the delta and bootstrap methods has been discussed in the literature for situations in which the observations are uncorrelated; in such cases, the two methods appear to perform equivalently.
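A minimal sketch of both interval methods for independent observations, assuming a logistic model with an intercept, a binary exposure and one binary covariate, with the covariate fixed at 0 as the reference profile. The data are simulated, and `fit_logistic` is a small hypothetical Newton-Raphson routine included only to keep the example self-contained; in practice a standard GLM routine would be used. The delta method relies on the derivative of log expit(η), which is 1 − expit(η); the bootstrap resamples individuals and takes percentile limits.

```python
import numpy as np

rng = np.random.default_rng(2024)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns coefficients and
    their estimated covariance (inverse Fisher information)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        info = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = expit(X @ beta)
    info = X.T @ (X * (p * (1 - p))[:, None])
    return beta, np.linalg.inv(info)

def log_pr(beta, x1, x0):
    """log PR comparing covariate profile x1 (exposed) with x0 (unexposed)."""
    return np.log(expit(x1 @ beta)) - np.log(expit(x0 @ beta))

# Hypothetical simulated data: intercept, exposure, binary covariate
n = 800
exposure = rng.integers(0, 2, n)
covariate = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), exposure, covariate]).astype(float)
y = rng.binomial(1, expit(-0.7 + 0.9 * exposure + 0.4 * covariate))

# Reference profile: covariate fixed at 0 (a standardization choice)
x1 = np.array([1.0, 1.0, 0.0])
x0 = np.array([1.0, 0.0, 0.0])

beta, cov = fit_logistic(X, y)
est = log_pr(beta, x1, x0)
pr = np.exp(est)

# Delta method on the log scale, then back-transform
grad = (1 - expit(x1 @ beta)) * x1 - (1 - expit(x0 @ beta)) * x0
se = np.sqrt(grad @ cov @ grad)
delta_ci = np.exp([est - 1.96 * se, est + 1.96 * se])

# Nonparametric bootstrap: resample individuals with replacement and refit
boot = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    b, _unused_cov = fit_logistic(X[idx], y[idx])
    boot.append(log_pr(b, x1, x0))
boot_ci = np.exp(np.percentile(boot, [2.5, 97.5]))

print(f"PR = {pr:.3f}")
print(f"delta 95% CI:     ({delta_ci[0]:.3f}, {delta_ci[1]:.3f})")
print(f"bootstrap 95% CI: ({boot_ci[0]:.3f}, {boot_ci[1]:.3f})")
```

Computing the delta-method interval on the log scale and exponentiating keeps the limits positive, which is a common choice for ratio measures.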
Other model-based approaches commonly used to estimate the PR are the Poisson and log-binomial models [3, 10, 16–18]. The main advantage of these methods is that they estimate the PR and its confidence intervals directly. At the same time, both models can present estimation problems because the log link does not keep predicted probabilities within the interval [0, 1]: the Poisson model may predict probabilities greater than 1, and the log-binomial model, which must restrict its parameters to avoid this, may fail to converge. There is no consensus about the best model-based approach for estimating the PR. Barros and Hirakata (2003) suggested that more than one modeling strategy should be used to evaluate the robustness of the results. A shortcoming of this strategy is that different models imply different relationships between the outcome and the covariates, even when the same covariates are included. Furthermore, the identification of interaction effects may differ across models, because interaction is assessed on the log scale in the Poisson and log-binomial models and on the logit scale in logistic regression.
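The scale dependence of interaction and the range problem of log-link models can both be illustrated with simple arithmetic. The prevalences below are hypothetical and represent a purely multiplicative (log-scale) model for two binary factors A and B; no model is fitted here.

```python
# Hypothetical prevalences p[(a, b)] for two binary factors A and B
p = {(0, 0): 0.2, (1, 0): 0.4, (0, 1): 0.4, (1, 1): 0.8}

def odds(x):
    return x / (1 - x)

# Interaction on the multiplicative (log) scale, as in Poisson/log-binomial models
pr_interaction = (p[1, 1] / p[0, 0]) / ((p[1, 0] / p[0, 0]) * (p[0, 1] / p[0, 0]))

# Interaction on the odds (logit) scale, as in logistic regression
or_interaction = (odds(p[1, 1]) / odds(p[0, 0])) / (
    (odds(p[1, 0]) / odds(p[0, 0])) * (odds(p[0, 1]) / odds(p[0, 0]))
)

print(f"PR interaction ratio: {pr_interaction:.3f}")  # 1.000: none on the log scale
print(f"OR interaction ratio: {or_interaction:.3f}")  # 2.250: present on the logit scale

# The same multiplicative effects with a higher baseline prevalence
# imply an impossible value for the doubly exposed group
baseline = 0.3
print(f"implied prevalence: {baseline * 2 * 2:.2f}")  # 1.20, outside [0, 1]
```

The same data therefore show no interaction under a log-link model but a clear interaction under logistic regression.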
Previous discussions about the estimation of PRs have all been conducted in the context of independent observations. In this paper we have extended this discussion to studies with clustered designs, in which the dependence between observations is taken into account. We used random effects logistic models to deal with intracluster correlation. We evaluated the performance of methods for defining confidence intervals through simulation studies with several levels of correlation between observations in the same cluster. For the scenarios considered here, the delta method outperformed the clustered bootstrap method when the number of clusters was small. However, when both the size and the number of clusters were large, the two methods performed equivalently. We also observed poorer performance of the Poisson model with random effects, especially as the level of clustering and the number of clusters increased, and there were convergence problems when the number of clusters was small.
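A common form of the clustered bootstrap resamples whole clusters rather than individuals, so that the within-cluster correlation is carried into each bootstrap sample. The sketch below shows only that resampling scheme. It uses a crude PR as a hypothetical stand-in estimator; the approach described above would instead refit the random-effects logistic model in each replicate, which requires a mixed-model fitting routine not shown here. The data, cluster sizes and effect sizes are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical clustered data: K clusters of m subjects with random intercepts u_k
K, m, sigma = 30, 20, 1.0
cluster = np.repeat(np.arange(K), m)
u = rng.normal(0.0, sigma, K)
treated = rng.integers(0, 2, K * m)
y = rng.binomial(1, expit(-0.5 - 0.7 * treated + u[cluster]))

def crude_pr(y, treated):
    """Stand-in estimator: crude prevalence ratio, treated versus untreated."""
    return y[treated == 1].mean() / y[treated == 0].mean()

def cluster_bootstrap(y, treated, cluster, estimator, n_boot=1000):
    """Resample whole clusters with replacement and apply the estimator
    to the rows of the selected clusters."""
    ids = np.unique(cluster)
    rows_by_cluster = [np.flatnonzero(cluster == k) for k in ids]
    stats = []
    for _ in range(n_boot):
        chosen = rng.choice(len(ids), size=len(ids), replace=True)
        rows = np.concatenate([rows_by_cluster[j] for j in chosen])
        stats.append(estimator(y[rows], treated[rows]))
    return np.array(stats)

est = crude_pr(y, treated)
boot = cluster_bootstrap(y, treated, cluster, crude_pr)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"crude PR = {est:.3f}, cluster-bootstrap 95% CI = ({lo:.3f}, {hi:.3f})")
```

Because each replicate contains only as many distinct clusters as were sampled, the bootstrap distribution can be unstable when the number of clusters is small.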