The Brazilian Multicenter Study on Preterm Birth consisted of a multicenter cross-sectional study plus a nested case–control study designed to assess the factors associated with preterm birth, implemented in referral obstetrical units (clusters) in several states of the country. The full research proposal has been published elsewhere [11].
Single-stage cluster sampling was used. Clusters were selected by inviting 27 healthcare institutions that form a national network called the Brazilian Network for Studies on Reproductive and Perinatal Health. They are located in the five geographical regions of the country, almost all of them are public institutions, and all of them receive both low- and high-risk pregnant women. Initially 26 centers agreed to participate, but only 20 institutions were able to take part fully in the study.
The sample size was calculated using the official prevalence of preterm births in Brazil of around 6.5% [12]. Considering an acceptable absolute difference of about 0.25 percentage points between the sample and the population prevalence, and a type I error of 5%, initial surveillance of approximately 37,000 deliveries was necessary. For the case–control study component, the estimated sample size was 1,055 women in each group (cases and controls). The total number of preterm births estimated to be followed in both components of the study was around 3,600.
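The surveillance sample size can be checked with a short calculation. This is a minimal sketch that assumes the usual normal-approximation formula for estimating a proportion, n = z²·p·(1 − p)/d². The text does not state which formula was used, and the function name `sample_size_for_prevalence` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_prevalence(p, d, alpha=0.05):
    """Sample size needed to estimate a prevalence p within +/- d.

    Assumption: normal-approximation formula n = z^2 * p * (1 - p) / d^2,
    with z the two-sided critical value for type I error alpha.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


# Values taken from the text: prevalence 6.5%, absolute difference 0.25%.
n_deliveries = sample_size_for_prevalence(p=0.065, d=0.0025)
print(n_deliveries)
```

Under these assumptions the result is a little over 37,000 deliveries, in line with the figure reported above.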
The participating centers performed prospective surveillance of all patients admitted to give birth in order to identify preterm births. For this purpose, and according to standard international definitions, preterm birth was defined as birth occurring before 37 completed weeks of gestation, with gestational age estimated by an ultrasound scan performed early in pregnancy, by a known date of the last menstrual period, or alternatively by evaluation of the somatic age of the newborn. During the first months of the study, in order to complete the sample for the appropriate analysis of the factors associated with spontaneous preterm birth, a random sample of women who had a full-term birth was also selected.
Data were collected over six to twelve months at each center, from April 2011 to March 2012, using a detailed form called the “Questionnaire”, which included 306 variables from four sources: an interview with the woman in the postpartum period, the mother's medical records and prenatal chart (before hospital discharge), and the newborn's medical records (within sixty days after birth, even if the newborn remained in hospital for a longer period). An electronic data entry system, OpenClinica®, was selected, and a specific clinical research form (CRF) was designed for data input after the questionnaire for each case had been completed and reviewed.
High-quality data and reliable information were ensured through several steps: preparatory meetings, development of detailed manuals of operation, technical monitoring site visits to the centers, close monitoring of data collection and data entry, concurrent query management, checking for logical inconsistencies, and correction of the database. The research proposal was first approved by the Institutional Review Board (IRB) of the coordinating center and then confirmed by the IRB of each of the other participating centers.
Data analysis
In this study, each of the 20 participating centers (hospitals) was considered a primary sampling unit (PSU), and there was no stratification of the PSUs or weighting of the data. The subject (unit of analysis) was the woman who delivered preterm (case) or at term (control).
The prevalence (categorical variables) or mean (continuous numeric variables), the intracluster correlation coefficient (ICC), their respective 95% confidence intervals (CI), the design effect (Deff) and the mean cluster size were estimated for each variable. Analyses were performed with SPSS® version 20.0 [13] and Stata version 7.0 [14], taking the cluster sampling plan (centers) into account.
According to Kish [2], the ICC (roh) is $\rho = (s_a^2 - s_b^2/b)/\hat{s}^2$, where $s_a^2$ is the variance between clusters, $s_b^2$ is the variance within clusters, $b$ is the size of the clusters and $\hat{s}^2$ is the estimate of $S^2$ (the variance at the individual level). The estimate $\hat{s}^2$ is obtained by $\hat{s}^2 = s_a^2 + [(b - 1)/b]\,s_b^2$. Stata's equivalent computing formula for the ICC [14] is $\mathrm{ICC} = [(F - 1)a/n]/[1 + (F - 1)a/n]$, where $F$ is Snedecor's F-value from the ANOVA table and $a$ is the number of groups. The variance estimate for the ICC is obtained from an extensive asymptotic formula and is therefore not shown here.
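The two ICC expressions can be compared numerically. The following is a minimal sketch, not the SPSS or Stata computation used in the study. It assumes equal-sized clusters, takes the between-cluster variance as the sample variance of the cluster means, and takes n as the total number of observations. The function names and the toy data are hypothetical.

```python
from statistics import mean, variance


def icc_kish(clusters):
    """Kish's roh, assuming all clusters have the same size b."""
    b = len(clusters[0])
    if any(len(c) != b for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    s2_a = variance([mean(c) for c in clusters])  # variance between clusters
    s2_b = mean([variance(c) for c in clusters])  # pooled variance within clusters
    s2_hat = s2_a + (b - 1) / b * s2_b            # individual-level variance estimate
    return (s2_a - s2_b / b) / s2_hat


def icc_from_f(clusters):
    """ICC from the one-way ANOVA F-value: [(F-1)a/n] / [1 + (F-1)a/n]."""
    a = len(clusters)                             # number of groups
    n = sum(len(c) for c in clusters)             # total number of observations
    grand = mean([x for c in clusters for x in c])
    msb = sum(len(c) * (mean(c) - grand) ** 2 for c in clusters) / (a - 1)
    msw = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (n - a)
    t = (msb / msw - 1) * a / n
    return t / (1 + t)


# Hypothetical toy data: a binary outcome (1 = preterm) in three clusters of four.
toy = [[1, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(icc_kish(toy), icc_from_f(toy))
```

With equal cluster sizes the two functions return the same value, because the between-cluster mean square equals b times the variance of the cluster means. With unequal cluster sizes the expressions need an adjusted cluster size, which this sketch does not cover.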
For this study, the design effect (Deff) [2] is $\mathrm{Deff} = \mathrm{var}_{\mathrm{actual}}(r)/\mathrm{var}_{\mathrm{SRS}}(r) = (s_a^2/a)/(s^2/n)$, where $\mathrm{var}_{\mathrm{actual}}(r)$ is the estimated variance according to the complex design being studied and $\mathrm{var}_{\mathrm{SRS}}(r)$ is the variance of the estimator as if it had been calculated using a simple random sample (SRS) of the same size, $n$.
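The design effect ratio can be illustrated on a small example. This is a minimal sketch under stated assumptions: clusters of equal size b, a clusters, n = a·b observations, and the individual-level variance taken as the estimate ŝ² = s²ₐ + [(b − 1)/b]·s²_b. Under those assumptions the ratio coincides with Kish's well-known relation Deff = 1 + (b − 1)·roh, which the code uses as a cross-check. The function name and the toy data are hypothetical.

```python
from statistics import mean, variance


def design_effect(clusters):
    """Return (Deff as a variance ratio, 1 + (b - 1) * roh) for equal-sized clusters."""
    a = len(clusters)                             # number of clusters
    b = len(clusters[0])                          # common cluster size
    if any(len(c) != b for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    n = a * b                                     # total sample size
    s2_a = variance([mean(c) for c in clusters])  # variance between clusters
    s2_b = mean([variance(c) for c in clusters])  # pooled variance within clusters
    s2_hat = s2_a + (b - 1) / b * s2_b            # assumption: used as s^2 in the ratio
    var_actual = s2_a / a                         # variance under the cluster design
    var_srs = s2_hat / n                          # variance under an SRS of size n
    roh = (s2_a - s2_b / b) / s2_hat              # Kish's intracluster correlation
    return var_actual / var_srs, 1 + (b - 1) * roh


# Hypothetical toy data: a binary outcome (1 = preterm) in three clusters of four.
toy_clusters = [[1, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
deff, deff_check = design_effect(toy_clusters)
print(deff, deff_check)
```

A Deff above 1 means the cluster sample is less precise than a simple random sample of the same size, so the effective sample size is n/Deff.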