Estimation of gestational age in early pregnancy from crown-rump length when gestational age range is truncated: the case study of the INTERGROWTH-21st Project

Background Fetal ultrasound scanning is considered vital for routine antenatal care, with first trimester scans recommended for accurate estimation of gestational age (GA). A reliable estimate of gestational age is key information underpinning clinical care and allows estimation of the expected date of delivery. Fetal crown-rump length (CRL) is recommended over last menstrual period for estimating GA when measured in early pregnancy, i.e. 9+0-13+6 weeks.
Methods The INTERGROWTH-21st Project is the largest prospective study to collect data on CRL in geographically diverse populations and with a high level of quality control measures in place. We aim to develop a new gestational age estimation equation based on the crown-rump length (CRL) from women recruited between 9+0-13+6 weeks. The main statistical challenge is modelling data when the outcome variable (GA) is truncated at both ends, i.e. at 9 and 14 weeks. We explored three alternative statistical approaches to overcome the truncation of GA. To evaluate these strategies we generated a data set with no truncation of GA that was similar to the INTERGROWTH-21st Project CRL data, which we used to explore the performance of different methods of analysis of these data when we imposed truncation at 9 and 14 weeks of gestation. These 3 methods were first tested in a simulation-based study using a previously published dating equation by Verburg et al., and we evaluated how well each of them performed in relation to the model from which the data were generated. After evaluating the 3 approaches using simulated data based on the Verburg equations, the best approach will be applied to the INTERGROWTH-21st Project data to estimate GA from CRL.
Results Results of these rather “ad hoc” statistical methods correspond very closely to the “real data” for Verburg, a data set that is similar to the INTERGROWTH-21st project CRL data set.
Conclusions We are confident that we can use these approaches to get reliable estimates based on INTERGROWTH-21st Project CRL data. These approaches may be a solution to other truncation problems involving similar data though their application to other settings would need to be evaluated.


Background
Fetal ultrasound scanning is considered an essential part of routine antenatal care, with first trimester scans recommended for confirming viability, accurate estimation of gestational age and determining the number of fetuses [1,2]. Fetal crown-rump length (CRL) is measured in early pregnancy primarily to determine the gestational age (GA) of a fetus and is most reliable between 9+0 and 13+6 weeks' gestation, but not beyond [3]. Assessment of gestational age based on ultrasound (US) biometry was first introduced in 1969 by Campbell [4], and it has become the preferred method for dating pregnancy.
A reliable estimate of gestational age is key information as it underpins clinical care and allows estimation of the expected date of delivery. There are 3 ways to estimate gestational age early in pregnancy: a) based on a reliable first day of the last menstrual period (LMP) alone; b) based on an early (9 +0 to 13 +6 weeks) ultrasound alone, or c) LMP and ultrasound combined. Use of LMP is based on the assumption that pregnancy has a constant duration from the first day of the LMP with ovulation on the 14 th day [3]. This method of dating pregnancies, even for women whose menstrual history is certain, has been shown to be unreliable [5,6]. Caution is recommended regarding use of last menstrual period (LMP) alone for dating because up to 50% of women are uncertain of their dates, have an irregular cycle, have recently stopped the oral contraceptive pill, are lactating or did not have a normal last menstrual period [7].
The National Institute for Health and Care Excellence (NICE) Guideline for Routine Antenatal Care (2008) and International Society of Ultrasound in Obstetrics and Gynaecology (ISUOG) recommend that all pregnant women should be offered an early US examination to date pregnancies [1,7,8]. It is stated that ideally this should be performed by the measurement of CRL between 10 and 13 +6 weeks which can reduce the need for induction of labour after 41 weeks of gestation. Although there is always a margin of error in US-based estimation [9], this error is relatively small compared to LMP-based estimations [8,10].
Many dating charts are now in use, but they were developed from different populations, which results in discrepancies when they are compared or applied to a specified population; hence there is a need for an international reference dating equation and chart [11][12][13][14][15]. The INTERGROWTH-21st Project, described below, aims to generate fetal growth charts and also a new dating chart. In the study gestational age is based on the first day of LMP and corroborated by CRL using a known dating equation [16]. Therefore, only women between 9+0 and 13+6 weeks' gestation whose estimation by both methods agreed within 7 days were recruited into the fetal growth longitudinal study.
To develop charts of fetal size we need to model CRL as a function of GA while for dating we interchange the variables and model GA as a function of CRL. This latter analysis is problematic if the available data are constrained by a restricted range of GA [17]; such a restriction is commonly in place, as fetal curling prevents accurate measurement beyond 13 +6 weeks. In this paper we describe an exploration of strategies to overcome truncation of GA when developing equations and charts for dating pregnancies from CRL measurements.

Methods
The International Fetal and Newborn Growth Consortium for the 21 st Century (INTERGROWTH-21 st ) is a largescale, population-based, multi-centre project involving health institutions from eight geographically diverse countries (i.e. Brazil, China, India, Oman, Kenya, UK, USA and Italy), which aims to assess fetal, newborn and preterm growth under optimal conditions, in a manner similar to that adopted by the WHO Multicentre Growth Reference Study [18]. This approach is important in the creation of fetal growth standards by selecting women regarded as "healthy", educated, affluent and living in areas with minimal environmental constraints on growth [19].
The INTERGROWTH-21 st Project has three major components, which were designed to create: 1) Longitudinally derived, prescriptive, international, fetal growth standards using both clinical and ultrasound measures; 2) Preterm, postnatal growth standards for those infants born ≥26 +0 but <37 +0 weeks of gestation in the longitudinal cohort, and 3) Birth weight, newborn length, and head circumference for gestational age standards derived from all newborns delivering at the study sites over an approximately 12 month period [19]. To ensure that ultrasound measurements are accurate and reproducible, centres adopted uniform methods, used identical ultrasound equipment in all the study sites; adopted standardised methodology to take fetal measurements, and employed locally accredited ultra-sonographers who underwent standardisation training and monitoring.
One aim of the longitudinal study of the INTERGROWTH-21 st Project is to develop a new gestational age estimation equation based on the crown-rump length (CRL) from women recruited between 9 +0 -13 +6 weeks. This will be the largest prospective study to collect data on CRL in geographically diverse populations, and with a high level of quality control measures in place.
Several reliable statistical methods exist for developing age-related reference centiles [20][21][22]. These can be applied in a straightforward way for developing equations for fetal size as function of GA. For dating, however, we need to estimate GA as a function of fetal size, specifically the fetal CRL. We sought to use the INTERGROWTH-21 st data to develop centiles for the distribution of GA for CRL values between 15 mm and 100 mm. The statistical challenge is this: How can we model data when the outcome variable (GA) is truncated at both ends, i.e. at 9 and 14 weeks, given the need to obtain estimates in the truncated regions? This restriction is part of the design of the INTERGROWTH-21 st study based on the fact that CRL measurements are less reliable outside this range of GA [1,7,[23][24][25].
Ignoring the truncation of GA would lead to seriously biased estimates. We explored three alternative statistical approaches to overcome the truncation of GA. To evaluate these strategies we generated a data set with no truncation of GA that was similar to the INTERGROWTH-21 st Project CRL data, which we used to explore the performance of different methods of analysis of these data when we imposed truncation at 9 and 14 weeks of gestation. The choice of which approach is best is hard to justify through formal statistical testing, and is likely to depend on the specific data being analysed.

Statistical methods
Data were explored visually by a scatter plot of CRL by GA and vice versa. The relationship between GA and CRL is non-linear though the distribution of CRL is conditionally normal at any given gestational age. By contrast GA has a positively skewed distribution for a given CRL [17]. We applied fractional polynomial (FP) models (which are very flexible) to the data by fitting separate models to the mean and standard deviation (SD) of GA to account for increase in variance with greater CRL and gestation [20,22]. Using equations of the mean and standard deviation one can easily compute any desired centile using the relation

$$\text{centile} = \text{mean} + K \times \text{SD}$$

where K is the normal equivalent deviate (z score) corresponding to a particular centile, e.g. K = 1.88 for the 97th centile and −1.88 for the 3rd centile, and the mean and SD in this equation are the predicted estimates from the regression analysis. Fitted curves (3rd, 50th, and 97th centiles) from different models were assessed visually for a good fit and by comparing the deviances from each model. The choice of centiles presented was based purely on what is commonly reported in the literature and used in clinical practice as standard centiles. In addition, the INTERGROWTH-21st Project aims to complement the WHO Multicentre Growth Reference Study (MGRS), which produced reference standards for children aged 0-5 years and also presented the 3rd and 97th centiles [18]. Goodness of fit was assessed by a scatter plot of the distribution of residuals in z scores by CRL and also by counting the number of observations below the 3rd and above the 97th centiles.
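The centile relation described above can be illustrated with a minimal sketch. The function names and the `log_scale` option are assumptions made for illustration; the only inputs taken from the text are the relation centile = mean + K × SD and the values K = ±1.88 for the 97th and 3rd centiles.

```python
from math import exp
from statistics import NormalDist


def normal_deviate(centile):
    """Return K, the standard normal quantile for a centile given in percent."""
    return NormalDist().inv_cdf(centile / 100.0)


def centile_value(mean, sd, centile, log_scale=False):
    """Apply centile = mean + K * SD; back-transform if the model is on the log scale."""
    value = mean + normal_deviate(centile) * sd
    return exp(value) if log_scale else value


k97 = normal_deviate(97)  # approximately 1.88
k3 = normal_deviate(3)    # approximately -1.88
print(round(k97, 2), round(k3, 2))
```

When GA is modelled on the log scale, as in this paper, the mean and SD passed in would be those of log GA and `log_scale=True` would return the centile in weeks.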
We explored three approaches to deal with truncation of gestational age at 9 and 14 weeks by (a) Simulation, Restriction and Extrapolation (b) Simulation (c) Inversion of model for predicting CRL from GA. Extrapolation was applied purely for the purposes of obtaining reliable estimates between 9 and 14 weeks in the presence of truncation at 9 weeks and 14 weeks. The resultant equation will not be used for dating beyond 14 weeks as this is not recommended in clinical practice. The reliability of fractional polynomial models for extrapolation has been discussed previously by Royston & Altman where they show that fractional polynomial models extrapolate well at least for fetal measurements [22]. These 3 methods were first tested in a simulation based study using a previously published dating equation by Verburg et al. [2]. We evaluated how well each of the 3 approaches performed in relation to the model from which the data were generated.
The Verburg equation was selected from the many dating equations in use as it is one of the five preferred dating equations according to a recent systematic review of the methodology used for creating dating charts [13]; it is also recommended by the International Society of Ultrasound in Obstetrics and Gynaecology (ISUOG) [1,13]. The great strength of performing a simulation study based on a known dating equation is that it allows us to evaluate how well our proposed methods of dealing with truncation perform in a situation where we know the "truth" (i.e. the equations from which simulated data were obtained). After evaluating the 3 approaches using simulated data based on the Verburg equations, the best approach will be applied to the INTERGROWTH-21st Project data to estimate GA from CRL. Data were simulated from Verburg's dating equations for the median and SD of log GA as functions of CRL [2]. Here and throughout all logarithms are natural logarithms.
These equations assume that log GA has a normal distribution for any value of CRL. From these equations we simulated 100 observations for each CRL value from 5 mm to 110 mm in 1 mm increments, resulting in 10,600 observations in total. A sample size of 100 was chosen as it represented the average number of CRL observations for each GA in the INTERGROWTH-21st data and is large enough to remove effects of sampling variation. The GA was between 5 and 17 weeks, the GA range of the original data from which the equations were obtained. We log transformed GA in all analyses to stabilise variance [2,15,20,26].
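The simulation design described above can be sketched as follows. The coefficients `B0`, `B1`, `B2` and `SD_LOG_GA` are hypothetical placeholders chosen only to give a plausible curve; they are not taken from this paper and should be replaced by the published values from [2]. The functional form (terms in CRL and log CRL, constant SD, log GA normally distributed) follows the description in the text.

```python
import math
import random

# Hypothetical placeholder coefficients for the median of log GA (weeks);
# substitute the published dating equation before any real use.
B0, B1, B2 = 1.47, 0.0017, 0.23
SD_LOG_GA = 0.044  # hypothetical constant SD of log GA


def median_log_ga(crl_mm):
    """Median of log GA for a given CRL (mm) under the placeholder equation."""
    return B0 + B1 * crl_mm + B2 * math.log(crl_mm)


def simulate(n_per_crl=100, crl_min=5, crl_max=110, seed=1):
    """Draw n_per_crl values of GA for each whole-millimetre CRL value."""
    rng = random.Random(seed)
    data = []
    for crl in range(crl_min, crl_max + 1):
        for _ in range(n_per_crl):
            log_ga = rng.gauss(median_log_ga(crl), SD_LOG_GA)
            data.append((crl, math.exp(log_ga)))
    return data


data = simulate()
print(len(data))  # 106 CRL values x 100 draws = 10600
```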

Validation of the simulated data
We modelled the simulated data using fractional polynomial regression of log transformed GA on CRL and compared the fractional polynomial (FP) terms and the predicted median GA from the equation obtained to the original dating equation reported by Verburg et al. The equations obtained from simulated data were remarkably similar to Verburg's original equations: Both equations for the median were FP models of degree 2 with powers 0 and 1 (i.e. terms in CRL and log CRL). The equation for SD was a FP model of degree 1, power 1 (linear), compared to the SD obtained by Verburg which was a constant. The predicted GA from the two equations agreed within 0.08 days ( Figure 1, Table 1).
After successful validation of the simulated data we truncated gestational age at 9 and 14 weeks to match the INTERGROWTH 21 st data set. We note that truncation is only a problem when we want to model GA as a function of CRL and not CRL as a function of GA (size chart) ( Figure 2, panel A). All three suggested approaches make use of this fact, but in different ways.
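Why truncation matters for the dating model but not for the size model can be seen in a small sketch. It assumes, purely for illustration, a CRL value at which the true median GA is exactly 9 weeks and a hypothetical SD of 0.044 for log GA; keeping only observations with 9 ≤ GA < 14 weeks then discards roughly the lower half of the GA distribution at that CRL, so the mean of the retained values is pushed upwards.

```python
import math
import random


def truncated_mean_ga(median_weeks, sd_log, lower=9.0, upper=14.0, n=20000, seed=7):
    """Compare the mean GA at one CRL value before and after truncation of GA."""
    rng = random.Random(seed)
    all_ga = [math.exp(rng.gauss(math.log(median_weeks), sd_log)) for _ in range(n)]
    kept = [ga for ga in all_ga if lower <= ga < upper]  # design restriction on GA
    return sum(all_ga) / n, sum(kept) / len(kept), len(kept)


full_mean, kept_mean, n_kept = truncated_mean_ga(median_weeks=9.0, sd_log=0.044)
print(f"all: {full_mean:.2f} weeks, retained: {kept_mean:.2f} weeks, n retained: {n_kept}")
```

The same selection applied to CRL at a fixed GA removes nothing, because the restriction is on GA and every CRL value observed at that GA is retained.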
We applied the three proposed approaches to the truncated simulated data shown in Figure 2. Figure 3 shows a flow diagram summarising all the three methods.

Approach 1-simulation for small crown-rump length, restriction and extrapolation
The first approach is based on first modelling CRL as a function of GA ( Figure 4, panel A). From the obtained equation of the median GA, we simulate 100 CRL observations (about the same number of observations for each day of GA in the un-truncated data set) for each day of gestation between 7 and 9 weeks, to overcome the truncation at the bottom end of the distribution of CRL measurements. The choice of 7 weeks as a lower limit for extrapolation was based on the desire to be able to obtain a good fit to the data at 9 weeks where the actual data is truncated and it was also the lowest limit where the fitted equations and range of gestational age remained plausible when extrapolated. Then, using the augmented data set, we model GA as a function of CRL with CRL restricted to ≤ 65 mm (lowest CRL measurement reported at 14 weeks in the INTERGROWTH-21 st data set) as there remains a truncation problem at the upper end of the CRL distribution ( Figure 4, panel B). We then extrapolated the mean and SD equations obtained to the rest of the data ( Figure 4, panel C). The predicted GA from this approach was compared to that originally reported by Verburg (Table 2). A sensitivity analysis to establish which lower cut-off, i.e. truncating CRL at 10 mm, 15 mm or 20 mm had the best prediction, was performed by comparing the predicted GA obtained using the derived equation to that reported by Verburg. We note that the choice of a cut-off affects the fit for large CRL and so has clinical implications, because it is desirable to have predictions of GA from CRL between 15 mm and 95 mm ( Table 2).

Approach 2-simulation for small and large crown-rump length
Approach 2 is very similar to Approach 1, with data simulated from fitting a size equation and using the mean and SD equations of CRL by log GA (Figure 5, panel A). We use the model for CRL to simulate 100 observations of CRL (about the same number of observations for each day of GA in the un-truncated data set) for each day of gestation at both ends of the distribution, i.e. below 9 weeks (between 7 and 9 weeks) and above 14 weeks (between 14 and 17 weeks) of gestation (Figure 5, panel B). The choice of 7 weeks as a lower limit and 17 weeks as an upper limit for extrapolation was based on the desire to obtain a good fit to the data between 9 and 14 weeks, where the actual data are truncated. The two cut-offs (at 7 and 17 weeks) were also the lower and upper limits at which the fitted equations and range of gestational age remained plausible when extrapolated. The simulated CRL measurements below 9 weeks and above 14 weeks overcome the truncation problem presented by the data, thereby allowing us to model GA as a function of CRL more efficiently and obtain the respective median and SD equations (Figure 5, panel C). The predicted GA from this approach was compared to that originally reported by Verburg (Table 3). A sensitivity analysis was performed in relation to the value of the lower end cut-off of CRL.
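The steps of Approach 2 can be sketched in code under several simplifying assumptions that are not part of the paper: a quadratic in GA stands in for the fractional polynomial size model, the SD of CRL is modelled as linear in GA using scaled absolute residuals, the "observed" data come from a hypothetical dating equation with placeholder coefficients, and the number of simulated values per day is matched to the observed density. It is an illustration of the data-augmentation idea, not the analysis reported here.

```python
import math
import random

rng = random.Random(42)


def ols(rows, y):
    """Least-squares coefficients via the normal equations (small designs only)."""
    p = len(rows[0])
    a = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [x - f * z for x, z in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def true_log_ga(crl):
    """Hypothetical dating equation (placeholder coefficients, not from the paper)."""
    return 1.47 + 0.0017 * crl + 0.23 * math.log(crl)


# Step 0: "observed" data, truncated by design to 9 <= GA < 14 weeks.
obs = []
for crl in range(5, 111):
    for _ in range(100):
        ga = math.exp(rng.gauss(true_log_ga(crl), 0.044))
        if 9 <= ga < 14:
            obs.append((ga, float(crl)))

# Step 1: size model, CRL as a function of GA (unaffected by the truncation).
mean_coef = ols([[1.0, ga, ga * ga] for ga, _ in obs], [crl for _, crl in obs])


def mean_crl(ga):
    return mean_coef[0] + mean_coef[1] * ga + mean_coef[2] * ga * ga


# SD of CRL: absolute residuals scaled by sqrt(pi/2), regressed on GA.
abs_res = [abs(crl - mean_crl(ga)) * math.sqrt(math.pi / 2) for ga, crl in obs]
sd_coef = ols([[1.0, ga] for ga, _ in obs], abs_res)


def sd_crl(ga):
    return max(sd_coef[0] + sd_coef[1] * ga, 0.1)  # guard against non-positive SD


# Step 2: simulate CRL for each day of gestation outside the observed GA range.
n_per_day = round(len(obs) / 35)  # match the observed density over 5 weeks


def simulate_tail(start_week, end_week):
    out = []
    for day in range(start_week * 7, end_week * 7):
        ga = day / 7
        for _ in range(n_per_day):
            crl = rng.gauss(mean_crl(ga), sd_crl(ga))
            if crl > 0:  # log CRL is needed below
                out.append((ga, crl))
    return out


augmented = obs + simulate_tail(7, 9) + simulate_tail(14, 17)

# Step 3: dating model, log GA on CRL and log CRL, fitted to the augmented data.
dating_coef = ols([[1.0, crl, math.log(crl)] for _, crl in augmented],
                  [math.log(ga) for ga, _ in augmented])


def predicted_ga(crl):
    return math.exp(dating_coef[0] + dating_coef[1] * crl
                    + dating_coef[2] * math.log(crl))


print(f"predicted median GA at CRL 50 mm: {predicted_ga(50.0):.2f} weeks")
```

Because the generating equation is known in this sketch, `predicted_ga` can be compared directly with `math.exp(true_log_ga(crl))` across the CRL range, which mirrors the comparison with Verburg's equation described in the text.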
Approach 3-interchanging the X and Y axes from a model for size
The third approach does not require simulating data. As before, we model CRL (Y axis) as a function of GA (X axis) using all the available data. We then extrapolate the obtained equations to larger GA to cover the desired range of CRL (Figure 6, panel A). We then interchange the X and Y axes to give GA (Y axis) as a function of CRL (X axis) (Figure 6, panel B). We do not now have equations for the median and SD describing the relationship between GA and CRL but rather three sets of X, Y coordinates of GA giving the predicted 3rd, 50th and 97th centiles for CRL. We can obtain a new equation for the median by regressing GA on the predicted median CRL. Similarly, we can obtain equations for the 3rd and 97th centiles (Figure 6, panel C). The predicted GA from this approach was compared to that originally reported by Verburg (Table 4). Since we do not have an equation for the SD, the full model cannot be written down simply. We describe how we obtained an equation for the SD as a function of CRL that also allows prediction of any desired centiles.

Computing an equation for the standard deviation
Figure 5 Crown-rump length measurements in relation to gestational age with fitted centiles (Approach 2). Full title: Crown-rump length (CRL) measurements in relation to gestational age (grey circles) with 3rd, 50th, and 97th fitted centiles (Panel A). Yellow small crosses in panels B and C represent data simulated from the fitted equation of the mean and SD from panel A. Panel C shows the model fit relating GA and CRL (Approach 2).

We have described above how to obtain equations for, say, the 3rd, 50th and 97th centiles by regressing GA on the predicted pth centile of CRL measurements. Using these equations (3rd, 50th and 97th centile) relating log GA and CRL we can get two estimates of the SD at a given CRL, from the difference between the 97th and 50th centiles and between the 50th and 3rd centiles, each divided by 1.88. Note that the two are not exactly the same but are very similar because GA was modelled on the log scale. It is thus reasonable to estimate the SD for each value of CRL by simply taking the average of the 2 SDs. An equation for the SD relating GA to CRL was then obtained by regressing this SD (of GA) on CRL. Estimates of any desired centiles can then be obtained using the relation:

$$\text{centile} = \text{mean} + K \times \text{SD}$$

where K is the normal equivalent deviate (z score) corresponding to a particular centile, e.g. K = 1.88 for the 97th centile and −1.88 for the 3rd centile, and the SD in this equation are the predicted estimates from the regression analysis just described.
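The averaging step for the SD can be written as a short sketch. It assumes the three centile values are already on the log GA scale at a single CRL value; the function name and the example numbers are illustrative only.

```python
from statistics import NormalDist

K = NormalDist().inv_cdf(0.97)  # approximately 1.88


def sd_from_centiles(c3, c50, c97):
    """Average the two SD estimates implied by the 3rd, 50th and 97th centiles."""
    upper = (c97 - c50) / K  # SD implied by the upper half of the distribution
    lower = (c50 - c3) / K   # SD implied by the lower half of the distribution
    return (upper + lower) / 2


# Illustrative values: centiles generated from mean 2.40 and SD 0.05 on the log scale.
sd = sd_from_centiles(2.40 - K * 0.05, 2.40, 2.40 + K * 0.05)
print(round(sd, 3))
```

Repeating this for each CRL value gives the series of SD estimates that is then regressed on CRL.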

Results
The agreement in estimated median GA between approach 1 and Verburg's original fit was within 0.4 days for CRL between 20 mm and 100 mm. The largest difference was at the lower range of CRL i.e. 4.8 days and 1.5 days for CRL values of 10 mm and 15 mm respectively ( Figure 4, Table 2, and Figure 7). This is notably because the model was first fit for CRL between 20 mm and 65 mm and extrapolated to the rest of the data. Model fits beginning with lower CRL values i.e. 10 mm and 15 mm did not perform as well when extended to the rest of the data. There were 135/4600 (2.9%) observations below the 3 rd centile and 120/4600 (2.6%) above the 97 th centile for CRL between 20 mm and 100 mm (Figure 4).
The predicted values of median GA from approach 2 agreed within 1 day for CRL between 15 mm and 85 mm with the largest difference at the 2 extremes of CRL, i.e. 1.5 days for CRL of 10 mm and 1.8 days for CRL of 100 mm ( Figure 5, Table 3, and Figure 7). There were 207/7640 (2.7%) observations below the 3 rd centile and 232/7640 (3.0%) above the 97 th centile for CRL between 20 mm and 100 mm ( Figure 5).
Approach 3 agreed within 1 day for CRL between 15 mm and 100 mm with the largest difference of 1.5 days observed at CRL of 10 mm. Approach 3 underestimated the predicted median GA across the whole range by ~0.6 days (Figure 6, Table 4, and Figure 7). There were 128/6448 (2.0%) observations below the 3rd centile and 221/6448 (3.4%) above the 97th centile for CRL between 20 mm and 100 mm (Figure 6). The estimates obtained from the computation of SD for approach 3 were remarkably similar to those obtained from the three sets of X, Y coordinates of GA and the predicted 3rd, 50th and 97th centiles for CRL (Figure 6, panels B and C).
We have shown that these rather "ad hoc" approaches correspond very closely to the "real data" for Verburg (Figure 7), which is a data set that has similarities to the INTERGROWTH-21st project CRL data set (Figure 8). Hence we are confident that we can use these approaches to get reliable estimates based on INTERGROWTH-21st CRL data as demonstrated in the next section (Figures 9, 10, 11 and 12). We do not discuss any results of the INTERGROWTH-21st CRL data as the data collection is still on-going and for demonstration purposes we have used ~35% of the overall target sample in this paper. Results of the full sample and the new international dating equation will be published in a separate paper. Figure 8 shows data from 1600 fetuses (~35% of the overall target sample) included in the INTERGROWTH-21st study, in the same format as Figure 2. The close similarity between the two data sets is apparent. The collection of INTERGROWTH-21st data will be completed in 2013.

Figure 7 Crown-rump length measurements in relation to gestational age comparing the 3 approaches with Verburg. Full title: Crown-rump length (CRL) measurements in relation to gestational age for the simulated data for CRL from 9+0 to 13+6 weeks gestational age comparing each of the 3 approaches with Verburg (Panels A, B and C) and all the 3 approaches with Verburg (Panel D).

Figure 8 Crown-rump length versus gestational age using a sample of the INTERGROWTH-21st CRL project data. Full title: Crown-rump length (CRL) versus gestational age for creating a size chart (Panel A) and gestational age versus crown-rump length data for creating a dating chart (Panel B) using a sample of the INTERGROWTH-21st project data (~35% of the overall target sample) for CRL from 9+0 to 13+6 weeks gestational age.

Discussion
The main aim of this study was to explore the best methodology for modelling data when the outcome variable (GA) is truncated at both ends, i.e. at 9 and 14 weeks. We evaluated 3 approaches to overcome this difficulty by generating data from an existing equation (Verburg). The three approaches provided a good fit to the data ( Figure 6) when compared to the original equation reported by Verburg. We appreciate that the choice of which approach is the best is hard to justify through formal statistical testing. Approach 2 was considered the best since it gives excellent results (i.e. estimates agreed within 1 day for CRL between 15 mm and 85 mm with the largest difference of 1.8 days at the very extreme end) when compared to approach 1 which had the largest difference (4.7 days) at the lower end of CRL distribution while approach 3 consistently underestimated GA by about half a day over the entire range of CRL.
A recent systematic review of CRL dating equations and charts showed large variations between studies with only very few studies reporting complete information on inclusion/exclusion criteria, maternal demographics, ultrasound quality control, last menstruation reliability and sample selection [13]. This potential for bias, methodological heterogeneity and limitations would affect clinical decision-making depending on the equation used; hence the need for an international dating equation and chart. The INTERGROWTH-21st population, which is carefully selected and actively followed up during pregnancy with a known outcome at birth, provides a population that is ideal for developing such an international standard equation and chart. The INTERGROWTH-21st project is the biggest study so far to prospectively collect data on CRL. These data are of very high quality, with ultrasound measurements made by highly trained sonographers following a standardised protocol using standard ultrasonography equipment with latest technology across 8 geographically diverse sites. Gestational age estimation is an important component of clinical care and epidemiological studies. We believe that, as in other fields of medicine, all available information should be used for assessment, i.e. both LMP and ultrasound should be taken into account and agreement between the two required to be certain of its validity. One should consider that discrepancy between LMP and ultrasound could be due to disturbances in early fetal growth rather than an automatic assumption of incorrect dates, leading to re-dating. There is wide agreement that CRL is the best measure for assessing gestational age, certainly up to 14 weeks GA, since LMP is affected by both random error and systematic tendency to overstate the duration of gestation, biological variability and errors of the method including recall bias, digit preference, and additional bleeding after conception [5,[27][28][29][30][31][32].
Ultrasound-based methods measure fetal size and use reliable LMP-based formulas (of which many are in use) to estimate gestational age; this assumes no biological variability, as all fetuses of a given size are estimated to have the same gestational age. In practice, biological variability exists and is compounded by variability due to measurement error arising from equipment and observer. Thus, accurate measurements of CRL require rigorous standardisation before initiation of the study, and continuous quality control measures should be implemented similar to those routinely used in laboratory practices. The implications of these different methods on research findings have recently been discussed [12]. Ultrasound can accurately determine the day of conception to within 5 days either way for 95% of cases and may be closer than LMP by an average of 2-3 days in predicting the date of a spontaneous delivery [1,17,27,28,33,34].

Figure 11 INTERGROWTH-21st crown-rump length measurements in relation to gestational age with fitted centiles (Approach 3). Full title: Crown-rump length (CRL) measurements in relation to gestational age (GA) (grey small hollow circles) with 3rd, 50th and 97th fitted centiles (Panel A). Panels B and C show the relation between GA and CRL after interchanging the axes and refitting the model (Approach 3).
The unusual problem of truncation that we encountered in the INTERGROWTH-21 st CRL data is not unique in that it has been present in other studies, but has never been adequately addressed. This feature of the data has the potential to introduce considerable bias, mostly at the extremes of CRL, unless analysed carefully. Altman et al. [17] addressed a similar problem in the estimation of GA using head circumference by restricting the range of measurements included in the regression analyses. As opposed to their HC data, for which the GA range was 12-42 weeks, the INTERGROWTH-21 st CRL data span only 5 weeks so using CRL data unaffected by truncation leads to a large loss of data and limited clinical usefulness.

Conclusion
Although these approaches do not follow standard statistical analysis paradigms for modelling, we have shown empirically that the results of these rather "ad hoc" statistical methods correspond very closely to the "real data" based on the study of Verburg et al. [2], which is a data set similar to CRL data set of the INTERGROWTH-21 st project. They are more suitable for large data sets to reduce the effect of sampling variation and ensure reasonable extrapolation. We are thus confident that we can use these approaches to get reliable estimates based on INTERGROWTH-21 st CRL data. Although only examined for CRL, these methods may be a solution to other truncation problems involving similar data and their applicability to other settings would need to be evaluated.

Details of ethics approval
The INTERGROWTH-21 st Project was approved by the Oxfordshire Research Ethics Committee 'C' (reference: 08/H0606/139) and the research ethics committees of the individual participating institutions and corresponding health authorities where the Project was implemented.