Data sets
The Australian Bureau of Statistics (ABS) has conducted seven National Health Surveys (NHS) since 1995 [16, 17]. These are nation-wide cross-sectional surveys each with 6000–8000 female participants aged 18 years and over. The surveys conducted in 1995, 2007–08, 2011–12, 2014–15 and 2017–18 included measured heights and weights, whereas those in 2001 and 2004–05 had only self-reported heights and weights.
The Australian Longitudinal Study on Women’s Health (ALSWH) began in 1996 with the recruitment of more than 40,000 women in three age groups: women aged 18–23 years (born 1973–78, n = 14,247), 45–50 years (born 1946–51, n = 13,714) and 70–75 years (born 1921–26, n = 12,432). These women were randomly sampled from the database of the Australian universal health insurance scheme, now called Medicare Australia, which includes all residents of Australia. Since then, they have been surveyed on average every 3 years, initially by mailed questionnaires and more recently with the option of completing the surveys online. Details of the study methods and representativeness of the samples have been published elsewhere [18]. In 2013, another cohort of women then aged 18–23 (born 1989–95, n = 17,012) was recruited using a variety of methods, and these women have been surveyed annually using a web-based questionnaire [19]. At every survey, women are asked to report their weight and height. Women who were pregnant at the time of completing the survey were asked to report their pre-pregnancy weight (except for the first three surveys of the 1973–78 cohort, where pregnant women’s weight was treated as missing). Height and weight data were collected over the following periods: 1989–95 cohort, 2013–2017; 1973–78 cohort, 1996–2018; 1946–51 cohort, 1996–2016; 1921–26 cohort, 1996–2011 (because these items were not asked of these elderly women after that date). Response and attrition rates for each survey are available at the study website http://www.alswh.org.au/.
Measures
From both data sources, body mass index (BMI) was calculated as weight (kilograms) divided by the square of height (metres). The World Health Organization classification was used, namely: underweight BMI < 18.5 kg/m², normal weight BMI 18.5–24.99 kg/m², overweight BMI 25–29.99 kg/m², and obese BMI ≥ 30 kg/m². In this study, we focus on obesity because other studies have shown that excess burden on the healthcare system is largely associated with obesity rather than overweight [21].
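The BMI calculation and classification described above can be sketched in a few lines of Python. This is only an illustration: the function names `bmi` and `who_category` are hypothetical, and the sketch assumes that values falling between the published category limits (for example, 24.995) belong to the lower category.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """Map a BMI value to the WHO category used in the text.

    Assumption: the upper limits 24.99 and 29.99 are treated as '< 25' and
    '< 30', so no BMI value falls between categories.
    """
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25:
        return "normal weight"
    if bmi_value < 30:
        return "overweight"
    return "obese"


# Illustrative (made-up) participant: 90 kg and 1.65 m.
print(who_category(bmi(90, 1.65)))
```

Only the last category, obese, is used as the outcome in the analyses described here.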
Statistical methods
The prevalence of obesity for each year of age was calculated for each ALSWH cohort using information provided by participants at each survey (i.e., each period). This means that most ALSWH participants contributed data at multiple periods. Prevalence of obesity from the NHS was extracted from age-group and sex specific data in various ABS publications and summary tables [16, 17]. The NHS data were re-arranged using a Lexis diagram [20] to create synthetic cohorts centred at ages comparable with the ALSWH cohort surveys.
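The re-arrangement of cross-sectional data into synthetic cohorts rests on the Lexis-diagram identity cohort = period − age. A minimal sketch follows, assuming each NHS age group is represented by a single age (for example, its midpoint) and each survey by a single calendar year. The function names and the three records are hypothetical placeholders, not NHS results.

```python
from collections import defaultdict


def birth_cohort(period_year: int, age: int) -> int:
    """Approximate birth year implied by survey year and age."""
    return period_year - age


def synthetic_cohorts(records):
    """Group (period, age, prevalence) records by approximate birth year.

    Within each synthetic cohort the records are sorted by period, so the
    cohort can be followed as it ages across successive surveys.
    """
    cohorts = defaultdict(list)
    for period, age, prevalence in records:
        cohorts[birth_cohort(period, age)].append((period, age, prevalence))
    for rows in cohorts.values():
        rows.sort()
    return dict(cohorts)


# Made-up prevalence values, for illustration only.
example = [(1995, 20, 0.10), (2007, 32, 0.18), (2007, 20, 0.12)]
for cohort, rows in sorted(synthetic_cohorts(example).items()):
    print(cohort, rows)
```

In this toy example, women aged 20 in 1995 and women aged 32 in 2007 are treated as the same synthetic cohort (born about 1975), even though the two surveys sampled different individuals.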
For each data set the prevalence of obesity was then presented in four plots:
- a) Prevalence by age for different periods;
- b) Prevalence by period for different age groups;
- c) Prevalence by age for different cohorts;
- d) Prevalence by cohort for different age groups.
If plots a) and b) both show parallel curves this supports an age-period model and if plots c) and d) show parallel curves this supports an age-cohort model [20].
Based on evidence from these plots, APC models were fitted using the Stata procedure apcfit described by Rutherford, Lambert and Thompson [14]. This method uses a generalized linear model framework with age, period, and cohort treated as continuous variables. The number of obese people was modelled using a Poisson distribution with a log link function (to give rate ratios), an offset given by log(number of people surveyed), and functions for age, period, and cohort as the explanatory variables. This model is based on the assumption that the observations are independent. This is reasonable for the NHS data which were from new samples at each survey. But the ALSWH participants contributed repeated observations, so the samples at each age and period/cohort time are not independent; this could lead to bias, particularly underestimation of variability.
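In symbols, the model described above can be written as follows. The notation is ours and is introduced only for illustration: D is the number of obese women observed at age a and period p, N is the number of women surveyed, c = p − a is the birth cohort, and f, g and h are the smooth functions for age, period and cohort.

```latex
D_{a,p} \sim \mathrm{Poisson}(\mu_{a,p}), \qquad
\log \mu_{a,p} = \log N_{a,p} + f(a) + g(p) + h(c), \qquad c = p - a
```

Because the offset enters with a coefficient fixed at one, the exponentiated sum exp{f(a) + g(p) + h(c)} is the fitted prevalence, and exponentiated differences in g or h are the rate ratios referred to in the text.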
To obtain functions for age, period and cohort, restricted cubic splines were used, with transformations applied to the spline basis vectors for the period and cohort terms [20]. Due to the systematic difference between the NHS self-reported data on height and weight in the 2001 and 2004–05 surveys and the measured data from the other surveys, we omitted the former from the modelling.
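For readers unfamiliar with restricted cubic splines, the sketch below builds one common form of the basis (the truncated power basis, which is constrained to be linear beyond the outermost knots). It is an assumption that this form is a fair illustration. apcfit constructs its own basis and applies further transformations to the period and cohort terms, which are not reproduced here. The function name `rcs_basis` and the knot positions are hypothetical.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis at x (truncated power form).

    Returns k - 1 values for k knots: x itself plus k - 2 non-linear terms.
    Each non-linear term is zero below the first knot and linear above the
    last knot.
    """
    k = len(knots)
    if k < 3:
        raise ValueError("at least three knots are required")
    if any(b <= a for a, b in zip(knots, knots[1:])):
        raise ValueError("knots must be strictly increasing")

    def cube_plus(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    t_prev, t_last = knots[-2], knots[-1]
    span = t_last - t_prev
    basis = [float(x)]
    for t_j in knots[:-2]:
        basis.append(
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / span
            + cube_plus(x - t_last) * (t_prev - t_j) / span
        )
    return basis


# Hypothetical knots on the age scale (years).
age_knots = [20, 35, 50, 65, 80]
print(rcs_basis(40, age_knots))
```

With five knots the basis has four columns, so each additional internal knot adds one parameter to the corresponding age, period or cohort function.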
For each data set, models were fitted with terms for: age, period and cohort; age and period; and age and cohort. Period effects were estimated relative to the reference year of 2007, the median year for the ALSWH survey data, and cohort effects were estimated relative to 1951, the median year of birth for ALSWH participants. Due to the identifiability problem for first-order effects, APC models are over-parameterised, and for the type of models considered here three constraints are needed [20]. The choice of constraints does not affect the model fit but does affect the estimates and hence the graphical displays of effects. If, as for obesity, age is a major unmodifiable factor, the age function is of primary importance. A linear temporal change, or drift, can be arbitrarily attributed to either the cohort function or the period function. The age function can be represented in two ways:
APC: As age-specific rates for a particular period, after adjustment for cohort effects;
ACP: As age-specific rates for a particular cohort, after adjustment for the period effect.
For the APC version the drift is included in the period function. The period function is set to zero for the reference date and the period effects are relative risks relative to that date. The cohort function has both the average and the slope set to zero and represents a residual relative risk relative to fitted values for age and period effects. This model shows the cross-sectional age-specific rates at the reference year and how the pattern varies over time.
For the ACP version the drift is included in the cohort function. The cohort function is set to zero at the reference birth year and the cohort effects are relative risks relative to that year. The period function has both the average and the slope set to zero and represents a residual relative risk relative to fitted values for age and cohort effects. The model can be interpreted as showing the biological or longitudinal effect of age for the reference cohort and how this differs across cohorts.
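The reason constraints are needed, and why the drift can be moved between the period and cohort functions without changing the fit, can be checked numerically. In the sketch below all effect functions and coefficients are invented for illustration. Because cohort = period − age, adding a linear term to the age and cohort functions and subtracting it from the period function leaves every fitted value unchanged.

```python
def linear_predictor(age, period, f, g, h):
    """Sum of age, period and cohort effects, with cohort = period - age."""
    return f(age) + g(period) + h(period - age)


# Hypothetical effect functions on the log scale (not estimates from the study).
def age_effect(a):
    return 0.03 * (a - 45)


def period_effect(p):
    return 0.02 * (p - 2007)


def cohort_effect(c):
    return 0.0005 * (c - 1951) ** 2


DELTA = 0.015  # arbitrary slope moved between the three functions


def age_effect_alt(a):
    return age_effect(a) + DELTA * a


def period_effect_alt(p):
    return period_effect(p) - DELTA * p


def cohort_effect_alt(c):
    return cohort_effect(c) + DELTA * c


for age, period in [(20, 1996), (45, 2007), (70, 2018)]:
    original = linear_predictor(age, period, age_effect, period_effect, cohort_effect)
    shifted = linear_predictor(age, period, age_effect_alt, period_effect_alt, cohort_effect_alt)
    print(age, period, round(original, 6), round(shifted, 6))
```

The two parameterisations give the same predictions, so the data cannot distinguish them. The APC and ACP versions described above are two conventional ways of fixing this arbitrary slope.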
Model fit was assessed using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC); smaller values indicate better fit. The log-likelihood values and degrees of freedom (d.f.) were also used to calculate the deviance = (− 2 × the difference in log-likelihood values for the nested models) and obtain p-values using the chi-squared distribution. To assess period and cohort effects, the fit of models without each of these terms (i.e., models with A + C and A + P) was compared with the fit of the model with A + P + C. The numbers of knots for the cubic splines were selected using AIC, BIC and the following principles: parsimony (using as few parameters as needed to capture the main features while avoiding overfitting); the same number of equally spaced internal knots for cohort and period effects so these are treated symmetrically; and the same number of knots for both data sets, in order to facilitate comparisons.
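The model comparison quantities described above can be computed from the log-likelihoods alone. The sketch below uses the standard definitions AIC = 2k − 2 log L and BIC = k log(n) − 2 log L, where k is the number of parameters and n the number of observations. The function names and the log-likelihood values in the example are made up, and the chi-squared tail probability uses a simple series expansion that is adequate for illustration but loses accuracy for very small p-values.

```python
import math


def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * log_lik


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with df degrees of freedom.

    Uses the series for the regularised lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if abs(term) < 1e-15 * abs(total):
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(ll_reduced, k_reduced, ll_full, k_full):
    """Deviance, d.f. and p-value for a reduced model nested in a full model."""
    df_diff = k_full - k_reduced
    if df_diff <= 0:
        raise ValueError("the full model must have more parameters")
    deviance = -2.0 * (ll_reduced - ll_full)
    return deviance, df_diff, chi2_sf(deviance, df_diff)


# Made-up values: an A + P model (4 parameters) against A + P + C (6 parameters).
print(likelihood_ratio_test(-105.0, 4, -100.0, 6))
print(aic(-100.0, 6), bic(-100.0, 6, 100))
```

A small p-value in this comparison would indicate that dropping the cohort term worsens the fit, which corresponds to the test for cohort effects described in the text.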