Birth cohort study
The design of the study has been reported earlier [17,18,19]. A longitudinal birth cohort study was conducted in three neighbouring urban slums in Vellore, South India, covering 2.2 sq.km with a population density of approximately 17,000 per sq.km. The data were collected from these three slums (Kaspa, Ramnaickanpalayam and Chinnallapuram), where the living environment is poor: open drains, a lack of water and toilets, no secure tenancy, and overcrowded, clustered houses with many rubbish dumps. The common occupation in the study area is the manual production of tobacco-based beedi products for a daily wage.
Women of child-bearing age were visited to identify new pregnancies during a survey conducted in 2002. Children of pregnant women intending to remain in the area for 3 years were eligible for enrolment. Infants were recruited from birth between March 2002 and August 2003 following written informed consent from the mother. These children were followed until their third birthday, and the last child was followed up to August 2006. This study was approved by the Institutional Review Board and ethics committee of Christian Medical College and Hospital. In this study, 290 children were included (Fig. 1). In the original study, children were visited twice a week to record the incidence of diarrhoea and other morbidities. Weight and length at birth were obtained from delivery records available at the first home visit. Subsequently, height and weight were measured every month until 36 months by field workers at the study clinic using single measurements. Recumbent length was measured using a standard infantometer and, at later ages, height was measured using a stadiometer, both to the nearest millimetre. Weight was measured using a Salter weighing scale to the nearest 100 g. Because growth measurements were missing beyond 3 years of follow-up, height and weight from birth to 36 months were included in the analysis.
Study variables
Baseline characteristics of interest were gender, height (cm), weight (kg), baby admitted to ICU (yes, no), abortion (yes, no), mode of delivery (suction, forceps, caesarean, vaginal), socio-economic status (low, middle, high), gravida (1, 2, 3, > 3), highest education in the household (no formal education; primary school, 1-5 years; middle school, 6-8 years; high school, 9-10 years; higher secondary / college / polytechnic / professional, > 10 years), and duration of exclusive breastfeeding (< 3 months, ≥ 3 months).
Statistical analysis
For demographic and other characteristics, data are presented as mean and standard deviation (SD) for normally distributed variables and as frequency (percentage) for categorical variables. A few values of the growth outcomes were missing at follow-up visits. These were imputed using the Last Observation Carried Forward (LOCF) method, after which the data were treated as complete, dense data. To handle and analyse the large volume of repeatedly measured growth data, the Functional Data Analysis (FDA) framework was used [11, 12, 20,21,22,23,24].
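The LOCF step can be illustrated with a minimal sketch. The function name `locf`, the use of `None` for a missing visit, and the example weights are hypothetical and not taken from the study data; how a missing first value was handled in the study is not stated, so here it is simply left missing.

```python
def locf(values):
    """Last Observation Carried Forward for one child's monthly series.

    `values` is a list in visit order; a missing visit is coded as None
    (an assumption of this sketch). Leading missing values stay None
    because there is no earlier observation to carry forward.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value  # update the most recent observed value
        filled.append(last_seen)
    return filled


# Hypothetical monthly weights (kg) with three missing visits
weights_kg = [3.1, 4.0, None, 5.2, None, None, 6.4]
completed = locf(weights_kg)
print(completed)
```

LOCF keeps every child on the same monthly grid, which is what allows the series to be treated as dense functional data in the next steps.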
Smoothing and B-spline basis functions
Assuming that a curve or function for replication i arrives as a set of measured values yi1, yi2, …, yin, the first step is to convert these values into a function xi with values xi(t) computable for any argument value t. This is done with a set of functional building blocks ϕk, k = 1,2,…,K, called basis functions, which are combined linearly. A function or curve x(t) is then expressed as
$$x(t)=\sum_{k=1}^K{c}_k\ {\phi}_k(t)$$
(1)
in terms of a large number K of known basis functions ϕk, where c denotes the vector of length K of the coefficients ck and ϕ denotes the functional vector whose elements are the basis functions ϕk.
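As a sketch of how the coefficients in Eq. (1) can be estimated, the standard penalized least-squares formulation is given here. The symbols Φ, R and df, and the choice of a second-derivative roughness penalty, are assumptions of this sketch and are not specified in the text. Let Φ be the n × K matrix with entries Φjk = ϕk(tj), let yi be the vector of the n measurements for replication i, and let λ ≥ 0 be the smoothing parameter used later in the analysis. Then

$$\hat{c}_i=\left(\Phi^{\top}\Phi+\lambda R\right)^{-1}\Phi^{\top}y_i,\qquad R_{kl}=\int D^2\phi_k(t)\,D^2\phi_l(t)\,dt$$

Setting λ = 0 gives the ordinary least-squares fit. A common way to choose λ is to minimize the generalized cross-validation criterion

$$\mathrm{GCV}(\lambda)=\frac{n\,\mathrm{SSE}(\lambda)}{\left(n-\mathrm{df}(\lambda)\right)^2},\qquad \mathrm{df}(\lambda)=\operatorname{trace}\left[\Phi\left(\Phi^{\top}\Phi+\lambda R\right)^{-1}\Phi^{\top}\right]$$

where SSE(λ) is the sum of squared differences between the measured values yij and the fitted values of xi at the times tj.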
Spline functions are the most common choice of approximation system for non-periodic functional data such as growth curves. They have more or less replaced polynomials, which in any case they contain within the system. In defining a spline, the first step is to divide the interval over which a function is to be approximated into S subintervals separated by values
$${T}_s,\ s=1,2,\dots,S-1$$
which are called knots.
A spline function is a polynomial of specified order m within each subinterval. To convert the child growth trajectories into functions, we applied the B-spline system. To construct the basis, the order, the knots and the range were chosen; together with the number of basis functions, these were used to generate the B-spline basis. A B-spline-based smoother was used because of its simplicity and flexibility [11, 12, 21,22,23,24].
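The construction of a B-spline basis from an order, a set of knots and a range can be sketched with the Cox-de Boor recursion. The order (4, i.e. cubic), the interior knots at 6-month intervals and the function names below are illustrative assumptions; the values actually used in the study are not stated here.

```python
def bspline_basis(t, knots, order):
    """Evaluate all B-spline basis functions of the given order at time t.

    `knots` is the full non-decreasing knot vector, with each boundary
    knot repeated `order` times. Returns len(knots) - order values.
    """
    n_intervals = len(knots) - 1
    # Order 1: indicator functions of the half-open knot intervals
    b = [1.0 if knots[j] <= t < knots[j + 1] else 0.0 for j in range(n_intervals)]
    if t == knots[-1]:
        # Close the right end of the range on the last non-empty interval
        last = max(j for j in range(n_intervals) if knots[j] < knots[j + 1])
        b[last] = 1.0
    # Cox-de Boor recursion: build order m from order m - 1
    for m in range(2, order + 1):
        nxt = []
        for j in range(len(knots) - m):
            left_den = knots[j + m - 1] - knots[j]
            right_den = knots[j + m] - knots[j + 1]
            left = (t - knots[j]) / left_den * b[j] if left_den > 0 else 0.0
            right = (knots[j + m] - t) / right_den * b[j + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        b = nxt
    return b


def evaluate_curve(t, coefs, knots, order):
    """Evaluate x(t) = sum_k c_k * phi_k(t), as in Eq. (1)."""
    return sum(c * phi for c, phi in zip(coefs, bspline_basis(t, knots, order)))


# Illustrative settings (assumptions): range 0 to 36 months, cubic splines
ORDER = 4
INTERIOR = [6, 12, 18, 24, 30]
KNOTS = [0] * ORDER + INTERIOR + [36] * ORDER
K = len(KNOTS) - ORDER  # number of basis functions = order + interior knots

print(K, bspline_basis(15.0, KNOTS, ORDER))
```

With these settings the number of basis functions equals the order plus the number of interior knots, and the basis functions sum to one at every point of the range.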
Outlying function
Outlier-detection visualization tools such as the functional boxplot and the outliergram were used to identify abnormal functions in both outcomes [12, 24,25,26]. There are two types of variability in the functions: (i) amplitude variation and (ii) phase variation. Amplitude variation refers to differences in height between the functions, and phase variation refers to differences in the timing of important features between the functions. Registration was carried out to correct curve misalignment [12, 23, 27, 28].
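A functional boxplot orders curves from the centre outwards using a depth measure. One commonly used measure is the modified band depth, sketched below for bands formed by pairs of curves. Which depth was used in the study is not stated, so this is an assumption, and the function name and toy curves are hypothetical. The sketch covers only the depth ranking, not the rule that flags outliers.

```python
from itertools import combinations


def modified_band_depth(curves):
    """Modified band depth of each curve, using bands formed by pairs.

    `curves` is a list of equal-length lists on a common time grid.
    For each curve, the depth is the average, over all pairs of curves,
    of the proportion of time points at which the curve lies inside the
    band spanned by the pair.
    """
    n_points = len(curves[0])
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for x in curves:
        total = 0.0
        for a, b in pairs:
            inside = 0
            for t in range(n_points):
                low = min(curves[a][t], curves[b][t])
                high = max(curves[a][t], curves[b][t])
                if low <= x[t] <= high:
                    inside += 1
            total += inside / n_points
        depths.append(total / len(pairs))
    return depths


# Toy curves on three time points; the last one lies far above the rest
toy_curves = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [9, 10, 11]]
depths = modified_band_depth(toy_curves)
median_index = depths.index(max(depths))  # deepest curve = functional median
print(depths, median_index)
```

The curve with the largest depth plays the role of the median, and curves with the smallest depths are the most extreme in amplitude.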
Functional principal component analysis
FDA is an advanced statistical methodology specially developed for analysing temporal data [29]. The longitudinal child growth trajectories were converted into functions using the B-spline basis with a smoothing parameter (λ) chosen by generalized cross-validation (GCV) [30]. An optimally chosen smoothing parameter is generally recommended for growth and other temporal data [31]. This smoothing removes random noise from the month-wise data. Functional principal component analysis (FPCA) is an extension of conventional principal component analysis (PCA) to functional data [29]. We applied FPCA to identify the important temporal patterns across the smoothed growth functions. In the functional setting, the individual monthly growth observations xi are replaced with smooth functions xi(t) [29], and the weight vectors of conventional PCA are replaced with weighting coefficient functions βj(t). For a given weight function β(t), the principal component score of child i is
$${f}_i=\int \beta (t)\,{x}_i(t)\,dt,\quad i=1,2,\dots,N$$
(2)
FPCA was used to extract information from the functional data and to identify the different patterns of the child growth functions. The independent functional principal component curves describe the important modes of temporal variability in growth across the individual fitted curves. FPCA also reduces the dimension of the problem by representing the functions in terms of a finite set of functions. A functional linear model was then used to assess the association between the factors and the trajectories [20, 22,23,24, 32,33,34,35,36]. A conditional kernel density estimator plot was used to identify subgroups of the growth functions, and the proportion of children contributing to each subgroup was estimated.
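The first functional principal component and the scores of Eq. (2) can be sketched numerically on an equally spaced time grid. This is a simplified, discretized illustration and not the basis-expansion computation performed by FDA software. The power iteration, the Riemann-sum approximation of the integral, the centring of the curves before scoring, the sign convention, and all names and toy data are assumptions of the sketch.

```python
import math


def first_fpc(curves, step, n_iter=200):
    """Leading functional principal component on an equally spaced grid.

    `curves` holds smoothed curves evaluated on a common grid with
    spacing `step`. Returns (mean curve, weight function beta, scores).
    Assumes the curves are not all identical.
    """
    n, p = len(curves), len(curves[0])
    mean = [sum(c[t] for c in curves) / n for t in range(p)]
    centred = [[c[t] - mean[t] for t in range(p)] for c in curves]
    # Sample covariance surface v(s, t) evaluated on the grid
    cov = [[sum(c[s] * c[t] for c in centred) / (n - 1) for t in range(p)]
           for s in range(p)]
    # Power iteration, started from the centred curve with the largest norm
    beta = list(max(centred, key=lambda c: sum(v * v for v in c)))
    for _ in range(n_iter):
        beta = [sum(cov[s][t] * beta[t] for t in range(p)) for s in range(p)]
        norm = math.sqrt(step * sum(v * v for v in beta))  # integral of beta^2 = 1
        beta = [v / norm for v in beta]
    if sum(beta) < 0:  # sign convention: make beta positive on average
        beta = [-v for v in beta]
    # Scores: f_i = integral of beta(t) * (x_i(t) - mean(t)) dt
    scores = [step * sum(b * v for b, v in zip(beta, c)) for c in centred]
    return mean, beta, scores


# Toy data: five curves that differ only by a vertical shift
months = [0, 1, 2, 3, 4]
toy_curves = [[m + shift for m in months] for shift in (-2, -1, 0, 1, 2)]
mean_curve, beta, scores = first_fpc(toy_curves, step=1.0)
print(beta, scores)
```

In this toy example the only mode of variation is the vertical shift, so the estimated weight function is constant and the scores are proportional to the shifts.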
Functional linear model
The traditional statistical methods of analysis of variance (ANOVA) and linear regression investigate how much of the variability in the observed data can be accounted for by other known variables. Functional regression models are used to model the relation between functional and non-functional variables. When the independent variable is categorical and the outcome is functional, the interest is in determining whether the functional outcome differs among the categories of the independent variable. In the functional setting, the response variable yi(t) is a function of the argument t. The most general linear model is
$${y}_i(t)={\beta}_0(t)+\sum_{j=1}^p{\beta}_j(t)\,{x}_{ij}$$
(3)
This model was then applied to the growth data to assess the relation between the study variables and the growth trajectories.
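For a single scalar covariate (p = 1 in Eq. (3)), the coefficient functions can be sketched by fitting an ordinary least-squares regression separately at each grid time. This pointwise fit is a simplification chosen for illustration. Functional regression software typically represents each coefficient function in a basis instead, so the code below is not the procedure used in the study, and the function name and toy data are hypothetical.

```python
def pointwise_flm(curves, x):
    """Pointwise least-squares fit of y_i(t) = beta0(t) + beta1(t) * x_i.

    `curves` holds the functional responses on a common time grid and
    `x` holds one scalar covariate per curve (for example a 0/1 group
    indicator). Returns beta0 and beta1 evaluated on the grid.
    """
    n = len(x)
    n_points = len(curves[0])
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta0, beta1 = [], []
    for t in range(n_points):
        y = [c[t] for c in curves]
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        beta1.append(slope)                  # group difference at time t
        beta0.append(y_bar - slope * x_bar)  # reference-group mean at time t
    return beta0, beta1


# Toy data: two curves per group on three time points
group = [0, 0, 1, 1]
toy_curves = [[1, 2, 3], [3, 4, 5], [4, 6, 8], [6, 8, 10]]
beta0, beta1 = pointwise_flm(toy_curves, group)
print(beta0, beta1)
```

With a 0/1 covariate, beta0(t) is the mean curve of the reference group and beta1(t) is the difference between the two group mean curves at each time, which is the functional analogue of a one-way ANOVA contrast.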
Software
All statistical analyses were performed using R version 3.6.1 in RStudio. FDA was performed using the fda package.