Fig. 1 | BMC Medical Research Methodology

Fig. 1

From: Understanding the importance of key risk factors in predicting chronic bronchitic symptoms using a machine learning approach

Conceptual division of the longitudinal, participant-level data into a training set and two test sets (within- and across-participant). Suppose that, in a hypothetical setting, data were collected from 8 participants over 5 years. For a randomly selected 50% of the participants (persons 31, 12, 7, and 2), two observations at different study years were randomly selected. Models were trained on the first of these observations (training set), denoted by: . Models were then validated using two complementary holdout test datasets. First, we considered the second (later) observation from the participants used to train the model (within-participant test set), denoted by . Then, we considered a random observation from the 50% of participants not included in the training set (across-participant test set), denoted by
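
The division described in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `split_longitudinal`, the seed, the assumption that every participant is observed in all 5 years, the choice of one random observation per held-out participant, and the participant IDs other than 31, 12, 7, and 2 are all hypothetical and chosen only for illustration.

```python
import random

def split_longitudinal(observations, seed=0):
    """Split participant-level longitudinal data into three sets.

    observations: dict mapping participant id -> list of observed study years.
    Returns (train, within_test, across_test), each a list of (id, year) pairs.
    """
    rng = random.Random(seed)
    ids = sorted(observations)
    # Randomly select 50% of participants for training.
    train_ids = set(rng.sample(ids, len(ids) // 2))

    train, within_test, across_test = [], [], []
    for pid in ids:
        years = observations[pid]
        if pid in train_ids:
            # Two distinct study years; the earlier one trains the model,
            # the later one forms the within-participant test set.
            first, second = sorted(rng.sample(years, 2))
            train.append((pid, first))
            within_test.append((pid, second))
        else:
            # Assumption: one random observation per held-out participant
            # forms the across-participant test set.
            across_test.append((pid, rng.choice(years)))
    return train, within_test, across_test

# Hypothetical setting: 8 participants, each observed in study years 1 to 5.
data = {pid: [1, 2, 3, 4, 5] for pid in [2, 5, 7, 12, 18, 23, 31, 40]}
train, within_test, across_test = split_longitudinal(data)
```

With 8 participants this yields 4 training observations, 4 within-participant test observations from the same people at a later year, and 4 across-participant test observations from people the model never saw.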
