One of the cornerstones of health and clinical research is the identification of individuals who have a high risk of developing an adverse outcome over a specific time period, so that they can be targeted for early preventative strategies and possibly treatment. For example, individuals who are seemingly healthy but are found to have a high risk of developing cardiovascular disease could be advised to modify their lifestyle and behaviour (e.g. smoking, exercise, eating habits) to reduce their future risk. They may also be prioritised for clinical investigation, which could lead to early diagnosis of an underlying condition (e.g. diabetes, high blood pressure) and preventative treatment (e.g. statins or aspirin) to manage it.
For the purpose of such prognostic risk assessment, there is growing interest in risk prediction modelling [1–3], where a statistical model is used to estimate the risk of future outcomes for individuals based on one or more underlying characteristics. When considering future outcomes in patients, a risk prediction model is often referred to as a prognostic model (typically used for outcome risk for a defined disease) or, more generally, a clinical prediction model (used in both diseased and non-diseased settings). Similarly, the word ‘model’ is often replaced with ‘score’, ‘tool’, ‘index’, or ‘rule’. However, the same principle remains: to accurately predict the risk of future occurrence of an outcome in an individual by utilising the values or levels of multiple individual characteristics. We refer here to such characteristics simply as predictors, but they are also termed prognostic factors, risk factors, prognostic variables, and prognostic markers. They often include standard features such as age, sex, smoking and family history, but increasingly also include more complex clinical measures such as biomarkers, relating to a diverse range of measurable biological (including genomic), pathological, imaging, clinical, and physiological variables.
Diagnostic risk prediction models also exist, which estimate the risk that a disease is already present; however, the focus in this article is on predicting the risk of future outcomes. Unless the outcome prediction relates to the very near future (e.g. risk of hypocalcaemia within 48 hours after thyroidectomy), single predictors usually do not provide accurate predictions at the individual level. For this reason, risk prediction models usually utilise multiple predictors in combination. For example, in healthy women the probability of developing breast cancer can be estimated from the Gail model, which is a risk prediction model combining information on family history, age, age at first live birth, age at menarche, breast biopsy number, and menopause [6, 7]. In women with newly diagnosed breast cancer, a well-known risk prediction model is the Nottingham Prognostic Index (NPI), which gives a score that relates to the survival probability and is based on a combination of tumour grade, number of involved lymph nodes, and tumour size.
Before its impact in daily practice is evaluated [3, 9, 10], risk prediction model research has two main phases: model development (including internal validation using the same data or data source) and external validation (using new data from a different data source) [2, 11, 12]. Validation requires demonstrating that the model is accurate in the population of individuals for whom it is intended. It must ascertain the model’s ability to distinguish between patients with different outcomes (‘discrimination’) and show the agreement between predicted and observed risks in groups of individuals with similar risk predictions (‘calibration’). Importantly, validation must go beyond the set of data and individuals that were used to develop the model, because predictive performance estimated on the development data is often optimistic, owing to multiple testing with a limited sample size [1, 13, 14]. Validation is therefore needed in individuals not used in the development process and preferably selected from different settings (external validation).
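To make the two validation criteria concrete, the following is a minimal Python sketch for a binary outcome, not a method taken from any of the studies discussed here. It assumes that each individual has a predicted risk and an observed 0/1 outcome. The function names `c_statistic` and `calibration_by_group`, and the toy data, are hypothetical and chosen only for illustration. Discrimination is summarised by the concordance (c) statistic, and calibration by comparing the mean predicted risk with the observed outcome proportion in groups of individuals with similar predictions.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Discrimination: proportion of (event, non-event) pairs in which the
    individual with the event has the higher predicted risk (ties count 0.5)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


def calibration_by_group(risks, outcomes, n_groups=2):
    """Calibration: sort individuals by predicted risk, split them into
    n_groups of near-equal size, and return one
    (mean predicted risk, observed outcome proportion) pair per group."""
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups requested than individuals available
        mean_predicted = sum(r for r, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_predicted, observed))
    return table


# Hypothetical toy data: eight individuals (illustrative only).
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]

print(c_statistic(risks, outcomes))
print(calibration_by_group(risks, outcomes, n_groups=2))
```

A c statistic of 0.5 corresponds to no discrimination and 1 to perfect discrimination; in a well-calibrated model the two numbers in each group are close. For time-to-event outcomes with censoring, adapted measures would be needed, which this sketch does not cover.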
Unfortunately, most publications on risk prediction models describe model development, and only a small number report external validation studies. This might be a key reason why, despite many being developed, relatively few models are actually adopted in practice. The collation and synthesis of individual participant data (IPD) from multiple studies offers a novel and natural opportunity to overcome this current lack of validation. For example, models could be developed using data from a subset of studies and assessed on data from the remaining studies. Variation in model accuracy across studies, and its causes, could also be explored. The approach would also unite researchers, increasing sample sizes and encouraging a consensus towards a single well-developed and validated prognostic model, rather than a number of competing and non-validated models for the same clinical question. For example, the IMPACT (International Mission for Prognosis and Analysis of Clinical Trials) consortium developed a prediction model for mortality and unfavourable outcome in traumatic brain injury by sharing IPD from 11 studies (8509 patients), with successful external validation using IPD from another large study (6681 patients).
IPD meta-analysis in this context can also go beyond using IPD from multiple studies, and more broadly consider synthesising IPD from any relevant clusters in the wider population of interest. For example, large electronic databases and registries are increasingly available that contain routinely collected patient records and risk factor measurements, which can be linked to health outcomes using, for example, Hospital Episode Statistics (HES) linkage. An example is the THIN database, which contains anonymised patient records and risk factor information from millions of patients collected from over 500 general practices in the UK. Such databases inevitably contain clustering of patients, for example within practices, hospitals and countries, and so an IPD meta-analysis could account for such clustering, for example by developing a model using data from a subset of the clusters (e.g. hospitals, practices), followed by external validation on the remainder.
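The idea of developing a model in a subset of studies or clusters and validating it in the remainder can be sketched as a leave-one-cluster-out loop, an approach sometimes called internal-external cross-validation. The Python sketch below is illustrative only and rests on several assumptions: each record is a hypothetical `(cluster_id, predictor, outcome)` tuple, the names `internal_external_cv`, `fit_mean_risk` and `observed_expected_ratio` are invented for this example, and the "model" is a deliberately trivial placeholder that assigns everyone the overall event proportion from the development data. In a real analysis the `fit` and `evaluate` functions would be replaced by a proper regression model and by the performance measures of interest.

```python
from collections import defaultdict


def internal_external_cv(records, fit, evaluate):
    """For each cluster (study, practice, hospital) in turn, develop a model
    on all other clusters and evaluate it on the held-out cluster.

    records: iterable of (cluster_id, predictor, outcome) tuples.
    Returns a dict mapping each held-out cluster to its evaluation result.
    """
    by_cluster = defaultdict(list)
    for cluster, x, y in records:
        by_cluster[cluster].append((x, y))
    if len(by_cluster) < 2:
        raise ValueError("need at least two clusters")
    results = {}
    for held_out in by_cluster:
        development = [row
                       for cluster, rows in by_cluster.items()
                       if cluster != held_out
                       for row in rows]
        model = fit(development)
        results[held_out] = evaluate(model, by_cluster[held_out])
    return results


def fit_mean_risk(rows):
    """Placeholder 'model' (an assumption for illustration): the same
    predicted risk for everyone, equal to the event proportion in the
    development data. The predictor value is ignored."""
    return sum(y for _, y in rows) / len(rows)


def observed_expected_ratio(model, rows):
    """Ratio of observed to expected events in the validation cluster;
    values near 1 suggest good calibration on average."""
    observed = sum(y for _, y in rows)
    expected = model * len(rows)
    if expected == 0:
        raise ValueError("model predicts zero events; ratio undefined")
    return observed / expected


# Hypothetical toy IPD from three studies (illustrative only).
records = [
    ("A", 50, 1), ("A", 42, 0), ("A", 37, 0), ("A", 61, 0),
    ("B", 55, 1), ("B", 63, 1), ("B", 40, 0), ("B", 48, 0),
    ("C", 39, 0), ("C", 44, 0), ("C", 52, 0), ("C", 66, 1),
]

for cluster, ratio in internal_external_cv(
        records, fit_mean_risk, observed_expected_ratio).items():
    print(cluster, round(ratio, 3))
```

Because every cluster is held out once, the loop returns one performance estimate per cluster, which makes between-cluster variation in model performance directly visible rather than hiding it in a single pooled figure.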
The aim of this article is to perform a qualitative review to examine how researchers are developing and validating risk prediction models when IPD from multiple studies are sought and then combined for this purpose. Specifically, we aim to identify the current research standards and techniques; the role of IPD meta-analysis methods in development and validation; and the common challenges and methodological problems researchers face. This allows us to generate a set of recommendations for how research in this area can be improved, and to flag those methodological techniques and issues researchers should recognise when modelling risk prediction using multiple sources of IPD.