We will systematically review cohort studies and include individual patient data in a meta-analysis to estimate the prognostic value of clinical characteristics and diagnostic test results. This will allow us to develop a prognostic model of the risk factors for diabetic foot ulceration (DFU) based on data collected worldwide. We will test the robustness of the model in different demographic profiles – for example, age, duration of diabetes, control of diabetes (insulin, diet or oral medication) and type of diabetes (Type I, Type II).
The electronic search strategies used in a previous systematic review by members of our group will be re-run according to the published methods. Copies of the EMBASE and MEDLINE search strategies can be found in Additional file 2: Appendix 2.
One reviewer will apply the IPD review eligibility criteria to the full-text articles of the studies identified in our literature search and also all studies excluded from our aggregate systematic review to ensure we do not miss eligible IPD. A second reviewer will apply the eligibility criteria to a 10% random sample of the abstract search yield to check that no relevant material will be missed by having only one reviewer assess all the abstracts.
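The 10% check sample can be drawn reproducibly so that the selection is auditable. The following is a minimal sketch only: the function name, the seed and the count of 250 abstracts are hypothetical and are not taken from our search results.

```python
import random

def sample_for_second_review(record_ids, percent=10, seed=2012):
    """Return a reproducible random subset of records for the second reviewer."""
    rng = random.Random(seed)                   # fixed seed so the draw can be repeated
    ids = list(record_ids)
    k = (len(ids) * percent + 99) // 100        # integer ceiling of percent% of the records
    return sorted(rng.sample(ids, k))

abstract_ids = list(range(1, 251))              # hypothetical: 250 abstracts retrieved
second_reviewer_ids = sample_for_second_review(abstract_ids)
print(len(second_reviewer_ids), second_reviewer_ids[:5])
```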
Types of Participants
The IPD review will only include data from individuals who are free of foot ulceration at the time of study entry and who have a diagnosis of diabetes mellitus (either type 1 or type 2). Corresponding authors of all identified cohort studies will be contacted and invited to share their data. When we identify studies with patients who had prevalent foot ulcers at the time of recruitment, we will ascertain whether IPD are available for patients who were free of ulceration at the time of recruitment.
Types of exposure variables
All elements from the patient history, symptoms, signs and diagnostic test results will be considered for inclusion in the prognostic model. These are collected variously as continuous, binary and multi-categorical data.
Type of outcome variable
The outcome variables will be incident foot ulceration (present/absent) and time to ulceration from initial diagnosis of diabetes as well as from the time of screening.
Types of studies
We will seek data from all cohort studies which included participants who were free of foot ulceration at the time of study recruitment. Our previous work indicates that data collected in older studies could be difficult to obtain and we are aware that some investigators are no longer in possession of their study data (Personal communication, D. Armstrong 2012). Where data are unavailable, details of the study will be presented in aggregate form in the final report.
Cohort studies which recruited patients with prevalent and incident foot ulceration will be considered for inclusion where it is possible to separate the data for these patients.
Data extraction and quality assessment
Data extraction will be undertaken by 2 reviewers working independently and disagreement will be resolved by discussion. For quality assessment, a 2-stage process will be used; 2 reviewers working independently will complete those items available from the published report together with information provided by authors of the primary studies.
The assessment of methodological quality is an important component of an IPD systematic review, but assessing potential threats to the validity of primary studies is complex for this research genre. No widely agreed criteria exist for assessing the risk of bias in aggregate systematic reviews of prognostic studies and there is a complete absence of established guidelines for prognostic IPD reviews (personal communication, D. Altman, R. Riley 2012). Although flaws in the recruitment of patients or the manner of data collection can influence review findings, some domains usually assessed by systematic reviewers of published reports are irrelevant, e.g. those pertinent to the analysis performed by the primary authors. We have compiled a list of items relevant to our IPD review question which are likely to identify studies whose data are compromised by threats to validity. This checklist of items can be found in Additional file 3: Appendix 3 [15–26]; it has been refined during a pilot phase by 2 researchers working independently.
As with any meta-analysis, heterogeneity must be considered, both from a clinical and a statistical viewpoint. First, clinical expertise will be used to decide if it would be meaningful to combine the studies based on the patient demographics, risk factors (symptoms, signs and diagnostic test results), outcome measures and timing of outcome measures (length of follow-up). We will examine histograms of relevant variables from each dataset to check the spread, mean, median and skewness, and the consistency of these properties across datasets, before reaching a decision about whether it makes clinical or statistical sense to combine the data. We will also consider relationships between variables using tables and scatter plots.
Sources of heterogeneity that particularly concern us are differences between the patient groups with regard to basic demographics and disease spectrum as these may have a strong influence on prognosis and the performance of the tests. Also important are the various methods used to conduct the tests, which again may lead to marked differences in test performance. Another potentially important source of heterogeneity is length of follow-up as this may impact on the proportion of patients who develop ulceration. These aspects will be carefully detailed during the review process.
We are aware that a consensus has not yet been reached about the investigation of heterogeneity in IPD systematic reviews. We will therefore use conventional methods of investigating heterogeneity on aggregate data generated from the datasets: we shall generate summary measures and use these to create forest plots and compute I² statistics. I² values of 50% and 75% have been used to denote moderate and high levels of variation between studies that are not explainable by chance. We shall use these figures as a guide only, together with the results from the IPD.
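As an illustration of the conventional approach, I² can be computed from study-level summary measures via Cochran's Q, using I² = 100% × (Q − df)/Q, truncated at zero. The sketch below assumes inverse-variance (fixed-effect) weights; the log odds ratios and standard errors are hypothetical and do not come from any of the cohorts.

```python
def cochran_q_and_i2(estimates, std_errors):
    """Return (pooled estimate, Cochran's Q, I-squared in %) from per-study summaries."""
    weights = [1.0 / se ** 2 for se in std_errors]            # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0     # truncate negative values at 0
    return pooled, q, i2

# Hypothetical log odds ratios for one risk factor in four cohorts
log_or = [0.65, 0.40, 0.92, 0.55]
se = [0.20, 0.15, 0.30, 0.25]
pooled, q, i2 = cochran_q_and_i2(log_or, se)
print(f"pooled log OR = {pooled:.3f}, Q = {q:.2f}, I2 = {i2:.1f}%")
```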
We propose to use a multi-level mixed model, using “study” as one of the levels. Such a model can also allow for the within-patient clustering that occurs if a patient contributes data from both feet, although to aid interpretation, we prefer to use patients rather than feet as the unit of analysis. We will only attempt this analysis if the results of the investigation of heterogeneity do not rule it out and the model diagnostics are acceptable.
As the datasets should contain the date of initial diagnosis of diabetes and the date, if any, of foot ulceration, we propose to use survival analysis. Covariates will be added to the model based on clinical relevance. If the number of possible covariates is large relative to the number of events and patients, so that there is a danger of model over-fitting, the clinicians will be asked to choose a subset of covariates based on their expertise and experience. We shall not use data-derived selection methods as these lead to overly optimistic estimates of model performance. Model performance will be assessed graphically and with chi-square and other goodness-of-fit statistics.
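For survival analysis, each patient record has to be reduced to a time at risk and an event indicator. A minimal sketch of that derivation follows; the function and argument names are hypothetical, and treating patients without a recorded ulcer as censored at their last follow-up date is our assumption about how the supplied data will be coded. The `start` argument can be either the date of diagnosis or the date of screening.

```python
from datetime import date

def time_to_ulceration(start, ulcer_date, last_follow_up):
    """Return (years at risk, event flag) for one patient.

    start          -- time origin (date of diabetes diagnosis or of screening)
    ulcer_date     -- date of first foot ulcer, or None if none was recorded
    last_follow_up -- date the patient was last known to be ulcer free
    """
    end = ulcer_date if ulcer_date is not None else last_follow_up
    if end < start:
        raise ValueError("end date precedes the time origin")
    years = (end - start).days / 365.25
    return years, ulcer_date is not None      # True = ulcer observed, False = censored

# Hypothetical patients: one with an ulcer, one censored at last follow-up
print(time_to_ulceration(date(2001, 3, 1), date(2008, 9, 15), date(2010, 1, 1)))
print(time_to_ulceration(date(2004, 6, 1), None, date(2010, 1, 1)))
```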
As we plan to use the patient, rather than the foot, as the unit-of-analysis, we can use a simpler model that will be easier to interpret. It is also important from the view of patient outcomes – an amputation affects the patient as a whole and not just the foot. One approach to construct the model is to use the most badly affected foot from each patient. However, if the model performance merits an analysis using the foot as the unit-of-analysis, and of course allowing for the correlation between feet belonging to the same patient, we shall conduct such an analysis.
To avoid a loss of information, wherever possible we shall keep continuous variables as continuous and not dichotomise or otherwise categorise them, e.g. we shall use BMI rather than subdivide patients into “underweight”, “normal weight”, “overweight” and “obese”. Sometimes the relationship between a continuous covariate and the outcome is not linear, and in such cases we will investigate the use of fractional polynomials and similar methods.
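To illustrate what a first-degree fractional polynomial involves, the sketch below generates the candidate transformations of a single positive covariate. It assumes the power set conventionally used for fractional polynomials (−2, −1, −0.5, 0, 0.5, 1, 2, 3, with power 0 taken to mean the natural logarithm); the choice of power set and the BMI value are illustrative assumptions, not decisions fixed by this protocol.

```python
import math

# Conventional fractional polynomial powers; 0 denotes log(x) by convention
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp1_transform(x, power):
    """Apply a first-degree fractional polynomial transformation to a positive value."""
    if x <= 0:
        raise ValueError("fractional polynomials require strictly positive values")
    return math.log(x) if power == 0 else x ** power

bmi = 27.5                                   # hypothetical patient value
candidates = {p: fp1_transform(bmi, p) for p in FP_POWERS}
for power, value in candidates.items():
    print(f"power {power:>4}: {value:.4f}")
```

Each transformed covariate would be fitted in turn and the best-fitting power compared with the linear term (power 1) before any non-linear form is retained.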
Validation of the dataset
We intend to undertake both internal and external validation of the prognostic model. For internal validation, we will not divide the datasets into development and validation subsets, as this is a relatively inefficient method of validating prediction models. Instead we shall use bootstrapping as it is less susceptible to bias and leads to more stable model development. For external validation, we shall reserve one or two of the datasets to test the final model obtained in the main analyses. The reserved datasets will be chosen on the basis of completeness of variables collected so that, we hope, all the variables present in the final model will also be in the reserved datasets, thus requiring no or minimal modification of the final model for external validation purposes. We shall also look at various characteristics of each dataset, such as patient demographics, when choosing the reserved datasets to ensure that they are not atypical of the full set of datasets.
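One common way of using the bootstrap for internal validation is to estimate the optimism of the apparent performance and subtract it. The sketch below illustrates only that idea and makes several assumptions that are ours, not part of the planned analysis: the "model" is a toy single cut-point rule, performance is measured as classification accuracy, and the data are simulated.

```python
import random

def accuracy(rows, cut):
    """Proportion classified correctly by the rule 'x >= cut predicts ulceration'."""
    return sum((x >= cut) == bool(y) for x, y in rows) / len(rows)

def fit_threshold(rows):
    """Toy stand-in for model fitting: the cut-point with the highest accuracy."""
    return max(sorted({x for x, _ in rows}), key=lambda cut: accuracy(rows, cut))

def optimism_corrected_accuracy(rows, n_boot=200, seed=1):
    """Return (apparent accuracy, estimated optimism, optimism-corrected accuracy)."""
    rng = random.Random(seed)
    apparent = accuracy(rows, fit_threshold(rows))
    optimism = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(rows) for _ in rows]      # resample patients with replacement
        cut = fit_threshold(boot)                    # refit on the bootstrap sample
        optimism += accuracy(boot, cut) - accuracy(rows, cut)
    optimism /= n_boot
    return apparent, optimism, apparent - optimism

# Simulated data: (test result, ulcer yes/no) for 100 hypothetical patients
sim = random.Random(0)
data = [(sim.gauss(1.0 if y else 0.0, 1.0), y) for y in [0] * 60 + [1] * 40]
apparent, optimism, corrected = optimism_corrected_accuracy(data)
print(f"apparent {apparent:.3f}, optimism {optimism:.3f}, corrected {corrected:.3f}")
```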
Unfortunately, we currently lack the data required for a full power calculation. However, as an illustration, assuming that the sample of 17,000 can be split evenly into patients with and without some prognostic factor, it would be possible to detect a difference of 2 percentage points in the proportion of patients with foot ulcers between the two groups with over 90% power. This calculation assumes an ulceration rate of 0.10 in one group and 0.08 in the other. With a Type I error rate of 0.05, these figures give a power of 99.53%.
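The illustrative figure can be checked with a short calculation. The sketch below assumes a two-sided comparison of two independent proportions using the normal approximation, with 8,500 patients per group; the test is not named above, so this choice is an assumption, although under it the result should land close to the 99.53% quoted.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two independent proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                          # two-sided critical value
    p_bar = (p1 + p2) / 2                                      # pooled proportion (equal groups)
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)      # SE under no difference
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    delta = abs(p1 - p2)
    upper = z.cdf((delta - z_crit * se_null) / se_alt)
    lower = z.cdf((-delta - z_crit * se_null) / se_alt)        # negligible opposite tail
    return upper + lower

power = power_two_proportions(0.10, 0.08, n_per_group=17000 // 2)
print(f"power = {power:.2%}")
```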
Handling missing data
Our method for handling missing data will depend on the extent of the missingness and on whether the mechanism causing it is known, specifically whether the data are missing completely at random or not. If the datasets contain missing data for which there is no explanation, the data will be assumed to be ‘missing at random’.
We will use ICE multiple imputation (ICE programs, Stata 11.0), and include all available patient variables (including the patient outcome: foot ulceration) in the imputation model to help predict missing data for the variables of interest. Twenty imputed datasets will be generated by the imputation procedure and used in the analysis. To test the validity of the imputation, a sensitivity analysis will be performed restricting our cohort to patients without missing data (complete case analysis).
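Estimates from the imputed datasets have to be combined into a single result. The sketch below uses Rubin's rules, which we take to be the standard way of pooling after multiple imputation; it stands in for, and is not a reproduction of, what the Stata routines do. The five coefficients and variances are hypothetical (the analysis itself would pool twenty).

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)                 # pooled point estimate
    within = mean(variances)                 # average within-imputation variance
    between = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between   # total variance of the pooled estimate
    return pooled, total

# Hypothetical log hazard ratios and squared standard errors from 5 imputed datasets
coefs = [0.52, 0.47, 0.55, 0.50, 0.49]
vars_ = [0.010, 0.011, 0.009, 0.010, 0.012]
pooled, total_var = pool_rubin(coefs, vars_)
print(f"pooled estimate {pooled:.3f}, standard error {sqrt(total_var):.3f}")
```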
Specifying variables for analysis
A full list of the most common variables reported in cohort studies is presented in Additional file 4: Appendix 4. Examples of variables of interest are below. Importantly the dates relating to patient recruitment, the timing of the measurement of variables and the date of follow-up are also required.
Continuous variables (and date measured)
Peak plantar pressure (PPP)
Duration of diabetes
Binary and other categorical variables (and date measured)
Cutaneous sensation (monofilaments)
Vibration Perception Thresholds (VPT (tuning forks and neuro or biothesiometers))
Absent pedal pulses
Diabetes-related medication use
Outcome variable: incident foot ulceration (present/absent) and time to ulceration (date measured).
Supplying the data
The authors of the cohort studies will be able to supply data in any way that is most convenient to them. A single individual will be identified for each study to whom all queries about the data collection processes and transformation of individual variables will be addressed. The research committee structures can be found in Additional file 4: Appendix 4.
Ethics and governance
The ethics of obtaining data collected from a number of sources which cross international boundaries and different legal systems have been carefully considered and informed by ethics advice issued by the Medical Research Council (UK). This study does not require separate ethical committee approval for the following reasons:
Investigators of each of the original studies obtained local ethical committee approval and written, informed patient consent for each of the cohorts included in the IPD review.
The project seeks anonymised data from which the individuals recruited to the original study cannot be identified.
The value of the IPD analysis will be the production of a global dataset of prognostic factors for diabetic foot disease and the opportunities for new uses will be maximised. Anonymised data from each of the collaborators of the primary cohort studies will be transported in a manner deemed most convenient to original study investigators including encrypted USB sticks if required.
Data will then be formatted in a consistent way to permit a re-analysis. Data will be stored in password-protected files on a secure University of Edinburgh computer (University of Edinburgh Data protection registration number: Z6426984) and will only be accessible by members of the Data Management Committee, membership of which can be found in the appendices.
This protocol incorporates a data confidentiality agreement which makes clear that the data provided must be de-identified so that individual patients cannot be recognised. It also includes an assurance that the original investigators are in possession of local ethical approval for their study.
Regular e-mail updates will be used to inform the international group of our activities. Electronic media such as Dropbox and e-mail may be used to store and exchange data and paperwork between the original investigators and the researchers. When researchers are cleaning a specific dataset they may communicate with the original investigators by telephone or e-mail.
Collaborators' face-to-face meeting
Once the initial analysis has been performed, a face-to-face meeting of all collaborators will be convened. The purpose of the meeting is to let the collaborators know the results of the review and meta-analysis first and to give them the opportunity to interpret the data and question the findings (Additional file 5: Appendix 5 and Additional file 6: Appendix 6).
In the final report we will clearly present the methods of the review such as tabulated characteristics of included studies and details of study designs. The report will conform to recommendations in the PRISMA checklist. Formal synthesis of the results and formal assessments of study quality will also be presented.
This protocol is registered with PROSPERO (International Prospective Register of Systematic Reviews) at the NHS Centre for Reviews and Dissemination (CRD) at the University of York (registration number: CRD42011001841).
Public Partners Involvement (PPI)
The research is supported by a public partner from Diabetes UK who ensures the research incorporates aspects of risk assessment that matter to patients. His views, opinions and perspective have ensured the study documentation and data collection processes are acceptable to the general diabetic population.