Table 5 Possible Data Package Contents

From: EFSPI/PSI working group on data sharing: accessing and working with pharmaceutical clinical trial patient level datasets – a primer for academic researchers

| Item | Further details |
| --- | --- |
| Anonymized raw datasets | Dataset content reflects the information as recorded on the case report form. The data are usually split into a number of raw datasets reflecting the different types of data that have been collected, e.g. adverse events, laboratory assessments, disease-specific measurements. Some Data Holders may provide only the datasets required for the research. |
| Anonymized analysis-ready datasets | These datasets will have been derived from the raw datasets and will reflect the additional programming that needs to be applied for the data to be analysis ready. This could be the synthesis of different data points to create a single efficacy assessment (e.g. ACR score in RA or a time to disease progression) and could also include derivations and assumptions made as a result of missing data. They will also identify the original analysis populations (e.g. ITT, Per Protocol) that were defined in the Statistical Analysis Plan. Researchers should understand the differences between these populations so they can be used appropriately. |
| Protocol (including any amendments) | The protocol describes the clinical study design, assessment schedule and planned statistical analysis in detail. Small amounts of text may be subject to redaction if they are considered to be commercially confidential. |
| Annotated Case Report Form | This document provides the link between the data points that were recorded by the investigator on the paper or electronic case report form and the variable name and dataset location where they are held within the database. This is a key document to help the researcher navigate the database. |
| Statistical Analysis Plan | This document is written by a statistician before the study data are available for analysis. It is a comprehensive outline of the statistical endpoints to be derived and the analysis methodology to be used. The 1 to 2 pages of statistical detail from the protocol are expanded into a document that can be 10–20 pages in length. |
| Dataset specifications | This document (alongside the Statistical Analysis Plan) will provide a map of the dataset structure and data variable locations. |
| Clinical Study Report | The CSR will be subject to some redactions in order to preserve patients’ anonymity and in some cases to protect commercially confidential information. The patient-level data listings will not be included. |
| Optional: SAS programs | In situations where the analysis-ready datasets cannot be found, sponsors may choose to share the SAS programs that were used to create the derived datasets and analysis results. Note that copies of SAS programs may not be executable on other systems without some editing. In certain cases the SAS code outlining the statistical models used may be shared in order to help the researcher navigate the data and the original modelling approach. |
| Optional: SAS logs | Of limited value, as the original SAS program may not be executable on other computer systems or without the Data Holder’s standard SAS macros. |
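As a rough illustration of the analysis-population point in the table, the sketch below shows how a researcher might subset an analysis-ready dataset by population flag before summarizing an endpoint. It is only a sketch under stated assumptions: the records are made up, and the variable names (`USUBJID`, `ITTFL`, `PPROTFL`, `AVAL`) are hypothetical placeholders. The actual names for a given study would need to be looked up in the dataset specifications and the Statistical Analysis Plan.

```python
# Made-up analysis-ready records. All variable names here are assumptions
# for illustration; real names come from the dataset specifications.
records = [
    {"USUBJID": "001", "ITTFL": "Y", "PPROTFL": "Y", "AVAL": 12.0},
    {"USUBJID": "002", "ITTFL": "Y", "PPROTFL": "N", "AVAL": 15.5},
    {"USUBJID": "003", "ITTFL": "Y", "PPROTFL": "Y", "AVAL": 9.5},
    {"USUBJID": "004", "ITTFL": "N", "PPROTFL": "N", "AVAL": 11.0},
]


def select_population(rows, flag):
    """Return the rows whose population flag variable equals 'Y'."""
    return [row for row in rows if row.get(flag) == "Y"]


def mean_aval(rows):
    """Mean of AVAL over the given rows, or None if no values are present."""
    values = [row["AVAL"] for row in rows if row["AVAL"] is not None]
    return sum(values) / len(values) if values else None


# The same endpoint summarized on two different analysis populations.
itt = select_population(records, "ITTFL")
per_protocol = select_population(records, "PPROTFL")

print("ITT:", len(itt), "subjects, mean", mean_aval(itt))
print("Per Protocol:", len(per_protocol), "subjects, mean", mean_aval(per_protocol))
```

The two summaries differ only because the populations differ, which is why it matters to know which population a published result was based on before trying to reproduce it.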