HDR Gateway logo

Immune Checkpoint Inhibitors synthetic data: HDR UK Medicines Programme resource

Population Size

683

People

Years

2015 - 2023

Associated BioSamples

None/not available

Geographic coverage

United Kingdom

England

Lead time

1-2 months

Summary

A synthetic dataset featuring patient-level information for 683 cancer patients treated with checkpoint inhibitors, including demographics, primary cancer diagnoses, details of ICI treatments, and other clinical records during hospital admissions.

Documentation

This highly granular synthetic dataset, created as an asset for the HDR UK Medicines programme, includes information on 683 cancer patients over a period of three years. It includes simulated patient-related data, such as demographics & co-morbidities extracted from ICD-10 and SNOMED-CT codes. It also includes serial, structured data pertaining to the acute care process (readmissions, survival), primary diagnosis, presenting complaint, physiology readings, blood results (infection, inflammatory markers) and acuity markers such as the AVPU Scale and NEWS2 score, together with imaging reports, prescribed & administered treatments including fluids, blood products and procedures, information on outpatient admissions, and survival outcomes at one year post discharge.

The data was generated using a generative adversarial network model (CTGAN). First, a flat table of real data was created by consolidating essential information from key relational tables (e.g. medications, demographics). A synthetic version of the flat table was then generated using a customized script based on the SDV package (N. Patki, 2016) that replicated the distributions and logical relationships of the real data.
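The consolidation step described above can be sketched as follows. This is a minimal illustration only, not the PIONEER script: the table contents, column names (`patient_id`, `drug`, `start_date`) and derived fields are all hypothetical, and the sketch covers only the flattening of relational tables into one row per patient, not the CTGAN model fitting.

```python
from collections import defaultdict

# Hypothetical relational tables; every column name and value here is
# invented for illustration and does not come from the real dataset.
demographics = [
    {"patient_id": 1, "age": 67, "sex": "F"},
    {"patient_id": 2, "age": 54, "sex": "M"},
]
medications = [
    {"patient_id": 1, "drug": "pembrolizumab", "start_date": "2021-03-02"},
    {"patient_id": 1, "drug": "nivolumab", "start_date": "2022-01-15"},
    {"patient_id": 2, "drug": "atezolizumab", "start_date": "2020-11-09"},
]


def flatten(demographics, medications):
    """Return one flat row per patient, summarising their medication rows."""
    # Group medication rows by patient so each patient is visited once.
    meds_by_patient = defaultdict(list)
    for row in medications:
        meds_by_patient[row["patient_id"]].append(row)

    flat = []
    for person in demographics:
        # ISO-formatted date strings sort chronologically.
        meds = sorted(
            meds_by_patient[person["patient_id"]],
            key=lambda m: m["start_date"],
        )
        flat.append({
            **person,
            "n_ici_courses": len(meds),
            "first_ici_drug": meds[0]["drug"] if meds else None,
            "first_ici_date": meds[0]["start_date"] if meds else None,
        })
    return flat


flat_table = flatten(demographics, medications)
```

A flat table of this shape, with one row per patient and fixed columns, is the kind of single-table input that a tabular synthesizer can be fitted to.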

Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

Available supplementary data: Matched controls; ambulance and community data; unstructured data (images). We can provide the dataset in OMOP and other common data models, and can provide the real data via application.

Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

Dataset type
Health and disease
Dataset sub-type
Not applicable
Dataset population size
683

Keywords

Cancer, Cancer immunotherapy, checkpoint blockades, immune-oncology, oncology, therapeutic efficacy, immune response modulation, medicines, HDR UK Medicines Driver programme, Synthetic data, CT GAN, ICI treatment

Observations

Observed Node
Persons
Disambiguating Description
Not applicable
Measured Value
683
Measured Property
Count
Observation Date
20 Mar 2024

Provenance

Source of data extraction
Machine generated
Collection source setting
Secondary care - Accident and Emergency, Secondary care - In-patients, Secondary care - Outpatients
Patient pathway description
Data is representative of the multi-ethnic population within the West Midlands (42% non-white). Data includes all patients admitted during this timeframe, with National Data Opt-Outs applied, and is therefore representative of admissions to secondary care. Data focuses on the in-patient stay in hospital during the acute episode but can be supplemented on request to include previous and subsequent hospital contacts (including outpatient appointments) and ambulance, 111 and 999 data.
Image contrast
Not stated
Biological sample availability
None/not available

Structural Metadata

Details

Publishing frequency
Quarterly
Version
1.0.0
Modified

08/10/2024

Distribution release date

21/08/2024

Citation Requirements
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)

Coverage

Start date

01/01/2015

End date

31/08/2023

Time lag
Not applicable
Geographic coverage
United Kingdom, England, West Midlands
Minimum age range
18
Maximum age range
150
Follow-up
Other

Accessibility

Language
en
Alignment with standardised data models
LOCAL
Controlled vocabulary
SNOMED CT, ICD10

Data Access Request

Dataset pipeline status
Available
Time to dataset access
1-2 months
Access request cost
www.pioneerdatahub.co.uk/data/data-services-costs/
Access method category
TRE/SDE
Access service description

Trusted Research Environments (TREs) are built using Microsoft Azure services and hosted in the UK to provide research teams with a safe, secure and agile environment in which users can quickly analyse, interpret and form an enriched view of primary care information through a range of integrated datasets.

Health data collated from multiple sources is ingested into a secure data lake, from which subsets of data are made available to research teams on approval of a data request. Once a request is approved, a customer-specific TRE is made available with a standard set of analytical tools from Microsoft, including Azure Databricks, Azure Machine Learning, Azure SQL and Azure Synapse (for large-scale data warehouses). Specific tools can be provided at an additional cost over the standard platform data access charge, and the PIONEER team will work with you to determine your exact needs.

Access to the TRE is managed using virtual desktop technology to provide a safe and secure end-user experience. This design allows PIONEER to create TREs rapidly to meet individual customer requirements.

Jurisdiction
GB-ENG
Data use limitation
General research use
Data use requirements
Project-specific restrictions
Data Controller
University Hospitals Birmingham NHS Foundation Trust
