Non-IO Clinical Trial Curation#
This documentation describes the curation process of clinical trial data for non-immunotherapy datasets into a standardized R object.
Non-immunotherapy datasets#
Objective#
While most steps overlap between immunotherapy and non-immunotherapy dataset curation, it is important to understand the differences. The following sections describe the current data elements first and then the differences.
Currently, a non-ICB clinical dataset is curated into R's SummarizedExperiment (SE) object rather than a MultiAssayExperiment (MAE) object, because these datasets do not include multiple omics data types. Sample code for curation can be found on GitHub.
Curation#
A non-ICB clinical data object contains the following data parts:
- Expression values or Assay data
- Clinical metadata: contains patient/sample metadata
Expression values or Assay data#
Assay data contains the genomic profiles of the patients. The data are usually processed in-house from raw files; in some instances, published processed data are used directly. For instance, gene expression profiles of the patients are typically generated on either microarray or RNA-seq platforms. In the BHK lab, we use Robust Multi-array Average (RMA) with CDF files from Brainarray to process microarray data, and Kallisto to process RNA-seq data, as described for immunotherapy curation.
Clinical metadata#
Any data pertaining to the samples or clinical response can be included in the Phenodata object. These data are either fetched from public platforms such as GEO when the dataset is public, or obtained on request when it is confidential. The metadata section of each SE object includes a few mandatory columns, which are populated from other columns in the dataset or from the original published paper. NA is used to fill columns for which no information is found. Each SE object may also include additional, dataset-specific metadata that is not necessarily available in other SE objects.
Mandatory columns are the same as immunotherapy colData (see above).
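The NA-filling step described above can be sketched in base R. This is only an illustration: the column names in mandatory_cols, the toy clinical table, and the helper add_missing_columns are hypothetical and are not the actual mandatory list, which is defined in the immunotherapy colData section.

```r
# Hypothetical subset of mandatory columns (assumption for illustration only;
# the real list is the one given for immunotherapy colData).
mandatory_cols <- c("patientid", "sex", "age", "response")

# Toy clinical table as it might arrive from a public source (made-up values).
clin <- data.frame(
  patientid = c("P1", "P2", "P3"),
  age = c(54, 61, 47),
  tumour_grade = c("II", "III", "II"),
  stringsAsFactors = FALSE
)

# Add every required column that is absent and fill it with NA,
# then place mandatory columns first and dataset-specific ones after.
add_missing_columns <- function(df, required) {
  absent <- setdiff(required, colnames(df))
  for (col in absent) {
    df[[col]] <- NA
  }
  extras <- setdiff(colnames(df), required)
  df[, c(required, extras), drop = FALSE]
}

clin_curated <- add_missing_columns(clin, mandatory_cols)
```

Keeping the dataset-specific columns (here tumour_grade) after the mandatory ones preserves the additional metadata while giving every object the same leading columns.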
Gene metadata#
Similar to immunotherapy datasets, gene metadata for non-immunotherapy datasets is also obtained from Gencode annotations. "Ensembl.v99.annotation.RData" from "Gencode.v33.annotation.RData" is used for curating rowData of the SE object in non-immunotherapy datasets. Annotation data are available in BHKLab-Pachyderm's Annotation repository.
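As a rough sketch of how the parts fit together, the example below assembles a small SummarizedExperiment from an expression matrix, clinical metadata (colData) and gene metadata (rowData). It assumes the Bioconductor package SummarizedExperiment is installed. All values, the column names, and the assay name "expr" are made up for illustration; in real curation the gene table would come from the annotation file rather than being typed by hand.

```r
library(SummarizedExperiment)

# Toy expression matrix: genes in rows, samples in columns (made-up values).
expr <- matrix(
  c(5.1, 6.3, 2.2, 7.4, 3.3, 4.8),
  nrow = 3,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
    c("P1", "P2")
  )
)

# Toy clinical metadata, one row per sample (hypothetical columns).
clin <- data.frame(
  patientid = c("P2", "P1"),
  response = c(NA, "R"),
  row.names = c("P2", "P1")
)

# Toy gene metadata, one row per gene (stand-in for the Gencode annotation).
genes <- data.frame(
  gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  gene_name = c("TSPAN6", "TNMD", "DPM1"),
  row.names = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")
)

# Reorder both metadata tables so they match the matrix rows and columns.
clin <- clin[colnames(expr), , drop = FALSE]
genes <- genes[rownames(expr), , drop = FALSE]

se <- SummarizedExperiment(
  assays = list(expr = expr),
  colData = DataFrame(clin),
  rowData = DataFrame(genes)
)
```

Reordering the metadata by the matrix dimnames before construction guards against silently pairing a sample or gene with the wrong annotation row.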