
Molecular Profile Annotations

TODO::Convert this to an .Rmd file and add examples

Standard @metadata slot annotations for each Experiment in a MultiAssayExperiment object.

The MolecularProfiles field in the DNL JSON schema should be a list of dictionaries, each containing metadata about a single Experiment in the MultiAssayExperiment.

The schema for each dictionary is as follows:

"MolecularProfiles": [ { "annotation": string, # one of "rnaseq", "rna", "cnv", "mut" "datatype" : string, # Any subtype of annotation, e.g. "fpkm", "tpm" "class" : string, # one off "SummarizedExperiment", "RangedSummarizedExperiment", "Matrix", "BumpyMatrix", "ExpressionSet", etc.], "filename" : string, # The exact file name used to load this data, i.e. "rnaseq_fpkm_20220624.csv" "data_source": { "description" : string, # A sentence or two describing the data & its source "tool" : string, # Tool used to generate the data, i.e "STAR", "PureCN", "Kallisto" "url" : string, # Exact URL used to access & possibly download the data, i.e. "https://cog.sanger.ac.uk/cmp/download/rnaseq_all_20220624.zip" }, "date" : string, # Date the data was generated, i.e. "2024-02-26" "numSamples" : integer, # Number of samples in the data, ex: 949 "numGenes" : integer, # Number of genes in the data, ex: 20000 }, ... ]

Example: adding metadata to a SummarizedExperiment

If you already have a SummarizedExperiment object, you can add metadata to it like this:

# Assume you have a SummarizedExperiment:
show(genes_tpm_SE)
# class: RangedSummarizedExperiment
# dim: 3 2
# metadata(0):
# assays(1): exprs
# rownames(3): gene1 gene2 gene3
# rowData names(1): gene_id
# colnames(2): sample1 sample2
# colData names(2): sampleid batchid

genes_tpm_SE@metadata <- list(
  annotation = "rnaseq",
  datatype = "genes_tpm",
  class = "RangedSummarizedExperiment",
  filename = "rnaseq_fpkm_20220624.csv",
  data_source = list(
    # In this example, the source file url is a zip of multiple files, including the one above
    description = "Read counts, TPM & FPKM-values for all sequenced models including cell lines and organoids.",
    tool = "STAR v2.7.9a",
    url = "https://cog.sanger.ac.uk/cmp/download/rnaseq_all_20220624.zip"
  ),
  date = Sys.Date(),
  numSamples = ncol(genes_tpm_SE),
  numGenes = nrow(genes_tpm_SE)
)

show(genes_tpm_SE@metadata)
$annotation
[1] "rnaseq"

$datatype
[1] "genes_tpm"

$class
[1] "RangedSummarizedExperiment"

$filename
[1] "rnaseq_fpkm_20220624.csv"

$data_source
$data_source$description
[1] "Read counts, TPM & FPKM-values for all sequenced models including cell lines and organoids."

$data_source$tool
[1] "STAR v2.7.9a"

$data_source$url
[1] "https://cog.sanger.ac.uk/cmp/download/rnaseq_all_20220624.zip"


$date
[1] "2024-04-12"

$numSamples
[1] 2

$numGenes
[1] 3

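When several experiments need the same annotation, the fields that can be read off the object itself (class, number of samples, number of genes) can be derived rather than typed by hand. The sketch below is one possible way to do that in base R. The helper name `build_profile` is hypothetical, and a plain matrix stands in for a real SummarizedExperiment so that the example runs without Bioconductor installed.

```r
# Hypothetical helper (not part of any package): derive the fields that can be
# read off the object and take the remaining ones from the caller.
build_profile <- function(x, annotation, datatype, filename, data_source,
                          date = as.character(Sys.Date())) {
  list(
    annotation  = annotation,
    datatype    = datatype,
    class       = class(x)[1],  # first element of the object's class vector
    filename    = filename,
    data_source = data_source,
    date        = date,         # stored as a string, as the schema asks
    numSamples  = ncol(x),      # samples are stored as columns
    numGenes    = nrow(x)       # genes are stored as rows
  )
}

# Stand-in for a real experiment: a 3-gene x 2-sample matrix.
toy_expr <- matrix(
  0, nrow = 3, ncol = 2,
  dimnames = list(c("gene1", "gene2", "gene3"), c("sample1", "sample2"))
)

profile <- build_profile(
  toy_expr,
  annotation  = "rnaseq",
  datatype    = "genes_tpm",
  filename    = "rnaseq_fpkm_20220624.csv",
  data_source = list(
    description = "Read counts, TPM & FPKM-values for all sequenced models including cell lines and organoids.",
    tool        = "STAR v2.7.9a",
    url         = "https://cog.sanger.ac.uk/cmp/download/rnaseq_all_20220624.zip"
  )
)

# One entry per experiment; an unnamed list corresponds to a JSON array.
MolecularProfiles <- list(profile)
str(MolecularProfiles[[1]][c("class", "numSamples", "numGenes")])
```

With a real object, the same call could take `genes_tpm_SE` in place of `toy_expr`, and the result could be assigned to `genes_tpm_SE@metadata` as in the example above.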
Last modified: 28 May 2024