Module 1: Introduction to pharmacogenomics
Jermiah J. Joseph
Princess Margaret Cancer Centre, jermiah.joseph@uhn.ca
Julia Nguyen
Princess Margaret Cancer Centre, julia.nguyen@uhn.ca
Nikta Feizi
Princess Margaret Cancer Centre, nikta.feizi@uhn.ca
18 October 2024
Source: vignettes/Module1.Rmd
Lab 1 Overview
Instructor(s) name(s) and contact information
- Jermiah Joseph jermiah.joseph@uhn.ca
- Nikta Feizi nikta.feizi@uhn.ca
- Julia Nguyen julia.nguyen@uhn.ca
Lab Description
Learning goals
- Understand the data structure of a PharmacoSet
- Learn how to access features and metadata from a PharmacoSet
- Learn how to design linear multivariate predictors
- Learn how to filter out outliers and missing values
Learning objectives
- Describe the use cases for PharmacoGx in pharmacogenomics
- Understand the structure of the CoreSet and PharmacoSet classes to facilitate their use in subsequent analyses
- Download/load a PharmacoSet using PharmacoGx or orcestra.ca
- Subset and filter a PharmacoSet by samples and/or treatments
- Access the molecular features, dose-response, and metadata contained within the PharmacoSet
Getting Started
Exploring preclinical datasets for pharmacogenomic analysis
*See the list of available subsetted datasets in the Reference.
Molecular profiles
We will start with RNA-Seq data as a simple example.
data(GDSC_rnaseq)
GDSC_rnaseq |> head()
#> model_id model_name data_source gene_id gene_symbol read_count fpkm
#> 1 SIDM00794 A388 sanger SIDG00082 ABCC6 10 0.01
#> 2 SIDM00794 A388 sanger SIDG00106 ABCF3 20264 25.95
#> 3 SIDM00794 A388 sanger SIDG00108 ABCG2 1070 1.47
#> 4 SIDM00794 A388 sanger SIDG00148 ABI3 6 0.02
#> 5 SIDM00794 A388 sanger SIDG00177 ACADSB 1410 1.50
#> 6 SIDM00794 A388 sanger SIDG00198 ACER1 12 0.06
A few key things to notice here: there are identifiers for the sample (model_id, model_name), identifiers for the gene (gene_id, gene_symbol), and two expression values (read_count, fpkm).
When we create our expression matrices, we will select one sample identifier, one feature (gene) identifier, and one expression value.
Metadata / annotation files
When preparing for pharmacogenomic analysis, it is ideal to have accompanying metadata for both the samples (cell lines) and the features (genes).
We have made this data available through the package as well. We’ll start with the gene annotations:
data(GDSC_gene_identifiers)
GDSC_gene_identifiers |> head()
#> gene_id cosmic_gene_symbol ensembl_gene_id entrez_id hgnc_id hgnc_symbol
#> 1 SIDG00001 A1BG ENSG00000121410 1 HGNC:5 A1BG
#> 2 SIDG00002 ENSG00000268895 503538 HGNC:37133 A1BG-AS1
#> 3 SIDG00003 A1CF ENSG00000148584 29974 HGNC:24086 A1CF
#> 4 SIDG00004 A2M ENSG00000175899 2 HGNC:7 A2M
#> 5 SIDG00005 ENSG00000245105 144571 HGNC:27057 A2M-AS1
#> 6 SIDG00006 A2ML1 ENSG00000166535 144568 HGNC:23336 A2ML1
#> refseq_id uniprot_id
#> 1 NM_130786 P04217
#> 2 NR_015380
#> 3 NM_014576 Q9NQ94
#> 4 NM_000014 P01023
#> 5 NR_026971
#> 6 NM_144670 A8K2U0
The data above has been provided by GDSC and enables mapping across various gene annotations. It is important to identify which gene annotation maps to the RNA-Seq data and to check for completeness.
GDSC_rnaseq$gene_id %in% GDSC_gene_identifiers$gene_id |> table()
#>
#> TRUE
#> 135000
GDSC_rnaseq$gene_symbol %in% GDSC_gene_identifiers$hgnc_symbol |> table()
#>
#> FALSE TRUE
#> 2100 132900
GDSC_rnaseq$gene_symbol %in% GDSC_gene_identifiers$cosmic_gene_symbol |> table()
#>
#> FALSE TRUE
#> 69600 65400
We can see that gene_id maps completely to the genes in our RNA-Seq data, whereas hgnc_symbol and cosmic_gene_symbol fail to match 2,100 and 69,600 rows respectively (see the numbers under FALSE). We will therefore use gene_id for downstream analysis.
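The completeness check above can also be expressed as a proportion per candidate identifier, which makes the columns easier to compare side by side. The following is a minimal sketch on toy data; the helper mapping_coverage() and the toy data frames are hypothetical and are not part of the workshop package.

```r
# Toy stand-ins for the RNA-Seq table and the gene annotation table
rnaseq <- data.frame(
  gene_id     = c("G1", "G2", "G3", "G4"),
  gene_symbol = c("A1BG", "A2M", "OLDNAME", "A1CF")
)
annot <- data.frame(
  gene_id     = c("G1", "G2", "G3", "G4"),
  hgnc_symbol = c("A1BG", "A2M", "NEWNAME", "A1CF")
)

# Hypothetical helper: proportion of query identifiers found in a reference
mapping_coverage <- function(query, reference) {
  mean(query %in% reference)
}

coverage <- c(
  gene_id     = mapping_coverage(rnaseq$gene_id, annot$gene_id),
  hgnc_symbol = mapping_coverage(rnaseq$gene_symbol, annot$hgnc_symbol)
)
coverage

# Identifiers that fail to map are worth inspecting directly
unmapped <- setdiff(rnaseq$gene_symbol, annot$hgnc_symbol)
unmapped
```

Listing the unmapped identifiers, rather than only counting them, often shows whether the gaps come from outdated symbols or from genuinely missing annotations.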
Now we move over to the cell line annotations. Several attributes are available; we first confirm which identifier maps to the RNA-Seq data.
data(GDSC_model_list)
print(colnames(GDSC_model_list)[1:39])
#> [1] "model_id" "sample_id"
#> [3] "patient_id" "parent_id"
#> [5] "model_name" "synonyms"
#> [7] "tissue" "cancer_type"
#> [9] "cancer_type_ncit_id" "tissue_status"
#> [11] "sample_site" "cancer_type_detail"
#> [13] "model_type" "growth_properties"
#> [15] "model_treatment" "sampling_day"
#> [17] "sampling_month" "sampling_year"
#> [19] "doi" "pmed"
#> [21] "msi_status" "ploidy_snp6"
#> [23] "ploidy_wes" "ploidy_wgs"
#> [25] "mutational_burden" "model_comments"
#> [27] "model_relations_comment" "COSMIC_ID"
#> [29] "BROAD_ID" "CCLE_ID"
#> [31] "RRID" "HCMI"
#> [33] "suppliers" "supplier"
#> [35] "cat_number" "species"
#> [37] "gender" "ethnicity"
#> [39] "age_at_sampling"
GDSC_rnaseq$model_id %in% GDSC_model_list$model_id |> table()
#>
#> TRUE
#> 135000
Below are some examples of other available variables that may be of interest for downstream analysis.
GDSC_model_list[, c("model_id", "model_name", "tissue", "ploidy_wes", "mutational_burden", "gender", "ethnicity")] |> head()
#> model_id model_name tissue ploidy_wes mutational_burden
#> 1 SIDM01774 PK-59 Pancreas 3.510751 24.79
#> 2 SIDM00192 SNU-1033 Large Intestine 2.780367 23.29
#> 3 SIDM01447 SNU-466 Central Nervous System 2.054101 20.58
#> 4 SIDM01554 IST-MES-2 Lung 1.851007 22.92
#> 5 SIDM01689 MUTZ-5 Haematopoietic and Lymphoid 1.941110 28.76
#> 6 SIDM01460 TM-31 Central Nervous System 2.885529 25.89
#> gender ethnicity
#> 1 Unknown Unknown
#> 2 Female East Asian
#> 3 Male Unknown
#> 4 Male White
#> 5 Male Unknown
#> 6 Female East Asian
Drug response data
Finally, we’ll load in the corresponding drug response data for these cell lines.
data(GDSC_drug_response)
GDSC_drug_response |> head()
#> DATASET NLME_RESULT_ID NLME_CURVE_ID COSMIC_ID CELL_LINE_NAME SANGER_MODEL_ID
#> 1 GDSC2 343 15946320 683667 PFSK-1 SIDM01132
#> 2 GDSC2 343 15946560 684052 A673 SIDM00848
#> 3 GDSC2 343 15946840 684057 ES5 SIDM00263
#> 4 GDSC2 343 15947099 684059 ES7 SIDM00269
#> 5 GDSC2 343 15947381 684062 EW-11 SIDM00203
#> 6 GDSC2 343 15947663 684072 SK-ES-1 SIDM01111
#> TCGA_DESC DRUG_ID DRUG_NAME PUTATIVE_TARGET PATHWAY_NAME COMPANY_ID
#> 1 MB 1017 Olaparib PARP1, PARP2 Genome integrity 1046
#> 2 UNCLASSIFIED 1017 Olaparib PARP1, PARP2 Genome integrity 1046
#> 3 UNCLASSIFIED 1017 Olaparib PARP1, PARP2 Genome integrity 1046
#> 4 UNCLASSIFIED 1017 Olaparib PARP1, PARP2 Genome integrity 1046
#> 5 UNCLASSIFIED 1017 Olaparib PARP1, PARP2 Genome integrity 1046
#> 6 UNCLASSIFIED 1017 Olaparib PARP1, PARP2 Genome integrity 1046
#> WEBRELEASE MIN_CONC MAX_CONC LN_IC50 AUC RMSE Z_SCORE
#> 1 Y 0.010005 10 4.488810 0.974081 0.072391 0.201882
#> 2 Y 0.010005 10 1.782152 0.842679 0.068257 -1.881795
#> 3 Y 0.010005 10 2.116072 0.869909 0.070087 -1.624732
#> 4 Y 0.010005 10 1.685857 0.834608 0.092726 -1.955925
#> 5 Y 0.010005 10 2.078938 0.844879 0.114103 -1.653318
#> 6 Y 0.010005 10 0.592900 0.727416 0.081839 -2.797320
unique(GDSC_rnaseq$model_id) %in% GDSC_drug_response$SANGER_MODEL_ID |> table()
#>
#> FALSE TRUE
#> 10 90
We can use SANGER_MODEL_ID to map back to our RNA-Seq data. DRUG_NAME will be used as the identifier for the treatment. We also have both the natural log of the IC50 (LN_IC50) and the AUC (AUC) for each cell-drug pair.
Notice that 10 of the 100 cell lines in the RNA-Seq data do not have drug response data. These will need to be filtered out before downstream analysis.
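One way to perform this filtering is to keep only the cell lines that appear in both tables. The sketch below uses toy data frames that borrow the column names shown above; the values themselves are made up for illustration.

```r
# Toy stand-ins for the long-format RNA-Seq and drug response tables
rnaseq <- data.frame(
  model_id = c("M1", "M1", "M2", "M2", "M3", "M3"),
  gene_id  = c("G1", "G2", "G1", "G2", "G1", "G2"),
  fpkm     = c(1.2, 0.0, 3.4, 0.5, 2.2, 0.1)
)
response <- data.frame(
  SANGER_MODEL_ID = c("M1", "M3", "M4"),
  DRUG_NAME       = c("Olaparib", "Olaparib", "Olaparib"),
  AUC             = c(0.97, 0.84, 0.73)
)

# Cell lines present in both tables
shared <- intersect(rnaseq$model_id, response$SANGER_MODEL_ID)

# Keep only the rows that belong to the shared cell lines
rnaseq_filtered   <- rnaseq[rnaseq$model_id %in% shared, ]
response_filtered <- response[response$SANGER_MODEL_ID %in% shared, ]

unique(rnaseq_filtered$model_id)
```

Filtering both tables against the same set of identifiers keeps the molecular and response data aligned for later modelling.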
Exploring other multi-omic profiles
We have prepared a variety of other molecular profiles from both GDSC and CCLE. We look through a few more examples below to better understand these data types.
Driver mutations
Load in the driver mutations data from GDSC:
data(GDSC_drivers)
GDSC_drivers |> head()
#> gene_id gene_symbol model_id protein_mutation rna_mutation
#> 1 SIDG27130 RGPD3 SIDM02101 p.N241fs*6 r.809_819delAAUCUUAUGCU
#> 2 SIDG08129 ESR1 SIDM02095 p.L15fs*69 r.412_416delACUGC
#> 3 SIDG03559 CBLC SIDM02090 p.Q419fs*81 r.1295_1296insc
#> 4 SIDG02114 BAP1 SIDM02090 p.C91Y r.402g>a
#> 5 SIDG02114 BAP1 SIDM02090 p.N78S r.363a>g
#> 6 SIDG36265 SPEN SIDM02089 p.R753fs*53 r.2618_2627delAGGAGGCUUU
#> cdna_mutation cancer_driver cancer_predisposition_variant
#> 1 c.721_731delAATCTTATGCT True False
#> 2 c.42_46delACTGC True False
#> 3 c.1253_1254insC True False
#> 4 c.272G>A True False
#> 5 c.233A>G True False
#> 6 c.2257_2266delAGGAGGCTTT True False
#> effect vaf coding source model_name
#> 1 frameshift 0.2319 True Sanger Mesobank_CellLine-53T
#> 2 frameshift 0.4259 True Sanger Mesobank_CellLine-26
#> 3 frameshift 0.5217 True Sanger Mesobank_CellLine-50T
#> 4 missense 0.5217 True Sanger Mesobank_CellLine-50T
#> 5 missense 0.4595 True Sanger Mesobank_CellLine-50T
#> 6 frameshift 0.6333 True Sanger Mesobank_CellLine-45
Notice that, unlike the RNA-Seq data, this data does not contain continuous expression values. It will have to be processed further before it can be used to predict response.
Methylation
Load in the methylation matrix from GDSC:
data(GDSC_methylation)
GDSC_methylation[1:5, 1:5]
#> X8359018054_R03C01 X8359018053_R04C02
#> chr1:1051178-1052445 0.3733729 0.4144962
#> chr1:109824313-109824526 0.4644322 0.5959816
#> chr1:109825710-109826207 0.1411910 0.1770613
#> chr1:110008962-110010124 0.4143231 0.5317602
#> chr1:110527248-110528026 0.3022454 0.1882153
#> X8221932075_R03C02 X8221924165_R04C02
#> chr1:1051178-1052445 0.2892333 0.3023764
#> chr1:109824313-109824526 0.5048080 0.4594520
#> chr1:109825710-109826207 0.1668371 0.1698714
#> chr1:110008962-110010124 0.5259936 0.4551888
#> chr1:110527248-110528026 0.2679353 0.2989812
#> X7970368131_R04C02
#> chr1:1051178-1052445 0.3880624
#> chr1:109824313-109824526 0.5114854
#> chr1:109825710-109826207 0.1706867
#> chr1:110008962-110010124 0.4740822
#> chr1:110527248-110528026 0.2603969
This data has already been processed into a matrix. Notice, though, that the columns are not labelled with sample names; instead, each column name combines the array ID and position. We can use the provided annotation file to map back to the sample names in our model list.
data(GDSC_methylation_model_list)
GDSC_methylation_model_list |> head()
#> Sample_Name Sample_Well Sample_Plate Sample_Group Pool_ID Sentrix_ID
#> 1 HL-60 A06 SMET0001 NA NA 5684819030
#> 2 IGR-37 B06 SMET0001 NA NA 5684819030
#> 3 WM793B C07 SMET0001 NA NA 5723654013
#> 4 IGR39 C08 SMET0001 NA NA 5723654013
#> 5 SW-480 D09 SMET0001 NA NA 5723654015
#> 6 C32 E09 SMET0001 NA NA 5723654015
#> Sentrix_Position Investigator Project Tissue
#> 1 R05C01 Catia Moutinio <NA> HAEMATOPOIETIC AND LYMPHOID TISSUE
#> 2 R06C01 Javi Carmona <NA> SKIN
#> 3 R03C01 Javi Carmona <NA> SKIN
#> 4 R05C02 Javi Carmona <NA> SKIN
#> 5 R02C02 Javi Carmona <NA> LARGE INTESTINE
#> 6 R03C02 Catia Moutinio <NA> SKIN
#> Type EBV Cell_Line Wildtype Normal Coment Scan_Date
#> 1 ACUTE MYELOID LEUKEMIA No YES Yes No <NA> 2011-02-12
#> 2 MELANOMA No YES Yes No Metastasis 2011-02-12
#> 3 MELANOMA No YES Yes No Metastasis 2011-02-12
#> 4 MELANOMA No YES Yes No Primary 2011-02-12
#> 5 ADENOCARCINOMA No YES Yes No Primary 2011-02-12
#> 6 MELANOMA No YES Yes No <NA> 2011-02-12
#> GDSC1 GDSC2 cosmic_id
#> 1 blood acute_myeloid_leukaemia 905938
#> 2 skin melanoma 1240153
#> 3 skin melanoma 1299081
#> 4 skin melanoma 1298148
#> 5 <NA> <NA> NA
#> 6 skin melanoma 906830
GDSC_methylation_model_list$sampleid <- paste0(
"X", GDSC_methylation_model_list$Sentrix_ID,
"_", GDSC_methylation_model_list$Sentrix_Position
)
colnames(GDSC_methylation) %in% GDSC_methylation_model_list$sampleid |> table()
#>
#> TRUE
#> 100
colnames(GDSC_methylation) <- GDSC_methylation_model_list$Sample_Name[
match(colnames(GDSC_methylation), GDSC_methylation_model_list$sampleid)
]
GDSC_methylation[1:5, 1:5]
#> A673 RT4 8-MG-BA U-118-MG CHAGO-K-1
#> chr1:1051178-1052445 0.3733729 0.4144962 0.2892333 0.3023764 0.3880624
#> chr1:109824313-109824526 0.4644322 0.5959816 0.5048080 0.4594520 0.5114854
#> chr1:109825710-109826207 0.1411910 0.1770613 0.1668371 0.1698714 0.1706867
#> chr1:110008962-110010124 0.4143231 0.5317602 0.5259936 0.4551888 0.4740822
#> chr1:110527248-110528026 0.3022454 0.1882153 0.2679353 0.2989812 0.2603969
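Renaming columns with match() fails silently if an array ID is missing from the annotation table (the column name becomes NA) or if two arrays share a sample name. A guarded version of the renaming step might look like the sketch below; the toy matrix and annotation values are invented, and only the column names Sample_Name, Sentrix_ID, and Sentrix_Position follow the annotation table above.

```r
# Toy methylation matrix with array-based column names
meth <- matrix(
  c(0.1, 0.2, 0.3, 0.4),
  nrow = 2,
  dimnames = list(c("cpg1", "cpg2"), c("X100_R01C01", "X100_R02C01"))
)

# Toy annotation table, deliberately in a different order from the matrix
annot <- data.frame(
  Sample_Name      = c("LINE-B", "LINE-A"),
  Sentrix_ID       = c("100", "100"),
  Sentrix_Position = c("R02C01", "R01C01")
)
annot$sampleid <- paste0("X", annot$Sentrix_ID, "_", annot$Sentrix_Position)

# Look up each matrix column in the annotation table
idx <- match(colnames(meth), annot$sampleid)
new_names <- annot$Sample_Name[idx]

# Stop early if a column failed to map or two arrays share a sample name
stopifnot(!anyNA(idx), !anyDuplicated(new_names))

colnames(meth) <- new_names
colnames(meth)
```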
We have provided a few other subsetted datasets. A full list is available from Reference.
We encourage independent exploration of these datasets.
Creating expression matrices for pharmacogenomic analysis
To facilitate downstream pharmacogenomic analysis, we want to create an expression matrix such that:
- Features are the rows
- Samples are the columns
- Feature expression values are the individual entries
Below, we show an example of such a matrix using dummy data.
dummy_data <- setNames(
as.data.frame(replicate(5, rnorm(5))),
paste0("Sample", 1:5)
)
rownames(dummy_data) <- paste0("Feature", 1:5)
dummy_data
#> Sample1 Sample2 Sample3 Sample4 Sample5
#> Feature1 -1.400043517 1.1484116 -0.5536994 -1.86301149 0.4681544
#> Feature2 0.255317055 -1.8218177 0.6289820 -0.52201251 0.3629513
#> Feature3 -2.437263611 -0.2473253 2.0650249 -0.05260191 -1.3045435
#> Feature4 -0.005571287 -0.2441996 -1.6309894 0.54299634 0.7377763
#> Feature5 0.621552721 -0.2827054 0.5124269 -0.91407483 1.8885049
Let’s revisit the RNA-Seq example. The data is currently in a long format (i.e. there is one row for each sample-feature observation).
GDSC_rnaseq |> head()
#> model_id model_name data_source gene_id gene_symbol read_count fpkm
#> 1 SIDM00794 A388 sanger SIDG00082 ABCC6 10 0.01
#> 2 SIDM00794 A388 sanger SIDG00106 ABCF3 20264 25.95
#> 3 SIDM00794 A388 sanger SIDG00108 ABCG2 1070 1.47
#> 4 SIDM00794 A388 sanger SIDG00148 ABI3 6 0.02
#> 5 SIDM00794 A388 sanger SIDG00177 ACADSB 1410 1.50
#> 6 SIDM00794 A388 sanger SIDG00198 ACER1 12 0.06
GDSC_rnaseq |> dim()
#> [1] 135000 7
# number of cell line samples
length(unique(GDSC_rnaseq$model_id))
#> [1] 100
# number of genes
length(unique(GDSC_rnaseq$gene_id))
#> [1] 1350
We want to convert this into a wide format such that each row is a gene, each column is a sample, and the values are the gene expression.
expr <- reshape2::dcast(GDSC_rnaseq, gene_id ~ model_name, value.var = "fpkm")
rownames(expr) <- expr$gene_id
expr$gene_id <- NULL
expr[1:5, 1:10]
#> A388 A427 BB65-RCC Becker BICR78 C-33-A Ca-Ski Ca9-22 CCK-81 CHL-1
#> SIDG00082 0.01 0.29 0.04 0.15 0.03 0.01 0.01 0.01 0.94 1.23
#> SIDG00106 25.95 16.45 8.19 13.88 14.04 10.99 16.42 9.04 9.37 14.86
#> SIDG00108 1.47 0.52 0.02 0.03 1.02 0.01 0.27 0.09 0.01 0.04
#> SIDG00148 0.02 0.02 0.35 0.01 0.00 0.02 0.00 0.10 0.00 0.00
#> SIDG00177 1.50 6.18 3.07 3.40 1.97 7.21 1.01 1.83 5.56 11.63
expr |> dim()
#> [1] 1350 100
Notice that we have the 1350 genes as the rows and the 100 cell lines as the columns.
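Casting to wide format assumes exactly one value per gene-sample pair. When a pair occurs more than once and no aggregation function is given, reshape2::dcast falls back to counting rows with length, which would replace expression values with counts. A quick duplicate check before casting guards against this. The sketch below uses toy data and base R only; averaging the duplicates is one possible resolution and is an assumption, not the workshop's prescribed method.

```r
# Toy long-format table with one accidental duplicate (G1 in LINE-A)
long <- data.frame(
  gene_id    = c("G1", "G2", "G1", "G2", "G1"),
  model_name = c("LINE-A", "LINE-A", "LINE-B", "LINE-B", "LINE-A"),
  fpkm       = c(1.0, 2.0, 3.0, 4.0, 1.5)
)

# Flag every row whose gene-sample pair appears more than once
key <- paste(long$gene_id, long$model_name, sep = "|")
dup_rows <- long[key %in% key[duplicated(key)], ]
dup_rows

# One possible resolution (an assumption): average the duplicated values
wide <- tapply(long$fpkm, list(long$gene_id, long$model_name), mean)
wide
```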
Feature extraction techniques to define biomarkers
While using the continuous expression of single features is a convenient method for quantifying biomarkers, there are cases when other techniques are needed and/or are more appropriate.
Binarization
Recall that the driver mutations data was not presented as continuous numeric values. One method to prepare this data is to binarize the mutation status.
GDSC_drivers |> head()
#> gene_id gene_symbol model_id protein_mutation rna_mutation
#> 1 SIDG27130 RGPD3 SIDM02101 p.N241fs*6 r.809_819delAAUCUUAUGCU
#> 2 SIDG08129 ESR1 SIDM02095 p.L15fs*69 r.412_416delACUGC
#> 3 SIDG03559 CBLC SIDM02090 p.Q419fs*81 r.1295_1296insc
#> 4 SIDG02114 BAP1 SIDM02090 p.C91Y r.402g>a
#> 5 SIDG02114 BAP1 SIDM02090 p.N78S r.363a>g
#> 6 SIDG36265 SPEN SIDM02089 p.R753fs*53 r.2618_2627delAGGAGGCUUU
#> cdna_mutation cancer_driver cancer_predisposition_variant
#> 1 c.721_731delAATCTTATGCT True False
#> 2 c.42_46delACTGC True False
#> 3 c.1253_1254insC True False
#> 4 c.272G>A True False
#> 5 c.233A>G True False
#> 6 c.2257_2266delAGGAGGCTTT True False
#> effect vaf coding source model_name
#> 1 frameshift 0.2319 True Sanger Mesobank_CellLine-53T
#> 2 frameshift 0.4259 True Sanger Mesobank_CellLine-26
#> 3 frameshift 0.5217 True Sanger Mesobank_CellLine-50T
#> 4 missense 0.5217 True Sanger Mesobank_CellLine-50T
#> 5 missense 0.4595 True Sanger Mesobank_CellLine-50T
#> 6 frameshift 0.6333 True Sanger Mesobank_CellLine-45
Looking at the first row, we can see that there is a mutation on the RGPD3 gene in the SIDM02101 cell line model. We would represent such mutation events with 1.
The code below again casts the long data frame into a wide format. This time we specify an aggregate function, length(), which returns the number of rows (mutation events) for each gene-cell line pair. This is done by passing the option fun.aggregate = length.
expr <- reshape2::dcast(
GDSC_drivers,
gene_symbol ~ model_id,
value.var = "cdna_mutation",
fun.aggregate = length
)
rownames(expr) <- expr$gene_symbol
expr$gene_symbol <- NULL
expr["RGPD3", "SIDM02101"]
#> [1] 1
expr[1:5, 1:10]
#> SIDM00001 SIDM00002 SIDM00003 SIDM00006 SIDM00007 SIDM00008 SIDM00009
#> ABCB1 0 0 0 0 0 0 0
#> ABI1 0 0 0 0 0 0 0
#> ABL1 0 0 0 0 0 0 0
#> ABL2 0 0 0 0 0 0 0
#> ACVR1 0 0 0 0 0 0 0
#> SIDM00011 SIDM00013 SIDM00014
#> ABCB1 0 0 0
#> ABI1 0 0 0
#> ABL1 0 0 0
#> ABL2 0 1 0
#> ACVR1 0 0 0
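There was one mutation event on the RGPD3 gene in the SIDM02101 cell line model, hence the value of this combination is 1.
Mutation events are relatively sparse, so we see 0 for the majority of the matrix. Because length() counts events, a pair with several mutations receives a value above 1 (for example, BAP1 in SIDM02090 has two rows in the table above), so the counts still need to be thresholded to obtain a strictly binary matrix.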
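Since the cast returns event counts, a final thresholding step is needed to obtain a strictly 0/1 mutation status. The following is a minimal sketch on a toy count matrix; the gene and cell line labels are placeholders.

```r
# Toy count matrix: rows are genes, columns are cell lines
counts <- matrix(
  c(0, 2, 1, 0, 0, 3),
  nrow = 3,
  dimnames = list(c("BAP1", "ESR1", "SPEN"), c("LINE-A", "LINE-B"))
)

# Any count above zero becomes 1 (mutated); zero stays 0 (not mutated)
binary <- (counts > 0) * 1L
binary
```

The same expression applied to a data frame returns a matrix rather than a data frame, which is usually the more convenient form for downstream modelling.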
Signature extraction
There are cases when individual features have low predictive power, but when combined become much more informative of drug response.
Let’s revisit our methylation data. Recall that each row is a CpG site. There are 1000 CpG sites.
GDSC_methylation[1:5, 1:5]
#> A673 RT4 8-MG-BA U-118-MG CHAGO-K-1
#> chr1:1051178-1052445 0.3733729 0.4144962 0.2892333 0.3023764 0.3880624
#> chr1:109824313-109824526 0.4644322 0.5959816 0.5048080 0.4594520 0.5114854
#> chr1:109825710-109826207 0.1411910 0.1770613 0.1668371 0.1698714 0.1706867
#> chr1:110008962-110010124 0.4143231 0.5317602 0.5259936 0.4551888 0.4740822
#> chr1:110527248-110528026 0.3022454 0.1882153 0.2679353 0.2989812 0.2603969
GDSC_methylation |> dim()
#> [1] 1000 100
Signatures refer to combinations of features that form some pattern with biological relevance. For example, you may choose to define a signature X to represent CpG sites located on promoters of genes involved in pathway Y.
For simplicity, we can define some arbitrary signatures from our CpG sites.
set.seed(123)
signatures <- data.frame(
CpG = rownames(GDSC_methylation),
Signature = sample(gl(10, 100, length = 1000))
)
signatures |> head()
#> CpG Signature
#> 1 chr1:1051178-1052445 5
#> 2 chr1:109824313-109824526 5
#> 3 chr1:109825710-109826207 2
#> 4 chr1:110008962-110010124 6
#> 5 chr1:110527248-110528026 2
#> 6 chr1:114354373-114355300 10
Each of the 1000 CpG sites was randomly assigned to one of 10 signatures.
Next we want to quantify each signature for each cell line. Again, for simplicity, we will sum the beta values of the CpGs belonging to each signature.
sScores <- data.frame(matrix(NA, nrow = 0, ncol = 100))
for (s in c(1:10)) {
# get CpGs within each signature
cpgs <- signatures[signatures$Signature == s, ]$CpG
mSig <- GDSC_methylation[rownames(GDSC_methylation) %in% cpgs, ]
# compute sum of beta values for each cell line
sSum <- colSums(mSig)
sScores <- rbind(sScores, sSum)
}
rownames(sScores) <- paste0("Signature", 1:10)
colnames(sScores) <- colnames(GDSC_methylation)
sScores[1:10, 1:5]
#> A673 RT4 8-MG-BA U-118-MG CHAGO-K-1
#> Signature1 35.73567 41.50092 30.31945 35.03195 40.93194
#> Signature2 36.54479 41.78597 34.23020 37.28902 42.43054
#> Signature3 35.08198 39.01984 32.67874 35.14082 39.57340
#> Signature4 37.26978 42.31938 35.94349 38.44868 41.68582
#> Signature5 37.06404 42.90746 32.04316 37.13874 38.69905
#> Signature6 35.32663 40.04647 31.79493 34.86532 39.63569
#> Signature7 36.46212 40.74357 33.85272 36.77940 40.16127
#> Signature8 35.69553 40.46161 32.70537 36.50410 39.12737
#> Signature9 34.37157 38.31962 31.18276 34.40533 39.80003
#> Signature10 40.49063 42.52888 33.00193 39.79155 39.62519
We now have a new expression matrix, this time of the 10 defined signatures for each of our cell lines.
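The per-signature sums can also be computed without an explicit loop using base R's rowsum(), which adds up the rows of a matrix within each level of a grouping vector. The sketch below uses a toy beta-value matrix and a toy signature assignment; it assumes the grouping vector is in the same order as the matrix rows, as is the case for the signatures table built above.

```r
set.seed(1)
# Toy beta-value matrix: 6 CpGs (rows) by 3 cell lines (columns)
beta <- matrix(
  runif(18),
  nrow = 6,
  dimnames = list(paste0("cpg", 1:6), paste0("Line", 1:3))
)

# Toy signature assignment, in the same row order as `beta`
sig <- c(1, 2, 1, 2, 3, 3)

# Sum the rows of `beta` within each signature group
scores <- rowsum(beta, group = sig)
rownames(scores) <- paste0("Signature", rownames(scores))
scores
```

Summing is only one way to score a signature; a mean or median per group would avoid favouring signatures that happen to contain more CpGs.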