Navigating PSets
When you first join the BHKLab Pharmacogenomics (PGx) team, you will likely need to familiarize yourself with our R package PharmacoGx. The fundamental object of PharmacoGx is the PSet, a data structure specifically designed to handle the inputs, results, and metadata of a cell line screen.
PSets can be a bit counter-intuitive at first, so this page provides a rough guide to manipulating them. At present, we only have tools in R to navigate them.
Before we get into the basics, it is worth addressing a natural question: why use PSets? The answer is that pharmacogenomics data is messy. Research linking small molecules to their effects on cell lines is conducted at many institutions, each with its own idiosyncratic approach to handling and annotating data. PSets provide standardized nomenclature across datasets: every PSet uses the same column names inside its data frames, a common set of identifiers for drugs and cell lines, and quality-controlled response measurements.
This page covers only the basics. Once you have reviewed these snippets, see the PharmacoGx package vignettes for more detail.
Finding PSets
To find PSets of interest, you can browse PharmacoDB or ORCESTRA and download them manually. Alternatively, PharmacoGx provides the availablePSets() function, which lists the PSets that can be downloaded, and the downloadPSet() function, which downloads a PSet by name. An example call is:
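library(PharmacoGx)
# downloadPSet() saves the .rds file to saveDir and returns the loaded PSet
my.PSet <- downloadPSet(
name = "CCLE_2015",
saveDir = "../psets", # change this directory as you see fit
timeout = 3600,       # download timeout in seconds; PSets can be large
verbose = TRUE
)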
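To look up the exact name to pass to downloadPSet(), you can search the table returned by availablePSets(). The sketch below is a hypothetical helper, not part of PharmacoGx, and it assumes the PSet names sit in a column called "PSet Name"; confirm this with colnames(availablePSets()) in your installed version before relying on it. A small toy table stands in for the real output so the example runs offline.

```r
# Hypothetical helper (not part of PharmacoGx): keep only the rows of the
# availablePSets() table whose name matches a (case-insensitive) pattern.
# ASSUMPTION: names are stored in a column called "PSet Name"; pass name_col
# if your version of availablePSets() uses a different column.
find_psets <- function(pset_table, pattern, name_col = "PSet Name") {
  if (!name_col %in% colnames(pset_table)) {
    stop("Column '", name_col, "' not found; available columns: ",
         paste(colnames(pset_table), collapse = ", "))
  }
  hits <- grepl(pattern, pset_table[[name_col]], ignore.case = TRUE)
  pset_table[hits, , drop = FALSE]
}

# Toy stand-in for availablePSets() output; only CCLE_2015 is a real name.
toy_table <- data.frame(
  `PSet Name` = c("CCLE_2015", "Example_PSet_A", "Example_PSet_B"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
find_psets(toy_table, "ccle")
```

With PharmacoGx loaded and a network connection, the equivalent call on real data would be find_psets(availablePSets(), "ccle"). Stopping on a missing column, rather than silently returning nothing, makes a column-name mismatch obvious.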
Loading PSets
PSets are downloaded as .rds files, so a PSet you downloaded manually must be loaded with readRDS. If the path to your PSet file is stored in PSet.file.path, the following script will load it:
library(PharmacoGx)
my.PSet <- readRDS(PSet.file.path)
my.PSet <- updateObject(my.PSet) # update the object to the structure expected by your installed PharmacoGx version
Accessing Drug Info
To get information about the compounds used in the screen, use the drugInfo function, which returns a table with one row per compound.
If you only want the compound names and their SMILES strings, subset the columns of that table.
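A minimal sketch of that subsetting follows. drug_smiles() is a hypothetical helper, not part of PharmacoGx, and the column names it looks for (treatmentid or drugid for the compound name, smiles or SMILES for the structure) are assumptions: identifier columns can differ between PharmacoGx versions and datasets, so check colnames(drugInfo(my.PSet)) first. A toy table with two well-known SMILES strings stands in for real output.

```r
# Return the first of `candidates` that is a column of `df`, or fail loudly.
pick_column <- function(df, candidates) {
  hit <- intersect(candidates, colnames(df))
  if (length(hit) == 0) {
    stop("None of [", paste(candidates, collapse = ", "),
         "] found; available columns: ", paste(colnames(df), collapse = ", "))
  }
  hit[1]
}

# Hypothetical helper (not part of PharmacoGx): keep only compound names and
# SMILES strings. ASSUMPTION: the default candidate column names below; adjust
# them to match colnames(drugInfo(my.PSet)) in your PSet.
drug_smiles <- function(drug_info,
                        name_candidates = c("treatmentid", "drugid"),
                        smiles_candidates = c("smiles", "SMILES")) {
  name_col <- pick_column(drug_info, name_candidates)
  smiles_col <- pick_column(drug_info, smiles_candidates)
  out <- drug_info[, c(name_col, smiles_col), drop = FALSE]
  colnames(out) <- c("compound", "smiles")
  out
}

# Toy stand-in for drugInfo() output (ethanol and benzene SMILES).
toy_drugs <- data.frame(
  drugid = c("ethanol", "benzene"),
  smiles = c("CCO", "c1ccccc1"),
  source = c("toy", "toy"),
  stringsAsFactors = FALSE
)
drug_smiles(toy_drugs)
```

On a loaded PSet the call would be drug_smiles(drugInfo(my.PSet)). Trying a short list of candidate names keeps the helper usable across PSets that label the identifier column differently.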
Accessing Cell Line Info
Cell line information is accessed via the sampleInfo function, which returns a table of cell line annotations with one row per cell line.
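A common first look at that table is how many cell lines come from each tissue. The sketch below is a hypothetical helper, not part of PharmacoGx, and it assumes tissue annotations live in a column named "tissueid"; check colnames(sampleInfo(my.PSet)) and pass a different tissue_col if yours differs. A toy table stands in for real output.

```r
# Hypothetical helper (not part of PharmacoGx): count cell lines per tissue,
# most common tissue first.
# ASSUMPTION: tissue annotations are in a column named "tissueid".
count_by_tissue <- function(sample_info, tissue_col = "tissueid") {
  if (!tissue_col %in% colnames(sample_info)) {
    stop("Column '", tissue_col, "' not found; available columns: ",
         paste(colnames(sample_info), collapse = ", "))
  }
  # useNA = "ifany" keeps cell lines with a missing tissue annotation visible
  counts <- table(sample_info[[tissue_col]], useNA = "ifany")
  counts <- setNames(as.integer(counts), names(counts))
  counts[order(counts, decreasing = TRUE)]
}

# Toy stand-in for sampleInfo() output.
toy_samples <- data.frame(
  sampleid = c("cell_A", "cell_B", "cell_C", "cell_D"),
  tissueid = c("lung", "breast", "lung", NA),
  stringsAsFactors = FALSE
)
count_by_tissue(toy_samples)
```

On a loaded PSet the call would be count_by_tissue(sampleInfo(my.PSet)). Keeping the NA count in the output is deliberate: unannotated cell lines are easy to overlook otherwise.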
Accessing Treatment Response Info
To get treatment response information, use the treatmentResponse function. It returns a list with several items, each holding a different part of the treatment response data; individual items can be extracted with $ or [[.
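Because the element names and shapes of that list can vary between PSets, a quick way to see what it holds is to print each element's dimensions. describe_response() below is a hypothetical helper, not part of PharmacoGx, and the element names in the toy list are illustrative only; they are not guaranteed to match your PSet.

```r
# Hypothetical helper (not part of PharmacoGx): summarise each element of the
# list returned by treatmentResponse() by its dimensions (or its length when
# the element has no dimensions).
describe_response <- function(response) {
  if (!is.list(response) || is.null(names(response))) {
    stop("Expected a named list of treatment response components")
  }
  vapply(names(response), function(nm) {
    d <- dim(response[[nm]])
    if (is.null(d)) {
      paste("length", length(response[[nm]]))
    } else {
      paste(d, collapse = " x ")
    }
  }, character(1))
}

# Toy stand-in for treatmentResponse() output; names are illustrative only.
toy_response <- list(
  info = data.frame(exp_id = 1:3),
  profiles = matrix(0, nrow = 3, ncol = 2),
  n = 1:5
)
describe_response(toy_response)
```

On a loaded PSet, run describe_response(treatmentResponse(my.PSet)) first, then pull out the component you need by name, for example treatmentResponse(my.PSet)$profiles if your PSet has a profiles element.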