Annotating CTRP Treatments
CTRP-Treatment-Annotation.Rmd
Introduction
This vignette compares two ways of annotating CTRP-provided treatment ids: mapping them to PubChem CIDs through the PubChem API, and retrieving their annotations from the CTD database.
Although the PubChem CID is a unique identifier for a compound, the PubChem API does not easily map treatment names to CIDs, at least not in a way that is reliable for commonly misnamed treatments. Specifically, for the CTRP treatment names (n=545), the PubChem API does not correctly map all of them to PubChem CIDs.
The CTD2 database is the central database where CTRP data is hosted. It exposes [an API](https://ctd2-dashboard.nci.nih.gov/dashboard/#api-documentation) for querying its contents.
Developer Note: The API calls described in the CTD2 API documentation are useful, but there is also an undocumented endpoint:
GET /compound/{compoundId}
This endpoint is useful for mapping compound names, as they are written in the CTD2 data (i.e., CTRP), to PubChem CIDs.
This functionality is implemented in the mapCompound2CTD function.
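To make the shape of such a request concrete, here is a minimal sketch of how a URL for the GET /compound/{compoundId} endpoint could be assembled. The helper name build_ctd_compound_url and the base URL "https://example.org/api" are hypothetical placeholders, not the real CTD2 host path and not how mapCompound2CTD is necessarily implemented. The only point illustrated is that compound names containing spaces or punctuation need to be percent-encoded before they are placed in the path.

```r
# Hypothetical helper: builds a request URL for GET /compound/{compoundId}.
# `base_url` is a placeholder argument; the real CTD2 base path is not given here.
build_ctd_compound_url <- function(compound_id, base_url) {
  stopifnot(
    is.character(compound_id),
    length(compound_id) == 1L,
    nzchar(compound_id)
  )
  # Percent-encode the name so spaces and reserved characters are path-safe
  encoded <- utils::URLencode(compound_id, reserved = TRUE)
  # Drop any trailing slash on the base before appending the endpoint path
  paste0(sub("/+$", "", base_url), "/compound/", encoded)
}

example_url <- build_ctd_compound_url(
  "Bax channel blocker",
  base_url = "https://example.org/api/"
)
print(example_url)
#> [1] "https://example.org/api/compound/Bax%20channel%20blocker"
```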
The rest of this vignette investigates which of the two methods maps more compounds.
library(AnnotationGx)
data(CTRP_treatmentMetadata)
# take the first row from the CTRP_treatmentMetadata
treatment <- CTRP_treatmentMetadata[1, CTRP.treatmentid]
sprintf("CTRP treatment id : %s", treatment)
#> [1] "CTRP treatment id : CIL55"
# map the treatment to a CID using the CTD database
mapCompound2CTD(treatment)[, .(displayName, PUBCHEM)]
#> displayName PUBCHEM
#> <char> <char>
#> 1: CIL55 6623618
# map the treatment to a CID using PubChem
mapCompound2CID(treatment)
#> name cids
#> <char> <int>
#> 1: CIL55 6623618
Annotating using the CTD database
result <- CTRP_treatmentMetadata[, mapCompound2CTD(CTRP.treatmentid, query_only = FALSE, raw = FALSE)]
#> Iterating ■■ 3% | ETA: 29s
#> Iterating ■■■■■■■■■■ 30% | ETA: 4s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
show(result)
#> displayName BROAD_COMPOUND CTRP ID CTRP NAME
#> <char> <char> <char> <char>
#> 1: CIL55 1788 1788 CIL55
#> 2: BRD4132 3588 3588 BRD4132
#> 3: BRD6340 12877 12877 BRD6340
#> 4: ML006 17712 17712 ML006
#> 5: Bax channel blocker 18311 18311 Bax channel blocker
#> ---
#> 541: avicin D 688975 688975 avicin D
#> 542: BRD9876:MK-1775 (4:1 mol/mol) <NA> <NA> <NA>
#> 543: BRD-K30748066 689506 689506 BRD-K30748066
#> 544: linsitinib 705300 705300 linsitinib
#> 545: AT-406 710154 710154 AT-406
#> DepMap compound IMAGE PUBCHEM CAS DRUG BANK
#> <char> <char> <char> <char> <char>
#> 1: CIL55 struct_1788.png 6623618 <NA> <NA>
#> 2: BRD4132 struct_3588.png 7326481 <NA> <NA>
#> 3: BRD6340 struct_12877.png 1641662 <NA> <NA>
#> 4: ML006 struct_17712.png 2842253 <NA> <NA>
#> 5: BAX-channel-blocker struct_18311.png 2729027 <NA> <NA>
#> ---
#> 541: avicin D struct_688975.png 73707595 <NA> <NA>
#> 542: <NA> <NA> <NA> <NA> <NA>
#> 543: BRD-K30748066 struct_689506.png 11257553 <NA> <NA>
#> 544: linsitinib struct_705300.png 11640390 867160-71-2 DB06075
#> 545: AT-406 struct_710154.png 25022340 <NA> <NA>
message("Failed results: ", result[is.na(PUBCHEM), .N])
#> Failed results: 92
failed_names <- result[is.na(PUBCHEM), displayName]
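One failed row visible in the output above, BRD9876:MK-1775 (4:1 mol/mol), is a combination treatment rather than a single compound. The following sketch shows one way the failed names could be split by whether they look like combinations. It runs on a three-row toy table copied from the output above, and the "mol/mol" pattern and the column name looks_like_combination are assumptions made for illustration; it makes no claim about how many of the 92 failures are combinations.

```r
library(data.table)

# Toy stand-in for `result`, using three rows shown in the output above
toy_result <- data.table(
  displayName = c("CIL55", "BRD9876:MK-1775 (4:1 mol/mol)", "linsitinib"),
  PUBCHEM     = c("6623618", NA, "11640390")
)

# Keep only the rows where no PubChem CID was returned
toy_failed <- toy_result[is.na(PUBCHEM)]

# Assumed heuristic: a molar-ratio suffix marks a combination treatment
toy_failed[, looks_like_combination := grepl("mol/mol", displayName, fixed = TRUE)]

# Count failures per group
toy_summary <- toy_failed[, .N, by = looks_like_combination]
print(toy_summary)
```

Applied to the real result table, the same grouping would give a quick view of whether the unmapped names are mostly mixtures or single compounds.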
Annotating using PubChem
# map every CTRP treatment id to a PubChem CID
(compounds_to_cids <-
  CTRP_treatmentMetadata[,
    AnnotationGx::mapCompound2CID(
      names = CTRP.treatmentid,
      first = TRUE
    )
  ]
)
# names that could not be mapped are stored in the "failed" attribute
failed <-
  attributes(compounds_to_cids)$failed |>
  names()
failed <- unique(CTRP_treatmentMetadata[CTRP.treatmentid %in% failed, ])
# retry the failed names after cleaning them
failed[, CTRP.treatmentid_CLEANED := cleanCharacterStrings(CTRP.treatmentid)]
(failed_to_cids <-
  failed[,
    AnnotationGx::mapCompound2CID(
      names = CTRP.treatmentid_CLEANED,
      first = TRUE
    )
  ]
)
# names that still fail after cleaning
failed_again <-
  attributes(failed_to_cids)$failed |>
  names()
# combine the first-pass mappings with the mappings recovered on retry
failed_dt <- merge(failed_to_cids[!is.na(cids), ], failed, by.x = "name", by.y = "CTRP.treatmentid_CLEANED", all.x = FALSE)
failed_dt$name <- NULL
successful_dt <- merge(CTRP_treatmentMetadata, compounds_to_cids[!is.na(cids), ], by.x = "CTRP.treatmentid", by.y = "name", all.x = FALSE)
mapped_PubChem <- data.table::rbindlist(list(successful_dt, failed_dt), use.names = TRUE, fill = TRUE)
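Once both approaches have been run, their coverage can be compared with plain set operations. The sketch below uses made-up placeholder names ("A" to "D") rather than real CTRP results, and the variable names all_names, mapped_ctd and mapped_pubchem are hypothetical. In the vignette they would correspond, under that assumption, to the full set of treatment ids, the names in result with a non-missing PUBCHEM value, and the treatment ids present in mapped_PubChem.

```r
# Placeholder inputs; replace with the real name vectors from the steps above
all_names      <- c("A", "B", "C", "D")
mapped_ctd     <- c("A", "B")  # names mapped through the CTD database
mapped_pubchem <- c("B", "C")  # names mapped through the PubChem API

# Partition the names by which method(s) produced a CID
coverage <- list(
  both         = intersect(mapped_ctd, mapped_pubchem),
  ctd_only     = setdiff(mapped_ctd, mapped_pubchem),
  pubchem_only = setdiff(mapped_pubchem, mapped_ctd),
  neither      = setdiff(all_names, union(mapped_ctd, mapped_pubchem))
)

# Number of names in each group
coverage_counts <- lengths(coverage)
print(coverage_counts)
```

The ctd_only and pubchem_only groups are the informative ones: they show where one method recovers a compound that the other misses.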