Querying the UniChem Database
Jermiah Joseph, Shahzada Muhammad Shameel Farooq, and Christopher Eeles
Introduction to the UniChem API
The UniChem database provides a publicly available REST API for
programmatic retrieval of mappings from standardized structural compound
identifiers to unique compound IDs across a range of large online
cheminformatic databases such as PubChem, ChEMBL, DrugBank and many
more. The service accepts POST requests to two different endpoints:
/compound and /connectivity. Both endpoints accept query parameters via
the POST body in JSON format. The /compound API returns exact matches
for the queried compound, while the /connectivity API uses layers of the
International Chemical Identifier (InChI) of the query compound to
return exact matches as well as structurally related compounds such as
isomers, salts, ionizations and more. [@UniChemBeta;
@chambersUniChemUnifiedChemical2013]
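To make the request format more concrete, the sketch below assembles a JSON body of the kind the /compound endpoint might accept. The helpers build_compound_body and to_json are hypothetical, and the field names (compound, type, sourceID) are an assumption that simply mirrors the arguments of queryUnichemCompound. The UniChem API documentation remains the authoritative reference for the schema.

```r
# Hypothetical helper: collect the fields of a /compound request body.
# Field names are ASSUMED to mirror the queryUnichemCompound() arguments.
build_compound_body <- function(compound, type, sourceID = NULL) {
  body <- list(compound = compound, type = type)
  if (!is.null(sourceID)) body$sourceID <- sourceID
  body
}

# Minimal JSON serialisation for a flat list of scalars (base R only).
# It does not escape special characters, so it is for illustration only.
to_json <- function(x) {
  values <- vapply(x, function(v) {
    if (is.character(v)) sprintf('"%s"', v) else as.character(v)
  }, character(1))
  sprintf("{%s}", paste(sprintf('"%s": %s', names(x), values), collapse = ", "))
}

body_json <- to_json(
  build_compound_body("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", type = "inchikey")
)
cat(body_json, "\n")
```

In practice there should be no need to build this body by hand, since the package functions are meant to handle the request on the user's behalf.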
The functions in AnnotationGx have been designed to allow package users
to easily query UniChem resources without any pre-existing knowledge of
HTTP requests or the API specifications. In doing so we hope to provide
an R native interface for mapping between various cheminformatic
databases, accessible to anyone familiar with using R functions!
Available Databases
To see a table of database identifiers available via UniChem, you can
call the getUnichemSources function. By default, only the database
short name (“Name”) and UniChem’s ID for it (“SourceID”) are returned.
To return all columns, pass the all_columns = TRUE argument.
getUnichemSources()
#> Name SourceID
#> <char> <int>
#> 1: chembl 1
#> 2: drugbank 2
#> 3: pdb 3
#> 4: gtopdb 4
#> 5: pubchem_dotf 5
#> 6: kegg_ligand 6
#> 7: chebi 7
#> 8: nih_ncc 8
#> 9: zinc 9
#> 10: emolecules 10
#> 11: atlas 12
#> 12: fdasrs 14
#> 13: surechembl 15
#> 14: pharmgkb 17
#> 15: hmdb 18
#> 16: selleck 20
#> 17: pubchem_tpharma 21
#> 18: pubchem 22
#> 19: mcule 23
#> 20: nmrshiftdb2 24
#> 21: lincs 25
#> 22: actor 26
#> 23: recon 27
#> 24: molport 28
#> 25: nikkaji 29
#> 26: bindingdb 31
#> 27: comptox 32
#> 28: lipidmaps 33
#> 29: drugcentral 34
#> 30: carotenoiddb 35
#> 31: metabolights 36
#> 32: brenda 37
#> 33: rhea 38
#> 34: chemicalbook 39
#> 35: swisslipids 41
#> 36: dailymed 45
#> 37: clinicaltrials 46
#> 38: rxnorm 47
#> 39: MedChemExpress 48
#> 40: probes_and_drugs 49
#> Name SourceID
When mapping with the queryUnichemCompound function, these are the
sources whose identifiers can be used as query input, and the databases
for which compound mappings are returned.
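Because queries by source identifier need the numeric SourceID rather than the database name, a small lookup can be handy. The sketch below uses a hand-built data.frame with a few rows copied from the table above as a stand-in for the getUnichemSources() result, so it runs without network access. The helper source_id_for is hypothetical and not part of AnnotationGx.

```r
# Stand-in for getUnichemSources(): a few rows copied from the table above.
sources <- data.frame(
  Name = c("chembl", "drugbank", "pubchem", "comptox"),
  SourceID = c(1L, 2L, 22L, 32L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the UniChem SourceID for a database short name.
source_id_for <- function(sources, name) {
  id <- sources$SourceID[sources$Name == name]
  if (length(id) != 1L) stop("Unknown or ambiguous source name: ", name)
  id
}

pubchem_id <- source_id_for(sources, "pubchem")
print(pubchem_id)
```

With the real table, the same subsetting should work on the object returned by getUnichemSources(), assuming the column names shown in the output above.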
Querying UniChem Compound API
The queryUnichemCompound function allows you to query the UniChem
Compound API to retrieve mappings for a given compound identifier. The
function takes two mandatory arguments. The first is the compound
argument, which is the compound identifier to be queried. The second is
the type argument, which is the type of compound identifier to search
for. Options are “uci”, “inchi”, “inchikey”, and “sourceID”. The
sourceID argument is optional and is only required if the type argument
is “sourceID”.
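The argument rules above can be summarised as a small pre-flight check. This is only an illustration of the rules as described here: check_query_args is a hypothetical helper, not AnnotationGx code, and the package may validate its inputs differently.

```r
# Hypothetical pre-flight check mirroring the argument rules described above.
check_query_args <- function(compound, type, sourceID = NULL) {
  allowed <- c("uci", "inchi", "inchikey", "sourceID")
  if (!is.character(type) || length(type) != 1L || !type %in% allowed) {
    stop("type must be one of: ", paste(allowed, collapse = ", "))
  }
  # sourceID only matters when the query itself is a source identifier.
  if (type == "sourceID" && is.null(sourceID)) {
    stop("sourceID is required when type = \"sourceID\"")
  }
  invisible(TRUE)
}

# Both calls pass: a UCI query needs no sourceID, a sourceID query supplies one.
check_query_args("161671", type = "uci")
check_query_args("CHEMBL25", type = "sourceID", sourceID = 1L)
```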
The function returns a list of:
- “External_Mappings” (data.table): the mappings to other databases, with the following columns:
  - “compoundID” (character): The compound identifier
  - “Name” (character): The name of the database
  - “NameLong” (character): The long name of the database
  - “sourceID” (integer): The UniChem Source ID
  - “sourceURL” (character): The URL of the source
- “UniChem_Mappings” (list): the following six mappings, each stored under a name prefixed with “UniChem.” (for example “UniChem.UCI”):
  - “UCI” (numeric): The UniChem Identifier
  - “InchiKey” (character): The InChIKey
  - “Inchi” (character): The InChI
  - “formula” (character): The molecular formula
  - “connections” (character): The connection representation, e.g. “1-6(10)13-8-5-3-2-4-7(8)9(11)12”
  - “hAtoms” (character): The hydrogen atom connections, e.g. “2-5H,1H3,(H,11,12)”
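As a rough sketch of how the returned list might be used, the example below builds a small stand-in with the same shape and pulls out individual pieces. The values are for aspirin and are taken from the example output in this vignette. The real External_Mappings element is a data.table, while a data.frame is used here only to keep the sketch free of dependencies.

```r
# Hand-built stand-in for a queryUnichemCompound() result (aspirin).
# Only a few rows and elements are included for brevity.
result <- list(
  External_Mappings = data.frame(
    compoundID = c("CHEMBL25", "DB00945", "2244"),
    Name = c("chembl", "drugbank", "pubchem"),
    stringsAsFactors = FALSE
  ),
  UniChem_Mappings = list(
    UniChem.UCI = 161671,
    UniChem.InchiKey = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
  )
)

# Pull the identifier(s) a given database uses for this compound.
mappings <- result$External_Mappings
drugbank_ids <- mappings$compoundID[mappings$Name == "drugbank"]
print(drugbank_ids)

# Individual structural identifiers are plain list elements.
print(result$UniChem_Mappings$UniChem.InchiKey)
```

A database can contribute more than one row to External_Mappings, so the lookup is written to return a vector rather than a single value.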
Example Searching using uci (UniChem Identifier)
Note: This type of query requires you to know the UniChem Identifier for the compound.
queryUnichemCompound(compound = "161671", type = "uci")
#> $External_Mappings
#> compoundID Name
#> <char> <char>
#> 1: CHEMBL25 chembl
#> 2: DB00945 drugbank
#> 3: AIN pdb
#> 4: 4139 gtopdb
#> 5: 24714725 pubchem_dotf
#> 6: C01405 kegg_ligand
#> 7: 15365 chebi
#> 8: ZINC000000000053 zinc
#> 9: 474821 emolecules
#> 10: acetylsalicylic acid atlas
#> 11: aspirin atlas
#> 12: R16CO5Y76E fdasrs
#> 13: SCHEMBL1353 surechembl
#> 14: PA448497 pharmgkb
#> 15: HMDB0001879 hmdb
#> 16: aspirin-acetylsalicylic-acid selleck
#> 17: 15195166 pubchem_tpharma
#> 18: 2244 pubchem
#> 19: MCULE-3199019536 mcule
#> 20: 20038075 nmrshiftdb2
#> 21: LSM-5288 lincs
#> 22: 50-78-2 actor
#> 23: 11126-35-5 actor
#> 24: J2.300K nikkaji
#> 25: 22360 bindingdb
#> 26: DTXSID5020108 comptox
#> 27: 74 drugcentral
#> 28: 32748 brenda
#> 29: 4779 brenda
#> 30: 2261 brenda
#> 31: 3100 brenda
#> 32: 159662 brenda
#> 33: 6476 brenda
#> 34: CB5114818 chemicalbook
#> 35: CB4421683 chemicalbook
#> 36: ASPIRIN dailymed
#> 37: LYSINE ACETYLSALICYLATE clinicaltrials
#> 38: BAY1019036 clinicaltrials
#> 39: MEASURIN clinicaltrials
#> 40: VENOPIRIN clinicaltrials
#> 41: ASPIRIN DL-LYSINE clinicaltrials
#> 42: ASPIRIN clinicaltrials
#> 43: DURLAZA clinicaltrials
#> 44: ACETYLSALICYLIC ACID clinicaltrials
#> 45: ACETYL SALICYLATE rxnorm
#> 46: ASPIRIN rxnorm
#> 47: ECOTRIN rxnorm
#> 48: DURLAZA rxnorm
#> 49: HY-14654 MedChemExpress
#> 50: PD002467 probes_and_drugs
#> compoundID Name
#> NameLong sourceID
#> <char> <int>
#> 1: ChEMBL 1
#> 2: DrugBank 2
#> 3: PDBe (Protein Data Bank Europe) 3
#> 4: Guide to Pharmacology 4
#> 5: PubChem ('Drugs of the Future' subset) 5
#> 6: KEGG (Kyoto Encyclopedia of Genes and Genomes) Ligand 6
#> 7: ChEBI (Chemical Entities of Biological Interest). 7
#> 8: ZINC 9
#> 9: eMolecules 10
#> 10: Gene Expression Atlas 12
#> 11: Gene Expression Atlas 12
#> 12: FDA/USP Substance Registration System (SRS) 14
#> 13: SureChEMBL 15
#> 14: PharmGKB 17
#> 15: Human Metabolome Database (HMDB) 18
#> 16: Selleck 20
#> 17: PubChem ('Thomson Pharma' subset) 21
#> 18: PubChem Compounds 22
#> 19: Mcule 23
#> 20: NMRShiftDB 24
#> 21: Library of Integrated Network-based Cellular Signatures 25
#> 22: ACToR 26
#> 23: ACToR 26
#> 24: Nikkaji 29
#> 25: BindingDB 31
#> 26: EPA (Environmental Protection Agency) CompTox Dashboard 32
#> 27: DrugCentral 34
#> 28: Brenda 37
#> 29: Brenda 37
#> 30: Brenda 37
#> 31: Brenda 37
#> 32: Brenda 37
#> 33: Brenda 37
#> 34: ChemicalBook 39
#> 35: ChemicalBook 39
#> 36: DailyMed 45
#> 37: clinicaltrials 46
#> 38: clinicaltrials 46
#> 39: clinicaltrials 46
#> 40: clinicaltrials 46
#> 41: clinicaltrials 46
#> 42: clinicaltrials 46
#> 43: clinicaltrials 46
#> 44: clinicaltrials 46
#> 45: rxnorm 47
#> 46: rxnorm 47
#> 47: rxnorm 47
#> 48: rxnorm 47
#> 49: MedChemExpress 48
#> 50: Probes And Drugs 49
#> NameLong sourceID
#> sourceURL
#> <char>
#> 1: https://www.ebi.ac.uk/chembldb/compound/inspect/CHEMBL25
#> 2: http://www.drugbank.ca/drugs/DB00945
#> 3: http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/AIN
#> 4: http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=4139
#> 5: http://pubchem.ncbi.nlm.nih.gov/substance/24714725
#> 6: http://www.genome.jp/dbget-bin/www_bget?C01405
#> 7: http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A15365
#> 8: http://zinc15.docking.org/substances/ZINC000000000053
#> 9: https://www.emolecules.com/cgi-bin/more?vid=474821
#> 10: http://www.ebi.ac.uk/gxa/query?conditionQuery=acetylsalicylic acid
#> 11: http://www.ebi.ac.uk/gxa/query?conditionQuery=aspirin
#> 12: https://precision.fda.gov/uniisearch/srs/unii/R16CO5Y76E
#> 13: https://www.surechembl.org/chemical/SCHEMBL1353
#> 14: https://www.pharmgkb.org/drug/PA448497
#> 15: http://www.hmdb.ca/metabolites/HMDB0001879
#> 16: http://www.selleckchem.com/products/aspirin-acetylsalicylic-acid.html
#> 17: http://pubchem.ncbi.nlm.nih.gov/substance/15195166
#> 18: http://pubchem.ncbi.nlm.nih.gov/compound/2244
#> 19: https://mcule.com/MCULE-3199019536
#> 20: http://nmrshiftdb.org/molecule/20038075
#> 21: http://identifiers.org/lincs.smallmolecule/LSM-5288
#> 22: http://actor.epa.gov/actor/chemical.xhtml?casrn=50-78-2
#> 23: http://actor.epa.gov/actor/chemical.xhtml?casrn=11126-35-5
#> 24: http://jglobal.jst.go.jp/en/redirect?Nikkaji_No=J2.300K
#> 25: http://www.bindingdb.org/bind/chemsearch/marvin/MolStructure.jsp?monomerid=22360
#> 26: https://comptox.epa.gov/dashboard/DTXSID5020108
#> 27: http://drugcentral.org/drugcard/74
#> 28: https://www.brenda-enzymes.org/ligand.php?brenda_ligand_id=32748
#> 29: https://www.brenda-enzymes.org/ligand.php?brenda_ligand_id=4779
#> 30: https://www.brenda-enzymes.org/ligand.php?brenda_ligand_id=2261
#> 31: https://www.brenda-enzymes.org/ligand.php?brenda_ligand_id=3100
#> 32: https://www.brenda-enzymes.org/ligand.php?brenda_ligand_id=159662
#> 33: https://www.brenda-enzymes.org/ligand.php?brenda_ligand_id=6476
#> 34: https://www.chemicalbook.com/ChemicalProductProperty_EN_CB5114818
#> 35: https://www.chemicalbook.com/ChemicalProductProperty_EN_CB4421683
#> 36: https://dailymed.nlm.nih.gov/dailymed/search.cfm?adv=1&labeltype=human&query=ACTIVEMOIETY:(ASPIRIN)+OR+INGREDIENT:(ASPIRIN)
#> 37: https://www.clinicaltrials.gov/ct2/results?cond=&term=LYSINE ACETYLSALICYLATE
#> 38: https://www.clinicaltrials.gov/ct2/results?cond=&term=BAY1019036
#> 39: https://www.clinicaltrials.gov/ct2/results?cond=&term=MEASURIN
#> 40: https://www.clinicaltrials.gov/ct2/results?cond=&term=VENOPIRIN
#> 41: https://www.clinicaltrials.gov/ct2/results?cond=&term=ASPIRIN DL-LYSINE
#> 42: https://www.clinicaltrials.gov/ct2/results?cond=&term=ASPIRIN
#> 43: https://www.clinicaltrials.gov/ct2/results?cond=&term=DURLAZA
#> 44: https://www.clinicaltrials.gov/ct2/results?cond=&term=ACETYLSALICYLIC ACID
#> 45: https://mor.nlm.nih.gov/RxNav/search?searchBy=RXCUI&searchTerm=91101
#> 46: https://mor.nlm.nih.gov/RxNav/search?searchBy=RXCUI&searchTerm=1191
#> 47: https://mor.nlm.nih.gov/RxNav/search?searchBy=RXCUI&searchTerm=202554
#> 48: https://mor.nlm.nih.gov/RxNav/search?searchBy=RXCUI&searchTerm=1665357
#> 49: https://www.medchemexpress.com/HY-14654
#> 50: https://www.probes-drugs.org/compound/PD002467
#> sourceURL
#>
#> $UniChem_Mappings
#> $UniChem_Mappings$UniChem.UCI
#> [1] 161671
#>
#> $UniChem_Mappings$UniChem.InchiKey
#> [1] "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
#>
#> $UniChem_Mappings$UniChem.Inchi
#> [1] "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
#>
#> $UniChem_Mappings$UniChem.formula
#> [1] "C9H8O4"
#>
#> $UniChem_Mappings$UniChem.connections
#> [1] "1-6(10)13-8-5-3-2-4-7(8)9(11)12"
#>
#> $UniChem_Mappings$UniChem.hAtoms
#> [1] "2-5H,1H3,(H,11,12)"
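The formula, connections and hAtoms values in UniChem_Mappings correspond to layers of the InChI string shown above. The sketch below splits that InChI on its slash separators to recover them. It assumes a standard InChI containing only the formula, connection (c) and hydrogen (h) layers, as in this example, and get_layer is a hypothetical helper rather than a package function.

```r
inchi <- "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

# Split the InChI into its slash-separated layers.
layers <- strsplit(inchi, "/", fixed = TRUE)[[1]]

# Hypothetical helper: return the first layer starting with `prefix`,
# with the prefix letter removed, or NA if no such layer exists.
get_layer <- function(layers, prefix) {
  hit <- layers[startsWith(layers, prefix)]
  if (length(hit) == 0L) NA_character_ else substring(hit[1], nchar(prefix) + 1L)
}

formula <- layers[2]                      # the layer right after the version prefix
sub_layers <- layers[-(1:2)]              # everything after the formula
connections <- get_layer(sub_layers, "c") # atom connection layer
h_atoms <- get_layer(sub_layers, "h")     # hydrogen layer

print(c(formula = formula, connections = connections, hAtoms = h_atoms))
```

InChI strings with additional layers (charge, stereochemistry, isotopes) would need more careful handling than this sketch provides.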