Combining OmniPath annotations and networks

Marton Olbei

09/12/2020

In this tutorial we query interactions from one of the many resources included in OmniPath, map additional annotations onto them, and combine the two to build a small tissue-specific network.

We'll start by importing the libraries: omnipath for the queries and pandas for data wrangling.

import omnipath as op
import pandas as pd

With the omnipath package loaded, we can download a subset of the interaction data. To take a look at the available databases, run:

op.interactions.AllInteractions.resources()

('ABS',
 'ACSN',
 'ACSN_SignaLink3',
 'ARACNe-GTEx_DoRothEA',
 'ARN',
 'Adhesome',
 'AlzPathway',
 'BEL-Large-Corpus_ProtMapper',
 'Baccin2019',
 'BioGRID',
 'BioGRID_ICELLNET',
 'CA1',
 'CancerCellMap',
 'CellPhoneDB',
 'CellPhoneDB_ICELLNET',
 'DEPOD',
 'DIP',
 'DOMINO',
 'DeathDomain',
 'Dinarello2013_ICELLNET',
 'DoRothEA',
 'DoRothEA-reviews_DoRothEA',
 'ELM',
 'EMBRACE',
 'ENCODE-distal',
 'ENCODE-proximal',
 'ENCODE_tf-mirna',
 'FANTOM4_DoRothEA',
 'Fantom5_LRdb',
 'GO-lig-rec_ICELLNET',
 'Guide2Pharma',
 'Guide2Pharma_CellPhoneDB',
 'Guide2Pharma_ICELLNET',
 'Guide2Pharma_LRdb',
 'HOCOMOCO_DoRothEA',
 'HPMR',
 'HPMR_ICELLNET',
 'HPMR_LRdb',
 'HPRD',
 'HPRD-phos',
 'HPRD_KEA',
 'HPRD_LRdb',
 'HPRD_MIMP',
 'HTRIdb',
 'HTRIdb_DoRothEA',
 'HuRI',
 'I2D_CellPhoneDB',
 'ICELLNET',
 'IMEx_CellPhoneDB',
 'InnateDB',
 'InnateDB-All_CellPhoneDB',
 'InnateDB_CellPhoneDB',
 'InnateDB_ICELLNET',
 'InnateDB_SignaLink3',
 'IntAct',
 'IntAct_CellPhoneDB',
 'IntAct_DoRothEA',
 'JASPAR_DoRothEA',
 'KEA',
 'KEGG-MEDICUS',
 'Kinexus_KEA',
 'Kirouac2010',
 'Kirouac2010_ICELLNET',
 'LMPID',
 'LRdb',
 'Li2012',
 'Lit-BM-17',
 'LncRNADisease',
 'MIMP',
 'MINT_CellPhoneDB',
 'MPPI',
 'Macrophage',
 'Macrophage_ICELLNET',
 'MatrixDB',
 'MatrixDB_CellPhoneDB',
 'NCI-PID_ProtMapper',
 'NFIRegulomeDB_DoRothEA',
 'NRF2ome',
 'NetPath',
 'NetworKIN_KEA',
 'ORegAnno',
 'ORegAnno_DoRothEA',
 'PAZAR',
 'PAZAR_DoRothEA',
 'PhosphoNetworks',
 'PhosphoPoint',
 'PhosphoSite',
 'PhosphoSite_KEA',
 'PhosphoSite_MIMP',
 'PhosphoSite_ProtMapper',
 'PhosphoSite_noref',
 'ProtMapper',
 'REACH_ProtMapper',
 'RLIMS-P_ProtMapper',
 'Ramilowski2015',
 'Ramilowski2015_Baccin2019',
 'Ramilowski2015_ICELLNET',
 'ReMap_DoRothEA',
 'Reactome_ICELLNET',
 'Reactome_LRdb',
 'Reactome_ProtMapper',
 'Reactome_SignaLink3',
 'RegNetwork_DoRothEA',
 'SIGNOR',
 'SIGNOR_ICELLNET',
 'SIGNOR_ProtMapper',
 'SPIKE',
 'SPIKE_ICELLNET',
 'STRING_ICELLNET',
 'SignaLink3',
 'SignaLink3_ICELLNET',
 'Sparser_ProtMapper',
 'TCRcuration_SignaLink3',
 'TFactS_DoRothEA',
 'TFe_DoRothEA',
 'TRED_DoRothEA',
 'TRIP',
 'TRRD_DoRothEA',
 'TRRUST_DoRothEA',
 'TransmiR',
 'UniProt_CellPhoneDB',
 'UniProt_LRdb',
 'Wang',
 'dbPTM',
 'iPTMnet',
 'iTALK',
 'lncrnadb',
 'miR2Disease',
 'miRDeathDB',
 'miRTarBase',
 'miRecords',
 'ncRDeathDB',
 'phosphoELM',
 'phosphoELM_KEA',
 'phosphoELM_MIMP')

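To query the interactions, we can use the get() method of interactions.AllInteractions. Called without arguments, as below, it downloads the interactions across the available resources; we then filter them by source in the next step.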

interactions = op.interactions.AllInteractions.get()

Let's get the interactions coming from the SIGNOR database. Note that the queries made by the omnipath library can be replicated in a browser too, since each one accesses a specific URL that depends on the parameters we give. Feel free to give it a go: https://omnipathdb.org/interactions?genesymbols=yes&resources=SIGNOR&datasets=dorothea,kinaseextra,ligrecextra,lncrna_mrna,mirnatarget,omnipath,pathwayextra,tf_mirna,tf_target,tfregulons&dorothea_levels=A,B&fields=sources,references,curation_effort,dorothea_level&license=academic
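To make the link between query parameters and the URL concrete, here is a minimal sketch that rebuilds the address above with the Python standard library. The parameter names and values are copied from that URL; the `build_query_url` helper is a hypothetical name used for illustration and is not part of the omnipath package.

```python
from urllib.parse import urlencode

BASE_URL = "https://omnipathdb.org/interactions"


def build_query_url(base, params):
    """Join a base URL and a dict of query parameters (hypothetical helper)."""
    # safe="," leaves the commas of list-valued parameters unescaped
    return base + "?" + urlencode(params, safe=",")


params = {
    "genesymbols": "yes",
    "resources": "SIGNOR",
    "datasets": ("dorothea,kinaseextra,ligrecextra,lncrna_mrna,mirnatarget,"
                 "omnipath,pathwayextra,tf_mirna,tf_target,tfregulons"),
    "dorothea_levels": "A,B",
    "fields": "sources,references,curation_effort,dorothea_level",
    "license": "academic",
}

url = build_query_url(BASE_URL, params)
print(url)
```

Changing the value of `resources` in the dictionary is a quick way to see how a query for another database would look in the browser.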

interactions_filtered = interactions[interactions.sources.isin(['SIGNOR'])]
interactions_filtered
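One detail worth checking: isin(['SIGNOR']) compares the whole value of the sources column, so it keeps rows whose sources entry is exactly SIGNOR. The sketch below uses an invented toy table and assumes that multiple resources are stored in one string separated by semicolons (an assumption about the data layout, so verify it against your own interactions frame). It contrasts the exact match with a membership test.

```python
import pandas as pd

# Toy data, invented for illustration; identifiers are placeholders
toy = pd.DataFrame({
    "source": ["A", "B", "C"],
    "target": ["B", "C", "A"],
    "sources": ["SIGNOR", "SIGNOR;SPIKE", "SPIKE"],
})

# Exact match: only rows whose sources string is exactly 'SIGNOR'
exact = toy[toy["sources"].isin(["SIGNOR"])]

# Membership: rows where SIGNOR is one of the listed resources
# (assumes ';' is the separator between resource names)
has_signor = toy["sources"].str.split(";").apply(lambda names: "SIGNOR" in names)
member = toy[has_signor]

print(len(exact), len(member))
```

Which of the two is appropriate depends on whether the network should contain interactions reported only by SIGNOR or every interaction that SIGNOR supports.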

Importing annotations

To give these interactions a bit more depth, we can map annotation data to them. To take a look at the available annotation resources in OmniPath, call the resources() method of requests.Annotations by running the line below.

op.requests.Annotations.resources()

('Adhesome',
 'Almen2009',
 'Baccin2019',
 'CORUM_Funcat',
 'CORUM_GO',
 'CSPA',
 'CSPA_celltype',
 'CancerGeneCensus',
 'CancerSEA',
 'CellCellInteractions',
 'CellPhoneDB',
 'CellPhoneDB_complex',
 'ComPPI',
 'DGIdb',
 'DisGeNet',
 'EMBRACE',
 'Exocarta',
 'GO_Intercell',
 'GPCRdb',
 'Guide2Pharma',
 'HGNC',
 'HPA_secretome',
 'HPA_subcellular',
 'HPA_tissue',
 'HPMR',
 'HPMR_complex',
 'ICELLNET',
 'ICELLNET_complex',
 'IntOGen',
 'Integrins',
 'KEGG-PC',
 'Kirouac2010',
 'LOCATE',
 'LRdb',
 'MCAM',
 'MSigDB',
 'Matrisome',
 'MatrixDB',
 'Membranome',
 'NetPath',
 'OPM',
 'Phobius',
 'Phosphatome',
 'Ramilowski2015',
 'Ramilowski_location',
 'SIGNOR',
 'SignaLink_function',
 'SignaLink_pathway',
 'Surfaceome',
 'TCDB',
 'TFcensus',
 'TopDB',
 'UniProt_family',
 'UniProt_keyword',
 'UniProt_location',
 'UniProt_tissue',
 'UniProt_topology',
 'Vesiclepedia',
 'Zhong2015',
 'iTALK',
 'kinase.com')

Let's import tissue enrichment data from the Human Protein Atlas (HPA). When calling the function, we first select the proteins we'd like to gather information on, followed by the resource we are pulling the data from.

A toy example below:

We assign the annotations of the proteins TP53 and LMNA from the tissue section of the HPA to the HPA_small variable. The result comes back in a "long" format, with one row per protein, label and value; we pivot it into a "wide" table further below.

HPA_small = op.requests.Annotations.get(
    proteins = ['TP53', 'LMNA'],
    resources = 'HPA_tissue'
)
HPA_small

These queries are also URL accessible. This toy query translates to the following:

URL: https://omnipathdb.org/annotations?resources=HPA_tissue&proteins=TP53,LMNA&license=academic

Let's collect the unique proteins from the SIGNOR interactions into a list, using a set to drop the duplicates.

SIGNOR_proteins = []
SIGNOR_proteins.extend(interactions_filtered['source'])
SIGNOR_proteins.extend(interactions_filtered['target'])
SIGNOR_proteins = list(set(SIGNOR_proteins)) #keep unique values only

We can pass this list into requests.Annotations.get() just like above. To keep the runtime of this tutorial sensible, we restrict the query to the first 500 proteins in the list with the [:500] slice (Python uses zero-based indexing, and the end index of a slice is exclusive).

HPA_signor = op.requests.Annotations.get(
    proteins = SIGNOR_proteins[:500],
    resources = 'HPA_tissue'
)
HPA_signor
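As a quick check on the slicing arithmetic: the end index of a Python slice is exclusive, so [0:499] returns 499 items and [:500] returns 500. The sketch below uses invented identifiers. It also sorts the set first, since the iteration order of a set of strings can differ between Python sessions, which would change which proteins end up in the slice.

```python
# Placeholder identifiers, invented for illustration
ids = {f"P{n:05d}" for n in range(1000)}

ordered = sorted(ids)        # sorting makes the selection reproducible

first_499 = ordered[0:499]   # end index is exclusive: 499 items
first_500 = ordered[:500]    # 500 items

print(len(first_499), len(first_500))
```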

First we need to pivot this "long" data format to a "wide" one.

HPA_signor_wide = pd.pivot_table(HPA_signor, index = 'uniprot', columns = 'label', values = 'value', aggfunc='first')
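Pivoting on uniprot alone yields one row per protein, and since a protein can have many HPA records, the values in that row may come from different records. A possible alternative, sketched here on invented toy data, is to include record_id in the index so that each record stays on its own row. The column names follow the long table returned by the query; the values are made up.

```python
import pandas as pd

# Toy long-format table: protein P1 has two records, P2 has one
long_df = pd.DataFrame({
    "uniprot":   ["P1", "P1", "P1", "P1", "P2", "P2"],
    "record_id": [1, 1, 2, 2, 3, 3],
    "label":     ["tissue", "favourable"] * 3,
    "value":     ["breast cancer", "True", "liver cancer", "False",
                  "breast cancer", "False"],
})

# One row per protein: only the first value of each label survives
by_protein = pd.pivot_table(long_df, index="uniprot", columns="label",
                            values="value", aggfunc="first")

# One row per record: every tissue entry of a protein is kept
by_record = pd.pivot_table(long_df, index=["record_id", "uniprot"],
                           columns="label", values="value", aggfunc="first")

print(by_protein.shape, by_record.shape)
```

With the per-record layout, a later filter on the tissue column can see every tissue a protein was measured in, at the cost of a larger table with a two-level index.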

Now that we have the data, let's filter it down to a specific case, like breast cancer. We keep only the rows where the tissue is breast cancer and the favourable label is True, and move forward with those.

HPA_breast_cancer = HPA_signor_wide[
    (HPA_signor_wide['tissue'] == 'breast cancer') &
    (HPA_signor_wide['favourable'] == 'True')]

HPA_breast_cancer
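To combine the two tables into a tissue-specific network, one option is to keep only the interactions whose source and target are both among the filtered proteins. The sketch below runs on invented toy data; with the real tables, the protein set would presumably come from HPA_breast_cancer.index and the edges from interactions_filtered.

```python
import pandas as pd

# Toy interaction table (placeholder identifiers)
edges = pd.DataFrame({
    "source": ["P1", "P1", "P2", "P3"],
    "target": ["P2", "P3", "P4", "P4"],
})

# Toy stand-in for the filtered annotation table, indexed by UniProt ID
annotated = pd.DataFrame({"tissue": ["breast cancer"] * 3},
                         index=["P1", "P2", "P4"])

proteins = set(annotated.index)

# Keep an edge only if both of its endpoints passed the annotation filter
network = edges[edges["source"].isin(proteins) & edges["target"].isin(proteins)]

print(network)
```

Requiring both endpoints gives the strictest version of the network; replacing `&` with `|` would also keep interactions that connect a filtered protein to any other partner.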

	source	target	is_directed	is_stimulation	is_inhibition	consensus_direction	consensus_stimulation	consensus_inhibition	dip_url	curation_effort	references	sources	type	references_stripped	n_references	n_sources	n_primary_sources
799	P53355	P46821	True	True	False	True	True	False	None	1	SIGNOR:18806760	SIGNOR	post_translational	18806760	1	1	1
2416	Q8N726	P06748	True	False	True	False	False	False	None	1	SIGNOR:14636574	SIGNOR	post_translational	14636574	1	1	1
2814	Q12933	P61088	True	True	False	False	False	False	None	1	SIGNOR:18635759	SIGNOR	post_translational	18635759	1	1	1
3937	P23458	P38484	True	True	False	False	False	False	None	1	SIGNOR:19041276	SIGNOR	post_translational	19041276	1	1	1
3942	P22681	Q96JA1	True	False	True	False	False	False	None	1	SIGNOR:15282549	SIGNOR	post_translational	15282549	1	1	1
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
149118	Q13526	P01574	True	False	True	True	False	True	None	1	SIGNOR:16699525	SIGNOR	transcriptional	16699525	1	1	1
149119	Q01860	P31749	True	False	True	True	False	True	None	1	SIGNOR:23041284	SIGNOR	transcriptional	23041284	1	1	1
149120	Q01860	Q9Y243	True	False	True	True	False	True	None	1	SIGNOR:23041284	SIGNOR	transcriptional	23041284	1	1	1
149121	Q01860	P31751	True	False	True	True	False	True	None	1	SIGNOR:23041284	SIGNOR	transcriptional	23041284	1	1	1
149122	Q01860	P26358	True	True	False	True	True	False	None	1	SIGNOR:22795133	SIGNOR	transcriptional	22795133	1	1	1

	uniprot	genesymbol	entity_type	source	label	value	record_id
0	P04637	TP53	protein	HPA_tissue	organ	lymphoma	632671
1	P04637	TP53	protein	HPA_tissue	tissue	lymphoma	632671
2	P04637	TP53	protein	HPA_tissue	level	Not detected	632671
3	P04637	TP53	protein	HPA_tissue	n_not_detected	9	632671
4	P04637	TP53	protein	HPA_tissue	n_low	3	632671
...	...	...	...	...	...	...	...
780	P04637	TP53	protein	HPA_tissue	level	Not detected	632772
781	P04637	TP53	protein	HPA_tissue	status	Enhanced	632772
782	P04637	TP53	protein	HPA_tissue	prognostic	False	632772
783	P04637	TP53	protein	HPA_tissue	favourable	False	632772
784	P04637	TP53	protein	HPA_tissue	pathology	False	632772

	uniprot	genesymbol	entity_type	source	label	value	record_id
0	P23511	NFYA	protein	HPA_tissue	organ	prostate	797
1	P23511	NFYA	protein	HPA_tissue	tissue	glandular cells	797
2	P23511	NFYA	protein	HPA_tissue	level	Medium	797
3	P23511	NFYA	protein	HPA_tissue	status	Enhanced	797
4	P23511	NFYA	protein	HPA_tissue	prognostic	False	797
...	...	...	...	...	...	...	...
316210	P10147	CCL3	protein	HPA_tissue	tissue	liver cancer	1421904
316211	P10147	CCL3	protein	HPA_tissue	prognostic	False	1421904
316212	P10147	CCL3	protein	HPA_tissue	favourable	False	1421904
316213	P10147	CCL3	protein	HPA_tissue	score	0.3542	1421904
316214	P10147	CCL3	protein	HPA_tissue	pathology	True	1421904

label	favourable	level	n_high	n_low	n_medium	n_not_detected	organ	pathology	prognostic	score	status	tissue
uniprot
A0PJZ3	True	Not detected	0	0	1	2	thyroid cancer	True	False	0.2515	NaN	thyroid cancer
A8MT69	True	NaN	NaN	NaN	NaN	NaN	stomach cancer	True	False	0.02572	NaN	stomach cancer
O00170	True	Medium	1	2	8	0	melanoma	True	False	0.006193	Approved	melanoma
O14511	True	Not detected	0	3	0	8	renal cancer	True	False	0.00779	Uncertain	renal cancer
O14842	True	NaN	NaN	NaN	NaN	NaN	pancreatic cancer	True	False	0.04515	NaN	pancreatic cancer
...	...	...	...	...	...	...	...	...	...	...	...	...
Q9UI33	True	Low	0	6	1	4	urothelial cancer	True	False	0.0008296	NaN	urothelial cancer
Q9UJ55	True	NaN	NaN	NaN	NaN	NaN	head and neck cancer	True	False	0.3796	NaN	head and neck cancer
Q9ULT6	True	Medium	0	0	11	0	prostate cancer	True	False	0.3942	NaN	prostate cancer
Q9Y261	True	Medium	0	1	6	4	ovarian cancer	True	True	0.0006283	Enhanced	ovarian cancer
Q9Y3A5	True	Medium	0	3	4	4	prostate cancer	True	False	0.4591	Approved	prostate cancer