In this tutorial we show how to query interactions from one of the many resources included in OmniPath, map additional annotations onto them, and combine the two to build a small tissue-specific network.
We'll start by importing the libraries: first omnipath, then pandas for data wrangling.
import omnipath as op
import pandas as pd
With the omnipath package loaded, we can download a subset of the interaction data. To take a look at the available databases, run:
op.interactions.AllInteractions.resources()
To download the interactions, we can use the get method of interactions.AllInteractions. The call below retrieves interactions from all resources, which we then narrow down to a single source.
interactions = op.interactions.AllInteractions.get()
Let's get the interactions coming from the SIGNOR database. It is worth mentioning that queries made with the omnipath library can be replicated in a browser too, because each query accesses a specific URL determined by the parameters we give here.
Feel free to give it a go: https://omnipathdb.org/interactions?genesymbols=yes&resources=SIGNOR&datasets=dorothea,kinaseextra,ligrecextra,lncrna_mrna,mirnatarget,omnipath,pathwayextra,tf_mirna,tf_target,tfregulons&dorothea_levels=A,B&fields=sources,references,curation_effort,dorothea_level&license=academic
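To make the link between parameters and URL more concrete, here is a minimal sketch that assembles such a URL by hand. The helper build_query_url is hypothetical and not part of the omnipath package; the parameter names are simply copied from the URL above, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_query_url(query_type, **params):
    """Hypothetical helper: assemble an OmniPath-style query URL.

    List values are joined with commas, as in the URL shown above.
    """
    flat = {
        key: ",".join(value) if isinstance(value, (list, tuple)) else value
        for key, value in params.items()
    }
    # safe="," keeps commas readable instead of percent-encoding them
    return f"https://omnipathdb.org/{query_type}?{urlencode(flat, safe=',')}"


url = build_query_url(
    "interactions",
    genesymbols="yes",
    resources=["SIGNOR"],
    fields=["sources", "references"],
    license="academic",
)
print(url)
```

Pasting the printed URL into a browser should return a tab-separated table, much like the longer URL above does.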
interactions_filtered = interactions[interactions.sources.isin(['SIGNOR'])]
interactions_filtered
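One thing to keep in mind: isin matches the whole cell. If the sources column holds several resource names joined by semicolons (an assumption about the table layout, worth checking on your own download), the filter above keeps only interactions reported by SIGNOR alone. The sketch below uses made-up rows to show the difference and one possible way to also keep interactions that SIGNOR shares with other resources.

```python
import pandas as pd

# Toy stand-in for the downloaded interactions table (made-up rows and IDs)
toy = pd.DataFrame({
    "source": ["P1", "P2", "P3"],
    "target": ["P2", "P3", "P1"],
    "sources": ["SIGNOR", "SIGNOR;SPIKE", "BioGRID"],
})

# Exact match: only rows whose sources cell is exactly "SIGNOR"
exact = toy[toy["sources"].isin(["SIGNOR"])]

# Membership match: split the cell and test whether SIGNOR is among the names
has_signor = toy["sources"].str.split(";").apply(lambda names: "SIGNOR" in names)
any_signor = toy[has_signor]

print(len(exact), len(any_signor))
```

Which of the two behaviours you want depends on whether the network should be SIGNOR-exclusive or merely SIGNOR-supported.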
To give these interactions a bit more depth, we can map annotation data to the interactions.
To take a look at the available annotation resources in OmniPath, call the resources method of requests.Annotations by running the line below.
op.requests.Annotations.resources()
Let's import tissue enrichment data from the Human Protein Atlas (HPA). When calling the function, we first select the proteins we'd like to gather information on, followed by the resources we are pulling the data from.
A toy example below:
We assign the annotations of the proteins TP53 and LMNA from the tissue section of the HPA to the HPA_small variable. The query as written returns the data in a "long" format, with one row per label and value; pivoting to a "wide" table is done later in the tutorial.
HPA_small = op.requests.Annotations.get(
    proteins = ['TP53', 'LMNA'],  # one string per protein
    resources = 'HPA_tissue'
)
HPA_small
These queries are also URL accessible. This toy query translates to the following:
URL: https://omnipathdb.org/annotations?resources=HPA_tissue&proteins=TP53,LMNA&license=academic
Let's collect the unique proteins from SIGNOR into a list, using a set to drop the duplicates.
SIGNOR_proteins = []
SIGNOR_proteins.extend(interactions_filtered['source'])
SIGNOR_proteins.extend(interactions_filtered['target'])
SIGNOR_proteins = list(set(SIGNOR_proteins)) #keep unique values only
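Because a set has no guaranteed order, the list built above can come out in a different order from one run to the next, which matters if only the first few hundred entries are used later. A minimal sketch of a reproducible variant, using illustrative identifiers rather than real query results:

```python
# Illustrative identifiers standing in for the 'source' and 'target' columns
sources = ["P04637", "P02545", "P04637"]
targets = ["P02545", "Q00987"]

# Union of both columns, then sorted so the order is the same on every run
unique_proteins = sorted(set(sources) | set(targets))
print(unique_proteins)  # ['P02545', 'P04637', 'Q00987']
```

The same pattern would apply to the real columns, for example sorted(set(interactions_filtered['source']) | set(interactions_filtered['target'])).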
We can pass this into requests.Annotations.get() just like above.
To ensure sensible runtimes for this tutorial, we restrict the queried proteins to the first 500 in the list with the [0:500] slice. Python uses zero-based indexing and the end of a slice is exclusive, so this selects items 0 to 499.
HPA_signor = op.requests.Annotations.get(
    proteins = SIGNOR_proteins[0:500],
    resources = 'HPA_tissue'
)
HPA_signor
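If you later want annotations for the whole list rather than a subset, one option is to query it in batches. The sketch below shows only the batching logic on placeholder names; the chunks helper and the batch size are assumptions made for illustration, and the actual query call is left as a comment.

```python
def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder names standing in for SIGNOR_proteins
proteins = [f"PROT{i}" for i in range(1200)]

batches = list(chunks(proteins, 500))
print([len(batch) for batch in batches])  # [500, 500, 200]

# Each batch could then be passed to the annotation query in turn, e.g.
# op.requests.Annotations.get(proteins=batch, resources='HPA_tissue'),
# and the resulting tables joined with pd.concat.
```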
First we need to pivot this "long" data format to a "wide" one.
HPA_signor_wide = pd.pivot_table(HPA_signor, index = 'uniprot', columns = 'label', values = 'value', aggfunc='first')
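To see what the pivot does, here is a small sketch on made-up long-format rows. The column names mirror the ones used above, plus a record_id column that is assumed here to group the rows belonging to one annotation record. Pivoting on the protein alone keeps a single value per label and protein, whereas adding the record identifier to the index keeps one row per record.

```python
import pandas as pd

# Made-up long-format annotations: one row per (record, label, value)
long_df = pd.DataFrame({
    "uniprot": ["P1", "P1", "P1", "P1", "P2", "P2"],
    "record_id": [1, 1, 2, 2, 3, 3],
    "label": ["tissue", "favourable"] * 3,
    "value": ["breast cancer", "True", "liver cancer", "False",
              "breast cancer", "False"],
})

# One row per protein: only the first value of each label survives
per_protein = long_df.pivot_table(
    index="uniprot", columns="label", values="value", aggfunc="first"
)

# One row per record: every tissue entry is kept
per_record = long_df.pivot_table(
    index=["uniprot", "record_id"], columns="label", values="value",
    aggfunc="first",
).reset_index()

print(len(per_protein), len(per_record))
```

With the toy data, the per-protein table drops the liver cancer entry for P1, so it is worth checking which of the two shapes suits the downstream filter.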
Now that we have the data, let's filter it down to a specific case, like breast cancer. We keep only the rows where the protein level is flagged as favourable and discard the rest.
HPA_breast_cancer = HPA_signor_wide[
    (HPA_signor_wide['tissue'] == 'breast cancer') &  # keep breast cancer rows
    (HPA_signor_wide['favourable'] == 'True')]  # compared as the string 'True'
HPA_breast_cancer
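As a rough sketch of how the filtered annotations could be combined with the interactions into a small tissue-specific network, the toy example below keeps only interactions whose source and target both appear among the selected proteins. All rows and identifiers are made up, and requiring both endpoints to be selected is just one possible design choice.

```python
import pandas as pd

# Made-up wide annotation table, indexed by protein
wide = pd.DataFrame(
    {
        "tissue": ["breast cancer", "liver cancer", "breast cancer", "breast cancer"],
        "favourable": ["True", "True", "True", "False"],
    },
    index=pd.Index(["P1", "P2", "P3", "P4"], name="uniprot"),
)

# Made-up interaction table
toy_interactions = pd.DataFrame({
    "source": ["P1", "P1", "P3", "P3"],
    "target": ["P3", "P2", "P4", "P1"],
})

# Boolean mask: breast cancer rows flagged as favourable (values are strings)
mask = (wide["tissue"] == "breast cancer") & (wide["favourable"] == "True")
selected = wide[mask]

# Keep interactions where both endpoints are among the selected proteins
keep = (
    toy_interactions["source"].isin(selected.index)
    & toy_interactions["target"].isin(selected.index)
)
network = toy_interactions[keep]
print(network)
```

Relaxing the condition to either endpoint (using | instead of &) would give a larger network that also includes the direct neighbours of the selected proteins.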