The pypath tutorial collection

Before April 2019 we had a few pypath tutorials on the OmniPath webpage. However, over the past years we have developed pypath a lot, and recently a number of important points in the interface have changed (although we tried to keep compatibility as much as possible). This new comprehensive tutorial replaced the previous ones in April 2019 and was updated in August 2019.

1: Quick start – How do I build OmniPath data with pypath?

pypath provides an easy way to build the OmniPath network as described in our paper. The first build takes long (more than an hour, see the comment in the code below), because all data is downloaded from the original providers. The next time pypath uses the data from its cache directory, so the network builds much faster. To load it even faster, you can save it into a pickle dump.

In [10]:
from pypath import main
from pypath import settings
In [ ]:
pa = main.PyPath()
#pa.load_omnipath() # This is commented out because it takes > 1h 
                    # to run it for the first time due to the vast
                    # amount of data download.
                    # Once you populated the cache it still takes
                    # approx. 30 min to build the entire OmniPath
                    # as the process consists of quite some data
                    # processing. If you dump it in a pickle, you
                    # can load the network in < 1 min
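
The build-once, reload-from-pickle pattern mentioned above can be sketched with the standard library alone. This is only an illustration under assumptions: build_network is a hypothetical stand-in for the slow build step, the dictionary it returns is not a pypath object, and pypath's own saving facilities are not used here.

```python
import os
import pickle
import tempfile


def build_network():
    # Hypothetical stand-in for a slow build step (e.g. downloading
    # and processing many resources).
    return {('EGF', 'EGFR'): {'sources': {'SPIKE', 'HPMR'}}}


def load_or_build(pickle_path):
    # Reuse the pickle dump if it exists, otherwise build and save.
    if os.path.exists(pickle_path):
        with open(pickle_path, 'rb') as fp:
            return pickle.load(fp)
    network = build_network()
    with open(pickle_path, 'wb') as fp:
        pickle.dump(network, fp)
    return network


pickle_path = os.path.join(tempfile.mkdtemp(), 'network.pickle')
first = load_or_build(pickle_path)   # builds and writes the dump
second = load_or_build(pickle_path)  # loads from the dump
```

The second call never runs the build step, which is why loading from a dump is so much faster than rebuilding.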

2: Quick start – I just want a network quickly and play around with pypath

You can find the predefined formats in the pypath.data_formats module. For example, to load one resource from there, let's say Signor:

In [2]:
from pypath import main
from pypath import data_formats
pa = main.PyPath()
pa.load_resources({'signor': data_formats.pathway['signor']})

Or to load all activity flow resources with literature references:

In [3]:
from pypath import main
from pypath import data_formats
In [4]:
pa = main.PyPath()
pa.init_network(data_formats.pathway)

Or to load all activity flow resources, including the ones without literature references:

In [5]:
pa = main.PyPath()
pa.init_network(data_formats.pathway_all)

3: Quick start – How do I build networks from any data with pypath?

Here we show how to build a network from your own files. The advantage of building a network with pypath is that you don't need to worry about merging redundant elements or about different formats and identifiers. Let's say you have two files with network data, network1.csv and network2.sif.

Note: you need to create these files in order to load them.
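
A minimal sketch of creating two such files, assuming a comma separated file with a header line, two Entrez Gene ID columns and an effect column (network1.csv), and a space separated SIF-like file with Gene Symbols (network2.sif). The rows are illustrative and not taken from any database; write_example_files is a hypothetical helper.

```python
import os
import tempfile

# Illustrative contents only: Entrez Gene IDs in the first two columns
# and the effect in the third; comma separated, with a header line.
NETWORK1_CSV = (
    'entrezA,entrezB,effect\n'
    '1950,1956,stimulation\n'
    '1956,5290,stimulation\n'
    '5290,207,stimulation\n'
    '207,2932,inhibition\n'
)

# Illustrative SIF-like file: source, sign, target, separated by spaces.
NETWORK2_SIF = (
    'EGF + EGFR\n'
    'EGFR + PIK3CA\n'
    'PIK3CA + AKT1\n'
    'AKT1 - GSK3B\n'
)


def write_example_files(directory):
    # Write both files into `directory` and return their paths.
    paths = {}
    for name, content in (
        ('network1.csv', NETWORK1_CSV),
        ('network2.sif', NETWORK2_SIF),
    ):
        path = os.path.join(directory, name)
        with open(path, 'w') as fp:
            fp.write(content)
        paths[name] = path
    return paths


# Written to a temporary directory here to keep the sketch side-effect free.
paths = write_example_files(tempfile.mkdtemp())
```

To follow the rest of this section, call write_example_files('.') instead, so that the bare file names resolve in the working directory.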

3a: Defining input formats

In [6]:
import pypath
import pypath.input_formats as input_formats

input1 = input_formats.ReadSettings(
    name = 'egf1',
    input = 'network1.csv',
    header = True,
    separator = ',',
    id_col_a = 0,
    id_col_b = 1,
    id_type_a = 'entrez',
    id_type_b = 'entrez',
    sign = (2, 'stimulation', 'inhibition'),
    ncbi_tax_id = 9606,
)

input2 = input_formats.ReadSettings(
    name = 'egf2',
    input = 'network2.sif',
    separator = ' ',
    id_col_a = 0,
    id_col_b = 2,
    id_type_a = 'genesymbol',
    id_type_b = 'genesymbol',
    sign = (1, '+', '-'),
    ncbi_tax_id = 9606,
)
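
To make the meaning of the column settings concrete, here is a toy reader written with the standard library. It is a sketch under assumptions, not pypath's parser: the settings are plain dicts rather than ReadSettings objects, read_interactions is a hypothetical function, and the sign tuple is read as (column index, value meaning stimulation, value meaning inhibition).

```python
import csv
import io

# Hypothetical, simplified stand-ins for the two input definitions;
# plain dicts, not pypath objects.
settings1 = dict(header=True, separator=',', id_col_a=0, id_col_b=1,
                 sign=(2, 'stimulation', 'inhibition'))
settings2 = dict(header=False, separator=' ', id_col_a=0, id_col_b=2,
                 sign=(1, '+', '-'))


def read_interactions(text, header, separator, id_col_a, id_col_b, sign):
    # Yield (id_a, id_b, effect) tuples; effect is +1, -1 or 0 (unknown).
    sign_col, positive, negative = sign
    rows = csv.reader(io.StringIO(text), delimiter=separator)
    if header:
        next(rows)  # skip the header line
    for row in rows:
        value = row[sign_col]
        effect = 1 if value == positive else -1 if value == negative else 0
        yield row[id_col_a], row[id_col_b], effect


# Illustrative file contents.
csv_text = 'entrezA,entrezB,effect\n1950,1956,stimulation\n207,2932,inhibition\n'
sif_text = 'EGF + EGFR\nAKT1 - GSK3B\n'

edges1 = list(read_interactions(csv_text, **settings1))
edges2 = list(read_interactions(sif_text, **settings2))
```

Both files end up in the same (id_a, id_b, effect) shape despite their different layouts; identifier translation would be the next step.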

3b: Creating PyPath object and loading the 2 test files

In [7]:
inputs = {
    'egf1': input1,
    'egf2': input2,
}

pa = main.PyPath()
pa.init_network(lst = inputs)

4: Plotting the network with igraph

Here we use the network created above, because it is of a reasonable size, unlike the networks we could get from most of the network databases. igraph has excellent plotting capabilities built on top of the cairo library.

In [8]:
import igraph
plot = igraph.plot(pa.graph, target = 'egf_network.png',
            edge_width = 0.3, edge_color = '#777777',
            vertex_color = '#97BE73', vertex_frame_width = 0,
            vertex_size = 70.0, vertex_label_size = 15,
            vertex_label_color = '#FFFFFF',
            # due to a bug in either igraph or IPython, 
            # vertex labels are not visible on inline plots:
            inline = False, margin = 120)
from IPython.display import Image
Image(filename = 'egf_network.png')

5: Building networks

For this you need the PyPath class from the pypath.main module, which takes care of building and querying the network. You also need the pypath.data_formats module, where you find a number of predefined input settings organized in larger categories (e.g. activity flow, enzyme-substrate, transcriptional regulation, etc.). These input settings tell pypath how to download and process the data.

In [20]:
from pypath import main
from pypath import data_formats

For example, data_formats.pathway is a collection of databases which fit into the activity flow concept, i.e. one protein either stimulates or inhibits the other. It is a dictionary with names as keys and the input settings as values:

In [9]:
data_formats.pathway
{'trip': <pypath.input_formats.ReadSettings at 0x6da2497bc940>,
 'spike': <pypath.input_formats.ReadSettings at 0x6da2497bc9b0>,
 'signalink3': <pypath.input_formats.ReadSettings at 0x6da2497bc9e8>,
 'guide2pharma': <pypath.input_formats.ReadSettings at 0x6da2497bca20>,
 'ca1': <pypath.input_formats.ReadSettings at 0x6da2497bca58>,
 'arn': <pypath.input_formats.ReadSettings at 0x6da2497bcac8>,
 'nrf2': <pypath.input_formats.ReadSettings at 0x6da2497bcb00>,
 'macrophage': <pypath.input_formats.ReadSettings at 0x6da2497bca90>,
 'death': <pypath.input_formats.ReadSettings at 0x6da2497bcb38>,
 'pdz': <pypath.input_formats.ReadSettings at 0x6da2497bcb70>,
 'signor': <pypath.input_formats.ReadSettings at 0x6da2497bcba8>,
 'adhesome': <pypath.input_formats.ReadSettings at 0x6da2497bcbe0>,
 'hpmr': <pypath.input_formats.ReadSettings at 0x6da2497c0908>,
 'cellphonedb': <pypath.input_formats.ReadSettings at 0x6da2497c09e8>,
 'ramilowski2015': <pypath.input_formats.ReadSettings at 0x6da2497c0ac8>}

You can pass such a dictionary to the init_network method of the PyPath object. It will then download the data from the original sources, translate the identifiers and merge the networks. pypath stores all downloaded data in a cache, by default ~/.pypath/cache in your home directory. For this reason, loading a resource for the first time might take long, but the next time it will be faster because the data is fetched from the cache. First create a pypath.main.PyPath object, then build the network:

In [10]:
pa = main.PyPath()
pa.init_network(data_formats.pathway)
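
The merging step can be pictured with a toy model: interactions reported by several resources collapse into one edge that remembers its sources. This sketch is not pypath's implementation. It assumes identifier translation has already happened, and merge_networks and the interaction lists are made up for illustration.

```python
from collections import defaultdict

# Toy interaction lists keyed by resource; the pairs are illustrative.
resources = {
    'SPIKE': [('P01133', 'P00533')],
    'HPMR': [('P00533', 'P01133')],  # same pair, opposite order
    'SignaLink3': [('P01133', 'P00533'), ('P00533', 'P42336')],
}


def merge_networks(resources):
    # Map each unordered pair to the set of resources reporting it.
    merged = defaultdict(set)
    for resource, pairs in resources.items():
        for id_a, id_b in pairs:
            key = tuple(sorted((id_a, id_b)))
            merged[key].add(resource)
    return dict(merged)


network = merge_networks(resources)
```

Three reports of the same pair become a single edge with three sources, mirroring the sources edge attribute shown later in this tutorial.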

You can add more resource sets in a similar way:

In [23]:

To load a single resource, simply create a one-element dict:

In [24]:
pa.load_resources({'matrixdb': data_formats.interaction['matrixdb']})

5a: Which network datasets are pre-defined in pypath?

You can find all the pre-defined datasets in the pypath.data_formats module. As we already mentioned above, the pathway dataset contains the literature curated activity flow resources. This was the original focus of pypath and OmniPath; since then we have added a great variety of other kinds of resource definitions. Here we give an overview of these.

  • data_formats.pathway: activity flow networks with literature references
  • data_formats.activity_flow: synonym for pathway
  • data_formats.pathway_noref: activity flow networks without literature references
  • data_formats.pathway_all: all activity flow data
  • data_formats.ptm: enzyme-substrate interaction networks with literature references
  • data_formats.enzyme_substrate: synonym for ptm
  • data_formats.ptm_noref: enzyme-substrate networks without literature references
  • data_formats.ptm_all: all enzyme-substrate data
  • data_formats.interaction: undirected interactions from both literature curated and high-throughput collections (e.g. IntAct, BioGRID)
  • data_formats.interaction_misc: undirected, high-scale interaction networks without the constraint of having any literature reference (e.g. the unbiased human interactome screen from the Vidal lab)
  • data_formats.transcription_onebyone: transcriptional regulation databases (TF-target interactions) with all databases downloaded directly and processed by pypath
  • data_formats.transcription: transcriptional regulation only from the DoRothEA data
  • data_formats.mirna_target: miRNA-mRNA interactions from literature curated resources
  • data_formats.tf_mirna: transcriptional regulation of miRNA from literature curated resources
  • data_formats.lncrna_protein: lncRNA-protein interactions from literature curated datasets
  • data_formats.ligand_receptor: ligand-receptor interactions from both literature curated and other kinds of resources
  • data_formats.pathwaycommons: the PathwayCommons database
  • data_formats.reaction: process description databases; not guaranteed to work at this moment
  • data_formats.reaction_misc: alternative definitions to load process description databases; not guaranteed to work at this moment
  • data_formats.small_molecule_protein: signaling interactions between small molecules and proteins

To see the list of the resources in a dataset, you can check the dict keys or the name attribute of each element:

In [17]:
data_formats.pathway.keys()
dict_keys(['trip', 'spike', 'signalink3', 'guide2pharma', 'ca1', 'arn', 'nrf2', 'macrophage', 'death', 'pdz', 'signor', 'adhesome', 'hpmr', 'cellphonedb', 'ramilowski2015'])
In [19]:
[resource.name for resource in data_formats.pathway.values()]

6: How to access the network

Once you have built a network, you can use it for various purposes and write your own scripts for further processing or analysis. The network is represented by an igraph object:

In [25]:
pa.graph
<igraph.Graph at 0x6ee60f2c7318>

Number of edges and nodes:

In [12]:
pa.ecount, pa.vcount
(22101, 5184)

You can access the edge and vertex sequences in the es and vs attributes; you can iterate over these or index them by integers. The edge and vertex attributes are accessed by string keys. E.g. get the sources of edge 81:

In [15]:
pa.graph.es[81]['sources']
{'SPIKE', 'SignaLink3'}

7: Directions and signs

By default the igraph object is undirected but it carries all direction information in Python objects assigned to each edge. Pypath can convert it to a directed igraph object, but you still need the Direction objects to have the signs, as igraph has no signed network representation. Certain methods need the directed igraph object and they will automatically create it, but you can create it manually:

In [40]:

You find the directed network in the pa.dgraph attribute:

In [41]:
pa.dgraph
<igraph.Graph at 0x6ee649d04318>

Now let's take a look at the pypath.main.Direction objects, which contain details about directions and signs. First, as an example, select a random edge:

In [54]:
edge = pa.graph.es[3241]

The Direction object is in the dirs edge attribute:

In [55]:
d = edge['dirs']

It has a method to print its content in a human-readable way:

In [56]:
print(d)
Directions and signs of interaction between Q13489 and Q13546

	Q13489 ===> Q13546 :: SPIKE, SignaLink3
	Q13489 <=== Q13546 :: SignaLink3
	Q13489 =+=> Q13546 :: SPIKE

From this we see that SPIKE and SignaLink3 agree that protein Q13489 has an effect on Q13546, and SPIKE in addition tells us that this effect is stimulatory; SignaLink3 also reports the opposite direction. In your scripts you can query the Direction objects in a number of ways. Each Direction object calls the two possible directions straight and reverse:

In [57]:
d.straight
('Q13489', 'Q13546')
In [58]:
d.reverse
('Q13546', 'Q13489')

It can tell you if one of these directions is supported by any of the network resources:

In [59]:
d.get_dir(d.straight)

Or it can return those resources:

In [60]:
d.get_dir(d.straight, sources = True)
{'SPIKE', 'SignaLink3'}

The opposite direction can be queried the same way; in this example only SignaLink3 supports it:

In [61]:
d.get_dir(d.reverse, sources = True)

The signs can be queried in a similar way. The returned pair of boolean values tells whether the interaction in this direction is stimulatory and whether it is inhibitory, respectively.

In [62]:
[True, False]

Or you can ask whether it is inhibition:

In [63]:

Or if the interaction is directed at all:

In [64]:

Sometimes resources don't agree: for example, one tells an interaction is inhibition while according to others it is stimulation, or one tells A affects B and another resource tells the other way around. Here we preserve all this potentially contradicting information in the Direction object, and in the end you decide what to do with it depending on your purpose. If you want to get rid of ambiguity, there is a method to get a consensus direction and sign, which returns the attributes most resources agree on:

In [65]:
[['Q13489', 'Q13546', 'directed', 'positive']]
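
One plausible reading of "the attributes most resources agree on" is a simple majority vote. The sketch below applies that rule to the sources shown in the printout of the example edge. It is a toy model, not pypath's code: the consensus function and its tie handling are assumptions made for illustration.

```python
# Sources per direction and sign, copied from the printout of the example edge.
straight = ('Q13489', 'Q13546')
reverse = ('Q13546', 'Q13489')
dir_sources = {straight: {'SPIKE', 'SignaLink3'}, reverse: {'SignaLink3'}}
positive_sources = {straight: {'SPIKE'}, reverse: set()}
negative_sources = {straight: set(), reverse: set()}


def consensus(dir_sources, positive_sources, negative_sources):
    # Keep the direction(s) supported by the largest number of resources.
    top = max(len(sources) for sources in dir_sources.values())
    result = []
    for direction, sources in dir_sources.items():
        if not sources or len(sources) < top:
            continue
        n_pos = len(positive_sources[direction])
        n_neg = len(negative_sources[direction])
        if n_pos > n_neg:
            sign = 'positive'
        elif n_neg > n_pos:
            sign = 'negative'
        else:
            sign = 'unknown'
        result.append([direction[0], direction[1], 'directed', sign])
    return result


consensus_edges = consensus(dir_sources, positive_sources, negative_sources)
```

With two resources for the straight direction against one for the reverse, and one positive sign against no negative ones, the vote keeps a single directed, positive edge.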

8: Accessing nodes in the network

In igraph the vertices are numbered, but this numbering can change at certain operations. Instead we can use the vertex attributes. In PyPath, for proteins the name attribute is the UniProt ID by default and the label is the Gene Symbol.

In [66]:
pa.graph.vs['name'][:5]
['P63000', 'O00161', 'Q9GZU1', 'Q96H20', 'Q9NWB7']
In [67]:
pa.graph.vs['label'][:5]
['RAC1', 'SNAP23', 'MCOLN1', 'SNF8', 'IFT57']

The PyPath object offers a number of helper methods to access the nodes by their names. For example, uniprot or up returns the igraph.Vertex for a UniProt ID:

In [68]:

Similarly genesymbol or gs for Gene Symbols:

In [36]:

Each of these has a "plural" version:

In [69]:
len(list(pa.gss(['MTOR', 'ATG16L2', 'ULK1'])))

And a generic method where you can mix UniProts and Gene Symbols:

In [70]:
len(list(pa.proteins(['MTOR', 'P00533'])))

9: Querying relationships with or without causality

Above you could see how to query the directions and names of individual edges and nodes. Building on top of these, other methods give a way to query causality, i.e. which proteins are affected by another one, and which others are its regulators. The example below returns the nodes PIK3CA is stimulated by; the gs prefix tells that we query by Gene Symbol:

In [71]:
pa.gs_stimulated_by('PIK3CA')
<pypath.main._NamedVertexSeq at 0x6ee604b0a8c8>

It returns a so-called _NamedVertexSeq object, from which you can get a series of igraph.Vertex objects, Gene Symbols or UniProt IDs:

In [72]:
['NTRK1', 'SRC', 'GAB1', 'PTPN11', 'NRAS']
In [73]:
['P04629', 'P12931', 'Q13480', 'Q06124', 'P01111']

Note that the names of these methods are a bit counterintuitive: for example, gs_stimulates returns the genes stimulated by PIK3CA:

In [74]:
['MTOR', 'AKT1']
In [75]:
'PIK3CA' in set(pa.affected_by('AKT1').gs())

There are many similar methods: inhibited_by returns negative regulators; affected_by does not consider +/- signs; without the gs_ and up_ prefixes you can provide either of these identifiers; neighbors does not consider the direction. At the end, .gs() converts the result to a list of Gene Symbols, .up() to UniProts and .ids() to vertex IDs; by default it yields igraph.Vertex objects:

In [76]:
[0, 32, 38, 50, 69]
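
Because the method names are easy to mix up, a toy model of their semantics may help. The functions below only mimic the behaviour described above on a hand-written list of signed, directed edges. They are not pypath methods, and the edges are illustrative.

```python
# Toy signed, directed edges: (source, target, sign). Illustrative only.
edges = [
    ('SRC', 'PIK3CA', +1),
    ('NRAS', 'PIK3CA', +1),
    ('PIK3CA', 'AKT1', +1),
    ('PIK3CA', 'MTOR', +1),
    ('PTEN', 'AKT1', -1),
]


def stimulated_by(protein):
    # Upstream: nodes that stimulate `protein`.
    return {src for src, tgt, sign in edges if tgt == protein and sign > 0}


def stimulates(protein):
    # Downstream: nodes that `protein` stimulates.
    return {tgt for src, tgt, sign in edges if src == protein and sign > 0}


def affected_by(protein):
    # Upstream regulators of `protein`, regardless of sign.
    return {src for src, tgt, sign in edges if tgt == protein}


upstream = stimulated_by('PIK3CA')
downstream = stimulates('PIK3CA')
regulators = affected_by('AKT1')
```

The "_by" variants look upstream of the queried protein, the others look downstream.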

Finally, the neighborhood methods return the indirect neighborhood within a custom number of steps (note that the size of the neighborhood increases rapidly with the number of steps):

In [77]:
print(list(pa.neighborhood('ATG3', 1).gs()))
['ATG3', 'GABARAP', 'ATG5', 'GABARAPL2', 'ATG12', 'ATG7', 'CFLAR', 'MAP1LC3B', 'MAP1LC3A', 'TP63']
In [78]:
print(list(pa.neighborhood('ATG3', 2).gs()))
['ATG3', 'GABARAP', 'ATG5', 'GABARAPL2', 'ATG12', 'ATG7', 'CFLAR', 'MAP1LC3B', 'MAP1LC3A', 'TP63', 'TRPV1', 'CLTC', 'FNBP1', 'NBR1', 'BNIP3L', 'ATG13', 'SQSTM1', 'RB1CC1', 'FYCO1', 'ATG4B', 'ULK1', 'ULK2', 'DVL2', 'OPTN', 'IFIH1', 'BCL2L1', 'ATF4', 'TP73', 'WDFY3', 'CAPN2', 'FADD', 'CAPN1', 'ATG10', 'DDX58', 'DDIT3', 'MAVS', 'ATG16L1', 'ATG16L2', 'TECPR1', 'PPHLN1', 'COX5B', 'UBA5', 'NEK9', 'ATG4A', 'BNIP3', 'NIPSNAP2', 'EP300', 'FOXO1', 'HSF1', 'TAX1BP3', 'ITCH', 'RIPK1', 'FAS', 'NFKB1', 'PRKCB', 'RIPK2', 'TRAF2', 'AR', 'CASP8', 'AKT1', 'MAP3K14', 'CASP10', 'PRKACA', 'MAP1B', 'EGR1', 'MAPK8', 'KEAP1', 'ZKSCAN3', 'TFEB', 'P27791', 'TBC1D5', 'E2F1', 'MAP1A', 'RAB3GAP1', 'HNRNPAB', 'FBXW7', 'ATM', 'TP53', 'MDM2', 'RPS6KB1', 'CDK2', 'IKBKB', 'ATG9A', 'BECN1']
In [79]:
len(list(pa.neighborhood('ATG3', 3).gs()))
In [80]:
len(list(pa.neighborhood('ATG3', 4).gs()))
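
The idea behind a k-step neighborhood can be sketched as a breadth-first search with a depth limit. This is a generic illustration on a hand-written adjacency dict, not pypath's implementation; like the outputs above, the result includes the starting node.

```python
from collections import deque

# Toy undirected adjacency; the edges are illustrative.
adjacency = {
    'ATG3': {'ATG7', 'ATG12'},
    'ATG7': {'ATG3', 'ATG10'},
    'ATG12': {'ATG3', 'ATG5'},
    'ATG10': {'ATG7'},
    'ATG5': {'ATG12', 'ATG16L1'},
    'ATG16L1': {'ATG5'},
}


def neighborhood(start, steps):
    # Breadth-first search limited to `steps` hops; includes `start`.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == steps:
            continue  # do not expand beyond the step limit
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return seen


sizes = [len(neighborhood('ATG3', k)) for k in range(4)]
```

In a densely connected real network the sizes grow much faster than in this small chain-like example.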

10: Accessing edges by identifiers

Just like nodes, edges can also be accessed by identifiers like Gene Symbols. get_edge returns an igraph.Edge if the edge exists, otherwise None.

In [81]:
type(pa.get_edge('EGF', 'EGFR'))
In [82]:
type(pa.get_edge('EGF', 'P00533'))
In [83]:
type(pa.get_edge('EGF', 'AKT1'))
In [84]:
print(pa.get_edge('EGF', 'EGFR')['dirs'])
Directions and signs of interaction between P00533 and P01133

	P00533 <=== P01133 :: SPIKE, HPMR, SignaLink3
	P00533 <=+= P01133 :: SPIKE, SignaLink3

11: Literature references

Select an edge; in its references attribute you find a list of references:

In [86]:
edge = pa.get_edge('MAP1LC3B', 'SQSTM1')
edge['references']
[<pypath.refs.Reference at 0x6ee605f6dd98>,
 <pypath.refs.Reference at 0x6ee605f6dd68>]

Each reference has a PubMed ID:

In [132]:
In [133]:

The references come from different databases, and the same reference may be reported by more than one of them. You can see which database provides which references:

In [87]:
{'NRF2ome': {<pypath.refs.Reference at 0x6ee605f6dd98>},
 'ELM': {<pypath.refs.Reference at 0x6ee5fdc8cd98>,
  <pypath.refs.Reference at 0x6ee605f6dd68>}}
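
Finding the references shared between databases is a matter of set operations. The sketch below uses plain strings as placeholder PubMed IDs instead of Reference objects; the IDs are made up for illustration.

```python
from collections import Counter

# Toy data: PubMed IDs per database; the IDs are placeholders.
refs_by_source = {
    'NRF2ome': {'11111111'},
    'ELM': {'11111111', '22222222'},
}

# Unique references of the edge: the union over all databases.
all_refs = set().union(*refs_by_source.values())

# A reference is shared if more than one database reports it.
counts = Counter(ref for refs in refs_by_source.values() for ref in refs)
shared = {ref for ref, n in counts.items() if n > 1}
```

Here three database-level entries collapse into two unique references, one of which is shared.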

12: Translating identifiers

The pypath.mapping module is for ID translation. Most of the time you can simply call the map_name function:

In [20]:
from pypath import mapping
mapping.map_name('P00533', 'uniprot', 'genesymbol')
In [21]:
mapping.map_name('8408', 'entrez', 'uniprot')

A number of mapping tables are predefined and loaded automatically. However, pypath does not translate in 2 steps when no direct translation table is available. For example, to translate Entrez to Gene Symbol you can go through UniProt:

In [22]:
uniprots = mapping.map_name('8408', 'entrez', 'uniprot')
{gs for up in uniprots for gs in mapping.map_name(up, 'uniprot', 'genesymbol')}

By default the map_name function returns a set, because it accounts for ambiguous mapping. However, most often the ID translation is unambiguous and you want to retrieve only one ID. The map_name0 function returns a string; in case of ambiguity it returns a random element of the resulting set:

In [23]:
mapping.map_name0('GABARAPL3', 'genesymbol', 'uniprot')
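
The behaviour described in this section (set-valued results, a single-value variant, and chaining two tables when no direct one exists) can be mimicked with small dictionaries. The toy_ functions and the hand-written tables below are illustrative stand-ins, not the pypath.mapping implementation.

```python
# Hand-written toy tables; a real setup would load full mapping tables.
TABLES = {
    ('entrez', 'uniprot'): {'8408': {'O75385'}},
    ('uniprot', 'genesymbol'): {'O75385': {'ULK1'}, 'P00533': {'EGFR'}},
}


def toy_map_name(name, id_type, target_id_type):
    # Return a set because one ID may map to several target IDs.
    return set(TABLES.get((id_type, target_id_type), {}).get(name, set()))


def toy_map_name0(name, id_type, target_id_type):
    # Return a single ID, or None if there is no mapping.
    names = toy_map_name(name, id_type, target_id_type)
    return next(iter(names)) if names else None


def toy_map_via(name, id_type, via_id_type, target_id_type):
    # Two-step translation through an intermediate ID type.
    result = set()
    for intermediate in toy_map_name(name, id_type, via_id_type):
        result |= toy_map_name(intermediate, via_id_type, target_id_type)
    return result


direct = toy_map_name('8408', 'entrez', 'genesymbol')
two_step = toy_map_via('8408', 'entrez', 'uniprot', 'genesymbol')
single = toy_map_name0('P00533', 'uniprot', 'genesymbol')
```

The direct lookup is empty because no Entrez to Gene Symbol table exists in the toy setup, while the two-step route succeeds.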

13: Enzyme-substrate interactions

The pypath.ptm module builds a database of enzyme-substrate interactions.

In [ ]:
from pypath import ptm
ptm_db = ptm.get_db()

Here you get a dictionary with pairs of UniProt IDs as keys and lists of special objects representing enzyme-substrate interactions as values:

In [ ]:
print(ptm_db.enz_sub[('Q13177', 'P01236')][0])

Alternatively the enzyme-substrate interactions can be assigned to network edges:

In [ ]:
In [ ]:

14: Annotations

The pypath.annot module provides various annotations about the function and location of the proteins.

In [24]:
from pypath import annot
a = annot.get_db()

OmniPath contains annotations from 27 resources. These provide various information about the characteristics of the proteins, e.g. their localization or function. The AnnotationTable object loads all annotations by default, optionally you can limit this to certain resources. For example, if you only want to load the pathway membership annotations from SIGNOR, SignaLink, NetPath and KEGG, you can provide the names of the appropriate classes:

In [25]:
pathways = annot.AnnotationTable(
    protein_sources = (
        'SignalinkPathways', 'KeggPathways', 'NetpathPathways', 'SignorPathways',
    ))

The AnnotationTable object provides methods to query all resources together, or build a boolean array out of them. To see all annotations of one protein:

In [26]:
[SignalinkPathway(pathway='TNF/Apoptosis', core=True),
 SignalinkPathway(pathway='RTK', core=True),
 SignalinkPathway(pathway='WNT', core=True),
 SignalinkPathway(pathway='IIP', core=True),
 KeggPathway(pathway='Proteoglycans in cancer'),
 KeggPathway(pathway='Pathways in cancer'),
 KeggPathway(pathway='Pancreatic cancer'),
 KeggPathway(pathway='Central carbon metabolism in cancer'),
 KeggPathway(pathway='Phospholipase D signaling pathway'),
 KeggPathway(pathway='Human cytomegalovirus infection'),
 KeggPathway(pathway='Oxytocin signaling pathway'),
 KeggPathway(pathway='Hepatocellular carcinoma'),
 KeggPathway(pathway='Bladder cancer'),
 KeggPathway(pathway='Endocrine resistance'),
 KeggPathway(pathway='Prostate cancer'),
 KeggPathway(pathway='Estrogen signaling pathway'),
 KeggPathway(pathway='Breast cancer'),
 KeggPathway(pathway='ErbB signaling pathway'),
 KeggPathway(pathway='Non-small cell lung cancer'),
 KeggPathway(pathway='FoxO signaling pathway'),
 KeggPathway(pathway='Parathyroid hormone synthesis, secretion and action'),
 KeggPathway(pathway='Regulation of actin cytoskeleton'),
 KeggPathway(pathway='Human papillomavirus infection'),
 KeggPathway(pathway='GnRH signaling pathway'),
 KeggPathway(pathway='Relaxin signaling pathway'),
 KeggPathway(pathway='Adherens junction'),
 KeggPathway(pathway='EGFR tyrosine kinase inhibitor resistance'),
 KeggPathway(pathway='Colorectal cancer'),
 KeggPathway(pathway='HIF-1 signaling pathway'),
 KeggPathway(pathway='Hepatitis C'),
 KeggPathway(pathway='Choline metabolism in cancer'),
 KeggPathway(pathway='Epithelial cell signaling in Helicobacter pylori infection'),
 KeggPathway(pathway='Focal adhesion'),
 KeggPathway(pathway='Cushing syndrome'),
 KeggPathway(pathway='Calcium signaling pathway'),
 KeggPathway(pathway='Endometrial cancer'),
 KeggPathway(pathway='Gap junction'),
 NetpathPathway(pathway='Epidermal growth factor receptor (EGFR)'),
 NetpathPathway(pathway='Receptor activator of nuclear factor kappa-B ligand (RANKL)'),
 NetpathPathway(pathway='Follicle-stimulating hormone (FSH)'),
 NetpathPathway(pathway='Advanced glycation end-products (AGE/RAGE)'),
 NetpathPathway(pathway='Tumor necrosis factor (TNF) alpha'),
 NetpathPathway(pathway='Androgen receptor (AR)'),
 NetpathPathway(pathway='Alpha6 Beta4 Integrin'),
 SignorPathway(pathway='Glioblastoma Multiforme'),
In [27]:
pathways.create_dataframe = True
In [32]:
SignaLink3 SignaLink3__ SignaLink3__BCR SignaLink3__GPCR SignaLink3__HH SignaLink3__HIPPO SignaLink3__Hedgehog SignaLink3__Hippo SignaLink3__IIP SignaLink3__JAK/STAT ... CellPhoneDB_complex__Cytokine receptor IL3 family CellPhoneDB_complex__Cytokine receptor IL6 family CellPhoneDB_complex__Cytokine receptor IL6 family, IL12 subfamily CellPhoneDB_complex__Cytokine receptor family CellPhoneDB_complex__Human IgG receptor CellPhoneDB_complex__Receptor CellPhoneDB_complex__T cell receptor add CellPhoneDB_complex__TGFBeta_receptor_add CellPhoneDB_complex__growth factor receptor CellPhoneDB_complex__hematopoyetic receptor
A0A024RBG1 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6H9 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6I0 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6I1 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6I4 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6I9 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6J1 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6J6 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6J9 False False False False False False False False False False ... False False False False False False False False False False
A0A075B6K0 False False False False False False False False False False ... False False False False False False False False False False

10 rows × 1597 columns
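
What such a boolean table encodes can be shown on a tiny scale: one row per protein, one column per annotation label, True where the protein carries that annotation. The sketch uses plain Python lists and illustrative memberships; the Resource__value column labels follow the pattern visible in the header above.

```python
# Toy annotation sets: label -> set of UniProt IDs (illustrative values).
annotation_sets = {
    'SignaLink3__RTK': {'P00533', 'P01133'},
    'SignaLink3__WNT': {'P00533'},
    'KEGG__Focal adhesion': {'P00533'},
}

# All annotated proteins plus one protein without any annotation.
proteins = sorted(set().union(*annotation_sets.values()) | {'P04637'})
columns = sorted(annotation_sets)

# One row per protein, one boolean per annotation label.
table = [
    [protein in annotation_sets[column] for column in columns]
    for protein in proteins
]
```

A table built this way is convenient for filtering proteins by any combination of annotations.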

The AnnotationTable object contains the resource specific annotation objects:

In [33]:
{'CPAD': <pypath.annot.Cpad at 0x68fbbbff5dd0>,
 'DisGeNet': <pypath.annot.Disgenet at 0x68fb8e004a10>,
 'SignaLink3': <pypath.annot.SignalinkPathways at 0x68fb8e8975d0>,
 'CancerGeneCensus': <pypath.annot.CancerGeneCensus at 0x68fb9b855810>,
 'Matrisome': <pypath.annot.Matrisome at 0x68fb8e853310>,
 'KEGG': <pypath.annot.KeggPathways at 0x68fb9c004fd0>,
 'Integrins': <pypath.annot.Integrins at 0x68fb9a2903d0>,
 'Ramilowski_location': <pypath.annot.Ramilowski2015Location at 0x68fb9a4e6110>,
 'Signor': <pypath.annot.SignorPathways at 0x68fb92faf690>,
 'CancerSEA': <pypath.annot.Cancersea at 0x68fb91f236d0>,
 'CSPA': <pypath.annot.CellSurfaceProteinAtlas at 0x68fbadcf5790>,
 'Membranome': <pypath.annot.Membranome at 0x68fbb8013d10>,
 'Guide2Pharma': <pypath.annot.GuideToPharmacology at 0x68fba9a143d0>,
 'OPM': <pypath.annot.Opm at 0x68fb92f40cd0>,
 'Kirouac2010': <pypath.annot.Kirouac2010 at 0x68fbada4c990>,
 'Zhong2015': <pypath.annot.Zhong2015 at 0x68fbcb89c050>,
 'HPA': <pypath.annot.HumanProteinAtlas at 0x68fbaded3810>,
 'TopDB': <pypath.annot.Topdb at 0x68fbae51e1d0>,
 'Kinases': <pypath.annot.Kinases at 0x68fbc97a8b90>,
 'TFcensus': <pypath.annot.Tfcensus at 0x68fbc2262190>,
 'Adhesome': <pypath.annot.Adhesome at 0x68fbc8506f10>,
 'Ramilowski2015': <pypath.annot.Ramilowski2015 at 0x68fbc227ad90>,
 'Phosphatome': <pypath.annot.Phosphatome at 0x68fbc21cbad0>,
 'Vesiclepedia': <pypath.annot.Vesiclepedia at 0x68fb8e22aad0>,
 'NetPath': <pypath.annot.NetpathPathways at 0x68fb9bec5750>,
 'HGNC': <pypath.annot.Hgnc at 0x68fb96c93d50>,
 'HPMR': <pypath.annot.HumanPlasmaMembraneReceptome at 0x68fbe932b490>,
 'DGIdb': <pypath.annot.Dgidb at 0x68fbc2fc3150>,
 'Exocarta': <pypath.annot.Exocarta at 0x68fbc26a8150>,
 'CellPhoneDB': <pypath.annot.CellPhoneDB at 0x68fbc26a87d0>,
 'Locate': <pypath.annot.Locate at 0x68fb9fff2e90>,
 'Surfaceome': <pypath.annot.Surfaceome at 0x68fb9b13b110>,
 'MatrixDB': <pypath.annot.Matrixdb at 0x68fba4fa0e10>,
 'GO_Intercell': <pypath.annot.GOIntercell at 0x68fba55ae3d0>,
 'HPMR_complex': <pypath.annot.HpmrComplex at 0x68fbc26f35d0>,
 'CORUM_Funcat': <pypath.annot.CorumFuncat at 0x68fb95d2a1d0>,
 'CORUM_GO': <pypath.annot.CorumGO at 0x68fbc832add0>,
 'CellPhoneDB_complex': <pypath.annot.CellPhoneDBComplex at 0x68fbc832edd0>}

For each of these you can query the names of the fields, their possible values and the set of proteins annotated with any combination of the values:

In [34]:
matrisome = a.annots['Matrisome']
In [35]:
('mainclass', 'subclass', 'subsubclass')
In [36]:
 'ECM Glycoproteins',
 'ECM Regulators',
 'ECM-affiliated Proteins',
 'Secreted Factors',
In [37]:
matrisome.get_subset(subclass = 'Collagens')
 Complex Collagen type I homotrimer: COMPLEX:P02452,
 Complex HT_DM_Cluster278: COMPLEX:P02452-P02462-P08572-P29400-P53420-Q01955-Q02388-Q14031-Q17RW2-Q8NFW1,
 Complex Collagen type I trimer: COMPLEX:P02452-P08123,
 Complex Collagen type II trimer: COMPLEX:P02458,
 Complex Collagen type XI trimer variant 1: COMPLEX:P02458-P12107-P13942,
 Complex: COMPLEX:P02458-P20908-P25067-P29400,
 Complex: COMPLEX:P02458-P25067-P29400,
 Complex Collagen type III trimer: COMPLEX:P02461,
 Complex: COMPLEX:P02462,
 Complex Collagen type IV trimer variant 1: COMPLEX:P02462-P08572,
 Complex Collagen type XI trimer variant 2: COMPLEX:P05997-P12107,
 Complex Collagen type XI trimer variant 3: COMPLEX:P05997-P12107-P20908,
 Complex Collagen type V trimer variant 1: COMPLEX:P05997-P20908,
 Complex Collagen type V trimer variant 2: COMPLEX:P05997-P20908-P25940,
 Complex: COMPLEX:P08572,
 Complex: COMPLEX:P12109-P12110,
 Complex Collagen type VI trimer: COMPLEX:P12109-P12110-P12111,
 Complex Collagen type IX trimer: COMPLEX:P20849-Q14050-Q14055,
 Complex Collagen type V trimer variant 3: COMPLEX:P20908,
 Complex: COMPLEX:P20908-P25067,
 Complex Collagen type VIII trimer variant 3: COMPLEX:P25067,
 Complex Collagen type VIII trimer variant 1: COMPLEX:P25067-P27658,
 Complex: COMPLEX:P25067-P29400,
 Complex Collagen type VIII trimer variant 2: COMPLEX:P27658,
 Complex Collagen type IV trimer variant 3: COMPLEX:P29400-P53420-Q01955,
 Complex Collagen type IV trimer variant 2: COMPLEX:P29400-Q14031,
 Complex Collagen type XV trimer: COMPLEX:P39059,
 Complex Collagen type XVIII trimer: COMPLEX:P39060,
 Complex: COMPLEX:P53420,
 Complex: COMPLEX:Q01955,
 Complex Collagen type VII trimer: COMPLEX:Q02388,
 Complex Collagen type X trimer: COMPLEX:Q03692,
 Complex Collagen type XIV trimer: COMPLEX:Q05707,
 Complex Collagen type XVI trimer: COMPLEX:Q07092,
 Complex Collagen type XIX trimer: COMPLEX:Q14993,
 Complex Collagen type XXIV trimer: COMPLEX:Q17RW2,
 Complex Collagen type XXVIII trimer: COMPLEX:Q2UY09,
 Complex Collagen type XIII trimer: COMPLEX:Q5TAT6,
 Complex Collagen type XXIII trimer: COMPLEX:Q86Y22,
 Complex Collagen type XXVII trimer: COMPLEX:Q8IZC6,
 Complex Collagen type XXII trimer: COMPLEX:Q8NFW1,
 Complex Collagen type XXVI trimer: COMPLEX:Q96A83,
 Complex Collagen type XXI trimer: COMPLEX:Q96P44,
 Complex Collagen type XII trimer: COMPLEX:Q99715,
 Complex Collagen type XXV trimer, variant 2: COMPLEX:Q9BXS0,
 Complex Collagen type XX trimer: COMPLEX:Q9P218,
 Complex Collagen type XVII trimer: COMPLEX:Q9UMD9,
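
The logic of selecting a subset by field values can be sketched with named tuples. This is a toy model using the three field names shown above; the records are illustrative, and this get_subset is a stand-in written for this example, not the pypath method.

```python
from collections import namedtuple

# Toy records with the three fields listed above; the assignments are
# illustrative.
MatrisomeAnnot = namedtuple(
    'MatrisomeAnnot', ['mainclass', 'subclass', 'subsubclass'])

annotations = {
    'P02452': {MatrisomeAnnot('Core matrisome', 'Collagens', None)},
    'P02458': {MatrisomeAnnot('Core matrisome', 'Collagens', None)},
    'P01133': {MatrisomeAnnot('Matrisome-associated', 'Secreted Factors', None)},
}


def get_subset(annotations, **criteria):
    # Return the IDs having at least one record matching all criteria.
    return {
        protein
        for protein, records in annotations.items()
        if any(
            all(getattr(rec, field) == value for field, value in criteria.items())
            for rec in records
        )
    }


collagens = get_subset(annotations, subclass='Collagens')
core = get_subset(annotations, mainclass='Core matrisome', subclass='Collagens')
```

Several keyword arguments narrow the selection, since a record has to match all of them.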

15: Inter-cellular signaling roles

pypath does not combine the annotations in the annot module: exactly what goes in comes out. For example, the WNT pathway from Signor and SignaLink won't be merged automatically. However, with the pypath.annot.CustomAnnotation class anyone can do it. For inter-cellular communication categories, the pypath.intercell module combines the data from all the relevant resources and creates categories based on a combination of evidence.

In [2]:
from pypath import intercell
In [3]:
i = intercell.get_db() # this takes quite some time
                    # unless you load annotations from a pickle cache
In [8]:
<pypath.intercell.IntercellAnnotation at 0x666c56b9ef90>
In [4]:
In [6]:
   category              uniprot                              genesymbol                   entity_type
0  receptor_cellphonedb  COMPLEX:Q5KU26                       COLEC12                      complex
1  receptor_cellphonedb  COMPLEX:Q15223                       NECTIN1                      complex
2  receptor_cellphonedb  P33032                               MC5R                         protein
3  receptor_cellphonedb  Q13467                               FZD5                         protein
4  receptor_cellphonedb  P30495                               HLA-B                        protein
5  receptor_cellphonedb  P35916                               FLT4                         protein
6  receptor_cellphonedb  COMPLEX:P04629                       NTRK1                        complex
7  receptor_cellphonedb  Q9UKP6                               UTS2R                        protein
8  receptor_cellphonedb  COMPLEX:P27037-P36896-Q13705-Q8NER5  ACVR1B-ACVR1C-ACVR2A-ACVR2B  complex
9  receptor_cellphonedb  COMPLEX:Q9NZQ7                       CD274                        complex
In [38]:
In [41]:
In [43]:
{'receptor_cellphonedb': 860,
 'receptor_surfaceome': 1563,
 'receptor_go': 594,
 'receptor_hpmr': 1276,
 'receptor_ramilowski': 979,
 'receptor_kirouac': 127,
 'receptor_guide2pharma': 393,
 'interleukin_receptors_hgnc': 78,
 'receptor_hgnc': 78,
 'receptor_dgidb': 979,
 'receptor': 2517,
 'ecm_matrixdb': 656,
 'cell_surface_surfaceome': 3544,
 'cell_surface_go': 907,
 'cell_surface_hpmr': 1276,
 'cell_surface_membranome': 2435,
 'cell_surface_cspa': 2213,
 'cell_surface_cellphonedb': 1159,
 'cell_surface_dgidb': 1384,
 'cell_surface': 6098,
 'ecm_matrisome': 1466,
 'ecm_go': 343,
 'ecm': 1912,
 'ligand_cellphonedb': 848,
 'ligand_go': 468,
 'ligand_hpmr': 393,
 'ligand_ramilowski': 976,
 'ligand_kirouac': 267,
 'ligand_guide2pharma': 441,
 'interleukins_hgnc': 64,
 'endogenous_ligands_hgnc': 367,
 'chemokine_ligands_hgnc': 65,
 'ligand_hgnc': 429,
 'ligand_dgidb': 403,
 'ligand': 1495,
 'intracellular_locate': 15474,
 'intracellular_comppi': 0,
 'intracellular_go': 14982,
 'intracellular': 22020,
 'secreted_locate': 1299,
 'extracellular_locate': 2250,
 'extracellular_surfaceome': 3544,
 'extracellular_matrixdb': 4279,
 'extracellular_membranome': 2435,
 'extracellular_cspa': 2213,
 'extracellular_hpmr': 1669,
 'extracellular_cellphonedb': 848,
 'extracellular': 9246,
 'extracellular_comppi': 0,
 'transmembrane_cellphonedb': 1147,
 'transmembrane_go': 5432,
 'transmembrane_opm': 170,
 'transmembrane_locate': 3098,
 'transmembrane_topdb': 1275,
 'transmembrane': 7022,
 'adhesion_cellphonedb': 0,
 'adhesion_go': 1099,
 'adhesion_matrisome': 109,
 'adhesion_hgnc': 232,
 'adhesion_integrins': 63,
 'adhesion_zhong2015': 676,
 'adhesion_adhesome': 398,
 'adhesion': 1774,
 'surface_enzyme_go': 583,
 'surface_enzyme_surfaceome': 131,
 'surface_enzyme': 630,
 'surface_ligand_go': 120,
 'surface_ligand_cellphonedb': 299,
 'surface_ligand': 384,
 'transporter_surfaceome': 523,
 'transporter_go': 499,
 'transporter_dgidb': 2119,
 'transporter': 2156,
 'extracellular_enzyme': 2597,
 'extracellular_peptidase': 577,
 'growth_factor_binder': 67,
 'growth_factor_regulator': 125,
 'secreted_matrisome': 805,
 'secreted_cellphonedb': 848,
 'secreted': 2362,
 'gap_junction': 33,
 'tight_junction': 130}
In [45]:
In [51]:
{'receptor_cellphonedb': 'Receptor',
 'receptor_surfaceome': 'Receptor',
 'receptor_go': 'Receptor',
 'receptor_hpmr': 'Receptor',
 'receptor_ramilowski': 'Receptor',
 'receptor_kirouac': 'Receptor',
 'receptor_guide2pharma': 'Receptor',
 'interleukin_receptors_hgnc': 'Interleukin receptors (HGNC)',
 'receptor_hgnc': 'Receptor',
 'receptor_dgidb': 'Receptor',
 'receptor': 'Receptor',
 'ecm_matrixdb': 'Extracellular matrix',
 'cell_surface_surfaceome': 'Cell surface',
 'cell_surface_go': 'Cell surface',
 'cell_surface_hpmr': 'Cell surface',
 'cell_surface_membranome': 'Cell surface',
 'cell_surface_cspa': 'Cell surface',
 'cell_surface_cellphonedb': 'Cell surface',
 'cell_surface_dgidb': 'Cell surface',
 'cell_surface': 'Cell surface',
 'ecm_matrisome': 'Extracellular matrix',
 'ecm_go': 'Extracellular matrix',
 'ecm': 'Extracellular matrix',
 'ligand_cellphonedb': 'Ligand',
 'ligand_go': 'Ligand',
 'ligand_hpmr': 'Ligand',
 'ligand_ramilowski': 'Ligand',
 'ligand_kirouac': 'Ligand',
 'ligand_guide2pharma': 'Ligand',
 'interleukins_hgnc': 'Interleukins (HGNC)',
 'endogenous_ligands_hgnc': 'Endogenous ligands (HGNC)',
 'chemokine_ligands_hgnc': 'Chemokine ligands (HGNC)',
 'ligand_hgnc': 'Ligand',
 'ligand_dgidb': 'Ligand',
 'ligand': 'Ligand',
 'intracellular_locate': 'Intracellular',
 'intracellular_comppi': 'Intracellular',
 'intracellular_go': 'Intracellular',
 'intracellular': 'Intracellular',
 'secreted_locate': 'Secreted',
 'extracellular_locate': 'Extracellular',
 'extracellular_surfaceome': 'Extracellular',
 'extracellular_matrixdb': 'Extracellular',
 'extracellular_membranome': 'Extracellular',
 'extracellular_cspa': 'Extracellular',
 'extracellular_hpmr': 'Extracellular',
 'extracellular_cellphonedb': 'Extracellular',
 'extracellular': 'Extracellular',
 'extracellular_comppi': 'Extracellular',
 'transmembrane_cellphonedb': 'Transmembrane',
 'transmembrane_go': 'Transmembrane',
 'transmembrane_opm': 'Transmembrane',
 'transmembrane_locate': 'Transmembrane',
 'transmembrane_topdb': 'Transmembrane',
 'transmembrane': 'Transmembrane',
 'adhesion_cellphonedb': 'Adhesion',
 'adhesion_go': 'Adhesion',
 'adhesion_matrisome': 'Adhesion',
 'adhesion_hgnc': 'Adhesion',
 'adhesion_integrins': 'Adhesion',
 'adhesion_zhong2015': 'Adhesion',
 'adhesion_adhesome': 'Adhesion',
 'adhesion': 'Adhesion',
 'surface_enzyme_go': 'Surface enzyme',
 'surface_enzyme_surfaceome': 'Surface enzyme',
 'surface_enzyme': 'Surface enzyme',
 'surface_ligand_go': 'Surface ligand',
 'surface_ligand_cellphonedb': 'Surface ligand',
 'surface_ligand': 'Surface ligand',
 'transporter_surfaceome': 'Transporter',
 'transporter_go': 'Transporter',
 'transporter_dgidb': 'Transporter',
 'transporter': 'Transporter',
 'extracellular_enzyme': 'Extracellular enzyme',
 'extracellular_peptidase': 'Extracellular peptidase',
 'growth_factor_binder': 'Growth factor binder',
 'growth_factor_regulator': 'Growth factor regulator',
 'secreted_matrisome': 'Secreted',
 'secreted_cellphonedb': 'Secreted',
 'secreted': 'Secreted',
 'gap_junction': 'Gap junction',
 'tight_junction': 'Tight junction'}
In [54]:
 Complex: COMPLEX:P05556-P08195-P08648-P23229,
 Complex: COMPLEX:O94813-Q9Y6N7,

16: Gene Ontology

pypath.go is an almost standalone module for managing the Gene Ontology tree and annotations. The main objects here are GeneOntology and GOAnnotation. The former represents the ontology tree, i.e. the terms and their relationships; the latter represents their assignment to gene products. Both provide many versatile methods for querying.

In [7]:
from pypath import go
goa = go.GOAnnotation()
In [8]:
goa.ontology # the GeneOntology object
<pypath.go.GeneOntology at 0x6ad3e1951cd0>
In [9]:
goa # the GOAnnotation object
<pypath.go.GOAnnotation at 0x6ad3e1999610>

Among many others, the most versatile method is select, which selects the annotated gene products by expressions built from GO term names or IDs. It understands AND, OR, NOT and parentheses.

In [10]:
query = """(cell surface OR
        external side of plasma membrane OR
        extracellular region) AND
        (regulation of transmembrane transporter activity OR
        channel regulator activity)"""
result = goa.select(query)
['P80108', 'Q16623', 'Q07699', 'Q92913', 'Q8NBP7', 'Q9UKS6', 'Q9UEU0']
In [11]:
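
The logic of such expressions can be illustrated with plain Python sets. The sketch below is not how pypath implements select; it is a toy evaluator written for this tutorial, with a hypothetical annotation dict and made-up protein IDs, and it assumes the usual precedence (NOT binds tighter than AND, AND tighter than OR). Term names containing parentheses are not handled.

```python
import re

# Hypothetical toy annotation: GO term name -> set of protein IDs.
ANNOTATION = {
    'cell surface': {'P1', 'P2', 'P3'},
    'extracellular region': {'P3', 'P4'},
    'channel regulator activity': {'P2', 'P4', 'P5'},
}
ALL_PROTEINS = set().union(*ANNOTATION.values())

# Operators and parentheses; everything between them is a term name.
TOKEN = re.compile(r'\(|\)|\bAND\b|\bOR\b|\bNOT\b')


def tokenize(expr):
    """Split an expression into operators, parentheses and term names."""
    tokens, pos = [], 0
    for match in TOKEN.finditer(expr):
        name = ' '.join(expr[pos:match.start()].split())
        if name:
            tokens.append(name)
        tokens.append(match.group())
        pos = match.end()
    name = ' '.join(expr[pos:].split())
    if name:
        tokens.append(name)
    return tokens


def select(expr):
    """Evaluate the expression and return the matching set of proteins."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        result = parse_and()
        while peek() == 'OR':
            take()
            result = result | parse_and()  # OR is set union
        return result

    def parse_and():
        result = parse_not()
        while peek() == 'AND':
            take()
            result = result & parse_not()  # AND is set intersection
        return result

    def parse_not():
        if peek() == 'NOT':
            take()
            return ALL_PROTEINS - parse_not()  # NOT is the complement
        return parse_atom()

    def parse_atom():
        token = take()
        if token == '(':
            result = parse_or()
            take()  # consume the closing parenthesis
            return result
        return ANNOTATION.get(token, set())

    return parse_or()


query = """(cell surface OR
        extracellular region) AND
        channel regulator activity"""
result = select(query)
```

Each operator maps to a set operation, so the parentheses only decide the order in which the sets are combined.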

17: Protein complexes

The pypath.complex module builds a non-redundant list of complexes from 10 original resources. Complexes are unique considering their set of components, and optionally carry stoichiometry information.

In [16]:
from pypath import complex
complexdb = complex.get_db()
In [17]:
<pypath.complex.ComplexAggregator at 0x6ad441788e50>

To retrieve all complexes containing a specific protein, here MTOR:

In [18]:
{Complex: COMPLEX:O00141-O15530-O75879-P23443-P34931-P42345-Q6R327-Q8N122-Q9BPZ7-Q9BVC4-Q9H672,
 Complex: COMPLEX:O00141-O15530-P07900-P23443-P31749-P31751-P42345-P78527-Q05513-Q05655-Q6R327-Q8N122-Q9BPZ7-Q9BVC4,
 Complex: COMPLEX:O00141-O15530-P0CG47-P0CG48-P23443-P42345-Q15118-Q6R327-Q8N122-Q96BR1-Q9BPZ7-Q9BVC4,
 Complex: COMPLEX:O00141-O15530-P23443-P42345-Q15118-Q6R327-Q8N122-Q96BR1-Q96J02-Q9BPZ7-Q9BVC4,
 Complex: COMPLEX:O00141-O75879-P0CG48-P23443-P34931-P42345-P62753-Q6R327-Q8N122-Q9BPZ7-Q9BVC4-Q9NY26,
 Complex: COMPLEX:O00141-P0CG48-P23443-P36894-P42345-P62942-P68106-Q15427-Q6R327-Q8N122-Q9BPZ7-Q9BVC4,
 Complex: COMPLEX:O00141-P0CG48-P23443-P42345-P46781-P62753-Q6R327-Q8N122-Q96KQ7-Q9BPZ7-Q9BVC4-Q9NY26,
 Complex: COMPLEX:O00141-P0CG48-P23443-P42345-P62753-P62942-Q6R327-Q8N122-Q9BPZ7-Q9BVC4-Q9NY26,
 Complex: COMPLEX:O00141-P0CG48-P23443-P42345-P62753-Q15172-Q6R327-Q8IW41-Q9BPZ7-Q9BVC4-Q9H672,
 Complex: COMPLEX:O00141-P0CG48-P23443-P42345-P62753-Q6R327-Q70Z35-Q8N122-Q8TCU6-Q9BPZ7-Q9BVC4-Q9NY26,
 Complex: COMPLEX:O00141-P0CG48-P23443-P42345-Q13393-Q15382-Q6R327-Q8N122-Q9BPZ7-Q9BVC4-Q9NY26,
 Complex: COMPLEX:O00141-P0CG48-P23443-P42345-Q5VT52-Q6R327-Q8N122-Q9BPZ7-Q9BVC4-Q9NY26-Q9UBS3,
 Complex: COMPLEX:O00141-P23443-P42345-Q6R327-Q7L523-Q8N122-Q9BPZ7-Q9BVC4-Q9HB90-Q9NY26,
 Complex: COMPLEX:O00303-O15371-O15372-O75821-P06730-P23443-P42345-P55884-Q13542-Q6R327-Q7L2H7-Q8N122-Q9BVC4-Q9UBQ5-Q9Y262,
 Complex: COMPLEX:O00303-O15372-O75821-P23443-P42345-P55884-P62753-Q6R327-Q7L2H7-Q8N122-Q9BVC4-Q9UBQ5-Q9Y262,
 Complex phosphatidylinositol 3-kinase complex: COMPLEX:O00329-O00443-O00459-O00750-O75747-P42336-P42338-P42345-P48736-Q8NEB9-Q8WYR1-Q92569,
 Complex: COMPLEX:O15350-O43156-O95619-P04637-P42345-Q71UI9-Q92993-Q9H0E9-Q9NPF5-Q9UBU8-Q9Y230-Q9Y265-Q9Y4A5,
 Complex Yy1-Ppargc1a-Frap1 complex: COMPLEX:O15391-P25490-P42345-Q9UBK2,
 Complex: COMPLEX:O15530-O95782-P07900-P23443-P31749-P31751-P42345-P78527-Q04759-Q05513-Q05655-Q9Y243,
 Complex: COMPLEX:O15530-P06730-P23443-P42345-Q13542-Q6R327-Q8N122-Q9BVC4-Q9UBS0,
 Complex: COMPLEX:O15530-P06730-P23443-P42345-Q13542-Q6R327-Q8N122-Q9BVC4-Q9Y243,
 Complex: COMPLEX:O15530-P0CG48-P23443-P42345-P52736-Q6R327-Q8N122-Q96BR1-Q9BPZ7-Q9BVC4-Q9NY26,
 Complex: COMPLEX:O15530-P23443-P42345-P62753-Q6R327-Q8N122-Q96BR1-Q9BPZ7-Q9BVC4-Q9NY26-Q9UBS0,
 Complex: COMPLEX:O15530-P23443-P42345-Q6R327-Q8N122-Q96BR1-Q96J02-Q9BPZ7-Q9BVC4-Q9NY26,
 Complex: COMPLEX:O15530-P23443-P42345-Q6R327-Q8N122-Q96BR1-Q9BPZ7-Q9BVC4-Q9HBY8-Q9NY26,
 Complex: COMPLEX:O43156-O75925-P36508-P42345-Q9BVM2-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-O95467-P42345-P61254-Q9BVM2-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-O95619-P42345-Q71UI9-Q92993-Q9H0E9-Q9H6T3-Q9NPF5-Q9UBU8-Q9Y230-Q9Y265-Q9Y4A5,
 Complex: COMPLEX:O43156-O95831-P42345-Q6NUQ4-Q9BVM2-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-P11766-P42345-Q6NXG1-Q96CM8-Q9BVM2-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-P11766-P42345-Q96CM8-Q9BTY7-Q9BVM2-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-P19388-P42345-P46934-Q6ZTN6-Q9BVM2-Q9H6T3-Q9Y230-Q9Y265-Q9Y4A5,
 Complex: COMPLEX:O43156-P20226-P36508-P42345-Q9BVM2-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-P25490-P42345-Q6PI98-Q8NBZ0-Q9H6T3-Q9H981-Q9H9F9-Q9Y230-Q9Y265-Q9Y5K5,
 Complex: COMPLEX:O43156-P30533-P42345-Q14677-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-P42345-P54278-Q6NUQ4-Q9BVM2-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-P42345-P63104-Q6NXG1-Q9BVM2-Q9H6T3-Q9NX40-Q9Y230-Q9Y265,
 Complex: COMPLEX:O43156-P42345-Q14677-Q9H6T3-Q9Y230-Q9Y265-Q9Y4A5,
 Complex: COMPLEX:O43156-P42345-Q15029-Q96MX6-Q9H6T3-Q9Y230-Q9Y265,
 Complex: COMPLEX:P05387-P0CG48-P18124-P18621-P42345-P47914-P61254-P62750-P62899-Q02543-Q02878-Q70Z35-Q9NY93-Q9Y3U8,
 Complex: COMPLEX:P06730-P23443-P42345-P55884-Q13542-Q6R327-Q8N122-Q9BVC4,
 Complex: COMPLEX:P06730-P23443-P42345-Q13542-Q15208-Q6R327-Q8N122-Q9BVC4,
 Complex: COMPLEX:P0CG48-P23443-P42345-P62753-Q6R327-Q8N122-Q96BR1-Q96J02-Q9BPZ7-Q9BVC4-Q9NY26,
 Complex: COMPLEX:P15056-P23443-P42345-P49815-P62258-P62834-P63104-Q15382-Q6R327-Q8N122-Q9BVC4,
 Complex: COMPLEX:P23443-P42345,
 Complex: COMPLEX:P42345,
 Complex TORC2 complex: COMPLEX:P42345-P62750-Q6R327,
 Complex FKBP12-FK506 complex: COMPLEX:P42345-P62942,
 Complex: COMPLEX:P42345-P83436-Q14746-Q8WTW3-Q96JB2-Q96MW5-Q9H9E3-Q9UP83-Q9Y2V7,
 Complex: COMPLEX:P42345-Q00688,
 Complex: COMPLEX:P42345-Q02790,
 Complex: COMPLEX:P42345-Q13451,
 Complex: COMPLEX:P42345-Q13535-Q92616-Q9UIA9,
 Complex: COMPLEX:P42345-Q13535-Q96QU8,
 Complex: COMPLEX:P42345-Q13535-Q96QU8-Q9UIA9,
 Complex: COMPLEX:P42345-Q13541-Q15382-Q8N122-Q9BVC4,
 Complex: COMPLEX:P42345-Q13541-Q8N122-Q9BVC4,
 Complex TORC 2 complex: COMPLEX:P42345-Q3KP44-Q6R327-Q9BPZ7-Q9BVC4,
 Complex NSC: COMPLEX:P42345-Q6R327-Q8N122-Q9BVC4,
 Complex mTORC2 complex: COMPLEX:P42345-Q6R327-Q9BPZ7-Q9BVC4,
 Complex mTOR complex (MTOR, RICTOR, MLST8): COMPLEX:P42345-Q6R327-Q9BVC4,
 Complex mTOR complex (MTOR, RAPTOR): COMPLEX:P42345-Q8N122,
 Complex mTORC1: COMPLEX:P42345-Q8N122-Q8TB45-Q96B36-Q9BVC4,
 Complex TORC1 complex: COMPLEX:P42345-Q8N122-Q96B36,
 Complex mTOR complex (MTOR, RAPTOR, MLST8): COMPLEX:P42345-Q8N122-Q9BVC4,
 Complex: COMPLEX:P42345-Q96B36-Q9BVC4,
 Complex: COMPLEX:P42345-Q9BVC4,
 Complex: COMPLEX:P42345-Q9BVC4-Q9UJ68}

Note that some of the complexes have human-readable names; these are preferred when printing if available from any of the databases. Otherwise the complexes are labelled by COMPLEX:list-of-components.

Take a closer look at one complex object. The hash of the complex object is equivalent to the string representation below, where the UniProt IDs are unique and alphabetically sorted. Hence you can look up complexes using strings as keys, even though the dict keys are actually pypath.intera.Complex objects:

In [19]:
cplex = complexdb.complexes['COMPLEX:P42345-Q13451']
In [20]:
cplex.components # stoichiometry
{'Q13451': 2, 'P42345': 2}
In [21]:
cplex.sources # resources
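
The string-key lookup described above can be illustrated with a small class. This is a sketch under assumptions, not the pypath.intera.Complex implementation: ToyComplex is a hypothetical class that hashes and compares by its sorted component IDs, which is enough for a dict to accept the plain string as a key.

```python
class ToyComplex:
    """Hypothetical stand-in for a complex, identified by its components."""

    def __init__(self, components, name=None):
        # components: dict of UniProt ID -> stoichiometry
        self.components = dict(components)
        self.name = name

    def __str__(self):
        # Unique, alphabetically sorted IDs; stoichiometry is ignored.
        return 'COMPLEX:' + '-'.join(sorted(self.components))

    def __repr__(self):
        label = ' %s' % self.name if self.name else ''
        return 'Complex%s: %s' % (label, self)

    def __hash__(self):
        # Same hash as the string representation.
        return hash(str(self))

    def __eq__(self, other):
        # Equal to another complex or to a string with the same label.
        return str(self) == str(other)


cplex = ToyComplex({'Q13451': 2, 'P42345': 2})
complexes = {cplex: cplex}

# The dict key is an object, yet a string finds it.
found = complexes['COMPLEX:P42345-Q13451']
```

Because the stoichiometry does not enter the string representation, two objects with the same set of components count as the same complex here.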

18: Saving datasets as pickles

The large datasets above are compiled from many resources. Even if these are already available in the cache, the data processing often takes longer than convenient, e.g. a few minutes. Most of the data integration objects in pypath provide methods to save and load their contents as pickle dumps.

In [ ]:
# for `pypath.main.PyPath` objects:
pa.save_network('mynetwork.pickle') # save
pa.init_network(pfile = 'mynetwork.pickle') # load
# for `pypath.annot.AnnotationTable` objects:
a = annot.AnnotationTable(pickle_file = 'myannots.pickle')
# for `pypath.complex.ComplexAggregator` objects:
complexdb = complex.ComplexAggregator(pickle_file = 'mycomplexes.pickle')
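
The same build-once, load-later pattern can be sketched with the standard pickle module for your own derived data. The helper load_or_build and the file name below are hypothetical, not part of pypath; the example only shows the idea of skipping an expensive build when a pickle already exists.

```python
import os
import pickle
import tempfile


def load_or_build(pickle_file, build):
    """Load data from pickle_file if it exists, else build and save it."""
    if os.path.exists(pickle_file):
        with open(pickle_file, 'rb') as fp:
            return pickle.load(fp)
    data = build()
    with open(pickle_file, 'wb') as fp:
        pickle.dump(data, fp)
    return data


calls = []


def slow_build():
    # Stands in for a long data processing step.
    calls.append(1)
    return {'P42345': {'mTORC1', 'mTORC2'}}


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mydata.pickle')
    first = load_or_build(path, slow_build)   # builds and saves
    second = load_or_build(path, slow_build)  # loads from the pickle
```

Keep in mind that pickles should only be loaded from sources you trust, and that a pickle may become unreadable if the classes it contains change between versions.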

19: Network in pandas.DataFrame

The original implementation of the network in pypath is based on igraph. Work is ongoing to provide a new and more flexible network builder which will produce a pandas.DataFrame and make pypath independent of igraph. As a temporary solution you can easily convert the network to a pandas.DataFrame using the pypath.network module.

In [22]:
from pypath import main
from pypath import data_formats
from pypath import network
In [23]:
pa = main.PyPath()

In [24]:
net = network.Network.from_igraph(pa)
In [25]:
   id_a    id_b    type_a   type_b   directed  effect  type   sources       references
0  P17612  P20020  protein  protein  True           1  [PPI]  {KEGG}        {}
1  P17612  P20020  protein  protein  True          -1  [PPI]  {CA1, Wang}   {9824678}
2  P20020  P17612  protein  protein  False          0  [PPI]  {Wang, KEGG}  {}
3  P0DP25  P20020  protein  protein  True           1  [PPI]  {CA1}         {6455424}
4  P20020  Q13507  protein  protein  False          0  [PPI]  {TRIP}        {18205297, 16887806}
5  Q13976  P20020  protein  protein  True           1  [PPI]  {KEGG}        {}
6  P20020  Q13976  protein  protein  False          0  [PPI]  {KEGG}        {}
7  P0DP24  P20020  protein  protein  True           1  [PPI]  {CA1}         {6455424}
8  P0DP23  P20020  protein  protein  True           1  [PPI]  {CA1, Wang}   {6455424}
9  P20020  P0DP23  protein  protein  False          0  [PPI]  {Wang}        {}
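
Once the interactions are in a record-per-row layout like the one above, simple queries become one-liners. The sketch below uses only the standard library, with the first five rows of the output copied into namedtuples; the Interaction type and the two helper functions are hypothetical names for illustration. With a real pandas.DataFrame the same filters would be written as boolean masks on the columns.

```python
from collections import namedtuple

# Hypothetical record type mirroring some of the columns shown above.
Interaction = namedtuple(
    'Interaction',
    ['id_a', 'id_b', 'directed', 'effect', 'sources', 'references'],
)

records = [
    Interaction('P17612', 'P20020', True, 1, {'KEGG'}, set()),
    Interaction('P17612', 'P20020', True, -1, {'CA1', 'Wang'}, {'9824678'}),
    Interaction('P20020', 'P17612', False, 0, {'Wang', 'KEGG'}, set()),
    Interaction('P0DP25', 'P20020', True, 1, {'CA1'}, {'6455424'}),
    Interaction('P20020', 'Q13507', False, 0, {'TRIP'},
                {'18205297', '16887806'}),
]


def activators_of(target, records):
    """Sources of directed interactions with effect 1 on the target."""
    return sorted({
        r.id_a for r in records
        if r.directed and r.effect == 1 and r.id_b == target
    })


def with_references(records):
    """Records supported by at least one literature reference."""
    return [r for r in records if r.references]


activators = activators_of('P20020', records)
curated = with_references(records)
```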

20: Log messages and sessions

pypath now has an improved logger. All modules send messages to a log file which is named, by default, after the session ID (a 5-character random string). The default path to the log file is ./pypath_log/pypath-xxxxx.log where xxxxx is the session ID. When you import pypath, the welcome message tells you the session ID and the log file location.

In [ ]:
import pypath

By default this is also the only message pypath prints directly to the console; everything else goes only to the log. Here is how you can access the session ID and the logger:

In [ ]:
In [ ]:
In [ ]:

From your scripts and apps you can also easily send messages to the logfile:

In [ ]:
pypath.session_mod.session.log.msg('Greetings from the pypath tutorial notebook! :)')
In [ ]:
with open(pypath.session_mod.session.log.fname, 'r') as fp:
    messages = fp.read().split('\n')
In [ ]:

If you create a class inheriting from pypath.session_mod.Logger it will be automatically connected to the session logger:

In [ ]:
class ChildOfLogger(pypath.session_mod.Logger):
    def __init__(self):
        pypath.session_mod.Logger.__init__(self, name = 'child')
    def say_something(self):
        self._log('Have a nice day! :D')

col = ChildOfLogger()
col.say_something()

with open(pypath.session_mod.session.log.fname, 'r') as fp:
    messages = fp.read().split('\n')


Note that the log messages are flushed every 2 seconds by default, but their timestamps always refer to the exact time the message was sent. A second stamp shows the name of the sending submodule or class.
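
The behaviour described above (timestamp taken at send time, writing delayed until the next flush, a second stamp naming the sender) can be sketched in a few lines. ToyLog and ToyLogger below are hypothetical classes written for this illustration; they are not the classes in pypath.session_mod, and the line format is an assumption.

```python
import os
import tempfile
import time


class ToyLog:
    """Buffers messages and writes them to a file only when flushed."""

    def __init__(self, fname, flush_interval=2.0):
        self.fname = fname
        self.flush_interval = flush_interval
        self._buffer = []
        self._last_flush = time.time()

    def msg(self, message, label='pypath'):
        # The timestamp is taken now, not when the line reaches the file.
        stamp = time.strftime('%Y-%m-%d %H:%M:%S')
        self._buffer.append('[%s] [%s] %s' % (stamp, label, message))
        if time.time() - self._last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        with open(self.fname, 'a') as fp:
            fp.writelines(line + '\n' for line in self._buffer)
        self._buffer = []
        self._last_flush = time.time()


class ToyLogger:
    """Base class giving its children a _log method with their own label."""

    def __init__(self, name, log):
        self._name = name
        self._logger = log

    def _log(self, message):
        self._logger.msg(message, label=self._name)


class ChildOfToyLogger(ToyLogger):

    def say_something(self):
        self._log('Have a nice day! :D')


with tempfile.TemporaryDirectory() as tmpdir:
    # A long interval keeps the demo deterministic: nothing is written
    # until flush is called explicitly.
    log = ToyLog(os.path.join(tmpdir, 'toy.log'), flush_interval=3600)
    child = ChildOfToyLogger(name='child', log=log)
    child.say_something()
    buffered_only = not os.path.exists(log.fname)
    log.flush()
    with open(log.fname, 'r') as fp:
        lines = fp.read().splitlines()
```

This also explains why a message sent a moment ago may not be in the log file yet when you read it.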

Finally, see a log from a real pypath session:

In [ ]:
from pypath import main
from pypath import data_formats
pa = main.PyPath()
In [ ]:
with open(pypath.session_mod.session.log.fname, 'r') as fp:
    messages = fp.read().split('\n')


21: BEL export

Biological Expression Language (BEL) is a versatile description language for capturing relationships between various biological entities across a wide range of levels of biological organization. pypath has a dedicated module to convert the network and the enzyme-substrate interactions to BEL format:

In [30]:
from pypath import main
from pypath import data_formats
from pypath import bel
In [ ]:
pa = main.PyPath()

You can provide one or more resources to the Bel class. Supported resources currently are pypath.main.PyPath and pypath.ptm.PtmAggregator.

In [31]:
b = bel.Bel(resource = pa)

From the resources we compile a BELGraph object which provides a Python interface for various operations and you can also export the data in BEL format:

In [32]:
<pypath.bel.Bel at 0x6ad3b70cc1d0>
In [33]:
<pybel.struct.graph.BELGraph at 0x6ad3b70cc790>
In [34]:
OmniPath vNone
Number of Nodes: 4927
Number of Edges: 70528
Number of Citations: 11930
Number of Authors: 0
Network Density: 2.91E-03
Number of Components: 84
Number of Warnings: 0
In [35]:
In [36]:
with open('omnipath_pathways.bel', 'r') as fp:
    bel_str = fp.read()
In [37]:
Subject	Predicate	Object
P17612	directlyDecreases	P20020
P17612	directlyDecreases	P20020
P17612	directlyIncreases	P20020
P17612	directlyIncreases	P20020
P17612	directlyDecreases	Q14643
P17612	directlyDecreases	Q14643
P17612	directlyDecreases	Q14643
P17612	directlyDecreases	Q14643
P17612	directlyIncreases	Q14643
P17612	directlyIncre
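
The Subject, Predicate, Object lines above suggest a simple mapping from signed, directed interactions to BEL predicates. The sketch below is an assumption inferred from this output, not the logic of pypath.bel: it maps effect 1 to directlyIncreases and effect -1 to directlyDecreases, and simply skips undirected or unsigned interactions, whose handling is not shown in this tutorial. The function to_triples and the tuple layout are hypothetical.

```python
# Assumed mapping from effect sign to BEL predicate.
PREDICATES = {1: 'directlyIncreases', -1: 'directlyDecreases'}


def to_triples(interactions):
    """Yield (subject, predicate, object) for directed, signed records."""
    for source, target, directed, effect in interactions:
        if directed and effect in PREDICATES:
            yield source, PREDICATES[effect], target


# (id_a, id_b, directed, effect), as in the network table of section 19.
interactions = [
    ('P17612', 'P20020', True, 1),
    ('P17612', 'P20020', True, -1),
    ('P20020', 'P17612', False, 0),
]

lines = ['Subject\tPredicate\tObject']
lines.extend('\t'.join(triple) for triple in to_triples(interactions))
table = '\n'.join(lines)
```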

22: CellPhoneDB export

CellPhoneDB is a statistical method and a database for inferring inter-cellular communication pathways between specific cell types from single-cell data. OmniPath/pypath uses CellPhoneDB as a resource for interaction, protein complex and annotation data. Apart from this, pypath is able to export its data in the appropriate format to provide input for the CellPhoneDB Python module. For this you can use the pypath.cellphonedb module:

In [26]:
from pypath import cellphonedb
from pypath import settings

settings.setup(network_expand_complexes = False)

Here you can provide parameters for the network or provide an already built network. You can also provide the datasets as pickles to make them load really fast. Otherwise this step will take quite a long time.

In [27]:
c = cellphonedb.CellPhoneDB()

You can access each of the CellPhoneDB input files as a pandas.DataFrame, and they have also been exported to csv files. For example, interaction_input.csv contains interactions from all the resources used for building the network (here Signor, SignaLink, etc.):

In [28]:
   id_cp_interaction  partner_a  partner_b  protein_name_a  protein_name_b  annotation_strategy     source
0  CPI-000001         P17612     P20020     KAPCA_HUMAN     AT2B1_HUMAN     CA1,KEGG,OmniPath,Wang  PMID: ,PMID: 9824678
1  CPI-000001         P20020     P17612     AT2B1_HUMAN     KAPCA_HUMAN     KEGG,OmniPath,Wang
2  CPI-000002         P0DP25     P20020     CALM3_HUMAN     AT2B1_HUMAN     CA1,OmniPath            PMID: 6455424
3  CPI-000003         P20020     Q13507     AT2B1_HUMAN     TRPC3_HUMAN     OmniPath,TRIP           PMID: 16887806,PMID: 18205297
4  CPI-000004         Q13976     P20020     KGP1_HUMAN      AT2B1_HUMAN     KEGG,OmniPath
5  CPI-000005         P20020     Q13976     AT2B1_HUMAN     KGP1_HUMAN      KEGG,OmniPath
6  CPI-000006         P0DP24     P20020     CALM2_HUMAN     AT2B1_HUMAN     CA1,OmniPath            PMID: 6455424
7  CPI-000007         P0DP23     P20020     CALM1_HUMAN     AT2B1_HUMAN     CA1,OmniPath,Wang       PMID: 6455424
8  CPI-000008         P20020     P0DP23     AT2B1_HUMAN     CALM1_HUMAN     OmniPath,Wang
9  CPI-000009         Q96QT4     P35579     TRPM7_HUMAN     MYH9_HUMAN      Adhesome,OmniPath,TRIP  PMID: 16407977,PMID: 18394644,PMID: 18675813

The proteins and complexes are annotated (transmembrane, peripheral, secreted, etc.) using data from the pypath.intercell module (identical to the query of the web service):

In [29]:
uniprot protein_name transmembrane peripheral secreted secreted_desc secreted_highlight receptor receptor_desc integrin other other_desc tags tags_reason tags_description
0 P55087 AQP4_HUMAN True True False False False
1 O43184 ADA12_HUMAN True True True False False
2 P24001 IL32_HUMAN True False False False False
3 Q92956 TNR14_HUMAN True True False True False
4 P54284 CACB3_HUMAN False False False False False
5 O60542 PSPN_HUMAN False False True False False
6 P48426 PI42A_HUMAN False False True False False
7 P81172 HEPC_HUMAN False False True False False
8 Q9HD43 PTPRH_HUMAN True True False True False
9 P01130 LDLR_HUMAN True True False True False