PPaxe is a tool that retrieves protein-protein and genetic interactions from the scientific literature. It analyzes sentences with StanfordCoreNLP and uses a Random Forest Classifier to classify co-occurring proteins/genes as interacting or non-interacting.
Search Form
PPaxe takes a list of PubMed identifiers as input, separated by commas or newline characters. Once the identifiers have been entered, the user has to select either PubMed (to look for interactions in the abstracts of the articles) or PubMedCentral (to look for interactions in the full text of the articles, when available on PubMedCentral). Please note that all the identifiers have to be PubMed identifiers, whether the search is performed on PubMed or on PubMedCentral.
Alternatively, users can specify a PubMed query by clicking on the "INPUT PUBMED QUERY" button and writing the query in the text box. The matching PubMed identifiers will be written automatically to the PPaxe form.
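The input format described above can be sketched as follows. This is only an illustration of how a comma- or newline-separated list might be cleaned before pasting it into the form; the function name `parse_pubmed_ids` is hypothetical and is not part of PPaxe. The sketch assumes that PubMed identifiers consist of digits only, so identifiers in other formats (such as PubMedCentral identifiers) are rejected.

```python
import re


def parse_pubmed_ids(raw: str) -> list[str]:
    """Split a comma- or newline-separated string into PubMed identifiers.

    Hypothetical helper: duplicates are dropped and anything that is not
    purely numeric is rejected.
    """
    ids: list[str] = []
    for token in re.split(r"[,\n]+", raw):
        token = token.strip()
        if not token:
            continue  # ignore empty pieces, e.g. from a trailing comma
        if not token.isdigit():
            raise ValueError(f"Not a PubMed identifier: {token!r}")
        if token not in ids:
            ids.append(token)
    return ids


# Placeholder identifiers, used only to show the accepted separators.
ids = parse_pubmed_ids("123456, 234567\n345678,\n123456")
print(ids)  # ['123456', '234567', '345678']
```

Checking the list beforehand also makes it easy to see whether it holds more than 30 identifiers.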
Finally, users can also upload a plain-text file from which PPaxe will retrieve the interactions.
When searching more than 30 identifiers, PPaxe will request an e-mail address to which the results will be sent when available, as the search may take a while. The search can still be performed without providing an e-mail, and users will be able to retrieve the results even if the browser window is closed (see next section).
Performing the Analysis
Once the job has been submitted to PPaxe, the application will provide the user with a job identifier. PPaxe will perform the requested analysis, informing the user of the progress. At this point, the browser window can be closed at any moment, as long as the user saves the job identifier. To check the progress (or to see the results of the analysis if it has finished), users can go to https://compgen.bio.ub.edu/PPaxe/job/job-identifier. Please note that results will only be saved for one week before being erased.
Search by Organism
Currently, PPaxe does not have a built-in species recognizer; that is, PPaxe cannot determine which organism a retrieved interaction belongs to.
However, when performing PubMed queries, one can restrict the results to those articles annotated with a particular MeSH term (which could include one or more organisms). By doing this, researchers can limit the articles to be analyzed to only those that refer to a particular species. The available options for standard and advanced PubMed queries are described in the NCBI tutorial.
One PubMed query that can be performed is:
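As a hypothetical illustration (the keywords and organism below are assumptions chosen for the example, not values taken from PPaxe), a query restricted by a MeSH term could be assembled like this. The only PubMed syntax used is the `[MeSH Terms]` field tag; the helper name `mesh_restricted_query` is invented for this sketch.

```python
def mesh_restricted_query(keywords: str, organism: str) -> str:
    """Combine free-text keywords with an organism MeSH term filter."""
    return f'({keywords}) AND "{organism}"[MeSH Terms]'


# Example values are placeholders; any keywords and MeSH organism can be used.
query = mesh_restricted_query("protein interaction", "Drosophila melanogaster")
print(query)
# (protein interaction) AND "Drosophila melanogaster"[MeSH Terms]
```

The resulting string can be pasted into the PubMed query text box.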
Once the search has been performed, the results will be displayed at the bottom of the page. A small summary table and an option to download the report as a PDF will appear at the top of the results section.
The interactions found in the requested articles will be displayed in the "Interactions" table. This table includes a confidence value (the normalized percentage of votes of the classifier, ranging from 0 to 1), the names of the proteins/genes, the PubMed identifier of the article, the publication year, and the sentence from which the interaction was retrieved (with the proteins displayed in blue and the verbs in red). The table can be downloaded by clicking on the "Download table" button.
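To make the confidence value more concrete, the following minimal sketch computes a vote fraction for a single candidate interaction. It assumes that the confidence is simply the fraction of trees in the Random Forest that vote "interacting"; the exact normalization used by PPaxe is not described here, and the function name `confidence_from_votes` is hypothetical.

```python
def confidence_from_votes(votes: list[bool]) -> float:
    """Return the fraction of trees voting 'interacting' (True).

    Assumption: one vote per tree, all votes weighted equally.
    """
    if not votes:
        raise ValueError("At least one vote is required")
    return sum(votes) / len(votes)


# Ten hypothetical trees, seven of which vote 'interacting'.
votes = [True] * 7 + [False] * 3
print(confidence_from_votes(votes))  # 0.7
```

Under this assumption, a value close to 1 means that nearly all trees agree that the sentence describes an interaction.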
The Proteins table shows how many times each protein or gene symbol has appeared in the different sentences. Each column corresponds to a different context in which the protein symbol appears: Total Count is the sum of all the occurrences of a particular protein symbol; Int. Count corresponds to the number of times a particular protein appears in a retrieved interaction; Left/Right Count refer to the number of times a symbol is present on the "left" or the "right" of a particular interaction, e.g.:
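As a hypothetical illustration of these columns, the sketch below derives them from a list of symbol mentions and a list of (left, right) interaction pairs. The symbols `GeneA`, `GeneB` and `GeneC` are placeholders, the function name `protein_counts` is invented, and the sketch assumes that Int. Count equals Left Count plus Right Count.

```python
from collections import Counter


def protein_counts(mentions, interactions):
    """Build a per-symbol table of total, interaction, left and right counts.

    mentions:     every occurrence of a symbol in the analyzed sentences
    interactions: (left, right) symbol pairs of the retrieved interactions
    """
    total = Counter(mentions)
    left = Counter(a for a, _ in interactions)
    right = Counter(b for _, b in interactions)
    table = {}
    for symbol in sorted(set(total) | set(left) | set(right)):
        table[symbol] = {
            "total": total[symbol],
            "int": left[symbol] + right[symbol],  # assumed: left + right
            "left": left[symbol],
            "right": right[symbol],
        }
    return table


mentions = ["GeneA", "GeneB", "GeneA", "GeneC"]
interactions = [("GeneA", "GeneB")]  # GeneA on the left, GeneB on the right
for symbol, row in protein_counts(mentions, interactions).items():
    print(symbol, row)
```

Here `GeneA` is mentioned twice but takes part in only one interaction, on the left side, while `GeneC` is mentioned once and appears in no interaction.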
PPaxe also provides a Graph visualization.
The "Center" button resets the visualization. The Layout dropdown menu allows users to change the layout of the network. The Export table and Export png buttons allow users to save the network as a tabular file and as a PNG image, respectively.
Finally, PPaxe also displays several plots regarding the articles in which the proteins have been found.