Overview

The ETAscape Cytoscape plugin allows users to make predictions and visualize the basis for those predictions.  It is built on the Cytoscape network analysis and visualization platform.

Downloading / Installing

Cytoscape version 2.8.2 or later must be installed prior to installing the plugin. It can be downloaded from http://www.cytoscape.org/download.html. Next, the plugin can be downloaded from mammoth.bcm.tmc.edu/networks.  Place it in the “plugins” folder in your Cytoscape directory and launch Cytoscape. When you first launch, ETAscape will open tutorial mode, which will guide you through basic usage of the plugin.

Screenshot of Cytoscape with ETAscape, showing the Tutorial Mode message which appears after the first launch. Once installed, the commands for ETAscape are available under the “ETAscape” option in the “Plugins” menu in Cytoscape.

Loading a network

All ETAscape commands are available under “ETAscape” under the “Plugins” menu option. ETAscape includes a precalculated network built from a subset of the Protein Data Bank (PDB) filtered to 90% sequence identity. We have calculated ETA matches among all proteins in the network and show them as edges.  You can load this network by selecting the “Import Precalculated ETA Network” menu option under the ETAscape menu.  This loads the ETA network, labels nodes with known enzymatic function and performs a layout showing the natural clustering of proteins with similar function.

An ETA network that has been loaded into Cytoscape. The ETAscape commands in the “ETAscape” menu under “Plugins” are also shown.

Viewing a network

Once loaded, the network can be viewed in the main window of Cytoscape. You can zoom in and out using the scroll wheel or the Zoom In / Out commands in the Cytoscape toolbar. Holding down the mouse wheel allows you to pan. Alternatively, you may use Cytoscape’s global view in the bottom left of your screen to change the visible region of the network. By default, the node layout is force-directed, so shorter distances between nodes indicate better template matches. The layout may be changed using Cytoscape’s layout menu. Node colors are set from the first two digits of the Enzyme Commission (EC) number: the first digit determines the hue and the second determines the saturation.
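The coloring rule can be illustrated with a short sketch. This is not the plugin's own code: the function name ec_to_rgb, the assumption of six top-level EC classes, the cap of 20 on the second digit, and the saturation range of 0.3 to 1.0 are all hypothetical choices made only to show how a hue and a saturation could be derived from the first two digits.

```python
import colorsys


def ec_to_rgb(ec_number, max_class=6, max_subclass=20):
    """Map an EC number such as '3.4.21.5' to an (r, g, b) tuple in 0..255.

    Hypothetical scheme: max_class, max_subclass and the 0.3..1.0
    saturation range are illustrative assumptions, not ETAscape values.
    Assumes the first two fields of the EC number are numeric.
    """
    parts = ec_number.split(".")
    ec_class = int(parts[0])
    ec_subclass = int(parts[1])

    # First digit selects the hue (position on the color wheel).
    hue = (ec_class - 1) / max_class
    # Second digit selects the saturation (how vivid the color is).
    saturation = 0.3 + 0.7 * min(ec_subclass, max_subclass) / max_subclass

    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(channel * 255) for channel in (r, g, b))


# Two enzymes in the same class share a hue but differ in saturation.
pale_red = ec_to_rgb("1.1.1.1")
vivid_red = ec_to_rgb("1.20.1.1")
```

Under a scheme like this, proteins from the same top-level EC class appear as shades of one color, which is why functional clusters are easy to spot in the layout.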

A loaded network zoomed into a portion of the ETA matches.

Finding a specific protein

You may search for a specific protein by entering its PDB ID, including the chain ID, in Cytoscape’s search box. If the protein is found, Cytoscape selects the node, highlights it in yellow, and centers the network view on it.
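As an illustration of the identifier format, the sketch below splits a search term such as 2gtxA into its PDB ID and chain ID. It assumes a four-character PDB ID followed by a single chain character; the pattern and the function name split_node_id are hypothetical and are not part of ETAscape or Cytoscape.

```python
import re

# Assumed format: a 4-character PDB ID (digit first) plus one chain character.
NODE_ID_PATTERN = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])$")


def split_node_id(node_id):
    """Split an identifier like '2gtxA' into ('2gtx', 'A')."""
    match = NODE_ID_PATTERN.match(node_id.strip())
    if match is None:
        raise ValueError(f"not a PDB ID with chain ID: {node_id!r}")
    pdb_id, chain_id = match.groups()
    return pdb_id.lower(), chain_id


pdb_id, chain_id = split_node_id("2gtxA")
```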

A PDB ID has been entered into the Search box (top of screen).  If it is in the network, Cytoscape centers and zooms on the appropriate node.

Viewing Templates

You may view the templates for a protein by highlighting a pair of proteins and selecting the “Show Templates” menu option. This opens several Jmol sessions, depicting the protein structure, and highlighting the ETA templates and matches on each structure.  Reciprocally matching residues are shown in purple, with non-overlapping matches shown in red and blue.

Jmol options can be accessed by right clicking in the Jmol window.  For example, by default the proteins are set to spin.  This may be deactivated to allow manual positioning of the structures by right clicking and choosing the “Spin” menu.

Jmol window and options menu.

Link to PDBsum and ET server

Right click a node to bring up the linkout options.  You may view additional information on this protein structure from PDBsum, including details of the publication, enzymatic function, sequence and more. Additionally, you may view the details of the Evolutionary Trace for this protein on the ET server or the Unified ET interface (UET).

Linkout options, available when right-clicking a node.

Adding new proteins

New proteins may be added to the network one at a time through the ETA server. Selecting the “Add new node to network” menu option launches your browser and opens the ETA server, which is used to construct the template for your protein. Enter a Protein Data Bank ID (2gtxA and 1poaA both make good test cases) and press “Submit”. Advanced users may enter their own custom traces.

The ETA server then proceeds to step two, Template Selection. By default, ETA heuristically selects a 6-residue template from a cluster of evolutionarily important, surface-accessible residues. To customize the template, enter residue numbers and select the corresponding checkbox. As you change your selection, the image updates so that you can visualize the template. When you have finished selecting the template, press the “Submit Template” button. If the button is not available, the most likely reason is that you have not selected 6 residues for the template.

The final step is to customize the amino acid labels. The labels are the amino acids allowed at each template position; by default they are taken from the alignment. You may allow other combinations of labels by adding rows to the table with the “Add Custom Labels” button and entering the single-letter codes of the amino acids you would like to allow. When finished, press the “Add to Cytoscape Network” button.

Steps for adding a protein to the network.  1) Enter the PDB ID and click “Submit”. 2) Select which residue positions should form the template, or leave the defaults. 3) Select which evolutionarily observed residue combinations should match this template, or leave the defaults. 4) Return to the Cytoscape application and select “Add these to Network”.  Alternatively, select “Add Another” to launch another browser window.

More detailed information about the ETA server is available from the information icons next to each step.  Please also see the ETA server manual, available at http://mammoth.bcm.tmc.edu/eta/manual.html, for detailed instructions and examples. Please note that if you are using a Cisco VPN, you may experience connection problems due to IP compatibility issues.

Saving and loading a network

After customizing a network by adding new protein structures, you may save the resulting network to a file using the “Save Network” menu option.  This will write the network and associated data to a file, which you may name and load later using the “Load Network” menu option.

Predicting function

ETAscape uses a network diffusion model to predict protein function (for more details, please see Appendix A).  To run it, select “Run Diffusion” from the plugin menu.  In brief, this propagates function from nodes with known function to those with unknown function, following the pattern of network connectivity.  The function with the strongest influence over a particular node becomes the prediction, and the normalized magnitude of that influence serves as a confidence measure for the prediction.  Confidence measures fall into two levels: “High” and “Moderate”.  Furthermore, not all proteins will receive a prediction.  Nodes to which no function could be propagated with a significant score are marked “No Prediction”.  Node functions are editable, so to see how the predictions would change if a protein had a different or known function, edit the “Known EC” field and rerun the diffusion.
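The decision rule described above can be sketched as follows. The manual does not state the cutoffs that separate “High”, “Moderate” and “No Prediction”, so the threshold values below are placeholders chosen purely for illustration, and the function name call_prediction is hypothetical.

```python
def call_prediction(scores, high_cutoff=3.0, moderate_cutoff=2.0):
    """Pick the function with the strongest normalized influence on a node.

    scores: dict mapping an EC number to its normalized influence score.
    high_cutoff / moderate_cutoff: PLACEHOLDER thresholds, not the
    values used by ETAscape.
    Returns (predicted EC number or 'No Prediction', confidence level or None).
    """
    if not scores:
        return ("No Prediction", None)

    # The function with the largest score wins.
    best_function = max(scores, key=scores.get)
    best_score = scores[best_function]

    if best_score >= high_cutoff:
        return (best_function, "High")
    if best_score >= moderate_cutoff:
        return (best_function, "Moderate")
    # No function reached a significant score.
    return ("No Prediction", None)


prediction = call_prediction({"3.4.21.4": 3.5, "1.1.1.1": 1.0})
```

Editing a node's “Known EC” field changes the input to the diffusion, so the scores passed to a rule like this one, and therefore the resulting call, can change after a rerun.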

After diffusion, selecting nodes (highlighted in yellow in the center) will show the predicted function in the table in the lower pane.  In this case, nine proteins are selected.  For six of these, ETA was able to make a high confidence prediction; for the other three ETA was unable to make a prediction.

Viewing basis for predictions

The “Show Influencing Proteins” menu option uses the diffusion algorithm to identify the protein neighborhood around a protein of interest. To use it, select a protein or small group of proteins and select the “Show Influencing Proteins” menu option.

The proteins with the most influence during diffusion over the selected protein (2pa3A, highlighted in yellow).  Cytoscape has created a subnetwork view for these proteins.  Return to the original network by clicking on “ETA Network” in the navigation menu on the left.

Appendix A: Methods

Background on Evolutionary Trace Annotation: ETA starts by running the Evolutionary Trace algorithm on all protein structures in both the query and target sets. ET identifies amino acids whose pattern of conservation correlates with a phylogenetic tree and has been well validated for identifying functional sites. ETA selects a surface accessible cluster of amino acids under the intuition that these clusters may be functionally important. Next, we search for 3D geometric matches to this template in a library of proteins with known function. All matches must be reciprocal. Finally, we use a support vector machine (SVM) to filter matches based on the RMSD between the template matches and the difference in ET scores between the matching residues.
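The reciprocity requirement can be sketched in a few lines. The example assumes that the raw search results are available as a mapping from each query protein to the set of proteins its template matched; that data layout, the function name reciprocal_matches, and the protein identifiers are hypothetical. The SVM filtering step is not shown.

```python
def reciprocal_matches(hits):
    """Keep only pairs in which each protein's template matches the other.

    hits: dict mapping a query protein to the set of proteins matched
    by its template (an assumed layout for illustration).
    Returns a set of (protein_a, protein_b) pairs with protein_a < protein_b.
    """
    pairs = set()
    for query, targets in hits.items():
        for target in targets:
            # A match counts only if the target's template also hits the query.
            if target != query and query in hits.get(target, set()):
                pairs.add(tuple(sorted((query, target))))
    return pairs


# Made-up identifiers: 1abcA and 2xyzB match each other; 3defC does not match back.
example_hits = {
    "1abcA": {"2xyzB", "3defC"},
    "2xyzB": {"1abcA"},
    "3defC": set(),
}
edges = reciprocal_matches(example_hits)
```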

Predicting function with Network Diffusion: We begin by performing an all-against-all ETA search in order to construct a network in which nodes represent proteins and edges represent ETA matches.  Edge weights are calculated as

w = [edge-weight expression missing from this copy]

The expression combines the following quantities: rmsd, the root mean square deviation of the template from the matching residues; the average rmsd over all matches; the standard deviation of the rmsd over all matches; the average absolute difference between the ET ranks of matching residues; the average of the ETScores; and the standard deviation of the ETScores.

Next we select a 4-digit EC number c and define the vector $y \in \mathbb{R}^n$, where n is the number of nodes in the network and $y_i = 1$ if protein i is known to have function c, $-1$ if it is known not to have function c, and 0 otherwise. We then define a cost function of the form $C(f) = \sum_i (f_i - y_i)^2 + \alpha f^T L f$, in which the first term is the loss (the desire to retain the previously known enzymatic functions), the second term is the smoothing (the desire for neighboring proteins to share function), and $\alpha$ sets the trade-off between them. This cost function can be minimized directly using $f = (I + \alpha L)^{-1} y$, where L is the graph Laplacian, calculated as $L = D - W$, W is the matrix of edge weights, and D is the diagonal matrix with $D_{ii} = \sum_j W_{ij}$. We repeat this process for every function, resulting in a score for each protein-function pair.  For each protein, we pick the function with the largest such score as the predicted function and calculate a z-score to use as a confidence value, $z = (f - \mu_f) / \sigma_f$, where $\mu_f$ and $\sigma_f$ are the mean and standard deviation of the f values, respectively, calculated over the unannotated proteins in the network.
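A minimal sketch of the diffusion step for a single EC number is given below. It assumes a regularized cost of the form sum_i (f_i - y_i)^2 + alpha * f^T L f with L = D - W, whose minimizer satisfies (I + alpha * L) f = y. The trade-off parameter alpha, its default value, the use of Jacobi iteration in place of a direct matrix inverse, and the use of the population standard deviation are all assumptions of this sketch, not details confirmed by the manual.

```python
from statistics import mean, pstdev


def diffuse(weights, y, alpha=1.0, iterations=200):
    """Approximately solve (I + alpha * L) f = y, with L = D - W.

    weights: symmetric n x n list of lists of non-negative edge weights,
             with zeros on the diagonal (no self-loops).
    y: list with +1 (has the function), -1 (known not to), 0 (unknown).
    alpha, iterations: assumed values for illustration only.
    """
    n = len(y)
    degree = [sum(row) for row in weights]  # D_ii = sum_j W_ij
    f = [float(value) for value in y]
    for _ in range(iterations):
        # Each node moves toward its own label and its neighbors' scores.
        f = [
            (y[i] + alpha * sum(weights[i][j] * f[j] for j in range(n)))
            / (1.0 + alpha * degree[i])
            for i in range(n)
        ]
    return f


def z_scores(f, unannotated):
    """Standardize scores against the proteins that have no annotation."""
    reference = [f[i] for i in unannotated]
    mu, sigma = mean(reference), pstdev(reference)
    return [(value - mu) / sigma for value in f]


# Three proteins in a chain: the first has the function, the last does not.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = diffuse(chain, [1, 0, -1])
```

In this toy chain the middle protein sits between a positive and a negative example, so its score stays at zero. The iteration converges because the matrix I + alpha * L is strictly diagonally dominant whenever the weights are non-negative.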

Appendix B: Differences between ETAscape matching process and the ETA server process.

When annotating a protein, users may notice occasional differences between the predictions provided by the ETA server and those provided by ETAscape.  There are several reasons this can occur. First, ETA relies on an SVM to filter out matches that are not considered significant. The ETA server uses a Matlab SVM package; in order to distribute ETAscape, we re-implemented the SVM in LIBSVM.  Second, we updated the training set used to train the new SVM. The old set was quite small, whereas the new one includes both true and false-positive matches among more than 1300 enzymes. As a side effect of the training process, the two SVMs use slightly different sets of parameters.  Third, the datasets used by the two methods differ slightly.  For efficiency, we have removed nodes from the ETAscape network if they belong to a completely unannotated cluster, as they add no useful information to a search result. These proteins are still included in the matches shown on the ETA server, however, and may be a source of discrepancy. Finally, the ETA server is able to predict GO terms and 3-digit EC numbers, whereas ETAscape predicts only EC numbers, at both the 3-digit and 4-digit levels.
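The pruning of completely unannotated clusters can be sketched as below. The sketch interprets a cluster as a connected component of the match network, which is an assumption; the function name keep_annotated_clusters, the data layout, and the protein identifiers are hypothetical.

```python
from collections import deque


def keep_annotated_clusters(edges, known_ec):
    """Return the nodes that belong to a cluster with at least one annotation.

    edges: iterable of (protein_a, protein_b) match pairs.
    known_ec: dict mapping each protein to an EC number, or None if unannotated.
    A cluster is taken to be a connected component (an assumption).
    """
    adjacency = {node: set() for node in known_ec}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    kept, seen = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        # Keep the component only if some member has a known function.
        if any(known_ec.get(node) for node in component):
            kept |= component
    return kept


# Made-up example: the second cluster has no annotated member and is dropped.
example_edges = [("1abcA", "2xyzB"), ("3defC", "4ghiD")]
example_ec = {"1abcA": "3.4.21.4", "2xyzB": None, "3defC": None, "4ghiD": None}
remaining = keep_annotated_clusters(example_edges, example_ec)
```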

Licensing Information

COMMERCIAL LICENSING INFORMATION

For-profit use of the Evolutionary Trace Annotation software may be arranged by contacting Mrs. Lisa Beveridge (beveridg@bcm.tmc.edu, 713-798-6821) to request a license from Baylor College of Medicine.

ACADEMIC LICENSE AGREEMENT

This Academic License Agreement (Agreement) has been adopted for use by Baylor College of Medicine (BAYLOR), a Texas non-profit corporation, for the licensing of software programs for use for internal purposes only by employees at non-profit academic institutions of higher learning.

1. DEFINITIONS

1.1 The term Licensed Code shall mean the software entitled ETA, developed by Olivier Lichtarge, R. Matthew Ward, Tuan Tran, David Kristensen, Serkan Erdin, and Eric Venner (collectively the Authors), employees of BAYLOR.

2. GRANT OF LICENSE

2.1 BAYLOR hereby grants to you a non-exclusive and non-transferable, right and license to use, modify, prepare derivative works of and execute solely for internal, research (non-commercial) purposes, the Licensed Code. This license shall be for use only on your computer (the Computer). The foregoing license does not include the right to copy the Licensed Code, or portions thereof, for any reason.

2.2 You acknowledge that the Licensed Code, including, but not limited to, all rights under federal copyright laws, are and shall remain at all times the exclusive property of BAYLOR. Any modifications or derivative works based on the Licensed Code are and shall be considered part of the Licensed Code and ownership thereof is retained by or vested in BAYLOR. You shall provide to BAYLOR reports of such modifications or derivative works and such modifications or derivative works shall be made available to BAYLOR upon receipt of a written request for such from BAYLOR.

2.3 You shall not distribute or transfer the Licensed Code to any other company, institution or person without the prior written permission of BAYLOR.

3. NO UPGRADES/NO SERVICE/DELIVERED AS IS

3.1 It is expressly understood and agreed that the Licensed Code and Manuals will be delivered to you on an "AS IS" basis as of the Agreement Date and that neither BAYLOR nor the Authors will be responsible for future support of the Licensed Code.

3.2 You acknowledge that you will not be entitled to any Licensed Code upgrades and that BAYLOR will not provide maintenance for the Licensed Code. If you find defects, bugs, in the Licensed Code, You shall notify BAYLOR of the existence of the bugs. Neither BAYLOR nor the Authors shall, however, have any obligation to fix any bugs in the Licensed Code.

4. DISCLAIMER OF WARRANTY

BAYLOR MAKES NO WARRANTIES OR REPRESENTATIONS, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF FITNESS OR MERCHANTABILITY, REGARDING OR WITH RESPECT TO THE LICENSED CODE. BAYLOR MAKES NO WARRANTIES OR REPRESENTATIONS, EXPRESS OR IMPLIED, OF THE PATENTABILITY OR COPYRIGHTABILITY OF ANY OF THE LICENSED CODE OR OF THE ENFORCEABILITY OF ANY PATENTS OR COPYRIGHTS ISSUING THEREUPON, IF ANY, OR THAT THE LICENSED CODE, IS OR SHALL BE FREE FROM INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER RIGHTS OF THIRD PARTIES. BAYLOR MAKES NO WARRANTIES OR REPRESENTATIONS, EXPRESS OR IMPLIED, THAT THE LICENSED CODE SATISFIES REGULATORY REQUIREMENTS. YOU ACKNOWLEDGE THAT THE LICENSED CODE IS BEING LICENSED AS IS. BAYLOR HAS NOT TESTED THE LICENSED CODE FOR VIRUSES OR OTHER DEFECTS NOR HAS BAYLOR COMPLETED TESTING THE LICENSED CODE.

5. LIMITATION OF DAMAGES

BAYLOR shall not be liable for any monetary damages whatsoever with respect to your use of the Licensed Code nor shall BAYLOR be liable for any special, indirect, incidental or consequential damages arising out of or related to this Agreement, even if BAYLOR is advised of such damages.

6. INDEMNIFICATION

You will indemnify and hold BAYLOR, BAYLOR's trustees, officers, agents, employees, students, persons holding academic appointments within BAYLOR, and affiliated hospitals (the Indemnified parties) harmless from and against any and all claims, causes of action or lawsuits for personal injury (including death), property damage, and any other losses of any nature together with related expenses (including attorney's fees) made against the Indemnified parties resulting directly or indirectly from the use or possession of the Licensed Code by you, regardless of whether such claim, causes of action, lawsuits, other proceedings and the costs (including attorney's fees) related thereto result in whole or in part from the negligence of any of the Indemnified parties.

The Jmol library is used according to the LGPL license.

LIBSVM copyright info available here.

Apache's commons-math licensing info available here.