Motivation: Automated function prediction (AFP) of proteins is of great significance in biology. In essence, AFP is a large-scale multi-label classification over pairs of proteins and GO terms. Existing AFP approaches, however, have limitations on both the protein side and the GO term side. Using various sequence information and the robust learning to rank (LTR) framework, we developed GOLabeler[1], a state-of-the-art approach in CAFA3, which overcomes limitations on the GO term side, such as imbalanced GO terms. On the protein side, however, the abundant protein information available beyond sequences has not been used effectively for large-scale AFP in CAFA.

Results: We propose NetGO, which improves large-scale AFP with massive network information. NetGO's use of network information has three advantages: 1) the powerful LTR framework of NetGO efficiently and effectively integrates both sequence and network information, so it can easily handle large-scale protein function prediction; 2) NetGO uses the whole, massive network information of all species in STRING (rather than only high-confidence links and/or some specific species); 3) NetGO can annotate proteins without network information by homology transfer. We examined the performance of NetGO under numerous experimental settings, such as general performance comparison, species-specific prediction, and prediction on difficult proteins, using training and test data separated by the time-delayed setting of CAFA. The experimental results clearly demonstrate that NetGO outperforms GOLabeler, DeepGO, and the other baseline methods compared.

References

  1. You R, Zhang Z, Xiong Y, et al. GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank. Bioinformatics, Volume 34, Issue 14, 15 July 2018, Pages 2465–2473.
Note: The maximum number of proteins supported for each online submission is 1,000. Click "Help" for the tutorial on NetGO predictions. If your job contains more than 1,000 proteins, you can split them into separate jobs, or send your whole input file to us at swyao18@fudan.edu.cn.
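
Splitting a large input into separate jobs can be done locally before submission. The following is a minimal sketch, not part of NetGO itself: the function names `read_fasta` and `split_into_jobs` are hypothetical, and the only fact taken from the note above is the limit of 1,000 proteins per submission.

```python
from typing import Iterator, List, Tuple

MAX_PER_JOB = 1000  # per-submission limit stated in the note above


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (identifier, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new identifier line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_jobs(records: List[Tuple[str, str]],
                    size: int = MAX_PER_JOB) -> Iterator[str]:
    """Yield FASTA-formatted strings holding at most `size` records each."""
    for start in range(0, len(records), size):
        batch = records[start:start + size]
        yield "\n".join(f">{name}\n{seq}" for name, seq in batch) + "\n"
```

Each string yielded by `split_into_jobs` can be pasted into the text area or saved as a separate FASTA file and submitted as its own job.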

The Tutorial on NetGO Predictions

NetGO improves large-scale automated function prediction (AFP) with massive network information. The steps on how to make a prediction and explanations of the meanings of the query results are detailed as follows.

A. How to obtain predictions

The screenshot below shows the input page. The numbers in red refer to the sections described below.

[Screenshot: the main frame of the input page]

1. Input a protein sequence(s)

First, specify the sequence(s) for which you want predictions. Sequences can either be typed directly into the text area or uploaded from a file using the upload button. If both the text area and the uploaded file contain sequences, the server considers only the sequence(s) in the text area and ignores those in the uploaded file.

The server accepts only FASTA format. In FASTA format, each record begins with an identifier line starting with '>', followed by the protein sequence on one or more lines. The lines starting with '>' are treated as the identifiers of the sequences that follow.

All sequences must be amino acids specified in single-letter code (ACDEFGHIKLMNPQRSTVWYVBZX*). Any other non-white-space character is rejected by the input processor with a notification. The input processor also checks for empty sequences and for the uniqueness of sequence identifiers, and issues a warning when nucleotide-like sequences are found.
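
To catch problems before submitting, the checks described above can be approximated locally. This is only a sketch under stated assumptions, not the server's actual input processor: the function name `check_fasta`, the acceptance of lower-case letters, and the 90% threshold for flagging nucleotide-like sequences are all assumptions made for illustration.

```python
from typing import List, Tuple

# Alphabet copied from the tutorial text (the repeated "V" is harmless in a set).
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYVBZX*")
NUCLEOTIDE_LIKE = set("ACGTN")
NUCLEOTIDE_FRACTION = 0.9  # assumed threshold; the server's rule is not documented


def check_fasta(text: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for FASTA text."""
    errors: List[str] = []
    warnings: List[str] = []
    records: List[List[str]] = []  # each entry is [identifier, sequence]
    for lineno, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped:
            continue
        if stripped.startswith(">"):
            records.append([stripped[1:].strip(), ""])
        elif not records:
            errors.append(f"line {lineno}: sequence data before the first '>' line")
        else:
            # White space is ignored; upper-casing is an assumption of this sketch.
            records[-1][1] += "".join(stripped.split()).upper()

    seen = set()
    for name, seq in records:
        if name in seen:
            errors.append(f"duplicate identifier: {name!r}")
        seen.add(name)
        if not seq:
            errors.append(f"empty sequence: {name!r}")
            continue
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            errors.append(f"{name!r}: characters not allowed: {''.join(bad)}")
        elif sum(c in NUCLEOTIDE_LIKE for c in seq) / len(seq) >= NUCLEOTIDE_FRACTION:
            warnings.append(f"{name!r}: looks like a nucleotide sequence")
    return errors, warnings
```

An input that passes with empty `errors` and `warnings` lists satisfies the rules stated in this section, although the server may apply its checks differently.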

If you have trouble choosing protein sequences, the web page also provides an example. Click "Show an example" to load an example for NetGO, or click "Example File" to download an example file.

2. Input your email address (Optional)

You can optionally enter your email address. We will send you a confirmation email after your submission, and another email when the prediction results are available. Although it is optional, we highly recommend using this service.

After you have followed all the above steps, press the "Submit" button to start the prediction. We will provide you with a job id and a web link to the results. The page will refresh automatically when your query results are available (if you provided your email address, you will also receive an email notification). At any time, you can also track your job status by clicking the link directly or by entering your job id on the "Check" page.

The running time of a prediction depends on the number of input proteins. Usually, it will run relatively quickly. The table below lists running times for your reference.

Number of proteins   GOLabeler   NetGO
1                    204.72s     230.93s
100                  914.85s     997.12s
200                  1502.87s    1573.13s
400                  2707.82s    3094.76s
1000                 4879.79s    5538.76s
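
For input sizes that fall between the rows of the table, a rough estimate can be obtained by linear interpolation. The sketch below is only a planning aid: it assumes that running time changes linearly between the measured points, which the table does not guarantee, and actual times will also depend on server load. The function name `estimate_seconds` is hypothetical.

```python
from bisect import bisect_left

# (number of proteins, GOLabeler seconds, NetGO seconds), copied from the table above.
TIMINGS = [
    (1, 204.72, 230.93),
    (100, 914.85, 997.12),
    (200, 1502.87, 1573.13),
    (400, 2707.82, 3094.76),
    (1000, 4879.79, 5538.76),
]


def estimate_seconds(n_proteins: int, method: str = "NetGO") -> float:
    """Linearly interpolate the reference timings for 1 <= n_proteins <= 1000."""
    if not 1 <= n_proteins <= 1000:
        raise ValueError("the reference table covers 1 to 1,000 proteins")
    col = {"GOLabeler": 1, "NetGO": 2}[method]
    sizes = [row[0] for row in TIMINGS]
    i = bisect_left(sizes, n_proteins)
    if sizes[i] == n_proteins:
        return TIMINGS[i][col]  # exact table row, no interpolation needed
    x0, y0 = TIMINGS[i - 1][0], TIMINGS[i - 1][col]
    x1, y1 = TIMINGS[i][0], TIMINGS[i][col]
    return y0 + (y1 - y0) * (n_proteins - x0) / (x1 - x0)
```

For example, `estimate_seconds(300, "GOLabeler")` returns the midpoint of the 200-protein and 400-protein GOLabeler timings.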

B. Interpreting the prediction output

Once a prediction has finished, you can obtain your query results in the same way that you track your job status. The screenshot below shows a result page. The numbers in red refer to different sections of the interface.

[Screenshot: the main frame of the result page]

1. Information about results (Part 1)

We show some basic information about the results and provide a link to download them.

2. Prediction results for a protein (Part 2)

Only the results for the first 10 proteins (or 20 or 30, depending on the user's choice) are shown.

(1) Protein name (2.(1))
For each protein, you can click the protein name to obtain its information in UniProt.

(2) Prediction results shown in a graph (2.(2))
We visualize the top m predicted GO terms (m = 20 by default; it can be set to 30, 50 or 100) according to the GO structure and organize them in a graph. Note that GO terms of high confidence (score > 0.6) are emphasized with colors according to score band ([1.0, 0.9), [0.9, 0.8), [0.8, 0.7), [0.7, 0.6), [0.6, 0.0]). The top predicted GO terms are visualized using the AmiGO API, and the meaning of the coloured lines between GO terms is described in AmiGO Manual: Visualize.
You can click the graph to show a high-resolution version.

(3) Prediction results listed in a table (2.(3))
The top 20 predicted GO terms are also shown in a table. The three columns list the GO terms, their scores, and their names, respectively. You can click a GO term to display its detailed information. GO terms of high confidence (score > 0.6) are likewise emphasized with colors according to score band ([1.0, 0.9), [0.9, 0.8), [0.8, 0.7), [0.7, 0.6), [0.6, 0.0]).
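
When post-processing downloaded results, the same score bands can be reproduced in a few lines. The sketch below reads the bands above as (0.9, 1.0], (0.8, 0.9], (0.7, 0.8], (0.6, 0.7] and [0.0, 0.6], which is an interpretation of the interval notation rather than a documented rule. The function names and the in-memory list of (GO term, score) pairs are hypothetical; the format of the downloadable result file is not described here.

```python
from typing import List, Tuple

# Lower edges of the four high-confidence bands, highest band first.
BAND_LOWER_EDGES = [0.9, 0.8, 0.7, 0.6]


def score_band(score: float) -> int:
    """Return 0 for the top band (score > 0.9) down to 4 for score <= 0.6."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    for band, lower in enumerate(BAND_LOWER_EDGES):
        if score > lower:
            return band
    return len(BAND_LOWER_EDGES)


def top_terms(predictions: List[Tuple[str, float]],
              m: int = 20) -> List[Tuple[str, float]]:
    """Sort (GO term, score) pairs by descending score and keep the top m."""
    return sorted(predictions, key=lambda p: p[1], reverse=True)[:m]
```

A term is of high confidence in the sense used above exactly when `score_band` returns a value below 4.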

C. Browser compatibility

We have tested the browser compatibility of our system, as shown below:

OS        Version     Chrome   Firefox   Microsoft Edge   Safari
Linux     Ubuntu 16   71.0     64.0      n/a              n/a
MacOS     12.0.1      71.0     64.0      n/a              10.14.2
Windows   10          70.0     64.0      42.17134.1.0     n/a

For any scientific questions, please contact Shanfeng Zhu (zhusf@fudan.edu.cn).

If you encounter a bug or have technical problems, please contact Shuwei Yao (swyao18@fudan.edu.cn).

We highly appreciate your support and kindness.