Work Progress

The Bologna Node (Scientific Unit) in the FIRB-LIBI project is involved in the following subtasks:

Implementation of gene prediction methods in genomes from different organisms and identification of new genes with specific characteristics among the annotated genes.

Implementation and development of new algorithms based on automatic learning methods for the predictions of structural and functional characteristics of proteins starting from their sequence.

Development of automatic tools for 3D structure prediction starting from the protein sequence both of globular and membrane proteins.

In the first year of activity (September 2005 to September 2006) we focused on the following:

GRID Integration for Technological transfer:

Ongoing projects: Genome comparison at large (CNAF/INFN); Massive sequence alignments (SPACI).

Massive genome-wide annotation usually requires running a PSI-BLAST query against a sequence database, such as UNIREF-90, for each sequence of the genome. Together with SPACI, we have developed a GRID-enabled PSI-BLAST system that allows rapid genome-wide PSI-BLAST searches using GRID technology. A web portal has been designed, and implemented by the SPACI team, that allows input submission via the GRID. GRID client nodes have been set up to receive the results of the computation. A parallel PSI-BLAST based on a Master-Workers architecture has been designed and implemented: it distributes the input sequences over the computing GRID, runs the PSI-BLAST queries, parses and integrates the results from the worker nodes, and returns them to the client nodes via the GLOBUS GridFTP service. Runs on the entire human genome have been completed successfully: given a large number of worker nodes, the bottleneck of the whole process is the geographical network transfer of the results across the GRID, not the PSI-BLAST run itself.
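The Master-Workers flow described above can be illustrated with a minimal local sketch. It is not the project's GRID code: a thread pool stands in for the GRID worker nodes, `run_worker` is a hypothetical placeholder for an actual PSI-BLAST run, and every function name here is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Distribute (id, sequence) records round-robin over at most n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]


def run_worker(chunk):
    """Hypothetical worker: a real one would run PSI-BLAST on each sequence
    and parse its output. Here it only returns the sequence length."""
    return {seq_id: len(seq) for seq_id, seq in chunk}


def master(records, n_workers=4):
    """Split the input, dispatch chunks to workers, and merge partial results."""
    merged = {}
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(run_worker, chunks):
            merged.update(partial)
    return merged


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    print(master([("seq1", "MKVLA"), ("seq2", "MK"), ("seq3", "MAAAK")], n_workers=2))
```

In a real deployment the merge step would also have to handle failed or slow workers, and, as noted above, the transfer of results back to the client is likely to dominate the running time.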

Scientific projects and web servers for the platform

UNIBO - University of Bologna

Scientific Coordinator: CASADIO Rita; Dept: Biologia Evoluzionistica Sperimentale

Program integration: Implementation and development of new algorithms based on automatic learning methods for the predictions of structural and functional characteristics of proteins starting from their sequence.

Program integration: Development of automatic tools for 3D structure prediction starting from the protein sequence both of globular and membrane proteins.

Synopsis: Starting from the protein sequence, several features can be predicted, provided that algorithms capable of generalizing over a given property are available. In other words, given a set of examples in which sequences are related to the property at hand, machine learning methods can extract the general rules relating inputs to outputs and extrapolate to never-before-seen examples. Adopting this approach, several predictors have been implemented that address a range of problems in computational biology and give heuristic solutions to the functional and structural annotation of genomes.
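As a toy illustration of learning a sequence-to-property mapping from labelled examples, the sketch below trains a nearest-centroid classifier on amino-acid composition. This simple classifier is swapped in purely for brevity and is not the method used by the predictors listed in this report; the training sequences and the labels "membrane" and "soluble" are made up.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def train(examples):
    """Average the composition vectors of each class (nearest-centroid model)."""
    sums, sizes = {}, Counter()
    for seq, label in examples:
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, value in enumerate(composition(seq)):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(model, seq):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    vec = composition(seq)

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: dist(model[label]))


# Made-up training data: hydrophobic-rich versus charged-rich sequences.
examples = [
    ("LLIVVALLFIAVL", "membrane"),
    ("AVLLIFFGLIVAL", "membrane"),
    ("DEKRKEDDEKRSE", "soluble"),
    ("KKDEERSDEKEDR", "soluble"),
]
model = train(examples)
print(predict(model, "ILVVALFLLIAGV"))
```

The point of the sketch is the workflow (encode sequences, fit on labelled examples, predict on unseen ones), not the specific encoding or classifier.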

Results: For the specific tasks addressed by the FIRB-LIBI project we developed:

one predictor of the subcellular localization of a given sequence. This predictor is useful for the functional annotation of all eukaryotic genomes, and it has been applied to the large-scale analysis of 5 genomes, including the human genome.

one predictor that determines, starting from the protein sequence, whether or not a given mutation may be related to a genetic disease

one predictor for coupling a given protease to its specific inhibitor(s)

one web server that implements several predictors of the topology of membrane proteins

Products:

● Amico M, Finelli M, Rossi I, Zauli A, Elofsson A, Viklund H, von Heijne G, Jones D, Krogh A, Fariselli P, Martelli PL, Casadio R -PONGO: a web server for multiple predictions of all-alpha transmembrane proteins- Nucleic Acids Res 34(Web server issue):169-172 (2006)

http://pongo.biocomp.unibo.it/

● Pierleoni A, Martelli PL, Fariselli P, Casadio R -BaCelLo: a balanced subcellular localization predictor- Bioinformatics 22:e408-e416 (2006)

http://gpcr.biocomp.unibo.it/bacello/

● Capriotti E, Calabrese R, Casadio R -Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information- Bioinformatics (in press, 2006)

http://gpcr.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi

● Pierleoni A, Martelli PL, Fariselli P, Casadio R -eSLDB: eukaryotic Subcellular Localization Data Base- Nucleic Acids Res (in press, 2007)

http://gpcr2.biocomp.unibo.it/esldb/

UNIROMA2 - University of Rome "Tor Vergata"

Scientific Coordinator: HELMER-CITTERICH Manuela; Dept: Biology

Synopsis: The aim of the project is the construction of a web server for protein functional annotation based on local structural similarity.

Results: For this purpose, a new database of annotated residues of known structure (pdbScan) was built, where each residue is associated with the following data:

1) residue name;

2) protein chain;

3) PDB code;

4) secondary structure;

5) SMART domain;

6) SCOP number;

7) CATH classification;

8) Pfam code;

9) catalytic site;

10) conservation in the HSSP alignment;

11) ligand binding ability;

12) PROSITE pattern.
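The twelve annotations listed above could be modelled as one record per residue. The following is a sketch only: the class name, field names, types and defaults are assumptions, not the actual pdbScan schema, and the example values are placeholders.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ResidueAnnotation:
    """Hypothetical per-residue record mirroring the twelve items listed above."""
    residue_name: str                           # 1) residue name
    chain: str                                  # 2) protein chain
    pdb_code: str                               # 3) PDB code
    secondary_structure: Optional[str] = None   # 4) secondary structure
    smart_domain: Optional[str] = None          # 5) SMART domain
    scop_number: Optional[str] = None           # 6) SCOP number
    cath_classification: Optional[str] = None   # 7) CATH classification
    pfam_code: Optional[str] = None             # 8) Pfam code
    catalytic_site: bool = False                # 9) catalytic site
    hssp_conservation: Optional[float] = None   # 10) conservation in the HSSP alignment
    ligand_binding: bool = False                # 11) ligand binding ability
    prosite_pattern: Optional[str] = None       # 12) PROSITE pattern


# Placeholder values, not taken from a real entry.
example = ResidueAnnotation(residue_name="SER", chain="A", pdb_code="1abc",
                            secondary_structure="H", catalytic_site=True)
print(asdict(example))
```

Optional fields default to `None` here because not every residue is expected to carry every annotation; that choice is also an assumption.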

In the current implementation, pdbScan uses mmCIF structure files, can handle modified residues (e.g. phosphorylated residues and other residues with post-translational modifications), can also take NMR data into account (by choosing one representative structure), and is fully integrated into software for local protein structure comparison based on the Query3D engine (Ausiello et al., 2005).

The prototype can handle XML input and output and is being tested on a set of known cases. Interesting results were obtained in the analysis of different substrates of the Src kinase, and experimental work is in progress (in collaboration with Dr. Stefania Gonfloni) to address their biological relevance.

Products:

Ausiello, G., Via, A., Helmer-Citterich, M. (2005). Query3d: a new method for high-throughput analysis of functional residues in protein structures. BMC Bioinformatics, 6(Suppl 4):S5.
Ferraro, E., Via, A., Ausiello, G., Helmer-Citterich, M. (2005). A neural strategy for the inference of SH3 domain-peptide interaction specificity. BMC Bioinformatics, 6(Suppl 4):S13.
Ferraro, E., Via, A., Ausiello, G., Helmer-Citterich, M. (2006). A novel structure-based encoding for machine-learning applied to the inference of SH3 domain specificity. Bioinformatics, Jul 26 [Epub ahead of print].

UNIROMA1 - University of Rome "La Sapienza"

Scientific Coordinator: TRAMONTANO Anna; Dept: Biochemistry "Rossi Fanelli"

Program integration: Implementation of gene prediction methods in genomes from different organisms and identification of new genes with specific characteristics among the annotated genes.

Synopsis: Protein-protein interactions are at the basis of any cellular process and are crucial for many biotechnological applications. During the last few years, the development of high-throughput technologies has produced several large-scale protein-interaction data sets for various organisms, and many interaction databases have been created by means of data-mining techniques. It is well known that interactions may be mediated by the presence of specific features, such as motifs, patches and domains. Although many efforts are underway to elucidate the role of these features in the regulation of the interaction network, very little is known about this on a genome scale. Data integration and computational methods are fundamental tools to gain insight into such data, to assign a confidence level to single interactions or to complete data sets, and to obtain clues about the molecular basis that regulates such interactions.

Results:

We have combined yeast protein interaction data with other biological resources, such as sequences, process and component ontologies and domains to construct a high-confidence interaction set by the identification of similar features in sets of proteins which share a common interaction partner. We have analyzed the presence of similar linear motifs, functions, localization and domains in such groups of proteins for different datasets and measured their statistical relevance. For each dataset this analysis revealed a statistically significant presence of shared motifs with respect to random datasets.
We have also investigated the reciprocal correlation between all these data (motifs, annotation and domain presence), and we have found that the analysis of shared motifs in PPI maps provides a source of information that is consistent with, but not redundant to, the other sources we have analyzed. Finally, we have provided an example of how this analysis can be used to make inferences about unknown protein features, such as function or localization.
Our study shows that the analysis of shared motifs in protein interaction networks can be a valuable method to investigate the properties of interacting proteins. It proves to be a source of information that can complement the other sources used and, as more experimental interaction data become available, it will be a useful tool to gain a wider picture of the interactome.
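The first step of the analysis above, collecting the sets of proteins that share a common interaction partner, can be sketched as follows. The function name, the `min_size` parameter and the protein identifiers are hypothetical; the motif search and the comparison against random datasets are not shown.

```python
from collections import defaultdict


def group_by_shared_partner(interactions, min_size=2):
    """Map each protein to the sorted list of its interaction partners,
    keeping only groups with at least min_size members.

    Every returned group is a set of proteins sharing one common partner
    (the dictionary key)."""
    partners = defaultdict(set)
    for a, b in interactions:
        # Interactions are treated as undirected pairs.
        partners[a].add(b)
        partners[b].add(a)
    return {p: sorted(group) for p, group in partners.items() if len(group) >= min_size}


# Made-up interaction pairs, for illustration only.
interactions = [("P1", "P2"), ("P1", "P3"), ("P4", "P1"), ("P2", "P5")]
groups = group_by_shared_partner(interactions)
print(groups)
```

Each group would then be searched for shared features (linear motifs, functions, localization, domains), and their frequency compared with that found in randomized groups.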

How the results are documented

UNIBO Research Unit (http://www.biocomp.unibo.it/firb/)

Method

YES/NO

Description

3.1	scientific publications;

Amico et al., 2006; Pierleoni et al., 2006a; Pierleoni et al., 2006b; Capriotti et al., 2006; Ausiello et al., 2005; Ferraro et al., 2005; Ferraro et al., 2006; Marcatili and Tramontano, submitted, 2006.

3.2	publications on electronic media (CD, web, etc.);

http://www.biocomp.unibo.it/firb/ and related links

3.3	critical editions, lexicons, frequency lists, etc.;

3.4	technical reports and/or projects;

GRID Integration for Technological transfer: Genome comparison at large (CNAF/INFN); Massive sequence alignments (SPACI).

http://www.biocomp.unibo.it/firb/ and related links

3.5	patents;

3.6	presentations at national conferences;

BITS2006 Bologna Convegno Nazionale di Bioinformatica

3.7	presentations at international conferences;

ECCB, Madrid, September 2005;

Automated Protein Function Prediction Meeting, San Diego, August 2006;

RECOMB2006, Venezia, April 2006; ISMB06, Fortaleza, August 2006.

3.8	dissemination of results for information purposes;

The eProtein scientific meeting, Wellcome Trust Conference, Hinxton, UK, 2006

Participation in Research to Business 2006, Bologna, and EXPOSANITA’ 2006, Bologna

http://www.biocomp.unibo.it/firb/ and related links

3.9	dissemination of results through training;

European School of Genetic Medicine, 6th course in Bioinformatics for Molecular Biologists, Bertinoro 2006.

Bologna Winter School 2006.

Applied Bioinformatics. Tablets of bioinformatics for everybody- UTPL, Universidad Tecnica Particular de Loja, Loja (Ecuador)

http://www.biocomp.unibo.it

3.10	dissemination of results to the general public;

Ricerca scientifica ed energia del futuro, Senigallia (AN), 16/09/05-