Work Progress
The Bologna Node (Scientific Unit) in the FIRB-LIBI project is involved in the following subtasks:
In the first year of activity (9-2005/9-2006) we focused on the following:
GRID Integration for Technological transfer:
Ongoing projects: Genome comparison at large (CNAF/INFN); Massive sequence alignments (SPACI).
Massive genome-wide annotation usually requires running a PSI-BLAST query against a sequence database, such as UniRef90, for each sequence of the genome. A GRID-enabled PSI-BLAST system has been developed together with SPACI that allows rapid genome-wide PSI-BLAST searches using GRID technology. A web portal has been designed, and implemented by SPACI staff, that allows input submission via the GRID. GRID client nodes have been set up to receive the results of the computation. A parallel PSI-BLAST based on a master-workers architecture has been designed and implemented: it distributes the input sequences over the computing GRID, runs the PSI-BLAST queries, parses and integrates the results from the worker nodes, and returns them to the client nodes via the Globus GridFTP service. Runs on the entire human genome have been completed successfully: given a large number of worker nodes, the bottleneck of the whole process is the geographical network transfer of the results across the GRID, not the PSI-BLAST run itself.
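The master-workers scheme described above can be illustrated with a minimal Python sketch. It is not the system developed with SPACI: a local thread pool stands in for the GRID worker nodes, and all function names (read_fasta, split_batches, placeholder_search, master) are hypothetical. The worker function is a placeholder that only reports sequence lengths; in a real deployment it would be replaced by a function that runs PSI-BLAST on its batch and parses the output.

```python
from concurrent.futures import ThreadPoolExecutor


def read_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_batches(records, n_batches):
    """Deal records round-robin into at most n_batches non-empty batches."""
    batches = [records[i::n_batches] for i in range(n_batches)]
    return [batch for batch in batches if batch]


def placeholder_search(batch):
    """Stand-in for a worker job (assumption): returns sequence lengths.

    A real worker would run PSI-BLAST on the batch and parse its output.
    """
    return {seq_id: len(seq) for seq_id, seq in batch}


def master(records, n_workers, search=placeholder_search):
    """Distribute batches to the workers and merge their partial results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(search, split_batches(records, n_workers)):
            merged.update(partial)
    return merged
```

Calling master(read_fasta(text), n_workers=4) returns one merged dictionary keyed by sequence identifier. Because the merge step runs on the master, this sketch has the same shape as the process described above, in which collecting results can dominate the run time once enough workers are available.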
Scientific projects and web servers for the platform
UNIBO - University of Bologna
Scientific Coordinator: CASADIO Rita; Dept: Biologia Evoluzionistica Sperimentale
Program integration: Implementation and development of new algorithms based on automatic learning methods for the prediction of structural and functional characteristics of proteins starting from their sequence.
Program integration: Development of automatic tools for 3D structure prediction of both globular and membrane proteins starting from the protein sequence.
Synopsis: Starting from the protein sequence, several features can be predicted, provided that algorithms capable of generalizing over a given property are available. In other words, given a set of examples in which sequences are associated with the property at hand, machine learning methods can extract the general rules relating inputs to outputs and extrapolate to previously unseen examples. Adopting this approach, several predictors have been implemented that address different problems of computational biology and provide heuristic solutions for the functional and structural annotation of genomes.
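The idea of learning a sequence-to-property rule from labelled examples can be shown with a deliberately simple Python sketch. It is not one of the predictors developed in the project: it uses amino-acid composition as the only feature and a nearest-centroid rule as the learner, and the training sequences and labels are invented toy data chosen purely for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    sequence = sequence.upper()
    total = sum(sequence.count(aa) for aa in AMINO_ACIDS) or 1
    return [sequence.count(aa) / total for aa in AMINO_ACIDS]


def train_centroids(examples):
    """Average the composition vectors of the training sequences per label."""
    sums, counts = {}, {}
    for sequence, label in examples:
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, value in enumerate(composition(sequence)):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, sequence):
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    vec = composition(sequence)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))

    return min(centroids, key=distance)


# Invented toy examples (assumption): not real proteins or real annotations.
toy_examples = [
    ("LLLVVIIAAL", "hydrophobic"),
    ("AVILLFFVVA", "hydrophobic"),
    ("DDEEKKRRDE", "charged"),
    ("KKEEDDRKEE", "charged"),
]
toy_model = train_centroids(toy_examples)
```

After training, predict(toy_model, sequence) labels a sequence that was not among the examples, which is the generalization step the synopsis refers to. Real predictors rely on richer inputs and stronger learners; one of the publications listed below, for instance, uses support vector machines and evolutionary information.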
Results: For the specific tasks addressed by the FIRB-LIBI project we developed:
one predictor of the subcellular localization of a given sequence. This predictor is useful for the functional prediction of all eukaryotic genomes, and it has been applied to the large-scale analysis of 5 genomes, including the human genome.
one predictor that determines, starting from the protein sequence, whether a given mutation may be related to a genetic disease
one predictor for coupling a given protease to its specific inhibitor(s)
one web server that implements several predictors of the topology of membrane proteins
Products:
● Amico M, Finelli M, Rossi I, Zauli A, Elofsson A, Viklund H, von Heijne G, Jones D, Krogh A, Fariselli P, Martelli PL, Casadio R -PONGO: a web server for multiple predictions of all-alpha transmembrane proteins- Nucleic Acids Res 34(Web server issue):169-172 (2006)
http://pongo.biocomp.unibo.it/
● Pierleoni A, Martelli PL, Fariselli P, Casadio R -BaCelLo: a balanced subcellular localization predictor- Bioinformatics 22:e408-e416 (2006)
http://gpcr.biocomp.unibo.it/bacello/
● Capriotti E, Calabrese R, Casadio R -Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information- Bioinformatics (in press, 2006)
http://gpcr.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi
● Pierleoni A, Martelli PL, Fariselli P, Casadio R -eSLDB: eukaryotic Subcellular Localization Data Base- Nucleic Acids Res (in press, 2007)
http://gpcr2.biocomp.unibo.it/esldb/
UNIROMA2 - University of Rome "Tor Vergata"
Scientific Coordinator: HELMER-CITTERICH Manuela; Dept: Biology
Program integration: Implementation and development of new algorithms based on automatic learning methods for the prediction of structural and functional characteristics of proteins starting from their sequence.
Synopsis: The aim of the project is to build a web server for protein functional annotation based on local structural similarity.
Results: For this purpose, a new database of annotated residues of known structure (pdbScan) was built, where each residue is associated with the following data:
1) residue name;
2) protein chain;
3) PDB code;
4) secondary structure;
5) SMART domain;
6) SCOP number;
7) CATH classification;
8) Pfam code;
9) catalytic site;
10) conservation in the HSSP alignment;
11) ligand binding ability;
12) PROSITE pattern.
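The twelve per-residue fields listed above could be represented as a single record type. The following Python sketch is only an illustration: the class name, field names, types and defaults are assumptions, not the actual pdbScan schema, and filter_residues is a hypothetical helper for selecting residues by annotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResidueAnnotation:
    """One annotated residue; field names are hypothetical, not pdbScan's."""
    residue_name: str                          # 1) residue name
    chain: str                                 # 2) protein chain
    pdb_code: str                              # 3) PDB code
    secondary_structure: Optional[str] = None  # 4) secondary structure
    smart_domain: Optional[str] = None         # 5) SMART domain
    scop_number: Optional[str] = None          # 6) SCOP number
    cath_classification: Optional[str] = None  # 7) CATH classification
    pfam_code: Optional[str] = None            # 8) Pfam code
    catalytic_site: bool = False               # 9) catalytic site
    hssp_conservation: Optional[float] = None  # 10) conservation in HSSP
    binds_ligand: bool = False                 # 11) ligand binding ability
    prosite_pattern: Optional[str] = None      # 12) PROSITE pattern


def filter_residues(residues, **criteria):
    """Return the residues whose fields equal every given criterion."""
    return [
        r for r in residues
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]
```

For example, filter_residues(residues, catalytic_site=True, chain="A") would select the catalytic residues of chain A from a list of such records. Optional fields default to None so that residues lacking a given annotation can still be stored.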
In the current implementation, pdbScan uses mmCIF structure files, can handle modified residues (i.e. phosphorylated residues and other residues with post-translational modifications), can also take NMR data into account (by choosing one representative structure) and is fully integrated in a software tool for local protein structure comparison based on the Query3D engine (Ausiello et al., 2005).
The prototype can handle XML input and output and is being tested on a set of known cases. Interesting results were obtained in the analysis of different substrates of the Src kinase, and experimental work is in progress (in collaboration with Dr. Stefania Gonfloni) to address their biological relevance.
Products:
UNIROMA1 - University of Rome "La Sapienza"
Scientific Coordinator: TRAMONTANO Anna; Dept: Biochemistry "Rossi Fanelli"
Program integration: Implementation of gene prediction methods in genomes from different organisms and identification of new genes with specific characteristics among the annotated genes.
Synopsis: Protein-protein interactions are at the basis of any cellular process and are crucial for understanding many biotechnological applications. During the last few years the development of high-throughput technologies has produced several large-scale protein-interaction data sets for various organisms, and many interaction databases have been created by means of data-mining techniques. It is well known that interactions may be mediated by the presence of specific features, such as motifs, patches and domains. Although many efforts are underway to elucidate the role of these features in the regulation of the interaction network, very little is known about this on a genome scale. Data integration and computational methods are fundamental tools to gain insight into such data, to assign a confidence level to individual interactions or to complete data sets, and to obtain clues to the molecular basis that regulates such interactions.
Results: