Work Progress



The Bologna Node (Scientific Unit) in the FIRB-LIBI project is involved in the following subtasks:

 

 

 

 

In the first year of activity (September 2005 to September 2006), we focused on the following:

 

GRID Integration for Technological transfer:

Ongoing projects: Genome comparison at large (CNAF/INFN); Massive sequence alignments (SPACI).

To perform genome-wide annotation on a massive scale, it is usually necessary to run PSI-BLAST queries against sequence databases such as UniRef90 for every sequence of the genome. Together with SPACI, we developed a GRID-enabled PSI-BLAST system that allows rapid genome-wide PSI-BLAST searches using GRID technology. A web portal, designed and implemented by SPACI staff, allows input to be submitted via the GRID, and GRID client nodes have been set up to receive the results of the computation. A parallel PSI-BLAST based on a master-workers architecture has been designed and implemented: it distributes the input sequences over the computing GRID, runs the PSI-BLAST queries, parses and integrates the results returned by the worker nodes, and sends the final results to the client nodes via the Globus GridFTP service. Runs on the entire human genome were successful: given a large number of worker nodes, the bottleneck of the whole process is the transfer of results across the geographically distributed GRID network, not the PSI-BLAST runs themselves.
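The master-workers scheme described above can be illustrated with a minimal, hypothetical Python sketch. It is not the system built with SPACI: threads stand in for GRID worker nodes, `run_psiblast_stub` is a placeholder that does not call PSI-BLAST, and the names `chunk`, `run_psiblast_stub` and `master` are invented for the example. GRID job submission and the GridFTP transfer back to the client nodes are not modeled.

```python
"""Hypothetical master-workers layout for genome-wide PSI-BLAST.

Assumption: run_psiblast_stub is a placeholder for a real PSI-BLAST job on a
GRID worker node; here it only returns a fake result string per query.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Query = Tuple[str, str]  # (sequence id, amino-acid sequence)


def chunk(items: List[Query], n_chunks: int) -> List[List[Query]]:
    """Split the query list into at most n_chunks round-robin batches."""
    batches: List[List[Query]] = [[] for _ in range(n_chunks)]
    for i, item in enumerate(items):
        batches[i % n_chunks].append(item)
    return [b for b in batches if b]  # drop empty batches


def run_psiblast_stub(batch: List[Query]) -> Dict[str, str]:
    """Placeholder worker: a real node would search a database such as
    UniRef90 for each query and return the parsed hits."""
    return {seq_id: f"{len(seq)} residues searched" for seq_id, seq in batch}


def master(sequences: Dict[str, str], n_workers: int = 4) -> Dict[str, str]:
    """Distribute the queries, collect worker outputs, merge them into one table."""
    batches = chunk(list(sequences.items()), n_workers)
    merged: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(run_psiblast_stub, batches):
            merged.update(partial)  # integrate the results of each worker
    return merged


if __name__ == "__main__":
    genome = {"P1": "MKT", "P2": "MLLAV", "P3": "MG"}  # toy input
    print(master(genome, n_workers=2))
```

Because the batches are independent, the search phase parallelizes naturally; in the real deployment, as reported above, the limiting factor then becomes moving the merged results across the network.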

 

Scientific projects and web servers for the platform

 

UNIBO - University of Bologna

Scientific Coordinator: CASADIO Rita; Dept: Biologia Evoluzionistica Sperimentale

 

Program integration: Implementation and development of new algorithms based on machine learning methods for predicting structural and functional characteristics of proteins from their sequence.

Program integration: Development of automatic tools for predicting the 3D structure of both globular and membrane proteins from their sequence.

 

Synopsis: Several features can be predicted from the protein sequence, provided that algorithms capable of generalizing over a given property are available. In other words, given a set of examples in which sequences are related to the property at hand, machine learning methods can extract the general rules relating inputs to outputs and extrapolate to previously unseen examples. Following this approach, we implemented several predictors that address different problems of computational biology and provide heuristic solutions for the functional and structural annotation of genomes.
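To make the train-then-extrapolate idea concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not one of the unit's predictors: sequences are encoded as amino-acid composition vectors, a nearest-centroid classifier learns one average vector per class from labeled examples (the "general rule"), and an unseen sequence is assigned to the closest class. The class labels and training sequences are invented.

```python
"""Toy sequence -> property learner (hypothetical illustration only).

Assumptions: composition features and a nearest-centroid classifier; the
labels "hydrophobic"/"charged" and all sequences are made up.
"""
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard residues in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def train(examples: List[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Average the composition vectors of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for seq, label in examples:
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, v in enumerate(composition(seq)):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(model: Dict[str, List[float]], seq: str) -> str:
    """Assign an unseen sequence to the class with the closest centroid."""
    vec = composition(seq)

    def sq_dist(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda lab: sq_dist(model[lab]))


if __name__ == "__main__":
    training = [("LLLLIVVAF", "hydrophobic"), ("AILVLLFV", "hydrophobic"),
                ("KKEEDDRRK", "charged"), ("EKDRKEDK", "charged")]
    model = train(training)
    print(predict(model, "LIVLLA"))  # prints "hydrophobic"
    print(predict(model, "KEDRKE"))  # prints "charged"
```

Real predictors replace both pieces with richer inputs (for example evolutionary information) and stronger learners, but the workflow of fitting on known examples and then applying the model to new sequences is the same.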

 

Results: For the specific tasks addressed by the FIRB-LIBI project we developed:

one predictor of the subcellular localization of a given sequence. This predictor is useful for functional prediction in eukaryotic genomes and has been applied to the large-scale analysis of 5 genomes, including the human genome;

one predictor that determines, starting from the protein sequence, whether or not a given mutation is likely to be associated with a genetic disease;

one predictor for coupling a given protease to its specific inhibitor(s);

one web server that integrates several predictors of membrane protein topology.

 

Products:

 

Amico M, Finelli M, Rossi I, Zauli A, Elofsson A, Viklund H, von Heijne G, Jones D, Krogh A, Fariselli P, Martelli PL, Casadio R. PONGO: a web server for multiple predictions of all-alpha transmembrane proteins. Nucleic Acids Res 34 (Web Server issue):169-172 (2006).

http://pongo.biocomp.unibo.it/

 

Pierleoni A, Martelli PL, Fariselli P, Casadio R. BaCelLo: a balanced subcellular localization predictor. Bioinformatics 22:e408-e416 (2006).

http://gpcr.biocomp.unibo.it/bacello/

 

Capriotti E, Calabrese R, Casadio R. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics (in press, 2006).

http://gpcr.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi

 

Pierleoni A, Martelli PL, Fariselli P, Casadio R. eSLDB: eukaryotic Subcellular Localization Data Base. Nucleic Acids Res (in press, 2007).

http://gpcr2.biocomp.unibo.it/esldb/

 

UNIROMA2 - University of Rome "Tor Vergata"

Scientific Coordinator: HELMER-CITTERICH Manuela; Dept: Biology

 

Program integration: Implementation and development of new algorithms based on machine learning methods for predicting structural and functional characteristics of proteins from their sequence.

 

Synopsis: The aim of the project is to build a web server for protein functional annotation based on local structural similarity.

 

Results: To this end, a new database of annotated residues from proteins of known structure (pdbScan) was built, in which each residue is associated with the following data:

 

1) residue name;

2) protein chain;

3) PDB code;

4) secondary structure;

5) SMART domain;

6) SCOP number;

7) CATH classification;

8) Pfam code;

9) catalytic site;

10) conservation in the HSSP alignment;

11) ligand binding ability;

12) PROSITE pattern.
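One way to hold a pdbScan entry in memory is a record with one field per data item listed above. The sketch below is hypothetical: the field names, types and XML element names are assumptions (the report does not describe pdbScan's actual schema), and the example values are placeholders. The `to_xml` method uses Python's standard `xml.etree.ElementTree` to hint at the kind of XML output a prototype might produce.

```python
"""Hypothetical in-memory record for one pdbScan residue entry.

Assumption: field names, types and XML element names are illustrative only.
"""
from dataclasses import dataclass, fields
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class ResidueRecord:
    residue_name: str                           # 1) residue name, e.g. "SER"
    chain: str                                  # 2) protein chain identifier
    pdb_code: str                               # 3) PDB entry code
    secondary_structure: Optional[str] = None   # 4) secondary structure
    smart_domain: Optional[str] = None          # 5) SMART domain
    scop_number: Optional[str] = None           # 6) SCOP number
    cath_class: Optional[str] = None            # 7) CATH classification
    pfam_code: Optional[str] = None             # 8) Pfam code
    catalytic_site: bool = False                # 9) part of a catalytic site?
    hssp_conservation: Optional[float] = None   # 10) conservation in HSSP alignment
    binds_ligand: bool = False                  # 11) ligand binding ability
    prosite_pattern: Optional[str] = None       # 12) PROSITE pattern

    def to_xml(self) -> ET.Element:
        """Serialize every available field as a child element of <residue>."""
        elem = ET.Element("residue")
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                continue  # omit annotations that are missing for this residue
            ET.SubElement(elem, f.name).text = str(value)
        return elem


if __name__ == "__main__":
    # Placeholder values, not a real pdbScan entry.
    rec = ResidueRecord("SER", "A", "1ABC", secondary_structure="H",
                        catalytic_site=True)
    print(ET.tostring(rec.to_xml(), encoding="unicode"))
```

Keeping unavailable annotations as `None` and skipping them on output lets one record type cover residues that carry only a subset of the twelve items.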

 

In its current implementation, pdbScan uses mmCIF structure files, is now able to handle modified residues (e.g. phosphorylated residues and other residues with post-translational modifications), can also take NMR data into account (by choosing one representative structure), and is fully integrated into software for local protein structure comparison based on the Query3D engine (Ausiello et al., 2005).

The prototype handles XML input and output and is being tested on a set of known cases. Interesting results were obtained in the analysis of different substrates of the Src kinase, and experimental work is in progress (in collaboration with Dr. Stefania Gonfloni) to assess their biological relevance.

 

Products:

 

UNIROMA1 - University of Rome "La Sapienza"

Scientific Coordinator: TRAMONTANO Anna; Dept: Biochemistry "Rossi Fanelli"

 

Program integration: Implementation of gene prediction methods in genomes from different organisms and identification of new genes with specific characteristics among the annotated genes.

 

Synopsis: Protein-protein interactions underlie virtually every cellular process and are crucial for many biotechnological applications. In the last few years, high-throughput technologies have produced several large-scale protein-interaction data sets for various organisms, and many interaction databases have been created by means of data-mining techniques. It is well known that interactions may be mediated by specific features, such as motifs, patches and domains. Although many efforts are underway to elucidate the role of these features in regulating the interaction network, very little is known about this on a genome scale. Data integration and computational methods are fundamental tools to gain insight into such data, to assign a confidence level to individual interactions or to entire data sets, and to obtain clues about the molecular basis of these interactions.

 

Results:

 





Means by which the results are documented

UNIBO Research Unit (http://www.biocomp.unibo.it/firb/)

 

Means

YES/NO

Description

3.1

Scientific publications;

YES

Amico et al., 2006; Pierleoni et al., 2006a; Pierleoni et al., 2006b; Capriotti et al., 2006; Ausiello et al., 2005; Ferraro et al., 2005; Ferraro et al., 2006; Marcatili and Tramontano, submitted, 2006.

3.2

Electronic publications (CD, web, etc.);

YES

http://www.biocomp.unibo.it/firb/ and related links

3.3

Critical editions, lexicons, frequency lists, etc.;

NO

3.4

Technical reports and/or projects;

YES

GRID Integration for Technological transfer: Genome comparison at large (CNAF/INFN); Massive sequence alignments (SPACI).

http://www.biocomp.unibo.it/firb/ and related links

3.5

Patents;

NO

3.6

Presentations at national conferences;

YES

BITS2006, Bologna, National Bioinformatics Conference

3.7

Presentations at international conferences;

YES

ECCB, Madrid, September 2005;

Automated Protein Function Prediction Meeting, San Diego, August 2006;

RECOMB 2006, Venice, April 2006; ISMB 2006, Fortaleza, August 2006.

3.8

Dissemination of results: information;

YES

The eProtein scientific meeting, Wellcome Trust Conference, Hinxton, UK, 2006

Participation in Research to Business 2006, Bologna, and EXPOSANITÀ 2006, Bologna

http://www.biocomp.unibo.it/firb/ and related links

3.9

Dissemination of results: education and training;

YES

European School of Genetic Medicine, 6th Course in Bioinformatics for Molecular Biologists, Bertinoro, 2006.

Bologna Winter School 2006.

Applied Bioinformatics. Tablets of bioinformatics for everybody, UTPL, Universidad Técnica Particular de Loja, Loja (Ecuador)

http://www.biocomp.unibo.it

3.10

Dissemination of results: public outreach;

YES

Ricerca scientifica ed energia del futuro (Scientific Research and the Energy of the Future), Senigallia (AN), 16/09/05-