One of the BiO centre projects

Representative Sequences DataBase project

The importance of Representatives in Protein Sequence Families

For homology detection, sequence databases can effectively be reduced to 50% mutual
sequence identity, at about 1/3 of their original size.
BiOCentre Home
BiOFTP at EBI site
BiONews
BiOWeb sources


PDF file: Bioinformatics paper, May 2000



 
(1) About: The Importance of Representatives in Protein Families
Motivation: Biological sequence databases are highly redundant for two main reasons: (1) various databanks keep redundant entries, with many identical and nearly identical sequences, and (2) natural sequences often have high sequence identities due to gene duplication. We wanted to know how many sequences can be removed before the databases start losing homology information. Can a database of sequences with mutual sequence identity of 50% or less provide the same amount of biological information as the original full database?

Results: Comparisons of nine representative sequence databases (RSDBs) derived from full protein databanks showed that the information content of a sequence database is not linearly proportional to its size. An RSDB reduced to mutual sequence identity of around 50% (RSDB50) was equivalent to the original full database in terms of the effectiveness of homology searching. It was a third of the size of the full database, which made iterative profile searching six times faster. The RSDBs are produced at different levels of granularity for efficient homology searching.
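The idea of reducing a database to a given mutual sequence identity can be illustrated with a minimal sketch. This is not the procedure used to build the RSDBs, which this page does not describe. It is a simple greedy selection under stated assumptions: the function names `percent_identity` and `select_representatives` are hypothetical, and identity is computed position by position without alignment, whereas a real pipeline would take identities from pairwise alignments.

```python
from typing import Callable, List


def percent_identity(a: str, b: str) -> float:
    """Toy identity measure (an assumption for illustration only).

    Compares two sequences position by position, without alignment, and
    returns the percentage of matching positions over the shorter length.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def select_representatives(
    seqs: List[str],
    threshold: float = 50.0,
    identity: Callable[[str, str], float] = percent_identity,
) -> List[str]:
    """Greedily keep sequences whose identity to every kept one is <= threshold.

    Sequences are visited longest first, so a long sequence represents its
    shorter near-duplicates. The result has mutual identity <= threshold.
    """
    representatives: List[str] = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(identity(s, r) <= threshold for r in representatives):
            representatives.append(s)
    return representatives
```

With `threshold=50.0` the returned list plays the role of an RSDB50-like subset: every pair of kept sequences has identity of 50% or less under the chosen identity measure. Other thresholds would give coarser or finer subsets.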
 

RSDB Home site
(2) BiO Authors
RSDB: Representative Protein Sequence Databases have high information content
Bioinformatics May 2000
Jong Park, Liisa Holm, Andreas Heger and Cyrus Chothia
(3) BiO People
(4) BiOLabs
(5) BiO Links
(6) BiO Projects
  1. Soc
  2. Bio
    1. Genome related
    2. Search_meth_comp
    3. SAT
    4. Prediction : Casp4
      1. cafasp targets summary
    5. PDB_ISL (LMB)
  3. Comp
  4. Txt

Donation for RSDB project
(7) BiO Higher level bio sites
  • The Universe and us
(8) BiO Services
  1. Web services
    1. ISS sequence search server (UK)
    2. PDB_ISL sequence search (MRC site)
    3. Persus 
    4. Project servers
      1. BioPerl
      2. BioJava
      3. BioBean
      4. BioComponent
      5. BioEntity
  2. Ordering service
  3. NNPSL : http://predict.sanger.ac.uk/nnpsl/
(9) BiO
  1. Computer related
    1. Linux
    2. Perl
  2. Protein related
  3. DNA related
(10) BiO Related sites & machines
(11) BiO References, FAQs, Presentations, books
  1. BiO Papers
  2. BiO References
  3. BiO FAQs
  4. BiO Presentations
  5. BiO Pictures
  6. BiO Utilities
  7. BiO Journals
  8. BiO Conferences/courses
  9. BiO Glossary
(12) BiOMisc

AltaVista, MetaCrawler, WebCrawler, Infoseek, InforMine (for Biology data), MedLine, Indexed
MEDLINE, New PubMed, PUBMED, IDEAL, Yahoo, DejaNews, FtpSearch, GOOGLE

Warning: this is a CopyTheft page! Restrictions Apply!! Read the link below.
CopyTheft, CopyFree, Copyleft
j@bio.cc