BiOCentre Home
PDF file: Bioinformatics paper, May 2000

(1) About: The Importance of Representatives in Protein Families
Motivation:
Biological sequence databases are highly redundant for two main reasons:
1. Various databanks keep redundant sequences, with many identical and nearly identical sequences.
2. Natural sequences often have high sequence identities due to gene duplication.
We wanted to know how many sequences can be removed before the databases start losing homology information. Can a database of sequences with mutual sequence identity of 50% or less provide us with the same amount of biological information as the original full database?

Results:
Comparisons of nine representative sequence databases (RSDBs) derived from full protein databanks showed that the information content of sequence databases is not linearly proportional to their size. An RSDB reduced to a mutual sequence identity of around 50% (RSDB50) was equivalent to the original full database in terms of the effectiveness of homology searching. It was a third of the size of the full database, which made iterative profile searching six times faster. The RSDBs are produced at different levels of granularity for efficient homology searching.
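
The idea of a database whose members share at most 50% mutual sequence identity can be illustrated with a small sketch. This is not the procedure used to build the RSDBs, which is not described on this page. It is a minimal greedy selection under stated assumptions: the function names identity and select_representatives are hypothetical, and the identity measure is a crude stand-in based on Python's difflib rather than a real pairwise alignment.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude stand-in for alignment-based sequence identity (an assumption).

    Sums the lengths of the matching blocks found by difflib and divides
    by the length of the shorter sequence.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def select_representatives(sequences: dict, threshold: float = 0.5) -> list:
    """Greedily keep a sequence only if its identity to every sequence
    already kept is at or below the threshold."""
    kept = []
    # Visit longer sequences first so they tend to become representatives.
    for name, seq in sorted(sequences.items(), key=lambda kv: -len(kv[1])):
        if all(identity(seq, other) <= threshold for _, other in kept):
            kept.append((name, seq))
    return [name for name, _ in kept]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSRQ",  # exact duplicate of "a"
        "c": "MKTAYIAKQRQISFVKSHFSRA",  # near-identical to "a"
        "d": "GGGGPPPPWWWWCCCCDDDDEE",  # unrelated
    }
    print(select_representatives(toy, threshold=0.5))
```

The all-against-kept comparison is quadratic in the number of representatives, so a sketch like this only suits small inputs. Building a representative set from a full databank would need a proper alignment method and a faster way to find candidate neighbours.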

RSDB Home site

(2) BiO Authors
RSDB: Representative Protein Sequence Databases have high information content
Bioinformatics, May 2000
Jong Park, Liisa Holm, Andreas Heger and Cyrus Chothia

(3) BiO People

(4) BiOLabs

(5) BiO Links

(6) BiO Projects
- Soc
- Bio
- Genome related
- Search_meth_comp
- SAT
- Prediction: Casp4
- cafasp targets summary
- PDB_ISL (LMB)
- Comp
- Txt