Bioinformatics paper: PDF

(1) About: The Importance of Representatives in Protein Families

Motivation:
Biological sequence databases are highly redundant for two main reasons:
1. The various databanks hold overlapping entries, including many identical and nearly identical sequences.
2. Natural sequences often have high sequence identities due to gene duplication.
We wanted to know how many sequences can be removed before the databases
start losing homology information. Can a database of sequences with mutual
sequence identity of 50% or less provide us with the same amount of
biological information as the original full database?
Results:
Comparisons of nine representative sequence databases (RSDBs) derived from
full protein databanks showed that the information content of a sequence
database is not linearly proportional to its size. An RSDB reduced to a
mutual sequence identity of around 50% (RSDB50) was equivalent to the original
full database in terms of the effectiveness of homology searching. It was
a third of the size of the full database, which made iterative profile
searching six times faster. The RSDBs are produced at different levels of
granularity for efficient homology searching.
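
The idea of reducing a database to a given mutual sequence identity can be illustrated with a small sketch. The greedy filter below is an assumption made for illustration only; the text above does not say how the RSDBs were actually built. The names `percent_identity` and `select_representatives` are hypothetical, and the identity measure is a toy one that expects pre-aligned sequences of equal length rather than a real alignment program.

```python
from typing import Callable, List


def percent_identity(a: str, b: str) -> float:
    """Toy identity measure (an assumption, not the paper's method).

    Expects two pre-aligned sequences of equal length, with '-' for gaps.
    Returns the percentage of identical residues over the columns where
    neither sequence has a gap.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


def select_representatives(
    seqs: List[str],
    threshold: float = 50.0,
    identity: Callable[[str, str], float] = percent_identity,
) -> List[str]:
    """Greedy filter: keep a sequence only if its identity to every
    representative kept so far is at or below the threshold."""
    representatives: List[str] = []
    for seq in seqs:
        if all(identity(seq, rep) <= threshold for rep in representatives):
            representatives.append(seq)
    return representatives


# Hypothetical toy database of pre-aligned sequences.
toy_db = [
    "ACDEFGHIKL",
    "ACDEFGHIKV",  # 90% identical to the first sequence, so it is dropped
    "ACDEFMNPQR",  # 50% identical to the first sequence, so it is kept
    "WYWYWYWYWY",  # shares no residues with the others, so it is kept
]

rsdb50 = select_representatives(toy_db, threshold=50.0)
print(len(toy_db), "->", len(rsdb50), "sequences")
```

A greedy pass like this depends on the input order, since earlier sequences become representatives first. Changing `threshold` gives coarser or finer sets, which is one way to picture the different levels of granularity mentioned above.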

(2) BiO Authors

Jong Park, Liisa Holm, Andreas Heger and Cyrus Chothia
The European Bioinformatics Institute, EMBL Outstation, Cambridge CB10 1SD, UK
LMB, MRC Centre, Hills Road, Cambridge, CB2 2QH, UK