The est_db software package has been developed and used at the Sanger Institute to support the Xenopus tropicalis EST Project, a collaboration between the Sanger Institute and the Wellcome/Cancer Research UK Gurdon Institute in Cambridge. To date, est_db has been used to process the nearly 400,000 ESTs sequenced as part of the project, approximately 305,000 of which passed its quality control (QC) checks and have been submitted to public databanks. More details of this project are available on the Sanger Xenopus tropicalis project page.
The est_db package is a software suite and database system designed to support expressed sequence tag (EST) sequencing projects and to provide comprehensive bioinformatic analysis of sequenced EST libraries, for gene discovery and other purposes. The database can hold and efficiently process hundreds of thousands of EST sequences, track the cDNA libraries and clones to which they belong, and store the results of their analysis. Where large compute farms are available, they can be used for the analysis.
Extensive bioinformatic analysis can be carried out on the sequenced EST libraries, including similarity (BLAST) searches, protein sequence prediction, and the import of EST clustering and assembly data from external sources. Results are searchable via a web page that presents the various analyses graphically. Users can retrieve information on a particular cDNA clone or EST read, view EST clustering results, and view graphical representations of the BLAST results for the searched EST sequences.
The est_db package is likely to appeal not only to groups directly engaged in EST sequencing, but also to groups interested in performing bespoke analysis of ESTs that may already be publicly available, in support of their ongoing research aims. The package is easily extensible via an API designed specifically to handle ESTs and their analysis. It is open source and is made available free of charge, and, where possible, similarly open-licensed components have been used in its development.