Sanger Institute

Revision as of 22:54, 23 April 2009 by Wikisysop (Talk | contribs)

About the Sanger Institute

Access the Bioinformatics Software

http://www.sanger.ac.uk/Software/

Contents

The following bioinformatics software was developed by the Sanger Institute:

  • ACT
    • a DNA sequence comparison viewer based on Artemis
  • Alfresco
    • The aim is to develop a new visualisation tool that allows effective comparative genome sequence analysis. The program will compare multiple sequences from putatively homologous regions in different species. Results from existing analysis programs, such as gene prediction, protein homology and regulatory sequence prediction programs, will be visualised and used to find corresponding sequence domains.
  • Alien Hunter
    • an application that predicts putative Horizontal Gene Transfer (HGT) events using Interpolated Variable Order Motifs (IVOMs).
  • Angler
    • A browser of C. elegans embryo development in time and space
  • Artemis
    • DNA sequence viewer and annotation tool
  • Cdna_db
    • cdna_db is a software system designed for quality-control checking of finished cDNA clone sequences and for their computational analysis. The combination of a relational database (MySQL) schema and an object-oriented Perl API makes it easy to implement high-level analyses of these transcript sequences.
  • DAS
    • The Wellcome Trust Sanger Institute supports the Distributed Annotation System (DAS) through a range of projects, websites and applications. This information resource provides an overview of them.
    • DAS is widely used to openly exchange biological annotations between distributed sites. Data distribution, performed by DAS servers, is separated from visualization, which is done by DAS clients.
  • DECIPHER
    • DECIPHER tracks submicroscopic duplications and deletions of DNA in patients together with phenotypes exhibited by those patients. DECIPHER tallies these genetic abnormalities with genes and other features of interest in the affected areas. The aim of DECIPHER is to provide a research tool to aid clinical diagnosis and treatment of these conditions. DECIPHER makes use of DAS technology to integrate with Ensembl, the world's leading genome browser.
  • Doublescan
    • Doublescan is a program for comparative ab initio prediction of protein coding genes in mouse and human DNA.
  • Eponine
    • a computational method for detecting mammalian transcription start sites
  • Est_db
    • The est_db package is a software suite and database system designed to support expressed sequence tag (EST) sequencing projects.
  • FINEX
    • The FINEX program allows sequence homology searching techniques to be applied, where the sequence data is replaced with a fingerprint abstracted from the intron/exon boundary phase and the exon length.
    • Please note FINEX is no longer supported but is available for download.
  • GAZE
    • GAZE is a tool for the integration of gene prediction signal and content sensor information into complete gene structures. It is completely configurable: both the signal and content data themselves, and the model of gene structure against which assemblies are validated and scored, are external to the system and supplied by the user.
  • Hexamer
    • Hexamer is a program to scan DNA sequences to look for likely coding regions. The principle is to use 6mers, but to avoid deriving any information from base composition. Therefore, the frequencies of each 6mer are normalized by dividing by the total frequency of all 6mers with the same base composition.
  • Illuminus
    • Illuminus is a fast and accurate algorithm for assigning single nucleotide polymorphism (SNP) genotypes to microarray data from the Illumina BeadArray technology.
  • LogoMat-M
    • Profile Hidden Markov Models (pHMMs) are a widely used tool for protein family research. We present a method to visualize all of their central aspects graphically, thus generalizing the concept of sequence logos introduced by Schneider and Stephens. For each emitting state of the pHMM, we display a stack of letters. As for sequence logos, the stack height is determined by the deviation of the position's letter emission frequencies from the background frequencies of the letters. As a new feature, the stack width now visualizes both the probability of reaching the state (the hitting probability) and the expected number of letters the state emits during a pass through the model (the expected contribution).
  • LogoMat-P
    • The problem of profile-profile comparison has a long history but has received a lot of attention recently. This is a result of the growing number of well characterised protein families in databases such as Pfam. By adding additional information about properties of the entire family, it has been shown that profile-profile methods significantly increase sensitivity compared to profile-sequence comparison.
    • The availability of advanced profile-profile comparison tools such as PRC or HHsearch demands sophisticated visualisation tools that are not presently available. We introduce an approach built upon the concept of HMM Logos. The method illustrates the similarities of pairs of protein family profiles in an intuitive way.
  • LookSeq
    • LookSeq is a web-based application for alignment visualization, browsing and analysis of genome sequence data.
    • LookSeq supports multiple sequencing technologies, alignment sources, and viewing modes; low or high-depth read pileups; and easy visualization of putative single nucleotide and structural variation. The visible range, from whole chromosome to single base resolution, can be set manually or by scrolling or zooming the display with fast, on-the-fly rendering from the server-side alignment database. LookSeq uses a universal database for alignments of different sequencing technologies and algorithms. Sequence data from multiple sources can be viewed separately or aligned in a single display, facilitating direct comparison between datasets. LookSeq can also link to relevant external sites such as PubMed and other online analysis tools, via buttons or double-clicking on the displayed sequence annotation.
  • MAPTAG
    • MAPTAG is an informatics tool that maps batches of unknown sequences to the mouse genome and, where possible, assigns gene IDs through sequence matches to the Ensembl database system. It consists of several linked sequence search modules that are controlled by a management script. Sequences are submitted for analysis in simple FASTA format, and the results are written to tab-delimited text files that can be manually or automatically imported into relational databases or PC-based desktop analysis packages such as Microsoft Excel.
  • Margarita
    • Margarita infers genealogies from population genotype data and uses these to map disease loci.
    • These genealogies take the form of the Ancestral Recombination Graph (ARG). The ARG defines a genealogical tree for each locus, and as one moves along the chromosome the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to the analysis. First, we infer plausible ARGs using a heuristic algorithm, which can handle unphased and missing data. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs.
  • NestedMICA
  • PAjHMMA
  • PSILC
  • Pfam
  • ProServer
  • Projector
  • QuickTree
  • Rfam
  • SCOOP
  • SLICE
  • Ssaha
  • Ssaha2
  • SsahaEST
  • SsahaSNP
  • StrataSplice
  • Tctool
  • Wise2
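
The normalization described in the Hexamer entry above can be illustrated with a short sketch. This is not Hexamer's own code: the function names, the use of overlapping windows, and the decision to skip windows containing ambiguous bases are all assumptions made for illustration. It only shows the stated principle, that each 6mer's frequency is divided by the total frequency of all 6mers sharing its base composition.

```python
from collections import Counter


def composition(kmer):
    """Return a key identifying the base composition of a k-mer.

    Two k-mers have the same key exactly when they contain the same
    number of each base, e.g. 'AAAAAC' and 'CAAAAA' both give 'AAAAAC'.
    """
    return "".join(sorted(kmer))


def normalized_hexamer_frequencies(sequence, k=6):
    """Count overlapping k-mers and normalize each count by the total
    count of all observed k-mers with the same base composition.

    Assumption (not from the source): windows containing anything other
    than A, C, G or T are skipped.
    """
    sequence = sequence.upper()
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))

    # Total count per base-composition class.
    totals = Counter()
    for kmer, n in counts.items():
        totals[composition(kmer)] += n

    # Dividing by the class total removes the contribution of base composition.
    return {kmer: n / totals[composition(kmer)] for kmer, n in counts.items()}


if __name__ == "__main__":
    for kmer, freq in sorted(normalized_hexamer_frequencies("AAAAACAAAAAC").items()):
        print(kmer, round(freq, 3))
```

Because every 6mer is compared only with others of identical composition, a sequence that is simply rich in one base does not by itself produce high scores; only the ordering of bases within each composition class contributes.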

The website also contains external links to:

  • AceDB
  • BioPerl
  • Dynamite
  • EMBOSS
  • Ensembl
    • The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.