Rapid and sensitive sequence similarity searches powered by parallel computing technology
NEWS:
9 April 2008: Version 5.0 of PARALIGN has been released. It is available from Sencel Bioinformatics.
22 October 2007: Searches now running on the Titan cluster.
25 January 2007: Updated graphical result overview with hit distribution graph and the option to resubmit a selected fragment of the query sequence.
This site provides a service for searching public sequence databases for sequences similar to a given query sequence. Searches can provide valuable information about gene relationships, functions and structure. Very accurate and rapid searches can be carried out because this service is based on very sensitive comparison methods and is powered by a large computer cluster.

The two comparison algorithms used are called Smith-Waterman (SW) and ParAlign. The first algorithm was published by Smith and Waterman (1981) and is a well-established method that finds the optimal local alignment of two sequences. It is generally regarded as the "gold standard" for sequence comparison and gives the best results. However, in ordinary implementations, it is very time-consuming. By exploiting parallel computing technology, we have made the Smith-Waterman method run about 8 times faster than normal. It is therefore called accelerated Smith-Waterman. This site features the fastest published implementation of the SW algorithm on any general-purpose microprocessor. For more information about the new implementation, see Rognes and Seeberg (2000).

The other algorithm, ParAlign, is a heuristic method for sequence alignment. In almost all cases ParAlign finds exactly the same alignments as the SW algorithm, but this is not guaranteed. However, the speed of ParAlign is much higher. In essence, ParAlign is about as sensitive as Smith-Waterman but runs at the speed of BLAST. For more details, please see the publication by Rognes (2001).

The parallel computing technology used is also known as multimedia technology or Single-Instruction Multiple-Data (SIMD) technology and is embedded in most modern processors, including the Pentium, PowerPC, Itanium, Alpha and similar microprocessors. The software is also adapted to computers with several processors and to clusters of several computers. The online searches are now powered by a cluster of 33 computers with 2 Intel Xeon 2.4 GHz processors each.
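To illustrate what "optimal local alignment" means in the Smith-Waterman method, the following is a minimal sketch that computes only the best local alignment score of two sequences. It is not the accelerated implementation used by this service: the function name `smith_waterman_score`, the match/mismatch scores and the linear gap penalty are all illustrative assumptions, and no SIMD parallelism is used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions (simple match/mismatch
    scores and a linear gap penalty), not the parameters of the service.
    """
    # Only the previous row of the dynamic-programming matrix is needed.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Open or extend a gap in either sequence.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Clamping at zero lets a local alignment start anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Every cell of the matrix is visited once, so the work grows with the product of the two sequence lengths. This is why a plain implementation is slow against a large database, and why computing several cells per instruction with SIMD is attractive.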
PARALIGN is designed to identify weak similarities. For identification of nearly identical nucleotide sequences, BLASTN and similar tools are faster and more appropriate. More information about PARALIGN may be found on Sencel's website and in the PARALIGN User's guide (PDF).

References
Availability

Stand-alone executables for several different computer systems are available from Sencel Bioinformatics. Free software licenses are available for non-commercial academic use. Free evaluation licenses for commercial users are also available. Contact Sencel for more information.

Development and Funding

The software and service are developed by Sencel Bioinformatics and the Bioinformatics group at the Centre for Molecular Biology and Neuroscience (CMBN) and the Institute of Medical Microbiology, Rikshospitalet-Radiumhospitalet and University of Oslo, Norway. The service is supported by the National Programme for Research in Functional Genomics in Norway (FUGE), in the Research Council of Norway. The computers are hosted by the Centre for Information Technology Services at the University of Oslo.