BLAST: Basic Local Alignment Search Tool
Submitted by
Purnima Sharma
Department of Bioinformatics
CONTENTS
Definition
History of BLAST
Types of BLAST
Types of BLAST:
Algorithm of BLAST
Some BLAST terminology
Main steps of BLAST
Parameters: w = word length of a hit; T = minimum score of a hit (for proteins: w = 3, T = 13 with BLOSUM62)
Step 1.
How BLAST works
• BLAST searches begin with a query sequence that is matched against sequence databases specified by the user.
• BLAST begins by breaking the query sequence into a series of short, overlapping “words”.
• The default word size for BLASTN is 28 nucleotides (the megablast setting; the traditional blastn task uses 11).
• The default word size for BLASTP is 3 amino acids.
• The results obtained depend on the scoring matrix used.
• BLOSUM62 is the default scoring matrix for BLASTP.
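The word-splitting step can be sketched as a sliding window that moves one residue at a time. This is a minimal illustration, not NCBI code; the function name `query_words` is hypothetical, and the word sizes are simply the defaults quoted on this slide.

```python
# Default word sizes quoted on the slide (illustrative constants).
BLASTN_WORD_SIZE = 28
BLASTP_WORD_SIZE = 3


def query_words(query: str, word_size: int) -> list[str]:
    """Return every overlapping word of length word_size (hypothetical helper)."""
    if word_size <= 0:
        raise ValueError("word_size must be positive")
    query = query.upper()
    # A query of length L yields L - word_size + 1 overlapping words,
    # or none at all if the query is shorter than one word.
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


if __name__ == "__main__":
    print(query_words("MKTAYIAK", BLASTP_WORD_SIZE))
    # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
```

Because the window advances one position at a time, neighbouring words share all but one residue, which is what lets BLAST detect a match no matter where it starts in the query.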
The basic strategy used by the BLAST algorithms
The BLASTP algorithm
• The query sequence is broken into all possible 3-letter words using a moving window.
• A numerical score is calculated for each word by adding up the BLOSUM62 values of its amino acids (each residue scored against itself).
• Words with a score of 12 or more are collected into the initial BLASTP search set.
• The search set is broadened by adding synonyms that differ from the words at one position.
• Only synonyms with scores above a threshold value are added to the search set. NCBI BLASTP uses a default threshold of 10 for synonyms.
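The search-set construction described above can be sketched in Python. This follows the slide's simplified description (self-score of at least 12, one-position synonyms scoring at least 10), not the exact NCBI neighbourhood-word procedure. The function names are hypothetical, and the matrix covers only the 20 standard amino acids.

```python
# BLOSUM62 for the 20 standard amino acids; rows and columns follow AMINO_ACIDS.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
_BLOSUM62_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""
_VALUES = [[int(v) for v in line.split()] for line in _BLOSUM62_ROWS.strip().splitlines()]
BLOSUM62 = {
    (a, b): _VALUES[i][j]
    for i, a in enumerate(AMINO_ACIDS)
    for j, b in enumerate(AMINO_ACIDS)
}


def word_score(word: str, other: str) -> int:
    """Sum the BLOSUM62 scores of two equal-length words aligned position by position."""
    return sum(BLOSUM62[(x, y)] for x, y in zip(word, other))


def initial_search_set(query: str, word_size: int = 3, min_score: int = 12) -> set[str]:
    """Collect query words whose score against themselves is at least min_score.
    Assumes an uppercase query made of the 20 standard amino acids."""
    words = {query[i:i + word_size] for i in range(len(query) - word_size + 1)}
    return {w for w in words if word_score(w, w) >= min_score}


def add_synonyms(search_set: set[str], synonym_threshold: int = 10) -> set[str]:
    """Add every word that differs from a search-set word at exactly one position
    and still scores at least synonym_threshold against that word."""
    expanded = set(search_set)
    for word in search_set:
        for pos, original in enumerate(word):
            for aa in AMINO_ACIDS:
                if aa == original:
                    continue
                synonym = word[:pos] + aa + word[pos + 1:]
                if word_score(word, synonym) >= synonym_threshold:
                    expanded.add(synonym)
    return expanded


if __name__ == "__main__":
    seeds = initial_search_set("MKWYV")
    print(sorted(seeds))  # ['KWY', 'MKW', 'WYV']
    print(len(add_synonyms(seeds)))
```

For example, the synonym WFV scores 11 + 3 + 4 = 18 against WYV and is kept, while AYV scores -3 + 7 + 4 = 8 and is rejected.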
Using this search set, BLAST scans a database and identifies word hits/matches that score above the threshold.
These short matches serve as seeds. The BLAST algorithm attempts to extend each match in both directions into the neighboring sequence.
BLAST keeps a running raw score, using scoring matrices, as it extends the matches. Each new amino acid either increases or decreases the raw score, and extension stops once the running score falls too far below the best score reached so far.
Penalties are assigned for mismatches and for gaps in the alignment of the two sequences.
The BLAST output
Includes a table with the bit scores (S) for each alignment and its E-value, or “expect value”.
The score (S) is a measure of the quality of an alignment (calculated as the sum of substitution and gap scores for each aligned residue).
The E-value (E), or expectation value, is a measure of the significance of the alignment. The E-value is the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance.
The lower the E-value, the more significant the alignment result.
Alignments with the highest bit scores and lowest E-values are listed at the top of the table.
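The link between score and E-value can be sketched with the Karlin-Altschul formula, E = K·m·n·e^(-λS), where S is the raw score, m the query length and n the total database length. The bit score S' = (λS - ln K) / ln 2 normalizes the raw score, which gives the equivalent form E = m·n·2^(-S'). The λ and K values below are assumed (commonly reported for BLOSUM62 with gap costs 11/1), and the lengths are hypothetical. Real BLAST also corrects m and n to effective lengths, so this sketch will not reproduce a BLAST report exactly.

```python
import math

# Assumed statistical parameters, commonly reported for BLOSUM62 with gap costs 11/1.
LAMBDA, K = 0.267, 0.041


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score S into bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Expected number of chance alignments scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def e_value_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Equivalent form using the bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    m, n = 300, 100_000_000  # hypothetical query length and database size (residues)
    for s in (50, 100, 150):
        print(s, round(bit_score(s, LAMBDA, K), 1), e_value(s, m, n, LAMBDA, K))
```

The printed E-values shrink rapidly as the raw score grows. This is why high-scoring alignments with low E-values appear at the top of the results table.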
Uses of BLAST:
• Identify previously characterized sequences.
• Find phylogenetically related sequences.
• Identify possible functions based on similarities to known sequences.
BLAST (Basic Local Alignment Search Tool) has numerous applications in various fields of biology and bioinformatics:
1. Sequence Alignment: BLAST is primarily used for comparing biological sequences (DNA, RNA, or protein) against databases to find regions of similarity. This is essential for identifying homologous sequences, which can provide insights into evolutionary relationships and functional similarities.
2. Functional Annotation: BLAST results can be used to annotate the function of unknown sequences by identifying similar sequences with known functions in the database. This is crucial for interpreting the biological significance of newly sequenced genes or proteins.
3. Genomic and Proteomic Analysis: BLAST can be used to analyze entire genomes or proteomes to identify genes, regulatory regions, or protein domains. It helps in understanding the organization and structure of genomes and proteomes.
4. Phylogenetic Analysis: By identifying homologous sequences across different species, BLAST can aid in phylogenetic analysis to study evolutionary relationships and infer the evolutionary history of organisms.
Applications of BLAST
5. Disease Research: BLAST is used in biomedical research to identify genetic variations associated with diseases, study gene expression patterns, and investigate the role of specific genes or proteins in disease pathways.
6. Drug Discovery: BLAST can be employed in drug discovery efforts to identify potential drug targets by comparing protein sequences of pathogens or disease-related genes to sequences of known drug targets.
7. Agricultural Biotechnology: BLAST is used in agricultural research to study crop genomes, identify genes related to desirable traits such as disease resistance or yield, and develop molecular markers for breeding purposes.
8. Microbial Ecology: BLAST is used to analyze microbial communities in various environments, such as soil, water, or the human microbiome, by comparing sequences obtained from environmental samples to reference databases.
9. Forensic Analysis: BLAST can be used in forensic biology to compare DNA sequences obtained from crime scenes to databases of known DNA sequences to identify suspects or victims.
10. Virology: BLAST is used to study the genetic diversity of viruses, identify novel viruses, and understand viral evolution and transmission dynamics.
Overall, BLAST is an indispensable tool in molecular biology and bioinformatics, playing a vital role in a wide range of research areas and applications.