Puneet Wadhwa's BIOINFORMATICS BLOG

Thursday, October 27, 2005

Multiple Sequence Alignment of DNA and Proteins - An introduction


Introduction:
In some of the previous articles on BLAST, we went through the basic principles of sequence alignment. In this article, we will look at some of the principles of multiple sequence alignment and explore some of the software commonly used for it.

Why perform Multiple Sequence Alignment:
Let us first look at why you would want to perform a multiple sequence alignment. Multiple alignment can be used to study evolutionary relationships between related proteins. Because evolutionary changes between gene sequences are incremental, we can take homologous genes, i.e. genes with a common evolutionary origin, from a diverse range of organisms and compare them by aligning identical or similar residues. Comparing these related genes shows which regions have been conserved over evolutionary time and which have been more tolerant of mutation. This is very useful for designing experiments to test and modify the function of specific proteins, for predicting the function and structure of proteins, and for identifying new members of protein families.

Multiple Sequence Alignment programs and techniques:

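Progressive strategies for multiple alignment: A common approach to multiple sequence alignment is to build the alignment progressively from pairwise alignments. First, two sequences are selected and aligned to each other; this alignment is then used as the starting point for adding each subsequent sequence in turn. In programs such as ClustalW, the order in which sequences are added is taken from a guide tree, so that the most similar sequences are aligned first.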
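The progressive strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ClustalW's implementation: it uses a hypothetical toy scoring scheme (match +1, mismatch -1, linear gap -2), adds sequences in the order given rather than following a guide tree, and scores each new residue against the average of the residues already in a column. The helper names `column_score`, `align_to_profile` and `progressive_align` are made up for this example.

```python
from typing import List

# Hypothetical toy scores; ClustalW itself uses substitution matrices
# and affine gap penalties.
MATCH, MISMATCH, GAP = 1, -1, -2


def column_score(column: str, residue: str) -> float:
    """Average score of `residue` against the non-gap residues of one column."""
    scores = [MATCH if c == residue else MISMATCH for c in column if c != "-"]
    return sum(scores) / len(scores) if scores else 0.0


def align_to_profile(profile: List[str], seq: str) -> List[str]:
    """Needleman-Wunsch global alignment of `seq` against an existing alignment."""
    n, m = len(profile[0]), len(seq)
    columns = ["".join(row[i] for row in profile) for i in range(n)]

    # score[i][j] = best score for the first i columns vs the first j residues
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1]),
                score[i - 1][j] + GAP,  # gap in the new sequence
                score[i][j - 1] + GAP,  # new gap column in the old rows
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    rows: List[List[str]] = [[] for _ in profile]
    new_row: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        diagonal = i > 0 and j > 0 and (
            score[i][j]
            == score[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1])
        )
        if diagonal:
            for r, row in enumerate(profile):
                rows[r].append(row[i - 1])
            new_row.append(seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            for r, row in enumerate(profile):
                rows[r].append(row[i - 1])
            new_row.append("-")
            i -= 1
        else:
            for chars in rows:
                chars.append("-")
            new_row.append(seq[j - 1])
            j -= 1
    return ["".join(reversed(r)) for r in rows] + ["".join(reversed(new_row))]


def progressive_align(seqs: List[str]) -> List[str]:
    """Align the first two sequences, then add each later one to the alignment."""
    alignment = [seqs[0]]
    for seq in seqs[1:]:
        alignment = align_to_profile(alignment, seq)
    return alignment


for row in progressive_align(["GATTACA", "GATACA", "GATTGCA"]):
    print(row)
```

Because rows that are already aligned are never realigned, a gap placed early stays in the final result ("once a gap, always a gap"), which is why the order in which sequences are added matters.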

One of the most popular programs for multiple sequence alignment is ClustalW, a general-purpose multiple alignment program for DNA or proteins. It computes an alignment of the selected sequences and lines them up so that their similarities and differences can be seen. It also generates a cladogram, which can be useful for studying the evolutionary relationships among the sequences.
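One way to make the "similarities and differences" concrete is to mark the columns where every sequence agrees. ClustalW prints a line like this beneath each alignment block, using `*` for identical columns and `:` and `.` for columns with strongly and weakly similar residues. The hypothetical helper below is a simplified sketch that only handles the identical (`*`) case.

```python
from typing import List


def conservation_line(alignment: List[str]) -> str:
    """Mark columns that are gap-free and identical in every row with '*'."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*alignment):
        conserved = "-" not in column and len(set(column)) == 1
        marks.append("*" if conserved else " ")
    return "".join(marks)


alignment = ["GATTACA", "GA-TACA", "GATTGCA"]
for row in alignment:
    print(row)
print(conservation_line(alignment))
```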

You can run ClustalW either by downloading and installing it on your local machine or online at http://www.ebi.ac.uk/clustalw/. To download the software, visit ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/.
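A local installation can also be driven from a script. The sketch below assumes the executable is called `clustalw` and is on your PATH (some installations name it `clustalw2`), and it uses the Unix-style options `-INFILE=`, `-ALIGN` and `-OUTFILE=`; check the option list printed by your installed version before relying on it. The wrapper functions are hypothetical helpers, not part of ClustalW.

```python
import shutil
import subprocess
from typing import List


def clustalw_command(infile: str, outfile: str, executable: str = "clustalw") -> List[str]:
    """Build an argument list using ClustalW's Unix-style -OPTION=value syntax."""
    return [executable, f"-INFILE={infile}", "-ALIGN", f"-OUTFILE={outfile}"]


def run_clustalw(infile: str, outfile: str, executable: str = "clustalw") -> None:
    """Run a local ClustalW install; raises if it is missing or exits with an error."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on the PATH")
    subprocess.run(clustalw_command(infile, outfile, executable), check=True)


print(" ".join(clustalw_command("my_sequences.fasta", "my_sequences.aln")))
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with file names that contain spaces.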

We will look at how to run ClustalW using EBI's online ClustalW server at the above location. To run ClustalW, you need a set of sequences in FASTA format, which is simply a header line beginning with ">" followed by the sequence name/description, with the sequence itself on the following line(s).
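A minimal FASTA reader shows how little structure the format has. This is a sketch rather than a full parser: `parse_fasta` is a hypothetical helper, the example sequences are made up, and it joins sequences that are wrapped over several lines, as FASTA files commonly are.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs, joining sequence lines that wrap."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


EXAMPLE = """>seq1 hypothetical example sequence
GATTACA
>seq2 hypothetical example sequence
GATA
CA
"""

for header, seq in parse_fasta(EXAMPLE):
    print(header, seq)
```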

Let us leave the rest of the parameters at their defaults; if you want, you may also enter your email address so that the results are emailed to you. When ClustalW finishes running, it produces four files: the output file (.output), the alignment file in plain text (.aln), the guide tree file (.dnd), and your input file (.input). ClustalW also displays the tree built from the alignment, either as a phylogenetic tree or as a cladogram; you can switch between the two from the options menu (right-click) of the Java applet.
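The .aln file uses the plain-text Clustal format: a header line starting with `CLUSTAL`, then blocks in which each line holds a sequence name and a segment of its aligned sequence, with a conservation line beneath each block. A minimal reader might look like the sketch below, assuming this layout and ignoring anything that follows a segment on the same line (such as optional residue counts). `parse_clustal_aln` and the example text are hypothetical.

```python
from typing import Dict


def parse_clustal_aln(text: str) -> Dict[str, str]:
    """Return {sequence name: aligned sequence} from Clustal-format text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    rows: Dict[str, str] = {}
    for line in lines[1:]:
        if not line.strip() or line[0].isspace():
            continue  # blank separator or conservation line (starts with spaces)
        fields = line.split()
        if len(fields) < 2:
            continue
        name, segment = fields[0], fields[1]  # extra fields are ignored
        rows[name] = rows.get(name, "") + segment  # dicts keep insertion order
    return rows


EXAMPLE_ALN = """CLUSTAL W (1.83) multiple sequence alignment


seq1            GATTACA
seq2            GA-TACA
                ** ****

seq1            TTG
seq2            TCG
                * *
"""

for name, aligned in parse_clustal_aln(EXAMPLE_ALN).items():
    print(name, aligned)
```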

The difference between a cladogram and a phylogenetic tree is as follows. A phylogenetic tree is a branching diagram (tree) in which branch lengths are proportional to the amount of inferred evolutionary change. A cladogram is a tree in which all branches are drawn with equal length; cladograms therefore show common ancestry but do not indicate the amount of evolutionary "time".
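The guide tree file (.dnd) stores the tree in Newick format, and the distinction above shows up directly in that notation: branch lengths are written after a colon, and leaving them out keeps only the branching pattern, as in a cladogram. The sketch below builds a tree from a distance matrix with UPGMA and prints it both ways. UPGMA is swapped in because it is short; ClustalW builds its guide tree with neighbor-joining, and UPGMA assumes a constant rate of change across lineages. The distances and the helper `upgma_newick` are hypothetical.

```python
from typing import Dict, List, Tuple


def upgma_newick(names: List[str], dist: List[List[float]], branch_lengths: bool = True) -> str:
    """Cluster with UPGMA and return a Newick string, with or without branch lengths."""
    if not names:
        raise ValueError("need at least one sequence")
    # cluster id -> (Newick text, number of leaves, height of the cluster's root)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    # distances between clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda item: item[1])  # closest pair
        (ta, na, ha), (tb, nb, hb) = clusters.pop(a), clusters.pop(b)
        height = dab / 2  # the new node sits halfway between the merged pair
        if branch_lengths:
            text = f"({ta}:{height - ha:g},{tb}:{height - hb:g})"
        else:
            text = f"({ta},{tb})"
        del d[(a, b)]
        for k in clusters:  # size-weighted average distance to the merged cluster
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (na * dak + nb * dbk) / (na + nb)
        clusters[next_id] = (text, na + nb, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma_newick(names, dist))                        # phylogram style
print(upgma_newick(names, dist, branch_lengths=False))  # cladogram style
```

Both strings describe the same branching pattern; only the first says how much change lies along each branch.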

2 Comments:

  • Please see the references by Notredame et al. and Poirot et al. on the performance of the COFFEE MSA programs (COFFEE, T-COFFEE, 3D-COFFEE) compared with CLUSTAL and others.

    Laurence Frabotta, PhD
    Queens College, CUNY

    By Anonymous Anonymous, at 11:38 AM  

  • bioinformatics tutorial for beginners...
    http://www.bioxist.com

    By Anonymous macx, at 5:48 AM  
