Phylogenetics/Alignment Preparation/Global Alignment

Global alignment methods attempt to align the entire length of two sequences. Without models for sequence evolution, these methods are very general and can be applied to any sequence (not just biological) similarity.

The first algorithm to discuss is the Needleman-Wunsch.

requires: seq1 = first sequence; seq2 = second sequence
//scoring matrices values
match = 1
mismatch = -1
gap = -1
sc_matrix //initialize scoring matrix the size of which is the largest sequence length
p_matrix //initialize pointing matrix the size of which is the largest sequence length
sc_matrix[0][0] = 0
p_matrix[0][0] = 0 // 0 = no where; 1 = left; 2 = right; 3 = down; 4 = up; diagonal = 5;
for i=0 to length of seq1
  sc_matrix[0][i] = gap * i
  p_matrix[0][i] = 1
end for
for i=0 to length of seq2
  sc_matrix[i][0] = gap * i
  p_matrix[i][0] = 1
end for
for i=0 to length of seq2
  for j=0 to length of seq1
    subseq1 <= seq1[j]
    subseq2 <= seq2[i]
    if subseq1 = subseq2 then
      diag <= sc_matrix[i][j] + match
    else
      diag <= sc_matrix[i][j] + mismatch
    end if
    up <= sc_matrix[i][j] + gap
    left <= sc_matrix[i][j] + gap
    if diag >= up then
      if diag>= left then
        sc_matrix[i][j] <= diag
        p_matrix[i][j] <= 5
      else
        sc_matrix[i][j] <= left
        p_matrix[i][j] <= 1
      end if
    else
      if diag>= left then
        sc_matrix[i][j] <= up
        p_matrix[i][j] <= 4
      else
        sc_matrix[i][j] <= left
        p_matrix[i][j] <= 1
    end if
  end for
end for
al_seq1 <= ""
al_seq2 <= ""
keep_going <= true
i <= seq1.length
j <= seq2.length
while keep_going = true
  if p_matrix = 0
    keep_going = false
  end if
  if p_matrix[i][j] = 5 then
    al_seq1.add(seq1[i])
    al_seq2.add(seq1[j])
    i <= i - 1
    j <= j - 1
  else if p_matrix[i][j] = 1 then
    al_seq1.add(seq1[i])
    al_seq2.add("-")
    i <= i - 1
  else if p_matrix[i][j] = 4 then
    al_seq1.add("-")
    al_seq2.add(seq1[j])
    j <= j - 1   
  end if
end while
al_seq1 <= al_seq1.reverse
al_seq2 <= al_seq2.reverse