Phylogenetics/Alignment Preparation/Global Alignment
Global alignment methods attempt to align the entire length of two sequences. Without models for sequence evolution, these methods are very general and can be applied to any sequence (not just biological) similarity.
The first algorithm to discuss is the Needleman-Wunsch.
requires: seq1 = first sequence; seq2 = second sequence //scoring matrices values match = 1 mismatch = -1 gap = -1 sc_matrix //initialize scoring matrix the size of which is the largest sequence length p_matrix //initialize pointing matrix the size of which is the largest sequence length sc_matrix[0][0] = 0 p_matrix[0][0] = 0 // 0 = no where; 1 = left; 2 = right; 3 = down; 4 = up; diagonal = 5; for i=0 to length of seq1 sc_matrix[0][i] = gap * i p_matrix[0][i] = 1 end for for i=0 to length of seq2 sc_matrix[i][0] = gap * i p_matrix[i][0] = 1 end for for i=0 to length of seq2 for j=0 to length of seq1 subseq1 <= seq1[j] subseq2 <= seq2[i] if subseq1 = subseq2 then diag <= sc_matrix[i][j] + match else diag <= sc_matrix[i][j] + mismatch end if up <= sc_matrix[i][j] + gap left <= sc_matrix[i][j] + gap if diag >= up then if diag>= left then sc_matrix[i][j] <= diag p_matrix[i][j] <= 5 else sc_matrix[i][j] <= left p_matrix[i][j] <= 1 end if else if diag>= left then sc_matrix[i][j] <= up p_matrix[i][j] <= 4 else sc_matrix[i][j] <= left p_matrix[i][j] <= 1 end if end for end for al_seq1 <= "" al_seq2 <= "" keep_going <= true i <= seq1.length j <= seq2.length while keep_going = true if p_matrix = 0 keep_going = false end if if p_matrix[i][j] = 5 then al_seq1.add(seq1[i]) al_seq2.add(seq1[j]) i <= i - 1 j <= j - 1 else if p_matrix[i][j] = 1 then al_seq1.add(seq1[i]) al_seq2.add("-") i <= i - 1 else if p_matrix[i][j] = 4 then al_seq1.add("-") al_seq2.add(seq1[j]) j <= j - 1 end if end while al_seq1 <= al_seq1.reverse al_seq2 <= al_seq2.reverse