Entering edit mode

Coghlan, Avril
▴
190

@coghlan-avril-3810
Last seen 7.1 years ago

Dear Patrick, all,
I have noticed what I think may perhaps be a bug in the
pairwiseAlignment() function in Biostrings.
I want to use it to calculate the optimal pairwise alignment between
these two strings:
s1 = "TCAGTTGCCAAACCCGCT"
s2 = "AGGGTTGACATCCGTTTT"
I want to use a match score of 10, a mismatch penalty of -10, a gap
opening penalty of -12 and a gap extension penalty of -3. I typed:
s1 <- "TCAGTTGCCAAACCCGCT"
s2 <- "AGGGTTGACATCCGTTTT"
library(Biostrings)
sigma <- nucleotideSubstitutionMatrix(match = 10, mismatch = -10,
baseOnly = TRUE)
pairwiseAlignment(s1, s2, substitutionMatrix = sigma, gapOpening =
-12, gapExtension = -3,
scoreOnly = FALSE, type="local")
The output I get is:
Local PairwiseAlignedFixedSubject (1 of 1)
pattern: [4] GTTGCCAA
subject: [4] GTTGACAT
score: 52
If you look at the alignment printed out as the optimal alignment, it
has 6 matches and 2 mismatches and so should have a score of
(6*10)+(2*-10) = 60-20 = 40. This is different from the score printed
out (52).
When I use my own (home-made) Smith-Waterman R function to calculate
the optimal alignment, it gives the optimal alignment score as 52 (in
agreement with Biostrings), but gives a different optimal alignment:
GTTGCCAAACCCG
GTTG--ACATCCG
If you look at this alignment, it has 9 matches, two mismatches, and
one gap of length two, and so the alignment score should be
(9*10)+(2*-10)+(-12-3-3) = 52, which is the score that Biostrings says
that the optimal alignment should have.
Therefore, I am wondering whether Biostrings is calculating the
optimal alignment and optimal alignment score correctly (52 here), but
is printing out the wrong alignment as the optimal one ?
I would be grateful for your comments on this, I'm wondering if I have
misunderstood something.
One last small comment on the pairwiseAlignment() function - it took
me a while to realise that when you specify a "gap opening penalty"
(eg. -12 here) and "gap extension penalty" (-3 here) in its arguments,
if you have a gap of length two then the first position in the gap is
given a score of -12-3=-15. This took me a little while to figure out
(I don't think it's clear from the Biostrings documentation), since in
most textbooks the "gap opening penalty" means the score given to the
first position in a gap (eg. see Deonier "Computational Genome
Analysis page 161) and so I had expected that specifying a "gap
opening penalty" of -12 would mean that the score for the first
position in the gap would be -12. It would be good if you made this
clearer in the documentation for the pairwiseAlignment() function.
Kind Regards,
Avril
Dept. Microbiology
University College Cork
Ireland
[[alternative HTML version deleted]]