pairwise alignment¶
abutils can perform several types of pairwise alignments, including global, local, and semi-global alignments.
All pairwise alignment functions return a subclass of the abutils.tl.PairwiseAlignment object. The return
classes for each pairwise alignment type are identical, with the exception of the alignment_function and alignment_type
properties, so the user can easily switch between alignment methods with minimal changes to their code.
alignment type |
function |
|---|---|
local |
|
global |
|
semi-global |
examples¶
local sequence alignment
The query and target sequences for all abutils pairwise alignment functions can be an abutils.Sequence`
object, or anything accepted by abutils.Sequence. The following example uses sequence strings,
performs local alignment, and prints the alignment to the console.
Note
If sequences are provided as strings, a random sequence ID will be assigned to each sequence.
import abutils
# input sequences, as strings
seq1 = 'ATGCATGCATGC'
seq2 = 'ATGCATGCATGC'
# calculate and print the alignment
aln = abutils.tl.local_alignment(seq1, seq2)
print(aln)
global sequence alignment with custom scoring parameters
All abutils pairwise alignment functions accept custom scoring parameters. These parameters are:
match: the score for a match (default is3)
mismatch: the score for a mismatch (default is-2)
gap_open: the penalty for opening a gap (default is5)
gap_extend: the penalty for extending a gap (default is2)
Alignment functions can also accept any parasail similarity matrix, passed
to the matrix argument. The following example uses the blosum62 matrix:
import abutils
# input sequences, as strings
seq1 = 'ATGCATGCATGC'
seq2 = 'ATGCATGCATGC'
# calculate and print the alignment
aln = abutils.tl.global_alignment(
seq1,
seq2,
matrix='blosum62',
)
semi-global alignment against multiple targets and selecting the best match
All abutils pairwise alignment functions can align the same query sequence against multiple target sequences
using the targets argument. This provides a moderate speed increase and avoids the need to loop through the targets.
The following example aligns a single query sequence against multiple target sequences, sorts the resulting list of
alignments (which, by default, sorts by alignment score), and selects the top scoring alignment.
import abutils
# query and target sequences
query = 'ATGCATGCATGC'
targets = abutils.io.read_fasta('path/to/targets.fasta')
alns = abutils.tl.semi_global_alignment(
query,
targets=targets
)
# get the highest scoring alignment
best_aln = sorted(alns, reverse=True)[0]
api¶
- abutils.tl.local_alignment(query: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable, target: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable | None = None, targets: ~typing.Iterable | None = None, match: int = 3, mismatch: int = -2, gap_open: int = -5, gap_extend: int = -2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, alignment_function: ~typing.Callable = <function sw_trace_striped_16>, aa: bool = False, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None) LocalAlignment | Iterable¶
Striped Smith-Waterman local pairwise sequence alignment.
- Parameters:
query (str, SeqRecord, Sequence, or list) –
- Query sequence. Can be any of the following:
a nucleotide or amino acid sequence, as a string
a Biopython
SeqRecordobjectan abutils
Sequenceobjecta
listortupleof the format[seq_id, sequence]
target (str, SeqRecord, Sequence, or list, default=None) – Target sequence. Can be anything accepted by query. One of target or targets must be provided.
target – Iterable of multiple target sequences. A
listortupleof anything accepted by query. One of target or targets must be provided.match (int, default=3) – Match score. Should be a positive integer.
mismatch (int, default=-2) – Mismatch score. Typically should be less than or equal to
0.gap_open (int, default=-5) – Gap open score. Must be less than or equal to
0.gap_extend (int, default=-2) – Gap extension score. Must be less than or equal to
0.matrix (str, dict or parasail.Matrix, default=None) –
- Scoring matrix. Can be either:
the name of a
parasailbuilt-in sbustitution matrix.path to a matrix file (in a format accepted by
parasail).a
parasail.Matrixobject
If not provided, a matrix will be created using match and mismatch.
alignment_function (Callable, default=parasail.sw_trace_striped_16) –
parasaillocal alignment function to be used for alignment.aa (bool, default=False) – Not used. Retained for backwards compatibility.
gap_open_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_open insteaed.
gap_extend_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_exted insteaed.
- Returns:
alignment – If a single target sequence is provided (via
target), a singleLocalAlignmentobject will be returned. If multiple target sequences are supplied (viatargets), a list ofLocalAlignmentobjects will be returned.- Return type:
LocalAlignment or iterable
- abutils.tl.global_alignment(query: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable, target: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable | None = None, targets: ~typing.Iterable | None = None, match: int = 3, mismatch: int = -2, gap_open: int = -5, gap_extend: int = -2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, alignment_function: ~typing.Callable = <function nw_trace_striped_16>, aa: bool = False, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None) GlobalAlignment | Iterable¶
Needleman-Wunch global pairwise sequence alignment.
- Parameters:
query (str, SeqRecord, Sequence, or list) –
- Query sequence. Can be any of the following:
a nucleotide or amino acid sequence, as a string
a Biopython
SeqRecordobjectan abutils
Sequenceobjecta
listortupleof the format[seq_id, sequence]
target (str, SeqRecord, Sequence, or list, default=None) – Target sequence. Can be anything accepted by query. One of target or targets must be provided.
target – Iterable of multiple target sequences. A
listortupleof anything accepted by query. One of target or targets must be provided.match (int, default=3) – Match score. Should be a positive integer.
mismatch (int, default=-2) – Mismatch score. Typically should be less than or equal to
0.gap_open (int, default=-5) – Gap open score. Must be less than or equal to
0.gap_extend (int, default=-2) – Gap extension score. Must be less than or equal to
0.matrix (str, dict or parasail.Matrix, default=None) –
- Scoring matrix. Can be either:
the name of a
parasailbuilt-in substitution matrix.path to a matrix file (in a format accepted by
parasail).a
parasail.Matrixobject
If not provided, a matrix will be created using match and mismatch.
alignment_function (Callable, default=parasail.nw_trace_striped_16) –
parasailglobal alignment function to be used for alignment.aa (bool, default=False) – Not used. Retained for backwards compatibility.
gap_open_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_open insteaed.
gap_extend_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_exted insteaed.
- Returns:
alignment – If a single target sequence is provided (via
target), a singleGlobalAlignmentobject will be returned. If multiple target sequences are supplied (viatargets), a list ofGlobalAlignmentobjects will be returned.- Return type:
GlobalAlignment or iterable
- abutils.tl.semiglobal_alignment(query: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable, target: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable | None = None, targets: ~typing.Iterable | None = None, match: int = 3, mismatch: int = -2, gap_open: int = -5, gap_extend: int = -2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, alignment_function: ~typing.Callable = <function sg_trace_striped_16>, aa: bool = False, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None) SemiGlobalAlignment | Iterable¶
Semi-global pairwise sequence alignment.
- Parameters:
query (str, SeqRecord, Sequence, or list) –
- Query sequence. Can be any of the following:
a nucleotide or amino acid sequence, as a string
a Biopython
SeqRecordobjectan abutils
Sequenceobjecta
listortupleof the format[seq_id, sequence]
target (str, SeqRecord, Sequence, or list, default=None) – Target sequence. Can be anything accepted by query. One of target or targets must be provided.
target – Iterable of multiple target sequences. A
listortupleof anything accepted by query. One of target or targets must be provided.match (int, default=3) – Match score. Should be a positive integer.
mismatch (int, default=-2) – Mismatch score. Typically should be less than or equal to
0.gap_open (int, default=-5) – Gap open score. Must be less than or equal to
0.gap_extend (int, default=-2) – Gap extension score. Must be less than or equal to
0.matrix (str, dict or parasail.Matrix, default=None) –
- Scoring matrix. Can be either:
the name of a
parasailbuilt-in substitution matrix.path to a matrix file (in a format accepted by
parasail).a
parasail.Matrixobject
If not provided, a matrix will be created using match and mismatch.
alignment_function (Callable, default=parasail.sw_trace_striped_16) –
parasailsemi-global alignment function to be used for alignment.aa (bool, default=False) – Not used. Retained for backwards compatibility.
gap_open_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_open insteaed.
gap_extend_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_exted insteaed.
- Returns:
alignment – If a single target sequence is provided (via
target), a singleSemiGlobalAlignmentobject will be returned. If multiple target sequences are supplied (viatargets), a list ofSemiGlobalAlignmentobjects will be returned.- Return type:
SemiGlobalAlignment or iterable
- class abutils.tl.PairwiseAlignment(query: Sequence, target: Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None)¶
Base class for local, global, and semi-global pairwise alignments.
Note
All comparisons between
PairwiseAlignmentobjects are performed using thescoreattribute. This was done so that sorting alignments like so:list_of_alignments.sort(reverse=True)
produces a sorted list of alignments from the highest alignment score to the lowest.
- query(Sequence)¶
Sequenceobject.- Type:
The input query sequence, as an AbTools
- target(Sequence)¶
Sequenceobject.- Type:
The input target sequence, as an AbTools
- target_id(str)¶
- Type:
ID of the target sequence.
- raw_query¶
- Type:
The raw query, before conversion to a
Sequence.
- raw_target¶
- Type:
The raw target, before conversion to a
Sequence.
- __init__(query: Sequence, target: Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None)¶
- property cigar: CIGAR¶
Alignment CIGAR, as a
CIGARobject.- Returns:
CIGAR object representing the alignment.
- Return type:
CIGAR
- property target_id: str¶
Sequence ID of the target sequence.
- class abutils.tl.LocalAlignment(query: ~abutils.core.sequence.Sequence, target: ~abutils.core.sequence.Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None, alignment_function: ~typing.Callable = <function sw_trace_striped_16>)¶
docstring for LocalAlignment
- class abutils.tl.GlobalAlignment(query: ~abutils.core.sequence.Sequence, target: ~abutils.core.sequence.Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None, alignment_function: ~typing.Callable = <function nw_trace_striped_16>)¶
docstring for LocalAlignment
- class abutils.tl.SemiGlobalAlignment(query: ~abutils.core.sequence.Sequence, target: ~abutils.core.sequence.Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None, alignment_function: ~typing.Callable = <function sg_trace_striped_16>)¶
docstring for LocalAlignment