pairwise alignment

abutils can perform several types of pairwise alignments, including global, local, and semi-global alignments. All pairwise alignment functions return a subclass of the abutils.tl.PairwiseAlignment object. The return classes for each pairwise alignment type are identical, with the exception of the alignment_function and alignment_type properties, so the user can easily switch between alignment methods with minimal changes to their code.


examples

local sequence alignment

The query and target sequences for all abutils pairwise alignment functions can be an abutils.Sequence` object, or anything accepted by abutils.Sequence. The following example uses sequence strings, performs local alignment, and prints the alignment to the console.

Note

If sequences are provided as strings, a random sequence ID will be assigned to each sequence.

import abutils

# input sequences, as strings
seq1 = 'ATGCATGCATGC'
seq2 = 'ATGCATGCATGC'

# calculate and print the alignment
aln = abutils.tl.local_alignment(seq1, seq2)
print(aln)

global sequence alignment with custom scoring parameters

All abutils pairwise alignment functions accept custom scoring parameters. These parameters are:

  • match: the score for a match (default is 3)

  • mismatch: the score for a mismatch (default is -2)

  • gap_open: the penalty for opening a gap (default is 5)

  • gap_extend: the penalty for extending a gap (default is 2)

Alignment functions can also accept any parasail similarity matrix, passed to the matrix argument. The following example uses the blosum62 matrix:

import abutils

# input sequences, as strings
seq1 = 'ATGCATGCATGC'
seq2 = 'ATGCATGCATGC'

# calculate and print the alignment
aln = abutils.tl.global_alignment(
    seq1,
    seq2,
    matrix='blosum62',
)

semi-global alignment against multiple targets and selecting the best match

All abutils pairwise alignment functions can align the same query sequence against multiple target sequences using the targets argument. This provides a moderate speed increase and avoids the need to loop through the targets. The following example aligns a single query sequence against multiple target sequences, sorts the resulting list of alignments (which, by default, sorts by alignment score), and selects the top scoring alignment.

import abutils

# query and target sequences
query = 'ATGCATGCATGC'
targets = abutils.io.read_fasta('path/to/targets.fasta')

alns = abutils.tl.semi_global_alignment(
    query,
    targets=targets
)

# get the highest scoring alignment
best_aln = sorted(alns, reverse=True)[0]

api

abutils.tl.local_alignment(query: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable, target: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable | None = None, targets: ~typing.Iterable | None = None, match: int = 3, mismatch: int = -2, gap_open: int = -5, gap_extend: int = -2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, alignment_function: ~typing.Callable = <function sw_trace_striped_16>, aa: bool = False, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None) LocalAlignment | Iterable

Striped Smith-Waterman local pairwise sequence alignment.

Parameters:
  • query (str, SeqRecord, Sequence, or list) –

    Query sequence. Can be any of the following:
    1. a nucleotide or amino acid sequence, as a string

    2. a Biopython SeqRecord object

    3. an abutils Sequence object

    4. a list or tuple of the format [seq_id, sequence]

  • target (str, SeqRecord, Sequence, or list, default=None) – Target sequence. Can be anything accepted by query. One of target or targets must be provided.

  • target – Iterable of multiple target sequences. A list or tuple of anything accepted by query. One of target or targets must be provided.

  • match (int, default=3) – Match score. Should be a positive integer.

  • mismatch (int, default=-2) – Mismatch score. Typically should be less than or equal to 0.

  • gap_open (int, default=-5) – Gap open score. Must be less than or equal to 0.

  • gap_extend (int, default=-2) – Gap extension score. Must be less than or equal to 0.

  • matrix (str, dict or parasail.Matrix, default=None) –

    Scoring matrix. Can be either:

    If not provided, a matrix will be created using match and mismatch.

  • alignment_function (Callable, default=parasail.sw_trace_striped_16) – parasail local alignment function to be used for alignment.

  • aa (bool, default=False) – Not used. Retained for backwards compatibility.

  • gap_open_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_open insteaed.

  • gap_extend_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_exted insteaed.

Returns:

alignment – If a single target sequence is provided (via target), a single LocalAlignment object will be returned. If multiple target sequences are supplied (via targets), a list of LocalAlignment objects will be returned.

Return type:

LocalAlignment or iterable

abutils.tl.global_alignment(query: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable, target: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable | None = None, targets: ~typing.Iterable | None = None, match: int = 3, mismatch: int = -2, gap_open: int = -5, gap_extend: int = -2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, alignment_function: ~typing.Callable = <function nw_trace_striped_16>, aa: bool = False, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None) GlobalAlignment | Iterable

Needleman-Wunch global pairwise sequence alignment.

Parameters:
  • query (str, SeqRecord, Sequence, or list) –

    Query sequence. Can be any of the following:
    1. a nucleotide or amino acid sequence, as a string

    2. a Biopython SeqRecord object

    3. an abutils Sequence object

    4. a list or tuple of the format [seq_id, sequence]

  • target (str, SeqRecord, Sequence, or list, default=None) – Target sequence. Can be anything accepted by query. One of target or targets must be provided.

  • target – Iterable of multiple target sequences. A list or tuple of anything accepted by query. One of target or targets must be provided.

  • match (int, default=3) – Match score. Should be a positive integer.

  • mismatch (int, default=-2) – Mismatch score. Typically should be less than or equal to 0.

  • gap_open (int, default=-5) – Gap open score. Must be less than or equal to 0.

  • gap_extend (int, default=-2) – Gap extension score. Must be less than or equal to 0.

  • matrix (str, dict or parasail.Matrix, default=None) –

    Scoring matrix. Can be either:

    If not provided, a matrix will be created using match and mismatch.

  • alignment_function (Callable, default=parasail.nw_trace_striped_16) – parasail global alignment function to be used for alignment.

  • aa (bool, default=False) – Not used. Retained for backwards compatibility.

  • gap_open_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_open insteaed.

  • gap_extend_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_exted insteaed.

Returns:

alignment – If a single target sequence is provided (via target), a single GlobalAlignment object will be returned. If multiple target sequences are supplied (via targets), a list of GlobalAlignment objects will be returned.

Return type:

GlobalAlignment or iterable

abutils.tl.semiglobal_alignment(query: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable, target: str | ~Bio.SeqRecord.SeqRecord | ~abutils.core.sequence.Sequence | ~typing.Iterable | None = None, targets: ~typing.Iterable | None = None, match: int = 3, mismatch: int = -2, gap_open: int = -5, gap_extend: int = -2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, alignment_function: ~typing.Callable = <function sg_trace_striped_16>, aa: bool = False, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None) SemiGlobalAlignment | Iterable

Semi-global pairwise sequence alignment.

Parameters:
  • query (str, SeqRecord, Sequence, or list) –

    Query sequence. Can be any of the following:
    1. a nucleotide or amino acid sequence, as a string

    2. a Biopython SeqRecord object

    3. an abutils Sequence object

    4. a list or tuple of the format [seq_id, sequence]

  • target (str, SeqRecord, Sequence, or list, default=None) – Target sequence. Can be anything accepted by query. One of target or targets must be provided.

  • target – Iterable of multiple target sequences. A list or tuple of anything accepted by query. One of target or targets must be provided.

  • match (int, default=3) – Match score. Should be a positive integer.

  • mismatch (int, default=-2) – Mismatch score. Typically should be less than or equal to 0.

  • gap_open (int, default=-5) – Gap open score. Must be less than or equal to 0.

  • gap_extend (int, default=-2) – Gap extension score. Must be less than or equal to 0.

  • matrix (str, dict or parasail.Matrix, default=None) –

    Scoring matrix. Can be either:

    If not provided, a matrix will be created using match and mismatch.

  • alignment_function (Callable, default=parasail.sw_trace_striped_16) – parasail semi-global alignment function to be used for alignment.

  • aa (bool, default=False) – Not used. Retained for backwards compatibility.

  • gap_open_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_open insteaed.

  • gap_extend_penalty (int, default=None) – Depricated, but retained for backwards compatibility. Use gap_exted insteaed.

Returns:

alignment – If a single target sequence is provided (via target), a single SemiGlobalAlignment object will be returned. If multiple target sequences are supplied (via targets), a list of SemiGlobalAlignment objects will be returned.

Return type:

SemiGlobalAlignment or iterable


class abutils.tl.PairwiseAlignment(query: Sequence, target: Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None)

Base class for local, global, and semi-global pairwise alignments.

Note

All comparisons between PairwiseAlignment objects are performed using the score attribute. This was done so that sorting alignments like so:

list_of_alignments.sort(reverse=True)

produces a sorted list of alignments from the highest alignment score to the lowest.

query(Sequence)

Sequence object.

Type:

The input query sequence, as an AbTools

target(Sequence)

Sequence object.

Type:

The input target sequence, as an AbTools

target_id(str)
Type:

ID of the target sequence.

raw_query
Type:

The raw query, before conversion to a Sequence.

raw_target
Type:

The raw target, before conversion to a Sequence.

__init__(query: Sequence, target: Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None)
property cigar: CIGAR

Alignment CIGAR, as a CIGAR object.

Returns:

CIGAR object representing the alignment.

Return type:

CIGAR

property target_id: str

Sequence ID of the target sequence.


class abutils.tl.LocalAlignment(query: ~abutils.core.sequence.Sequence, target: ~abutils.core.sequence.Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None, alignment_function: ~typing.Callable = <function sw_trace_striped_16>)

docstring for LocalAlignment

class abutils.tl.GlobalAlignment(query: ~abutils.core.sequence.Sequence, target: ~abutils.core.sequence.Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None, alignment_function: ~typing.Callable = <function nw_trace_striped_16>)

docstring for LocalAlignment

class abutils.tl.SemiGlobalAlignment(query: ~abutils.core.sequence.Sequence, target: ~abutils.core.sequence.Sequence, match: int = 3, mismatch: int = -2, gap_open: int = 5, gap_extend: int = 2, matrix: str | ~parasail.bindings_v2.Matrix | None = None, gap_open_penalty: int | None = None, gap_extend_penalty: int | None = None, alignment_function: ~typing.Callable = <function sg_trace_striped_16>)

docstring for LocalAlignment