overview#
With technical breakthroughs in the throughput and read-length of next-generation sequencing platforms, antibody repertoire sequencing is becoming an increasingly important tool for detailed characterization of the immune response to infection and immunization. Accordingly, there is a need for open, scalable software for the genetic analysis of repertoire-scale antibody sequence data.
We built abutils
to provide a cohesive set of tools designed for the specific challenges inherent in
working with antibody repertoire data. The components in abutils
were designed to be flexible:
equally at home when used used interactively (in a Jupyter Notebook, for example) or when
integrated into more complex programs and/or pipelines (such as abstar, which is capable of annotating
billions of antibody sequences).
core models#
To represent antibody repertoire data at varying levels of granularity, abutils provides three core models:
Sequence
, Pair
, and Lineage
. These models are heirarchical – a Lineage
is composed of one
or more Pair
objects, a Pair
is composed of one or more Sequence
objects – and contain methods
appropriate for each level of granularity.
Sequence
: model for representing a single antibody sequnce (either heavy or light chain). Provides a means to store and access abstar annotations. Includes common methods of sequence manipulation, including slicing, reverse-complement, and conversion to FASTA format. TheSequence
object is used extensively throughout the ab[x] toolkit.
Pair
: model for representing paired (heavy and light) antibody sequences. Comprised of one or moreSequence
objects.
Lineage
: model for representing an antibody clonal lineage. Comprised of one or morePair
objects. Includes methods for lineage manipulation, including generating dot alignments and UCA calculation.
tools#
In addition to the core models, abutils provides a number of commonly used functions. These functions are widely used throughput the ab[x] toolkit and are suitable for incorporation into custom pipelines or for use when performing interactive analyses:
alignment: local (Smith-Waterman), global (Needleman-Wunsch) and semi-global pairwise sequence alignment, as well as multiple sequence alignment using MAFFT or MUSCLE
clustering: identity-based sequence clustering with VSEARCH or MMseqs2
phylogeny: computing lineage phylogenies with FastTree or IgPhyML, tree drawing with baltic
plots#
abutils
provides a