abutils: utilities for AIRR analysis#

With technical breakthroughs in the throughput and read length of next-generation sequencing platforms, adaptive immune receptor repertoire (AIRR) sequencing has become invaluable for detailed characterization of the immune response to infection and immunization. Accordingly, there is a need for open, scalable software for the genetic analysis of adaptive immune receptor data at repertoire scale.

We built abutils to provide a cohesive set of tools that address the specific challenges inherent in working with adaptive immune receptor repertoire data. The components in abutils were designed to be flexible: equally at home when used interactively (in a Jupyter Notebook, for example) or when integrated into more complex programs or pipelines (such as abstar, which is capable of annotating billions of adaptive immune receptor sequences).

core models#

To represent antibody repertoire data at varying levels of granularity, abutils provides three core models:

``Sequence``: model for representing a single antibody sequence (either heavy or light chain). Provides a means to store and access abstar annotations. Includes common methods of sequence manipulation, including slicing, reverse-complement, and conversion to FASTA format. The Sequence object is used extensively throughout the ab[x] toolkit.

``Pair``: model for representing paired (heavy and light) antibody sequences. Composed of one or more Sequence objects.

``Lineage``: model for representing an antibody clonal lineage. Composed of one or more Pair objects. Includes methods for lineage manipulation, including generating dot alignments and UCA calculation.

These models are hierarchical – a Lineage is composed of one or more Pair objects, a Pair is composed of one or more Sequence objects – and contain methods appropriate for each level of granularity.
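The containment relationship between the three models can be illustrated with a minimal sketch. The classes below (``SketchSequence``, ``SketchPair``, ``SketchLineage``) and all of their attribute and method names are hypothetical stand-ins written for this illustration; they are not the abutils classes and make no claim about the actual abutils API.

```python
from dataclasses import dataclass, field
from typing import List

# Complement table for unambiguous DNA bases plus N (IUPAC ambiguity codes omitted for brevity).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class SketchSequence:
    """Hypothetical stand-in for a single heavy- or light-chain sequence."""
    id: str
    sequence: str
    annotations: dict = field(default_factory=dict)  # e.g. annotation fields keyed by name

    def __getitem__(self, key):
        # Slicing returns the corresponding part of the raw sequence string.
        return self.sequence[key]

    def __len__(self):
        return len(self.sequence)

    def reverse_complement(self) -> str:
        # Complement every base, then reverse the result.
        return self.sequence.translate(_COMPLEMENT)[::-1]

    def fasta(self) -> str:
        return f">{self.id}\n{self.sequence}"


@dataclass
class SketchPair:
    """Hypothetical stand-in for a heavy/light pair: one or more sequences."""
    sequences: List[SketchSequence]

    def __post_init__(self):
        if not self.sequences:
            raise ValueError("a pair needs at least one sequence")


@dataclass
class SketchLineage:
    """Hypothetical stand-in for a clonal lineage: one or more pairs."""
    pairs: List[SketchPair]

    def __post_init__(self):
        if not self.pairs:
            raise ValueError("a lineage needs at least one pair")

    def all_sequences(self) -> List[SketchSequence]:
        # Flatten the hierarchy: lineage -> pairs -> sequences.
        return [seq for pair in self.pairs for seq in pair.sequences]


heavy = SketchSequence("heavy_1", "ATGCCG")
light = SketchSequence("light_1", "GGATCC")
lineage = SketchLineage([SketchPair([heavy, light])])

print(heavy[0:3])                  # ATG
print(heavy.reverse_complement())  # CGGCAT
print(heavy.fasta())
print(len(lineage.all_sequences()))  # 2
```

Each level only holds references to the level below it, so sequence-level operations such as slicing remain available no matter how the objects are grouped.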

tools#

In addition to the core models, abutils provides a number of commonly used functions. These functions are used throughout the ab[x] toolkit and can be easily integrated into custom pipelines or used in interactive analyses:

pairwise alignment: local (Smith-Waterman), global (Needleman-Wunsch) and semi-global pairwise sequence alignment

multiple sequence alignment: with MAFFT or MUSCLE

clustering: identity-based sequence clustering with VSEARCH, CD-HIT, or MMseqs2

phylogeny: computing lineage phylogenies with FastTree or IgPhyML, tree drawing with baltic
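For readers unfamiliar with global alignment, the Needleman-Wunsch recurrence can be sketched in a few lines of plain Python. This is an illustration of the algorithm only, not the abutils implementation: the function name ``needleman_wunsch_score`` and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for the example, and only the optimal score is computed, without a traceback.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal global alignment score of a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        # Column 0: aligning a prefix of a against the empty prefix of b.
        curr = [i * gap]
        for j, char_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if char_a == char_b else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
print(needleman_wunsch_score("ACGT", "ACT"))         # 1
```

Local (Smith-Waterman) alignment follows the same recurrence but clamps every cell at zero and takes the maximum over the whole matrix, while semi-global alignment leaves end gaps unpenalized.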

plots#

abutils provides a
