abutils: utilities for AIRR analysis

Antibody repertoire sequencing is an increasingly important tool for detailed characterization of the immune response to infection and immunization. We built abutils to provide a cohesive set of tools designed for the specific challenges of working with antibody repertoire data. The components in abutils were designed to be flexible: equally at home when used interactively (in a Jupyter Notebook, for example) or when integrated into more complex programs and pipelines (such as abstar, which is capable of annotating billions of antibody sequences).


core models

To represent antibody repertoire data at varying levels of granularity, abutils provides three core models:

  • Sequence: model for representing a single antibody sequence (either heavy or light chain). Provides a means to store and access abstar annotations. Includes common methods of sequence manipulation, including slicing, reverse-complement, and conversion to FASTA format. The Sequence object is used extensively throughout the ab[x] toolkit.

  • Pair: model for representing paired (heavy and light) antibody sequences. Composed of one or more Sequence objects. Heavily used in scab, our toolkit for analyzing adaptive immune single cell datasets.

  • Lineage: model for representing an antibody clonal lineage. Composed of one or more Pair objects. Includes methods for lineage manipulation, including generating dot alignments and UCA calculation.

These models are hierarchical: a Lineage is composed of one or more Pair objects, and a Pair is composed of one or more Sequence objects. Each model provides methods appropriate to its level of granularity.
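To make the Sequence, Pair, and Lineage hierarchy concrete, here is a minimal sketch built only on the Python standard library. It is not the abutils implementation: the class names (MiniSequence, MiniPair, MiniLineage), their attributes, and the chain() helper are hypothetical stand-ins chosen for illustration, and the real abutils classes may differ in names and signatures.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Translation table for complementing nucleotides (N is left unchanged).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class MiniSequence:
    """Hypothetical stand-in for a single heavy- or light-chain sequence."""
    id: str
    sequence: str
    annotations: dict = field(default_factory=dict)  # e.g. annotation fields

    def __getitem__(self, key):
        # Slicing returns the corresponding part of the nucleotide string.
        return self.sequence[key]

    @property
    def reverse_complement(self) -> str:
        return self.sequence.translate(_COMPLEMENT)[::-1]

    @property
    def fasta(self) -> str:
        return f">{self.id}\n{self.sequence}"


@dataclass
class MiniPair:
    """Hypothetical stand-in: one or more sequences belonging together."""
    sequences: List[MiniSequence]

    def chain(self, locus: str) -> Optional[MiniSequence]:
        # Return the first sequence annotated with the requested locus.
        for seq in self.sequences:
            if seq.annotations.get("locus") == locus:
                return seq
        return None


@dataclass
class MiniLineage:
    """Hypothetical stand-in: one or more pairs from one clonal lineage."""
    pairs: List[MiniPair]

    @property
    def size(self) -> int:
        return len(self.pairs)


heavy = MiniSequence("h1", "ATGGCCTA", {"locus": "IGH"})
light = MiniSequence("l1", "GGCATTAC", {"locus": "IGK"})
pair = MiniPair([heavy, light])
lineage = MiniLineage([pair])

print(heavy.fasta)
print(heavy[0:3], heavy.reverse_complement)
```

Each level only knows about the level directly beneath it, which is what lets methods live at the granularity where they make sense: sequence manipulation on the sequence, chain lookup on the pair, and lineage-wide operations on the lineage.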


tools (abutils.tl)

In addition to the core models, abutils provides a number of commonly used functions. These functions are widely used throughout the ab[x] toolkit and can be easily integrated into custom pipelines or used in interactive analyses.

All of the tool functions are accessible via abutils.tl.


plots (abutils.pl)

abutils provides a number of plotting functions for visualizing antibody repertoire data. These functions are built on top of matplotlib and seaborn and are designed to be easily integrated into custom analyses or pipelines. Plotting functions are designed to work with Sequence, Pair, and Lineage objects, and fully support AIRR-C annotation formats for plotting adaptive immune receptor features like CDR3 length distributions and germline gene usage.

All of the plotting functions are accessible via abutils.pl.
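As an illustration of the features mentioned above, the following sketch computes the numbers that sit behind a CDR3 length distribution and a germline V gene usage plot. It does not call abutils.pl; it uses only the standard library, and the helper names (cdr3_length_counts, v_gene_usage) and the toy records are assumptions made for this example. The dictionary keys v_call and cdr3_aa follow the AIRR-C Rearrangement field names.

```python
from collections import Counter

# Toy annotations keyed by AIRR-C Rearrangement field names.
records = [
    {"sequence_id": "s1", "v_call": "IGHV1-69*01", "cdr3_aa": "ARDYYGSGSY"},
    {"sequence_id": "s2", "v_call": "IGHV3-23*01", "cdr3_aa": "AKDLGWELLPFDY"},
    {"sequence_id": "s3", "v_call": "IGHV1-69*06", "cdr3_aa": "ARGGYSSGWY"},
]


def cdr3_length_counts(records):
    """Count sequences per CDR3 amino acid length (hypothetical helper)."""
    return Counter(len(r["cdr3_aa"]) for r in records if r.get("cdr3_aa"))


def v_gene_usage(records):
    """Fraction of sequences per V gene, ignoring alleles (hypothetical helper)."""
    genes = [
        # Keep the first call if several are listed, then drop the allele.
        r["v_call"].split(",")[0].split("*")[0]
        for r in records
        if r.get("v_call")
    ]
    counts = Counter(genes)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


lengths = cdr3_length_counts(records)
usage = v_gene_usage(records)
for length in sorted(lengths):
    print(f"CDR3 length {length}: {lengths[length]} sequence(s)")
for gene, frequency in sorted(usage.items()):
    print(f"{gene}: {frequency:.2f}")
```

The resulting counts and frequencies can be passed to any bar plot function, for example one from matplotlib or seaborn.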

utilities

abutils also provides a number of utility functions that are generally useful when working with antibody repertoire data. These include functions for monitoring multiprocessing jobs, creating and modifying color palettes, and others.

  • jobs: functions for monitoring multiprocessing jobs

  • colors: functions for working with colors and color palettes

  • path: functions for working with file paths
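As a rough picture of what monitoring multiprocessing jobs involves, the sketch below polls a list of asynchronous results and reports progress until all have finished. It is not the abutils jobs module: wait_for_jobs and gc_content are hypothetical names introduced here, and the example uses multiprocessing.pool.ThreadPool so that it runs without pickling concerns. The AsyncResult objects returned by multiprocessing.Pool.apply_async expose the same ready() and get() methods.

```python
import time
from multiprocessing.pool import ThreadPool


def wait_for_jobs(async_results, poll_interval=0.05, report=print):
    """Poll AsyncResult objects until all are ready (hypothetical helper)."""
    total = len(async_results)
    last_reported = -1
    while True:
        done = sum(r.ready() for r in async_results)
        if done != last_reported:
            # Only report when the number of finished jobs changes.
            report(f"{done}/{total} jobs finished")
            last_reported = done
        if done == total:
            break
        time.sleep(poll_interval)
    # get() re-raises any exception that occurred inside a worker.
    return [r.get() for r in async_results]


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide string (toy workload)."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


sequences = ["ATGC", "GGCC", "ATAT"]
with ThreadPool(2) as pool:
    jobs = [pool.apply_async(gc_content, (s,)) for s in sequences]
    results = wait_for_jobs(jobs)

print(results)
```

Results come back in submission order because the list of AsyncResult objects is read in the order the jobs were queued, regardless of which worker finishes first.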

