clonify

abutils provides functions for assigning antibody sequences to clonal lineages using the clonify_ [Briney16] algorithm. This algorithm uses a combination of CDR3 sequence similarity and shared somatic hypermutation patterns to group sequences into B cell clonal lineages.

The primary function is abutils.tl.clonify(), which handles lineage assignment at scale, with support for different input/output formats and parallel processing.


lineage assignment method

function

Clonify algorithm

abutils.tl.clonify()

Pairwise distance calculation

abutils.tl.pairwise_distance()

examples

basic lineage assignment

clonify() can accept a variety of input formats, including paths to AIRR-formatted TSV files, Parquet files, or lists of abutils.Sequence objects.

import abutils

# clonal assignment using default parameters
lineages = abutils.tl.clonify(
    sequences='path/to/airr_data.tsv',
    output_path='path/to/output_with_lineages.tsv',
    verbose=True
)

customizing lineage assignment parameters

You can customize the parameters that control lineage assignment sensitivity and specificity.

import abutils

# customize lineage assignment parameters
lineages = abutils.tl.clonify(
    sequences='path/to/airr_data.tsv',
    output_path='path/to/output_with_lineages.tsv',
    distance_cutoff=0.32,             # stricter distance threshold
    shared_mutation_bonus=0.4,        # increased bonus for shared mutations
    length_penalty_multiplier=2.5,    # increased penalty for CDR3 length differences
    group_by_v=True,                  # group by V-gene before assignment
    group_by_j=True,                  # group by J-gene before assignment
    verbose=True
)

working with paired heavy and light chain data

For paired data, you can use light chain information in the lineage assignment process.

import abutils

# lineage assignment with paired heavy/light chain data
lineages = abutils.tl.clonify(
    sequences='path/to/paired_data.parquet',
    output_path='path/to/output_with_lineages.parquet',
    output_fmt='parquet',
    group_by_light_chain_vj=True,     # also group by light chain V/J genes
    n_processes=8                     # use 8 processes for parallel computation
)

api

abutils.tools.clonify()
[Briney16]

Bryan Briney, Khoa Le, Jiang Zhu, and Dennis R Burton. Clonify: unseeded antibody lineage assignment from next-generation sequencing data. Scientific Reports 2016. https://doi.org/10.1038/srep23901