pair

Pair objects are used to represent one or more Sequence objects that together represent a single mAb or TCR.


instantiation

An individual Pair object can be instantiated directly from a list of Sequence objects that belong to the same mAb or TCR:

import abutils

# create a pair from a list of sequences
pair = abutils.Pair([heavy_sequence, light_sequence])

Or, more commonly, from a larger list of Sequence objects for which the appropriate pairing needs to be determined from the sequence names, using assign_pairs():

import abutils

# batch create pairs from a list of annotated sequences
sequences = abutils.io.read_airr('path/to/sequences.tsv')
pairs = abutils.tl.assign_pairs(sequences)
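
To make the idea of name-based pairing concrete, the following is a minimal, library-free sketch of grouping annotated records by a shared name. It is not how abutils implements assign_pairs(). The helper group_by_name() and the toy records are hypothetical, and the "locus" field is assumed to follow AIRR naming.

```python
from collections import defaultdict

# Toy annotated records (hypothetical data); "locus" is assumed to follow AIRR naming.
records = [
    {"sequence_id": "mAb1", "locus": "IGH"},
    {"sequence_id": "mAb1", "locus": "IGK"},
    {"sequence_id": "mAb2", "locus": "IGH"},
]


def group_by_name(records, id_key="sequence_id"):
    """Group records that share the same value of id_key.

    Hypothetical helper for illustration only; not part of abutils.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[id_key]].append(rec)
    return dict(groups)


groups = group_by_name(records)
for name, members in groups.items():
    # Each group holds the chains that would be assembled into one pair.
    print(name, [m["locus"] for m in members])
```

In this sketch, "mAb1" collects a heavy and a light chain, while "mAb2" has only a heavy chain and so would not count as a true pair.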

assign_pairs() makes a specific accommodation for sequence data derived from 10x Genomics single-cell sequencing: the CSV-formatted annotation file generated by CellRanger (typically named "filtered_contig_annotations.csv") can be supplied, and this additional metadata will be added to Sequence objects before they are assembled into Pair objects:

import abutils

# batch create pairs from a list of annotated 10x Genomics sequences
sequences = abutils.io.read_airr('path/to/sequences.tsv')
pairs = abutils.tl.assign_pairs(
    sequences,
    tenx_annot_file='path/to/filtered_contig_annotations.csv'
)

api

class abutils.core.pair.Pair(sequences: Iterable[dict | Sequence], name: str | None = None, chain_selection_func: Callable | None = None, properties: dict | None = None)

Holds one or more Sequence objects, corresponding to a paired mAb or TCR.

Initialize a Pair object.

Parameters:
  • sequences (Iterable[Union[dict, Sequence]]) – A list of sequence objects, each containing sequence information.

  • name (Optional[str], default=None) – The name of the pair.

  • chain_selection_func (Optional[Callable], default=None) – A function that takes a list of sequences and orders them to determine the “correct” heavy and light chains in cases for which multiple heavy or light chains exist. If not provided, chains are prioritized in the order provided.

  • properties (Optional[dict], default=None) – A dictionary of additional properties to add to the Pair object.

abutils.core.pair.assign_pairs(seqs: Iterable[dict | Sequence], id_key: str = 'sequence_id', delim: str | None = None, delim_occurance: int = 1, pairs_only: bool = False, chain_selection_func: Callable | None = None, tenx_annot_file: str | None = None) → Iterable[Pair]

Assigns sequences to the appropriate mAb pair, based on the sequence name.

Parameters:
  • seqs (Iterable[Union[dict, Sequence]]) – List of sequence objects, of the format returned by querying a MongoDB containing Abstar output.

  • id_key (str, default="sequence_id") – The dict key of the field to be used to group the sequences into pairs. Default is "sequence_id".

  • delim (Optional[str], default=None) – An optional delimiter used to truncate the contents of the name field. Default is None, which results in no name truncation.

  • delim_occurance (int, default=1) – The occurrence of the delimiter at which to trim. Trimming is performed as delim.join(name.split(delim)[:delim_occurance]), so setting delim_occurance to -1 will truncate at the last occurrence of delim. Default is 1.

  • pairs_only (bool, default=False) – If True, only truly paired sequences (pair.is_pair == True) are returned. Default is False.

  • chain_selection_func (Optional[Callable], default=None) – A function that takes a list of sequences and returns a single sequence. Default is None, which results in the first sequence in the list being returned.

  • tenx_annot_file (Optional[str], default=None) – A path to a 10x Genomics annotations file. If provided, the UMIs and 10x annotations will be added to the sequence annotations.

Returns:

pairs – A list of Pair objects, one for each mAb pair.

Return type:

Iterable[Pair]
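
The trimming expression given for delim and delim_occurance can be tried in isolation. The sketch below wraps that expression in a hypothetical helper, trim_name(), which is not part of abutils. The contig-style names are made-up examples.

```python
def trim_name(name, delim=None, delim_occurance=1):
    """Apply the documented trimming expression to a sequence name.

    Hypothetical helper for illustration only; not part of abutils.
    """
    if delim is None:
        # No delimiter means the name is left untouched.
        return name
    return delim.join(name.split(delim)[:delim_occurance])


names = ["cell01_contig_1", "cell01_contig_2", "cell02_contig_1"]

# Keep everything before the first "_": contigs from one cell share a name.
first = [trim_name(n, delim="_") for n in names]
print(first)  # ['cell01', 'cell01', 'cell02']

# delim_occurance=-1 drops only the final "_"-separated field.
last = [trim_name(n, delim="_", delim_occurance=-1) for n in names]
print(last)  # ['cell01_contig', 'cell01_contig', 'cell02_contig']
```

With these made-up names, trimming at the first delimiter gives both contigs from a cell the same name, which is what allows them to be grouped together.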