sequence I/O

abutils provides a set of functions for reading and writing sequence data to and from various file formats. Additionally, we can convert lists of Pair and Sequence objects to and from Pandas or Polars DataFrames.


sequence annotations

abutils follows the AIRR-C standard for sequence annotations. In tabular format, such as tab-delimited (the official AIRR format), CSV, or Parquet, sequence annotations appear as follows, with one sequence per row:

sequence_id    sequence    sequence_aa
-----------    --------    -----------
sequence1      ATCG…       EVQLVE…
sequence2      ATCG…       QVQLVE…
sequence3      ATCG…       EVQLVE…
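As a rough illustration of this one-sequence-per-row layout, the sketch below parses a small tab-delimited table with Python's standard csv module. It does not use abutils, and the sequence values are shortened placeholders rather than real annotations.

```python
import csv
import io

# Tab-delimited text with one sequence per row.
# The values are placeholders, not real antibody sequences.
airr_text = (
    "sequence_id\tsequence\tsequence_aa\n"
    "sequence1\tATCG\tEVQLVE\n"
    "sequence2\tATCG\tQVQLVE\n"
)

# Each row becomes a dict keyed by the column (annotation field) name.
rows = list(csv.DictReader(io.StringIO(airr_text), delimiter="\t"))
ids = [row["sequence_id"] for row in rows]
print(ids)
```

Because every row is keyed by field name, adding further annotation columns does not change how existing fields are accessed.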


pair annotations

abutils uses a custom extension of the AIRR-C standard for pair annotations. Each row contains one heavy/light chain pair. All AIRR-C fields are supported for each chain: heavy chain annotation fields are suffixed with ":0" and light chain annotation fields are suffixed with ":1". Additionally, a name field allows the pair to be named independently of either chain's sequence ID:

name     sequence_id:0      sequence:0    sequence_id:1      sequence:1
-----    ---------------    ----------    ---------------    ----------
pair1    sequence1_heavy    ATGC…         sequence1_light    ATGC…
pair2    sequence2_heavy    ATGC…         sequence2_light    ATGC…
pair3    sequence3_heavy    ATGC…         sequence3_light    ATGC…
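The suffix convention can be sketched in plain Python. The helpers below (make_pair_row and split_pair_row) are hypothetical names used only for illustration and are not part of abutils; they simply show how per-chain fields might be merged into, and recovered from, a single pair row.

```python
def make_pair_row(name, heavy, light):
    """Merge per-chain annotation dicts into one pair row (hypothetical helper)."""
    row = {"name": name}
    # Heavy chain fields get the ":0" suffix, light chain fields get ":1".
    row.update({f"{key}:0": value for key, value in heavy.items()})
    row.update({f"{key}:1": value for key, value in light.items()})
    return row


def split_pair_row(row):
    """Recover the pair name and per-chain dicts from a pair row."""
    heavy = {k[:-2]: v for k, v in row.items() if k.endswith(":0")}
    light = {k[:-2]: v for k, v in row.items() if k.endswith(":1")}
    return row.get("name"), heavy, light


# Placeholder values, not real sequences.
heavy = {"sequence_id": "sequence1_heavy", "sequence": "ATGC"}
light = {"sequence_id": "sequence1_light", "sequence": "ATGC"}

pair_row = make_pair_row("pair1", heavy, light)
name, heavy_out, light_out = split_pair_row(pair_row)
print(list(pair_row))
```

Keeping the chain index in the column name means a pair table stays flat, so it can be stored in the same CSV or Parquet layout as single-sequence annotations.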


read

format     function          notes
-------    --------------    ----------------------------------
FASTA/Q    read_fastx()      returns a list of Sequence objects
FASTA/Q    parse_fastx()     yields single Sequence objects
FASTA      read_fasta()      returns a list of Sequence objects
FASTA      parse_fasta()     yields single Sequence objects
FASTQ      read_fastq()      returns a list of Sequence objects
FASTQ      parse_fastq()     yields single Sequence objects
AIRR       read_airr()       only supports Sequence objects
Parquet    read_parquet()    supports Sequence or Pair objects
CSV        read_csv()        supports Sequence or Pair objects
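The difference between the list-returning read_* functions and the yielding parse_* functions can be sketched with a minimal FASTA parser. This is not the abutils implementation: iter_fasta and load_fasta are hypothetical names, and plain (id, sequence) tuples stand in for Sequence objects.

```python
import io


def iter_fasta(handle):
    """Yield (sequence_id, sequence) tuples one record at a time."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def load_fasta(handle):
    """Return every record as a list (everything is held in memory)."""
    return list(iter_fasta(handle))


fasta_text = ">seq1 example description\nATGC\nGGCC\n>seq2\nTTAA\n"

records = load_fasta(io.StringIO(fasta_text))
first = next(iter_fasta(io.StringIO(fasta_text)))
print(records)
```

A generator of this kind lets large files be processed one record at a time, whereas the list form is more convenient when the whole dataset fits in memory.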

write

format     function        notes
-------    ------------    ---------------------------------
FASTA      to_fasta()      supports Sequence or Pair objects
FASTQ      to_fastq()      supports Sequence or Pair objects
AIRR       to_airr()       only supports Sequence objects
Parquet    to_parquet()    supports Sequence or Pair objects
CSV        to_csv()        supports Sequence or Pair objects

convert

format    function         notes
------    -------------    ---------------------------------
Pandas    to_pandas()      supports Sequence or Pair objects
Pandas    from_pandas()    supports Sequence or Pair objects
Polars    to_polars()      supports Sequence or Pair objects
Polars    from_polars()    supports Sequence or Pair objects
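Conceptually, converting annotated sequences to a DataFrame means reshaping per-record annotations into columns, and converting back means the reverse. The standard-library sketch below shows only that reshaping; it does not call abutils, pandas, or Polars, and records_to_columns and columns_to_records are hypothetical helper names.

```python
def records_to_columns(records):
    """Turn a list of row dicts into a dict of equal-length column lists."""
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    # Fields missing from a record become None so all columns stay aligned.
    return {field: [record.get(field) for record in records] for field in fields}


def columns_to_records(columns):
    """Turn a dict of equal-length column lists back into row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Placeholder annotations; the second record carries an extra field.
records = [
    {"sequence_id": "sequence1", "sequence": "ATCG"},
    {"sequence_id": "sequence2", "sequence": "ATCG", "sequence_aa": "QVQLVE"},
]

columns = records_to_columns(records)
restored = columns_to_records(columns)
print(columns)
```

Records that lack a field end up with a null value in that column, which is the usual behaviour to expect when sequences with different annotation sets share one table.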