path

The abutils.io module contains functions for working with file and directory paths. These are mainly convenience functions to facilitate common tasks like creating directories, deleting files, renaming files, and splitting/concatenating files.


function                        description
abutils.io.make_dir()           Creates a directory
abutils.io.list_files()         Lists files in a directory
abutils.io.rename_file()        Renames a file
abutils.io.delete_file()        Deletes a file
abutils.io.concatenate_files()  Concatenates multiple files into a single file
abutils.io.split_parquet()      Splits a parquet file into multiple files
abutils.io.split_fastx()        Splits a FASTA or FASTQ file into multiple files
abutils.io.split_fasta()        Splits a FASTA file into multiple files
abutils.io.split_fastq()        Splits a FASTQ file into multiple files

api

abutils.io.make_dir(directory: str) → None

Creates a directory if it doesn’t already exist.

Parameters:

directory (str) – Path to a directory.
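For readers who want to see the behavior concretely, the following is a minimal stdlib sketch, not the abutils implementation. make_dir_sketch is a hypothetical name, and the sketch assumes that missing parent directories should be created too, as os.makedirs does:

```python
import os
import tempfile


def make_dir_sketch(directory: str) -> None:
    # Create the directory (and any missing parents); do nothing if it already exists.
    os.makedirs(directory, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "results", "run1")
    make_dir_sketch(target)
    make_dir_sketch(target)  # a second call is a no-op rather than an error
    print(os.path.isdir(target))  # True
```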

abutils.io.list_files(directory: str, extension: str | Iterable | None = None, recursive: bool = False, match: str | None = None, ignore_dot_files: bool = True) → Iterable[str]

Lists files in a given directory.

Parameters:
  • directory (str) – Path to a directory. If a file path is passed instead, the returned list of files will contain only that file path.

  • extension (str or list of str, optional) – If supplied, only files that end with the specified extension(s) will be returned. Extension matching is case-insensitive and works with compound extensions (e.g. ‘.fastq.gz’). Default is None, which returns all files regardless of extension.

  • recursive (bool, default=False) – If True, the directory will be searched recursively, and all files in all subdirectories will be returned.

  • match (str, optional) – If supplied, only files that match the specified pattern will be returned. Regular expressions are supported.

  • ignore_dot_files (bool, default=True) – If True, dot files (hidden files) will be ignored.

Return type:

Iterable[str]
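The filtering rules above can be approximated with the standard library as follows. This is an illustrative sketch, not abutils’ code: list_files_sketch is a hypothetical name, and both the sorted return order and matching the regex against the file name only (rather than the full path) are assumptions:

```python
import os
import re
import tempfile
from typing import Iterable, List, Optional, Union


def list_files_sketch(
    directory: str,
    extension: Optional[Union[str, Iterable[str]]] = None,
    recursive: bool = False,
    match: Optional[str] = None,
    ignore_dot_files: bool = True,
) -> List[str]:
    # A file path (rather than a directory) comes back as a one-item list.
    if os.path.isfile(directory):
        return [directory]
    if recursive:
        paths = [os.path.join(root, name)
                 for root, _dirs, names in os.walk(directory) for name in names]
    else:
        paths = [os.path.join(directory, name) for name in os.listdir(directory)]
        paths = [p for p in paths if os.path.isfile(p)]
    if ignore_dot_files:
        paths = [p for p in paths if not os.path.basename(p).startswith(".")]
    if extension is not None:
        # Case-insensitive suffix test, so compound extensions like ".fastq.gz" work.
        exts = [extension] if isinstance(extension, str) else list(extension)
        suffixes = tuple(e.lower() for e in exts)
        paths = [p for p in paths if p.lower().endswith(suffixes)]
    if match is not None:
        # Assumption: the regex is searched anywhere in the file name.
        paths = [p for p in paths if re.search(match, os.path.basename(p))]
    # Sorted for deterministic output; the real function's ordering is not documented.
    return sorted(paths)


with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.fastq.gz", "b.FASTQ.GZ", "c.fasta", ".hidden.fastq.gz"]:
        open(os.path.join(tmp, name), "w").close()
    found = list_files_sketch(tmp, extension=".fastq.gz")
    print([os.path.basename(p) for p in found])  # ['a.fastq.gz', 'b.FASTQ.GZ']
```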

abutils.io.rename_file(file: str, new_name: str) → None

Renames a file.

Parameters:
  • file (str) – Path to the file to be renamed.

  • new_name (str) – New name for the file.
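A minimal sketch of the rename, assuming (this is not stated above) that new_name is interpreted as a path the way os.rename interprets it. rename_file_sketch is a hypothetical name:

```python
import os
import tempfile


def rename_file_sketch(file: str, new_name: str) -> None:
    # Assumption: new_name is a full or relative path, as with os.rename.
    # Raises FileNotFoundError if `file` does not exist.
    os.rename(file, new_name)


with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "reads.fq")
    new = os.path.join(tmp, "reads.fastq")
    open(old, "w").close()
    rename_file_sketch(old, new)
    print(os.path.exists(old), os.path.exists(new))  # False True
```

If the real function instead treats new_name as a bare file name to be kept in the original directory, the destination would be os.path.join(os.path.dirname(file), new_name).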

abutils.io.concatenate_files(files: Iterable[str], output_file: str) → None

Concatenates multiple files into a single file.

Parameters:
  • files (Iterable[str]) – Iterable of file paths.

  • output_file (str) – Path to the output file.
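Concatenation can be sketched as a streaming byte copy. concatenate_files_sketch is hypothetical and is not necessarily how abutils does it; it is shown mainly to illustrate one pitfall of byte-level concatenation:

```python
import os
import shutil
import tempfile
from typing import Iterable


def concatenate_files_sketch(files: Iterable[str], output_file: str) -> None:
    # Stream each input into the output in order, in binary mode, so large
    # inputs are copied in blocks rather than loaded into memory at once.
    with open(output_file, "wb") as out:
        for path in files:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)


with tempfile.TemporaryDirectory() as tmp:
    inputs = []
    # The first file deliberately lacks a trailing newline.
    for name, text in [("a.fasta", ">s1\nACGT"), ("b.fasta", ">s2\nTTGA\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(text)
        inputs.append(path)
    merged = os.path.join(tmp, "merged.fasta")
    concatenate_files_sketch(inputs, merged)
    with open(merged) as fh:
        print(repr(fh.read()))  # '>s1\nACGT>s2\nTTGA\n'
```

Because bytes are copied verbatim, no newline is inserted between inputs: if a file does not end with a newline, its last line merges with the first line of the next file, as the FASTA example shows.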

abutils.io.split_parquet(parquet_file: str, output_directory: str, num_rows: int = 500, num_splits: int | None = None, split_prefix: str = 'chunk_', start_numbering_at: int = 0) → Iterable[str]

Splits a parquet file into multiple files.

Parameters:
  • parquet_file (str) – Path to the parquet file to be split.

  • output_directory (str) – Path to the directory where the split files will be saved. If the directory does not exist, it will be created.

  • num_rows (int, optional) – Number of rows per split file. Default is 500. If num_splits is supplied, this argument is ignored.

  • num_splits (int, optional) – Number of split files to create. If not supplied, num_rows is used to determine the number of split files.

  • split_prefix (str, optional) – Prefix for the split files, which is followed directly by the file number. Default is “chunk_”.

  • start_numbering_at (int, optional) – Start numbering the split files at this number. Default is 0.

Returns:

Iterable of file paths for the split files.

Return type:

Iterable[str]
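The interaction between num_rows, num_splits, split_prefix and start_numbering_at can be illustrated with a small planning function that only computes file names and row ranges. split_plan is hypothetical; the even distribution of rows when num_splits is given, and the .parquet suffix, are assumptions rather than documented behavior:

```python
from typing import List, Optional, Tuple


def split_plan(
    total_rows: int,
    num_rows: int = 500,
    num_splits: Optional[int] = None,
    split_prefix: str = "chunk_",
    start_numbering_at: int = 0,
) -> List[Tuple[str, int, int]]:
    """Return (file name, first row, end row) per split; the end row is exclusive."""
    if num_splits is not None:
        # num_splits overrides num_rows. Assumption: rows are spread as evenly as
        # possible, the first (total_rows % num_splits) files getting one extra row.
        base, extra = divmod(total_rows, num_splits)
        sizes = [base + (1 if i < extra else 0) for i in range(num_splits)]
    else:
        # Fixed-size chunks; the last one holds whatever rows remain.
        sizes = [num_rows] * (total_rows // num_rows)
        if total_rows % num_rows:
            sizes.append(total_rows % num_rows)
    plan: List[Tuple[str, int, int]] = []
    start = 0
    for size in sizes:
        if size == 0:  # more splits requested than rows: skip empty files
            continue
        number = start_numbering_at + len(plan)
        # The prefix is followed directly by the file number (".parquet" is hypothetical).
        plan.append((f"{split_prefix}{number}.parquet", start, start + size))
        start += size
    return plan


print(split_plan(1200))
# [('chunk_0.parquet', 0, 500), ('chunk_1.parquet', 500, 1000), ('chunk_2.parquet', 1000, 1200)]
print(split_plan(10, num_splits=3, start_numbering_at=1))
# [('chunk_1.parquet', 0, 4), ('chunk_2.parquet', 4, 7), ('chunk_3.parquet', 7, 10)]
```

To actually write the files, each (start, end) range could be taken from a pyarrow Table with table.slice(start, end - start) and written with pyarrow.parquet.write_table; whether abutils uses pyarrow or another library internally is not stated here.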

abutils.io.split_fastx(fastx_file: str, output_directory: str, chunksize: int = 500, start_numbering_at: int = 0, fmt: str | None = None) → Iterable[str]

Splits a FASTA or FASTQ file into multiple files, each containing a specified number of sequences.

Parameters:
  • fastx_file (str) – Path to the FASTA or FASTQ file to be split.

  • output_directory (str) – Path to the directory where the split files will be saved. If the directory does not exist, it will be created.

  • chunksize (int, optional) – Number of sequences per split file. Default is 500. The last file may contain fewer sequences than this number.

  • start_numbering_at (int, optional) – Start numbering the split files at this number. Default is 0.

  • fmt (str, optional) – Format of the input file. If not supplied, the format will be determined automatically.

Returns:

Iterable of file paths for the split files.

Return type:

Iterable[str]
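The fmt=None case requires format detection. One plausible approach is sketched below under the assumption that detection looks at the leading character of the first record (FASTA headers start with ‘>’, FASTQ headers with ‘@’). detect_fastx_format is a hypothetical helper, and abutils may instead rely on the file extension or another heuristic:

```python
import os
import tempfile
from typing import Optional


def detect_fastx_format(path: str, fmt: Optional[str] = None) -> str:
    # An explicitly supplied format always takes precedence.
    if fmt is not None:
        return fmt.lower()
    # Assumption: inspect the first non-blank line of an uncompressed file.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            break
    raise ValueError(f"could not determine the format of {path!r}")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.txt")
    with open(path, "w") as fh:
        fh.write("@read1\nACGT\n+\nIIII\n")
    print(detect_fastx_format(path))  # fastq
```

Compressed input (e.g. .fastq.gz) would need gzip.open in text mode instead of open.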

abutils.io.split_fasta(fasta_file: str, output_directory: str, chunksize: int = 500, start_numbering_at: int = 0) → Iterable[str]

Splits a FASTA file into multiple files, each containing a specified number of sequences.

Parameters:
  • fasta_file (str) – Path to the FASTA file to be split.

  • output_directory (str) – Path to the directory where the split files will be saved. If the directory does not exist, it will be created.

  • chunksize (int, optional) – Number of sequences per split file. Default is 500. The last file may contain fewer sequences than this number.

  • start_numbering_at (int, optional) – Start numbering the split files at this number. Default is 0.

Returns:

Iterable of file paths for the split files.

Return type:

Iterable[str]
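Splitting itself amounts to parsing records and writing them out in batches of chunksize. The sketch below handles multi-line FASTA sequences; split_fasta_sketch, read_fasta and the chunk_<n>.fasta output names are hypothetical, since the naming of the real output files is not documented above:

```python
import itertools
import os
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    # Yield (header, sequence) pairs; sequences wrapped over several lines are joined.
    header, seq = None, []
    with open(path) as f:
        for line in f:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta_sketch(fasta_file: str, output_directory: str,
                       chunksize: int = 500, start_numbering_at: int = 0) -> List[str]:
    os.makedirs(output_directory, exist_ok=True)
    records = read_fasta(fasta_file)
    paths = []
    for n in itertools.count(start_numbering_at):
        # Take the next `chunksize` records; the final batch may be smaller.
        batch = list(itertools.islice(records, chunksize))
        if not batch:
            break
        path = os.path.join(output_directory, f"chunk_{n}.fasta")  # hypothetical naming
        with open(path, "w") as out:
            out.writelines(f">{name}\n{seq}\n" for name, seq in batch)
        paths.append(path)
    return paths
```

Reading through a generator keeps memory use proportional to one chunk rather than to the whole input file.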

abutils.io.split_fastq(fastq_file: str, output_directory: str, chunksize: int = 500, start_numbering_at: int = 0) → Iterable[str]

Splits a FASTQ file into multiple files, each containing a specified number of sequences.

Parameters:
  • fastq_file (str) – Path to the FASTQ file to be split.

  • output_directory (str) – Path to the directory where the split files will be saved. If the directory does not exist, it will be created.

  • chunksize (int, optional) – Number of sequences per split file. Default is 500. The last file may contain fewer sequences than this number.

  • start_numbering_at (int, optional) – Start numbering the split files at this number. Default is 0.

Returns:

Iterable of file paths for the split files.

Return type:

Iterable[str]
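FASTQ splitting is simpler if records follow the common four-line layout (header, sequence, ‘+’ separator, qualities), because each chunk is then exactly 4 * chunksize lines. That layout, the function name split_fastq_sketch and the chunk_<n>.fastq output names are all assumptions made for illustration:

```python
import itertools
import os
from typing import List


def split_fastq_sketch(fastq_file: str, output_directory: str,
                       chunksize: int = 500, start_numbering_at: int = 0) -> List[str]:
    # Assumption: unwrapped four-line FASTQ records.
    os.makedirs(output_directory, exist_ok=True)
    lines_per_chunk = 4 * chunksize
    paths = []
    with open(fastq_file) as f:
        for n in itertools.count(start_numbering_at):
            lines = list(itertools.islice(f, lines_per_chunk))
            if not lines:
                break
            if len(lines) % 4:
                raise ValueError("truncated FASTQ record at end of file")
            path = os.path.join(output_directory, f"chunk_{n}.fastq")  # hypothetical naming
            with open(path, "w") as out:
                out.writelines(lines)
            paths.append(path)
    return paths
```

Wrapped FASTQ records (sequence or quality split across several lines) would break this line-counting approach and require a record-aware parser instead.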