path
The abutils.io module contains functions for working with file and directory paths. These are
mainly convenience functions to facilitate common tasks like creating directories, deleting files,
renaming files, and splitting/concatenating files.
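| function | description |
|---|---|
| abutils.io.make_dir | Creates a directory |
| abutils.io.list_files | Lists files in a directory |
| abutils.io.rename_file | Renames a file |
| | Deletes a file |
| abutils.io.concatenate_files | Concatenates multiple files into a single file |
| abutils.io.split_parquet | Splits a parquet file into multiple files |
| abutils.io.split_fastx | Splits a FASTA or FASTQ file into multiple files |
| abutils.io.split_fasta | Splits a FASTA file into multiple files |
| abutils.io.split_fastq | Splits a FASTQ file into multiple files |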
api
- abutils.io.make_dir(directory: str) → None
Makes a directory, if it doesn’t already exist.
- Parameters:
directory (str) – Path to a directory.
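The "create only if missing" behavior described above matches a common standard-library idiom. The following is a minimal sketch of equivalent behavior, not the abutils source: `make_dir_sketch` is a hypothetical name, and creating missing parent directories is an assumption.

```python
import os
import tempfile


def make_dir_sketch(directory: str) -> None:
    """Create `directory` if it does not already exist.

    Hypothetical stand-in for illustration; not the abutils implementation.
    Assumption: missing parent directories are created as well.
    """
    # exist_ok=True turns the call into a no-op when the directory is present
    os.makedirs(directory, exist_ok=True)


# Calling the function twice is safe: the second call changes nothing.
base = tempfile.mkdtemp()
target = os.path.join(base, "results", "run_01")
make_dir_sketch(target)
make_dir_sketch(target)
print(os.path.isdir(target))
```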
- abutils.io.list_files(directory: str, extension: str | Iterable | None = None, recursive: bool = False, match: str | None = None, ignore_dot_files: bool = True) → Iterable[str]
Lists files in a given directory.
- Parameters:
directory (str) – Path to a directory. If a file path is passed instead, the returned list of files will contain only that file path.
extension (str | Iterable, optional) – If supplied, only files that end with the specified extension(s) will be returned. Can be either a string or a list of strings. Extension evaluation is case-insensitive and can match complex extensions (e.g. ‘.fastq.gz’). Default is None, which returns all files in the directory, regardless of extension.
recursive (bool, default=False) – If True, the directory will be searched recursively, and all files in all subdirectories will be returned.
match (str, optional) – If supplied, only files that match the specified pattern will be returned. Regular expressions are supported.
ignore_dot_files (bool, default=True) – If True, dot files (hidden files) will be ignored.
- Return type:
Iterable[str]
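The filtering rules described for list_files can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the abutils implementation: `list_files_sketch` is a hypothetical name, the regular expression is assumed to be searched against the file name (not the full path), and sorted output is an arbitrary choice made here.

```python
import os
import re
import tempfile
from typing import Iterable, List, Optional, Union


def list_files_sketch(
    directory: str,
    extension: Union[str, Iterable[str], None] = None,
    recursive: bool = False,
    match: Optional[str] = None,
    ignore_dot_files: bool = True,
) -> List[str]:
    """Hypothetical re-implementation of the documented filtering rules."""
    # A file path is returned as a single-item list, as described above.
    if os.path.isfile(directory):
        return [directory]

    if recursive:
        paths = [
            os.path.join(root, name)
            for root, _, names in os.walk(directory)
            for name in names
        ]
    else:
        paths = [
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        ]

    if ignore_dot_files:
        paths = [p for p in paths if not os.path.basename(p).startswith(".")]

    if extension is not None:
        exts = [extension] if isinstance(extension, str) else list(extension)
        # Case-insensitive suffix match, so '.fastq.gz' works as one extension.
        suffixes = tuple(e.lower() for e in exts)
        paths = [p for p in paths if p.lower().endswith(suffixes)]

    if match is not None:
        # Assumption: the pattern is searched for in the file name only.
        pattern = re.compile(match)
        paths = [p for p in paths if pattern.search(os.path.basename(p))]

    return sorted(paths)


# Build a small directory tree to exercise the filters.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "nested"))
for rel in ["a.fastq.gz", "B.FASTQ.GZ", "notes.txt", ".hidden.fastq.gz", "nested/c.fastq.gz"]:
    with open(os.path.join(demo_dir, *rel.split("/")), "w") as handle:
        handle.write("")

top_level = list_files_sketch(demo_dir, extension=".fastq.gz")
everything = list_files_sketch(demo_dir, extension=[".fastq.gz", ".txt"], recursive=True)
print([os.path.basename(p) for p in top_level])
print(len(everything))
```

In this sketch the hidden file is skipped by default, and the file in the nested directory only appears when recursive=True.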
- abutils.io.rename_file(file: str, new_name: str) → None
Renames a file.
- Parameters:
file (str) – Path to the file to be renamed.
new_name (str) – New name for the file.
- abutils.io.concatenate_files(files: Iterable[str], output_file: str) → None
Concatenates multiple files into a single file.
- Parameters:
files (Iterable[str]) – Iterable of file paths.
output_file (str) – Path to the output file.
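Concatenation as described above amounts to writing each input file, in order, into one output file. The sketch below shows one way to do that with the standard library. `concatenate_files_sketch` is a hypothetical name, and streaming raw bytes is an assumption, not a confirmed detail of abutils.

```python
import os
import shutil
import tempfile
from typing import Iterable


def concatenate_files_sketch(files: Iterable[str], output_file: str) -> None:
    """Write the bytes of each input file, in order, to `output_file`."""
    with open(output_file, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                # Copy in blocks so large files are never fully loaded into memory
                shutil.copyfileobj(src, out)


# Two small FASTA files merged into one.
work_dir = tempfile.mkdtemp()
parts = []
for i, text in enumerate([">seq1\nACGT\n", ">seq2\nTTGA\n"]):
    part = os.path.join(work_dir, f"part_{i}.fasta")
    with open(part, "w") as handle:
        handle.write(text)
    parts.append(part)

merged = os.path.join(work_dir, "merged.fasta")
concatenate_files_sketch(parts, merged)
with open(merged) as handle:
    merged_text = handle.read()
print(merged_text)
```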
- abutils.io.split_parquet(parquet_file: str, output_directory: str, num_rows: int = 500, num_splits: int | None = None, split_prefix: str = 'chunk_', start_numbering_at: int = 0) → Iterable[str]
Splits a parquet file into multiple files.
- Parameters:
parquet_file (str) – Path to the parquet file to be split.
output_directory (str) – Path to the directory where the split files will be saved. If the directory does not exist, it will be created.
num_rows (int, optional) – Number of rows per split file. Default is 500. If num_splits is supplied, this argument is ignored.
num_splits (int, optional) – Number of split files to create. If not supplied, num_rows is used to determine the number of split files.
split_prefix (str, optional) – Prefix for the split files, which is followed directly by the file number. Default is “chunk_”.
start_numbering_at (int, optional) – Start numbering the split files at this number. Default is 0.
- Returns:
Iterable of file paths for the split files.
- Return type:
Iterable[str]
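The interaction between num_rows, num_splits, split_prefix, and start_numbering_at is easiest to see as arithmetic. The sketch below only plans the splits and does not read or write parquet data. `plan_parquet_splits` is a hypothetical helper, and deriving the rows per file by ceiling division when num_splits is given is an assumption. The real function may distribute rows differently and will add a file extension that is not shown here.

```python
import math
from typing import List, Optional, Tuple


def plan_parquet_splits(
    total_rows: int,
    num_rows: int = 500,
    num_splits: Optional[int] = None,
    split_prefix: str = "chunk_",
    start_numbering_at: int = 0,
) -> List[Tuple[str, int, int]]:
    """Return (name, start_row, stop_row) for each planned split file."""
    if num_splits is not None:
        # num_splits takes precedence; num_rows is ignored, as documented.
        # Assumption: rows per file = ceil(total_rows / num_splits).
        num_rows = max(1, math.ceil(total_rows / num_splits))
    plan = []
    for i, start in enumerate(range(0, total_rows, num_rows)):
        stop = min(start + num_rows, total_rows)
        # The prefix is followed directly by the file number.
        name = f"{split_prefix}{start_numbering_at + i}"
        plan.append((name, start, stop))
    return plan


by_rows = plan_parquet_splits(total_rows=1200)
by_splits = plan_parquet_splits(total_rows=1200, num_splits=4, start_numbering_at=1)
print(by_rows)
print(by_splits)
```

With 1200 rows and the default num_rows of 500, the plan has three files and the last one holds the remaining 200 rows. With num_splits=4, each file holds 300 rows and numbering starts at 1.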
- abutils.io.split_fastx(fastx_file: str, output_directory: str, chunksize: int = 500, start_numbering_at: int = 0, fmt: str | None = None) → Iterable[str]
Splits a FASTA or FASTQ file into multiple files, each containing a specified number of sequences.
- Parameters:
fastx_file (str) – Path to the FASTA or FASTQ file to be split.
output_directory (str) – Path to the directory where the split files will be saved. If the directory does not exist, it will be created.
chunksize (int, optional) – Number of sequences per split file. Default is 500. The last file may contain fewer sequences than this number.
start_numbering_at (int, optional) – Start numbering the split files at this number. Default is 0.
fmt (str, optional) – Format of the input file. If not supplied, the format will be determined automatically.
- Returns:
Iterable of file paths for the split files.
- Return type:
Iterable[str]
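The chunking behavior described above (a fixed number of sequences per file, a possibly smaller final file, and automatic format detection when fmt is not given) can be sketched with the standard library. Everything here is illustrative and is not the abutils implementation: `detect_format`, `read_records`, and `split_fastx_sketch` are hypothetical names, detection by the first character of the file is an assumption, FASTQ records are assumed to span exactly four lines, and the output naming scheme is invented for the example.

```python
import os
import tempfile
from typing import Iterator, List, Optional


def detect_format(path: str) -> str:
    """Guess the format from the first non-empty line: '>' is FASTA, '@' is FASTQ."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                if line.startswith(">"):
                    return "fasta"
                if line.startswith("@"):
                    return "fastq"
                break
    raise ValueError(f"Could not determine format of {path}")


def read_records(path: str, fmt: str) -> Iterator[str]:
    """Yield one record at a time as a block of text."""
    with open(path) as handle:
        if fmt == "fastq":
            # Assumption: unwrapped FASTQ, exactly four lines per record.
            while True:
                lines = [handle.readline() for _ in range(4)]
                if not lines[0]:
                    break
                yield "".join(lines)
        else:
            record: List[str] = []
            for line in handle:
                # A new header line closes the previous (possibly multi-line) record.
                if line.startswith(">") and record:
                    yield "".join(record)
                    record = []
                record.append(line)
            if record:
                yield "".join(record)


def split_fastx_sketch(
    fastx_file: str,
    output_directory: str,
    chunksize: int = 500,
    start_numbering_at: int = 0,
    fmt: Optional[str] = None,
) -> List[str]:
    """Write `chunksize` records per output file and return the output paths."""
    fmt = fmt or detect_format(fastx_file)
    os.makedirs(output_directory, exist_ok=True)
    outputs: List[str] = []
    chunk: List[str] = []

    def flush() -> None:
        # Hypothetical naming scheme: <number>.<format>
        name = f"{start_numbering_at + len(outputs)}.{fmt}"
        out_path = os.path.join(output_directory, name)
        with open(out_path, "w") as out:
            out.writelines(chunk)
        outputs.append(out_path)
        chunk.clear()

    for record in read_records(fastx_file, fmt):
        chunk.append(record)
        if len(chunk) == chunksize:
            flush()
    if chunk:
        flush()  # the final file may hold fewer than `chunksize` records
    return outputs


# Five two-line FASTA records split into chunks of two.
tmp = tempfile.mkdtemp()
fasta_path = os.path.join(tmp, "seqs.fasta")
with open(fasta_path, "w") as handle:
    for i in range(5):
        handle.write(f">seq{i}\nACGT\nTTGA\n")

chunks = split_fastx_sketch(fasta_path, os.path.join(tmp, "chunks"), chunksize=2)
print([os.path.basename(p) for p in chunks])
```

Reading one record at a time keeps memory use proportional to chunksize rather than to the size of the input file.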
- abutils.io.split_fasta(fasta_file: str, output_directory: str, chunksize: int = 500, start_numbering_at: int = 0) → Iterable[str]
Splits a FASTA file into multiple files, each containing a specified number of sequences.
- Parameters:
fasta_file (str) – Path to the FASTA file to be split.
output_directory (str) – Path to the directory where the split files will be saved. If the directory does not exist, it will be created.
chunksize (int, optional) – Number of sequences per split file. Default is 500. The last file may contain fewer sequences than this number.
start_numbering_at (int, optional) – Start numbering the split files at this number. Default is 0.
- Returns:
Iterable of file paths for the split files.
- Return type:
Iterable[str]
- abutils.io.split_fastq(fastq_file: str, output_directory: str, chunksize: int = 500, start_numbering_at: int = 0) → Iterable[str]
Splits a FASTQ file into multiple files, each containing a specified number of sequences.
- Parameters:
fastq_file (str) – Path to the FASTQ file to be split.
output_directory (str) – Path to the directory where the split files will be saved. If the directory does not exist, it will be created.
chunksize (int, optional) – Number of sequences per split file. Default is 500. The last file may contain fewer sequences than this number.
start_numbering_at (int, optional) – Start numbering the split files at this number. Default is 0.
- Returns:
Iterable of file paths for the split files.
- Return type:
Iterable[str]