phylogeny

abutils provides utilities for inferring and visualizing phylogenetic trees from sequence data: tree inference with FastTree, tree visualization with baltic, and helpers for working with trees in the context of antibody lineage analysis. The phylogeny module integrates sequence clustering, multiple sequence alignment, and tree inference into a single interface.

All phylogeny functions are accessible through the abutils.tl module, which is the recommended way to use these utilities in your code.


phylogeny utility        description
abutils.tl.fasttree()    computes a phylogenetic tree from a multiple sequence alignment using FastTree
abutils.tl.phylogeny()   creates a Phylogeny object from a list of sequences or a FASTA file
abutils.tl.Phylogeny     a class for representing and visualizing phylogenetic trees

examples

computing a tree with FastTree

Compute a phylogenetic tree from a multiple sequence alignment:

import abutils

# Compute a tree from a FASTA alignment file
tree_file = abutils.tl.fasttree(
    "my_alignment.fasta",
    tree_file="my_tree.newick",
    is_aa=False
)

# Or compute a tree from an alignment string and get the Newick tree as a string
alignment_string = """>seq1
ACGTACGTACGT
>seq2
ACGTACGTACGA
>seq3
ACATACGTACGA
"""
tree_string = abutils.tl.fasttree(alignment_string)
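
The Newick string returned when tree_file is omitted can be inspected without extra dependencies. The sketch below is illustrative and not part of abutils: leaf_names is a hypothetical helper, the example tree string is made up for the three sequences above, and the regular expression assumes unquoted leaf names.

```python
import re


def leaf_names(newick: str) -> list[str]:
    """Return leaf names from a Newick string, in order of appearance.

    Assumes unquoted names. A leaf name directly follows "(" or ",";
    internal node labels (such as support values) follow ")" and are skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


# Made-up tree for illustration; in practice this would be the string
# returned by abutils.tl.fasttree(alignment_string).
example_tree = "(seq1:0.1,(seq2:0.05,seq3:0.07)0.9:0.02);"
print(leaf_names(example_tree))
```

Comparing the recovered names against the headers of the input alignment is a quick sanity check that no sequence was dropped.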

creating and visualizing a phylogeny

Create a Phylogeny object and visualize the tree:

import abutils
import matplotlib.pyplot as plt

# Create a Phylogeny object from a list of Sequence objects
sequences = [...]  # List of abutils.Sequence objects
phylo = abutils.tl.phylogeny(
    sequences,
    name="my_lineage",
    cluster=True,
    clustering_threshold=0.97
)

# Plot the phylogenetic tree
fig = plt.figure(figsize=(10, 8))
ax = phylo.plot(
    size_multiplier=15,
    color="steelblue",
    linewidth=1.5,
    marker="o",
    marker_edgewidth=1,
    marker_edgecolor="black"
)
plt.tight_layout()
plt.show()

customizing phylogenetic tree visualization

Customize the tree visualization with different marker sizes, colors, and layouts:

import abutils
import matplotlib.pyplot as plt

# Create a Phylogeny object
phylo = abutils.tl.phylogeny("sequences.fasta")

# Create a color mapping for specific sequences
color_dict = {
    "seq1": "red",
    "seq2": "blue",
    "seq3": "green"
}

# Create a size mapping for specific sequences
size_dict = {
    "seq1": 3,
    "seq2": 2,
    "seq3": 1
}

# Plot a radial tree with custom colors and sizes
fig = plt.figure(figsize=(10, 10))
ax = phylo.plot(
    color=color_dict,
    size=size_dict,
    radial=True,
    radial_start=0.1,
    radial_fraction=0.8,
    color_branches=True,
    marker="o",
    alpha=0.8
)
plt.tight_layout()
plt.show()
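
To build intuition for radial_start and radial_fraction, the following sketch converts the two fractions into angles. It is not abutils or baltic code: leaf_angles is a hypothetical helper, and the even spacing of leaves along the arc is an assumption made here for illustration.

```python
import math


def leaf_angles(n_leaves, radial_start=0.0, radial_fraction=1.0):
    """Spread n_leaves evenly over an arc of the circle (angles in radians).

    radial_start and radial_fraction are fractions of a full turn, so
    0.1 and 0.8 describe an arc that starts at 36 degrees and spans 288.
    """
    if n_leaves < 1:
        return []
    start = 2 * math.pi * radial_start
    span = 2 * math.pi * radial_fraction
    step = span / n_leaves  # assumption: equal spacing between leaves
    return [start + i * step for i in range(n_leaves)]


angles = leaf_angles(4, radial_start=0.1, radial_fraction=0.8)
print([round(math.degrees(a), 1) for a in angles])
```

Under these assumptions, a radial_fraction below 1.0 leaves part of the circle empty, which can make room for a legend or labels.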

api

inference

abutils.tools.phylo.fasttree(aln: str, tree_file: str | None = None, is_aa: bool = False, fasttree_bin: str | None = None, debug: bool = False, quiet: bool = True) str

Computes a tree file from a multiple sequence alignment using FastTree.

Parameters:
  • aln (str) – Path to a multiple sequence alignment file, in FASTA format, or a FASTA-formatted multiple sequence alignment string. Required.

  • tree_file (str) – Path to the tree file which will be output by FastTree. If the parent directory does not exist, it will be created. If not provided, the output (a Newick-formatted tree file) will be returned as a str.

  • is_aa (bool, default=False) – Must be set to True if the input multiple sequence alignment contains amino acid sequences. Default is False, meaning FastTree will expect nucleotide sequences.

  • fasttree_bin (str, optional) – Path to the desired FastTree binary. Default is to use the version of FastTree that is bundled with abutils.

  • debug (bool, default=False) – If True, verbose output is printed. Default is False.

  • quiet (bool, default=True) – Deprecated, but retained for backwards compatibility. Use debug instead.

Returns:

tree_file – Path to the tree file produced by FastTree, or the Newick-formatted tree as a str if tree_file was not provided.

Return type:

str
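
Because aln may be a FASTA-formatted alignment string, it can be convenient to assemble that string from sequences that are already aligned. The sketch below is one way to do so with the standard library; to_fasta_alignment is a hypothetical helper, not an abutils function, and the length check reflects the general requirement that all rows of an alignment have the same number of columns.

```python
def to_fasta_alignment(aligned: dict[str, str]) -> str:
    """Format already-aligned sequences as a FASTA string.

    Raises ValueError if the sequences differ in length, since rows of a
    multiple sequence alignment must all have the same number of columns.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences are not aligned; lengths found: {sorted(lengths)}")
    return "".join(f">{name}\n{seq}\n" for name, seq in aligned.items())


alignment_string = to_fasta_alignment(
    {"seq1": "ACGTACGTACGT", "seq2": "ACGTACGTACGA", "seq3": "ACATACGTACGA"}
)
print(alignment_string)

# The string could then be passed to FastTree through abutils:
# tree_string = abutils.tl.fasttree(alignment_string)
```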

abutils.utils.phylogeny.igphyml(input_file: str | None = None, tree_file: str | None = None, root: str | None = None, verbose: bool = False) str

Computes a phylogenetic tree using IgPhyML.

Note

IgPhyML must be installed. It can be downloaded from https://github.com/kbhoehn/IgPhyML.

Args:

input_file (str): Path to a Phylip-formatted multiple sequence alignment. Required.

tree_file (str): Path to the output tree file.

root (str): Name of the root sequence. Required.

verbose (bool): If True, prints the standard output and standard error for each IgPhyML run. Default is False.
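
Since input_file must be Phylip-formatted, an alignment held as FASTA text needs converting first. The sketch below writes relaxed sequential PHYLIP using only the standard library. fasta_to_phylip is a hypothetical helper, not part of abutils, and whether IgPhyML expects strict PHYLIP (fixed-width names) or imposes further constraints on the alignment is not stated here, so check the IgPhyML documentation before relying on it.

```python
def fasta_to_phylip(fasta: str) -> str:
    """Convert a FASTA-formatted alignment string to relaxed sequential PHYLIP."""
    records = []
    for block in fasta.split(">")[1:]:
        lines = block.strip().splitlines()
        name = lines[0].split()[0]  # keep only the ID, drop any description
        seq = "".join(line.strip() for line in lines[1:])
        records.append((name, seq))
    if not records:
        raise ValueError("no sequences found")
    n_cols = len(records[0][1])
    if any(len(seq) != n_cols for _, seq in records):
        raise ValueError("sequences must all have the same aligned length")
    # Pad names so that every sequence starts in the same column.
    width = max(len(name) for name, _ in records) + 2
    header = f"{len(records)} {n_cols}"
    rows = [f"{name.ljust(width)}{seq}" for name, seq in records]
    return "\n".join([header, *rows]) + "\n"


print(fasta_to_phylip(">seq1\nACGTACGTACGT\n>seq2\nACGTACGTACGA\n"))
```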

abutils.utils.phylogeny.lsd(tree, output_file=None, dates_file=None, outgroup_file=None, with_constraints=True, with_weights=True, reestimate_root_position=None, quiet=True)

drawing trees

abutils.tools.phylo.phylogeny(sequences: str | Iterable[Sequence], name: str | None = None, root: str | Sequence | None = None, cluster: bool = True, clustering_threshold: float = 1.0, clustering_algo: str = 'auto', rename: dict | Callable | None = None, id_key: str | None = None, sequence_key: str | None = None) Phylogeny

Phylogenetic representation of an antibody lineage.

Parameters:
  • sequences (str or list of Sequence) – A list of abutils.Sequence objects or the path to a FASTA-formatted file. Required.

  • name (str, default=None) – Name of the lineage. If not provided, a random name will be generated using uuid.uuid4().

  • root (str or Sequence, default=None) – Root of the phylogenetic tree. Can be either a sequence ID (the root sequence must be in sequences) or an abutils.Sequence. If the provided Sequence is already in sequences, the duplicate will be ignored when constructing the tree. If not provided, the germline V-gene will be used as root.

  • cluster (bool, default=True) – Whether or not to cluster sequences by identity prior to alignment and tree inference.

  • clustering_threshold (float, default=1.0) – Identity threshold for clustering. Must be between 0 and 1. Default is 1.0, which collapses identical sequences.

  • rename (dict or Callable, default=None) –

    Used to rename sequences. Can be either a dict of the format:

    {old_name: new_name, ...}

    or a callable function that accepts the old name and returns the new name. Names not found in the dict or for which the function returns None will not be renamed. If not provided, sequences are not renamed.

  • id_key (str, default=None) – Key to retrieve the sequence ID. If not provided or missing, Sequence.id is used.

  • sequence_key (str, default=None) – Key to retrieve the sequence. If not provided or missing, Sequence.sequence is used.

Returns:

phylogeny – An abutils.Phylogeny object.

Return type:

Phylogeny
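
The rename semantics described above (a dict or a callable, with unmatched names and None results left unchanged) can be sketched as follows. apply_rename is a hypothetical helper written for illustration; it mirrors the documented behavior but is not the abutils implementation.

```python
from typing import Callable, Optional, Union

Renamer = Union[dict, Callable[[str], Optional[str]], None]


def apply_rename(name: str, rename: Renamer = None) -> str:
    """Return the new name, or the old one if no replacement applies."""
    if rename is None:
        return name
    if callable(rename):
        new_name = rename(name)
    else:
        new_name = rename.get(name)  # dict lookup; None if the key is absent
    return name if new_name is None else new_name


names = ["seq1", "seq2", "germline"]

# dict: only names present as keys are renamed
print([apply_rename(n, {"seq1": "mAb-A"}) for n in names])

# callable: returning None keeps the original name
print([apply_rename(n, lambda x: x.upper() if x.startswith("seq") else None) for n in names])
```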

class abutils.tools.phylo.Phylogeny(sequences: Iterable[Sequence], name: str | None = None, root: str | Sequence | None = None, cluster: bool = True, clustering_threshold: float = 1.0, clustering_algo: str = 'auto', rename: dict | Callable | None = None, id_key: str | None = None, sequence_key: str | None = None)

Base phylogeny class

Phylogenetic representation of an antibody lineage.

Parameters:
  • sequences (list of Sequence) – A list of abutils.Sequence objects. Required.

  • name (str, default=None) – Name of the lineage. If not provided, a random name will be generated using uuid.uuid4().

  • root (str or Sequence, default=None) – Root of the phylogenetic tree. Can be either a sequence ID (the root sequence must be in sequences) or an abutils.Sequence. If the provided Sequence is already in sequences, the duplicate will be ignored when constructing the tree. If not provided, the germline V-gene will be used as root.

  • cluster (bool, default=True) – Whether or not to cluster sequences by identity prior to alignment and tree inference.

  • clustering_threshold (float, default=1.0) – Identity threshold for clustering. Must be between 0 and 1. Default is 1.0, which collapses identical sequences.

  • rename (dict or Callable, default=None) –

    Used to rename sequences. Can be either a dict of the format:

    {old_name: new_name, ...}

    or a callable function that accepts the old name and returns the new name. Names not found in the dict or for which the function returns None will not be renamed. If not provided, sequences are not renamed.

  • id_key (str, default=None) – Key to retrieve the sequence ID. If not provided or missing, Sequence.id is used.

  • sequence_key (str, default=None) – Key to retrieve the sequence. If not provided or missing, Sequence.sequence is used.
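
The fallback rule for id_key and sequence_key (use the key when it is given and present, otherwise the default attribute) can be illustrated with a stand-in record type. Record and resolve are hypothetical names invented for this sketch, and storing extra fields in an annotations dict is an assumption, not a description of how abutils.Sequence is implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for abutils.Sequence, used only for illustration."""
    id: str
    sequence: str
    annotations: dict = field(default_factory=dict)


def resolve(record: Record, key: Optional[str], fallback_attr: str) -> str:
    """Use annotations[key] when key is given and present, else the attribute."""
    if key is not None and key in record.annotations:
        return record.annotations[key]
    return getattr(record, fallback_attr)


rec = Record("seq1", "ACGT", {"sequence_id": "clone-7"})
print(resolve(rec, "sequence_id", "id"))  # key present: the annotation is used
print(resolve(rec, "missing_key", "id"))  # key missing: falls back to id
print(resolve(rec, None, "sequence"))     # no key given: falls back to sequence
```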

property tree

Baltic Tree object

property root: baltic leaf | abutils.core.sequence.Sequence | None

Root of the tree.

property sizes: dict

Returns a dict of tip sizes using clustering results. Only clusters with more than one sequence will be in the dict.
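
As a rough picture of what the sizes dict contains at the default clustering_threshold of 1.0, the sketch below collapses identical sequences and reports only groups with more than one member. collapse_identical is a hypothetical helper; keeping the first name seen as the representative and keying sizes by that name are assumptions made for illustration, and abutils may choose representatives differently.

```python
from collections import defaultdict


def collapse_identical(seqs: dict[str, str]) -> tuple[dict[str, str], dict[str, int]]:
    """Collapse identical sequences, keeping the first name seen as representative.

    Returns (representatives, sizes); sizes lists only groups with more
    than one member.
    """
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[seq].append(name)
    representatives = {names[0]: seq for seq, names in groups.items()}
    sizes = {names[0]: len(names) for names in groups.values() if len(names) > 1}
    return representatives, sizes


seqs = {"a": "ACGT", "b": "ACGT", "c": "ACGA", "d": "ACGT"}
reps, sizes = collapse_identical(seqs)
print(reps)
print(sizes)
```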

cluster()

Cluster sequences prior to alignment.

plot(size: Callable | dict | int | float | None = None, color: Callable | dict | Iterable | str | None = None, alpha: float = 0.75, min_size: int = 1, size_multiplier: int | float = 10, linewidth: int | float = 2, color_branches: bool = False, x_attr: Callable = <function Phylogeny.<lambda>>, y_attr: Callable = <function Phylogeny.<lambda>>, connection_type: str = 'baltic', radial: bool = False, radial_start: float = 0, radial_fraction: float = 1.0, inward_space: float = 0.1, marker: str = 'o', marker_edgewidth: Callable | dict | int | float = 0, marker_edgecolor: Callable | dict | Iterable | str | None = None, marker_halign: str = 'left', marker_valign: str = 'center', figsize: Iterable[int | float] = [8, 8], show: bool = False, figfile: str | Path | None = None, **kwargs) matplotlib.pyplot.Axes | None

Plot the phylogenetic tree.

Parameters:
  • size (callable or dict or int or float or None, optional) – Size of the markers at leaf edges. If a callable, it should take a baltic leaf as input and return a size. If a dict, it should have leaf names as keys and sizes as values. If an int or float, all markers will have the same size. If None, the cluster sizes will be used.

  • color (callable or dict or iterable or str or None, optional) – Color of the markers at leaf edges. If a callable, it should take a baltic leaf object as input and return a color. If a dict, it should have leaf names as keys and colors as values. If a str or an iterable of RGB(A) values, all markers will have the same color. If None, all markers will be black.

  • alpha (float, optional) – Alpha value for the markers.

  • min_size (int, optional) – Minimum size of the markers. Any leaf edges with a size smaller than min_size will not have a marker.

  • size_multiplier (int or float, optional) – Multiplier for the marker sizes.

  • linewidth (int or float, optional) – Width of the tree lines.

  • color_branches (bool, optional) – Whether to color the branches of the tree. If True, the branches will be colored according to the color of the leaf edges.

  • x_attr (callable, optional) – Attribute of the baltic tree object to use for the x-axis.

  • y_attr (callable, optional) – Attribute of the baltic tree object to use for the y-axis.

  • connection_type (str, optional) – Type of connection to use. One of 'baltic', 'direct', or 'elbow'.

  • radial (bool, optional) – Whether to plot the tree in a radial fashion.

  • radial_start (float, optional) – Starting point for the radial plot, as a fraction of the circle.

  • radial_fraction (float, optional) – Fraction of the circle to use for the radial plot.

  • inward_space (float, optional) – Fraction of the circle to leave empty at the center of the radial plot.

  • marker (str, optional) – Marker style. See matplotlib.pyplot.scatter for more details.

  • marker_edgewidth (callable or dict or int or float, optional) – Width of the marker edges. If a callable, it should take a baltic leaf as input and return a width. If a dict, it should have leaf names as keys and widths as values. If an int or float, all markers will have the same width.

  • marker_edgecolor (callable or dict or str or iterable or None, optional) – Color of the marker edges. If a callable, it should take a baltic leaf object as input and return a color. If a dict, it should have leaf names as keys and colors as values. If a str or an iterable of RGB(A) values, all marker edges will have the same color. If None, all marker edges will match the marker color.

  • marker_halign (str, optional) – Horizontal alignment of the markers. One of 'left', 'center', or 'right'.

  • marker_valign (str, optional) – Vertical alignment of the markers. One of 'top', 'center', or 'bottom'.

  • figsize (iterable of int or float, optional) – Figure size. Default is [8, 8].

  • show (bool, optional) – Whether to show the figure.

  • figfile (str or Path, optional) – Path to save the figure to.
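
The interaction of size, min_size, and size_multiplier can be sketched as below. This is an illustration of the documented options, not the abutils implementation: resolve_marker_size is a hypothetical helper, the callable here receives a leaf name instead of a baltic leaf, and three details are assumptions (names missing from a dict get size 0, leaves without a cluster size count as 1, and min_size is checked before the multiplier is applied).

```python
def resolve_marker_size(leaf_name, size=None, cluster_sizes=None,
                        min_size=1, size_multiplier=10):
    """Return the scaled marker size for a leaf, or None if no marker is drawn."""
    if callable(size):
        raw = size(leaf_name)
    elif isinstance(size, dict):
        raw = size.get(leaf_name, 0)  # assumption: unlisted leaves get no marker
    elif isinstance(size, (int, float)):
        raw = size
    else:
        # size is None: fall back to cluster sizes (singletons assumed to be 1)
        raw = (cluster_sizes or {}).get(leaf_name, 1)
    if raw < min_size:
        return None
    return raw * size_multiplier


size_dict = {"seq1": 3, "seq2": 2, "seq3": 1}
for name in ["seq1", "seq2", "seq3", "seq4"]:
    print(name, resolve_marker_size(name, size=size_dict, size_multiplier=15))
```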