phylogeny
abutils provides utilities for computing and visualizing phylogenetic trees from sequence data. These utilities include
functions for tree inference using FastTree, tree visualization with baltic, and utilities for working with phylogenetic trees
in the context of antibody lineage analysis. The phylogeny module integrates sequence clustering, multiple sequence alignment,
and tree inference into a cohesive and easy-to-use interface.
All phylogeny functions are accessible through the abutils.tl module, which is the recommended way to
use these utilities in your code.
| phylogeny utility | description |
|---|---|
| fasttree | computes a phylogenetic tree from a multiple sequence alignment using FastTree |
| phylogeny | creates a Phylogeny object from a list of sequences or a FASTA file |
| Phylogeny | a class for representing and visualizing phylogenetic trees |
examples
computing a tree with FastTree
Compute a phylogenetic tree from a multiple sequence alignment:
import abutils
# Compute a tree from a FASTA alignment file
tree_file = abutils.tl.fasttree(
    "my_alignment.fasta",
    tree_file="my_tree.newick",
    is_aa=False
)
# Or compute a tree from an alignment string and get the Newick tree as a string
alignment_string = """>seq1
ACGTACGTACGT
>seq2
ACGTACGTACGA
>seq3
ACATACGTACGA
"""
tree_string = abutils.tl.fasttree(alignment_string)
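When the tree comes back as a Newick string, it can be useful to confirm which tips it contains before plotting. The following is a minimal sketch and not part of abutils: `newick_leaf_names` is a hypothetical helper, the example tree is hand-written rather than real FastTree output, and the regular expression assumes unquoted tip labels.

```python
import re


def newick_leaf_names(newick: str) -> list[str]:
    """Return tip labels from a Newick string (unquoted labels only)."""
    # A tip label directly follows "(" or ","; internal-node labels and
    # support values follow ")" and are therefore skipped.
    return [m.strip() for m in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


# Hand-written example tree (an assumption, not actual FastTree output)
example_tree = "((seq1:0.01,seq2:0.02)0.95:0.03,seq3:0.05);"
leaf_names = newick_leaf_names(example_tree)
print(leaf_names)
```

For anything beyond a quick check, a dedicated Newick parser is the safer choice, since quoted labels and comments are not handled here.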
creating and visualizing a phylogeny
Create a Phylogeny object and visualize the tree:
import abutils
import matplotlib.pyplot as plt
# Create a Phylogeny object from a list of Sequence objects
sequences = [...] # List of abutils.Sequence objects
phylo = abutils.tl.phylogeny(
    sequences,
    name="my_lineage",
    cluster=True,
    clustering_threshold=0.97
)
# Plot the phylogenetic tree
fig = plt.figure(figsize=(10, 8))
ax = phylo.plot(
    size_multiplier=15,
    color="steelblue",
    linewidth=1.5,
    marker="o",
    marker_edgewidth=1,
    marker_edgecolor="black"
)
plt.tight_layout()
plt.show()
customizing phylogenetic tree visualization
Customize the tree visualization with different marker sizes, colors, and layouts:
import abutils
import matplotlib.pyplot as plt
# Create a Phylogeny object
phylo = abutils.tl.phylogeny("sequences.fasta")
# Create a color mapping for specific sequences
color_dict = {
    "seq1": "red",
    "seq2": "blue",
    "seq3": "green"
}
# Create a size mapping for specific sequences
size_dict = {
    "seq1": 3,
    "seq2": 2,
    "seq3": 1
}
# Plot a radial tree with custom colors and sizes
fig = plt.figure(figsize=(10, 10))
ax = phylo.plot(
    color=color_dict,
    size=size_dict,
    radial=True,
    radial_start=0.1,
    radial_fraction=0.8,
    color_branches=True,
    marker="o",
    alpha=0.8
)
plt.tight_layout()
plt.show()
api
inference
- abutils.tools.phylo.fasttree(aln: str, tree_file: str | None = None, is_aa: bool = False, fasttree_bin: str | None = None, debug: bool = False, quiet: bool = True) → str
Computes a tree file from a multiple sequence alignment using FastTree.
- Parameters:
aln (str) – Path to a FASTA-formatted multiple sequence alignment file, or a FASTA-formatted multiple sequence alignment string. Required.
tree_file (str) – Path to the tree file that FastTree will write. If the parent directory does not exist, it will be created. If not provided, the output (a Newick-formatted tree) will be returned as a str.
is_aa (bool, default=False) – Must be set to True if the input multiple sequence alignment contains amino acid sequences. Default is False, meaning FastTree will expect nucleotide sequences.
fasttree_bin (str, optional) – Path to the desired FastTree binary. Default is to use the version of FastTree that is bundled with abutils.
debug (bool, default=False) – If True, verbose output is printed. Default is False.
quiet (bool, default=True) – Deprecated, but retained for backwards compatibility. Use debug instead.
- Returns:
tree_file – Path to the tree file produced by FastTree, or the Newick-formatted tree as a string if tree_file is not provided.
- Return type:
str
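Since `aln` may be a FASTA-formatted alignment string rather than a file path, such a string can be assembled in memory. The sketch below is illustrative only: `to_fasta_alignment` is a hypothetical helper, not an abutils function, and the equal-length check reflects the general requirement that rows of an alignment share the same length.

```python
def to_fasta_alignment(aligned: dict[str, str]) -> str:
    """Format already-aligned sequences as a FASTA alignment string."""
    # All rows of a multiple sequence alignment must have the same length.
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences are not aligned; found lengths {sorted(lengths)}")
    return "".join(f">{name}\n{seq}\n" for name, seq in aligned.items())


alignment_string = to_fasta_alignment(
    {
        "seq1": "ACGTACGTACGT",
        "seq2": "ACGTACGTACGA",
        "seq3": "ACATACGTACGA",
    }
)
print(alignment_string)
```

The resulting string has the same shape as the alignment string used in the examples section, so it could be passed as the `aln` argument.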
- abutils.utils.phylogeny.igphyml(input_file: str | None = None, tree_file: str | None = None, root: str | None = None, verbose: bool = False) → str
Computes a phylogenetic tree using IgPhyML.
Note
IgPhyML must be installed. It can be downloaded from https://github.com/kbhoehn/IgPhyML.
- Parameters:
input_file (str) – Path to a Phylip-formatted multiple sequence alignment. Required.
tree_file (str) – Path to the output tree file.
root (str) – Name of the root sequence. Required.
verbose (bool) – If True, prints the standard output and standard error for each IgPhyML run. Default is False.
- abutils.utils.phylogeny.lsd(tree, output_file=None, dates_file=None, outgroup_file=None, with_constraints=True, with_weights=True, reestimate_root_position=None, quiet=True)
drawing trees
- abutils.tools.phylo.phylogeny(sequences: str | Iterable[Sequence], name: str | None = None, root: str | Sequence | None = None, cluster: bool = True, clustering_threshold: float = 1.0, clustering_algo: str = 'auto', rename: dict | Callable | None = None, id_key: str | None = None, sequence_key: str | None = None) → Phylogeny
Phylogenetic representation of an antibody lineage.
- Parameters:
sequences (str or list of Sequence) – A list of abutils.Sequence objects or the path to a FASTA-formatted file. Required.
name (str, default=None) – Name of the lineage. If not provided, a random name will be generated using uuid.uuid4().
root (str or Sequence, default=None) – Root of the phylogenetic tree. Can be either a sequence ID (the root sequence must be in sequences) or an abutils.Sequence. If the provided Sequence is already in sequences, the duplicate will be ignored when constructing the tree. If not provided, the germline V-gene will be used as root.
cluster (bool, default=True) – Whether or not to cluster sequences by identity prior to alignment and tree inference.
clustering_threshold (float, default=1.0) – Identity threshold for clustering. Must be between 0 and 1. Default is 1.0, which collapses identical sequences.
rename (dict or Callable, default=None) – Used to rename sequences. Can be either a dict of the format {old_name: new_name, ...} or a callable that accepts the old name and returns the new name. Names not found in the dict, or for which the function returns None, will not be renamed. If not provided, sequences are not renamed.
id_key (str, default=None) – Key to retrieve the sequence ID. If not provided or missing, Sequence.id is used.
sequence_key (str, default=None) – Key to retrieve the sequence. If not provided or missing, Sequence.sequence is used.
- Returns:
phylogeny – An abutils.Phylogeny object.
- Return type:
Phylogeny
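The documented `rename` behavior (a dict or a callable, with unmatched names and `None` results left unchanged) can be sketched as follows. This is only an illustration of those semantics, not the library's implementation; `apply_rename` and the example names are hypothetical.

```python
from typing import Callable, Optional, Union

RenameSpec = Union[dict, Callable[[str], Optional[str]], None]


def apply_rename(name: str, rename: RenameSpec = None) -> str:
    """Return the new name for `name`, or `name` itself if no rename applies."""
    if rename is None:
        return name
    # A callable receives the old name; a dict is looked up by the old name.
    new_name = rename(name) if callable(rename) else rename.get(name)
    # Missing dict keys and callables that return None leave the name as is.
    return name if new_name is None else new_name


names = ["seq1", "seq2", "seq3"]
by_dict = [apply_rename(n, {"seq1": "donor1_mAb1"}) for n in names]
by_func = [apply_rename(n, lambda old: old.upper() if old != "seq3" else None) for n in names]
print(by_dict)
print(by_func)
```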
- class abutils.tools.phylo.Phylogeny(sequences: Iterable[Sequence], name: str | None = None, root: str | Sequence | None = None, cluster: bool = True, clustering_threshold: float = 1.0, clustering_algo: str = 'auto', rename: dict | Callable | None = None, id_key: str | None = None, sequence_key: str | None = None)
Base phylogeny class
Phylogenetic representation of an antibody lineage.
- Parameters:
sequences (list of Sequence) – A list of abutils.Sequence objects. Required.
name (str, default=None) – Name of the lineage. If not provided, a random name will be generated using uuid.uuid4().
root (str or Sequence, default=None) – Root of the phylogenetic tree. Can be either a sequence ID (the root sequence must be in sequences) or an abutils.Sequence. If the provided Sequence is already in sequences, the duplicate will be ignored when constructing the tree. If not provided, the germline V-gene will be used as root.
cluster (bool, default=True) – Whether or not to cluster sequences by identity prior to alignment and tree inference.
clustering_threshold (float, default=1.0) – Identity threshold for clustering. Must be between 0 and 1. Default is 1.0, which collapses identical sequences.
rename (dict or Callable, default=None) – Used to rename sequences. Can be either a dict of the format {old_name: new_name, ...} or a callable that accepts the old name and returns the new name. Names not found in the dict, or for which the function returns None, will not be renamed. If not provided, sequences are not renamed.
id_key (str, default=None) – Key to retrieve the sequence ID. If not provided or missing, Sequence.id is used.
sequence_key (str, default=None) – Key to retrieve the sequence. If not provided or missing, Sequence.sequence is used.
- property tree: baltic.tree
Baltic Tree object.
- property root: baltic.leaf | abutils.core.sequence.Sequence | None
Root of the tree.
- property sizes: dict
Returns a dict of tip sizes using clustering results. Only clusters with more than one sequence will be in the dict.
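To make the relationship between clustering and tip sizes concrete, here is a minimal sketch of collapsing identical sequences (the behavior described for a clustering threshold of 1.0) and keeping sizes only for clusters with more than one member. It is not the abutils clustering code: `collapse_identical` is a hypothetical helper, and keying the result by the first-seen sequence ID is an assumption made for illustration.

```python
from collections import defaultdict


def collapse_identical(seqs: dict[str, str]) -> dict[str, int]:
    """Group identical sequences and return sizes for clusters larger than one."""
    clusters = defaultdict(list)
    for seq_id, seq in seqs.items():
        # Compare case-insensitively so "acgt" and "ACGT" fall in one cluster.
        clusters[seq.upper()].append(seq_id)
    # Assumption: the first ID seen represents the cluster; singletons are dropped.
    return {ids[0]: len(ids) for ids in clusters.values() if len(ids) > 1}


seqs = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGT",
    "seq3": "ACGTACGA",
    "seq4": "acgtacgt",
}
sizes = collapse_identical(seqs)
print(sizes)  # {'seq1': 3}
```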
- cluster()
Cluster sequences prior to alignment.
- plot(size: Callable | dict | int | float | None = None, color: Callable | dict | Iterable | str | None = None, alpha: float = 0.75, min_size: int = 1, size_multiplier: int | float = 10, linewidth: int | float = 2, color_branches: bool = False, x_attr: Callable = <function Phylogeny.<lambda>>, y_attr: Callable = <function Phylogeny.<lambda>>, connection_type: str = 'baltic', radial: bool = False, radial_start: float = 0, radial_fraction: float = 1.0, inward_space: float = 0.1, marker: str = 'o', marker_edgewidth: Callable | dict | int | float = 0, marker_edgecolor: Callable | dict | Iterable | str | None = None, marker_halign: str = 'left', marker_valign: str = 'center', figsize: Iterable[int | float] = [8, 8], show: bool = False, figfile: str | Path | None = None, **kwargs) → matplotlib.pyplot.Axes | None
Plot the phylogenetic tree.
- Parameters:
size (callable or dict or int or float or None, optional) – Size of the markers at leaf edges. If a callable, it should take a baltic leaf as input and return a size. If a dict, it should have leaf names as keys and sizes as values. If an int or float, all markers will have the same size. If None, the cluster sizes will be used.
color (callable or dict or iterable or str or None, optional) – Color of the markers at leaf edges. If a callable, it should take a baltic leaf object as input and return a color. If a dict, it should have leaf names as keys and colors as values. If a str, or an iterable of RGB(A) values, all markers will have the same color. If None, all markers will be black.
alpha (float, optional) – Alpha value for the markers.
min_size (int, optional) – Minimum size of the markers. Any leaf edges with a size smaller than min_size will not have a marker.
size_multiplier (int or float, optional) – Multiplier for the marker sizes.
linewidth (int or float, optional) – Width of the tree lines.
color_branches (bool, optional) – Whether to color the branches of the tree. If True, the branches will be colored according to the color of the leaf edges.
x_attr (callable, optional) – Attribute of the baltic tree object to use for the x-axis.
y_attr (callable, optional) – Attribute of the baltic tree object to use for the y-axis.
connection_type (str, optional) – Type of connection to use. One of 'baltic', 'direct', or 'elbow'.
radial (bool, optional) – Whether to plot the tree in a radial fashion.
radial_start (float, optional) – Starting point for the radial plot, as a fraction of the circle.
radial_fraction (float, optional) – Fraction of the circle to use for the radial plot.
inward_space (float, optional) – Fraction of the circle to leave empty at the center of the radial plot.
marker (str, optional) – Marker style. See matplotlib.pyplot.scatter for more details.
marker_edgewidth (callable or dict or int or float, optional) – Width of the marker edges. If a callable, it should take a baltic leaf as input and return a width. If a dict, it should have leaf names as keys and widths as values. If an int or float, all markers will have the same width.
marker_edgecolor (callable or dict or str or iterable or None, optional) – Color of the marker edges. If a callable, it should take a baltic leaf object as input and return a color. If a dict, it should have leaf names as keys and colors as values. If a str, or an iterable of RGB(A) values, all markers will have the same color. If None, all marker edges will match the marker color.
marker_halign (str, optional) – Horizontal alignment of the markers. One of 'left', 'center', or 'right'.
marker_valign (str, optional) – Vertical alignment of the markers. One of 'top', 'center', or 'bottom'.
figsize (iterable of int or float, optional) – Figure size, by default [8, 8].
show (bool, optional) – Whether to show the figure.
figfile (str or Path, optional) – Path to save the figure to.
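Several of these parameters accept a callable that takes a baltic leaf and returns a value. The sketch below shows what such a callable might look like for `color`. It rests on assumptions: that the leaf object exposes its label as a `name` attribute, and that sequence names carry a donor prefix. `color_by_prefix`, `PALETTE`, and the stand-in leaves are hypothetical and exist only so the example runs without abutils or baltic.

```python
from types import SimpleNamespace

# Hypothetical mapping from a name prefix to a matplotlib color name
PALETTE = {"donor1": "steelblue", "donor2": "darkorange"}


def color_by_prefix(leaf) -> str:
    """Pick a color from the text before the first underscore in the leaf name."""
    # Assumption: the leaf object has a `name` attribute holding the tip label.
    prefix = leaf.name.split("_", 1)[0]
    return PALETTE.get(prefix, "lightgrey")


# Stand-in objects that mimic only the `name` attribute of a leaf
fake_leaves = [SimpleNamespace(name=n) for n in ("donor1_seq1", "donor2_seq7", "germline")]
colors = [color_by_prefix(leaf) for leaf in fake_leaves]
print(colors)
```

A function like this would then be passed as `color=color_by_prefix` when calling `plot`, in place of a single color or a dict.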