search
abutils provides functions for searching sequences against a database of target sequences
using MMseqs2. This allows for fast and efficient similarity search, which is useful for tasks
like sequence identification, annotation, and homology detection.
| search method | function |
|---|---|
| MMseqs2 | abutils.tools.mmseqs_search() |
examples
search with MMseqs2
For both the query and the target, the MMseqs2 search function accepts a path to a FASTA file, an MMseqs2 database,
an abutils.Sequence object, or an iterable of abutils.Sequence objects.
import abutils
# search sequences against a target database
results = abutils.tools.mmseqs_search(
query='path/to/query_sequences.fasta',
target='path/to/target_sequences.fasta',
output_path='path/to/output.tsv'
)
customize search parameters
The search can be tuned with optional parameters that control sensitivity (search_type, sensitivity, max_evalue, max_seqs), output format (format_mode), and performance (threads).
import abutils
# search with customized parameters
results = abutils.tools.mmseqs_search(
query='path/to/query_sequences.fasta',
target='path/to/target_sequences.fasta',
output_path='path/to/output.tsv',
search_type=1, # amino acid search
max_seqs=100, # maximum hits per query
max_evalue=1e-5, # stricter E-value cutoff
sensitivity=7.5, # higher sensitivity
format_mode=4, # BLAST-TAB + column headers
threads=8 # use 8 threads
)
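To make these parameters more concrete, here is a minimal sketch of how they could correspond to flags of the `mmseqs easy-search` command. The helper `build_easy_search_command` and its `tmp_dir` argument are hypothetical and are not part of abutils; the mapping shown is an assumption for illustration and may differ from what `mmseqs_search()` does internally.

```python
# Hypothetical helper, NOT part of abutils: it only illustrates how the keyword
# arguments used above could map onto MMseqs2 command-line flags.
def build_easy_search_command(
    query,
    target,
    output_path,
    tmp_dir="tmp",  # assumed scratch directory required by easy-search
    search_type=None,
    max_seqs=None,
    max_evalue=None,
    sensitivity=None,
    format_mode=None,
    threads=None,
):
    # positional arguments: query, target, result file, temporary directory
    cmd = ["mmseqs", "easy-search", query, target, output_path, tmp_dir]
    flag_values = [
        ("--search-type", search_type),  # 1 = amino acid
        ("--max-seqs", max_seqs),        # maximum hits kept per query
        ("-e", max_evalue),              # E-value cutoff
        ("-s", sensitivity),             # sensitivity
        ("--format-mode", format_mode),  # 4 = BLAST-TAB + column headers
        ("--threads", threads),          # number of CPU threads
    ]
    # only emit flags the caller actually set
    for flag, value in flag_values:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


command = build_easy_search_command(
    "path/to/query_sequences.fasta",
    "path/to/target_sequences.fasta",
    "path/to/output.tsv",
    search_type=1,
    max_seqs=100,
    max_evalue=1e-5,
    sensitivity=7.5,
    format_mode=4,
    threads=8,
)
print(" ".join(command))
```

Printing the command rather than running it keeps the sketch independent of a local MMseqs2 installation.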
customizing output format
Use format_output to choose which columns are written to the output file, given as a comma-separated list of column names.
import abutils
# customize the output format
results = abutils.tools.mmseqs_search(
query='path/to/query_sequences.fasta',
target='path/to/target_sequences.fasta',
output_path='path/to/output.tsv',
format_mode=4, # BLAST-TAB + column headers
format_output="query,target,evalue,pident,qcov,tcov"
)
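With format_mode=4 and the columns requested above, the output file should be tab-separated text with a header row, so it can be read with the Python standard library. The sketch below assumes that layout; the rows in `example_tsv` are made-up illustrative values standing in for the contents of 'path/to/output.tsv', and `read_hits` is a hypothetical helper, not an abutils function.

```python
import csv
import io

# Made-up example rows (NOT real search results), standing in for the file
# written to 'path/to/output.tsv' with the columns requested above.
example_tsv = (
    "query\ttarget\tevalue\tpident\tqcov\ttcov\n"
    "seq1\tref_A\t1.2e-30\t98.5\t1.000\t0.950\n"
    "seq1\tref_B\t3.0e-04\t71.2\t0.800\t0.600\n"
    "seq2\tref_C\t5.5e-12\t88.0\t0.970\t0.990\n"
)


def read_hits(handle):
    """Read tab-separated hits with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        # convert the numeric columns from text to float
        for column in ("evalue", "pident", "qcov", "tcov"):
            row[column] = float(row[column])
        hits.append(row)
    return hits


hits = read_hits(io.StringIO(example_tsv))

# keep hits with a small E-value that cover at least 90% of the query
strong_hits = [h for h in hits if h["evalue"] <= 1e-5 and h["qcov"] >= 0.9]
for hit in strong_hits:
    print(hit["query"], hit["target"], hit["evalue"])
```

To read a real result file, replace `io.StringIO(example_tsv)` with an open file handle, for example `open('path/to/output.tsv', newline='')`.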