Performance
Unlike CPython, Seq compiles to LLVM IR, which in turn compiles to native machine code. Because of this,
Seq's performance is usually comparable to that of C or C++, and can often be even better once
domain-specific compiler optimizations are applied. The Seq compiler performs static type checking, which
avoids the cost of checking types at runtime that most other Python implementations incur.
Below are several real-world benchmarks showing Seq's performance compared to hand-optimized tools.
Smith-Waterman alignment is a standard algorithm
used in many genomics applications. This benchmark consists of aligning sequences as part of long-read mapping, and
compares Seq with minimap2. The Seq implementation makes use of the
compiler's inter-sequence alignment optimization to attain better performance.
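For readers unfamiliar with the algorithm, the recurrence behind Smith-Waterman can be sketched in a few lines of plain Python. This is only an illustrative scalar version with a linear gap penalty. It is not the benchmark code, and it does not show the compiler's inter-sequence alignment optimization. The function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (linear gap penalty).

    Scoring parameters are illustrative defaults, not values from any tool.
    """
    # Only the previous row of the DP matrix is needed to compute the score.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its three neighbours, so the work for one pair of sequences is a tight nested loop. Read mappers run this loop for a very large number of sequence pairs, which is why alignment is a natural target for compiler-level optimization.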
Global alignment is an important algorithm in comparative genomics and related fields. This benchmark implements
the popular AVID global alignment tool in Seq and
compares its performance with the original.
CORA is an all-mapping tool for whole-genome sequencing (WGS) short reads
that exploits redundancies in the genome to find all mappings of a read. This benchmark implements CORA's index
("homology table") construction step in Seq.
mrsFAST is a perfect-sensitivity all-mapping tool based on Hamming
distance. This benchmark implements mrsFAST's single-end read mapping algorithm in Seq (chr1 results), as well as an
alternative version using an FM-index with Seq's prefetch optimization (hg19 results).
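To make "all-mapping based on Hamming distance" concrete, here is a naive sketch in plain Python: it reports every reference position where a read matches with at most a given number of substitutions. This brute-force scan is an assumption-laden illustration of the problem being solved, not mrsFAST's algorithm, which relies on an index rather than scanning every offset. The names `hamming_distance`, `all_mappings`, and `max_mismatches` are hypothetical.

```python
def hamming_distance(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def all_mappings(read, reference, max_mismatches):
    """Return every reference offset where the read fits within the mismatch budget."""
    hits = []
    # Slide the read across the reference; no insertions or deletions are allowed.
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        if hamming_distance(read, window) <= max_mismatches:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(all_mappings("ACGT", "ACGTACGAACGT", 1))
```

"Perfect sensitivity" in this setting means that no offset within the mismatch budget is missed, which the exhaustive scan guarantees trivially at the cost of time proportional to the reference length times the read length.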
Data preprocessing has become an increasingly time-consuming step in the genomics analysis pipeline. This benchmark
implements in Seq a standard preprocessing task,
GATK's SplitNCigarReads,
and compares performance on several datasets.
"Super-Maximal Exact Matches" (SMEMs) are a central component of many genomics algorithms, ranging from alignment to
assembly. This benchmark implements BWA-MEM's SMEM algorithm in Seq. Seq's
prefetch optimization leads to better performance.
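BWA-MEM's SMEM search is built on FM-index queries, so a minimal FM-index backward search may help show where memory latency comes from. The sketch below is plain Python with a deliberately simple (and memory-hungry) occurrence table. It counts exact matches only and does not implement SMEM extension or any prefetching. All names here (`build_fm_index`, `count_occurrences`, `first`, `occ`) are assumptions made for illustration.

```python
def build_fm_index(text):
    """Build a toy FM-index: first-column offsets and a full occurrence table."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix array construction; fine for small examples only.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))

    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ


def count_occurrences(pattern, first, occ):
    """Count exact occurrences of pattern using backward search."""
    lo, hi = 0, len(next(iter(occ.values()))) - 1  # whole suffix array range
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        # Each step jumps to two unrelated positions in the occurrence table.
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    first, occ = build_fm_index("ACGTACGAACGT")
    print(count_occurrences("ACG", first, occ))
```

The two table lookups per character land at positions that are hard to predict, so on a genome-sized index they tend to miss the cache. That access pattern is the kind of workload where overlapping memory fetches with other work can pay off.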
Haplotype phasing is a downstream analysis that can
provide numerous biological insights beyond raw genotypes alone.
HapTree-X is a phasing algorithm implemented entirely
in Seq; its performance is benchmarked here against several widely used phasing tools.