Unlike CPython, Seq compiles to LLVM IR, which is in turn compiled to native machine code. Because of this, Seq's performance is usually comparable to that of C or C++, and can often be even better once domain-specific compiler optimizations are applied. The Seq compiler also performs static type checking, which avoids the runtime type-checking overhead that most other Python implementations incur.

Below are several real-world benchmarks showing Seq's performance compared to hand-optimized tools.

Smith-Waterman alignment is a standard algorithm used in many genomics applications. This benchmark consists of aligning sequences as part of long-read mapping, and compares Seq with minimap2. The Seq implementation makes use of the compiler's inter-sequence alignment optimization to attain better performance.
Global alignment is an important algorithm in comparative genomics and related fields. This benchmark implements the popular AVID global alignment tool in Seq and compares its performance with the original.
CORA is an all-mapping tool for whole-genome sequencing (WGS) short reads that exploits redundancies in the genome to find all mappings of a read. This benchmark implements CORA's index ("homology table") construction step in Seq.
mrsFAST is a perfect-sensitivity all-mapping tool based on Hamming distance. This benchmark implements mrsFAST's single-end read mapping algorithm in Seq (chr1 results), as well as an alternative version using an FM-index with Seq's prefetch optimization (hg19 results).
Data preprocessing has become an increasingly time-consuming step in the genomics analysis pipeline. This benchmark implements a standard preprocessing task, GATK's SplitNCigarReads, in Seq and compares its performance with the original on several datasets.
"Super-Maximal Exact Matches" (SMEMs) are a central component of many genomics algorithms, ranging from alignment to assembly. This benchmark implements BWA-MEM's SMEM algorithm in Seq. Seq's prefetch optimization leads to better performance.
Haplotype phasing is a downstream analysis that can provide numerous biological insights beyond raw genotypes alone. HapTree-X is a phasing algorithm implemented entirely in Seq, whose performance is benchmarked here relative to several widely used phasing tools.
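
For readers unfamiliar with the Smith-Waterman algorithm used in the first benchmark above, here is a minimal plain-Python sketch of its local alignment scoring recurrence. It is for illustration only and is not the code used by Seq or minimap2. The function name, the scoring values (match, mismatch, gap) and the linear gap penalty are all assumptions made for this example. Production aligners typically use affine gap penalties and SIMD vectorization.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Illustrative sketch: linear gap penalty, score only (no traceback).
    The default scoring values are arbitrary assumptions.
    """
    # Only the previous row of the dynamic programming matrix is needed.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))  # 8: four matches at +2 each
```

The nested loops cost time proportional to the product of the two sequence lengths for every pair aligned, which is why this kernel is a natural target for compiler-level optimization.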