About this idea
The foundational, patented technology behind seqSight is seqLens, a purpose‑built genomic language model engineered to understand biological sequences with greater nuance than traditional machine‑learning approaches. It is trained on two exceptionally large and evolutionarily diverse genomic datasets: one containing 19,551 reference genomes, including more than 18,000 prokaryotes totaling 115 billion nucleotides, and a balanced dataset of 1,354 genomes spanning both prokaryotes and eukaryotes with 180 billion nucleotides. These corpora allow seqLens to learn patterns across deep evolutionary timescales and to generalize across both microbial and eukaryotic biology.
To build a model specifically suited to DNA, we developed five custom byte‑pair encoding tokenizers and trained 52 genomic language models to systematically evaluate how tokenization choices, architectures, hyperparameters, pooling strategies, and classification heads influence biological prediction performance. Two findings stood out: larger vocabularies harm generalization, while carefully optimized tokenization and pooling substantially improve downstream accuracy. This systematic search formed the evidence base for the final seqLens design.
At the core of seqLens is its signature technical innovation: disentangled attention combined with relative positional encoding. This architecture lets the model separate content from positional information and reason about the relative arrangement of DNA motifs, which is essential for identifying regulatory structure, functional sites, and long‑range dependencies in genomes. With this design, seqLens outperforms state‑of‑the‑art models in 13 of 19 phenotypic prediction tasks, demonstrating substantial gains in biological accuracy and interpretability.
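For readers unfamiliar with disentangled attention, the following is a minimal single-head sketch in plain Python. It follows the formulation popularized by DeBERTa, where each attention score is the sum of a content-to-content, a content-to-position, and a position-to-content term over clipped relative offsets. Whether seqLens uses exactly this formulation is an assumption, and every name here (`disentangled_attention`, `rel_index`, the weight matrices, the window size `k`) is hypothetical and chosen for illustration only. The weights are random, so the output shows the mechanics, not a trained model.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rel_index(i, j, k):
    """Map the offset i - j to a bucket in [0, 2k), clipping beyond distance k."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def disentangled_attention(h, rel_emb, wq, wk, wv, wq_r, wk_r, k):
    """Single-head attention with separate content and relative-position terms."""
    n, d = len(h), len(wq[0])
    qc, kc, vc = matmul(h, wq), matmul(h, wk), matmul(h, wv)  # content projections
    qr, kr = matmul(rel_emb, wq_r), matmul(rel_emb, wk_r)     # position projections
    scale = 1.0 / math.sqrt(3 * d)  # three score terms, hence 3 * d
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            c2c = dot(qc[i], kc[j])                   # content -> content
            c2p = dot(qc[i], kr[rel_index(i, j, k)])  # content -> position
            p2c = dot(kc[j], qr[rel_index(j, i, k)])  # position -> content
            scores.append((c2c + c2p + p2c) * scale)
        weights.append(softmax(scores))
    return matmul(weights, vc), weights


def demo(seed=0):
    """Run the sketch on 6 random token embeddings (stand-ins for DNA tokens)."""
    rng = random.Random(seed)
    n, d_model, d_head, k = 6, 8, 4, 2
    h = rand_matrix(n, d_model, rng)
    rel_emb = rand_matrix(2 * k, d_model, rng)  # one embedding per offset bucket
    wq, wk, wv, wq_r, wk_r = (rand_matrix(d_model, d_head, rng) for _ in range(5))
    return disentangled_attention(h, rel_emb, wq, wk, wv, wq_r, wk_r, k)
```

Calling `demo()` returns the attended token representations together with the attention weights. Because positions enter only through `rel_index`, two motifs separated by the same offset contribute the same positional terms wherever they occur in the sequence, which is the property the paragraph above attributes to relative positional encoding.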
seqLens also incorporates strategies for real‑world applicability, including continual pretraining, domain‑specific adaptation, and parameter‑efficient fine‑tuning, which let the model specialize rapidly to new organisms, environmental contexts, or genomic challenges. We further showed that seqLens can capture evolutionary relationships, enhancing genome annotation and variant interpretation. Together, these technical innovations form the backbone of seqSight’s platform, providing scalable, biologically aligned intelligence that turns raw DNA sequences into actionable scientific insights.
Impact
seqSight uses advanced genomic language models to transform raw biological sequences into actionable scientific insights. By applying large‑scale AI models trained on billions of nucleotides, seqSight enables more accurate prediction of gene function, identification of pathogenic variants, and discovery of patterns within genomes that traditional analytics often miss. The platform rapidly interprets complex sequence contexts, detects subtle evolutionary signals, and generates biologically plausible, statistically testable hypotheses that accelerate research and development across biotechnology, therapeutics, and diagnostics. Users can analyze genomic data in minutes, prioritize variants with higher confidence, and uncover mechanistic insights that guide experimental design. Built for scalability and precision, seqSight shortens discovery timelines, improves decision‑making, and opens new frontiers in understanding health, disease, and biological complexity.
What I'll do with $5,000
The grant will fund the development of our marketing materials with a Muskegon-based marketing firm. We are very close to our first revenue and are well positioned to benefit from strong marketing.
Quick Bio
Keith Crandall is a distinguished serial entrepreneur in the biotechnology sector, recognized for advancing innovative solutions at the intersection of technology and strategic business development.
Links
Website