Bounty Board - Coming Soon!
Testing ideas and product moats through hacking challenges and benchmarking frameworks.
Why Bounties?
Breaking things down, or conversely building them from scratch, is at the heart of our learning and due diligence process for validating technologies. These challenges help us understand the underlying technology and help you test real products.
Two Types of Challenges
Hacking Challenge
Rebuild a product's core feature with open-source tools and a minimal budget.
Benchmark / Testing Framework
Build a framework to evaluate a product's real-world performance and defensibility.
How It Works
Each bounty runs as a hacking challenge. Pick a bounty, build your prototype or benchmark within the deadline, and submit your work. Our judges review every qualified submission and select the best; the winner takes home the prize.
Pick a Bounty
Choose an open challenge that matches your skills.
Build & Submit
Hack on it, then submit your source code and schedule a demo call.
Judging & Prizes
Our judges evaluate all submissions and award the prize to the best entry.
What We Look For in a Qualified Submission
- Source code of the MVP or benchmark you built
- A call with our team to demo your work and walk us through how you implemented and evaluated it
Open Bounties (3)
Replicate a Real-Time Collaborative Editor
Build a functional real-time collaborative document editor (like Notion or Google Docs) using only open-source tools. The goal is to test whether the core collaboration engine represents a defensible moat, or whether CRDTs and existing libraries have turned it into commodity infrastructure.
Success Criteria
Achieve 60-70% feature parity on real-time text editing with concurrent users, basic formatting, and conflict resolution. Must handle at least 5 simultaneous editors with <200ms sync latency.
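One way to turn these thresholds into an automated acceptance check is sketched below. It assumes a load test has already recorded each editor's final document text and the per-edit sync latencies. The SyncRun type, its field names, and the choice of the 95th percentile as the latency statistic are illustrative assumptions, not part of the bounty's requirements.

```python
from dataclasses import dataclass
from statistics import quantiles

MIN_EDITORS = 5               # from the success criteria
MAX_SYNC_LATENCY_MS = 200.0   # from the success criteria


@dataclass
class SyncRun:
    """One load-test run. The type and field names are hypothetical."""
    replica_texts: list[str]         # final document text seen by each editor
    sync_latencies_ms: list[float]   # local edit -> visible on a remote editor


def p95(samples: list[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # "inclusive" keeps the result within the observed min/max.
    return quantiles(samples, n=20, method="inclusive")[-1]


def meets_sync_criteria(run: SyncRun) -> bool:
    enough_editors = len(run.replica_texts) >= MIN_EDITORS
    # Conflict resolution worked if every editor ends with the same text.
    converged = len(set(run.replica_texts)) == 1
    # Assumption: "sync latency" is judged at p95 rather than mean or max.
    fast_enough = p95(run.sync_latencies_ms) < MAX_SYNC_LATENCY_MS
    return enough_editors and converged and fast_enough
```

Checking convergence separately from latency matters here: a fast editor whose replicas end up with different text has not solved conflict resolution.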
Suggested Stack
Benchmark AI Code Generation Accuracy
Design and implement a reproducible benchmarking framework to evaluate AI code generation tools (e.g. Copilot, Cursor, Codeium) on real-world tasks. We want to understand whether proprietary fine-tuning provides meaningful accuracy gains over open-source base models.
Success Criteria
Framework must include at least 50 test cases across 3+ languages, measure pass@1 and pass@5 rates, and produce a publishable comparison report that includes statistical significance testing.
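For reference, here is a minimal sketch of how pass@1 and pass@5 could be computed. The bounty does not prescribe an estimator; this assumes the commonly used unbiased form, where each task is sampled n times, c of those samples pass the tests, and pass@k = 1 - C(n-c, k) / C(n, k). The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n generated samples (c of them correct), passes."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # which makes the result exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Generating more than k samples per task (for example n = 10 when reporting pass@5) lowers the variance of the estimate, which helps when the report needs to show that a difference between tools is statistically meaningful.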
Suggested Stack
RAG Pipeline Performance Framework
Build a testing framework that evaluates Retrieval-Augmented Generation pipelines across dimensions like retrieval accuracy, latency, hallucination rate, and cost-per-query. This will help us assess whether startups building RAG products have genuine technical differentiation or if they are thin wrappers around commodity infrastructure.
Success Criteria
Framework must test at least 3 RAG configurations (naive chunking, semantic search, hybrid), measure retrieval precision/recall, answer accuracy via LLM-as-judge, and produce cost analysis per 1K queries.
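As a rough illustration of two of these metrics, the sketch below computes retrieval precision and recall for a single query and normalizes spend to cost per 1K queries. It assumes each query comes with a labeled set of relevant document IDs; the function names and the ID-based representation are hypothetical.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for one query.

    retrieved: document IDs returned by the pipeline (duplicates ignored)
    relevant:  ground-truth relevant document IDs for the query
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def cost_per_1k_queries(total_cost_usd: float, num_queries: int) -> float:
    """Scale the measured spend for a test run to 1,000 queries."""
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    return total_cost_usd / num_queries * 1000
```

Running the same labeled query set through each configuration (naive chunking, semantic search, hybrid) and averaging these values per configuration gives directly comparable numbers.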
Suggested Stack
In Progress (1)
Vibe-Code a Vertical SaaS MVP
Using AI-assisted coding tools (Cursor, v0, Bolt, etc.), attempt to replicate the core workflow of a vertical SaaS product targeting property management. Document your process, time spent, and total cost. This tests whether AI-assisted development has eroded the moat of domain-specific software.
Success Criteria
Functional prototype covering tenant management, maintenance requests, and rent tracking. Target 60% feature coverage in under 40 hours of work and <$100 in API/hosting costs.
Suggested Stack