Bounty Board
Pressure-test startup moats through replication challenges and benchmarking frameworks.
Why Bounties?
Independent validation is a critical part of our due diligence. If a scrappy prototype built with open-source tools and vibe coding can achieve 60–70% of a startup's core product performance, it suggests the moat may not be deep enough. These challenges help us — and you — pressure-test real products.
Two Types of Challenges
Replication Challenge
Rebuild a product's core feature with open-source tools and a minimal budget.
Benchmark / Testing Framework
Build a framework to evaluate a product's real-world performance and defensibility.
How It Works
Each bounty runs as a hacking challenge. Pick a bounty, build your prototype or benchmark within the deadline, and submit your work. Our judges review every qualified submission and select the best — the winner takes home the prize.
Pick a Bounty
Choose an open challenge that matches your skills.
Build & Submit
Hack on it, then submit your source code and schedule a demo call.
Judging & Prizes
Our judges evaluate all submissions and award the prize to the best entry.
What We Look For in a Qualified Submission
- Source code of the MVP or benchmark you built
- A call with our team to demo your work and walk us through how you implemented and evaluated it
Open Bounties (3)
Replicate a Real-Time Collaborative Editor
Build a functional real-time collaborative document editor (like Notion or Google Docs) using only open-source tools. The goal is to test whether the core collaboration engine represents a defensible moat or whether CRDTs and existing libraries have made it commodity infrastructure.
Success Criteria
Achieve 60–70% feature parity on real-time text editing with concurrent users, basic formatting, and conflict resolution. Must handle at least 5 simultaneous editors with <200ms sync latency.
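As a rough pointer for this challenge: CRDTs let concurrent edits merge without a central server. The sketch below is a minimal, hypothetical fractional-indexing sequence CRDT in Python; the class names and the indexing scheme are illustrative, not any particular library's API (production engines like Yjs or Automerge are far more sophisticated).

```python
# Minimal sequence-CRDT sketch (illustrative only): every character gets a
# globally unique (position, site) key, so two replicas can merge concurrent
# inserts deterministically and converge to the same text.
from dataclasses import dataclass, field

@dataclass(frozen=True, order=True)
class CharId:
    position: float   # fractional index between the two neighbours
    site: int         # tiebreaker: which editor/site created the char

@dataclass
class Replica:
    site: int
    chars: dict = field(default_factory=dict)  # CharId -> character

    def insert(self, left: float, right: float, ch: str) -> None:
        # Allocate a fractional position strictly between the neighbours.
        self.chars[CharId((left + right) / 2, self.site)] = ch

    def merge(self, other: "Replica") -> None:
        # Union of both edit sets; identical keys always carry identical
        # chars, so merge order never matters (the defining CRDT property).
        self.chars.update(other.chars)

    def text(self) -> str:
        return "".join(ch for _, ch in sorted(self.chars.items()))

a, b = Replica(site=1), Replica(site=2)
a.insert(0.0, 1.0, "H")    # both replicas start from an empty document
b.insert(0.0, 1.0, "i")    # concurrent insert on the other replica
a.merge(b); b.merge(a)
assert a.text() == b.text()  # replicas converge without coordination
```

The point of the exercise: if a few dozen lines plus an off-the-shelf CRDT library get you convergent editing, the moat has to live elsewhere (presence, permissions, scale, polish).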
Benchmark AI Code Generation Accuracy
Design and implement a reproducible benchmarking framework to evaluate AI code generation tools (e.g. Copilot, Cursor, Codeium) on real-world tasks. We want to understand whether proprietary fine-tuning provides meaningful accuracy gains over open-source base models.
Success Criteria
Framework must include at least 50 test cases across 3+ languages, measure pass@1 and pass@5 rates, and produce a publishable comparison report with statistical-significance testing.
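For the pass@1 and pass@5 rates, the de facto standard is the unbiased estimator popularized by the HumanEval benchmark: draw n samples per task, count the c that pass, and compute 1 − C(n−c, k)/C(n, k). A minimal sketch (function names are ours, not from any library):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn for one task, c of them
    correct; returns the probability that at least one of k randomly
    chosen samples passes, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures to fill all k draws: one must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Aggregate score: mean per-task pass@k over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For statistically meaningful pass@5 numbers, sample more than 5 generations per task (e.g. n = 20) and let the estimator do the averaging, rather than running exactly 5 and counting.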
RAG Pipeline Performance Framework
Build a testing framework that evaluates Retrieval-Augmented Generation pipelines across dimensions like retrieval accuracy, latency, hallucination rate, and cost-per-query. This will help us assess whether startups building RAG products have genuine technical differentiation or if they are thin wrappers around commodity infrastructure.
Success Criteria
Framework must test at least 3 RAG configurations (naive chunking, semantic search, hybrid), measure retrieval precision/recall and answer accuracy (via LLM-as-judge), and produce a cost analysis per 1K queries.
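The retrieval metrics reduce to set overlap between the chunks a configuration retrieved and the chunks a human (or gold dataset) marked relevant. A minimal sketch, with illustrative function names and signatures:

```python
def retrieval_metrics(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Per-query precision and recall: what fraction of retrieved chunks
    are relevant, and what fraction of relevant chunks were retrieved."""
    hits = sum(1 for chunk in retrieved if chunk in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def cost_per_1k(total_cost_usd: float, num_queries: int) -> float:
    """Normalize a test run's total spend to cost per 1,000 queries."""
    return 1000 * total_cost_usd / num_queries
```

Run the same query set and gold labels through each of the three configurations so that precision/recall, LLM-as-judge accuracy, and cost-per-1K are directly comparable across them.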
In Progress (1)
Vibe-Code a Vertical SaaS MVP
Using AI-assisted coding tools (Cursor, v0, Bolt, etc.), attempt to replicate the core workflow of a vertical SaaS product targeting property management. Document your process, time spent, and total cost. This tests whether AI-assisted development has eroded the moat of domain-specific software.
Success Criteria
Functional prototype covering tenant management, maintenance requests, and rent tracking. Target 60% feature coverage in under 40 hours of work and <$100 in API/hosting costs.
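One way to make these targets mechanically checkable in your write-up. The feature names and thresholds below mirror the success criteria; everything else is a hypothetical helper, not part of the official judging harness:

```python
# Required workflows and thresholds taken from the success criteria above.
REQUIRED_FEATURES = {"tenant_management", "maintenance_requests", "rent_tracking"}

def feature_coverage(implemented: set[str]) -> float:
    """Fraction of the required workflows the prototype actually covers."""
    return len(REQUIRED_FEATURES & implemented) / len(REQUIRED_FEATURES)

def meets_targets(implemented: set[str], hours: float, cost_usd: float) -> bool:
    """60% coverage, under 40 hours of work, under $100 in total costs."""
    return feature_coverage(implemented) >= 0.60 and hours < 40 and cost_usd < 100
```

Logging hours and spend as you go (rather than reconstructing them afterwards) makes the final numbers far more credible to the judges.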