Bounty Board - Coming Soon!

Testing ideas and product moats through hacking challenges and benchmarking frameworks.

Why Bounties?

Breaking things down or, conversely, building them from scratch is at the heart of our learning and due diligence process for validating technologies. These challenges help us understand the underlying technology and help you test real products.

Two Types of Challenges

Hacking Challenge

Rebuild a product's core feature with open-source tools and a minimal budget.

Benchmark / Testing Framework

Build a framework to evaluate a product's real-world performance and defensibility.

How It Works

Each bounty runs as a time-boxed challenge, whether it is a hacking challenge or a benchmark. Pick a bounty, build your prototype or benchmark before the deadline, and submit your work. Our judges review every qualified submission, and the best entry takes home the prize.

1. Pick a Bounty

Choose an open challenge that matches your skills.

2. Build & Submit

Hack on it, then submit your source code and schedule a demo call.

3. Judging & Prizes

Our judges evaluate all submissions and award the prize to the best entry.

What We Look For in a Qualified Submission

  • Source code of the MVP or benchmark you built
  • A call with our team to demo your work and walk us through how you implemented and evaluated it

Open Bounties (3)

Open | Replication Challenge

Replicate a Real-Time Collaborative Editor

$750

Build a functional real-time collaborative document editor (like Notion or Google Docs) using only open-source tools. The goal is to test whether the core collaboration engine represents a defensible moat or if CRDTs and existing libraries make this commodity infrastructure.

Success Criteria

Achieve 60-70% feature parity on real-time text editing with concurrent users, basic formatting, and conflict resolution. Must handle at least 5 simultaneous editors with <200ms sync latency.
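The criteria above do not say how sync latency should be measured. A minimal sketch of one possible check, assuming you log one latency sample in milliseconds per propagated edit and compare the 95th percentile against the 200 ms target. The choice of percentile and the function names `percentile` and `meets_sync_target` are assumptions for illustration, not part of the bounty:

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not values:
        raise ValueError("need at least one latency sample")
    ordered = sorted(values)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sync_target(
    latencies_ms: list[float],
    editors: int,
    max_latency_ms: float = 200.0,  # threshold from the success criteria
    min_editors: int = 5,           # editor count from the success criteria
    pct: float = 95.0,              # assumption: the source names no percentile
) -> bool:
    """True if enough editors took part and the chosen percentile is under the limit."""
    return editors >= min_editors and percentile(latencies_ms, pct) < max_latency_ms
```

For example, `meets_sync_target([120, 80, 150, 95, 110], editors=5)` compares the slowest of five samples with the threshold. Reporting the raw samples alongside the summary lets the judges apply a different statistic if they prefer one.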

Advanced | Due May 31, 2026

Suggested Stack

Yjs, Hocuspocus, Tiptap, Next.js, WebSocket

Tags: Full-Stack, Real-Time, CRDTs

Open | Benchmark / Testing

Benchmark AI Code Generation Accuracy

$1,000

Design and implement a reproducible benchmarking framework to evaluate AI code generation tools (e.g. Copilot, Cursor, Codeium) on real-world tasks. We want to understand whether proprietary fine-tuning provides meaningful accuracy gains over open-source base models.

Success Criteria

Framework must include at least 50 test cases across 3+ languages, measure pass@1 and pass@5 rates, and produce a publishable comparison report that includes statistical significance tests.
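The bounty does not prescribe how pass@1 and pass@5 are computed. One common choice is the unbiased estimator popularized by the HumanEval benchmark, pass@k = 1 - C(n-c, k) / C(n, k), where n samples are generated per task and c of them pass the tests. A minimal sketch under that assumption (the function names are hypothetical):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n samples, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failures means every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each item is (samples, passing samples)."""
    if not results:
        raise ValueError("need at least one task result")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Generating more than k samples per task (for instance n = 10 when reporting pass@5) reduces the variance of the estimate compared with drawing exactly k samples.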

Intermediate | Due Jun 15, 2026

Suggested Stack

Python, HumanEval, SWE-bench, Docker, Jupyter

Tags: AI/ML, Developer Tools, Benchmarking

Open | Benchmark / Testing

RAG Pipeline Performance Framework

$800

Build a testing framework that evaluates Retrieval-Augmented Generation pipelines across dimensions like retrieval accuracy, latency, hallucination rate, and cost-per-query. This will help us assess whether startups building RAG products have genuine technical differentiation or if they are thin wrappers around commodity infrastructure.

Success Criteria

Framework must test at least 3 RAG configurations (naive chunking, semantic search, hybrid), measure retrieval precision/recall, measure answer accuracy via LLM-as-judge, and produce a cost analysis per 1K queries.
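Two of the metrics above reduce to simple arithmetic. A minimal sketch, assuming each query has a ranked list of retrieved document IDs, a labeled set of relevant IDs, and a total spend in USD for the run. The function names and the cutoff parameter `k` are hypothetical and not specified by the bounty:

```python
def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Precision and recall over the top-k retrieved document IDs."""
    top_k = retrieved[:k]
    # Count distinct relevant IDs so a duplicated hit is not counted twice.
    hits = len(set(top_k) & relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def cost_per_1k_queries(total_cost_usd: float, num_queries: int) -> float:
    """Scale the observed spend for a run to a cost per 1,000 queries."""
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    return total_cost_usd / num_queries * 1000
```

Running the same query set through each RAG configuration and averaging these values per configuration gives a like-for-like comparison. LLM-as-judge answer accuracy needs a model call and is left out of this sketch.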

Advanced | Due Jul 1, 2026

Suggested Stack

LangChain, LlamaIndex, ChromaDB, Ragas, Python

Tags: AI/ML, RAG, Infrastructure

In Progress (1)

In Progress | Replication Challenge

Vibe-Code a Vertical SaaS MVP

$500 + podcast feature

Using AI-assisted coding tools (Cursor, v0, Bolt, etc.), attempt to replicate the core workflow of a vertical SaaS product targeting property management. Document your process, time spent, and total cost. This tests whether AI-assisted development has eroded the moat of domain-specific software.

Success Criteria

Functional prototype covering tenant management, maintenance requests, and rent tracking. Target 60% feature coverage in under 40 hours of work and <$100 in API/hosting costs.

Intermediate

Suggested Stack

Cursor, Next.js, Supabase, Tailwind, v0

Tags: Vibe Coding, SaaS, AI-Assisted Dev