Why We Need ExperimentOps

(and an unconference to put it into action)

Jul 25, 2025

If you're building with foundation models, steady progress is a challenge. A prototype crushes offline evals, then real users expose gaps you never measured. The "next best" move becomes an avalanche of options (some contradict—hello, Simpson's paradox). Every lever is entangled: that "simple" LoRA finetune experiment can depend on base architecture, data distributions, and pre-training context.
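To make the Simpson's paradox aside concrete, here is a minimal sketch with made-up counts (not results from any real experiment): variant A wins inside every query segment, yet variant B wins once the segments are pooled, because the two variants saw very different mixes of easy and hard queries. The segment names, variant labels, and helper functions are all hypothetical.

```python
# Illustrative counts only: (successes, trials) per variant within each segment.
results = {
    "easy_queries": {"A": (81, 87), "B": (234, 270)},
    "hard_queries": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Success rate for one (successes, trials) pair."""
    return successes / trials


def segment_winners(results):
    """Return the variant with the higher success rate inside each segment."""
    return {
        segment: max(variants, key=lambda name: rate(*variants[name]))
        for segment, variants in results.items()
    }


def pooled_rates(results):
    """Sum successes and trials across segments, then compute one rate per variant."""
    totals = {}
    for variants in results.values():
        for name, (successes, trials) in variants.items():
            prev_s, prev_t = totals.get(name, (0, 0))
            totals[name] = (prev_s + successes, prev_t + trials)
    return {name: rate(s, t) for name, (s, t) in totals.items()}


def pooled_winner(results):
    """Return the variant with the higher success rate after pooling segments."""
    rates = pooled_rates(results)
    return max(rates, key=rates.get)


print(segment_winners(results))  # per-segment view
print(pooled_winner(results))    # aggregate view
```

If an eval dashboard only reports the pooled number, the "next best" move it suggests can be the opposite of what every individual segment says.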

Many teams are flying blind. There's no institutional memory of what was tried, which metrics mattered, or how design choices interact. Without that shared knowledge, decisions are guesses.

Teams are seeing breakthroughs by systematizing experimentation and institutionalizing knowledge:

Capture every tweak (prompt, finetune, model swap, retrieval change).
Qualify experiment candidates using experiment history, the latest arXiv findings, and online metrics.
Make the learnings searchable, reusable, and actionable so that with each iteration, they compound.
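As a rough sketch of what "capture, then make searchable" could look like in its simplest form, the snippet below keeps an in-memory log of experiment records. It is an illustration under assumptions, not a description of any particular team's tooling: `ExperimentRecord`, `ExperimentLog`, the field names, and the example metric values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    kind: str                      # e.g. "prompt", "finetune", "model_swap", "retrieval"
    change: str                    # what was actually tweaked
    metrics: dict[str, float] = field(default_factory=dict)
    learning: str = ""             # what the team took away from the result


class ExperimentLog:
    """Hypothetical in-memory store; a real one would persist to a database."""

    def __init__(self):
        self._records = []

    def capture(self, record):
        """Append a record so every tweak leaves a trace."""
        self._records.append(record)
        return record

    def search(self, term):
        """Case-insensitive substring search over kind, change, and learning."""
        term = term.lower()
        return [
            r for r in self._records
            if term in r.kind.lower()
            or term in r.change.lower()
            or term in r.learning.lower()
        ]

    def best_by(self, metric):
        """Return the record with the highest value for a metric, or None."""
        scored = [r for r in self._records if metric in r.metrics]
        return max(scored, key=lambda r: r.metrics[metric], default=None)


# Example usage with invented values.
log = ExperimentLog()
log.capture(ExperimentRecord(
    "prompt", "Added few-shot examples to the system prompt",
    {"offline_accuracy": 0.71}, "Helped on short queries only",
))
log.capture(ExperimentRecord(
    "retrieval", "Swapped BM25 for a hybrid retriever",
    {"offline_accuracy": 0.78}, "Gains concentrated on long-tail queries",
))
```

Even a structure this small answers "have we tried this before, and what happened?", which is the institutional memory the list above is asking for.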

Some teams already operate this way; check out the recent engineering posts from Yelp, Netflix, and Stripe.

So… why a conference?

Because the insights on how you define "good," interpret eval results, or decide which experiment to try next are rarely documented in depth in blogs or white papers. We want the unvarnished versions: failures, hacks, and what finally moved the needle.

Experiment 2025 — October 30, 9:00 am–1:00 pm, San Francisco — is a half‑day, science‑fair/open space unconference to collectively learn what's worked, what's failed, and what's next.

What you'll do

Co-create the agenda. Pitch the pain points or wins you want dissected; the group clusters around the ones that resonate most.
Swap raw experiment stories in breakouts. Compare dataset choices, evaluation suites, infrastructure trade-offs, finetuning tips—no slides required, just notebooks, logs, or whiteboard sketches.
Extract patterns and pitfalls. Each session wraps with "keep, tweak, ditch" takeaways you can test on Monday.

Bring

One interesting experiment story.
One metric you're trying to optimize.
One hypothesis you've been thinking about.

Seats are limited - grab a spot today!

P.S. For more context, the Generationship podcast episode #40 is a good primer on how we think about ExperimentOps.

Myx'd Results
