The AI PM

Tuning Coding Agents via Implicit Preference Distillation

May 20, 2026

AI has made code implementation cheap, but the onus of selection is still on the operator. Silicon Valley has converged on the narrative that “taste is the new core skill,” but how will taste scale to the volume of candidate ideas your team can consider?

Discovery and selection are upstream of code generation. The discourse around “taste” fills the vacuum where automated selection would otherwise sit. Today the operator chooses the prompt and waits for a coding agent to draft the implementation.

Currently, the backlog of candidate features grows faster than any human can evaluate through vibes and taste. What’s missing is an AI Product Manager that ranks candidates against a team’s learned development preferences and hands a shortlist to the AI coder to test.

Last week, we discussed learning to model developer preferences from the merge history of a code repository. Compared with high-recall recommendation strategies that match candidate ideas by semantic similarity to the code, our preference model offers greater precision in predicting what a team is likely to consider next.

Each PR against a deployed project provides a pairwise comparison between the preferred feature branch and the baseline at main. By fitting a Gaussian Process to the accumulated dispositions, we derived a project-specific utility model, which we used to rank semantically plausible candidate ideas before implementation.
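To make the pairwise idea concrete, here is a minimal sketch of how "preferred branch vs. baseline" comparisons can be turned into a utility function. It deliberately swaps the Gaussian Process for a much simpler linear Bradley-Terry-style model, so it is an illustration of the idea rather than the pipeline itself. The two features and the candidate names are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def fit_linear_utility(pairs, dim, lr=0.5, steps=500):
    """Fit weights w so that u(x) = w . x scores each preferred item above its baseline.

    pairs: list of (preferred_features, baseline_features) tuples.
    """
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for preferred, baseline in pairs:
            diff = [p - b for p, b in zip(preferred, baseline)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient of log sigmoid(margin) with respect to w.
            coeff = 1.0 - sigmoid(margin)
            for i in range(dim):
                grad[i] += coeff * diff[i]
        # Gradient ascent on the average log-likelihood of the observed preferences.
        w = [wi + lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w

def utility(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def rank_candidates(w, candidates):
    """Return candidate names sorted from highest to lowest utility."""
    return sorted(candidates, key=lambda name: utility(w, candidates[name]), reverse=True)

# Hypothetical features per change: [adds_tests, touches_core_api]
pairs = [
    ([1.0, 0.0], [0.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [0.0, 1.0]),
]
w = fit_linear_utility(pairs, dim=2)
candidates = {"add_eval_suite": [1.0, 0.0], "rewrite_core": [0.0, 1.0]}
ranking = rank_candidates(w, candidates)
print(ranking)
```

A GP plays the same role as `utility` here, but can capture nonlinear preferences and report uncertainty for candidates that look unlike anything in the merge history.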

To pressure-test these ideas, I applied our implicit preference learning pipeline, with development preferences learned from VQASynth, to synthesize training samples over dozens of candidate features based on newly published arXiv methods. With this dataset, I trained a 2B coding model to rank candidates using Direct Preference Optimization (DPO) with LoRA.
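For readers who haven't seen the objective written out, below is a minimal sketch of the standard per-pair DPO loss. It is not the training code used for this run. The inputs are summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model, and `beta=0.1` is an assumed value, not one reported here.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # softplus(-logits) equals -log(sigmoid(logits)), computed in a stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

# Policy identical to the reference: the loss is log(2), i.e. a coin flip.
baseline_loss = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy has shifted probability toward the chosen candidate: the loss drops.
improved_loss = dpo_loss(-5.0, -15.0, -10.0, -10.0)
```

The loss only rewards the policy for widening the gap between chosen and rejected relative to the reference, which is why a small LoRA adapter can be enough to move it.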

After 1 epoch, the 2B model reaches 87.4% reward accuracy on the held-out eval split versus 92.3% on training. A gap of roughly 5pp (4.9pp) after single-epoch LoRA training with ~0.5% trainable parameters is consistent with the model learning developer tastes rather than memorizing the training set, though a proper generalization test will come with the corpus-scale run on held-out repos.
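Assuming "reward accuracy" means the usual DPO metric (the fraction of pairs where the implicit reward of the chosen response exceeds that of the rejected one), it can be sketched as follows. The toy log-probabilities and `beta` are made up for illustration; only the 87.4% and 92.3% figures come from the run above.

```python
def implicit_reward(policy_logp, ref_logp, beta=0.1):
    # DPO's implicit reward: scaled log-ratio of policy to reference.
    return beta * (policy_logp - ref_logp)

def reward_accuracy(pairs, beta=0.1):
    """Fraction of pairs where the chosen response out-scores the rejected one."""
    correct = 0
    for p in pairs:
        r_chosen = implicit_reward(p["policy_chosen"], p["ref_chosen"], beta)
        r_rejected = implicit_reward(p["policy_rejected"], p["ref_rejected"], beta)
        correct += r_chosen > r_rejected
    return correct / len(pairs)

# Toy pairs (hypothetical log-probs); the third pair is ranked the wrong way round.
toy_pairs = [
    {"policy_chosen": -4.0, "ref_chosen": -5.0, "policy_rejected": -6.0, "ref_rejected": -5.0},
    {"policy_chosen": -3.0, "ref_chosen": -3.0, "policy_rejected": -4.0, "ref_rejected": -3.0},
    {"policy_chosen": -7.0, "ref_chosen": -5.0, "policy_rejected": -4.0, "ref_rejected": -5.0},
    {"policy_chosen": -2.0, "ref_chosen": -4.0, "policy_rejected": -5.0, "ref_rejected": -4.0},
]
toy_accuracy = reward_accuracy(toy_pairs)

# Train/eval gap from the reported numbers, in percentage points.
gap_pp = round(92.3 - 87.4, 1)
```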

Check out the dataset and model artifacts.

When the AI PM’s scoring is calibrated well enough that teams trust it to help in exploration, humans will be reviewing experiment results instead of building candidate features.

The natural extension of this idea is to expand our data synthesis pipeline to more code repositories. Using the GitHub APIs, we can seed our dataset with thousands of AI/ML repos and extract structured data from their merge histories. That context lets us generate dozens of candidate methods for each repo and then score each one against the preferences learned from its merge history to produce training samples.
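The last step, turning scored candidates into training samples, might look something like the sketch below. The `prompt`/`chosen`/`rejected` layout is a common one for DPO datasets, but the function name, the `min_margin` threshold, and the example scores are all hypothetical rather than taken from the actual pipeline.

```python
from itertools import combinations

def build_preference_pairs(repo_context, scored_candidates, min_margin=0.1):
    """Turn (candidate, utility score) tuples for one repo into DPO-style pairs.

    Pairs whose scores differ by less than min_margin are skipped as too
    ambiguous to be a useful training signal.
    """
    pairs = []
    for (a, score_a), (b, score_b) in combinations(scored_candidates, 2):
        if abs(score_a - score_b) < min_margin:
            continue
        chosen, rejected = (a, b) if score_a > score_b else (b, a)
        pairs.append({"prompt": repo_context, "chosen": chosen, "rejected": rejected})
    return pairs

# Hypothetical candidates scored by a learned utility model.
scored = [
    ("Add depth-aware augmentation", 0.90),
    ("Swap tokenizer", 0.40),
    ("Rewrite data loader in Rust", 0.35),
]
samples = build_preference_pairs("Repo summary and recent merge history go here", scored)
```

Filtering on a margin trades dataset size for cleaner labels; how large that margin should be is an empirical question.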

With this capability built into the weights, your agent can triage relevant ideas discovered with search tools and manage the context it sends to coding agents for implementation. The vision is development in which humans review the evidence behind several promising candidate features before launching, instead of babysitting an AI coder.

Product management is orchestration: deciding which experiments are worth running, sequencing them against the team’s capacity, and routing results back into the next round of choices. AI/ML engineering has always strained this role because the experimental surface grows faster than any human can curate. An AI PM is the orchestration layer that operates at the cadence the work now demands.

Myx'd Results
