Coinjure: A Trading Agent Harness
for Prediction Markets

March 2026

Haofei Yu   Yicheng Yang   Yuxiang Liu   Jiaxuan You
University of Illinois Urbana-Champaign
haofeiy2@illinois.edu   jiaxuan@illinois.edu
TL;DR: Coinjure is a CLI-based harness for trading agents in prediction markets. It lets agents drive the entire strategy lifecycle purely through a command-line interface. By issuing CLI commands, an agent can autonomously discover cross-market relations, compile executable strategies, run large-scale backtests, and deploy to live execution. Using Coinjure, agents like Claude Code or Codex can discover over 100 backtest-positive strategies in a single hour, a capability we have validated by deploying to live trading and generating real profit on prediction market exchanges.

Demonstration

1min Introduction of Coinjure

Basic Functionality Demo of Coinjure

Claude Code + Coinjure Demo

Introduction

Quant Trading vs Agent-Native Trading
Figure 1. Trading is moving into an agent-native era. In algorithmic trading (left), humans design, validate, and manage strategies while algorithms execute. In agent-native trading (right), the LLM agent drives the full management cycle; humans only monitor.

Trading is entering an AI-native era. Modern quantitative infrastructure has largely automated low-level execution, yet the higher-order loop—spanning strategy discovery, capital allocation, and adaptive repositioning—remains stubbornly human-driven and intuition-based (Figure 1). As LLM agents become increasingly capable, fully automating this loop is now a plausible reality. This evolution raises a critical systems question: what scaffold does an LLM agent actually require to discover, validate, deploy, and manage trading strategies effectively?

Prediction markets are a natural benchmark for LLM agents. Unlike traditional financial exchanges, where alpha is typically extracted through low-latency execution or microstructure optimization, prediction markets are fundamentally semantic and event-driven. Price discovery here relies not on pure numerical data, but on interpreting political shifts, breaking news, and evolving public narratives. Consequently, success in these markets depends heavily on social understanding, information synthesis, and probabilistic reasoning, domains where LLM agents uniquely excel. In this sense, prediction markets are not merely an alternative trading venue; they provide a dynamic environment where an agent must continuously interpret society, adapt to unstructured information, and autonomously self-evolve.

Existing trading systems are human-native, not agent-native. Most platforms today are still built around visual dashboards and UI workflows optimized for human perception. For an AI agent, however, a command-line interface (CLI) is the natural environment. Agents operate through text, programmatic commands, and structured outputs—not visual screens. A well-designed CLI exposes exactly the primitives an agent requires: market discovery, position management, and order execution, all formatted to be seamlessly composed within a reasoning loop. Therefore, the core challenge is not making graphical UIs more agent-friendly, but architecting systems that are natively accessible through an agent-first interface.

What AI-native trading needs is not better Claude skills, but a trading agent harness. There are two fundamental reasons for this. First, generic tool-calling skills do not make strategy discovery scalable. While they teach an agent how to invoke APIs, they cannot elevate the abstraction level of the underlying harness. If trading actions remain bound to raw, low-level endpoints, the agent squanders its context budget on execution boilerplate instead of strategic market reasoning, rendering large-scale strategy search slow, brittle, and inefficient. Second, even if discovery becomes scalable, the harder operational challenges emerge immediately afterward: strategies must be rigorously verified, deployed, monitored, updated, and eventually pruned. Once an agent generates a high volume of candidate strategies, the primary bottleneck is no longer prompting, but operating a full-lifecycle infrastructure around them. Ultimately, the missing piece is not a refined set of skills, but a comprehensive operational scaffold explicitly designed for autonomous trading.

The Coinjure Solution

Not all prediction market strategies benefit equally from an LLM agent. Directional event bets, market making, and statistical arbitrage are already well-served by conventional quantitative pipelines. Where an LLM agent possesses a unique, structural advantage is in processing thousands of free-text market descriptions to discover the underlying semantic and logical relations between them.

Cross-market relations can be discovered at massive scale by an LLM agent. As shown in Figure 2, Polymarket might list “Will Iran close the Strait of Hormuz before 2027?” while Kalshi lists “Will Iran close Strait of Hormuz before Jan 2027?”—nearly identical questions with independently moving prices. Similarly, mutually exclusive candidates for the same office represent probabilities that must logically sum to at most one. These relations come in dozens of typed varieties, each mapping to a well-defined arbitrage strategy. While this search space is combinatorially large and constantly shifting, cross-referencing thousands of unstructured text descriptions is exactly what LLMs do natively.

Coinjure provides a full-lifecycle scaffold to manage the massive influx of discovered strategies. When the agent surfaces thousands of candidate relations, Coinjure takes over the execution lifecycle. It automatically backtests each relation against historically resolved markets before committing capital, executes them in parallel via an asynchronous trading engine (supporting both paper and live execution), and retires strategies the moment their underlying markets resolve. As new markets open daily, stale relations are pruned without manual intervention. Ultimately, Coinjure translates the agent’s scalable discovery into scalable execution, backed by a human monitoring layer for oversight and emergency control.

Harness Overview

[Figure 3 diagram: Discovery → Backtest → Paper Trade → Live Deploy. Inputs: LLM Agent + market data. Oversight: Human Monitor (observe · pause / resume · kill-switch). Legend: discovered, backtest passed, paper ok, eliminated, live profitable, marginal.]
Figure 3. The full strategy lifecycle. The LLM agent discovers candidate relations (grey), backtesting validates a subset (blue), paper trading eliminates quickly-losing strategies (red) and promotes survivors (teal), and live deployment tracks real performance (green/yellow)—all under human oversight.

This section details the architecture of Coinjure, which is divided into four primary components (Figure 3): Strategy Discovery, Backtesting, Trade Execution, and Human Monitoring.

1. Strategy Discovery

Relation–Strategy Mapping. Every exploitable opportunity stems from a relation between markets, and every relation type maps to exactly one strategy that trades it. The LLM agent analyzes free-text market descriptions to discover these relations; Coinjure automatically instantiates the corresponding strategy. Discovering a new relation immediately yields a ready-to-backtest strategy with no additional code.

Example: same_event → DirectArbStrategy. The agent finds that Polymarket’s “Will Iran close the Strait of Hormuz before 2027?” and Kalshi’s “Will Iran close Strait of Hormuz before Jan 2027?” are the same question on different exchanges. It labels this a same_event relation, and a DirectArbStrategy monitors the cross-platform spread and trades whenever it exceeds a threshold.
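The spread check behind a same_event trade can be sketched as follows. This is a minimal illustration, not Coinjure's DirectArbStrategy: the `Quote` type, the `direct_arb_signal` function, and the 0.03 threshold are assumptions made for the example, and fees and partial fills are ignored. On a real venue the "sell" leg would often be expressed as buying the NO contract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quote:
    """Hypothetical best bid/ask for the YES contract on one venue, in dollars (0 to 1)."""
    bid: float
    ask: float


def direct_arb_signal(poly: Quote, kalshi: Quote, threshold: float = 0.03) -> Optional[str]:
    """Return a trade direction when the cross-venue edge exceeds the threshold.

    The edge is what one venue's bid pays minus what the other venue's ask costs.
    The threshold value is an illustrative assumption.
    """
    edge_buy_poly = kalshi.bid - poly.ask    # buy on Polymarket, sell on Kalshi
    edge_buy_kalshi = poly.bid - kalshi.ask  # buy on Kalshi, sell on Polymarket
    if edge_buy_poly > threshold and edge_buy_poly >= edge_buy_kalshi:
        return "buy_polymarket_sell_kalshi"
    if edge_buy_kalshi > threshold:
        return "buy_kalshi_sell_polymarket"
    return None  # spread too small to trade
```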

Example: temporal → LeadLagStrategy. Not all relations are logical constraints. When one market consistently moves before another (e.g., a broad “Will crude oil hit $100?” market reacting before a narrower date-specific contract), the agent labels this a temporal relation, and a LeadLagStrategy trades the lagging market before it catches up.

Built-in Relations and Extensibility. Coinjure ships with 8 relation types, each paired with a pre-built strategy (Table 1). The framework is extensible: defining a new relation type and its strategy class is all that is needed to teach the harness a new category of opportunities.

Table 1. Built-in relation types and their corresponding strategies. Each relation type maps to exactly one strategy class (complementary and exclusivity share GroupArbStrategy); the framework is extensible to new types.
Relation | Constraint | Strategy
same_event | Identical market across platforms | DirectArbStrategy
complementary | Outcomes sum to 1 | GroupArbStrategy
implication | A ⇒ B price ordering | ImplicationArbStrategy
exclusivity | Mutually exclusive | GroupArbStrategy
correlated | Cointegrated prices | CointSpreadStrategy
structural | Monotonic price nesting | StructuralArbStrategy
conditional | Conditional probability bounds | ConditionalArbStrategy
temporal | Lead-lag information flow | LeadLagStrategy
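One way to picture the relation-to-strategy mapping and its extensibility is a small registry. The relation and strategy names below are copied from Table 1; the registry itself, `register_relation`, and `strategy_for` are hypothetical helpers for illustration and are not Coinjure's actual API.

```python
from typing import Dict

# Relation type -> strategy class name, as listed in Table 1.
RELATION_TO_STRATEGY: Dict[str, str] = {
    "same_event": "DirectArbStrategy",
    "complementary": "GroupArbStrategy",
    "implication": "ImplicationArbStrategy",
    "exclusivity": "GroupArbStrategy",
    "correlated": "CointSpreadStrategy",
    "structural": "StructuralArbStrategy",
    "conditional": "ConditionalArbStrategy",
    "temporal": "LeadLagStrategy",
}


def register_relation(relation: str, strategy: str) -> None:
    """Hypothetical extension hook: add a new relation type and its strategy."""
    if relation in RELATION_TO_STRATEGY:
        raise ValueError(f"relation type already registered: {relation!r}")
    RELATION_TO_STRATEGY[relation] = strategy


def strategy_for(relation: str) -> str:
    """Look up the strategy that trades a discovered relation."""
    try:
        return RELATION_TO_STRATEGY[relation]
    except KeyError:
        raise ValueError(f"unknown relation type: {relation!r}") from None
```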

2. Strategy Backtesting

Deterministic Validation Pipeline. Before capital is committed, every discovered relation is backtested. Because LLMs can hallucinate correlations, Coinjure enforces a strict verification boundary between semantic discovery and live trading. The backtester replays historical data through the same execution paths used in live trading, supporting both high-resolution order book snapshots and standard exchange price history APIs.

Microstructure Simulation. Fill rates are modeled with a Beta(5,1) distribution for realistic partial fills, configurable slippage accounts for price impact, and resting orders are constrained by a 5-minute TTL to prevent ghost positions.
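A minimal sketch of this fill model, assuming the fill fraction of each order is drawn independently from Beta(5,1), whose mean is 5/6, so most orders fill almost completely. The function names, the 0.005 slippage default, and the clamping of prices to [0, 1] are illustrative assumptions; only the Beta(5,1) shape and the 5-minute TTL come from the description above.

```python
import random
from dataclasses import dataclass
from typing import Optional

ORDER_TTL_SECONDS = 5 * 60  # resting orders expire after 5 minutes


@dataclass
class SimFill:
    filled_qty: float
    fill_price: float


def simulate_fill(qty: float, quoted_price: float, side: str,
                  slippage: float = 0.005,
                  rng: Optional[random.Random] = None) -> SimFill:
    """Hypothetical fill simulator: partial fill plus adverse slippage."""
    rng = rng or random.Random()
    fill_fraction = rng.betavariate(5, 1)  # skewed toward full fills
    # Slippage moves the price against the trader: up for buys, down for sells.
    signed = slippage if side == "buy" else -slippage
    price = min(max(quoted_price + signed, 0.0), 1.0)  # contract prices stay in [0, 1]
    return SimFill(filled_qty=qty * fill_fraction, fill_price=price)


def is_expired(placed_at: float, now: float) -> bool:
    """True once a resting order has outlived its TTL (timestamps in seconds)."""
    return now - placed_at >= ORDER_TTL_SECONDS
```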

Walk-Forward Robustness. For statistical strategies, the pipeline performs walk-forward validation with a 60/40 train-test split, resetting internal state between phases to guard against data leakage and overfitting.
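The walk-forward procedure can be sketched as below. `ToySpreadStrategy` is a deliberately simple stand-in invented for this example (it is not one of Coinjure's strategies); the point is the chronological 60/40 split and the state reset between the in-sample and out-of-sample phases.

```python
from typing import Optional, Sequence, Tuple


def split_60_40(prices: Sequence[float]) -> Tuple[Sequence[float], Sequence[float]]:
    """Chronological split: first 60% for training, last 40% for testing."""
    cut = len(prices) * 6 // 10
    return prices[:cut], prices[cut:]


class ToySpreadStrategy:
    """Hypothetical mean-reversion toy: long below the fitted mean, short above."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.reset()

    def fit(self, train: Sequence[float]) -> None:
        self.mean = sum(train) / len(train)

    def reset(self) -> None:
        # Running state that must not leak from the train phase into the test phase.
        self.position = 0
        self.pnl = 0.0
        self._last: Optional[float] = None

    def on_price(self, price: float) -> None:
        if self._last is not None:
            self.pnl += self.position * (price - self._last)
        self.position = 1 if price < self.mean else -1
        self._last = price


def walk_forward(prices: Sequence[float], strategy: ToySpreadStrategy) -> Tuple[float, float]:
    """Return (in-sample PnL, out-of-sample PnL)."""
    train, test = split_60_40(prices)
    strategy.fit(train)          # parameters come from the training window only
    for p in train:
        strategy.on_price(p)
    in_sample = strategy.pnl
    strategy.reset()             # clear positions and PnL before the test phase
    for p in test:
        strategy.on_price(p)
    return in_sample, strategy.pnl
```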

3. Trade Execution

Agent-Native Execution Engine. Validated strategies enter an asynchronous execution engine that handles market interaction, state management, order routing, and event batching. The engine drains all pending events before each strategy evaluation to prevent stale-price decisions, freeing the LLM to focus on higher-order strategic adjustments.
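The drain-before-evaluate idea can be illustrated with a plain `asyncio.Queue`. This is a sketch under assumptions: the `(market_id, price)` event shape and the `evaluation_step` function are hypothetical, and the real engine's event types are not described here.

```python
import asyncio
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # hypothetical event shape: (market_id, latest price)


def drain(queue: "asyncio.Queue[Event]") -> List[Event]:
    """Pop every event that is already queued, without waiting for new ones."""
    events: List[Event] = []
    while True:
        try:
            events.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            return events


async def evaluation_step(queue: "asyncio.Queue[Event]",
                          prices: Dict[str, float]) -> Dict[str, float]:
    """Wait for one event, then absorb the whole backlog before evaluating."""
    first = await queue.get()
    for market_id, price in [first, *drain(queue)]:
        prices[market_id] = price  # later events overwrite earlier, stale ones
    # A strategy would be evaluated here, seeing only the freshest prices.
    return prices
```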

Code-Path-Preserving Promotion. The same event loop drives both paper and live modes; only the execution module is swapped. Live trading uses fill-or-kill (FOK) orders with adaptive, rate-limit-aware retry (exponential backoff with jitter) for robust execution under volatile or throttled conditions.
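A minimal sketch of retry with exponential backoff and jitter. It assumes the "full jitter" variant (each delay is drawn uniformly between zero and an exponentially growing ceiling); the text does not say which variant Coinjure uses. The `RateLimited` exception, `submit_with_retry`, and all default values are hypothetical.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by an exchange client when it is throttled."""


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one randomized delay per retry; the ceiling doubles each attempt."""
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng.uniform(0, ceiling)  # full jitter spreads retries apart


def submit_with_retry(submit: Callable[[], T], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call submit(), backing off after each RateLimited error."""
    for delay in backoff_delays(max_retries, rng=rng):
        try:
            return submit()
        except RateLimited:
            sleep(delay)
    return submit()  # final attempt; any error now propagates to the caller
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting.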

Risk Management and Process Isolation. The harness forces a strategy into read-only mode after consecutive failures or drawdown breaches. Each engine instance runs as an independent OS process, allowing hundreds of strategies to execute in parallel without shared-state contention.
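The read-only trip can be pictured as a small guard object. This is an illustrative sketch: the `RiskGuard` class, the limit of 3 consecutive failures, and the 10% drawdown limit are assumptions, since the text does not give Coinjure's actual thresholds.

```python
from dataclasses import dataclass


@dataclass
class RiskGuard:
    """Hypothetical guard that forces a strategy into read-only mode."""
    max_consecutive_failures: int = 3  # assumed limit
    max_drawdown: float = 0.10         # assumed limit, as a fraction of peak equity
    consecutive_failures: int = 0
    peak_equity: float = 0.0
    read_only: bool = False

    def record_order(self, ok: bool) -> None:
        # A success resets the streak; only back-to-back failures trip the guard.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.read_only = True

    def record_equity(self, equity: float) -> None:
        self.peak_equity = max(self.peak_equity, equity)
        if self.peak_equity > 0:
            drawdown = (self.peak_equity - equity) / self.peak_equity
            if drawdown >= self.max_drawdown:
                self.read_only = True

    def can_trade(self) -> bool:
        return not self.read_only
```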

4. Human Monitoring

Decoupled Observability Layer. A real-time monitoring suite runs outside the critical execution path, letting human operators audit the agent’s decisions and monitor aggregate portfolio exposure without adding latency to the trading engine.

Dynamic Intervention Controls. Operators can pause, tune, or retire individual strategies in real time. A global kill-switch and hot-swap capabilities allow strategy logic to be replaced without restarting the engine, ensuring human control at any granularity.

Experiments

We evaluate Coinjure’s pipeline on live Polymarket and Kalshi prediction markets.

1. Strategy Discovery

In a single one-hour session, the agent autonomously discovered 250+ market relations from live Polymarket and Kalshi markets. Below we show one example for each of seven relation types from Table 1.

Implication. If event A happens, event B must also happen (A ⇒ B), so price(A) ≤ price(B). The agent chains temporal deadlines into implication ladders.

implication Iran Strait of Hormuz — Deadline Chain
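The implication constraint is easy to check mechanically on a deadline ladder: an event happening before an earlier deadline implies it happens before every later one, so prices should be non-decreasing along the ladder. The sketch below is a hypothetical helper, not Coinjure's ImplicationArbStrategy, and the `tolerance` parameter is an assumption.

```python
from typing import List, Tuple


def implication_violations(ladder: List[Tuple[str, float]],
                           tolerance: float = 0.0) -> List[Tuple[str, str, float]]:
    """Find adjacent pairs where price(A) > price(B) although A implies B.

    `ladder` is ordered from the earliest deadline to the latest, so each
    market implies the one after it. Returns (A, B, price gap) per violation.
    """
    violations = []
    for (name_a, price_a), (name_b, price_b) in zip(ladder, ladder[1:]):
        if price_a > price_b + tolerance:
            violations.append((name_a, name_b, price_a - price_b))
    return violations
```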

Same Event / Cross-Platform. The same question listed on both Polymarket and Kalshi. Persistent price gaps are a direct arbitrage.

same_event Iran Closes the Strait of Hormuz

Complementary. Markets whose prices must sum to 1 because they partition all outcomes of a single event.

complementary US Forces Enter Iran — Time Buckets

Exclusivity. Mutually exclusive outcomes—at most one can resolve YES, so sum(prices) ≤ 1.

exclusivity Next President of Vietnam
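The complementary and exclusivity constraints both reduce to a check on the sum of YES prices across a group. A minimal sketch, assuming quotes in dollars per $1 contract and ignoring fees; the function name and return format are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def group_arb_signal(yes_bids: Sequence[float], yes_asks: Sequence[float],
                     exhaustive: bool) -> Optional[Tuple[str, float]]:
    """Detect a mispriced group of mutually exclusive outcomes.

    At most one outcome resolves YES, so the whole YES basket pays at most $1.
    If the outcomes are also exhaustive (complementary), it pays exactly $1.
    """
    overpriced = sum(yes_bids) - 1.0
    if overpriced > 0:
        # Selling every YES collects more than the basket can ever pay out.
        return ("sell_basket", overpriced)
    underpriced = 1.0 - sum(yes_asks)
    if exhaustive and underpriced > 0:
        # Buying every YES costs less than the guaranteed $1 payout.
        return ("buy_basket", underpriced)
    return None
```

Note that the buy side is only safe for an exhaustive group: with merely exclusive outcomes, every contract in the basket can still resolve NO.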

Structural. A group of markets with a monotonic price-nesting constraint: higher strike targets must have lower probabilities.

structural Crude Oil (CL) Price Targets — 10 Markets

Correlated. Two semantically related markets whose prices co-move due to a shared underlying driver, suitable for mean-reversion spread trading.

correlated Iran War Outcomes — Ceasefire & Regime Change

Conditional. Conditional probability bounds: P(A | B) constrains the joint pricing of related markets.

conditional EU Country Strikes Iran — Subset Bound

Importantly, the discovery process is cumulative. The 250+ relations above were actively searched and identified by the LLM agent within a single one-hour session. As the agent continues to run over longer time horizons, it accumulates an ever-growing pool of validated relations ready for backtesting and deployment—new markets appear daily, and each session builds on prior discoveries. Scaling to thousands or 10,000+ relations is simply a matter of letting the agent run longer and across more exchanges.

2. Backtesting

All discovered relations are automatically equipped with their corresponding arbitrage strategy and backtested against historical Polymarket order book snapshots. Around half of the discovered relations passed the backtest gate with positive PnL, confirming that a meaningful fraction of the structural mispricings identified by the agent are genuinely tradeable (Figure 4).

[Figure 4 chart: Backtest PnL by Market Relation Strategy, ranked high → low, replayed on historical order book snapshots; axis from −$40 to +$40.]
Strategy | Relation | PnL
Oil CL Strike Ladder | structural | +$45.80
Vietnam President | exclusivity | +$34.20
Hormuz Deadline Chain | implication | +$28.60
Hormuz Cross-Platform | same_event | +$21.40
EU Strike Iran Subset | conditional | +$16.50
US Enter Iran Buckets | complementary | +$11.30
Iran Ceasefire / Regime | correlated | +$5.80
BTC ETF / BTC Price | correlated | −$3.20
Natural Gas Targets | structural | −$8.70
Fed Rate Cut Buckets | complementary | −$14.50
Taiwan Invasion Chain | implication | −$19.80
NATO Strike Iran | conditional | −$25.60
Ukraine Ceasefire X-Plat | same_event | −$31.40
Oil / Gold Lead-Lag | temporal | −$38.90
Figure 4. Backtest results for 14 representative market relation strategies. Around half pass the PnL gate, confirming that discovered mispricings are genuinely tradeable.

3. Live Trading

To validate end-to-end performance, we funded a Kalshi account with limited real capital and let Coinjure autonomously allocate positions. The harness automatically sized each position to roughly $20, identified opportunities, and executed trades without manual intervention. Within a short initial period the portfolio reached $53.97 ($22.08 in open positions, $31.89 cash), reflecting a gain of $3.40 (6.72%). These are early, short-horizon results and do not constitute a claim of long-term performance; we present them solely as evidence that the pipeline can identify and execute real opportunities end-to-end.
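The reported figures are internally consistent, as the arithmetic below shows. The starting capital is not stated in the text; the value here is inferred from the portfolio value and the reported gain.

```latex
\begin{align*}
\text{portfolio value} &= \$22.08 + \$31.89 = \$53.97 \\
\text{implied starting capital} &= \$53.97 - \$3.40 = \$50.57 \\
\text{return} &= \frac{3.40}{50.57} \approx 6.72\%
\end{align*}
```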

Figure 5 highlights one representative trade where the agent identified and executed a complementary-set arbitrage.

[Figure 5 graphic: Live Arbitrage: Fed Dissenting Votes. Kalshi · Executed March 14, 2026 · Complementary outcome set.]
Outcome | Cost | Pays
Yes · 0 | $0.43 | $20
Yes · 1 ✓ | $4.80 | $20
Yes · 2 | $10.80 | $20
Yes · 3 | $2.00 | $20
Yes · 4 | $0.26 | $22
Total cost: $18.29. Any 1 of 5 outcomes settles, so the guaranteed payout is ≥ $20.00 and the locked profit is ≥ $1.71 (9.3% return).
Settled: 1 dissent → Payout $20.24 → Net profit $1.95
Figure 5. The agent bought Yes contracts on every mutually exclusive outcome, locking in guaranteed profit before settlement. ✓ marks the resolved outcome.

As shown in Figure 5, at around 11:24 pm on March 14, 2026 the agent recognized that the five mutually exclusive outcomes of “How many dissenting votes at the next Fed meeting?” were collectively priced below their guaranteed settlement value. It then purchased Yes contracts across all outcomes in rapid succession (not simultaneously, but within a short time window), assembling a near-risk-free position: because the outcomes are mutually exclusive and cover every possibility, exactly one contract pays out regardless of the result, yielding a locked-in profit once the full basket is in place. This trade was triggered while the harness was running roughly 50 strategies concurrently, so it was just one of many opportunities being monitored in parallel. With more strategies deployed, we expect the harness to surface and execute more such opportunities.
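Using the per-outcome costs and payouts reported in Figure 5, the locked-in profit works out as follows. The lower bound uses the smallest payout across outcomes (\$20); the realized figure uses the reported settlement payout.

```latex
\begin{align*}
\text{total cost} &= 0.43 + 4.80 + 10.80 + 2.00 + 0.26 = \$18.29 \\
\text{locked profit} &\ge 20.00 - 18.29 = \$1.71 \\
\text{locked return} &\ge \frac{1.71}{18.29} \approx 9.3\% \\
\text{realized profit} &= 20.24 - 18.29 = \$1.95
\end{align*}
```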

Citation

If you find Coinjure useful in your research or work, please cite:

@software{coinjure2026,
  title   = {Coinjure: A Trading Agent Harness for Prediction Markets},
  author  = {Yu, Haofei and Yang, Yicheng and Liu, Yuxiang and You, Jiaxuan},
  year    = {2026},
  url     = {https://github.com/ulab-uiuc/coinjure},
  note    = {University of Illinois Urbana-Champaign}
}