Coinjure: A Trading Agent Harness for Prediction Markets

Demonstration

1min Introduction of Coinjure

Basic Functionality Demo of Coinjure

Claude Code + Coinjure Demo

Introduction

Figure 1. Trading is moving into an agent-native era. In algorithmic trading (left), humans design, validate, and manage strategies while algorithms execute. In agent-native trading (right), the LLM agent drives the full management cycle; humans only monitor.

Trading is entering an AI-native era. Modern quantitative infrastructure has largely automated low-level execution, yet the higher-order loop—spanning strategy discovery, capital allocation, and adaptive repositioning—remains stubbornly human-driven and intuition-based (Figure 1). As LLM agents become increasingly capable, fully automating this loop is now a plausible reality. This evolution raises a critical systems question: what scaffold does an LLM agent actually require to discover, validate, deploy, and manage trading strategies effectively?

Prediction markets are a natural benchmark for LLM agents. Unlike traditional financial exchanges, where alpha is typically extracted through low-latency execution or microstructure optimization, prediction markets are fundamentally semantic and event-driven. Price discovery here relies not on pure numerical data, but on interpreting political shifts, breaking news, and evolving public narratives. Consequently, success in these markets depends heavily on social understanding, information synthesis, and probabilistic reasoning, all domains where LLM agents excel. In this sense, prediction markets are not merely an alternative trading venue; they provide a dynamic environment where an agent must continuously interpret society, adapt to unstructured information, and autonomously self-evolve.

Existing trading systems are human-native, not agent-native. Most platforms today are still built around visual dashboards and UI workflows optimized for human perception. For an AI agent, however, a command-line interface (CLI) is the natural environment. Agents operate through text, programmatic commands, and structured outputs—not visual screens. A well-designed CLI exposes exactly the primitives an agent requires: market discovery, position management, and order execution, all formatted to be seamlessly composed within a reasoning loop. Therefore, the core challenge is not making graphical UIs more agent-friendly, but architecting systems that are natively accessible through an agent-first interface.

What AI-native trading needs is not better Claude skills, but a trading agent harness. There are two fundamental reasons for this. First, generic tool-calling skills do not make strategy discovery scalable. While they teach an agent how to invoke APIs, they cannot elevate the abstraction level of the underlying harness. If trading actions remain bound to raw, low-level endpoints, the agent squanders its context budget on execution boilerplate instead of strategic market reasoning, rendering large-scale strategy search slow, brittle, and inefficient. Second, even if discovery becomes scalable, the harder operational challenges emerge immediately afterward: strategies must be rigorously verified, deployed, monitored, updated, and eventually pruned. Once an agent generates a high volume of candidate strategies, the primary bottleneck is no longer prompting, but operating a full-lifecycle infrastructure around them. Ultimately, the missing piece is not a refined set of skills, but a comprehensive operational scaffold explicitly designed for autonomous trading.

The Coinjure Solution

Not all prediction market strategies benefit equally from an LLM agent. Directional event bets, market making, and statistical arbitrage are already well-served by conventional quantitative pipelines. Where an LLM agent possesses a unique, structural advantage is in processing thousands of free-text market descriptions to discover the underlying semantic and logical relations between them.

Cross-market relations can be discovered at massive scale by an LLM agent. As shown in Figure 2, Polymarket might list “Will Iran close the Strait of Hormuz before 2027?” while Kalshi lists “Will Iran close Strait of Hormuz before Jan 2027?”—nearly identical questions with independently moving prices. Similarly, mutually exclusive candidates for the same office represent probabilities that must logically sum to at most one. These relations come in dozens of typed varieties, each mapping to a well-defined arbitrage strategy. While this search space is combinatorially large and constantly shifting, cross-referencing thousands of unstructured text descriptions is exactly what LLMs do natively.

Coinjure provides a full-lifecycle scaffold to manage the massive influx of discovered strategies. When the agent surfaces thousands of candidate relations, Coinjure takes over the execution lifecycle. It automatically backtests each relation against historically resolved markets before committing capital, executes them in parallel via an asynchronous trading engine (supporting both paper and live execution), and retires strategies the moment their underlying markets resolve. As new markets open daily, stale relations are pruned without manual intervention. Ultimately, Coinjure translates the agent’s scalable discovery into scalable execution, backed by a human monitoring layer for oversight and emergency control.

Harness Overview

Figure 3. The full strategy lifecycle. The LLM agent discovers candidate relations (grey), backtesting validates a subset (blue), paper trading eliminates quickly-losing strategies (red) and promotes survivors (teal), and live deployment tracks real performance (green/yellow)—all under human oversight.

This section details the architecture of Coinjure, which is divided into four primary components (Figure 3): Strategy Discovery, Backtesting, Trade Execution, and Human Monitoring.

1. Strategy Discovery

Relation–Strategy Mapping. Every exploitable opportunity stems from a relation between markets, and every relation type maps to exactly one strategy that trades it. The LLM agent analyzes free-text market descriptions to discover these relations; Coinjure automatically instantiates the corresponding strategy. Discovering a new relation immediately yields a ready-to-backtest strategy with no additional code.

Example: same_event → DirectArbStrategy. The agent finds that Polymarket’s “Will Iran close the Strait of Hormuz before 2027?” and Kalshi’s “Will Iran close Strait of Hormuz before Jan 2027?” are the same question on different exchanges. It labels this a same_event relation, and a DirectArbStrategy monitors the cross-platform spread and trades whenever it exceeds a threshold.
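
The threshold rule in this example can be sketched as follows. This is a minimal illustration, not Coinjure's actual DirectArbStrategy: the `Quote` class, the function names, the 0.02 threshold, and the prices are all assumptions made up for the sketch, and fees are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Top-of-book YES prices on one venue, in dollars per $1 payout (hypothetical type)."""
    bid: float  # highest price someone will pay for YES
    ask: float  # lowest price someone will sell YES for


def cross_venue_edge(a: Quote, b: Quote) -> float:
    """Largest gap from buying YES at one venue's ask and selling at the other's bid."""
    return max(b.bid - a.ask, a.bid - b.ask)


def should_trade(a: Quote, b: Quote, threshold: float = 0.02) -> bool:
    """Signal only when the executable gap exceeds the (assumed) threshold."""
    return cross_venue_edge(a, b) > threshold


# Made-up quotes for the two Strait of Hormuz listings.
polymarket = Quote(bid=0.40, ask=0.42)
kalshi = Quote(bid=0.47, ask=0.49)
print(should_trade(polymarket, kalshi))
```

Using bid and ask rather than mid prices keeps the signal tied to prices that can actually be executed, which matters when the spread is only a few cents.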

Example: temporal → LeadLagStrategy. Not all relations are logical constraints. When one market consistently moves before another—e.g., a broad “Will crude oil hit $100?” market reacting before a narrower date-specific contract—the agent labels this a temporal relation, and a LeadLagStrategy trades the lagging market before it catches up.

Built-in Relations and Extensibility. Coinjure ships with 8 relation types, each paired with a pre-built strategy (Table 1). The framework is extensible: defining a new relation type and its strategy class is all that is needed to teach the harness a new category of opportunities.

**Table 1.** Built-in relation types and their corresponding strategies. Each relation type maps to exactly one strategy class (complementary and exclusivity share `GroupArbStrategy`); the framework is extensible to new types.

| Relation | Constraint | Strategy |
| --- | --- | --- |
| same_event | Identical market across platforms | `DirectArbStrategy` |
| complementary | Outcomes sum to 1 | `GroupArbStrategy` |
| implication | A ⇒ B price ordering | `ImplicationArbStrategy` |
| exclusivity | Mutually exclusive | `GroupArbStrategy` |
| correlated | Cointegrated prices | `CointSpreadStrategy` |
| structural | Monotonic price nesting | `StructuralArbStrategy` |
| conditional | Conditional probability bounds | `ConditionalArbStrategy` |
| temporal | Lead-lag information flow | `LeadLagStrategy` |
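
One way to picture the mapping in Table 1 is a small registry keyed by relation type. The sketch below is illustrative only: the dictionary contents come from the table, but `strategy_for`, `register_relation`, and the use of class names as plain strings are assumptions, not Coinjure's real extension API.

```python
# Relation type -> strategy class name, copied from Table 1.
RELATION_TO_STRATEGY = {
    "same_event": "DirectArbStrategy",
    "complementary": "GroupArbStrategy",
    "implication": "ImplicationArbStrategy",
    "exclusivity": "GroupArbStrategy",
    "correlated": "CointSpreadStrategy",
    "structural": "StructuralArbStrategy",
    "conditional": "ConditionalArbStrategy",
    "temporal": "LeadLagStrategy",
}


def strategy_for(relation_type: str) -> str:
    """Look up the strategy paired with a relation type (hypothetical helper)."""
    try:
        return RELATION_TO_STRATEGY[relation_type]
    except KeyError:
        raise ValueError(f"unknown relation type: {relation_type!r}") from None


def register_relation(relation_type: str, strategy_name: str) -> None:
    """Teach the registry a new relation type (hypothetical extension hook)."""
    if relation_type in RELATION_TO_STRATEGY:
        raise ValueError(f"relation type already registered: {relation_type!r}")
    RELATION_TO_STRATEGY[relation_type] = strategy_name


print(strategy_for("temporal"))
```

Because every relation type resolves to exactly one strategy, a newly labeled relation needs no extra wiring before it can be backtested.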

2. Strategy Backtesting

Deterministic Validation Pipeline. Before capital is committed, every discovered relation is backtested. Because LLMs can hallucinate correlations, Coinjure enforces a strict verification boundary between semantic discovery and live trading. The backtester replays historical data through the same execution paths used in live trading, supporting both high-resolution order book snapshots and standard exchange price history APIs.

Microstructure Simulation. Fill rates are modeled with a Beta(5,1) distribution for realistic partial fills, configurable slippage accounts for price impact, and resting orders are constrained by a 5-minute TTL to prevent ghost positions.
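
The three ingredients above can be sketched with the standard library alone. Only the Beta(5,1) fill distribution and the 5-minute TTL come from the text; the function names, the additive slippage model, and the 0.005 default are assumptions for illustration.

```python
import random
from typing import Optional, Tuple


def simulate_fill(requested_qty: float, quote_price: float,
                  slippage: float = 0.005,
                  rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Simulate a buy: partial fill drawn from Beta(5, 1), price worsened by slippage."""
    rng = rng or random.Random()
    # Beta(5, 1) has mean 5/6, so most orders fill almost completely.
    fill_fraction = rng.betavariate(5, 1)
    filled_qty = requested_qty * fill_fraction
    # Assumed additive slippage; a YES contract can never cost more than its $1 payout.
    fill_price = min(1.0, quote_price + slippage)
    return filled_qty, fill_price


def order_expired(placed_at: float, now: float, ttl_seconds: float = 300.0) -> bool:
    """A resting order is cancelled once it has been open for the TTL (5 minutes)."""
    return now - placed_at >= ttl_seconds


qty, price = simulate_fill(100, 0.42, rng=random.Random(7))
print(round(qty, 1), price, order_expired(placed_at=0.0, now=300.0))
```

Passing a seeded `random.Random` keeps each backtest run reproducible, which fits the deterministic framing of the validation pipeline.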

Walk-Forward Robustness. For statistical strategies, the pipeline performs walk-forward validation with a 60/40 train-test split, resetting internal state between phases to guard against data leakage and overfitting.
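
A minimal sketch of the split-and-reset idea follows. The 60/40 split is from the text; the toy `MeanReversionModel`, its band parameter, and the exact meaning of "internal state" (here, position and trade count) are assumptions chosen to keep the example small.

```python
def split_60_40(series):
    """Chronological split: first 60% for training, last 40% for testing."""
    cut = len(series) * 60 // 100
    return series[:cut], series[cut:]


class MeanReversionModel:
    """Toy stateful model: a fitted mean plus running position and trade count."""

    def __init__(self):
        self.fitted_mean = 0.0
        self.reset_state()

    def reset_state(self):
        self.position = 0
        self.trades = 0

    def fit(self, train):
        self.fitted_mean = sum(train) / len(train)

    def step(self, x, band=0.02):
        # Short above the band, long below it, flat inside it.
        if x > self.fitted_mean + band:
            target = -1
        elif x < self.fitted_mean - band:
            target = 1
        else:
            target = 0
        if target != self.position:
            self.trades += 1
            self.position = target


def walk_forward(spreads):
    train, test = split_60_40(spreads)
    model = MeanReversionModel()
    model.fit(train)
    for x in train:       # in-sample pass leaves position and trade state behind
        model.step(x)
    model.reset_state()   # the test phase must not inherit that state
    for x in test:
        model.step(x)
    return len(train), len(test), model.trades


spreads = [0.00, 0.01, -0.01, 0.00, 0.05, -0.05, 0.04, 0.00, -0.04, 0.00]
print(walk_forward(spreads))
```

Without the `reset_state()` call, the test phase would start with the position and trade count left over from the training pass, which is the kind of leakage the reset guards against.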

3. Trade Execution

Agent-Native Execution Engine. Validated strategies enter an asynchronous execution engine that handles market interaction, state management, order routing, and event batching. The engine drains all pending events before each strategy evaluation to prevent stale-price decisions, freeing the LLM to focus on higher-order strategic adjustments.
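
The drain-before-evaluate pattern can be illustrated with `asyncio.Queue`. This is a sketch under stated assumptions: the `(market, price)` event shape, `drain`, and `run_once` are hypothetical names, not the engine's actual interfaces.

```python
import asyncio


def drain(queue: asyncio.Queue) -> list:
    """Pull every event that is already queued, without waiting for new ones."""
    events = []
    while True:
        try:
            events.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            return events


async def run_once(queue: asyncio.Queue, latest_prices: dict, evaluate):
    """Wait for one event, drain the backlog, then evaluate on the freshest prices."""
    first = await queue.get()
    for market, price in [first] + drain(queue):
        latest_prices[market] = price   # later events overwrite stale ones
    return evaluate(latest_prices)


async def demo():
    queue = asyncio.Queue()
    for event in [("A", 0.40), ("A", 0.45), ("B", 0.30)]:
        queue.put_nowait(event)
    return await run_once(queue, {}, lambda prices: dict(prices))


print(asyncio.run(demo()))
```

The strategy is evaluated once per batch rather than once per event, so it never acts on the superseded 0.40 quote for market A.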

Code-Path-Preserving Promotion. The same event loop drives both paper and live modes; only the execution module is swapped. Live trading uses fill-or-kill (FOK) orders with adaptive, rate-limit-aware retry (exponential backoff with jitter) for robust execution under volatile or throttled conditions.
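
Exponential backoff with jitter around a fill-or-kill submission might look like the sketch below. The `RateLimited` exception, the `submit` callable, and the base/cap values are assumptions; the variant shown is "full jitter" (a uniform draw up to the exponential bound), which may differ from what Coinjure uses.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised when the exchange throttles a request."""


def submit_with_retry(submit, max_retries=5, base=0.5, cap=8.0,
                      sleep=time.sleep, rng=None):
    """Call submit(); on RateLimited, wait a jittered, exponentially growing delay."""
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):
        try:
            return submit()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Full jitter: uniform in [0, min(cap, base * 2^attempt)].
            sleep(rng.uniform(0.0, min(cap, base * 2 ** attempt)))


# Usage with a stand-in order function that is throttled twice, then accepted.
calls = {"n": 0}


def fake_fok_order():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimited()
    return "filled"


delays = []
print(submit_with_retry(fake_fok_order, sleep=delays.append), len(delays))
```

Because a fill-or-kill order either fills completely or is cancelled, resubmitting after a throttled attempt does not risk stacking partial fills.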

Risk Management and Process Isolation. The harness forces a strategy into read-only mode after consecutive failures or drawdown breaches. Each engine instance runs as an independent OS process, allowing hundreds of strategies to execute in parallel without shared-state contention.
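
The read-only rule can be sketched as a small guard object. The two triggers (consecutive failures, drawdown breach) are from the text; the `RiskGuard` class, the limits of 3 failures and 10% drawdown, and the latching behavior are illustrative assumptions.

```python
class RiskGuard:
    """Latches a strategy into read-only mode once a risk limit is hit (hypothetical)."""

    def __init__(self, max_consecutive_failures: int = 3, max_drawdown: float = 0.10):
        self.max_consecutive_failures = max_consecutive_failures
        self.max_drawdown = max_drawdown
        self.failures = 0
        self.peak_equity = None
        self.read_only = False

    def record_order(self, succeeded: bool) -> None:
        # A success resets the streak; only consecutive failures count.
        self.failures = 0 if succeeded else self.failures + 1
        if self.failures >= self.max_consecutive_failures:
            self.read_only = True

    def record_equity(self, equity: float) -> None:
        if self.peak_equity is None or equity > self.peak_equity:
            self.peak_equity = equity
        if self.peak_equity > 0:
            drawdown = (self.peak_equity - equity) / self.peak_equity
            if drawdown > self.max_drawdown:
                self.read_only = True

    def allow_order(self) -> bool:
        return not self.read_only


guard = RiskGuard()
guard.record_equity(100.0)
guard.record_equity(89.0)   # 11% below the peak, beyond the assumed 10% limit
print(guard.allow_order())
```

In this sketch the flag never clears on its own, so returning a strategy to trading would require an explicit operator decision.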

4. Human Monitoring

Decoupled Observability Layer. A real-time monitoring suite runs outside the critical execution path, letting human operators audit the agent’s decisions and monitor aggregate portfolio exposure without adding latency to the trading engine.

Dynamic Intervention Controls. Operators can pause, tune, or retire individual strategies in real time. A global kill-switch and hot-swap capabilities allow strategy logic to be replaced without restarting the engine, ensuring human control at any granularity.

Experiments

We evaluate Coinjure’s pipeline on live Polymarket and Kalshi prediction markets.

1. Strategy Discovery

In a single one-hour session, the agent autonomously discovered 250+ market relations across 7 relation types from live Polymarket and Kalshi markets. Below we show one example for each of the 7; the temporal type listed in Table 1 is not among them.

Implication. If event A happens, event B must also happen (A ⇒ B), so price(A) ≤ price(B). The agent chains temporal deadlines into implication ladders.

implication Iran Strait of Hormuz — Deadline Chain

Polymarket Will Iran close the Strait of Hormuz by March 31?
⇓ implies
Polymarket Will Iran close the Strait of Hormuz by June 30?
⇓ implies
Polymarket Will Iran close the Strait of Hormuz before 2027?

Constraint: P(Mar 31) ≤ P(Jun 30) ≤ P(before 2027) — any violation is arbitrage.
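
Checking a deadline ladder for violations reduces to a monotonicity scan. The sketch below assumes prices are listed from the earliest deadline to the latest; the function name, the tolerance parameter, and the example prices are made up for illustration.

```python
def implication_violations(prices, tol=0.0):
    """Return index pairs (i, i+1) where an earlier deadline is priced above a later one.

    `prices` is ordered from the earliest deadline to the latest, so a
    consistent ladder is non-decreasing.
    """
    return [
        (i, i + 1)
        for i in range(len(prices) - 1)
        if prices[i] > prices[i + 1] + tol
    ]


# Hypothetical prices for the March 31, June 30, and before-2027 markets.
print(implication_violations([0.10, 0.08, 0.30]))  # [(0, 1)]
print(implication_violations([0.10, 0.20, 0.30]))  # []
```

A nonzero `tol` can absorb bid-ask noise so that only gaps large enough to trade are flagged.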

Same Event / Cross-Platform. The same question listed on both Polymarket and Kalshi. Persistent price gaps are a direct arbitrage.

same_event Iran Closes the Strait of Hormuz

Polymarket Will Iran close the Strait of Hormuz before 2027?
↔ same question
Kalshi Will Iran close Strait of Hormuz before Jan 2027?

Constraint: P_poly ≈ P_kalshi — DirectArbStrategy

Complementary. Markets whose prices must sum to 1 because they partition all outcomes of a single event.

complementary US Forces Enter Iran — Time Buckets

Polymarket US forces enter Iran by March 14?
Polymarket …by March 31?
Polymarket …by December 31?

Constraint: sum(prices) = 1 — GroupArbStrategy

Exclusivity. Mutually exclusive outcomes—at most one can resolve YES, so sum(prices) ≤ 1.

exclusivity Next President of Vietnam

Polymarket Will Tô Lâm be the next President?
Polymarket Will Trần Cẩm Tú be the next President?
Polymarket Will Trần Thanh Mẫn be the next President?

3 markets — Constraint: sum(prices) ≤ 1 — GroupArbStrategy
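
The complementary and exclusivity constraints differ only in whether a sum below 1 counts as a mispricing. A minimal sketch of that distinction, with a hypothetical function name and made-up prices:

```python
def group_constraint_gap(prices, exhaustive: bool) -> float:
    """Signed distance of sum(prices) from the allowed region.

    Positive: the group is overpriced (violates both relation types).
    Negative: the group is underpriced, which is only a violation when the
    outcomes are exhaustive (complementary), since then exactly one pays out.
    Zero: no violation.
    """
    total = sum(prices)
    if total > 1.0:
        return total - 1.0
    if exhaustive and total < 1.0:
        return total - 1.0
    return 0.0


# Three mutually exclusive candidates (exclusivity): only sum > 1 matters.
print(group_constraint_gap([0.30, 0.30, 0.30], exhaustive=False))  # 0.0
# The same prices on a complete partition (complementary) are underpriced.
print(round(group_constraint_gap([0.30, 0.30, 0.30], exhaustive=True), 2))  # -0.1
```

This is why a single group strategy can plausibly serve both relation types: the check is the same sum, and only the lower bound is switched on or off.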

Structural. A group of markets with a monotonic price-nesting constraint: a higher strike target cannot be more likely than a lower one, so its price must not exceed the lower target's price.

structural Crude Oil (CL) Price Targets — 10 Markets

Polymarket Will Crude Oil (CL) hit $90 by end of March?
Polymarket …$100 / $105 / $110 / $120 / $130 / $140 / $150 / $180 / $200?

Constraint: P($200) ≤ P($180) ≤ … ≤ P($90) — StructuralArbStrategy

Correlated. Two semantically related markets whose prices co-move due to a shared underlying driver, suitable for mean-reversion spread trading.

correlated Iran War Outcomes — Ceasefire & Regime Change

Polymarket US × Iran ceasefire by April 30?
↑↓ co-move
Polymarket Will the Iranian regime fall by June 30?

Both driven by the same geopolitical conflict — CointSpreadStrategy trades the spread.

Conditional. Conditional probability bounds: P(A | B) constrains the joint pricing of related markets.

conditional EU Country Strikes Iran — Subset Bound

Polymarket Will any E.U. country strike Iran by March 31?
≥ max of
Polymarket Will France strike Iran by March 31?
Polymarket Will UK strike Iran by March 31?
Polymarket Will Germany strike Iran by March 31?

Constraint: P(any EU) ≥ max(P(France), P(UK), P(Germany)) — ConditionalArbStrategy
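
The subset bound above is a one-line check. The sketch below uses a hypothetical function name and made-up prices; it returns how far the most expensive member market sits above the umbrella market.

```python
def subset_bound_violation(p_any: float, member_prices) -> float:
    """Amount by which the priciest member exceeds the umbrella market (0 if none)."""
    return max(0.0, max(member_prices) - p_any)


# Hypothetical prices: "any E.U. country" at 0.12 while one member trades at 0.15.
print(round(subset_bound_violation(0.12, [0.15, 0.05, 0.02]), 2))  # 0.03
print(subset_bound_violation(0.20, [0.15, 0.05, 0.02]))            # 0.0
```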

Importantly, the discovery process is cumulative. The 250+ relations above were actively searched and identified by the LLM agent within a single one-hour session. As the agent continues to run over longer time horizons, it accumulates a growing pool of candidate relations ready for backtesting and deployment: new markets appear daily, and each session builds on prior discoveries. We expect that scaling to thousands or 10,000+ relations is largely a matter of letting the agent run longer and across more exchanges.

2. Backtesting

All discovered relations are automatically equipped with their corresponding arbitrage strategy and backtested against historical Polymarket order book snapshots. Around half of the discovered relations passed the backtest gate with positive PnL, suggesting that a meaningful fraction of the structural mispricings identified by the agent are tradeable in practice (Figure 4).

Figure 4. Backtest results for 14 representative market relation strategies. Around half pass the PnL gate, suggesting that the discovered mispricings are tradeable in practice.

3. Live Trading

To validate end-to-end performance, we funded a Kalshi account with limited real capital and let Coinjure autonomously allocate positions. The harness automatically sized each position to roughly $20, identified opportunities, and executed trades without manual intervention. Within a short initial period the portfolio reached $53.97 ($22.08 in open positions, $31.89 cash), reflecting a gain of $3.40 (6.72%). These are early, short-horizon results and do not constitute a claim of long-term performance; we present them solely as evidence that the pipeline can identify and execute real opportunities end-to-end.
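
The reported figures are internally consistent, as the short check below shows. The starting capital is not stated in the text; it is inferred here as portfolio value minus gain, which is an assumption about how the gain was measured.

```python
open_positions = 22.08
cash = 31.89
gain = 3.40

portfolio_value = open_positions + cash
starting_capital = portfolio_value - gain   # implied, not stated in the text
return_pct = 100 * gain / starting_capital

print(f"{portfolio_value:.2f} {starting_capital:.2f} {return_pct:.2f}")
# prints: 53.97 50.57 6.72
```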

Figure 5 highlights one representative trade where the agent identified and executed a complementary-set arbitrage.

Figure 5. The agent bought Yes contracts on every mutually exclusive outcome, locking in guaranteed profit before settlement. ✓ marks the resolved outcome.

As shown in Figure 5, at around 11:24 pm on March 14, 2026 the agent recognized that the five mutually exclusive outcomes of “How many dissenting votes at the next Fed meeting?” were collectively priced below their guaranteed settlement value. It then purchased Yes contracts across all outcomes in rapid succession (not simultaneously, but within a short time window), assembling a near-risk-free position: because the outcomes cover every possibility, exactly one contract would pay out regardless of the result, yielding a locked-in profit once the full basket was in place. This trade was triggered while the harness was running roughly 50 strategies concurrently, so it was just one of many opportunities being monitored in parallel. With more strategies deployed, the harness should be able to surface and execute more such opportunities.
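
The payoff logic of such a basket, and the risk while it is only partly assembled, can be sketched as follows. The prices and quantities are invented for illustration (the text does not report the actual fills), the function name is hypothetical, and fees are ignored.

```python
def locked_profit(legs) -> float:
    """Worst-case profit over all outcomes for a basket of YES positions.

    `legs` is a list of (yes_price, contracts_held), one entry per outcome.
    If outcome i resolves YES, only leg i pays $1 per contract.
    """
    cost = sum(price * qty for price, qty in legs)
    return min(qty - cost for _, qty in legs)


# Five outcomes priced at a combined 0.93, ten contracts each (made-up numbers).
full_basket = [(0.30, 10), (0.25, 10), (0.20, 10), (0.10, 10), (0.08, 10)]
print(round(locked_profit(full_basket), 2))     # 0.7

# Same basket before the last leg is bought: the worst case is a loss.
partial_basket = full_basket[:-1] + [(0.08, 0)]
print(round(locked_profit(partial_basket), 2))  # -8.5
```

The second call shows why the legs need to be bought within a short window: until every outcome is covered, the position is a directional bet rather than an arbitrage.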

Citation

If you find Coinjure useful in your research or work, please cite:

@software{coinjure2026,
  title   = {Coinjure: A Trading Agent Harness for Prediction Markets},
  author  = {Yu, Haofei and Yang, Yicheng and Liu, Yuxiang and You, Jiaxuan},
  year    = {2026},
  url     = {https://github.com/ulab-uiuc/coinjure},
  note    = {University of Illinois Urbana-Champaign}
}