Resolution Clarity Grade · Polymarket validation backtest

Do prediction markets resolve fairly?

Name: ClearMarket Resolution Clarity Grade: resolution-risk validation
Creator: ClearMarket
License: https://github.com/JDSource/clearmarket

A 7,166-contract Polymarket backtest.

Most prediction-market contracts on Polymarket and Kalshi resolve cleanly. Some don't. They end in a dispute, or settle against what actually happened. That is the resolution risk traders and funds care about. ClearMarket grades it before a market resolves, from the resolution rules alone, as A, B, or C. This page tests the grade against the public record: do the contracts it flags as unclear actually get disputed?

Key finding. Across 7,166 Polymarket markets with public dispute records, contracts ClearMarket rated C (the lowest of its three Resolution Clarity Grades) were formally disputed at 20 times the rate of A- and B-rated contracts (1.59% vs 0.08%). 52 of the 55 disputes landed on C-rated contracts. The three challenges to A-rated contracts all failed: each market settled exactly as its rules specified. Reproducible at github.com/JDSource/clearmarket.

1 · THE FINDING

Across 7,166 Polymarket markets with public dispute records, 52 of the 55 markets that ended up in a resolution dispute had been rated C by ClearMarket, the lowest clarity tier. C-rated contracts were disputed at 1.59%, 20 times the 0.08% rate of A- and B-rated contracts combined.

RCG	Resolution disputes	Markets	Dispute rate
A	3	2,587	0.12%
B	0	1,302	0.00%
C	52	3,277	1.59%

Disputes concentrate in the bottom tier, the way loan defaults concentrate in the lowest credit rating. ClearMarket assigns the grade from rules text alone and never reads the dispute history, so the result is not circular. A dispute here means a formal on-chain challenge to a proposed outcome, including challenges that fail. All three challenges to A-rated contracts failed: two markets settling on the official CME silver print and one on Colombia's national electoral registry were each challenged, re-proposed, and settled exactly as their rules specified. That is the grade's claim in action: not that no one will ever contest a clear market, but that a clearly worded market with a committed source gives a challenge nothing to stand on.
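The rates in the table follow directly from the dispute and market counts. A minimal Python sketch that recomputes them is shown here; the variable and function names are illustrative and are not taken from the ClearMarket repository.

```python
# Counts copied from the table above: grade -> (disputes, markets).
counts = {"A": (3, 2587), "B": (0, 1302), "C": (52, 3277)}


def rate(disputes: int, markets: int) -> float:
    """Dispute rate as a percentage."""
    return 100.0 * disputes / markets


total_markets = sum(m for _, m in counts.values())
total_disputes = sum(d for d, _ in counts.values())

# Pool A and B, as the headline comparison does.
ab_disputes = counts["A"][0] + counts["B"][0]
ab_markets = counts["A"][1] + counts["B"][1]

c_rate = rate(*counts["C"])
ab_rate = rate(ab_disputes, ab_markets)
ratio = c_rate / ab_rate

for grade, (d, m) in counts.items():
    print(f"{grade}: {d}/{m} = {rate(d, m):.2f}%")
print(f"A+B pooled: {ab_disputes}/{ab_markets} = {ab_rate:.2f}%")
print(f"C vs A+B pooled: {ratio:.1f}x")
```

On these counts the unrounded ratio comes out slightly above 20, which is consistent with the "20 times" figure quoted in the text.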

The rate is measured on Polymarket because it is the only venue with a complete, publicly auditable dispute record: every challenge is posted on-chain, whether it succeeds or fails. Kalshi publishes no dispute feed, so its contested settlements become visible only when they reach a courtroom or the press. Section 5 covers what that public record shows.

2 · IS THE GRADE JUST FLAGGING BIG MARKETS?

A fair objection to the table above: maybe C-rated markets get disputed more simply because they are the big, controversial ones. Money attracts fights, and if vague wording also clusters in big markets, the grade could just be volume in disguise. So we compared markets only against markets of similar trading volume.

Volume band	RCG A	RCG B	RCG C
Lowest quartile	0 / 754	0 / 326	3 / 712 · 0.42%
Highest quartile	3 / 621 · 0.48%	0 / 191	39 / 979 · 3.98%

(B is 0% across all four quartiles; A's only challenges are the three failed ones, all in the top band. The middle two quartiles are omitted here.)

A dispute needs a vague rule and enough money to make the fight worthwhile. The highest-volume quartile held 812 A- and B-rated markets, as big and as watched as anything on the platform, and they were disputed at 0.37%; all three challenges failed against the committed source. C-rated markets of the same size were disputed at 3.98%, more than ten times as often. The money was there either way. The wording is what predicted the fight.
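The within-band comparison can be checked the same way. The sketch below uses only the two quartiles shown in the table (the middle two are omitted on this page, so they are omitted here as well); the dictionary layout and helper name are illustrative assumptions.

```python
# Volume band -> grade -> (disputes, markets), copied from the table above.
bands = {
    "lowest": {"A": (0, 754), "B": (0, 326), "C": (3, 712)},
    "highest": {"A": (3, 621), "B": (0, 191), "C": (39, 979)},
}


def pooled_rate(band: dict, grades: str) -> tuple[int, int, float]:
    """Pool the given grades within one volume band.

    Returns (disputes, markets, rate as a percentage).
    """
    disputes = sum(band[g][0] for g in grades)
    markets = sum(band[g][1] for g in grades)
    return disputes, markets, 100.0 * disputes / markets


top_ab = pooled_rate(bands["highest"], "AB")
top_c = pooled_rate(bands["highest"], "C")
low_c = pooled_rate(bands["lowest"], "C")

print(f"Highest quartile, A+B: {top_ab[0]}/{top_ab[1]} = {top_ab[2]:.2f}%")
print(f"Highest quartile, C:   {top_c[0]}/{top_c[1]} = {top_c[2]:.2f}%")
print(f"Lowest quartile, C:    {low_c[0]}/{low_c[1]} = {low_c[2]:.2f}%")
print(f"C vs A+B at the same volume: {top_c[2] / top_ab[2]:.1f}x")
```

Holding the volume band fixed is what separates the effect of wording from the effect of market size: both pooled groups in the top quartile are drawn from markets of comparable trading volume.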

3 · WHAT A C LOOKS LIKE

The three largest disputed markets, each rated C by ClearMarket before it resolved, then disputed exactly where its rules were weak:

Market	Venue	RCG	Volume	The defect
MicroStrategy sells any Bitcoin by May 31, 2026	Polymarket	C	$230M	No committed source of record, only "a consensus of credible reporting"; rules silent on event date vs. confirmation date
Netanyahu out by March 31	Polymarket	C	$104M	Subjective trigger (what counts as "out"), plus two sources with no tiebreak: the subject's own government and "a consensus of credible reporting"
US × Iran permanent peace deal by April 22	Polymarket	C	$26M	Settles on what the US and Iranian governments themselves announce; whether their language "clearly signals" a permanent end to hostilities is interpretation, with no independent source of record

The MicroStrategy market is the most instructive of the three, because it shows why the grade is more than a score. On the weighted factors the market scored 59, which falls in the B band, but its rules named no real source of record, pointing only to "a consensus of credible reporting," so a hard cap held it at C. Strategy (the company formerly named MicroStrategy) did sell Bitcoin inside the window, between May 26 and 31, but disclosed the sale on June 1, one day after the deadline. With no source of record committed, Polymarket ruled the late confirmation did not qualify, and the market resolved NO against a sale that had actually happened. One trader reported a loss of $527,000. The cap caught what the score alone would have missed.

ClearMarket's parsed resolution architecture for the MicroStrategy market: source uncommitted, citation not provided, arbitration by Optimistic Oracle (UMA). — A live screenshot of the MicroStrategy event page on ClearMarket. The parsed resolution architecture flags the source as uncommitted and the citation as not provided; the verbatim rules show why, naming only "a consensus of credible reporting." That placeholder is what capped the grade at C.

ClearMarket's verbatim-rules panel for the MicroStrategy market: the resolution rules name only a consensus of credible reporting, and the platform source field is flagged empty with no committed source.

4 · HOW THE GRADE WORKS

The grade is a weighted score across seven factors of the resolution rules text, banded A/B/C, then capped. It never reads the outcome. The methodology page carries the full specification: the scoring rules, how each factor is judged, and the complete cap list. In brief:

Factor	Weight	What it asks
Trigger objectivity	28	Is the deciding condition objective, or open to interpretation?
Contested reality	22	Is the underlying fact controlled or disputed by an interested party?
Source clarity	18	Is the source of record named and verifiable?
Arbiter incentive	12	Is resolution handled by a regulated, accountable arbiter, or a token-holder vote that whales can sway?
Source conflict	8	Are there conflicting sources with no rule for which wins?
Temporal precision	7	Is the deadline precisely defined?
Source mutability	5	Can the source be changed or edited after the fact?

The arbiter-incentive factor reflects a documented risk. A Wall Street Journal investigation (May 2026) found that in most disputed Polymarket markets a majority of the votes came from the ten largest UMA token wallets, and roughly one in five disputes had a voter who held a stake in the outcome they were judging.
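The weighted score described above can be sketched as a plain weighted sum. The weights below come from the factor table; everything else is an assumption for illustration, in particular the 0.0 to 1.0 scale on which each factor is judged and the identifier names. The A/B/C band cut-offs are not given on this page, so the sketch stops at the numeric score.

```python
# Weights copied from the factor table; they sum to 100.
WEIGHTS = {
    "trigger_objectivity": 28,
    "contested_reality": 22,
    "source_clarity": 18,
    "arbiter_incentive": 12,
    "source_conflict": 8,
    "temporal_precision": 7,
    "source_mutability": 5,
}
assert sum(WEIGHTS.values()) == 100


def weighted_score(factor_scores: dict[str, float]) -> float:
    """Combine per-factor judgments into a 0-100 score.

    ASSUMPTION: each factor is judged on a 0.0 (worst) to 1.0 (best)
    scale. The real scoring rules live on the methodology page.
    """
    missing = WEIGHTS.keys() - factor_scores.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= factor_scores[name] <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    return sum(WEIGHTS[name] * factor_scores[name] for name in WEIGHTS)


# Hypothetical inputs, not taken from any real market.
perfect = {name: 1.0 for name in WEIGHTS}
middling = {name: 0.5 for name in WEIGHTS}
print(weighted_score(perfect))
print(weighted_score(middling))
```

Because the weights sum to 100, a market judged perfect on every factor reaches the maximum score, and the two heaviest factors (trigger objectivity and contested reality) alone account for half of it.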

Hard caps

Some defects are fatal. A market can score well on the seven factors and still be capped. The final grade is the worse of the score's band and any cap that applies, so a single flaw can't be averaged away:

No source of record, or only placeholder language ("a consensus of credible reporting," no named authority) → C.
A source given only as an illustrative example ("for example, Reuters"), never committed to → B.
Conflicting sources with no rule for which wins → C.
A subjective trigger with no objective anchor → C.
An underlying fact controlled by an interested party → B.
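The "worse of the score and any cap" rule can be sketched in a few lines. The five caps below are the ones listed above, which the page presents as a brief summary rather than the complete list; the defect identifiers and the function name are hypothetical labels chosen for this example.

```python
GRADE_ORDER = "ABC"  # best to worst

# Cap ceilings from the list above; keys are hypothetical identifiers.
CAPS = {
    "no_source_of_record": "C",
    "illustrative_source_only": "B",
    "conflicting_sources_no_tiebreak": "C",
    "subjective_trigger_no_anchor": "C",
    "fact_controlled_by_interested_party": "B",
}


def final_grade(score_band: str, defects: set[str]) -> str:
    """Return the worse of the score's band and every cap that applies.

    A cap can only lower a grade, never raise it.
    """
    candidates = [score_band] + [CAPS[d] for d in defects]
    # The worst grade is the one furthest along GRADE_ORDER.
    return max(candidates, key=GRADE_ORDER.index)


# The MicroStrategy market: a B on the weighted score, capped to C
# because its rules named no source of record.
print(final_grade("B", {"no_source_of_record"}))

# A cap never improves a grade that is already lower than the ceiling.
print(final_grade("C", {"illustrative_source_only"}))
```

Taking the worst candidate, rather than folding the defect into the weighted sum, is what keeps a single fatal flaw from being averaged away by strong scores on the other factors.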

5 · SCOPE & METHOD

Stated plainly, because the honesty is the point.

"Disputed" means formally challenged through Polymarket's on-chain UMA process, not necessarily overturned.
ClearMarket grades Kalshi markets too, and the grade applies identically to both venues. This dispute test is Polymarket-only because Polymarket's disputes are public and on-chain; Kalshi publishes no dispute feed, so no outside party can independently compute a Kalshi challenge rate. Its contested settlements reach the public record only through courts, regulators, and the press.
Where Kalshi disputes do surface publicly, they show the same failure mode this grade measures. The largest to date: Kalshi's February 2026 market on whether Iran's Supreme Leader would leave office, roughly $54M in volume, settled at the last traded price under a death carveout that appeared in the CFTC-filed terms but not on the trading page most traders read. Kalshi refunded $2.2M in trades and fees, a class action is pending, and the episode forced a new exchange-wide settlement rule. A divergence between the filed resolution terms and what traders see is exactly the kind of defect a grade built on the rules text exists to flag before the money moves.
Reproducible. The grading model, dispute labels, and this analysis are open at github.com/JDSource/clearmarket.