ACR Poker Cheating Detection: Architecture, Signals, Failure Modes

By Raul Moriarty · Poker Software Expert

A reverse-engineered map of what the Winning Poker Network security stack looks like from the outside — hand-history forensics (the layer that drove the 2015 $1.4M bot-ring bust), behavioural fingerprinting, cross-skin account graphs joining ACR with BlackChip, TruePoker and YaPoker, and the human review layer that signs everything off.

Summary

  • WPN operates a four-layer detection stack. No single layer is decisive on its own; signals accumulate over weeks into an account-specific risk score with a tunable false-positive budget.
  • Hand-history forensics is WPN's most publicly visible layer — the one that drove the 2015 takedown of a Russian-speaking bot ring (referred to in community accounts as the "KhanZ" group) with roughly $1.4M in funds clawed back across approximately thirty accounts.
  • Behavioural fingerprinting on the client (input-timing distribution, mouse-path curvature, action-confirmation latency, idle behaviour) is the cheapest layer and the one that catches naive bot implementations first.
  • The cross-skin account graph joins ACR with BlackChip Poker, TruePoker and YaPoker — multi-accounting across skins under a single fingerprint is the easiest possible catch and the highest-priority signal for regulatory reasons.
  • Pure-GTO output is paradoxically easier to flag at the play-pattern layer than a noisier strong-human strategy: distributional outliers stand out against the population baseline, not against a "what a human looks like" reference.
  • Anti-detection at this operator is best framed as adversarial classification (Dalvi et al. 2004; Lowd & Meek 2005), meaning the output distribution is shaped to sit inside the population envelope while preserving EV, not as a checklist of features.

What counts as cheating in WPN's terms

The category split matters because each prohibited behaviour has its own signal stack, false-positive budget, and consequence path. The WPN security team works against six broad categories explicitly named in the public terms of service and in subsequent operator statements.

Prohibited categories — operator priority and detection difficulty
Category | Operator priority | Detection difficulty | Typical signal
Collusion / chip dumping | Highest (regulatory exposure) | Medium | Cross-skin account graph + suspicious hand sequences
Multi-accounting across skins | High | Low–Medium | Device fingerprint + KYC + crypto wallet join
Botting (automated play) | High (public bust history) | Medium | Hand-history forensics + behavioural fingerprint
Real-time assistance (RTA) | Medium-High | High | Statistical play-pattern over volume
Predatory short-stacking / bumhunting | Medium (policy-driven) | Low (table behaviour) | Seating + buy-in pattern analytics
Ghosting | Medium (event-driven spikes around Venom MTTs) | High | Winrate vs known-skill baseline + IP joins

Collusion sits at the top of the priority list because it carries the most direct regulatory exposure: a Curaçao operator handling US-facing crypto deposits cannot afford a published chip-dumping pattern. Multi-accounting and botting come next, the latter given WPN's public history of bust reports, with RTA close behind. External HUD enforcement is comparatively light at this operator because fixed screen names make HUD use widespread enough that aggressive enforcement would damage the legitimate winning-player population.

The four-layer detection model

The externally observable stack has four components. Additional internal layers — unobservable heuristics, AI-scored risk models, hidden signals — almost certainly exist; the four below are what can be inferred from public bust reports, the patterns of customer account actions, and the structure of WPN's published terms.

Layer 1: Hand-history forensics
The signature WPN layer. Per-account distributional analysis on VPIP, PFR, 3-bet by position, fold-to-cbet by board texture, bet-sizing histograms, river aggression, all-in equity at showdown. Heavy compute, runs offline on a regular cadence, produces a play-pattern outlier score that decays slowly and is the primary input to bot-related review queues.
Layer 2: Behavioural fingerprinting on the client
Client telemetry on input timing, mouse-path geometry, touch dwell where relevant, action-confirmation latency, idle behaviour between hands. ACR's desktop-primary client gives the operator a wide telemetry surface. Cheap to compute, runs continuously, feeds into a behavioural score per session. Bites naive bot implementations hardest.
Layer 3: Cross-skin account graph
Account graph joined by IP, device fingerprint, deposit method (including crypto wallet clustering across the public blockchain), KYC document, table co-occurrence, and action correlations within hands. Spans ACR, BlackChip Poker, TruePoker and YaPoker as a single fingerprint pool. Catches multi-accounting and chip dumping directly; botting falls out as a side-product when a farm runs under a single fingerprint across skins.
Layer 4: Human review
The final decision point. Reviewers use mathematical model output as input, then read hand history in detail, check chat behaviour, look at session start/stop patterns relative to the player's stated time zone, examine withdrawal patterns, and look for the small human errors a bot does not produce — a misclick, a chat outburst, a sit-out mid-session to take a phone call. Most botting bans are signed off here.
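
As a rough illustration of the Layer 3 join, seen from the operator's side, accounts that share any identifier can be grouped with a union-find pass. This is a minimal sketch under stated assumptions, not WPN's implementation: the function name `cluster_accounts`, the record layout, and the identifier strings are all hypothetical.

```python
from collections import defaultdict


def cluster_accounts(records):
    """Group accounts that share at least one identifier.

    `records` maps account id -> iterable of identifier strings
    (hypothetical layout, e.g. "device:A", "wallet:W1").
    Returns a list of sets, one per connected component.
    """
    parent = {account: account for account in records}

    def find(node):
        # Path halving keeps lookups close to constant time.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # identifier -> first account seen carrying it
    for account, identifiers in records.items():
        for identifier in identifiers:
            if identifier in first_seen:
                union(first_seen[identifier], account)
            else:
                first_seen[identifier] = account

    clusters = defaultdict(set)
    for account in records:
        clusters[find(account)].add(account)
    return list(clusters.values())
```

Any cluster with more than one account would be a candidate for review. A production graph would presumably weight the joins differently (a shared IP is weak evidence, a shared KYC document is strong), which this sketch does not attempt.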

The four layers run asynchronously, each on its own cadence. Layer 1 produces a slowly-evolving per-account risk score that the operator can let mature over weeks before acting. Layer 2 produces a high-frequency session score that mostly stays under threshold. Layer 3 is event-driven by graph changes (new account, KYC update, large deposit, withdrawal request). Layer 4 is the bottleneck: reviewer capacity is finite, and the queue is prioritised by combined risk score, expected revenue impact, and recent withdrawal activity.
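
The queueing behaviour described above can be sketched as a slowly decaying Layer 1 score feeding a priority function. The half-life, the layer weights, the log-scaled withdrawal boost, and every field name below are illustrative assumptions, not values known from WPN; the expected-revenue term is left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class AccountState:
    account_id: str
    l1_score: float              # slow play-pattern outlier score, 0..1
    l2_score: float              # latest behavioural session score, 0..1
    l3_score: float              # graph-event score, 0..1
    days_since_l1_update: float
    pending_withdrawal_usd: float


def decayed(score, days, half_life_days=60.0):
    """Exponential decay with an assumed 60-day half-life."""
    return score * 0.5 ** (days / half_life_days)


def review_priority(state):
    """Combine layer scores and withdrawal activity into one queue priority."""
    l1 = decayed(state.l1_score, state.days_since_l1_update)
    combined = 0.5 * l1 + 0.2 * state.l2_score + 0.3 * state.l3_score
    # A pending withdrawal raises priority; the log scaling is an assumption.
    withdrawal_boost = 1.0 + math.log10(1.0 + state.pending_withdrawal_usd)
    return combined * withdrawal_boost


def build_queue(states):
    """Return account ids ordered from highest to lowest review priority."""
    ranked = sorted(states, key=review_priority, reverse=True)
    return [state.account_id for state in ranked]
```

The point of the sketch is the shape rather than the numbers: an account with a moderate score and a large pending withdrawal can outrank an account with a higher score and no cashout activity.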

Case study: the 2015 KhanZ bot-ring bust

The publicly best-documented WPN enforcement action is the 2015 takedown of a Russian-speaking bot ring referred to in community accounts as the "KhanZ" group. The investigation was first surfaced in community forum analysis of suspect accounts, then confirmed by WPN's then-CEO with a public statement: approximately thirty accounts identified, roughly $1.4 million in winnings confiscated and returned to affected opponents, the investigation extending over several months.

The interesting part for engineers is which layer caught them. According to subsequent operator statements and community analysis, the primary signal was Layer 1 — hand-history pattern matching across the suspect accounts revealed shared distributional fingerprints (bet sizings clustered on identical pot fractions, VPIP / PFR pairs sitting on solver mass with too-low variance, fold-to-3bet response curves nearly identical across accounts that should have been independent humans). Layer 3 then joined the accounts on shared deposit and device features. Layer 4 reviewers signed off the final action.
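
One way to quantify "shared distributional fingerprints" is a pairwise divergence between per-account bet-sizing histograms. The sketch below uses Jensen-Shannon divergence; the bin layout, the 0.01 threshold, and the function names are assumptions chosen for illustration, and the public record does not say which statistic WPN actually used.

```python
import math
from itertools import combinations


def normalise(counts):
    """Turn raw bin counts into a probability vector."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    return [c / total for c in counts]


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two histograms."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same bins")
    p, q = normalise(p_counts), normalise(q_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing by convention.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def suspiciously_similar_pairs(histograms, threshold=0.01):
    """Return account pairs whose sizing histograms are nearly identical."""
    pairs = []
    for (id_a, h_a), (id_b, h_b) in combinations(sorted(histograms.items()), 2):
        if js_divergence(h_a, h_b) < threshold:
            pairs.append((id_a, id_b))
    return pairs
```

Independent players drawn from the same pool still differ by more than sampling noise, so a near-zero divergence over large samples between supposedly unrelated accounts is the notable event. A sensible threshold would depend on the sample size per account.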

Three operational lessons fall out of the public record. First: the cycle from first quiet flag to confiscation ran for several months — the mathematical signal was present for weeks before any visible action. Second: the trigger that accelerated review was a withdrawal pattern across the suspect accounts, not the play itself. Third: the public communication served an enforcement-marketing function for the operator — the cost of pursuing thirty accounts was largely amortised across the deterrent effect on the broader bot population. WPN has used the same playbook in the 2019–2020 cleanups, though without dollar figures attached.

Signal weights and observable failure modes

The exact signal weights are confidential; the relative weighting can be inferred from the order in which accounts get caught and what triggers the catch. The pattern is consistent enough across observed bans to be useful both for engineers building systems against this stack and for defenders evaluating their own implementations.

Detection signals × observable weight × failure mode at WPN
Signal | Layer | Relative weight | Naive failure mode
VPIP/PFR ratio at population mass with low variance | L1 | Very High | Pure GTO baseline, no human-noise overlay
Bet sizing clustered on exact pot fractions | L1 | High | Solver output without sizing perturbation
Fold-to-3bet response curve identical across accounts | L1 | Very High | Shared engine across a bot farm
Winrate persistently outside skill-pool envelope | L1 | Very High | Hot run, mid stakes, no manual sessions interleaved
Action-timing variance < population | L2 | High | Constant-latency action emission
Mouse-coordinate clustering on click targets | L2 | Medium | Pixel-perfect click on button centroid
Idle behaviour between hands too uniform | L2 | Medium | No tab-switch, no chat, no occasional pause
Shared device fingerprint across ACR / BlackChip / TruePoker | L3 | Very High (regulatory) | Bot farm on one device across skins
Crypto wallet clustering on the chain | L3 | High | Single wallet feeding multiple accounts
Withdrawal pattern → big-bang on first cashout | L3+L4 | High | Quiet grind for 30 days, then large withdrawal
Chat behaviour: zero outgoing messages over 5k+ hands | L4 | Medium | Bot never says "nh"
Sit-out behaviour: never sits out on bad table | L4 | Medium | Bot grinds whoever sits down

The pattern is consistent: the cheapest layers (L2 client telemetry, L3 graph events) catch the most casual implementations first, while the compute-heavy or human-heavy layers (L1 hand-history forensics, L4 human review) catch the more capable implementations after a longer lag. This is why a bot can run for months without visible action and then suddenly trigger a review — the mathematical signal accumulates faster than the operator's review-queue capacity processes it. The typical interval from first deployment to confirmed ban runs 2 to 9 months, with the median around 8 to 14 weeks for moderately careful implementations.

Action-timing fingerprints

The most discussed and worst-implemented signal in this stack. A naive bot emits actions at fixed intervals, or with uniform noise around a centroid. Both are catastrophic — the distributional shape is wrong before the mean and variance are even examined.

Real human action-timing distributions are log-normal-shaped with a long right tail, state-conditional, and meaningfully different across decision types. A snap-fold of obvious garbage takes 600–1200ms. A boundary river decision takes 5–30 seconds. A routine flop continuation-bet on a clean board takes 1.5–4 seconds. The distribution has a state-independent "distraction tail" — roughly 3% of actions spike into the 8–25 second range regardless of difficulty, because humans look away from the table. A bot whose timing distribution lacks any of these features is identifiable before any play-pattern analysis runs.

# Schematic: behaviourally-shaped action timing
# Conceptual, not the production implementation

def sample_action_delay(decision_difficulty, action_type, hand_state):
    """Return seconds-to-act drawn from a state-conditional log-normal."""
    # Difficulty in [0,1]: 0 = trivial fold, 1 = boundary call
    mu_base = {
        'fold_trivial':   math.log(0.9),
        'cbet_routine':   math.log(2.4),
        'check_routine':  math.log(1.6),
        'river_boundary': math.log(8.5),
        'all_in_decision':math.log(12.0),
    }[action_type]

    # Difficulty stretches mu logarithmically
    mu = mu_base + 0.7 * decision_difficulty

    # Sigma rises with difficulty — humans deliberate variably on hard spots
    sigma = 0.35 + 0.55 * decision_difficulty

    delay = random.lognormvariate(mu, sigma)

    # ~3% chance of distraction tail: 8–25s independent of difficulty
    if random.random() < 0.03:
        delay += random.uniform(8, 25)

    # Floor at a non-zero minimum; humans cannot react in < 250ms
    return max(0.25, delay)

The example is schematic. Production systems condition on more variables — stack depth, opponent action sequence, position, multiway vs heads-up, and a per-session "alertness" parameter that drifts down over long sessions to mimic fatigue. The correct framing is not "add noise" but "draw from a distribution whose shape matches the population, conditioned on state."

False-positive budget and review pipeline

The primary constraint on the entire stack is false-positive cost. WPN cannot afford to ban legitimate winning players in volume — each false positive produces a regulatory complaint, a chargeback, a public forum post, a churned customer. The detection system is therefore tuned conservatively at the automated layer: a high score does not produce an automatic action, it produces a queue placement.
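
A false-positive budget can be made concrete as a threshold chosen from the score distribution of accounts believed to be legitimate. The following sketch assumes such a labelled reference set exists and that higher scores mean higher risk; `threshold_for_budget`, `queue_for_review`, and their parameters are hypothetical names.

```python
import math


def threshold_for_budget(legit_scores, max_false_positive_rate):
    """Return the lowest threshold such that the share of known-legitimate
    accounts scoring strictly above it stays within the budget."""
    if not legit_scores:
        raise ValueError("need at least one reference score")
    if not 0.0 <= max_false_positive_rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    ordered = sorted(legit_scores)
    # Number of legitimate accounts tolerated above the threshold.
    allowed = math.floor(max_false_positive_rate * len(ordered))
    # The threshold is the score of the highest account that must stay below.
    return ordered[len(ordered) - allowed - 1]


def queue_for_review(scores_by_account, threshold):
    """Accounts strictly above the threshold are queued, not auto-actioned."""
    return sorted(a for a, s in scores_by_account.items() if s > threshold)
```

Tightening the budget pushes the threshold up and shrinks the queue, which matches the conservative tuning described above: the automated layer decides who gets looked at, and the decision itself stays with human review.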

The visible stages from outside, in order:

  1. Quiet flag. Account moves into a higher-scrutiny review bucket. No visible change to the player; telemetry continues, potentially with additional client-side instrumentation enabled.
  2. Soft restriction. Withdrawal limits drop, KYC re-verification is requested, bonus eligibility is quietly removed, rakeback drops to base. Some players notice and modify behaviour; most do not.
  3. Structured interview. Support requests "clarifying information" about play style, schedule, and software use. The interview is logged and answers are matched against the play-pattern model.
  4. Confiscation and closure. Winnings voided, balance held pending investigation, account closed. The investigation period runs weeks to months; on WPN with crypto cashouts, the operator typically distributes confiscated funds back to identifiable affected opponents as in the 2015 case.

The cycle from first quiet flag to confiscation typically runs 14 days to 9 months, anchored on review-queue capacity and triggering events. The single biggest accelerator is a large first withdrawal — the mathematical signal can be present for weeks before any human looks at the account, and the withdrawal is what moves it to the top of the queue.

Anti-detection as adversarial classification

The standard mistake among bot builders is to treat detection as a feature checklist — add latency noise, vary mouse coordinates, randomise schedule. This is the wrong frame. Detection is an adversarial classifier: the operator builds a model that distinguishes bot behaviour from human behaviour at the population level, and the bot's job is to produce a behaviour distribution the classifier cannot separate from the human distribution while preserving EV.

The formal literature on this dates to Dalvi, Domingos, Mausam, Sanghai & Verma (2004), Adversarial Classification, and Lowd & Meek (2005), Adversarial Learning. The setting is structurally identical: an attacker (here, the bot) chooses an action that maximises expected utility under a classifier whose decision boundary the attacker can probe but not fully observe. The modern adversarial-ML literature (Goodfellow et al. 2014 onward) extends this with neural-network classifiers, gradient-based attacks, and the certified-robustness lineage.

Three operational consequences fall out of this framing:

The decision boundary is non-stationary
Operators retrain their models as new bot generations appear. Behaviour that was undetectable in 2024 may be a clean signal by 2026 — the 2015 KhanZ patterns would be caught much faster against today's models than they were originally. Anti-detection work has a half-life.
Population baseline is the right reference, not "looking human"
The classifier separates the bot's distribution from the population distribution, not from an abstract "what a human looks like" template. If the NL50 6-max population on ACR has a specific bet-sizing histogram with a long tail of small overbet sizes, the bot's histogram must too. The goal is not "more human" — it is "indistinguishable from the population under the operator's classifier."
EV–detection tradeoff is the right optimisation target
Pure-GTO output maximises EV against fixed opponents. Behaviourally-shaped output gives up some EV in exchange for a lower detection score. The right optimum is not zero detection — it is the EV-maximising point under a budgeted detection probability over the account's expected lifetime.

This frame also explains an apparent paradox visible in the bust record: pure-GTO bots tend to get caught faster than less-optimal bots with overlaid human-population noise. The GTO bot wins more per hand but plays fewer hands before being identified; the noisier bot wins less per hand but plays many more hands before detection. Total realised EV across the account's lifetime is the metric that matters, not bb/100 in isolation.

References and related work

Selected sources on the above topics.

  • Brown & Sandholm, 2019. Superhuman AI for multiplayer poker. Science 365 (Pluribus). Reference result for 6-max NLH at superhuman level.
  • Moravčík et al., 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356. arXiv:1701.01724.
  • Brown & Sandholm, 2017. Safe and nested subgame solving for imperfect-information games. NeurIPS. The Libratus core technique.
  • Dalvi, Domingos, Mausam, Sanghai & Verma, 2004. Adversarial Classification. KDD. Foundational paper on the adversarial-classifier framing applied here.
  • Lowd & Meek, 2005. Adversarial Learning. KDD. Probing the decision boundary of a deployed classifier.
  • Heinrich & Silver, 2016. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. NIPS DRL workshop. arXiv:1603.01121.
  • WPN public statement, 2015. Bot-ring takedown announcement (approximately $1.4M returned, ~30 accounts). Operator communication archived in community forum threads.
