The Scale
Mathematical proofs are necessary but not sufficient for engineering trust. The 8N+3 theorem proves that ZPL's tie-impossibility holds in theory. The AIN property proves bias neutralization holds asymptotically. But what does ZPL actually do across thousands of real configurations, millions of samples, and the full range of input conditions? That question required empirical verification at scale.
The verification study tested ZPL against three competing cellular automaton and voting systems: Conway's Game of Life, the Ising Model, and a standard Majority Vote system. Each system was evaluated on the same metric: mean absolute deviation from 0.5 across all configurations and bias conditions.
The Methodology
The study was designed to be as adversarial as possible. Rather than testing ZPL under ideal conditions, we deliberately varied every parameter that could introduce bias: grid size, layer depth, and input bias across the full 10%–90% range.
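The evaluation metric, mean absolute deviation from 0.5, can be stated directly in code. A minimal sketch (the function name and data layout are assumptions for illustration, not the study's actual scripts):

```python
def mean_abs_deviation(p_outputs):
    """Mean absolute deviation of output probabilities from perfect neutrality (0.5).

    p_outputs: iterable of per-sample p_output values in [0, 1].
    """
    values = list(p_outputs)
    return sum(abs(p - 0.5) for p in values) / len(values)

# A perfectly neutral system scores 0.0; a system that always outputs 1 scores 0.5.
```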
The Results
The headline number: ZPL achieved a mean absolute deviation of 0.019 from perfect neutrality across all 2.64 billion computations. The next-best system, the Ising Model, achieved 0.329 — more than 17 times worse. Majority Vote, the most common alternative, showed a deviation of 0.462 — essentially tracking the input bias directly.
| System | Mean Deviation | vs ZPL | Tie-Free? | Bias-Neutral? |
|---|---|---|---|---|
| ZPL (8N+3) | 0.019 | Baseline | Yes | Yes |
| Ising Model | 0.329 | 17.3x worse | Partial | No |
| Conway GoL | 0.403 | 21.2x worse | No | No |
| Majority Vote | 0.462 | 24.3x worse | No | No |
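The "vs ZPL" multipliers are simply each system's mean deviation divided by ZPL's 0.019 baseline, and the reported figures can be checked directly:

```python
# Mean absolute deviations reported in the study.
zpl_baseline = 0.019
deviations = {"Ising Model": 0.329, "Majority Vote": 0.462}

# Ratio of each system's deviation to the ZPL baseline.
ratios = {name: dev / zpl_baseline for name, dev in deviations.items()}
# Ising Model ≈ 17.3x, Majority Vote ≈ 24.3x
```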
Comparison with Other Systems
The performance gap is not marginal — it is structural. Conway's Game of Life and the Ising Model are not designed to neutralize bias; they are designed to evolve interesting patterns. Their probability outputs reflect the attractor states of their respective dynamics, which vary wildly based on initial conditions. Under high input bias, both systems essentially lock into biased attractor states and stay there.
Majority Vote performs worst because it has no neutralization mechanism at all. If 7 out of 9 agents vote 1, the output is 1, regardless of how biased those votes were. The output deviation closely tracks the input bias, making Majority Vote unsuitable for any application where fairness matters.
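This behavior is easy to confirm with a direct Monte Carlo sketch; the agent count, bias value, and trial count below are illustrative assumptions, not the study's parameters:

```python
import random

def majority_vote_output_rate(bias, agents=9, trials=20000, seed=1):
    """Fraction of trials in which a majority of independently biased agents outputs 1."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        ones = sum(rng.random() < bias for _ in range(agents))
        wins += ones > agents // 2  # strict majority of 9 agents means 5 or more ones
    return wins / trials

# With each agent biased 0.7 toward 1, the majority output is 1 roughly 90% of the
# time: the vote amplifies rather than neutralizes the input bias.
```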
Key insight: The Ising Model at 0.329 might seem "somewhat reasonable" — but 0.329 mean deviation means the system is biased by a third of the full scale on average. For a loot system, this would mean your "1% legendary drop rate" could realistically be anywhere from 0.67% to 1.33% depending on context. For ZPL at 0.019, the same drop rate stays between 0.981% and 1.019%.
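The drop-rate bands quoted above follow from treating the mean deviation as a relative error on the configured rate. That is a simplification for illustration, and the helper name is hypothetical:

```python
def effective_rate_range(rate, mean_deviation):
    """Worst-case band around a configured rate, scaled by the relative deviation."""
    return rate * (1 - mean_deviation), rate * (1 + mean_deviation)

# Ising Model (0.329): a 1% drop rate lands anywhere in roughly (0.67%, 1.33%).
# ZPL (0.019):         a 1% drop rate stays inside roughly   (0.98%, 1.02%).
```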
The Dataset
The complete verification dataset — all 86,016 configuration results, raw samples, and analysis scripts — is publicly available on Zenodo. The dataset includes:
- Configuration parameters (grid size, layer depth, input bias)
- Per-sample p_output values
- Aggregated statistics (mean, variance, percentiles)
- Comparison data for all four systems
- The Python scripts used to run the verification
Independent replication is encouraged. The methodology is fully documented and the scripts are self-contained. If you find any configuration where ZPL deviates from 0.5 by more than 0.05 under standard conditions, contact the research team — we maintain an active erratum process.
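A replication pass reduces to flagging out-of-tolerance configurations. A sketch, assuming results have been loaded as (config_id, mean_p) pairs; the actual dataset schema is documented on Zenodo:

```python
def flag_deviations(results, tolerance=0.05):
    """Return config IDs whose mean p_output strays from 0.5 by more than tolerance."""
    return [cfg for cfg, mean_p in results if abs(mean_p - 0.5) > tolerance]

# Any non-empty result under standard conditions is worth reporting to the team.
```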
What the Numbers Mean
A deviation of 0.019 might seem like a small number, but its implications are significant. It means that across nearly every configuration and bias condition we could construct, ZPL stayed within 1.9 percentage points of perfect neutrality. For a system operating under adversarial conditions — biased inputs, extreme configurations, stressed entropy pools — this is a strong result.
More importantly, the deviation did not grow with input bias. Systems like Majority Vote show increasing deviation as input bias increases. ZPL's deviation remained flat across the entire 10%–90% input bias range, which is the empirical confirmation of the AIN property: the output is decoupled from the input.
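ZPL's internals are beyond this section's scope, but the shape of an input-decoupled result is easy to illustrate with a toy analogue: XOR-folding independent biased bits drives the output toward 0.5 for any input bias (by the piling-up lemma, the residual bias shrinks exponentially in depth). This is not ZPL's mechanism, and every parameter below is illustrative:

```python
import random

def xor_fold_output_rate(bias, depth=16, trials=20000, seed=7):
    """Rate of 1s after XOR-combining `depth` independent bits with the given bias."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(trials):
        bit = 0
        for _ in range(depth):
            bit ^= rng.random() < bias  # fold one more biased bit into the parity
        ones += bit
    return ones / trials

# Across input biases from 0.1 to 0.9, the folded output stays near 0.5: a flat
# deviation curve of the kind the AIN property predicts for ZPL.
```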
The 2.64 billion computation study is the empirical foundation of ZPL's engineering trust. The mathematics tells you it should work. The data tells you it does work — at scale, across conditions, against fair competition.