Find the Bugs Your Tests Miss.
Before Your On-Call Rotation Does.
I’m Dr. Thomas Pani. I help distributed systems teams find the failure modes their tests were not designed to catch — using model-based adversarial testing, executable specifications, and formal verification.
Model-Based Adversarial Testing · AI-Generated Code Testing · Software Reliability Consulting
What I Do
When a system fails in ways your tests never anticipated, the answer is not always "do more testing." Sometimes the right question is "what are you testing against?" I help teams building production infrastructure find the failure modes their tests were not designed to catch — by writing executable protocol specifications and using them to drive adversarial tests directly against the implementation.
Maybe you have wired up property-based testing or deterministic simulation and are still not confident your protocol is correct. Maybe you are shipping AI-generated code faster than your test suite can keep up. I work with distributed systems engineers, SREs, and QA architects who are ready to close that gap systematically.
Model-Based Adversarial Testing
End-to-end engagement: I write an executable specification of your protocol, use it to generate adversarial test scenarios via random and symbolic execution, and drive conformance testing against the implementation. If your team already runs deterministic simulation testing (DST), fuzzers, or property-based tests, I can help you design a specification-based oracle that dramatically increases the signal from the tools you already have.
Deliverables: the spec, a test harness, and a prioritized report of failures found.
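To make the shape of this concrete: below is a deliberately tiny Python sketch of the pipeline on a made-up at-most-once delivery protocol. Everything here is illustrative (in a real engagement the spec is written in a specification language such as TLA+ or Quint, and the replay runs against your actual system), but the three moving parts are the same: an executable spec, a trace generator, and a conformance check that uses the spec as the oracle.

```python
import random

# Illustrative spec of a toy at-most-once delivery protocol.
# State: the frozenset of message ids delivered so far.

def spec_init():
    return frozenset()

def spec_actions(state):
    """Enabled actions, including adversarial re-deliveries."""
    fresh = [("deliver", len(state))]          # next unused message id
    duplicates = [("deliver", m) for m in state]
    return fresh + duplicates

def spec_step(state, action):
    _, msg_id = action
    return state | {msg_id}                    # delivery is idempotent

def generate_trace(steps, seed):
    """Random exploration of the spec yields one adversarial scenario."""
    rng, state, trace = random.Random(seed), spec_init(), []
    for _ in range(steps):
        action = rng.choice(spec_actions(state))
        trace.append(action)
        state = spec_step(state, action)
    return trace

class Impl:
    """Stand-in for an adapter around the real system under test."""
    def __init__(self):
        self.seen, self.count = set(), 0
    def deliver(self, msg_id):
        if msg_id not in self.seen:            # remove this guard and the
            self.seen.add(msg_id)              # harness finds the duplicate
            self.count += 1                    # bug within a few traces
    def observable(self):
        return self.count

def conforms(trace):
    """Replay a spec-generated trace; use the spec as the oracle."""
    state, impl = spec_init(), Impl()
    for action in trace:
        impl.deliver(action[1])
        state = spec_step(state, action)
        if impl.observable() != len(state):
            return False, action
    return True, None

for seed in range(100):
    ok, divergence = conforms(generate_trace(steps=20, seed=seed))
    assert ok, f"spec/impl divergence at {divergence} (seed={seed})"
```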
AI-Generated Code Testing
LLM-assisted development ships protocol logic faster than review can keep up. I write executable specs that serve as precise oracles for generated code, invokable by agents as a verification tool at generation time. This catches the class of correctness failures that code review and unit tests routinely miss.
Deliverables: executable specs as oracles, generation-time verification harness.
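As a sketch of what that harness looks like from the agent's side (all names here are illustrative; `spec` stands for whatever executable specification the engagement produces, and `impl_factory` constructs the generated code under test):

```python
def verify_candidate(impl_factory, spec, traces):
    """Replay spec-generated traces against a freshly built implementation.

    Illustrative harness shape: `spec` bundles init/step/project for the
    executable spec; `impl_factory` builds the generated code under test.
    Returns (True, None) on success, or (False, report) pinpointing the
    first action at which implementation and spec diverged.
    """
    for trace in traces:
        impl, state = impl_factory(), spec.init()
        for action in trace:
            impl.apply(action)
            state = spec.step(state, action)
            if impl.observable() != spec.project(state):
                return False, {"trace": trace, "diverged_at": action}
    return True, None
```

The design point is the return value: the first divergent action on a concrete trace is feedback a generation loop can actually act on, unlike a bare test failure.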
Protocol Specification and Review
I write or review executable specs for critical components — distributed protocols, cloud-native architectures, payment infrastructure. Includes formally stated invariants, plus automated checks via simulation and model checking. Useful as a pre-condition to full verification or as design documentation that doubles as a test asset.
Deliverables: executable specs, formally stated invariants, model checking report.
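For flavor, here is what the model-checking part boils down to, as a minimal explicit-state sketch in Python. In practice this runs through TLC or Apalache on a TLA+ or Quint spec rather than hand-rolled code; the toy mutex below deliberately omits a guard so the invariant check has something to find.

```python
from collections import deque

# Toy mutex spec: two processes cycle idle -> waiting -> critical -> idle.
# The waiting -> critical step deliberately omits the "peer is not
# critical" guard, so mutual exclusion is violated and the checker
# returns a shortest counterexample trace.

def init_states():
    return [("idle", "idle")]

def next_states(state):
    transition = {"idle": "waiting", "waiting": "critical", "critical": "idle"}
    succs = []
    for i in range(2):
        pc = list(state)
        pc[i] = transition[pc[i]]
        succs.append(tuple(pc))
    return succs

def mutual_exclusion(state):
    return state != ("critical", "critical")

def check(init, next_fn, invariant):
    """Breadth-first search over the state graph; shortest counterexample."""
    frontier = deque((s, [s]) for s in init)
    seen = set(init)
    while frontier:
        state, path = frontier.popleft()
        if not invariant(state):
            return path
        for succ in next_fn(state):
            if succ not in seen:
                seen.add(succ)
                frontier.append((succ, path + [succ]))
    return None

cex = check(init_states(), next_states, mutual_exclusion)
print("counterexample trace:", cex)
```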
In-House Workshop
A hands-on session for your engineering team: frame the problem, introduce executable specifications as a practical tool (not a formal methods lecture), and spend the majority of time on guided coding. Teams leave with a working spec and a concrete technique they can apply immediately. Available remote or on-site.
Capabilities & Tooling
I choose tools that maximize signal: reproducible bugs, meaningful coverage, and verification of critical invariants.
Languages & Targets · Specification & Model Checking · Testing & Simulation
Why Executable Specs?
Here is how testing usually goes. You start with unit and integration tests. Then you add property-based testing, which is great at the unit level but does not compose naturally across components. Then you try chaos and fault injection (Jepsen, Antithesis): it finds real bugs in running systems, but it tells you that something failed, not what the correct behavior should have been. Then you look at formal methods (TLA+, Lean) and discover that the barrier to adoption is high: deep expressiveness, but most teams will not invest in learning a new language from scratch.
Executable specs sit between chaos testing and classical formal methods. They let you:
- Express the protocol's intended behavior, including non-determinism, as runnable code
- Replace informal sequence diagrams with executions of a coded spec
- Generate test cases (both happy and adversarial) via random and symbolic execution — not just hand-written scenarios
- Drive tests of the implementation from the spec, using the spec as a differential oracle (see the sketch after this list)
- Establish safety and liveness properties through model checking when you need the strongest guarantees
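The differential-oracle item deserves a sketch, because non-determinism is exactly where naive output comparison breaks down. The standard trick (shown below in illustrative Python, with made-up names) is to track the set of spec states consistent with the implementation's observations so far, and report a divergence only when that set becomes empty:

```python
# A spec step may allow several outcomes; conformance therefore means the
# implementation's observation is explained by *some* allowed spec state.

def spec_next(states, event):
    """All spec states reachable from any candidate state via `event`."""
    out = set()
    for s in states:
        if event == "write":
            out.add(s + 1)
        elif event == "maybe_flush":
            out.update({s, 0})         # non-deterministic: flush or not
    return out

def run_differential(impl, events):
    candidates = {0}                    # spec's initial state
    for event in events:
        impl.handle(event)
        allowed = spec_next(candidates, event)
        # Prune to spec states consistent with the implementation.
        candidates = {s for s in allowed if s == impl.observable()}
        if not candidates:
            return False, event         # no spec behavior explains this
    return True, None

class Impl:
    """Toy implementation that always flushes: one allowed behavior."""
    def __init__(self):
        self.value = 0
    def handle(self, event):
        if event == "write":
            self.value += 1
        elif event == "maybe_flush":
            self.value = 0
    def observable(self):
        return self.value

ok, divergence = run_differential(Impl(), ["write", "maybe_flush", "write"])
assert ok, f"diverged at {divergence}"
```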
The result is a golden reference model that makes your existing test infrastructure smarter and catches the class of failures that standard fuzzing and property-based tests routinely miss.
A note on AI-generated code
LLMs produce plausible-looking protocol code at speed. They are not good at reasoning about concurrent interleavings, failure atomicity, or protocol invariants. Executable specs are one of the few techniques that can keep pace: they describe intent precisely enough to test generated code against, and they can be invoked by agents as a verification tool at generation time — turning the spec into a zero-degree-of-freedom oracle. They can also be written and checked faster than the code they govern.
How This Fits in the Testing Landscape
If you are evaluating your options for distributed systems testing, here is how the landscape looks and where executable specs fit in.
Jepsen (chaos and fault injection). If you want someone to find bugs in a running system by injecting faults, Kyle Kingsbury's Jepsen is the gold standard (Kafka, Redis, MongoDB, etcd). Jepsen's hand-crafted tests tell you that something failed. A spec tells you what the correct behavior should have been. Most teams benefit from both.
Antithesis (deterministic simulation testing). Deterministic simulation as a service, well funded and gaining traction with the same kinds of teams. It helps you reproduce failures reliably. It does not tell you what correct looks like, and complex properties are hard to express.
Property-based testing. Hypothesis, fast-check. The right starting point at the unit level, and the lowest-friction option if your system is not yet distributed. Does not compose naturally across nodes or network failures.
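For a sense of that unit-level sweet spot, here is a minimal Hypothesis property (real library, toy function). Note that the property quantifies over inputs to one function in one process, which is why it does not extend to cross-node behavior on its own:

```python
from hypothesis import given, strategies as st

def merge(xs, ys):
    """Merge two sorted lists: the unit under test."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    return out + xs[i:] + ys[j:]

@given(st.lists(st.integers()), st.lists(st.integers()))
def test_merge_is_sorted(xs, ys):
    # The oracle: merging two sorted lists equals sorting the union.
    assert merge(sorted(xs), sorted(ys)) == sorted(xs + ys)
```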
TLA+ and classical formal methods. TLC, Apalache, Lean. The highest-expressiveness option. Most teams will not invest in learning TLA+ without tooling that lowers the barrier significantly. My work sits on top of this toolchain and makes it accessible without requiring your team to become TLA+ experts.
FizzBee. A Python-like specification language for model-based testing, aimed at distributed systems teams. Worth a look if Python syntax is a hard requirement. It uses a bespoke language without TLA+ semantics, so you lose interoperability with tools like Apalache or TLC.
Quint. Modern TypeScript-like syntax over TLA+ semantics, from Informal Systems. I was part of the original dev team. Strong theoretical foundation, but the toolchain is still maturing and there is no native test generation. A solid choice if staying close to TLA+ semantics matters and you have the patience to work around rough edges.
Formal verification consulting. Certora, Runtime Verification, Galois, Informal Systems. Right for high-assurance systems where you need full proof. High-touch and expensive. My engagements are faster, more targeted, and leave your team with tools they can continue using.
In-house simulation (TigerBeetle, FoundationDB). If your team has already built a simulation layer, you are ahead of most. I can help you get more out of what you have built by pairing it with a specification-based oracle.
Selected Work
A sample of engagements in distributed systems correctness and software reliability:

Aztec Governance Protocol: Formal Verification
2025 · Massively parallel symbolic verification of 125 invariants across a multi-contract governance system.

Apalache: Symbolic Model Checker for TLA+ and Quint
2022–Now · Core team member of Apalache, the symbolic model checker for TLA+ and Quint. Used to verify Ethereum and Tendermint consensus, L2 governance protocols, and Cosmos SDK/IBC.
GitHub

Quint: Modern Language Tooling for TLA+
2022–2024 · Co-developed Quint from its early stages at Informal Systems: a modern language and tooling for writing formal TLA+ specifications in a readable, TypeScript-like syntax.
GitHub

Verified Accountability in Ethereum 3SF
2024 · Formal modeling of Ethereum 3-slot finality (3SF) consensus; exhaustively verified accountable safety.

Solarkraft: Model-Based Runtime Monitoring
2024 · Developed a low-latency runtime monitor for Soroban smart contracts on the Stellar blockchain. An application of model-based monitoring to production contract execution.
GitHub

Independent Security Audits
2022–Now · Independent security reviews on Cantina, Code4rena, and Sherlock. Verification contests with the Certora Prover.
Ready to get started?
Limited Availability
Reach out and let's talk about finding the failure modes your system hides.