Find the Bugs Your Tests Miss.
Before Your On-Call Rotation Does.
I’m Dr. Thomas Pani. I help teams building complex software find the failures their tests were not designed to catch.
Who This Is For
1. You have fuzzers, property-based tests, or deterministic simulation in place, but you are still not confident your oracle catches the right failures.
2. You are shipping AI-generated infrastructure or protocol code, and your test suite was not built to tell you whether it is actually correct.
3. You looked at TLA+ or formal methods and decided the learning curve was too steep, but you still want something more rigorous.
Typical Teams
Distributed systems engineers, SREs, QA architects, and infrastructure teams that already care about reliability but need a tighter link between their system and its intended behavior.
What I Do
The answer is not always "do more testing"; more often it is "test against the right thing".
Model-Based Adversarial Testing captures intended behavior in an executable spec, then uses it to generate tougher tests and more meaningful checks.
Grounded in Practice
This is the same general pattern used inside AWS, Oracle, and Microsoft: capture intended behavior in a spec, then use it as a reference model for tests and conformance checks.
Works With Existing Testing
This does not replace fuzzers, chaos testing, or simulation. It makes those tools more useful by giving them a stronger oracle.
AI-Generated Code Testing
Executable specs act as precise oracles for LLM-generated code during generation or review. They catch correctness failures humans and agents routinely miss.
Use when: protocol logic ships faster than human review can keep up.
Model-Based Adversarial Testing
I write an executable spec of your protocol or system, generate adversarial scenarios from it, and test the implementation against that model.
Deliverables: executable spec, test harness, and a prioritized report of failures found.
Protocol Specification and Review
I write or review specs for distributed protocols, cloud infrastructure, and payment systems, with invariants, simulation, or model checking where needed.
Useful as: an executable design model that doubles as a test asset, or as a precondition to deeper verification.
In-House Workshop
A hands-on session on executable specs, guided coding, and an adversarial testing workflow your team can keep using after the workshop.
Format: remote or on-site, scoped to your system and failure modes.
Why Executable Specs?
Integration tests and property-based testing get you most of the way, but they stop short of failure modes that require reasoning about concurrent state, message ordering, and partial failures. Executable specs close that gap.
This is where executable specs sit: between chaos testing and classical formal methods, turning intended behavior into runnable code and usable test oracles.
1. Runnable intent: Express intended behavior, including non-determinism, as runnable code.
2. Concrete scenarios: Replace informal sequence diagrams with spec executions you can inspect and test.
3. Adversarial generation: Generate happy-path and adversarial scenarios via random and symbolic execution.
4. Spec as oracle: Drive differential and conformance tests directly from the specification.
5. Stronger guarantees: Establish safety and liveness properties through model checking when you need them.
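In miniature, the workflow above looks like this. The sketch below is illustrative only; every name (Spec, BuggyImpl, and so on) is invented for this example, not taken from a real client system. An executable spec of a toy "at-most-once write" register serves as the oracle for a differential test, while the scenario generator plays the adversary by duplicating and reordering message deliveries.

```python
import random

class Spec:
    """Executable spec: duplicated writes must be applied at most once."""
    def __init__(self):
        self.value = 0
        self.applied = set()            # write ids already applied

    def write(self, wid, v):
        if wid not in self.applied:     # at-most-once semantics
            self.applied.add(wid)
            self.value = v

class BuggyImpl:
    """System under test: forgets deduplication (a seeded bug)."""
    def __init__(self):
        self.value = 0

    def write(self, wid, v):
        self.value = v                  # re-applies duplicated writes

def adversarial_scenario(rng, n=20):
    """Random trace with duplicated and reordered write deliveries."""
    ops = [(i, rng.randint(0, 9)) for i in range(n)]
    ops += rng.sample(ops, k=5)         # the network duplicates some writes
    rng.shuffle(ops)                    # ...and reorders delivery
    return ops

def conforms(impl_cls, seed):
    """Differential test: run spec and implementation in lockstep."""
    rng = random.Random(seed)
    spec, impl = Spec(), impl_cls()
    for wid, v in adversarial_scenario(rng):
        spec.write(wid, v)
        impl.write(wid, v)
        if impl.value != spec.value:    # the spec is the oracle
            return False
    return True
```

A plain happy-path test would pass BuggyImpl; it only diverges once the scenario generator injects the duplicated, reordered deliveries that the spec says must be tolerated.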
A Note on AI-Generated Code
Fast
LLMs produce plausible-looking code at speed.
Looks right, can still fail
They routinely miss interleavings, race conditions, atomicity assumptions, and protocol invariants.
Executable specs fix it
Executable specs give AI-generated code a tight oracle during generation and review.
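A toy illustration of what "tight oracle" means here, with every name invented for the sketch: a hypothetical LLM-generated transfer function with a subtle off-by-one comparison. Invariant checks alone (conservation, no overdraft) let the bug through, because the buggy branch simply does nothing; differential testing against a five-line spec catches it.

```python
import random

# Hypothetical LLM-generated code (not from a real codebase).
# It looks right, but uses > where the intent requires >=.
def generated_transfer(accounts, src, dst, amount):
    if accounts[src] > amount:          # bug: rejects exact-balance transfers
        accounts[src] -= amount
        accounts[dst] += amount
    return accounts

# Executable spec of the intended effect: transfer iff funds suffice.
def spec_transfer(accounts, src, dst, amount):
    out = dict(accounts)
    if out[src] >= amount:
        out[src] -= amount
        out[dst] += amount
    return out

# Invariants every transfer must preserve, buggy or not.
def holds_invariants(before, after):
    return (sum(before.values()) == sum(after.values())   # conservation
            and all(v >= 0 for v in after.values()))      # no overdraft

def check(seed, runs=1000):
    """Return a counterexample (src, dst, amount, before) or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        before = {"a": rng.randint(0, 5), "b": rng.randint(0, 5)}
        amount = rng.randint(0, 5)
        got = generated_transfer(dict(before), "a", "b", amount)
        want = spec_transfer(before, "a", "b", amount)
        if got != want or not holds_invariants(before, got):
            return ("a", "b", amount, before)
    return None
```

Note that the counterexamples found are exactly the exact-balance transfers: the invariants hold on the buggy run (nothing happened), so only the spec-vs-implementation comparison exposes the divergence.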
How This Fits in the Landscape
Executable specs give the rest of the stack something it often lacks: an explicit model of correct behavior. They are most useful where other techniques still struggle to define what correct behavior should be.
Jepsen
Finds real failures in running systems. A spec tells you what should have happened.
Antithesis
Improves reproducibility. Specs provide the missing correctness oracle.
Property-Based Testing
A great low-friction starting point, but harder to scale across distributed behavior and failure modes.
TLA+ and Classical Formal Methods
Highest expressiveness, highest barrier. I help teams use that toolchain without becoming formal methods specialists.
FizzBee and Quint
Both are viable spec-first approaches; the right choice depends on your toolchain and rigor needs.
In-House Simulation
If you already simulate, you are ahead of most teams. A spec-based oracle usually increases signal immediately.
Selected Work
Recent examples of executable specs, model checking, and reliability work.
Aztec Governance Protocol: Formal Verification
2025: Massively parallel symbolic verification of 125 invariants across a multi-contract governance system.
Apalache: Symbolic Model Checker for TLA+ and Quint
2022–Now: Core team member of Apalache, used to verify Ethereum and Tendermint consensus, L2 governance protocols, and Cosmos SDK/IBC.
Quint: Modern Language Tooling for TLA+
2022–2024: Co-developed Quint from its early stages at Informal Systems, bringing a more readable, TypeScript-like syntax to TLA+-style specifications.
Verified Accountability in Ethereum 3SF
2024: Formal modeling of Ethereum 3-slot finality (3SF) consensus; exhaustively verified accountable safety.
Solarkraft: Model-Based Runtime Monitoring
2024: Developed a low-latency runtime monitor for Soroban smart contracts on the Stellar blockchain, applying model-based monitoring to production contract execution.
Independent Security Audits
2022–Now: Independent security reviews on Cantina, Code4rena, and Sherlock, including verification contests with the Certora Prover.
Get in Touch (Limited Availability)
If you need stronger evidence that a complex system behaves correctly, reach out.
- Consulting engagements and workshops, remote or on-site.
- I typically respond within one business day.
- A first call is 30 minutes, with no preparation needed and no commitment.