Find the Bugs Your Tests Miss.

Before Your On-Call Rotation Does.

Dr. Thomas Pani

I’m Dr. Thomas Pani. I help distributed systems teams find the failure modes their tests were not designed to catch — using model-based adversarial testing, executable specifications, and formal verification.

Model-Based Adversarial Testing · AI-Generated Code Testing · Software Reliability Consulting

What I Do

When production keeps surprising you, the answer is not always "do more testing." Sometimes it is "what are you testing against?" I help teams building production infrastructure find the failure modes their tests were not designed to catch — by writing executable protocol specifications and using them to drive adversarial tests directly against the implementation.

Maybe you have wired up property-based testing or deterministic simulation and are still not confident your protocol is correct. Maybe you are shipping AI-generated code faster than your test suite can keep pace. I work with distributed systems engineers, SREs, and QA architects who are ready to close that gap systematically.

🔥 Flagship

Model-Based Adversarial Testing

End-to-end engagement: I write an executable specification of your protocol, use it to generate adversarial test scenarios via random and symbolic execution, and drive conformance testing against the implementation. If your team already has DST, fuzzers, or property-based testing in place, I can help you design a specification-based oracle that dramatically increases the signal from tools you already run.

Deliverables: the spec, a test harness, and a prioritized report of failures found.
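As a minimal sketch of the approach — using a toy bounded-counter protocol and a deliberately buggy implementation, both invented for illustration — the executable spec acts as a reference model, random traces act as adversarial scenarios, and a lockstep comparison is the conformance check:

```python
import random

class CounterSpec:
    """Reference model: the ground truth for correct behavior."""
    def __init__(self, limit=3):
        self.value, self.limit = 0, limit

    def step(self, action):
        if action == "incr":
            if self.value < self.limit:
                self.value += 1
                return ("ok", self.value)
            return ("err", self.value)        # spec: reject past the limit
        if action == "reset":
            self.value = 0
            return ("ok", 0)

class CounterImpl:
    """Implementation under test, with a deliberate off-by-one bug."""
    def __init__(self, limit=3):
        self.value, self.limit = 0, limit

    def step(self, action):
        if action == "incr":
            if self.value <= self.limit:      # bug: <= instead of <
                self.value += 1
                return ("ok", self.value)
            return ("err", self.value)
        if action == "reset":
            self.value = 0
            return ("ok", 0)

def adversarial_trace(seed, length=20):
    """Random action sequence; symbolic execution would target branch
    conditions instead of sampling blindly."""
    rng = random.Random(seed)
    return [rng.choice(["incr", "incr", "incr", "reset"]) for _ in range(length)]

def conformance_check(trace):
    """Drive spec and implementation in lockstep; report the first divergence."""
    spec, impl = CounterSpec(), CounterImpl()
    for i, action in enumerate(trace):
        expected, actual = spec.step(action), impl.step(action)
        if expected != actual:
            return (i, action, expected, actual)
    return None

failure = None
for seed in range(100):
    failure = conformance_check(adversarial_trace(seed))
    if failure:
        break
```

In a real engagement the spec is far richer (network faults, concurrency, crash-recovery), but the shape is the same: the spec says what correct looks like, and the harness searches for a trace where the implementation disagrees.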

⚡ Emerging

AI-Generated Code Testing

LLM-assisted development ships protocol logic faster than review can keep up. I write executable specs that serve as precise oracles for generated code, invokable by agents as a verification tool at generation time. This catches the class of correctness failures that code review and unit tests routinely miss.

Deliverables: executable specs as oracles, generation-time verification harness.
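To make "spec as oracle" concrete, here is an illustrative sketch (all names hypothetical): a spec of exactly-once delivery, a stand-in for generated code that forgets to deduplicate, and an oracle function an agent can call after each generation attempt — returning a counterexample trace as precise feedback for regeneration:

```python
import random

def spec_deliveries(msg_ids):
    """Spec: each message id is delivered exactly once, in first-seen order."""
    seen, out = set(), []
    for m in msg_ids:
        if m not in seen:
            seen.add(m)
            out.append(m)
    return out

def generated_deliveries(msg_ids):
    """Stand-in for LLM-generated code: forgets to deduplicate."""
    return list(msg_ids)

def oracle(candidate, trials=200, seed=0):
    """Return None if the candidate matches the spec on random traces,
    else the first counterexample trace."""
    rng = random.Random(seed)
    for _ in range(trials):
        trace = [rng.randint(0, 3) for _ in range(rng.randint(0, 8))]
        if candidate(trace) != spec_deliveries(trace):
            return trace
    return None

counterexample = oracle(generated_deliveries)   # a trace exposing the bug
```

Because the oracle returns a minimal, replayable input rather than a vague "tests failed," it closes the loop: the agent regenerates against concrete evidence of what it got wrong.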

🌿 Foundation

Protocol Specification and Review

I write or review executable specs for critical components — distributed protocols, cloud-native architectures, payment infrastructure. Includes automated checking: formally stated invariants, simulation, and model checking. Useful as a pre-condition to full verification or as design documentation that doubles as a test asset.

Deliverables: executable specs, formally stated invariants, model checking report.
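What "automated checks via model checking" means, in miniature: exhaustively explore a spec's state space and report any reachable state that violates an invariant, together with a replayable trace. Tools like TLC and Apalache do this at scale; the flawed two-process lock below is purely illustrative:

```python
from collections import deque

# State: (pc0, pc1, lock) — each process is 'idle', 'want', or 'crit'.
# The flawed protocol checks the lock and takes it in two separate steps.

def tuple_state(pcs, i, pc, lock):
    new = list(pcs)
    new[i] = pc
    return (new[0], new[1], lock)

def successors(state):
    pcs, lock = state[:2], state[2]
    for i in (0, 1):
        if pcs[i] == "idle" and not lock:      # observe lock free...
            yield tuple_state(pcs, i, "want", lock)
        elif pcs[i] == "want":                 # ...then take it (non-atomic!)
            yield tuple_state(pcs, i, "crit", True)
        elif pcs[i] == "crit":                 # leave, releasing the lock
            yield tuple_state(pcs, i, "idle", False)

def mutual_exclusion(state):
    """Invariant: the two processes are never in 'crit' simultaneously."""
    return not (state[0] == "crit" and state[1] == "crit")

def check(init, invariant):
    """BFS with parent links, so a violation comes with a replayable trace."""
    frontier, parent = deque([init]), {init: None}
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            trace = []
            while s is not None:
                trace.append(s)
                s = parent[s]
            return list(reversed(trace))       # counterexample trace
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                frontier.append(t)
    return None                                # invariant holds everywhere

violation = check(("idle", "idle", False), mutual_exclusion)
```

The counterexample trace is the deliverable that matters: a step-by-step schedule showing exactly how the design reaches a bad state, before any implementation exists.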

👑 Team Training

In-House Workshop

A hands-on session for your engineering team: frame the problem, introduce executable specifications as a practical tool (not a formal methods lecture), and spend the majority of time on guided coding. Teams leave with a working spec and a concrete technique they can apply immediately. Available remote or on-site.

→ Full workshop details

Capabilities & Tooling

I choose tools that maximize signal: reproducible bugs, meaningful coverage, and verification of critical invariants.

Languages & Targets

Rust · Go · Python · TypeScript · Solidity · Cairo · Lean

Specification & Model Checking

TLA+ / Quint · Apalache · Alloy · Lean4 · SMT: Z3 · SMT: CVC5

Testing & Simulation

cargo-fuzz · libFuzzer · AFL · Echidna · Medusa · Wake

Why Executable Specs?

Here is how testing usually goes. You start with unit and integration tests. Then you add property-based testing, which is great at the unit level but does not compose naturally across several components. Then you try chaos and fault injection (Jepsen, Antithesis): it finds real bugs in running systems but tells you that something failed, not what the correct behavior should have been. Then you look at formal methods (TLA+, Lean) and discover the barrier to adoption is high: deep expressiveness, but most teams will not invest in learning a new language from scratch.

Executable specs sit between chaos testing and classical formal methods. They let you:

- state the protocol's intended behavior precisely, in an executable form
- generate adversarial test scenarios directly from the spec
- check the implementation against the spec as a conformance oracle
- verify critical invariants via simulation and model checking

The result is a golden reference model that makes your existing test infrastructure smarter and catches the class of failures that standard fuzzing and property-based tests routinely miss.

A note on AI-generated code

LLMs produce plausible-looking protocol code at speed. They are not good at reasoning about concurrent interleavings, failure atomicity, or protocol invariants. Executable specs are one of the few techniques that can keep pace: they describe intent precisely enough to test generated code against, and they can be invoked by agents as a verification tool at generation time — turning the spec into a zero-degree-of-freedom oracle. They can also be written and checked faster than the code they govern.

How This Fits in the Testing Landscape

If you are evaluating your options for distributed systems testing, here is how the landscape looks and where executable specs fit in.

Jepsen (chaos and fault injection). If you want someone to find bugs in a running system by injecting faults, Kyle Kingsbury's Jepsen is the gold standard (Kafka, Redis, MongoDB, etcd). Jepsen's hand-crafted tests tell you that something failed. A spec tells you what the correct behavior should have been. Most teams benefit from both.

Antithesis (deterministic simulation testing). Simulation as a service — well funded and gaining traction with the same buyers. It helps you reproduce failures reliably, but it does not tell you what correct looks like, and complex properties are hard to express.

Property-based testing. Hypothesis, fast-check. The right starting point at the unit level, and the lowest-friction option if your system is not yet distributed. Does not compose naturally across nodes or network failures.
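For readers new to the idea, the property-based core fits in a few lines. This is a hedged stdlib-only sketch — in practice you would use Hypothesis or fast-check, which add input shrinking and smarter generation — checking a round-trip property on a made-up length-prefixed codec:

```python
import random

def encode(chunks):
    """Length-prefix each byte chunk (2-byte big-endian length)."""
    return b"".join(len(c).to_bytes(2, "big") + c for c in chunks)

def decode(data):
    chunks, i = [], 0
    while i < len(data):
        n = int.from_bytes(data[i:i + 2], "big")
        chunks.append(data[i + 2:i + 2 + n])
        i += 2 + n
    return chunks

def holds(prop, gen, trials=500, seed=0):
    """Check a property on random inputs; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen(rng)
        if not prop(x):
            return x
    return None

def gen_chunks(rng):
    return [bytes(rng.randrange(256) for _ in range(rng.randrange(5)))
            for _ in range(rng.randrange(4))]

def roundtrip(chunks):
    return decode(encode(chunks)) == chunks

counterexample = holds(roundtrip, gen_chunks)   # None: the property holds
```

This works beautifully for a single component. The composition problem appears when the "input" is an interleaving of messages, node crashes, and network partitions — which is exactly where a spec-driven harness takes over.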

TLA+ and classical formal methods. TLC, Apalache, Lean. The highest-expressiveness option. Most teams will not invest in learning TLA+ without tooling that lowers the barrier significantly. My work sits on top of this toolchain and makes it accessible without requiring your team to become TLA+ experts.

FizzBee. A Python-like modeling language for model-based testing, targeting distributed systems teams. Worth looking at if Python syntax is a hard requirement. Uses a bespoke language without TLA+ semantics, so you lose interop with tools like Apalache or TLC.

Quint. Modern TypeScript-like syntax over TLA+ semantics, from Informal Systems. I was part of the original dev team. Strong theoretical foundation, but the toolchain is still maturing and there is no native test generation. A solid choice if staying close to TLA+ semantics matters and you have the patience to work around rough edges.

Formal verification consulting. Certora, Runtime Verification, Galois, Informal Systems. Right for high-assurance systems where you need full proof. High-touch and expensive. My engagements are faster, more targeted, and leave your team with tools they can continue using.

In-house simulation (TigerBeetle, FoundationDB). If your team has already built a simulation layer, you are ahead of most. I can help you get more out of what you have built by pairing it with a specification-based oracle.

Selected Work

A sample of engagements in distributed systems correctness and software reliability:

Aztec Governance Protocol: Formal Verification

2025

Massively parallel symbolic verification of 125 invariants across a multi-contract governance system.

GitHub | PDF report | → writeup (recommended)
Aztec Solidity Quint Apalache Formal Verification

Apalache: Symbolic Model Checker for TLA+ and Quint

2022–Now

Core team member of Apalache, the symbolic model checker for TLA+ and Quint. Used to verify Ethereum and Tendermint consensus, L2 governance protocols, and Cosmos SDK/IBC.

GitHub
TLA+ Quint Formal Verification Cosmos Ethereum

Quint: Modern Language Tooling for TLA+

2022–2024

Co-developed Quint from its early stages at Informal Systems. A modern language and tooling for writing formal TLA+ specifications in a readable, TypeScript-like syntax.

GitHub
Quint TLA+ Simulation Language Tooling

Verified Accountability in Ethereum 3SF

2024

Formal modeling of Ethereum's 3-slot finality (3SF) consensus; accountable safety exhaustively verified.

Tech report (arXiv) | GitHub
Ethereum Formal Verification TLA+ Alloy SMT Python

Solarkraft: Model-Based Runtime Monitoring

2024

Developed a low-latency runtime monitor for Soroban smart contracts on the Stellar blockchain. An application of model-based monitoring to production contract execution.

GitHub
Stellar Soroban Rust TypeScript Go Runtime Monitoring

Independent Security Audits

2022–Now

Independent security reviews on Cantina, Code4rena and Sherlock. Verification contests with Certora Prover.

Cairo / StarkNet Soroban / Stellar Solidity / Ethereum

Ready to get started? Limited Availability

Reach out and let's talk about finding the failure modes your system hides.