We Predicted VAT Fraud at Company Formation - Using ‘Agentic AI’ Patterns in 2018
In 2018, my team built a model that predicted VAT fraud at company formation with 80% accuracy - before any transactions occurred. Today this would be called ‘agentic AI.’ We called it ‘production architecture that works at national scale.’
The Problem
Tax fraud detection is traditionally reactive. You analyze filed returns, spot anomalies, investigate suspicious patterns. By then, the fraud has already happened and the money is gone. What if you could predict fraud at company formation - day zero - before any transactions? That was the challenge: analyze formation signals to identify fraudulent intent before the fraud occurs, with explainability for compliance, reliability for production, and accuracy to justify intervention.
The problem was that no single factor could be parameterized, and domain knowledge was largely anecdotal, based on employees’ subjective experience.
Then there was the hype problem: at the time, everyone wanted to use neural networks, which suffer from black-box explainability.
The Architecture
I designed a hierarchical agentic system built around specialized model agents: simple, interpretable models, each focused on a specific signal:
- Director history analysis
- Address pattern detection
- Industry code clustering
- Network relationship mapping
- Timing pattern recognition
Each agent ran independently, writing its inferences to a shared knowledge graph.
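A minimal sketch of what such a specialized agent could look like - the class names, fields, and the toy scoring rule are illustrative assumptions, not the production code:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Inference:
    agent: str            # which agent produced the signal
    subject_id: str       # the company or director the inference is about
    signal: str           # e.g. "director_linked_to_prior_fraud"
    score: float          # agent-local confidence, 0..1
    produced_at: datetime

class SignalAgent(ABC):
    """One narrow perspective (Umwelt) on the formation data."""
    name: str = "base"

    @abstractmethod
    def infer(self, formation: dict) -> list[Inference]:
        """Analyze one company formation and return zero or more inferences."""

class DirectorHistoryAgent(SignalAgent):
    """Illustrative agent: looks only at the directors' prior company history."""
    name = "director_history"

    def infer(self, formation: dict) -> list[Inference]:
        hits = [d for d in formation.get("directors", [])
                if d.get("prior_fraud_cases", 0) > 0]
        if not hits:
            return []
        # Toy scoring rule: more directors with prior fraud cases -> higher score.
        score = min(1.0, 0.4 * len(hits))
        return [Inference(self.name, formation["company_id"],
                          "director_linked_to_prior_fraud", score,
                          datetime.now(timezone.utc))]
```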
The idea built upon Jakob von Uexküll's concept of Umwelt - how each organism perceives its own functional world. The models/agents were simple creatures, each with their own Umwelt. The graph synthesized these individual Umwelten into a unified structure from which the global perspective could be gained.
Agents
The architecture had two modes: synchronous (a model answers a specific request) and asynchronous (the answer is written to the graph, making it available to other agents). This blackboard pattern grew the graph by 30GB per year.
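A sketch of the two modes against an in-memory stand-in for the shared graph (the real system wrote to Neo4j; the names here are illustrative):

```python
class KnowledgeGraph:
    """In-memory stand-in for the shared property graph (the 'blackboard')."""

    def __init__(self):
        self._inferences = []            # in production: writes to Neo4j

    def write_inference(self, inference) -> None:
        self._inferences.append(inference)

    def inferences_about(self, subject_id: str) -> list:
        return [i for i in self._inferences if i.subject_id == subject_id]

def run_sync(agent, formation: dict) -> list:
    """Synchronous mode: the model answers a specific request directly."""
    return agent.infer(formation)

def run_async(agent, formation: dict, graph: KnowledgeGraph) -> None:
    """Asynchronous mode: the answer is written to the graph for other agents to consume."""
    for inference in agent.infer(formation):
        graph.write_inference(inference)
```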
Data sources
All the while, the graph was continually updated from external data sources, merging everything into a holistic view of the domain (and creating a world of trouble in building a unified bitemporal data model of 2 billion data points - depending on how you choose to count the information richness of a highly connected property graph with multiple attributes).
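A sketch of what the bitemporal bookkeeping implies: each fact carries both the period it was true in the real world and the period we believed it. The field names are assumptions, not the production schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class BitemporalEdge:
    source_id: str                    # e.g. a director node
    target_id: str                    # e.g. a company node
    relation: str                     # e.g. "DIRECTOR_OF"
    valid_from: datetime              # when the relationship held in the real world
    valid_to: Optional[datetime]      # None = still valid
    recorded_at: datetime             # when we learned about it
    superseded_at: Optional[datetime] = None   # when a correction replaced this record

def as_of(edges, valid_at: datetime, known_at: datetime):
    """What did the graph say was true at `valid_at`, as we knew it at `known_at`?"""
    return [
        e for e in edges
        if e.valid_from <= valid_at
        and (e.valid_to is None or valid_at < e.valid_to)
        and e.recorded_at <= known_at
        and (e.superseded_at is None or known_at < e.superseded_at)
    ]
```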
This meant that models could be triggered both by requests and by data updates. Something changed - what’s our opinion on that? What’s the combined opinion? What is the cascading network effect on those opinions?
Knowledge Graph: Captured what tabular data misses - relationships between directors, companies, addresses, and historical fraud patterns. Not ‘this director is suspicious,’ but ‘this director connected to these three previously fraudulent companies, using these addresses, in this industry pattern, with this timing.’
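As a sketch of how such a pattern could be queried with the Neo4j Python driver - the labels, relationship types, and connection details are illustrative assumptions, not the production data model:

```python
from neo4j import GraphDatabase

# 'Is any director of this new company linked to previously fraudulent
# companies, and through which addresses?'
QUERY = """
MATCH (d:Director)-[:DIRECTOR_OF]->(c:Company {id: $company_id})
MATCH (d)-[:DIRECTOR_OF]->(prior:Company)-[:FLAGGED_IN]->(:FraudCase)
MATCH (prior)-[:REGISTERED_AT]->(a:Address)
RETURN d.id AS director,
       collect(DISTINCT prior.id) AS prior_fraudulent_companies,
       collect(DISTINCT a.id) AS shared_addresses
"""

def director_fraud_context(uri: str, user: str, password: str, company_id: str):
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [record.data() for record in session.run(QUERY, company_id=company_id)]
    finally:
        driver.close()
```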
Cascading Updates: New data triggered agents. Agent outputs updated the graph. Graph updates triggered other agents with richer context. It was a model-to-model principle. Iterative refinement created emergent intelligence no single model could achieve.
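A minimal sketch of a bounded cascade, assuming the agent and graph stand-ins sketched earlier; the subscription mechanism is an illustrative simplification:

```python
from collections import deque

class Cascade:
    """Route graph updates to the agents interested in them, bounded in depth."""

    def __init__(self, graph):
        self.graph = graph
        self.subscribers = {}            # topic/signal name -> list of agents

    def subscribe(self, topic: str, agent) -> None:
        self.subscribers.setdefault(topic, []).append(agent)

    def on_update(self, topic: str, formation: dict, max_rounds: int = 3) -> None:
        queue = deque([(topic, 0)])
        while queue:
            current, depth = queue.popleft()
            if depth >= max_rounds:      # keep the cascade from running away
                continue
            for agent in self.subscribers.get(current, []):
                for inference in agent.infer(formation):
                    self.graph.write_inference(inference)
                    # An agent's output is itself an update other agents may react to.
                    queue.append((inference.signal, depth + 1))
```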
Synthesis Layer: Algorithms consumed vectorized graph features - the accumulated intelligence from cascading agent updates - and found complex formation patterns that predicted fraudulent intent.
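A sketch of the synthesis step: the feature names and the choice of gradient boosting here are illustrative, not the production algorithm:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical graph-derived features for one company formation.
FEATURES = ["n_prior_fraud_links", "shared_address_cluster_size",
            "industry_risk_score", "formation_timing_anomaly", "network_degree"]

def vectorize(graph_context: dict) -> np.ndarray:
    """Turn the accumulated graph context for one formation into a feature vector."""
    return np.array([float(graph_context.get(name, 0.0)) for name in FEATURES])

def train(X: np.ndarray, y: np.ndarray) -> GradientBoostingClassifier:
    """X: one row per historical formation; y: 1 if it later turned out fraudulent."""
    return GradientBoostingClassifier().fit(X, y)

def score(model: GradientBoostingClassifier, graph_context: dict) -> float:
    """Probability that this formation is fraudulent, according to the model."""
    return float(model.predict_proba(vectorize(graph_context).reshape(1, -1))[0, 1])
```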
Explainability: When the system flagged a formation, we could trace the reasoning through separate graph patterns back to specific agent inferences. Compliance-friendly, auditable, interpretable.
Simplicity: Having multiple specialized models kept each part simple and the whole system maintainable - slicing the problem into agreeable portions makes the overall complexity easier to handle.
This ran on boring technology by design:
Infrastructure (Layer 1): Linux, Docker and K8s provided reproducible environments. When an agent updated the graph, we could recreate the exact computational context across development, staging, and production. No “it works in the demo” problems.
Immutable Models (Layer 2): Models in production never changed without explicit versioning. When predictions shifted, we could trace whether it was the model, the data, or the graph structure. This made cascading updates debuggable instead of mysterious.
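One way to make that traceability concrete is to store a provenance record with every prediction; the field names below are illustrative, not the production schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)                  # immutable once written
class PredictionRecord:
    company_id: str
    score: float
    model_name: str                      # e.g. "director_history"
    model_version: str                   # explicit, pinned model version
    data_snapshot: str                   # id of the source data load it was scored against
    graph_version: str                   # graph state the features were read from
    predicted_at: datetime
```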
Data as Graph (Layer 3): Postgres fed normalized data through Apache Airflow into the Neo4j graph - a bitemporal model of 2 billion data points. The graph wasn’t exotic technology; it was careful data modeling that captured relationships tabular schemas miss.
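A sketch of the pipeline shape, assuming Airflow 2.x; the DAG id, task bodies, and schedule are placeholders, not the production pipeline:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_formations(**_):
    ...  # read newly registered companies from the normalized Postgres schema

def load_into_graph(**_):
    ...  # MERGE company, director and address nodes/relationships into Neo4j

with DAG(
    dag_id="formations_to_graph",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_formations",
                             python_callable=extract_formations)
    load = PythonOperator(task_id="load_into_graph",
                          python_callable=load_into_graph)
    extract >> load                      # extract runs before the graph load
```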
Observability (Layer 4): Separate telemetry for infrastructure, KPIs, and long-term memory. Every flagged formation carried a trace from graph patterns back to the specific agent inferences behind it. Explainability wasn’t an add-on; it was built into the architecture.
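A sketch of what such a trace could look like as a graph query, assuming a provenance schema along the lines of flag, inference, agent (the labels and relationship types are assumptions), reusing a Neo4j session as in the earlier sketch:

```python
# 'Why was this formation flagged?' - walk from the flag back to the agent
# inferences and the evidence they were derived from.
TRACE_QUERY = """
MATCH (flag:RiskFlag {company_id: $company_id})
MATCH (flag)-[:BASED_ON]->(inf:Inference)<-[:PRODUCED]-(agent:Agent)
OPTIONAL MATCH (inf)-[:DERIVED_FROM]->(evidence)
RETURN agent.name AS agent, inf.signal AS signal, inf.score AS score,
       collect(evidence.id) AS evidence
ORDER BY score DESC
"""

def explain_flag(session, company_id: str):
    """Reconstruct which agents, signals and evidence produced a given flag."""
    return [record.data() for record in session.run(TRACE_QUERY, company_id=company_id)]
```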
Measurement (Layer 5): Accuracy metrics, business impact, ethical compliance. The 80% accuracy number wasn’t marketing - it was measured against actual fraud prosecutions.
This ran in production at national scale. It worked.
Organizational Ownership: Who Gets Blamed When It’s Wrong
Each business unit owned their models - not the data science team, not the platform team, not IT. When a model flagged a company formation, the business unit decided whether to investigate, using their domain expertise to interpret the signals.
This solved the detached consultant problem. Business units couldn’t blame “the algorithm” when predictions were wrong. They owned the outcomes. This created feedback loops that made models better - business experts refined what signals mattered, data scientists encoded those refinements, the graph captured the accumulated intelligence.
Success came not only from the brilliance of the data scientists, but also from close collaboration with domain specialists who actively participated in model development. Most AI projects fail because nobody owns the outcome. Someone owns the technology, someone else owns the business process, and they coordinate through tickets. Here, ownership was unified at the point of decision.
Why This Pattern Works
Von Uexküll argued that organisms don't perceive "objective reality" - they perceive their own functional world (Umwelt) shaped by their sensory apparatus and needs. A director history agent sees the world as patterns of association. An address clustering agent sees spatial concentrations. Each agent has its own Umwelt - its specialized way of engaging with the domain.
The knowledge graph synthesizes these individual Umwelten into a shared structure - the unified world constructed from these specialized perspectives. Early agents provide coarse signals based on their limited view. The graph captures relationships those agents can’t individually perceive. Later agents refine predictions using network context that emerges from the accumulated intelligence.
This is compositional learning. Each new company formation improves the system. Each fraud case (detected or missed) enriches the graph. Each agent update makes other agents smarter - not because we retrained them, but because the shared Umwelt they operate in became richer.
This is what makes agentic architecture powerful: specialized models building on each other’s work through shared knowledge representation. The hard part isn’t the AI. It’s the architecture that lets you actually ship it.
What’s Different Now
The pattern works. What’s changed:
Then: Hard-coded orchestration. Fixed agent sequence. Manual graph update rules.
Now: Things can be done differently:
- LLMs can assist in orchestrating dynamically (see the sketch after this list): ‘Given this formation pattern and current fraud landscape, which agents should run? How should we interpret their signals? What graph updates reveal emerging patterns?’
- GraphRAG formalizes a guardrailed pattern in which contextual information is fed to the model
- Vectorized search and vector spaces are a commodity
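A sketch of what that LLM-assisted orchestration could look like, where the LLM only selects from a registry of known agents and a guardrail rejects anything else; `ask_llm` is a placeholder for whatever completion API you use, and the registry names are illustrative:

```python
import json

# Registry of the specialized agents from the pattern above (placeholders here).
AGENT_REGISTRY = {
    "director_history": ...,
    "address_clustering": ...,
    "timing_patterns": ...,
}

def plan_agents(ask_llm, formation_summary: str) -> list:
    """Ask an LLM which agents to run; accept only agents we actually have."""
    prompt = (
        "Given this company formation summary and the available fraud agents "
        f"{list(AGENT_REGISTRY)}, return a JSON list of agent names to run.\n\n"
        f"{formation_summary}"
    )
    proposed = json.loads(ask_llm(prompt))
    # Guardrail: the LLM only selects from known agents; it never invents steps.
    return [name for name in proposed if name in AGENT_REGISTRY]
```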
But the core architecture - specialized agents, cascading graph updates, compositional synthesis, explainable reasoning - that’s the same.
Modern agentic AI is this pattern with flexible LLM orchestration. We proved it works in production 7 years ago.
Is Your Agentic AI Actually Agentic?
If you can’t answer yes to these questions, you’re building interconnected services, not agentic systems:
- Can agents write inferences to shared state? Not just return results - write to a persistent structure other agents consume.
- Do updates cascade? When new data arrives, does it trigger agent refinement throughout the system?
- Can you trace reasoning through the graph? Not just “model X said Y” - can you reconstruct why through the relationship structure?
- Does the system get smarter compositionally? Not just from retraining individual models, but from agents building on each other’s accumulated intelligence?
We designed for this from the start. Most agentic AI projects discover these requirements after their demo stops working.
Why This Matters for Other Compliance Domains
Banks face similar challenges: KYC at account opening, AML for new relationships, credit risk for first-time borrowers. You need to predict behavior before you have transaction history.
The pattern applies:
- Specialized models analyzing formation/onboarding signals
- Knowledge graphs capturing entity relationships and network patterns
- Cascading updates as new information arrives
- Compositional synthesis finding complex patterns
- Explainability for regulatory compliance
Layer LLM orchestration onto this for natural language queries, dynamic agent routing, and explanation synthesis. But the hard part - building the specialized models, knowledge graphs, and production architecture - that takes experience, not hype.
We built this for VAT fraud prediction. The same pattern works for financial crime prevention, credit risk assessment, and compliance screening. The pattern survives across use cases.
This isn’t theory. These numbers are public. We built the system, it went into production in 2017, we cracked the VAT nut the following year, and it’s still running at national scale today. The architecture patterns - multi-agent orchestration, cascading graph updates, compositional synthesis - they work when you actually have to deliver.
Most discussions about agentic systems remain theoretical. We built one that had to work - where “wrong” meant either missing fraud or falsely accusing legitimate businesses. That’s why explainability wasn’t a nice-to-have. That’s why observability wasn’t optional. That’s why boring technology beat bleeding-edge hype.
When your system’s mistakes can destroy someone’s business or let criminals go free, you learn quickly: the pattern matters more than the technology. Get the architecture right, use tools you understand, ship incrementally, measure ruthlessly.
That’s how you go from 80% prototype to national-scale production. That’s the difference between agentic AI as marketing and agentic AI as engineering discipline.
If you wish to learn more, you can watch this presentation:
Youtube video: Fraud Detection with Graphs at Danish Business Authority
And for Danish readers:
Erhvervsstyrelsen fanger svindlere med algoritmer (Version2 article)