When most people hear the word AI today, they picture a chatbot. They think of ChatGPT answering a question, Claude drafting a document, or a generative model producing an image from a text prompt. That version of AI is real, powerful, and receiving an extraordinary amount of investment and attention. It is also not the only kind of AI, and for many of the most important tasks in regulated industries, it may not even be the right kind.
There is an older, quieter, and considerably less glamorous form of AI that has been running inside insurance companies, hospitals, and financial institutions for decades. It does not generate text, it does not hallucinate, and it does not require a GPU cluster to operate. It is called an expert system, and how it differs from the generative AI dominating today's headlines is one of the most important conceptual questions a professional in a regulated business can get to grips with right now.
An expert system is a piece of software that encodes human expertise as explicit rules. For example: if a claim notification arrives and the loss date is within the policy period, the coverage type matches the reported peril, and the sum insured is above the reserve threshold, then route the claim to a senior handler. That logic is written out, inspected, tested, and approved by a human before it runs. The same input will always produce the same output. There is no ambiguity, no creativity, and no surprise. This is what computer scientists mean when they call a system deterministic.
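The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a production rule engine: the `Claim` fields, the `RESERVE_THRESHOLD` value, and the fallback `"standard_queue"` destination are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical threshold; a real value would come from approved business rules.
RESERVE_THRESHOLD = 50_000.0


@dataclass(frozen=True)
class Claim:
    loss_date: date
    policy_start: date
    policy_end: date
    coverage_type: str
    reported_peril: str
    sum_insured: float


def route_claim(claim: Claim) -> str:
    """Apply the three explicit conditions from the rule, in order."""
    in_period = claim.policy_start <= claim.loss_date <= claim.policy_end
    peril_matches = claim.coverage_type == claim.reported_peril
    above_threshold = claim.sum_insured > RESERVE_THRESHOLD
    if in_period and peril_matches and above_threshold:
        return "senior_handler"
    # Assumed fallback: the source rule does not say where other claims go.
    return "standard_queue"


example = Claim(
    loss_date=date(2024, 6, 1),
    policy_start=date(2024, 1, 1),
    policy_end=date(2024, 12, 31),
    coverage_type="flood",
    reported_peril="flood",
    sum_insured=80_000.0,
)
print(route_claim(example))
```

Because the function has no randomness and no hidden state, calling it twice with the same claim returns the same route, which is the property the word deterministic describes.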
Large language models such as Claude, GPT-4o, or Gemini work entirely differently. They do not follow explicit rules. They have been trained on vast quantities of text, and through that training they have developed a statistical model of language and knowledge that allows them to generate plausible, contextually appropriate responses to almost any input. The key word is plausible. An LLM generates a statistically likely continuation of a given prompt, drawn from patterns in its training data. This makes it extraordinarily flexible and capable of handling questions and tasks that no human expert could have anticipated and pre-coded. It also makes its outputs inherently probabilistic. The same input, given to the same model at the same moment, can produce slightly different outputs. And on occasion, even high-performing models produce outputs that are confidently wrong: a phenomenon known as hallucination.
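A toy sketch can show why the same input may yield different outputs. It assumes the common approach of sampling the next token from a probability distribution; the three-word vocabulary and the scores are made up for illustration, whereas a real model works over a vocabulary of many thousands of tokens with scores computed by a neural network.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates and scores for a single prompt.
tokens = ["approved", "declined", "referred"]
logits = [2.0, 1.5, 0.5]

# Same prompt, same scores, repeated draws: the result can vary.
draws = [sample_next_token(tokens, logits) for _ in range(10)]
print(draws)
```

The most likely token appears most often, but it is not guaranteed on any single draw. That is the sense in which the output is plausible rather than fixed.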
For most consumer applications, a low rate of hallucination is a tolerable nuisance. For a regulated business applying a sanctions screening rule, calculating a policy excess, or determining whether a treatment falls within a clinical protocol, it is not. This is the core tension that every regulated organisation deploying AI needs to resolve.
The financial services and insurance sectors have been running rule-based AI since the 1980s. MYCIN, one of the earliest expert systems, was developed at Stanford in the 1970s to diagnose bacterial infections and recommend antibiotics. Its descendants are embedded in credit-scoring engines, claims-triage systems, clinical decision support tools, and underwriting workbenches across every major regulated market.
These systems have endured not because the industry is conservative, but because they solve a specific and very real problem: auditability. When the FCA or the Prudential Regulation Authority asks a firm why a particular decision was made, the firm needs an answer. A rule-based system can provide one instantly: the decision was made because rule 47 fired, which requires that any claim with a quantum above £50,000 and a liability code in set X is referred to the specialist team. That answer is complete, traceable, and defensible. A large language model cannot, in general, provide an equivalent explanation. Its internal reasoning is distributed across billions of parameters and is not interpretable in any straightforward way.
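The "rule 47 fired" answer can be made concrete with a small sketch in which every decision carries the identifier and wording of the rule that produced it. The dictionary-based claim, the contents of `LIABILITY_SET_X`, and the `"standard_handling"` fallback are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical membership of "set X"; the source does not list the codes.
LIABILITY_SET_X = {"L1", "L2"}


@dataclass(frozen=True)
class Rule:
    rule_id: int
    description: str
    condition: Callable[[dict], bool]
    outcome: str


RULES = [
    Rule(
        rule_id=47,
        description="Quantum above £50,000 and liability code in set X",
        condition=lambda c: c["quantum"] > 50_000
        and c["liability_code"] in LIABILITY_SET_X,
        outcome="refer_to_specialist_team",
    ),
]


def decide(claim: dict) -> dict:
    """Return the outcome together with the rule that produced it."""
    for rule in RULES:
        if rule.condition(claim):
            return {
                "outcome": rule.outcome,
                "rule_id": rule.rule_id,
                "reason": rule.description,
            }
    # Assumed default when no referral rule fires.
    return {"outcome": "standard_handling", "rule_id": None, "reason": "No rule fired"}


print(decide({"quantum": 75_000, "liability_code": "L1"}))
```

The returned record is the audit trail: it names the rule, states its condition in plain words, and can be stored alongside the decision for later review.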
In healthcare, clinical decision support systems such as those deployed in NHS Trusts use rule-based logic to flag drug interactions, dosing errors, and contraindications. These systems run thousands of checks per patient encounter, consume negligible computing resources compared to a generative model, and produce outputs that clinicians can verify against the rules that generated them. The performance benchmark for a clinical alert is not creativity or fluency. It is accuracy and auditability, which rule-based systems consistently deliver for narrow, well-defined tasks.
The energy profile of these systems is also a material advantage that is increasingly relevant. A rule-based expert system evaluating an insurance claim or a medication order consumes energy in the order of fractions of a milliwatt-hour per transaction, well under 0.001 Wh by most estimates. A lightweight large language model query using a model such as Claude Haiku or Gemini Flash consumes approximately 0.3 Wh. A frontier model query using GPT-4o or an equivalent consumes around 0.4 Wh. And frontier reasoning models such as OpenAI's o3 or DeepSeek-R1, designed for complex multi-step problem-solving, consume over 33 Wh per query, according to research published in 2025 on arXiv (Gutierrez et al., 2025). That is a difference of four to five orders of magnitude between the most energy-efficient rule-based approach and the most energy-intensive frontier AI. For a regulated business processing millions of automated transactions per month and subject to ESG reporting obligations, that is not a marginal consideration.
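The arithmetic behind the orders-of-magnitude comparison can be checked directly. The sketch below uses only the per-query figures quoted above, taking 0.001 Wh as an upper bound for the rule-based system and 33 Wh as a lower bound for the reasoning models; the one-million-transaction volume is an illustrative assumption.

```python
import math

# Per-query energy in watt-hours, as quoted in the text.
ENERGY_WH = {
    "rule_based": 0.001,     # upper bound ("well under 0.001 Wh")
    "lightweight_llm": 0.3,
    "frontier_llm": 0.4,
    "reasoning_llm": 33.0,   # lower bound ("over 33 Wh")
}


def orders_of_magnitude(high_wh: float, low_wh: float) -> float:
    """Base-10 logarithm of the ratio between two energy figures."""
    return math.log10(high_wh / low_wh)


def monthly_kwh(wh_per_query: float, transactions: int) -> float:
    """Total energy in kilowatt-hours for a given transaction volume."""
    return wh_per_query * transactions / 1000.0


gap = orders_of_magnitude(ENERGY_WH["reasoning_llm"], ENERGY_WH["rule_based"])
print(f"Gap: {gap:.1f} orders of magnitude")

TRANSACTIONS = 1_000_000  # assumed monthly volume
for name, wh in ENERGY_WH.items():
    print(f"{name}: {monthly_kwh(wh, TRANSACTIONS):,.0f} kWh per month")
```

With these figures the ratio is 33,000 to 1, or roughly 4.5 orders of magnitude. At one million transactions a month that is about 1 kWh for the rule-based system against about 33,000 kWh for a reasoning model, and since both figures are bounds the real gap is at least that large.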
The distinction between expert systems and large language models is not really a question of which is more advanced. It is a question of what each is designed to do well, and more importantly, what regulated businesses are actually responsible for delivering.
Expert systems sit at the high-value end of the operational spectrum. When an insurer calculates an excess, applies a coverage condition, or determines whether a treatment falls within a prior authorisation protocol, the outcome is not a suggestion. It is a decision with legal, financial, and regulatory consequences. The customer receives a concrete outcome: a payment, a denial, an approval, a referral. The business is accountable for that outcome, and regulators expect to be able to audit the reasoning behind it. This is exactly the territory where rule-based deterministic logic excels. It is precise, consistent, traceable, and it executes a process rather than approximating one. Speed and accuracy at this layer directly affect the customer experience and the business's exposure. Getting it right is not optional.
Large language models occupy a different register entirely. The requests they handle well are those that do not require a decision: a customer asking what their policy covers, a claimant asking what happens next, a patient asking whether a referral has been received. These are communication tasks, not process tasks. The model produces an answer in words, a summary, a pointer in the right direction. It is not giving advice in any regulated sense. It is not executing a transaction. It is not binding a policy, a settlement, or an approval of treatment. It is answering a question in natural language, and doing so at a volume and a consistency that no human team could match for the same cost.
The critical distinction, and one that is often glossed over in AI strategy discussions, is that LLMs rarely execute processes in regulated contexts, and for good reason. A language model that drafts a response or summarises a document operates at the periphery of the value chain, where tolerance for imprecision is higher and the consequences of an error are recoverable. A system that triggers a payment, applies a policy exclusion, or issues a clinical recommendation is operating at its core, where precision is non-negotiable and the consequences of an error are not recoverable. Conflating the two, or assuming that because a model handles the former well it can be trusted with the latter, is one of the more common and more costly mistakes in regulated AI deployment.
The practical implication is straightforward. Customer service queries, free-text triage, document summarisation, and first-response communication are genuine and growing use cases for generative AI in regulated businesses. They reduce handling time, improve accessibility, and free up skilled people for the judgement-intensive work that actually requires them. But the moment a process needs to produce an outcome that the business stands behind, a deterministic system should drive it, with the language model, if present at all, playing a supporting role in how that outcome is communicated rather than in how it is reached.
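One way to picture this division of labour is a pipeline in which a deterministic function reaches the outcome and a separate communication step only words it. In this sketch `draft_message` is a plain template standing in for a language model call, and the rule identifiers, field names, and excess calculation are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    outcome: str
    payable: float
    rule_id: str


def decide_claim(claim_amount: float, policy_excess: float) -> Decision:
    """Deterministic core: the only place where the outcome is computed."""
    payable = max(claim_amount - policy_excess, 0.0)
    if payable > 0:
        return Decision("payment", payable, "EXCESS-01")
    return Decision("no_payment", 0.0, "EXCESS-02")


def draft_message(decision: Decision) -> str:
    """Communication layer: stand-in for a language model call.

    It receives a frozen Decision and can only describe it, not change it.
    """
    if decision.outcome == "payment":
        return f"Your claim has been approved. We will pay £{decision.payable:,.2f}."
    return "Your claim falls below the policy excess, so no payment is due."


def handle_claim(claim_amount: float, policy_excess: float):
    decision = decide_claim(claim_amount, policy_excess)
    message = draft_message(decision)
    return decision, message


decision, message = handle_claim(1200.0, 250.0)
print(decision)
print(message)
```

The design choice worth noting is the direction of data flow: the decision is final before any text is generated, so an error in the wording can be corrected without touching the outcome the business stands behind.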