Bad fits
Seven anti-patterns describe most of the work where an LLM is reliably the wrong tool. Each has a name. Each has a specific reason it fails. Each has a closely related good-fit pattern that looks like the anti-pattern but is actually safe — and recognizing the difference is the whole game.
The same disclaimer applies to all seven: the failure mode isn’t that the model produces unusable output. It’s that the model produces plausible-looking output that is operationally wrong, and the wrongness is invisible until consequences land. That’s what makes these patterns dangerous; if they were obviously broken, nobody would build them.
The Sole Authority
The anti-pattern: the LLM makes a final operational decision with no human review and no deterministic check before the decision takes effect.
Why it fails: LLMs are probabilistic. The same input can produce different outputs across runs. The “correct” answer changes based on phrasing, context, and the model’s stochastic state. None of those properties are acceptable in a system where one wrong output costs money, breaks compliance, or affects customers irreversibly.
Examples of misuse:
- Direct execution of purchase orders generated by an LLM with no buyer review.
- LLM-driven price changes pushed straight to the production catalog.
- Customer-account modifications (terms, credit limits, billing addresses) made on an LLM’s recommendation without human signoff.
- Permanent record deletions (“AI determined this customer should be removed”).
- Autonomous responses to regulatory filings.
The good-fit relative: the Drafter. Same LLM, same general work, but the human is in the loop and the workflow doesn’t proceed without explicit confirmation. The difference is structural, not procedural — the system cannot commit without the human, by design.
The mitigation question: what is the deterministic check I’m replacing with an LLM, and why? If the answer is “we never had a check,” that’s a different problem (your existing process has a gap that the LLM is masking, not solving). If the answer is “we had a check, the LLM is replacing it,” push back hard. The check exists for a reason, and the LLM almost certainly doesn’t know the reason.
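To make the structural difference concrete, here is a minimal sketch in Python. The names (`PurchaseOrderDraft`, `submit_po`, the `erp_client` call) are illustrative, not a real API; the point is that the only commit path refuses drafts that lack a recorded human approval.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PurchaseOrderDraft:
    supplier_id: str
    lines: list                        # [(sku, qty), ...] as proposed by the LLM
    approved_by: Optional[str] = None  # set only from the review UI, never by the LLM

def approve(draft: PurchaseOrderDraft, reviewer: str) -> PurchaseOrderDraft:
    """Called from the review screen after a human has inspected the draft."""
    draft.approved_by = reviewer
    return draft

def submit_po(draft: PurchaseOrderDraft, erp_client) -> None:
    """The only path into the ERP. Unapproved drafts cannot pass, by construction."""
    if draft.approved_by is None:
        raise PermissionError("draft has no human approval; refusing to submit")
    erp_client.create_purchase_order(draft)  # illustrative ERP call
```

The Sole Authority version of this system is the same code with the `approved_by` check deleted; that one missing guard is the entire difference.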
The Optimizer
The anti-pattern: asking the LLM to solve a constraint-satisfaction or optimization problem with measurable optima.
Why it fails: LLMs are not solvers. They produce language about solutions, not solutions. They have no inherent ability to satisfy hard constraints, balance competing objectives, or guarantee feasibility. They can pattern-match toward answers that look like solutions, but those answers fail systematically on the parts of the problem that aren’t represented in their training data.
A model that can write a forecast cannot compute one. A model that can describe a replenishment strategy cannot execute one. A model that can talk fluently about scheduling cannot produce schedules that satisfy the actual constraints.
Examples of misuse:
- Asking an LLM to generate a week’s worth of replenishment POs across a large item set.
- Using an LLM to schedule production runs subject to capacity, sequencing, and changeover constraints.
- LLM-driven vehicle routing for delivery fleets.
- LLM-generated work-shift schedules with skill, availability, and labor-cost constraints.
- LLM-driven inventory rebalancing across distribution centers.
- LLM-generated annual budgets with categorical and inter-account constraints.
The good-fit relative: the Explainer. The constraint solver, MILP, or LP-based engine produces the actual schedule, plan, or PO; the LLM narrates why the solver chose what it chose. The output the user sees feels like the LLM did the work; the actual work was done by a deterministic engine designed for it.
The mitigation question: is there a known optimal answer, or a feasible / infeasible distinction, that I can verify? If yes, the LLM is the wrong layer for the work. Use a real solver — operations research has been solving these problems for sixty years, and the tools (CPLEX, Gurobi, OR-Tools, COIN-OR, OptaPlanner) are mature, fast, and correct. The LLM can sit around the solver — translating natural-language requests into solver inputs, or explaining solver outputs to humans — but not replace it.
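As a sketch of that division of labor, here is a toy rebalancing problem solved with OR-Tools (one of the solvers named above), with the LLM confined to narration. The LP itself is deliberately trivial, and `llm_client` is a hypothetical stand-in for whatever model API you use.

```python
from ortools.linear_solver import pywraplp

def plan_allocation(demand, cost_a, cost_b, cap_a, cap_b):
    """Deterministic engine: a tiny LP whose output actually satisfies the constraints."""
    solver = pywraplp.Solver.CreateSolver("GLOP")
    ship_a = solver.NumVar(0, cap_a, "ship_from_dc_a")
    ship_b = solver.NumVar(0, cap_b, "ship_from_dc_b")
    solver.Add(ship_a + ship_b >= demand)               # hard constraint: demand is met
    solver.Minimize(cost_a * ship_a + cost_b * ship_b)  # objective: minimum shipping cost
    assert solver.Solve() == pywraplp.Solver.OPTIMAL
    return {"dc_a": ship_a.solution_value(),
            "dc_b": ship_b.solution_value(),
            "total_cost": solver.Objective().Value()}

def explain_plan(llm_client, plan: dict) -> str:
    """LLM layer: narrates a decision it did not make (llm_client is hypothetical)."""
    return llm_client.complete(
        "Explain this allocation to a planner in two sentences, citing the costs "
        f"and capacities that drove it: {plan}"
    )
```

The user sees the explanation; the numbers they act on came from the solver.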
The Reconciler
The anti-pattern: using an LLM to balance ledgers, match transactions, or ensure exact totals.
Why it fails: reconciliation requires bit-exact arithmetic, complete enumeration of records, and certainty about which records were and weren’t matched. LLMs do not provide any of these. A reconciliation done by an LLM may look correct on the visible cases and miss edge cases entirely — and the edge cases are exactly where the value of reconciliation lives.
Examples of misuse:
- Matching invoices to POs to receipts (“three-way match”) via LLM.
- Reconciling a bank statement to a GL.
- Matching customer payments to open invoices when partial payments and credits are involved.
- Producing month-end roll-up totals (“does this AR aging tie out to GL?”).
- Inventory reconciliation between physical counts and book balances.
The good-fit relative: the Extractor. An LLM can extract line items from a vendor invoice into structured form; a deterministic system then matches the extracted line items to PO records and receipt records using exact rules. The LLM does the linguistic work, the deterministic system does the arithmetic.
The mitigation question: does the result need to balance, tie out, or sum exactly? If yes, the LLM is the wrong layer. SQL, RPG, COBOL, and Excel are all faster, more reliable, and auditable for this work. The LLM can help format the output, explain a discrepancy in human-readable form, or extract data from messy inputs into a structured form the deterministic system can reconcile. It cannot do the reconciliation itself.
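A sketch of that split, assuming the LLM has already extracted invoice lines into dictionaries with Decimal quantities and prices (the field names and tolerance below are illustrative):

```python
from decimal import Decimal

def three_way_match(invoice_lines, po_lines, receipt_lines, price_tol=Decimal("0.00")):
    """Deterministic match: exact keys, exact arithmetic, and explicit leftovers."""
    po_index = {(l["sku"], l["po_number"]): l for l in po_lines}
    received = {(l["sku"], l["po_number"]): l["qty"] for l in receipt_lines}
    matched, exceptions = [], []
    for line in invoice_lines:  # produced upstream by the LLM extractor
        key = (line["sku"], line["po_number"])
        po = po_index.get(key)
        if po is None:
            exceptions.append((line, "no matching PO line"))
        elif received.get(key, Decimal("0")) < line["qty"]:
            exceptions.append((line, "invoiced quantity exceeds received quantity"))
        elif abs(po["unit_price"] - line["unit_price"]) > price_tol:
            exceptions.append((line, "unit price differs from PO"))
        else:
            matched.append(line)
    return matched, exceptions  # every invoice line lands in exactly one bucket
```

The extraction can be fuzzy and reviewed; the matching is exact and exhaustive.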
The Long-Horizon Planner
The anti-pattern: asking the LLM to produce plans that span many state transitions, with combinatorial branching at each step.
Why it fails: the LLM has no way to evaluate the full state space, no memory of intermediate decisions across the plan beyond the context window, and no reliable way to backtrack when an early decision turns out to constrain a later one. The same compounding-error math that makes long agentic loops unreliable applies to long plans: every step that has to be right multiplies down the probability that the whole plan is right.
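The arithmetic is easy to make concrete; the 95% per-step figure below is illustrative, not a measured number:

```python
# Probability the whole plan is right if every step must be right independently.
p_step = 0.95
for n in (5, 10, 20, 40):
    print(n, round(p_step ** n, 2))  # -> 0.77, 0.6, 0.36, 0.13
```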
Examples of misuse:
- Generating a six-month production plan for a complex manufacturing line.
- Producing a multi-year capacity expansion plan for a distribution network.
- Writing a full week’s worth of supplier orders, accounting for lead times, MOQs, container packing, and supplier-specific pricing breaks.
- Producing a multi-step migration plan for a large database where each step’s correctness depends on intermediate state.
The good-fit relative: the Drafter for short-horizon steps with human review at each, or the Translator for converting a high-level intent into the input to a real planner.
The mitigation question: can I name and bound the state space? If the answer is “no, it’s combinatorial, but a human can scan a list and pick reasonable answers,” that’s a clue the answer isn’t literally combinatorial — it’s just unfamiliar to the human. A solver may be the right tool. If the answer is “yes, it’s tractable, the search space is manageable,” then a deterministic optimization tool will outperform an LLM by orders of magnitude in both quality and reproducibility.
The Reproducer
The anti-pattern: using an LLM in a context where bit-exact reproducibility is required.
Why it fails: LLM outputs vary. Even at temperature 0, the same prompt can produce slightly different outputs across runs (different tokenization, different floating-point summations, different model versions over time). Audit, regulatory, and compliance contexts often require that the same input always produces the same output — and that the basis for any output be inspectable years later. LLMs cannot provide either property.
Examples of misuse:
- An LLM as the calculation engine for a regulatory filing where the same inputs must produce the same numbers across years.
- LLM-generated tax computations.
- LLM-driven loan approval decisions where regulators require explainability of the causal factors.
- LLM-generated medical-billing codes where the same chart must always produce the same coding.
The good-fit relative: the Explainer (when the deterministic decision is the source of truth) or the Drafter (when a human reviews and accepts before the output becomes binding).
The mitigation question: if a regulator asks “show me that the same input produces the same output,” can I demonstrate it? If no, the LLM is the wrong layer. If a regulator asks “show me why this output was produced,” and the answer requires a verbal explanation rather than a chain of structured rules, the LLM is the wrong layer. (LLM-generated explanations of deterministic decisions are fine — that’s the Explainer pattern. LLM-generated explanations of LLM-generated decisions are post-hoc rationalizations, not causes. See Supervision and explainability.)
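The first of those regulator questions is mechanically checkable. A minimal sketch, where `call_model` is a hypothetical wrapper around whatever engine you are evaluating:

```python
import hashlib

def is_reproducible(call_model, payload: str, runs: int = 5) -> bool:
    """True only if every run of the same input hashes to the same output."""
    digests = {
        hashlib.sha256(call_model(payload).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests) == 1

# A deterministic calculation engine passes this trivially; an LLM-backed one
# generally does not, even at temperature 0.
```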
The Compliance Decider
The anti-pattern: the LLM makes a final compliance, legal, or regulatory determination.
Why it fails: compliance decisions usually require knowledge of specific rules, specific facts, and specific applicability. The LLM can summarize the rules, draft a legal-style explanation, and even propose a tentative determination — but the binding decision needs to rest with a human or a deterministic rule engine, because the cost of a wrong determination is borne by the company, not by the model.
This isn’t about whether the LLM is “smart enough.” Even a perfect LLM with full knowledge of the law would be the wrong choice for binding determinations, because no LLM is accountable. When a determination turns out wrong, the company can’t sue the model. The accountability has to live with humans or with deterministic systems whose logic can be examined and corrected.
Examples of misuse:
- KYC determination of whether a customer’s onboarding documents are sufficient.
- Sanctions screening with the LLM as final authority.
- SOX-relevant determinations of whether a control is operating effectively.
- HIPAA determinations of whether a disclosure is permissible.
- Whether a transaction is reportable to regulators.
The good-fit relative: the Drafter or the Search-and-Summarize. The LLM helps the compliance officer by drafting findings, retrieving relevant precedent, or summarizing applicable rules. The compliance officer makes the binding call. Where decisions can be made deterministically (by clearly coded rules), use a rule engine, not an LLM.
The mitigation question: who is accountable when this determination is wrong? If the answer is “no human, the system decided,” and the determination has compliance consequences, that’s the architecture flag. The accountability has to attach to a person or a documented deterministic process.
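Where the rule genuinely is deterministic, it belongs in code that a named person wrote and can defend. A sketch of what that looks like; the threshold and field names are illustrative, not real regulatory values:

```python
from decimal import Decimal

REPORTING_THRESHOLD = Decimal("10000")  # illustrative figure, not legal guidance

def is_reportable(amount: Decimal, country: str, sanctioned_countries: set[str]) -> tuple[bool, str]:
    """A coded rule: the decision and its reason are both inspectable and testable."""
    if country in sanctioned_countries:
        return True, f"counterparty country {country} is on the sanctions list"
    if amount >= REPORTING_THRESHOLD:
        return True, f"amount {amount} meets the {REPORTING_THRESHOLD} reporting threshold"
    return False, "no reporting rule triggered"
```

The LLM can draft the memo that accompanies the determination; the determination itself comes from the rule and from the person who owns it.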
The Source of Truth
The anti-pattern: treating the LLM’s output as authoritative without checkable grounding.
Why it fails: LLM training data is finite, dated, and not curated for your specific business reality. An LLM doesn’t know your inventory levels, your customer terms, your supplier agreements, your current pricing, or last week’s policy changes. When asked questions about any of those, it will produce an answer that sounds plausible — drawn from its training data, recent context, or pattern completion — and the answer will frequently be wrong in ways that are invisible to users.
Examples of misuse:
- “What’s our current inventory of SKU X?” — answered from the LLM’s general knowledge rather than a real-time database query.
- “What does our return policy say?” — answered from the LLM’s training data rather than the actual current policy.
- “What’s the price of part Y?” — answered conversationally rather than by querying the catalog.
- “Has customer Z had any recent issues?” — answered by hallucinating plausible history rather than querying CRM.
The good-fit relative: the Search-and-Summarize. Same conversational interface, but the LLM is required to retrieve from your actual system before answering, and to cite the retrieval source. The interface looks the same to users; the architecture is fundamentally different.
The mitigation question: does this answer come from a system of record, or from the LLM’s training data? If you can’t tell — if the same prompt could produce either kind of answer depending on whether retrieval succeeds — your architecture is letting the LLM be the source of truth. Tighten the prompt to refuse rather than confabulate; gate every authoritative answer behind a successful retrieval.
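A sketch of that gate, with `retrieve` standing in for a query against the system of record and `llm_client` for the model API (both hypothetical, as is the `source_id` field):

```python
def answer_from_record(question: str, retrieve, llm_client) -> dict:
    """Search-and-Summarize: no successful retrieval, no answer."""
    hits = retrieve(question)  # query the system of record, not the model's memory
    if not hits:
        return {"answer": None,
                "message": "No matching records found; refusing to answer from model memory."}
    summary = llm_client.complete(
        "Answer the question using ONLY these records. If they do not contain "
        f"the answer, say so.\nRecords: {hits}\nQuestion: {question}"
    )
    return {"answer": summary, "sources": [h["source_id"] for h in hits]}
```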
What these anti-patterns have in common
The properties that recur across all seven failures:
- The cost of a wrong output is high and invisible. The output goes somewhere consequential, and nobody catches the mistake until the consequences land — sometimes much later.
- The “right” answer is well-defined. There’s an exact correct answer for arithmetic, optimization, reconciliation, and regulatory compliance. An LLM produces approximations of well-defined answers, and the approximations are operationally unacceptable.
- Verification is expensive or delayed. You can’t easily check whether last month’s reconciliation was right, last quarter’s compliance determinations were correct, or last week’s autonomous PO was a good idea — at least not without doing the work the LLM was supposed to be replacing.
- The output is operationally committed before review. The system acts on the LLM’s output (sending it, posting it, executing it) before any human can intervene. This is the structural difference between Drafter and Sole Authority.
A useful diagnostic: when someone proposes an LLM use case, ask “what does it cost when this is wrong?” If the cost is bounded (a human reads the draft and rewrites it; a misclassified ticket gets re-tagged; a wrong extraction gets caught by a structural check), the use case is probably a good fit. If the cost is unbounded — financial, regulatory, or trust-shaped — then either redesign the supervision so the cost becomes bounded, or use a different tool.
Where next
The catalog of good and bad fits isn’t the whole story. Most real systems sit on a spectrum, not at the poles. The next chapter, The deterministic ↔ probabilistic spectrum, develops the more nuanced view that hybrid architectures are built on.