Beyond the Hype: Making AI Work in Industrial Automation

When GPT-5 scored 73 out of 80 on the logical reasoning test I’ve used to evaluate thousands of programmers over 30 years, a score only an elite few candidates have ever matched, it marked a watershed moment. But the story isn’t the impressive score. It’s what the seven wrong answers reveal about AI’s fundamental limitations in industrial automation.

The Pattern Recognition Paradox

For three decades, first at InduSoft (now part of AVEVA) and currently at Tatsoft, I’ve used an 80-question assessment covering pattern recognition, logical reasoning, and problem-solving to identify programmers capable of designing complex systems. Human candidates get 45 minutes; GPT responds instantly. The progression has been remarkable: GPT-3.5 scored 47, GPT-4 reached 57, and GPT-5 hit 73 (see reference article, Where GPT-5 Shines — and Where It Still Fails).

Yet when I examined GPT-5’s errors, particularly one involving the letter sequence ABBCBDEFBGHI, the failure mode proved illuminating. Despite 98 seconds of “thinking,” GPT-5 caught one pattern but missed the second rule governing the sequence. Add some formatting and both rules emerge: a (B) separator between blocks of consecutive letters of increasing length, A (B) BC (B) DEF (B) GHI, pointing to J as the next letter (a short sketch after the list below reconstructs both rules). More troubling than the wrong answer itself is the lack of consistency: on each retry, GPT-5 produced a different piece of flawed logic. The basic reasoning flaws I detected include:

  • Inventing rules that ignore large portions of the data
  • Inventing rules that don’t match the supplied data
  • Settling for rules that fit only a subset of the data when rules covering the entire dataset exist
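
To make the two-rule structure concrete, here is a minimal Python sketch (my own illustration, not part of the assessment itself) that rebuilds the sequence from both rules, consecutive-letter blocks of increasing length joined by a (B) separator, and reads off the missing letter:

    import string

    def build_sequence(num_blocks):
        """Blocks of consecutive letters with lengths 1, 2, 3, ... joined by a 'B' separator."""
        letters = string.ascii_uppercase
        blocks, pos = [], 0
        for size in range(1, num_blocks + 1):
            blocks.append(letters[pos:pos + size])  # A, BC, DEF, GHIJ, ...
            pos += size
        return "B".join(blocks)

    puzzle = "ABBCBDEFBGHI"
    full = build_sequence(4)        # 'ABBCBDEFBGHIJ'
    assert full.startswith(puzzle)  # both rules reproduce the sequence as given
    print(full[len(puzzle)])        # 'J' -- the answer GPT-5 could not find consistently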

The lack of consistency is not specific to GPT. Every time I ask Google Gemini to find my own article on reasoning evaluations, for instance, it locates the article’s summary but credits it to a fictitious person who shares my family name (see reference article, Beyond Prompt Engineering: The Entity Engineering).

To the general public, these examples may read as curiosities, or even as humor in the case of the nonexistent relative. But in the industrial environment, dealing with real physical assets, potentially critical infrastructure, they represent fundamental risks.

The demonstrated lack of consistency, the lack of repeatability even with identical inputs, and the flaws in basic reasoning that appear in some percentage of tests draw a clear line around where this type of technology can be applied in automation, now and in the future.

Looking to the future, it’s important to clarify that these flaws cannot be fixed with more training data or more computing resources. They stem from the statistical models at the core of LLM technology. Scaling data centers, training sets, and alignment validation can narrow the error margin, but it cannot cure the structural flaws that make these models unsuitable for controlling critical assets.

A Critical Industrial Scenario

Consider a distillation column where temperature, pressure, and reflux ratio interact through complex, time-dependent relationships. An operator knows that when bottom temperature rises above 180°C while reflux ratio drops below 1.2, intervention is needed—but the specific action depends on feed composition, ambient conditions, and downstream demand. These parameters are well-documented in control engineering literature, with reflux ratios of 1.1-1.5 considered optimal and temperatures of 30-400°C typical in atmospheric distillation.

GPT-5 might correctly identify the abnormal condition 9 times out of 10, but that tenth time, it could ignore the reflux ratio entirely, focusing only on temperature because that pattern is statistically stronger in its training. In a $10 million/day operation, “mostly right” becomes catastrophically wrong.
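
For contrast, the deterministic check a control system relies on is trivial to state and never skips a condition. A minimal Python sketch, using the thresholds quoted above and illustrative names of my own, shows that both rules are evaluated on every single call:

    def needs_intervention(bottom_temp_c, reflux_ratio):
        """Abnormal-condition check for the distillation scenario above.

        Deterministic: both conditions are evaluated every time, and the same
        inputs always produce the same answer.
        """
        return bottom_temp_c > 180.0 and reflux_ratio < 1.2

    assert needs_intervention(185.0, 1.1) is True
    assert needs_intervention(185.0, 1.4) is False  # the reflux ratio is never ignored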

Why This Matters for Industrial Systems

In industrial automation, we don’t just need correct answers most of the time—we need consistent, deterministic logic every time. Consider these realities:

Pattern Detection vs. Understanding: GPT-5 excels at recognizing patterns it has seen before but struggles with multi-rule scenarios common in industrial processes. A compressor surge involves pressure differentials AND temperature gradients AND flow reversals; can we trust a system that might ignore one of those rules because another fits its pattern better?

Error Recovery: Elite human programmers detect and adapt when they make mistakes. They learn on the spot. GPT-5, even in “thinking” mode, often doubles down on flawed logic. In a control room managing critical infrastructure, this difference isn’t academic—it’s existential.

Statistical vs. Deterministic: Large language models are sophisticated statistical engines predicting the next most likely token. Industrial control systems require deterministic responses where the same input always produces the same output. These are fundamentally different paradigms.
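
A toy illustration of that gap, using nothing but the Python standard library (no vendor API implied): the control calculation is a pure function, so identical inputs always yield identical outputs, while a token sampler, the core operation of an LLM, draws from a probability distribution and may legitimately return different outputs for the same prompt:

    import random

    def proportional_output(setpoint, measurement, kp=2.0):
        """Deterministic: a pure function of its inputs (proportional term only, for brevity)."""
        return kp * (setpoint - measurement)

    def next_token(distribution):
        """Statistical: samples from a probability distribution, as an LLM does for each token."""
        tokens, weights = zip(*distribution.items())
        return random.choices(tokens, weights=weights)[0]

    assert proportional_output(180.0, 175.0) == proportional_output(180.0, 175.0)  # always identical
    print(next_token({"open_valve": 0.6, "close_valve": 0.4}))  # may change from run to run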

The Machine Learning Success Story We’re Not Discussing

While everyone focuses on LLMs, machine learning (ML) has quietly revolutionized industrial automation for years with documented, measurable results:

UPS ORION: The most comprehensively documented success, saving $300-400 million annually by eliminating 100+ million miles of driving and 10+ million gallons of fuel. The system processes 300+ million data points daily, applying predictive analytics before delivery manifests are complete, through historical pattern analysis and dynamic route adaptation.

Shell’s Scale Achievement: While specific cost reduction percentages vary by source, Shell has verifiably deployed predictive maintenance across 10,000+ pieces of equipment, processing 20 billion rows of data weekly and running 11,000+ ML models that generate 15+ million predictions daily.

Industry-Wide Impact: McKinsey documents 18-25% maintenance cost reduction and up to 50% reduction in unplanned downtime. Deloitte reports 40% maintenance cost reduction with 70% downtime decrease. The U.S. Department of Energy confirms 8-12% savings versus preventive maintenance approaches.

The key difference? These ML applications work within bounded, well-defined problem spaces with clear success metrics. They augment human expertise rather than attempting to replace it.

Building AI-Ready Architecture: Lessons from the Field

After three years of developing our next platform with AI as a core design consideration, I’ve seen several principles emerge:

Human-in-the-Loop is Non-Negotiable: Every AI decision affecting physical processes must have audit trails and confirmation gates. This isn’t about mistrust—it’s about maintaining accountability in systems where failure has real consequences.
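
One possible shape for such a gate, sketched with hypothetical names rather than our platform’s actual API: every AI recommendation is recorded in an audit trail, and nothing reaches the process until an identified operator approves it.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class Recommendation:
        source_model: str
        action: str            # e.g. "raise reflux ratio setpoint to 1.3"
        rationale: str
        created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
        approved_by: str = ""  # stays empty until a human signs off

    class ConfirmationGate:
        """Audit trail plus confirmation gate: log everything, apply nothing automatically."""

        def __init__(self):
            self.audit_log = []

        def submit(self, rec):
            self.audit_log.append(rec)     # every AI decision is recorded, approved or not

        def approve(self, rec, operator_id):
            rec.approved_by = operator_id  # accountability: a named person owns the action
            return rec                     # only now may the action be forwarded to the process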

Separate Deterministic from Probabilistic: Traditional ladder logic and PID loops handle critical control functions. AI layers provide optimization suggestions, anomaly alerts, and pattern insights. Never let statistical models directly control safety-critical operations.
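
One way to enforce that separation, again as a sketch with hypothetical limits: the AI layer may only propose a setpoint, a deterministic guard clamps and rate-limits the proposal, and the existing PID loop remains the only thing that writes to the actuator.

    REFLUX_RATIO_MIN, REFLUX_RATIO_MAX = 1.1, 1.5  # engineered bounds, not learned ones

    def accept_suggested_setpoint(suggested, current, max_step=0.05):
        """Deterministic guard between the AI layer and the control layer.

        The suggestion is clamped to safe bounds and rate-limited; the PID loop
        that actually drives the valve never takes input from the model directly.
        """
        bounded = min(max(suggested, REFLUX_RATIO_MIN), REFLUX_RATIO_MAX)
        step = max(-max_step, min(max_step, bounded - current))
        return current + step

    # An out-of-range or erratic suggestion is reduced to a small, bounded move.
    new_setpoint = accept_suggested_setpoint(suggested=9.0, current=1.25)
    assert abs(new_setpoint - 1.30) < 1e-9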

Design for Explainability: Black-box AI has no place in industrial settings. Operators need to understand not just what the AI recommends, but why. This means choosing architectures that support inspection and reasoning traces.
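
A concrete way to keep the “why” attached to the “what,” with illustrative field names only: the alert object carries the signals it used and the thresholds it crossed, so an operator or an auditor can inspect the reasoning later.

    from dataclasses import dataclass

    @dataclass
    class AnomalyAlert:
        """An alert that explains itself: the evidence travels with the recommendation."""
        recommendation: str         # what the AI suggests
        triggering_signals: dict    # the measurements it actually used
        thresholds_crossed: dict    # which limits were violated, and how
        model_version: str          # which model produced it, for traceability

    alert = AnomalyAlert(
        recommendation="Inspect reboiler; reduce feed rate pending operator confirmation",
        triggering_signals={"bottom_temp_c": 186.4, "reflux_ratio": 1.08},
        thresholds_crossed={"bottom_temp_c": "> 180", "reflux_ratio": "< 1.2"},
        model_version="anomaly-detector-2024.09",
    )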

Embrace Hybrid Architectures: The future isn’t AI replacing SCADA/HMI systems—it’s AI-enhanced platforms where machine learning handles pattern recognition, LLMs assist with documentation and troubleshooting, and deterministic logic maintains control.

When to Deploy AI: A Practical Framework for LLM-Based Systems

Note: This framework applies specifically to systems built on Large Language Models (LLMs), not to AI algorithms or machine learning applications in general. A minimal code sketch of the gating logic follows the three lists below.

Green Light (Deploy with confidence):

  • Historical data analysis and trending
  • Predictive maintenance scheduling
  • Anomaly detection with human review
  • Documentation assistance and knowledge retrieval

Yellow Light (Proceed with human oversight):

  • Real-time optimization suggestions requiring operator approval
  • Quality prediction with verification protocols
  • Energy optimization with constraint checking
  • Automated reporting with human validation

Red Light (Avoid autonomous deployment):

  • Direct control of safety-critical systems
  • Autonomous process changes affecting product quality
  • Emergency response decisions
  • Any scenario lacking immediate human intervention capability
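
Here is the minimal sketch promised above: my own encoding of the three lists as an explicit gating policy, with hypothetical attribute names, so the classification can be reviewed and audited rather than left implicit.

    from enum import Enum

    class DeploymentTier(Enum):
        GREEN = "deploy with confidence"
        YELLOW = "proceed with human oversight"
        RED = "avoid autonomous deployment"

    def classify_llm_use_case(safety_critical, acts_autonomously, human_review_in_loop):
        """Encode the green/yellow/red framework as explicit, inspectable rules."""
        if safety_critical or (acts_autonomously and not human_review_in_loop):
            return DeploymentTier.RED
        if acts_autonomously or not human_review_in_loop:
            return DeploymentTier.YELLOW
        return DeploymentTier.GREEN

    # Examples drawn from the lists above:
    assert classify_llm_use_case(False, False, True) is DeploymentTier.GREEN   # documentation assistance
    assert classify_llm_use_case(False, True, True) is DeploymentTier.YELLOW   # optimization needing approval
    assert classify_llm_use_case(True, True, False) is DeploymentTier.RED      # safety-critical control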

Platform Architecture Matters

The choice of underlying technology significantly impacts AI integration success. Platforms built on modern .NET provide native advantages:

  • Real compilation for deterministic execution
  • Managed memory preventing AI-induced crashes
  • True multithreading for parallel model execution
  • Native interop with Python ML libraries

Contrast this with interpreted scripting tools and web-first architectures that struggle with real-time requirements. When milliseconds matter, the foundation matters more.

Looking Forward

GPT-5’s capabilities are genuinely impressive, but those seven wrong answers remind us that “mostly right” isn’t enough for industrial automation. The path forward isn’t replacing human expertise but augmenting it systematically.

Success comes from understanding what AI does well (pattern recognition, data processing, anomaly detection) and what it cannot do (guarantee correctness, explain failures, take responsibility). Build architectures that leverage strengths while protecting against weaknesses.

The future isn’t artificial intelligence replacing control systems. It’s augmented operations—human expertise enhanced by AI capabilities, creating systems neither could achieve alone.


Author’s Related Articles:

  • Where GPT-5 Shines — and Where It Still Fails
  • Beyond Prompt Engineering: The Entity Engineering

Author Bio: Marc Taccolini brings 30+ years of industrial software expertise as founder of Tatsoft and previously InduSoft (acquired by AVEVA). He focuses on bridging operational technology with modern architectures while maintaining industrial-grade reliability.