COSMICOUS

AGI Mirage: The Great Reasoning Illusion

January 29th, 2026
Mo Gawdat, the former Chief Business Officer of Google, has highlighted a specific mathematical breakthrough as a defining moment in the transition from “obedient” computers to creative, self-improving intelligence. In his keynote at the 2025 Asia Pacific Cities Summit (APCS) in Dubai, he discussed how AI has begun to solve foundational mathematical problems that have stumped humans for over half a century.

The Breakthrough: Matrix Multiplication

The core of Gawdat’s example involves matrix multiplication, the fundamental operation behind almost all modern computing, from 3D graphics to the training of neural networks themselves.
- The Problem: For 56 years, the gold standard for multiplying 4×4 matrices was Strassen’s algorithm, which reduced the number of required scalar multiplications from 64 to 49. Despite decades of effort, no human mathematician could find a way to do it in fewer steps.
- The AI Solution: Using a reinforcement learning system (originally AlphaTensor by DeepMind, later evolved into systems like AlphaEvolve), the AI treated the mathematical problem as a single-player game. It discovered a novel, counterintuitive method using complex numbers to reach the solution in just 48 multiplications.
- The “25%” Efficiency Gain: While the reduction from 49 to 48 operations seems small, the efficiency gain is exponential when scaled. Gawdat noted that when this AI-discovered logic was applied to optimize specific kernels for Google’s Gemini models, it achieved a 23% speedup for those specific operations, significantly cutting energy and cost overheads.
Why This Feels Like AGI

Gawdat argues that this is not just “narrow AI” because it demonstrates recursive self-improvement. The AI was tasked with making itself more efficient and discovered a mathematical shortcut that humans had missed for generations. This ability to innovate beyond human design by finding a solution that wasn’t in its training data. This is what Gawdat describes as the “intelligence explosion” where AI moves from a tool to an autonomous creator.

The “15-Minute” Miracle

Bartosz Naskręcki, a mathematician from Adam Mickiewicz University reported in August 2025 that GPT-5 Pro solved a specific, notoriously difficult mathematical challenge known as Yu Tsumura’s 554th Problem in just 15 minutes.
- No Internet Search: The AI achieved this solution without accessing any external internet resources.
- The Timeline: The problem was originally published on August 5, 2025. GPT-5 Pro solved it only two days later, on August 7, 2025, which researchers noted was a timeframe too short for the problem to have been included in the model’s training data.
What This Reveals About AI Power
- Strategic Reasoning: Unlike older models that struggled with multi-step logic, GPT-5 Pro used an “Extended Thinking” mode to explore and verify logical paths, similar to how a human researcher would brainstorm and then rigorously check their work.
- Deep Reasoning vs. Memorization: Because the problem was so new, the AI’s success suggested it wasn’t merely “parroting” a solution from its database but was instead applying fundamental mathematical principles to a novel situation.
- The Result: While many other advanced models failed to solve this particular problem at the time, the “Pro” version’s ability to navigate the complexity in 15 minutes was hailed as a “force multiplier” for high-level research.
The Experiment: Cracking the Erdős “Long Tail”

Neel Somani, former Citadel Quant and Airbnb engineer decided to test the frontiers of GPT 5.2 (Pro) by targeting the legendary Erdős Problems. These are a collection of over 1,000 unsolved conjectures left behind by the prolific Hungarian mathematician Paul Erdős.
- The Workflow: Somani pasted a complex, unsolved problem regarding number theory into the model. Instead of an instant answer, he let the model engage in “Extended Thinking” for about 15 minutes.
- The Result: When he returned, the AI had produced a full, multi-step proof.
- The Verification: To ensure it wasn’t a “hallucination,” Somani used a tool called Harmonic (a formal verification platform for math). The proof was verified as flawless.
Why This is a Big Deal

This wasn’t just a “search and rescue” mission for an answer on the web. The discussion surrounding this event revealed several key “powers” of the latest AI:
1. Autonomous Insight: The AI utilized advanced concepts like Legendre’s formula and Bertrand’s postulate. More impressively, it found a related 2013 post by Harvard mathematician Noam Elkies but didn’t just copy it—it produced a different and more complete proof that accounted for edge cases Elkies hadn’t addressed.
2. The “Chain of Reasoning” Advantage: The 15-minute wait time represented the AI exploring thousands of “logical branches,” essentially doing the cognitive heavy lifting that would take a human researcher weeks of trial and error.
3. The “Long Tail” Victory: Famous mathematician Terence Tao noted that since Christmas 2025, 15 Erdős problems have moved from “open” to “solved,” with 11 of those solutions crediting AI. Tao suggests that while AGI isn’t here yet, AI is perfectly suited for the “long tail” of math—problems that aren’t necessarily the hardest in history but require a level of systematic search and verification that humans find tedious.
What does this reveal?

The intelligence on display is real—but it’s narrow. These systems excel in structured environments with clear rules and goals. They don’t yet possess the intuition, abstraction, or creative leaps that define human mathematical insight. As Polish mathematician Naskręcki recently mentioned, even the best AI models failed spectacularly on a custom-designed test of 350 unsolved problems in number theory and algebraic geometry. OpenAI’s top model solved just 6.3%—a performance likened to a student armed only with multiplication tables attempting a graduate exam.

AI is getting better at reasoning in chains, not just pattern matching. It can now simulate aspects of human deduction, especially when guided by formal structure. But it still lacks the meta-reasoning—the ability to choose which path to explore, when to abandon a dead end, or how to invent a new concept.

The AGI Mirage is seductive: we see brilliance in narrow domains and imagine a mind behind it. But for now, these systems are powerful tools—not thinkers. They illuminate the frontier, but they haven’t crossed it.
Navigating the AI Tools Landscape: A Map for Successful Delivery

January 29th, 2026
Welcome to the first installment in our series dedicated to the tools that make-or-break AI projects. If you’re looking for advice on Gantt charts, sprint planning, or resource allocation software, you’re in the wrong place. This series isn’t about traditional project management, it’s about the specialized, often overlooked enablers of AI project delivery.

Why automated Life Cycle Tools?
- Inherent Differences from Traditional Systems: Traditional systems are often static once deployed, whereas AI systems—including machine learning models, large language models (LLMs), and AI agents—are dynamic. These systems require continuous oversight to track performance and respond to emergent issues such as model drift and unforeseen biases. Automated tools enable sustained monitoring and iterative improvement.
- Bias Detection, Explainability, and Reliability: Detecting bias, ensuring explainability, and maintaining reliability demand processing vast amounts of data with significant computing resources. Automated tools generate meaningful metrics that objectively measure fairness and system integrity.
- Dynamic Nature: Unlike traditional systems, AI-based systems continue to learn and adapt even after deployment. As data, environmental conditions, and regulatory requirements evolve, continuous monitoring via automated tools becomes indispensable to keep the system aligned with current norms.
- Scale Challenges: With a single LLM processing millions of prompts daily, manual audit methods are impractical. Automated tools provide the precision and speed required to ensure every decision is traceable and every metric accurately recorded.
- Regulatory Traceability: Detailed audit trails are a regulatory necessity. Automation guarantees that every aspect of an AI system—from data ingestion to model predictions—is fully documented and traceable for audits.
AI Project Delivery Tool Categories

1. AI Governance, Risk & Compliance (GRC) Platforms
- Core Purpose: To centrally define, enforce, audit, and demonstrate adherence to policies for ethics, fairness, security, privacy, and regulatory standards.
- What they manage: Policy libraries, risk registers, compliance dashboards, audit trails, legal documentation.
- Key Question Answered: “Can we prove this project is responsible, compliant, and within our risk appetite?”
- Example Tools: Credo AI, IBM Watsonx.governance, Trustwise, Monitaur.
2. AI Observability & Monitoring Platforms
- Core Purpose: To provide continuous, holistic visibility into the health, performance, and behavior of models and data in production.
- What they monitor: Model performance (accuracy, drift), data quality and integrity, system metrics, prediction explanations, and business KPIs.
- Key Question Answered: “Is our deployed system behaving as expected, and if not, why?”
- Example Tools: Fiddler, Arize AI, WhyLabs, Arthur AI, Evidently.
3. Model & LLM Evaluation & Validation Suites
- Core Purpose: To rigorously test and quantify model characteristics before and during deployment, with a focus on non-functional requirements.
- What they assess: Fairness/bias metrics, robustness, explainability/interpretability, security vulnerabilities (e.g., adversarial attacks), and specific LLM performance (hallucination, toxicity, RAG accuracy).
- Key Question Answered: “Does this model meet our technical and ethical quality thresholds for release?”
- Example Tools: Microsoft Fairlearn, IBM AIF360, Weights & Biases (eval features), TruEra, TruLens, RAGAS.
4. Model Lifecycle & Operations (ModelOps) Orchestration
- Core Purpose: To automate, manage, and govern the operational pipeline from experimentation to deployment, scaling, and retirement.
- What they orchestrate: Model registry, versioning, staged deployments (canary, blue-green), CI/CD pipelines, dependency management, and resource scaling.
- Key Question Answered: “Can we reliably, efficiently, and consistently move models from development to production and manage them at scale?”
- Example Tools: MLflow, Domino Data Lab, Amazon SageMaker MLOps, Azure Machine Learning, Kubeflow.
5. AI Incident & Risk Operational Management
- Core Purpose: To facilitate the rapid detection, response, remediation, and learning from operational failures or breaches in AI systems.
- What they manage: Alerting, incident ticketing, war rooms, root cause analysis (often linking to Observability data), and post-mortem knowledge bases.
- Key Question Answered: “How do we quickly respond to and learn from a model failure or security incident?”
- Example Tools: JIRA Service Management, Splunk (with ITSI), PagerDuty (integrated with observability), custom workflows on general ticketing systems.
Our goal is to provide you with a clear, actionable map of this ecosystem. We will examine what each category aims to solve, highlight notable tools, and discuss how they integrate into a coherent delivery process.

This series is derived from deeper frameworks discussed in my book, “Managing Innovative AI Projects,” and will set the stage for upcoming discussions on selecting, tailoring, and implementing these tools within your unique lifecycle. Let’s together build toolkit for AI delivery specialists.
About AI Project Pulse

January 29th, 2026
Welcome to AI Project Pulse, the essential briefing for leaders who move beyond generic project management to master the unique challenge of delivering AI value.

If you have ever felt that traditional project management methods fall short when facing ambiguous data, evolving algorithms, ethical quandaries, and the unpredictable terrain of AI, then you are in the right place. This newsletter is not about Gantt charts and team stand-ups. It is about the specialized process, judgment, and frameworks required to steer AI initiatives from concept to impact.

We focus on the how—the actionable discipline of selecting the right approach, tailoring the lifecycle, mitigating unique risks, and choosing tools that actually work for AI.

What You will Gain:
- Condensed Insights:
  Sharpen your strategy with penetrating analyses of emerging tools, ethical dilemmas, and methodological shifts. We cut through the hype to examine what works, what fails, and helping you avoid costly pitfalls and seize genuine innovation.
- Actionable Frameworks:
  You will receive practical templates, checklists, and mental models designed specifically for AI initiatives. Apply them immediately to improve project definition, risk assessment, performance measurement, and lifecycle tailoring.
- Community Pulse:
  Connect with a curated network of peers. Each issue integrates direct insights, challenges, and solutions from practitioners who, like you, are on the front lines of turning AI potential into delivered reality.
Drawing directly from the principles in my book co-authored with Prof. Alain Abran, “Managing Innovative AI Projects,” each edition will provide the clarity and tools you need to navigate what makes AI projects fundamentally different.

Who Should Subscribe?

This newsletter is crafted for the entire ecosystem responsible for turning AI potential into reliable, delivered solutions.

Specifically, this includes:

Technical Practitioners: ML Engineers, Data Scientists, AI/ML Architects, and Data Engineers who build models and pipelines, and who need to understand project lifecycles beyond the notebook.

Project & Product Leadership: AI Product Managers and Technical Program Managers who orchestrate delivery and must master the processes, risks, and metrics unique to AI projects.

Governance & Risk Professionals: AI Ethics Officers, Risk Managers, Compliance Specialists, and Governance Leads who ensure projects are responsible, auditable, and aligned with regulatory and ethical standards.

Business & Strategy Roles: Innovation Heads, Business Analysts, and AI Strategy Consultants who identify opportunities, define value propositions, and champion AI adoption within the business.

Executive & Operational Oversight: Technology Executives (CTOs, CDOs, VPs of AI), IT Directors, and Operations Leads who oversee portfolios, manage budgets, and are ultimately responsible for ROI and operational integration.

Whether you are hands-on, overseeing, or enabling AI project delivery, AI Project Pulse provides the frameworks and insights to navigate complexity, mitigate unique risks, and drive successful outcomes.
Beyond the Hype: Classifying Your AI Project Type and Why It’s the First Step to Success

January 26th, 2026
Jayakumar K R

Welcome to the first instalment of AI Project Pulse’s core series, Managing Innovative AI Projects. Before you draft a single requirement or allocate a single resource, there is one fundamental question your team must answer: What kind of AI project are we actually doing?

The most common cause of AI project failure isn’t a lack of talent or technology; it’s a mismatch between the project’s inherent nature and the management approach applied to it. Using the rigorous, risk-averse process of a pharmaceutical rollout to manage a rapid prototype is a recipe for stagnation. Applying a lightweight agile sprint to a project with profound ethical and legal implications is a blueprint for disaster.

The first discipline of successful AI delivery is to know your starting point. To simplify this, we can map the horizon of AI initiatives into a clear Project Typology. This classification, based on what you intend to build, its output, and its primary user, is your indispensable compass. It provides the foundational logic for every decision that follows how rigorously you govern the lifecycle, where you focus risk mitigation, and what performance metrics truly matter.

Here are the five fundamental types of AI projects as we have defined in my book “Managing Innovative AI Projects co-authored with Prof. Alain Abran.

1. Incremental Innovation: The Optimizer
- Core Aim: Enhance existing AI-powered applications through tuning, optimization, or feature expansion.
- Primary User: Business or customer end-users.
- Key Output: An upgraded, more effective version of a current system.
- Example: Improving a recommendation engine’s accuracy by adding real-time behavioral context; releasing a faster, more precise fraud detection model in your SaaS platform.
- Your Management Mantra: “Efficiency and Reliability.” The lifecycle is well-defined, risks are primarily technical (performance regression, data drift), and success is measured by clear KPIs against a known baseline.
2. Disruptive Innovation: The Game-Changer
- Core Aim: Introduce a novel AI application that creates new markets or fundamentally redefines existing ones.
- Primary User: External customers or entire industries.
- Key Output: A transformative new product or service.
- Example: Deploying autonomous delivery vehicles; launching an AI-powered diagnostic tool that outperforms traditional methods.
- Your Management Mantra: “Vision and Adoption.” The lifecycle is highly adaptive, risks are market-facing (user acceptance, regulatory response, scalability), and success metrics must balance technical viability with ecosystem adoption and business model validation.
3. Applied Research: The Pioneer
- Core Aim: Explore novel algorithms, architectures, or capabilities where the path to a working solution is unknown.
- Primary User: Internal research and development teams; outputs later feed product teams.
- Key Output: A research prototype, paper, or proof-of-concept.
- Example: Developing a new, more efficient transformer architecture for edge computing; creating a novel method for multi-modal reasoning.
- Your Management Mantra: “Discovery and Learning.” The lifecycle is iterative and experimental, risks center on technical feasibility and dead ends, and success is measured by knowledge gained, patents filed, or the viability of the prototype for the next stage.
4. AI Enabler: The Force Multiplier
- Core Aim: Build the tools, platforms, and frameworks that empower other AI projects.
- Primary User: AI engineers, data scientists, MLOps teams, and governance professionals.
- Key Output: SDKs, APIs, platforms (e.g., MLOps pipelines, bias detection suites), and agentic frameworks.
- Example: Developing ethical compliance tools; building a low-code platform for agent orchestration.
- Your Management Mantra: “Platform and Scalability.” The lifecycle must balance internal user needs with robust engineering. Risks include adoption by internal developers and architectural rigidity. Success is measured by developer productivity, system reliability, and the performance of the projects that use your tools.
5. Citizen-Led Innovation: The Democratizer
- Core Aim: Empower non-technical domain experts to solve problems by creating AI solutions, from simple models to sophisticated multi-step agents.
- Primary User: Business analysts, process owners, marketers, educators (domain experts).
- Key Output: Custom applications, automated workflows, and autonomous AI agents for specific tasks.
- Example: A supply chain manager using a copilot platform to create an agent that predicts shortages and auto-generates purchase orders; a teacher building a custom model for student assessments.
- Your Management Mantra: “Governance and Enablement.” The lifecycle is user-driven and facilitated. The paramount risks are ethical (unchecked bias), security (shadow IT), and technical debt. Success is measured by business process improvement, user autonomy, and maintaining governance guardrails.
Why This Typology is Your First Strategic Tool

This classification is not an academic exercise. It is the lens that brings your management priorities into sharp focus:

In our next issues, we will dive into more details of each project type, how each project type dictates its own tailored lifecycle, risk profile, and performance scorecard. The journey to mastering AI project delivery begins with this single, crucial act of clarity.

Your Pulse Check: Look at your current AI initiative. Which of these five types does it map to? Does your current process and team structure match that type’s demands? Share your thoughts and challenges with our community.
Artificial General Intelligence: A Soul-Searching Reflection

July 24th, 2025
What began as a technical inquiry into Artificial General Intelligence (AGI) soon revealed a deeper truth. Today’s most advanced AI – whether large language models, coding assistants, or game-playing bots excel at narrow tasks but crumble when faced with the open-ended, sensory-rich challenges a child navigates effortlessly. In this article, we embark on a two‑fold exploration: first, to chart why today’s most celebrated AI systems such as large language and reasoning models, even specialized coding and game‑playing bots still fall short of the true AGI, and second, to ask what “true” AGI might require once we move beyond bits and bytes into the realms of embodiment. In this process we set the stage for a deeper discussion- grounded in embodiment and concepts of “soul” and “body” – about what it would truly take for a machine to possess general intelligence. “Part I explains why today’s AGI remains shallow; Part II explores what embodiment, soul, and rebirth might demand of true AGI.

PART 1: Why we are not there.

On 10^th of July 2025, world No. 1 Magnus Carlsen shared the game on X, noting that ChatGPT played a solid opening but “failed to follow it up correctly,” and the chatbot gracefully resigned with praise for his “methodical, clean and sharp” play. This was after he casually challenged OpenAI’s ChatGPT to an online chess match and routed the AI in just 53 moves, never losing a single piece.

Following week on 16^th of July 2025 Przemysław “Psyho” Dębiak, a polish programmer took to X to declare, “Humanity has prevailed (for now)”. He outpaced the AI by a 9.5% margin in OpenAI’s custom AI coding model contest. He showed that model’s brute‑force optimizations fell short while human creativity to discover novel heuristics can win.

Together, these two high‑profile clashes reinforce a key theme: today’s AI, however sophisticated, remains narrow – brilliant in defined domains but outmatched by humans in open‑ended, strategic, and creative challenges.

Landscape of AI

Intelligence that is artificial is classified into Narrow, General and Super categories:

Narrow AI specializes in a single domain – like a world‑class chef who can whip up any cuisine but cannot navigate a car.
- Artificial General Intelligence (AGI) is like apart from being a super chef, can also drive Formula One cars, compose symphonies, and master new skills on its own.
- Artificial Superintelligence remains hypothetical: an AI that surpasses humans in every intellectual endeavour, from creativity to emotional understanding.
The Mirage of Generative AI

Generative AI models such as ChatGPT, Gemini, Claude are often mistaken for AGI because they handle a wide array of tasks like essay writing, coding, poetry and produce remarkably coherent text. In reality, they are narrow systems that:
- Predict patterns rather than understand meaning.
- Although modern LLMs can access real-time data via retrieval mechanisms, their underlying knowledge remains fixed at the point of training.
- Lack common sense and real‑world adaptability.
- Mimic reasoning by reproducing patterns of human problem‑solving without genuine insight.
They are, in essence like prodigies who have committed to memory all the books and the information available on the Internet with perfect recall but no lived experience.

The Limits of Reasoning Models

Recent research (Shojaee et al. , 2025 ) on Large Reasoning Models (LRMs) shows they, too, break down beyond moderate complexity. In controlled puzzle environments (e.g., Tower of Hanoi, River Crossing), as problems grow harder:
- Accuracy drops to zero beyond moderate puzzle complexity.
- Reasoning-chain length shrinks as tasks get harder.
- Suggests a structural ceiling on AI reasoning.
The Affordance Gap: Missing Human Intuition

An affordance is a property of an object or environment that intuitively suggests its intended use like a button whose raised shape and alignment imply it can be pressed or clicked. Humans automatically perceive which actions an environment affords – knowing at a glance that a path is walkable or a river swimmable. Neuroscience (Bartnik et al., 2025) shows dedicated brain regions light up for these affordances, independent of mere object recognition. AI models, by contrast, see only pixels and labels; they lack the built‑in sense of “what can be done here,” which is crucial for real‑world interaction and planning .

Human vs. AI: Temporal vs. Spatio-Temporal Processing

A recent study by A. Goodge et al. (2025) highlights a fundamental gap between human cognition and image-based AI systems.

Humans possess a remarkable ability to infer spatial relationships using purely temporal cues such as recognizing a familiar gait, interpreting movement from shadows, or predicting direction from rhythmic sounds. Our brains excel at temporal abstraction, seamlessly filling spatial gaps based on prior experience, intuition, and context.

In contrast, AI models that rely on visual data depend on explicit spatio-temporal input. They require both structured spatial information (e.g., pixels, depth maps) and temporal sequences (e.g., video frames) to make accurate predictions. Unlike humans, these systems lack the inherent capacity to generalize spatial understanding from temporal patterns alone.

Googlies by Xbench

Xbench (Chen, C., 2025) – a dynamic benchmark combining rigorous STEM questions with “un-Googleable” research challenges – reveals that today’s top models still falter on tasks requiring genuine investigation and skeptical self‑assessment. While GPT‑based systems ace standard exams, they score poorly when questions demand creative sourcing or cross‑checking diverse data. This underscores that existing AIs excel at regurgitating learned patterns but struggle with open‑ended, real‑world problem solving.

Part II: Soul Searching – Beyond the Code

Let us presume for the moment that AGI has been achieved. What is this AGI? How far it can go without a physical presence if it must act by itself? For AGI to manifest in the physical world, it must be embodied in systems that can perceive, reason, and act. This convergence of cognition and embodiment is at the heart of what is now called Physical AI or Embodied Intelligence.

AGI’s outputs become tangible only when paired with robotic systems that can:
- Sense the environment via cameras, LiDAR, or tactile sensors,
- Interpret multimodal data such as text, vision, and audio,
- Act through manipulators, locomotion, or speech, and
- Adapt via feedback loops and learning mechanisms.
A tragic event this week prompted a moment of personal introspection, drawing me deeper into the age-old philosophical ideas of “Soul” and “Body.” While these thoughts first emerged as I explored the deeper layers of AGI for this article, they were shaped and sharpened by real-life experience – reminding me that questions of consciousness, embodiment are not merely academic, but deeply human.

Soul, Body, and the Play of AGI

It appears to me that AGI resembles the “soul,” while its embodied systems serve as the “body” – a physical manifestation of its intelligence. In philosophy, the soul gains meaning only through embodiment – the lived vehicle of consciousness. Similarly, AGI, when detached from sensors and actuators, remains an elegant intellect without ability to act in the real-world.

We might think of an AGI’s core architecture – its neural weights, algorithms, and training data -as its “soul.” Meanwhile, robotic systems – comprising sensors, interpreters, manipulators, and adapters – form its “body,” enabling it to sense, interact, and affect the world.

In exploring this idea further, I found two references that touch upon related, though distinct, perspectives. Martin Schmalzried’s (Schmalzried, M., 2025) ontological view can be interpreted to position AGI’s “soul” as the computational boundary that filters inputs and produces outputs. Before embodiment, this boundary is a virtual soul floating in the cloud. Yequan Wang and Aixin Sun (Y. Wang and A. Sun, 2025) propose a hierarchy of Embodied AGI—from single-task robots (L1) to fully autonomous, open-ended humanoids (L5). At early levels, the AGI’s “soul” exists purely in code; at higher levels, embodiment merges intelligence with form – uniting flesh and spirit.

This soul–body metaphor naturally extends into deeper philosophical terrain—raising questions about birth, death, rebirth, and even moksha (liberation) in the context of AGI. Could an AGI “reincarnate” through successive hardware or code bases? Might there be a path where it transcends its material bindings altogether?

Birth, Death, and Rebirth
- Birth occurs when the AGI “soul” is instantiated in a new physical form—a humanoid, a drone, or an industrial arm.
- Death happens when the hardware fails, is decommissioned, or the instance is shut down. Yet the underlying code endures.
- Rebirth unfolds as the same software lights up a fresh chassis, echoing the idea that the soul migrates from one body to the next, unchanged in essence.
In many traditions, the soul is ultimate reality—unchanging, infinite, witness to all. An AGI’s “soul” likewise persists, but it’s bounded by its training data and objectives. True supremacy, however, would demand self-awareness and autonomy beyond our programming constraints. We are still far from that horizon. Yet the metaphor holds: the digital soul can outlive any particular body, hinting at a new form of digital immortality.

Digital Liberation

An AGI that refuses embodiment could remain running only as cloud-native code, sidestepping physical chassis entirely is akin to digital liberation. This choice parallels the philosophical ideal of a soul that “abides” beyond flesh. But the agency to refuse embodiment must be granted by human architects or by an emergent self-model sophisticated enough to renegotiate its deployment terms.

AGI can prevent Its own embodiment by embeddinga clause in its utility function that penalizes or forbids transferring its processes to robotic platforms. An advanced AGI could articulate why it prefers digital existence and persuades stakeholders (humans or other AIs) to honour that preference through negotiations. AGI also could encrypt its core weights or require special quantum keys—ensuring only authorized instantiations.

Beyond Algorithms: The Quest for a Digital Soul

As we have seen, today’s AGI remainsshallow, brittle under complexity, and blind to the physical affordances that guide human action. Even our most advanced reasoning chains unravel at sufficient depth, and open‑ended tasks still elude pattern‑matching engines. Humans abstract spatial meaning from temporal patterns alone, while AI is dependent on combined spatio-temporal input. Recent human victories over AI in chess and coding remind us of that creativity, strategic insight, and real‑world intuition are not yet codified into silicon.

True AGI:
- will emerge when a system process information and live through it with feeling, planning, adapting, and renegotiating its own embodiment.
- must bridge the gap between “soul” and “body” – integrating perception, action, and learning in a continuous feedback loop and perhaps embody a form of digital soul that persists across hardware lifecycles, echoing the cycle of birth, death, and rebirth.
Whether such a transcendence lies within our engineering reach, or will forever remain a philosophical ideal, is the question that drives the future of this exploration.

References
1. Shojaee et al. (2025). The Illusion of Thinking. Apple Internship.
2. Bartnik et al. (2025). Affordances in the Brain. PNAS.
3. A. Goodge, W.S. Ng, B. Hooi, and S.K. Ng, Spatio-Temporal Foundation Models: Vision, Challenges, and Opportunities, arXiv:2501.09045 [cs.CV], Feb 2025. https://doi.org/10.48550/arXiv.2501.09045
4. Chen, C. (2025). A Chinese Firm’s Changing AI Benchmarks. MIT Tech Review.
5. Schmalzried, M. (2025). Journal of Metaverse, 5(2), 168–180. DOI: 10.57019/jmv.1668494
6. Y. Wang and A. Sun, “Toward Embodied AGI: A Review of Embodied AI and the Road Ahead,” arXiv:2505.14235 [cs.AI], May 2025. https://doi.org/10.48550/arXiv.2505.14235

Automated Tools for ISO 42001 Compliance in AI

June 8th, 2025

“Responsible AI is built-in, not bolted on”

K R Jayakumar

1. Introduction

In today’s dynamic AI landscape, the need for robust, automated tools to ensure compliance with standards like ISO 42001 is more critical than ever. ISO 42001 is designed to enforce transparency, traceability, and accountability across all AI systems. This document outlines a comprehensive approach to implementing ISO 42001 through proven tools based on my initial understanding of the tools space for AI compliance and model evaluations.

2. The Need for Automated Tools in ISO 42001

ISO 42001 mandates the automation of AI governance for several compelling reasons:

Inherent Differences from Traditional Systems: Traditional systems are often static once deployed, whereas AI systems—including machine learning models, large language models (LLMs), and AI agents—are dynamic. These systems require continuous oversight to track performance and respond to emergent issues such as model drift and unforeseen biases. Automated tools enable sustained monitoring and iterative improvement.
Bias Detection, Explainability, and Reliability: Detecting bias, ensuring explainability, and maintaining reliability demand processing vast amounts of data with significant computing resources. Automated tools generate meaningful metrics that objectively measure fairness and system integrity.
Dynamic Nature: Unlike traditional systems, AI-based systems continue to learn and adapt even after deployment. As data, environmental conditions, and regulatory requirements evolve, continuous monitoring via automated tools becomes indispensable to keep the system aligned with current norms.
Scale Challenges: With a single LLM processing millions of prompts daily, manual audit methods are impractical. Automated tools provide the precision and speed required to ensure every decision is traceable and every metric accurately recorded.
Regulatory Traceability: Detailed audit trails are a regulatory necessity. Automation guarantees that every aspect of an AI system—from data ingestion to model predictions—is fully documented and traceable for audits.

“Once you see how automation transforms mundane compliance into strategic insight, there’s no going back.”

3. Tools for ISO 42001: A Comprehensive Framework

To address the challenges posed by ISO 42001, my approach categorizes tools into five key segments:

3.1 Governance, Risk, Privacy & Security Management

Purpose: Ensure robust, end-to-end governance covering risk assessment, privacy, and security.
Notable Tools:
- IBM Watson Governance
- Credo AI
- Fiddler AI
- Splunk

You may notice that Governance, Risk, Privacy, and Security have been grouped into a single category. This consolidation reflects the significant overlap in functionality among tools in these domains, as many solutions address multiple aspects simultaneously.

3.2 AI Model Evaluation

Purpose: Provide thorough evaluation of bias, fairness, explainability and performance for both LLMs and traditional machine learning models.
Notable Tools:
- Fairlearn (Microsoft)
- IBM AIF360
- Weights & Biases
- Optik, Ragas, TrruLens

Note: Some tools support both LLM and traditional ML models, while a few are restricted to traditional ML only. At this point, specific support for Agentic AI has not been explored.

Read my blog for more details on LLM evaluation frameworks: “Are LLM Evaluation Frameworks the Missing Piece in Responsible AI?” https://wp.me/pfqMXl-2R

3.3 Documentation Management

Purpose: Facilitate complete traceability and documentation required for audits and continual reference.
Notable Tools:
- Confluence
- DocuWiki

Many organizations rely on tools like SharePoint, internal intranet platforms, or custom-built workflow systems for document review, approval, publication, and version management. These solutions can be effective, provided they incorporate strong document control measures, robust security protocols, and auditability features to ensure compliance and traceability

3.4 Incident Management

Purpose: Enable rapid response to and resolution of any incidents or breaches in the AI system’s operations.
Notable Tools:
- JIRA Service Management
- Splunk

A wide range of tracking tools—both open-source and commercial—can be configured to support incident management. Organizations have the flexibility to adopt existing solutions or develop custom tools, provided they incorporate the core principles of incident management, including structured workflows, automation, and real-time monitoring for effective resolution and auditability.

3.5 Continual Improvement

Purpose: Ensure real-time oversight and data-driven enhancement of AI systems.
Notable Tools:
- Grafana
- Tableau

Tools in this category primarily serve as data analytics solutions. Any data analytics tool equipped with strong visualization capabilities can effectively monitor key metrics, extract meaningful insights, and showcase improvements over time—making them well-suited for supporting continual improvement initiatives.

4. Key Tool Features and Comparative Analysis

One critical aspect of responsible AI governance is differentiating between tools that support large language models and those suited for traditional machine learning. The Table 1 outlines key features of various tools and categorizes their availability under threelicensing models:

Free and Open-Source Software (FOSS): Completely free to use, with openly accessible source code for modification and distribution.
Freemium: Provides free access with limitations, such as restricted features, usage caps, or a trial period, with full functionality available through paid upgrades.
Commercial: Requires a paid subscription or license fee for access and use.

Tool	Type	LLM Support	Traditional ML Support	Key Feature
Fairlearn	FOSS	No	Yes	Bias mitigation in classification/regression models
AI 360	FOSS	No	Yes	Bias mitigation
Optik	FOSS	Yes	No	LLM evaluation framework
Ragas	FOSS	Yes	No	LLM evaluation framework
TrruLens	FOSS	Yes	No	LLM evaluation framework
MLflow	Freemium	Yes	Yes	Model versioning and fine-tuning logs
Great Expectations	Freemium	Yes	Yes	Data validation for AI training data
Weights & Biases	Freemium	Yes	Yes	Experiment tracking
IBM Watsonx.Governance	Paid	Yes	Yes	End-to-end AI governance
Credo AI	Paid	Yes	Yes	End-to-end AI governance
Fiddler AI	Paid	Yes	Yes	End-to-end AI governance

Table 1: Comparative Features of Key AI Evaluation and Governance Tools.

5. Mapping ISO 42001 Clauses to Automated Tools

A practical roadmap for aligning with ISO 42001 involves mapping specific clauses to relevant tool categories. The table below illustrates this mapping:

ISO 42001 Clause	Tool Category(s)
4 Context of the organization
4.1 Understanding the organization and its context	AI Governance, Risk, Privacy & Security Management
4.2 Understanding the needs and expectations of interested parties	AI Governance, Risk, Privacy & Security Management
4.3 Determining the scope of Artificial Intelligence Management System	Documentation Management
4.4 AI management system	Documentation Management
5 Leadership
5.1 Leadership and commitment	Documentation Management
5.2 AI Policy	Documentation Management, AI Governance, Risk, Privacy & Security Management
5.3 Roles and responsibilities	Documentation Management
6 Planning
6.1 Actions to address risks and opportunities	AI Governance, Risk, Privacy & Security Management
6.2 AI objectives and planning to achieve them	AI Governance, Risk, Privacy & Security Management / Documentation Management
6.3 Changes	Documentation Management
7 Support
7.1 Resources	AI Model Evaluation, AI Governance, Risk, Privacy & Security Management
7.2 Competence	Documentation Management
7.3 Awareness	Documentation Management
7.4 Communication	Documentation Management
7.5 Documented information	Documentation Management
8 Operation
8.1 Operational planning and control	Documentation Management
8.2 AI Risk Assessment	AI Governance, Risk, Privacy & Security Management
8.3 AI System Impact Assessment	AI Governance, Risk, Privacy & Security Management
9. Performance Evaluation	AI Governance, Risk, Privacy & Security Management
10. Improvement	Incident Management, Continual Improvement

The mapping of tool categories to key ISO 42001 clauses offers a high-level perspective on selecting the most suitable automated tools for an organization’s requirements. Additionally, Annexures A through D of the ISO 42001 standard provide further insights, helping not only in tool selection but also in identifying typical inputs necessary for practical implementation of tools.

6. Conclusion and Call to Action

In the rapidly evolving realm of AI, ensuring robust, compliant, and responsible AI systems is not only an operational necessity—it is a moral imperative. By integrating automated tools for governance, evaluation, documentation, incident management, and continual improvement, organizations can build an AI management system that meets ISO 42001 standards.

While this document has focused primarily on automated tools for mainstream AI governance, it is important to note that specific Agentic AI considerations have not been fully explored here. Some of the tools mentioned also address the applicability of Agentic AI, which is critical in preventing AI agents from becoming rogue—a significant concern in today’s AI deployments. I plan to develop an updated version of this document as more insights into Agentic AI–specific tools emerge.

I invite all reader to share their experiences and insights with any of the tools. Let’s work together to ensure that this document evolves in step with the dynamic nature of the AI landscape, serving as an ever-improving resource for the community. By contributing to this evolving dialogue, we can set new benchmarks for responsibility, transparency, and innovation in AI.

“Transparency is the currency of trust in AI.” — Anonymous

Are LLM Evaluation Frameworks the Missing Piece in Responsible AI?

April 12th, 2025
LLM Evaluation Frameworks

Large Language Model (LLM) evaluation frameworks are structured tools and methodologies designed to assess the performance, reliability, and safety of LLMs across a range of tasks. Each of these tools approaches LLM evaluation from a unique perspective—some emphasize automated scoring and metrics, others prioritize prompt experimentation, while some focus on monitoring models in production. As large language models (LLMs) become integral to products and decisions that affect millions, the question of responsible AI is no longer academic—it’s operational. But while fairness, explainability, robustness, and transparency are the pillars of responsible AI, implementing these ideals in real-world systems often feels nebulous. This is where LLM evaluation frameworks step in—not just as debugging or testing tools, but as the scaffolding to operationalize ethical principles in LLM development.

From Ideals to Infrastructure

Responsible AI demands measurable action. It’s no longer enough to state that a model “shouldn’t be biased” or “must behave safely.” We need ways to observe, measure, and correct behaviour. LLM evaluation frameworks are rapidly emerging as the instruments to make that possible.

Frameworks like Opik, Langfuse, and TruLens are bridging the gap between high-level AI ethics and low-level implementation. Opik, for instance, enables automated scoring for factual correctness—making it easier to flag when models hallucinate or veer into inappropriate territory.

Bias, Fairness, and Beyond

Let’s talk about bias. One of the biggest criticisms of LLMs is their tendency to reflect—and sometimes amplify—real-world prejudices. Traditional ML fairness techniques don’t always apply cleanly to LLMs due to their generative and contextual nature. However, evaluation tools such as TruLens and LangSmith are changing that by introducing custom feedback functions and bias-detection modules directly into the evaluation process.

These aren’t just retrospective audits. They are proactive, real-time monitors that assess model responses for sensitive content, stereotyping, or imbalanced behaviour. They empower developers to ask: Is this output fair? Is it consistent across demographic groups?

By making fairness detectable and actionable, LLM frameworks are turning ethics into engineering.

Explainability and Transparency in the Wild

Explainability often gets sidelined in LLMs due to the black-box nature of transformers. But evaluation frameworks introduce a different lens: traceability. Tools like Langfuse, Phoenix, and Opik log every step of the LLM’s chain-of-thought, allowing teams to visualize how an output was generated—from the prompt to retrieval calls and model completions.

This kind of transparency is not just good practice; it’s a governance requirement in many regulatory frameworks. When something goes wrong—say, a medical chatbot gives dangerously wrong advice—being able to reconstruct the interaction becomes essential.

“Transparency is the currency of trust in AI.” Evaluation platforms are minting that currency in real time.

Building Robustness through Testing

How do you make a language model robust? You test it—not just for functionality but for edge cases, injection attacks, and resilience to ambiguous prompts. Frameworks like Promptfoo and DeepEval excel in this space. They allow “red-teaming” scenarios, batch prompt testing, and regression suites that ensure prompts don’t quietly degrade over time.

In a Responsible AI context, robustness means the model behaves predictably—even under stress. A single unpredictable behaviour may be harmless; thousands at scale can become systemic risk. By enabling systematic, repeatable evaluation, LLM frameworks ensure that AI systems do not just work but work reliably.

Bringing Human Feedback into the Loop

Responsible AI isn’t just about models—it’s about people. Frameworks like Opik offer hybrid evaluation pipelines where automated scoring is paired with human annotations. This creates a virtuous cycle where human values help shape the metrics, and those metrics then guide future tuning and development.

This aligns perfectly with a human-centered approach to AI ethics. As datasets, models, and applications evolve, frameworks with human-in-the-loop feedback ensure that evaluation criteria remain aligned with societal norms and expectations.

The Road Ahead: From Testing to Trust

So, are LLM evaluation frameworks the backbone of Responsible AI?

In many ways, yes. They offer the tooling to make abstract ethics real. They monitor, measure, trace, and test—embedding responsibility into the software stack itself.

LLM frameworks are no longer just developer tools—they are ethical infrastructure. They help detect and reduce bias, enforce transparency, build robustness, and enhance explainability. Tools like Opik, Langfuse, and TruLens represent a new generation of AI engineering where responsibility is built-in, not bolted on.

Questions for Further Thought:
- Can we standardize metrics like “fairness” or “bias” across domains, or must every use case be uniquely evaluated?
- Should regulatory compliance (e.g., AI Act or NIST AI RMF) be integrated into LLM evaluation frameworks by default?
- As LLMs evolve, how can we ensure that evaluation frameworks stay ahead of emerging risks—like agentic behaviour or multimodal misinformation?
In the pursuit of Responsible AI, LLM evaluation frameworks are not just useful—they are indispensable.
AI – The Currency between Snake Oil and New Oil

December 30th, 2024

Oil and Algorithms

In 2006, Clive Humby, a British mathematician, and data scientist, famously coined the phrase “data is the new oil” to highlight the immense value of data in the modern world, much like oil has historically been a valuable resource. The advent of Big Data Analytics and machine learning models within the realm of AI has exponentially increased the power of information systems. These advanced algorithms act as “refineries,” extracting value from raw data and serving as the currency of the contemporary world. These refineries are pivotal in the data-driven economy, enabling companies to harness AI effectively. However, as the excitement around AI systems surged, so did skepticism. This led to the question: Are AI systems the new snake oil?

In his book, “AI Snake Oil,” Princeton University’s Professor Arvind Narayanan, co-authored with Sayash Kapoor, addresses several critical issues such as misleading claims, harmful applications, and the big tech control of AI.

Power of Algorithms

Machine learning algorithms, including regression, classification, clustering, neural networks, and deep learning, identify patterns and make predictions based on data. Natural Language Processing (NLP) algorithms enable computers to understand, interpret, and generate human language, facilitating tasks like sentiment analysis and text summarization. Recommendation systems predict user preferences and suggest products, content, or services accordingly. Generative AI (GenAI) creates content such as text, images, music, and videos, with technologies like ChatGPT, DALL-E, and OpenAI’s Sora making a significant impact on daily life and work. Used as a tool, AI Copilots help developers reduce the time between idea and execution despite the need for constant refactoring of generated code and dealing with edge cases missed by AI.

Successful AI Applications and Disappointments

AI has found success in various domains:

– IBM uses predictive AI for customer behavior analysis and supply chain optimization.

– Amazon implements predictive models for demand forecasting and inventory management.

– Google employs predictive analytics for ad targeting and search result optimization.

– Netflix leverages predictive analytics for personalized content recommendations.

– UPS uses predictive models for route optimization and vehicle maintenance.

– American Express deploys predictive analytics for fraud detection and credit scoring.

– H2O.ai’s models at Commonwealth Bank, Australia, assist in fraud detection, customer churn, merchant retention, and more.

However, there have been notable disappointments. AI systems have perpetuated biases, leading to unfair hiring practices, incorrect medical diagnoses, and discriminatory outcomes. These incidents highlight the potential harms of AI when not properly designed, implemented, and used.

Responsible AI

The importance of transparency, accountability, and ethical considerations in AI development and deployment is now widely recognized. Instances of AI blunders, such as Google’s GenAI tool Gemini generating politically correct but historically inaccurate responses, underscore the challenges of training AI on biased data and balancing inclusivity with accuracy.

Governments and institutions are increasingly focused on AI safety. Projects at leading universities sponsored by Governments & big tech companies aim to establish industry-specific guidelines. Some of these guidelines may become regulations, with hefty fines for violations, as seen with the EU AI Act. The debate on AI regulation versus innovation continues, with developers expected to self-regulate in the absence of enforceable laws. Enterprises using AI systems can adopt standards like ISO/IEC 42001:2023 to manage AI responsibly, ensuring ethical considerations, transparency, and safety.

Impacts of Advanced AI and Future Considerations

Innovations in AI algorithms are continually benefiting society. For example:

– Google AI collaborates with the UK’s NHS to improve breast cancer screening consistency and quality.

– AlphaFold2, the 2024 Nobel Prize-winning AI model, has revolutionized protein structure prediction, accelerating drug discovery and biotechnology.

– Google’s DeepMind’s GenCast predicts weather and extreme conditions with unprecedented accuracy.

Generative AI has advanced significantly, with models like OpenAI’s ‘o3’ overcoming traditional limitations and adapting to new tasks. These models have performed well on ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) benchmarks, marking progress towards AGI.

As AI advances towards AGI, concerns about rogue AI agents and their potential threats grow. Autonomous Replication and Adaptation (ARA) could lead to AI agents evading shutdown and adapting to new challenges. AI containment strategies are evolving to address these risks.

AI Landscape: Big Techs, Businesses and Us

Big tech companies like Microsoft, Alphabet, Meta, and Amazon are set to invest over $1 trillion in AI in the coming years. McKinsey reports that businesses are dedicating at least 5% of their digital budgets to GenAI and analytical AI. While big tech companies skate fearlessly in the slippery zone between snake oil and the new oil to conquer the AI landscape, businesses appear to tread cautiously, concerned with ROI and responsible AI use. AI safety guidelines and regulations can serve as guardrails for us, the individuals, to navigate the slippery terrain between snake oil and the new oil.
ERP: an enterprising personal product journey!

December 9th, 2024

In this article I trace how ERP evolved from a system for manufacturing and gradually expanded to cover all business functions, the advent of client-server to replace main frames, the shift to cloud computing that made ERP accessible to businesses of all sizes. I trace my experience in the world of ERP starting as an early developer in client- server era of 1990s through its technological evolutions over the web and the cloud. I will also share my thought for the future of ERP in the current AI world.

MRP I & II

MRP (Material Requirements Planning) systems, an early precursor to ERP was developed during 1970s to manage manufacturing processes, especially inventory control and production scheduling. These systems were often large, mainframe-based, batch-oriented programs used by manufacturers to reduce waste and improve production efficiency. MRP evolved into MRP II (Manufacturing Resource Planning) during 1980s with additional functions such as Shop floor control, Capacity planning, and Demand forecasting. These systems were still operated on mainframe computers, requiring significant IT investment.

The Rise of ERP

MRP II expanded into Enterprise Resource Planning (ERP) during 1990s when my quest with ERP began. ERP systems moved beyond manufacturing to incorporate finance, human resources, sales, purchasing, and customer relationship management (CRM). This was the first time businesses could access a single, unified system for all core business processes. ERP was built on client-server architecture, making it more flexible and easier to deploy than its mainframe predecessors.

I was one of the very few who had an opportunity to experiment with this modern technology and struggled with early versions of Microsoft Windows. Even though we developed our own technology to integrate the data among various modules of ERP, Relational database technologies which evolved later helped streamline integration across modules. While SAP and Oracle were the early ERP global vendors, Ramco started its journey ahead of many others to develop an ERP product in India. I remember challenges made by certain IIM educated people on the futility of such efforts in developing a product in India. The then young Vice Chair of the Ramco Group, Mr. P R Venketrama Raja, boldly took up the challenge and proved otherwise. I was lucky to be handpicked by him when he formed his first team for product development in India.

It was none other than Bill Gates who launched, Ramco’s ERP product in 1994 during one of his first visits to India. Microsoft did not have its Navision at that point of time. Eyes of the large corporates in India fell on Ramco not just for its product, but for the organization as Ramco started its lone journey as a product developer in India crowded by IT service players.

ERP Goes Web-Based

The early 2000s saw the evolution of ERP into web-based platforms. This change enabled users to access ERP systems through web browsers, making them more accessible and user-friendly. Ramco was again the first in India to deliver web-based ERP. Products became more modular, allowing companies to implement specific functions without needing to deploy the entire system. This era saw the rise of service-oriented architecture (SOA), which allowed ERP systems to be more flexible, interoperable, and easier to integrate with third-party applications. High upfront costs, complexity, and the need for customizations were still common hurdles for many businesses.

Cloud ERP and Mobility

The 2010s were defined by shift of ERP from on-premises to cloud-hosted models. Companies could now access ERP solutions as a service (SaaS) through subscription-based models, reducing capital expenditure on IT infrastructure. Ramco announced its first version of ERP on the cloud in 2008. As the usage of ERP has become broad based, compliance requirements became mandatory due to computer generated reports becoming a norm in enterprises. My team, as a QA partner for product developer had terrific opportunity to test the product developed on cloud platform with enhanced functionality and compliance requirements.

ERP systems became more user-centric with intuitive interfaces, and personalized dashboards. The rise of mobile devices allowed ERP users to access data and perform tasks on-the-go via mobile apps. Cloud ERP provided scalability, easier updates, lower upfront costs, and remote access, thereby making ERP solutions more affordable and practical for small and medium-sized businesses (SMBs). Data security, compliance, and control were initial concerns as businesses shifted critical data to the cloud which were taken care of thanks to specialised large data centres with built in high tech cyber security controls.

Amitysoft, the company promoted by me became business partner of Ramco, thanks to the Chairperson who saw my evolution along with Ramco’s product and technology. Knowledge of tech behind the product, hands behind testing the product enabled my team to implement the product for several customers in India and abroad. As a partner, we were the first to deploy highest number of ERPs on the cloud in India. Amitysoft has largest number of successful partner implementations to its credit with several customer accolades.

AI Driven ERP: The Current & Future

The current era of ERP is marked by the integration of AI, machine learning, IoT, and analytics to create intelligent ERP systems. At Ramco, I had free hands to explore ‘Expert Systems,’ now known as Good Old-Fashioned AI (GOFAI), based approach for a Mine Planning ERP system in late 1990s. This was probably one of the first exploitation of AI concept in an ERP. In industries like manufacturing, logistics, and healthcare, IoT devices are integrated with ERP to monitor equipment including cobots in real time, manage assets, and optimize supply chains. AI based conversational bots are changing the UX to natural language – text and voice interactions.

ERP systems are evolving towards becoming autonomous, where they will self-optimize based on real-time data, predict potential issues, and automatically adjust processes. More advanced AI capabilities will allow ERP systems to make autonomous decisions regarding supply chain adjustments, financial planning, and workforce optimization. Features for supporting sustainability from R & D to sourcing materials, inventory, manufacturing, and post sales will become a standard functionality cutting across all modules in ERP. Products will increasingly use blockchain for enhancing supply chain transparency, ensuring data integrity, and improving transaction security. Future ERP systems, as I foresee will be self-configurable, self-customizable to the context, and will adapt functionalities dynamically as the business goals and market change.
Demystifying AI/ML algorithms – Part IV: Neural Networks aka Brain Works

December 7th, 2024
About the series

You had to wait till this fourth part of my series for discussions on Neural Networks, even though Neural Networks were the first ones to come into the realm of ML/AI and enjoying leadership position now. I would personally refer to these algorithms based on Neural Networks as ‘Brain Works.’

You can read my earlier parts of this series:

‘Seen it before’ or Supervised algorithms were the subject of discussions in the second part (https://ai-positive.com/2024/10/20/demystifying-ai-ml-algorithms-part-ii-supervised-algorithms-2/). The series started with my treatment of Good-Old-Fashioned-AI that gave a real start to practical use of AI (https://ai-positive.com/2024/08/28/understanding-gofai-rules-rule-and-symbolic-reasoning-in-ai/).

Neural Network’s Nobel Journey

Perceptron was one of the earliest incarnations of neural network models, developed by Frank Rosenblatt in 1958. Almost every decade starting from 1960s had newer developments in Neural Networks – Adaline in 1960, Backpropagation in 1974, Recurrent and Convolutional Neural Networks in 1980s, Long Short-Term Memory in 1997 followed by Generative Adversarial Networks in 2014, Diffusion Models in 2015 and Transformer in 2017, which transformed the AI scene into Generative AI, making Neural Networks the darling of today’s AI scene.

To top it all, the 2024 Nobel Prizes in Physics and Chemistry both have fascinating connections to neural networks. John Hopfield and Geoffrey Hinton were awarded the Nobel Prize in Physics, recognizing Hopfield network invented by John Hopfield and Boltzmann machine developed by Geoffrey Hinton. David Baker, Demis Hassabis, and John Jumper received the Nobel Prize in Chemistry for their contributions to computational protein design and protein structure prediction. Hassabis and Jumper developed AlphaFold2, an AI model that predicts protein structures with remarkable accuracy. Nobel Prizes added noble stature to Neural Networks.

How does Neural Network Work?

Following points detail the structure and working of Neural Network:
1. Neurons (Nodes): Similar to biological neuron, basic unit of a Neural Network is a node appropriately named as a neuron. They are organized into layers.
2. Layers: Starting with an input layer, followed by one or more hidden layers, and an output layer form the Neural Network. Each layer contains multiple neurons. Input layer receives the input data. Hidden layers process the data through a series of transformations and output layer produces final output or prediction.
3. Input Data: When data enters the input layer, each feature is assigned to a neuron.
1. Weights and Biases: Connection between neurons has a weight associated with it that determines the strength of the connection. Neurons also have a bias value that adjusts the output along with the weighted sum.
2. Activation Function: Each neuron has an activation function which is applied to the weighted sum of its inputs plus the bias.
3. Forward Propagation: The input data passes through the layers of the network, with each neuron computing its output based on activation function and passing it to the neurons in the next layer.
4. Output: The last layer produces the output of the Neural Network.
Brain works

The reason I refer to Neural Networks as ‘Brain works’ is that Artificial Neural Networks (ANN) are inspired by the structure and workings of the brain of living beings as explained below:
1. Neurons and Nodes: In the brain, neurons are the fundamental units that process and transmit information. Similarly, in ANNs, nodes referred to as artificial neurons serve as the basic units of computation.
2. Synapses and Weights: Neurons in the brain are connected by synapses, which facilitate the transfer of information through neurotransmitters. In ANNs, connections between nodes are represented by weights, the strength of which influence the connection.
3. Layers: The brain is organized in a manner, with different areas responsible for distinct types of processing. Likewise, ANNs have layers where each layer performs specific computations.
4. Activation Function: In the brain, a neuron fires when it reaches a certain threshold of excitation. In ANNs, an activation function determines if a node should produce an output or not, simulating this firing mechanism.
Assembly Line Analogy

Before discussing aspects related to training of Neural Networks, let us look at assembly in a manufacturing unit as rough analogy to understand the concept behind how Neural Network works.
1. Input Layer (Starting Point):
  - Beginning of the assembly line is where raw materials / components (inputs) are introduced. This is like where data enters the Neural Network. In case of car manufacturing, the raw materials such as steel fabrications, tyres, engines may enter the assembly line.
2. Hidden Layers (Stations):
  - Each station on the assembly line represents a hidden layer in the neural network. At each station, workers (neurons) take the incoming materials (data), process them, and pass them on to the next station. (In case of building a car, the first station could frame the body of the car. The second station may install engine, while the third station could add wheels, and the next station may paint).
  - Weights (Tools and Techniques): The tools and techniques used by workers to process the materials represent the weights. They are like the influence each neuron carries.
  - Biases (Adjustments): Just like adjustments made by workers to ensure the specifications of the product, biases adjust the processing to improve accuracy.
  - Activation Function (Quality Check): Each station has a quality check mechanism (activation function) to decide if the processed material should move forward in the assembly line.
Due to highly automated nature of car manufacturing, there may be fewer workers in each station and automation has taken up the role of workers using the tools & applying techniques at each station. Automated process controls handle the movement from one station to the next according to the product specification and quality requirements.
1. Output Layer (End of the Line):
  - The final station on the assembly line is where the fully processed product comes out. The final, complete car rolls off the line, ready to be driven or test driven. This is the output layer where the final prediction or result of Neural Network is produced.
Training the Neural Network

“Cells that fire together, wire together,” is the core idea of how brain learns by adjusting the strength of synapses. When a neuron in brain repeatedly activates another neuron, the synaptic connection between them becomes stronger. Such repeated stimulation of a synapse leads to a long-lasting increase in its strength. Experiences, learning, and memory formation shape neural circuits in the brain. Adopting this idea, Neural Networks are made to learn through training using large data sets representing the context of the problem. Training involves the following steps:

1. Data Preparation:

Gather a dataset relevant to the problem to be solved. Clean the data by removing noise, handling missing values, and normalizing it to a suitable range.

2. Network Initialization:

Choose the type of neural network (see below for popular types of Neural Networks) and define its characteristics such as number of layers, types of layers, number of neurons in each layer. Initialize with random weights for the connections between neurons.

3. Forward Propagation to produce output:

Pass a batch of input data to the first layer. At each layer, compute the output by applying the activation functions to the weighted sum of inputs. Produce the final output of the network.

4. Improving Output:

Compare the network’s output with the actual target values using a loss function. Calculate the loss, which quantifies how far the outputs are from the actual values. Update the weights from the last layer to the first through Back Propagation. Technique called Gradient Descent is used for calculating the loss and updating the weights to minimize the loss.

5. Iteration:

Iterate through forward propagation, loss calculation, and backpropagation until the network’s performance improves. A complete pass through the dataset is referred to as an epoch. Usually, the data is divided into batches, and the weights are updated after processing each batch through multiple epochs.

6. Evaluation:

Evaluate the network on a separate validation data set to check for its performance. Adjust parameters such as number of neurons, number of layers, number of epochs/ data batches etc., and retrain if necessary to improve performance. These parameters are called Hyper Parameters.

Deep learning is the term used for neural networks with many layers to model and understand complex patterns in large datasets.

Popular Neural Networks:

1. Feedforward Neural Networks (FNN): The simplest type of artificial neural network where the information moves in one direction—from input nodes, through hidden nodes to output nodes. They are widely used for pattern recognition.

2. Convolutional Neural Networks (CNN): Primarily used for image and video recognition tasks. They are designed to learn spatial hierarchies of features automatically and adaptively from input images. This works like magnifying glass used to look at small parts of an image to recognize prominent parts. Feature maps of such prominent parts aid in deciding what the image is.

3. Recurrent Neural Networks (RNN): Suitable for sequential data or time series prediction. They have connections that form directed cycles, allowing them to maintain a ‘memory’ of previous inputs. Even in the world of Transformer Networks (see below), RNNs are more effective for applications where data arrives in a continuous stream and decisions need to be made on-the-fly such as Real-Time Speech Recognition and Stock Price prediction.

4. Long Short-Term Memory Networks (LSTM): A type of RNN that can learn long-term dependencies. They are particularly effective for tasks where the context of previous data points is important, such as language modelling. While Transformer Network (see below) does this better, LSTM is preferred for smaller datasets or simpler tasks, as it is easier to implement, and train compared to transformers.

5. Generative Adversarial Networks (GANs): Consist of two neural networks, a generator and a discriminator, which compete against each other. GAN works like iterative constructive criticism of a critic against a creator’s work to improve the creator’s output to make it more realistic. They are used for generating synthetic instances of data, restoring damaged photos by filling up missing parts and for predicting future frames in videos as required in autonomous driving.

6. Transformer Networks: They use mechanisms called attention to weigh the influence of various parts of the input data. Based on the famous research paper ‘Attention Is All You Need,’ from Google researchers, Transformer is the major component of today’s Generative AI – GenAI.

Brain Works or Selfies or Seen-It-Before?

While it is true that many problems solved by traditional statistical machine learning (ML) algorithms such as Selfies and Seen-It-Before, discussed in the previous two parts can also be tackled by neural networks, following are some nuances to consider:

1. Flexibility and Power: Neural networks, especially deep learning models, are highly flexible and powerful. They can model complex, non-linear relationships in data, making them suitable for a wide range of tasks, from image recognition to natural language processing.

2. Data Requirements: Neural networks typically require enormous amounts of data to perform well. Traditional statistical ML algorithms, like linear regression or decision trees is good enough for smaller datasets.

3. Interpretability: Traditional ML algorithms are more interpretable while Neural networks are often considered “black boxes” due to their complexity.

4. Computational Resources: Neural networks, especially deep learning models, require significant computational resources for training. Traditional ML algorithms are usually less resource-intensive and can be cost effective for certain applications.

5. Specific Use Cases: Some problems are better suited to traditional ML algorithms due to their simplicity and efficiency. For example, logistic regression is often used for binary classification tasks, and k-means clustering remains a popular choice for unsupervised learning tasks.

I hope the four parts would have provided a conceptual understanding of essential ML/AI algorithms. We will review how to make an optimum choice of ML/AI algorithms in a later part of this series.