Deep Dives - AI News
https://www.artificialintelligence-news.com/categories/features/deep-dives/

The firm that never forgets: Rowspace launches with $50M to make AI for private equity actually work
https://www.artificialintelligence-news.com/news/rowspace-50m-ai-private-equity-sequoia-emergence/ – Fri, 06 Mar 2026

Private equity runs on judgment–and judgment, it turns out, is extraordinarily hard to scale. Decades of deal memos, underwriting models, partner notes, and portfolio data are scattered across systems that were never designed to communicate with each other.

Every time a new deal crosses a firm’s desk, analysts start from scratch, even when the answers to their most pressing questions are buried somewhere in the firm’s own history. 

That is the problem Rowspace was built to solve, and it’s why the San Francisco startup is emerging from stealth with US$50 million in funding and a bold pitch: AI for private equity that doesn’t just assist decision-making, but actually learns how a firm thinks.

The company launched publicly with a seed round led by Sequoia and a Series A co-led by Sequoia and Emergence Capital, with participation from Stripe, Conviction, Basis Set, Twine, and a group of finance-focused angel investors. 

Early customers–unnamed, but described as name-brand private equity and credit firms managing hundreds of billions to nearly a trillion dollars in assets–are already living on the platform, with about ten top firms on seven-figure annual contract values.

Two MIT graduates, one stubborn problem

Rowspace was founded by Michael Manapat and Yibo Ling, who met as graduate students at MIT before diverging into very different careers. Manapat went on to build the machine learning systems at Stripe that process billions of transactions, then helped drive Notion’s expansion into AI as its CTO. 

Ling took the finance route–a two-time CFO who led finance teams at Uber and Binance, and spent years making investment decisions by manually synthesising data across fragmented systems. When ChatGPT launched in late 2022, Ling tested it on due diligence tasks and ran straight into the same wall. 

“Clearly there was a lot of promise, but it just wasn’t working,” he told Fortune. “You need the right information in the right context.” That gap – between AI’s potential and the messy, proprietary, institution-specific data reality of finance – became the founding thesis.

Ling, Co-founder and COO, put it plainly: “Most tech tools aren’t comprehensive or nuanced enough for finance. And most finance tools need to raise their technical ceiling. We intend to do both.”

What AI for private equity actually looks like

Rowspace’s platform connects structured and unstructured data across a firm’s entire history–document repositories, investment and accounting systems, old PowerPoints, deal memos–and applies what Manapat calls a finance-native lens: one that reflects how a firm actually reconciles information, interprets discrepancies, and makes decisions. Crucially, it processes all of this inside a client’s own cloud environment. The firm’s data never leaves its control.

The result is accessible through Rowspace’s own interface, within tools like Excel and Microsoft Teams, or piped directly into a firm’s existing data infrastructure. A first-year analyst reviewing a new deal can surface decades of prior decisions, comparable transactions, and internal underwriting patterns without picking up the phone or hunting through shared drives.

“Finance is full of high-stakes decisions. There used to be a tradeoff between moving quickly and making fully informed, nuanced decisions using all the possible data at a firm’s disposal. Our AI platform eliminates that tradeoff,” said Michael Manapat, Co-founder and CEO of Rowspace. “We’re building specialised intelligence that turns a firm’s data into scalable judgment with the rigour finance demands.”

The ambition is captured in a line Manapat uses internally: “Imagine a firm that never forgets. Where an experienced investor’s workflows–touching many different tools in specific ways–can be codified and multiplied. When that’s possible, a first-year analyst can tap into decades of institutional knowledge, and judgment scales with a firm instead of being diluted.”

Why Sequoia and Emergence are betting on vertical AI

The investor conviction behind this raise is itself a signal worth reading. Alfred Lin, the Sequoia partner who led the investment, positioned Rowspace as a direct answer to the question of what AI applications will survive the rise of increasingly capable foundation models.

“Michael built the machine learning systems at Stripe that process billions of transactions and helped drive Notion’s expansion into AI. Yibo has been a finance leader and investor who’s wrestled with the exact challenges Rowspace is solving,” Lin said, adding that both Michael and Yibo have seen the problem from both sides, pairing technical depth with firsthand understanding of what customers actually need.

Jake Saper, General Partner at Emergence Capital, went further on the data infrastructure thesis: “They’re doing the previously impossible work of connecting proprietary data, and reconciling and reasoning over it with real rigour. Without this foundation, it doesn’t matter what other AI tools you’re using.”

The argument is a neat inversion of the fear gripping much of the software industry right now: that foundation models will eventually commoditise applications. Lin’s view is the opposite–that vertical AI systems built on deep, proprietary data layers are precisely where durable competitive advantage will compound. 

In private equity specifically, where alpha is by definition firm-specific and non-replicable, that logic is particularly hard to argue with. The back office of investment management has quietly been one of the last frontiers general AI has struggled to crack. Rowspace just raised $50 million on the premise that it knows why–and what to do about it.

(Photo by Rowspace)

See also: Santander and Mastercard run Europe’s first AI-executed payment pilot

Beyond the pilot: Dyna.Ai raises eight-figure Series A to put agentic AI in financial services to work
https://www.artificialintelligence-news.com/news/dyna-ai-series-a-agentic-ai-financial-services/ – Thu, 05 Mar 2026

The financial services industry has a pilot problem. Institutions pour resources into AI proofs-of-concept, generate impressive dashboards, and then quietly watch momentum stall before anything reaches production. Singapore-headquartered Dyna.Ai was built precisely to break that pattern–and investors are now backing that thesis with serious capital.

The AI-as-a-Service company has closed an eight-figure Series A round led by Lion X Ventures, a Singapore-based venture capital fund advised by OCBC Bank’s Mezzanine Capital Unit, with participation from ADATA, a Taiwan-listed technology company, a Korean financial institution, and a group of finance industry veterans.

The funding will accelerate deployment of what Dyna.Ai calls its agentic AI platform for financial services–a platform already live across banks and financial institutions in Asia, the Americas, and the Middle East.

Execution over experimentation

What sets Dyna.Ai apart from the broader wave of enterprise AI startups is its deliberate narrowness. Founded in 2024, the company positioned itself not as a general-purpose AI platform but as an execution-focused operator inside regulated environments–places where compliance, auditability, and governance are not optional extras but baseline requirements.

Its platform combines domain-specific expertise, AI agent builders, task-ready agents, and fully operational agentic applications capable of running within defined workflows. The pitch, framed under a “Results-as-a-Service” model, is that enterprises don’t need more experimentation–they need AI that works within the constraints of their industry and produces measurable outcomes from day one.

“While much of the industry was focused on how broadly AI could be applied, we doubled down early on a specific, pressing problem and built it with outcomes in mind,” said chairman and co-founder of Dyna.Ai Tomas Skoumal. 

Why investors are betting on this moment

The timing of this raise is significant. Across the region, the conversation around AI in enterprise has shifted–from whether to adopt it, to how to make it stick. Irene Guo, CEO of Lion X Ventures, captured the mood among investors clearly.

“Enterprise AI is entering a phase where execution and measurable outcomes matter more than experimentation. Dyna.Ai differentiates itself through strong domain expertise, operational discipline, and the ability to deploy agentic AI within complex, regulated enterprise environments,” Guo noted.

That regulatory dimension is where the real friction lies for most institutions. Agentic AI–systems capable of autonomous decision-making and task execution within defined parameters–carries a different risk profile than a standard AI model generating recommendations. 

In banking and insurance, especially, those agents need to trigger workflows, update records, and handle documentation with full accountability trails. Getting that right requires more than good models; it requires governance architecture built into the product from the ground up.

Cynthia Siantar, Dyna.Ai’s Head of Investor Relations and General Manager for Singapore and Hong Kong, pointed to a clear shift in how enterprise buyers in the region are approaching this: “The focus has moved past pilots and experimentation to how AI can be deployed in day-to-day operations and deliver real outcomes.”

A market that’s ready

The macroeconomic backdrop supports the appetite. Southeast Asia’s AI market is projected to exceed US$16 billion by 2033, and the financial services sector–long constrained by legacy infrastructure and regulatory caution–is increasingly seen as one of the highest-value targets for agentic AI deployment.

The investor syndicate around this raise is itself telling. The involvement of a Korean financial institution alongside OCBC-advised capital and a Taiwan-listed tech company signals cross-border appetite that spans both the buy-side and the infrastructure side of the equation.

For the broader industry, Dyna.Ai’s Series A is a data point in a larger pattern: the era of AI pilots has a shrinking shelf life. Enterprises that cannot move from proof-of-concept to production–within the compliance frameworks their regulators demand–will increasingly look to specialists who can.

The pilots had their moment. Now comes the hard part.

(Photo by Dyna.Ai)

See also: Santander and Mastercard run Europe’s first AI-executed payment pilot

How financial institutions are embedding AI decision-making
https://www.artificialintelligence-news.com/news/how-financial-institutions-embedding-ai-decision-making/ – Wed, 18 Feb 2026

For leaders in the financial sector, the experimental phase of generative AI has concluded and the focus for 2026 is operational integration.

While early adoption centred on content generation and efficiency in isolated workflows, the current requirement is to industrialise these capabilities. The objective is to create systems where AI agents do not merely assist human operators, but actively run processes within strict governance frameworks.

This transition presents specific architectural and cultural challenges. It requires a move from disparate tools to joined-up systems that manage data signals, decision logic, and execution layers simultaneously.

Financial institutions integrate agentic AI workflows

The primary bottleneck in scaling AI within financial services is no longer the availability of models or creative application; it is coordination. Marketing and customer experience teams often struggle to convert decisions into action due to friction between legacy systems, compliance approvals, and data silos.

Saachin Bhatt, Co-Founder and COO at Brdge, notes the distinction between current tools and future requirements: “An assistant helps you write faster. A copilot helps teams move faster. Agents run processes.”

For enterprise architects, this means building what Bhatt terms a ‘Moments Engine’. This operating model functions through five distinct stages (a code sketch follows the list):

  • Signals: Detecting real-time events in the customer journey.
  • Decisions: Determining the appropriate algorithmic response.
  • Message: Generating communication aligned with brand parameters.
  • Routing: Automated triage to determine if human approval is required.
  • Action and learning: Deployment and feedback loop integration.
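
How these stages might chain together is easier to see in code. A minimal, runnable sketch, with every handler a hypothetical stub – Brdge’s actual implementation is not public:

    def detect_signal(event: dict) -> str:            # 1. Signals
        return event.get("type", "unknown")

    def decide(signal: str) -> dict:                  # 2. Decisions
        if signal == "salary_deposit":
            return {"action": "offer_savings_tip", "risk": "low"}
        return {"action": "none", "risk": "low"}

    def generate_message(decision: dict) -> str:      # 3. Message
        return f"[on-brand copy for {decision['action']}]"

    def needs_human_approval(decision: dict) -> bool: # 4. Routing
        return decision["risk"] != "low"

    def act_and_learn(message: str) -> str:           # 5. Action and learning
        return f"sent: {message}"                     # a real system feeds outcomes back

    event = {"type": "salary_deposit", "customer": "c-123"}
    decision = decide(detect_signal(event))
    if needs_human_approval(decision):
        print("queued for human review")
    else:
        print(act_and_learn(generate_message(decision)))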

Most organisations possess components of this architecture but lack the integration to make it function as a unified system. The technical goal is to reduce the friction that slows down customer interactions. This involves creating pipelines where data flows seamlessly from signal detection to execution, minimising latency while maintaining security.

Governance as infrastructure

In high-stakes environments like banking and insurance, speed cannot come at the cost of control. Trust remains the primary commercial asset. Consequently, governance must be treated as a technical feature rather than a bureaucratic hurdle.

The integration of AI into financial decision-making requires “guardrails” that are hard-coded into the system. This ensures that while AI agents can execute tasks autonomously, they operate within pre-defined risk parameters.
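
What a hard-coded guardrail can look like in practice: a minimal sketch, assuming illustrative action names and thresholds, in which every autonomous step is checked against pre-defined risk parameters before it executes:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class RiskPolicy:
        max_autonomous_value: float = 10_000.0  # hard ceiling for autonomous execution
        allowed_actions: frozenset = frozenset({"send_statement", "update_address"})

    def execute_with_guardrails(action: str, value: float, policy: RiskPolicy) -> str:
        if action not in policy.allowed_actions:
            return f"escalated: '{action}' requires human approval"
        if value > policy.max_autonomous_value:
            return f"blocked: {value} exceeds the autonomous limit"
        return f"executed: {action} ({value})"

    print(execute_with_guardrails("update_address", 0.0, RiskPolicy()))
    print(execute_with_guardrails("issue_refund", 50_000.0, RiskPolicy()))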

Farhad Divecha, Group CEO at Accuracast, suggests that creative optimisation must become a continuous loop where data-led insights feed innovation. However, this loop requires rigorous quality assurance workflows to ensure output never compromises brand integrity.

For technical teams, this implies a shift in how compliance is handled. Rather than a final check, regulatory requirements must be embedded into the prompt engineering and model fine-tuning stages.

“Legitimate interest is interesting, but it’s also where a lot of companies could trip up,” observes Jonathan Bowyer, former Marketing Director at Lloyds Banking Group. He argues that regulations like Consumer Duty help by forcing an outcome-based approach.

Technical leaders must work with risk teams to ensure AI-driven activity attests to brand values. This includes transparency protocols. Customers should know when they are interacting with an AI, and systems must provide a clear escalation path to human operators.

Data architecture for restraint

A common failure mode in personalisation engines is over-engagement. The technical capability to message a customer exists, but the logic to determine restraint is often missing. Effective personalisation relies on anticipation: knowing when to remain silent is as important as knowing when to speak.

Jonathan Bowyer points out that personalisation has moved to anticipation. “Customers now expect brands to know when not to speak to them as opposed to when to speak to them.”

This requires a data architecture capable of cross-referencing customer context across multiple channels – including branches, apps, and contact centres – in real-time. If a customer is in financial distress, a marketing algorithm pushing a loan product creates a disconnect that erodes trust. The system must be capable of detecting negative signals and suppressing standard promotional workflows.
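
A minimal sketch of that suppression logic, assuming an invented signal taxonomy:

    DISTRESS_SIGNALS = {"missed_payment", "hardship_enquiry", "open_complaint"}

    def should_contact(customer_signals: set, campaign_type: str) -> bool:
        """Return False when the right decision is to stay silent."""
        if campaign_type == "promotion" and customer_signals & DISTRESS_SIGNALS:
            return False  # e.g. never push a loan product at a customer in distress
        return True

    print(should_contact({"missed_payment"}, "promotion"))       # False: suppress
    print(should_contact({"missed_payment"}, "service_notice"))  # True: still send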

“The thing that kills trust is when you go to one channel and then move to another and have to answer the same questions all over again,” says Bowyer. Solving this requires unifying data stores so that the “memory” of the institution is accessible to every agent (whether digital or human) at the point of interaction.

The rise of generative search and SEO

In the age of AI, the discovery layer for financial products is changing. Traditional search engine optimisation (SEO) focused on driving traffic to owned properties. The emergence of AI-generated answers means that brand visibility now occurs off-site, within the interface of an LLM or AI search tool.

“Digital PR and off-site SEO is returning to focus because generative AI answers are not confined to content pulled directly from a company’s website,” notes Divecha.

For CIOs and CDOs, this changes how information is structured and published. Technical SEO must evolve to ensure that the data fed into large language models is accurate and compliant. 

Organisations that can confidently distribute high-quality information across the wider ecosystem gain reach without sacrificing control. This area, often termed ‘Generative Engine Optimisation’ (GEO), requires a technical strategy to ensure the brand is recommended and cited correctly by third-party AI agents.
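
One concrete GEO tactic is publishing schema.org structured data so external AI tools can parse and cite product facts accurately. A sketch with invented product details – real markup would need compliance review:

    import json

    product = {
        "@context": "https://schema.org",
        "@type": "FinancialProduct",
        "name": "Example Fixed-Rate Saver",
        "interestRate": 4.1,
        "provider": {"@type": "BankOrCreditUnion", "name": "Example Bank"},
    }
    # Embed the output in a <script type="application/ld+json"> tag on the page.
    print(json.dumps(product, indent=2))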

Structured agility

There is a misconception that agility equates to a lack of structure. In regulated industries, the opposite is true.

Agile methodologies require strict frameworks to function safely. Ingrid Sierra, Brand and Marketing Director at Zego, explains: “There’s often confusion between agility and chaos. Calling something ‘agile’ doesn’t make it okay for everything to be improvised and unstructured.”

For technical leadership, this means systemising predictable work to create capacity for experimentation. It involves creating safe sandboxes where teams can test new AI agents or data models without risking production stability.

Agility starts with mindset, requiring staff who are willing to experiment. However, this experimentation must be deliberate. It requires collaboration between technical, marketing, and legal teams from the outset.

This “compliance-by-design” approach allows for faster iteration because the parameters of safety are established before the code is written.

What’s next for AI in the financial sector?

Looking further ahead, the financial ecosystem will likely see direct interaction between AI agents acting on behalf of consumers and agents acting for institutions.

Melanie Lazarus, Ecosystem Engagement Director at Open Banking, warns: “We are entering a world where AI agents interact with each other, and that changes the foundations of consent, authentication, and authorisation.”

Tech leaders must begin architecting frameworks that protect customers in this agent-to-agent reality. This involves new protocols for identity verification and API security to ensure that an automated financial advisor acting for a client can securely interact with a bank’s infrastructure.
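
One low-level building block for that agent-to-agent trust is signature verification over request payloads. A minimal sketch using Python’s standard library, with key distribution and replay protection out of scope and all names assumed:

    import hashlib
    import hmac

    SHARED_KEY = b"key-provisioned-out-of-band"

    def sign(payload: bytes) -> str:
        return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()

    def verify(payload: bytes, signature: str) -> bool:
        return hmac.compare_digest(sign(payload), signature)  # constant-time compare

    request = b'{"agent": "client-advisor", "action": "fetch_balance"}'
    assert verify(request, sign(request))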

The mandate for 2026 is to turn the potential of AI into a reliable P&L driver. This requires a focus on infrastructure over hype, and leaders must prioritise:

  • Unifying data streams: Ensure signals from all channels feed into a central decision engine to enable context-aware actions.
  • Hard-coding governance: Embed compliance rules into the AI workflow to allow for safe automation.
  • Agentic orchestration: Move beyond chatbots to agents that can execute end-to-end processes.
  • Generative optimisation: Structure public data to be readable and prioritised by external AI search engines.

Success will depend on how well these technical elements are integrated with human oversight. The winning organisations will be those that use AI automation to enhance, rather than replace, the judgment that is especially required in sectors like financial services.

A handbook from Accuracast for CMOs is available here (registration required).

See also: Goldman Sachs deploys Anthropic systems with success

Infosys AI implementation framework offers business leaders guidance
https://www.artificialintelligence-news.com/news/infosys-ai-implementation-framework-offers-business-leaders-guidance/ – Wed, 18 Feb 2026

As a large provider of technology services operating in multiple industries, Infosys is one of the names that quickly come to mind when decision-makers consider possible providers of consultation on and practical implementation of any AI project – discrete or organisation-wide. Infosys delivers these services through its Topaz Fabric, leveraging its partnerships with specific AI technology providers.

It reports that it is currently working on AI implementations with 90% of its top 200 clients and has more than 4,600 AI projects in progress. The company’s strategy for AI implementation organisation-wide looks at six areas affected and considered during projects.

AI strategy and engineering focuses on designing and implementing AI strategies and architectures aligned to specific business objectives. These include the orchestration of AI agents, proprietary platforms, and third-party tools on infrastructure especially configured for AI workloads. An overarching strategy will lead to a consistent, enterprise AI-first operating model.

Data for AI addresses the preparation of enterprise data, covering structured and unstructured data; processes in this area include the development of AI-ready data platforms. Infosys refers to “AI-grade” data engineering practices such as data fingerprinting and synthetic training data services. The intention is to convert siloed data assets into reliable inputs for analytics and predictive systems.
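
Infosys does not publish its fingerprinting internals, but a common reading of the term is content hashing that catches duplicate or drifting records before they reach a model. A generic sketch of that idea, not Infosys’s implementation:

    import hashlib

    def fingerprint(record: dict) -> str:
        """Stable hash of a record's canonical form, usable for dedup and lineage."""
        canonical = "|".join(f"{k}={record[k]}" for k in sorted(record))
        return hashlib.sha256(canonical.encode()).hexdigest()[:16]

    a = {"id": 1, "name": "Acme", "region": "EU"}
    b = {"region": "EU", "name": "Acme", "id": 1}  # same content, different key order
    assert fingerprint(a) == fingerprint(b)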

Process AI concentrates on integrating AI agents into business processes, redesigning workflows if necessary so AI agents and human employees can work better together. The aim is to improve operational efficiency in general, regardless of business function.

Legacy modernisation applies AI agents in the analysis and interpretation of the existing technology stack, potentially reverse-engineering legacy systems to better stage AI modernisation projects. The overall aim is to reduce technical debt and offer greater responsiveness when AI is unleashed.

Physical AI extends into products and devices in the workplace. This involves embedding AI into hardware systems such as those that collect sensor data, interpret that data, and act in the physical world. This broad definition encompasses digital twins, robotics, autonomous systems, and edge computing. In short, it’s the integration of digital intelligence and physical operations.

AI trust covers governance, security, and ethics, and includes consideration of risk assessment frameworks, policy development, AI testing, and overall technology lifecycle management.

Lessons for business leaders

Although business leaders may already be in partnership with service providers other than Infosys, the company’s strategy of demarcating the necessary action areas for AI implementations offers significant value. The six areas described provide practical reference points that can be used in any organisation to plan projects or to monitor and assess ongoing implementation efforts.

Among these, data preparation is central. AI systems depend on data quality and consistency, so investment in data platforms, data governance, and engineering practices that support models is the foundation on which AI initiatives are built.

Embedding AI into workflows means it’s sometimes necessary to redesign the way employees work. Leaders should be aware of how AI agents and employees interact, and measure performance improvements. Changes can be made both to the technologies deployed and the working methods that have existed to date. If the latter, retraining and educating affected employees will be necessary, with accompanying costs.

The issue of legacy systems requires careful attention as many organisations operate complex estates that limit the agility necessary for AI to improve operations. AI tools themselves can help to analyse existing dependencies and even plan modernisation, implemented, ideally, over several stages or in separate sprints.

Physical operations intersect increasingly with digital systems. For companies with physical products, such as in manufacturing or logistics, embedding AI into devices and equipment can improve monitoring and devices’ responsiveness. This will require coordination between IT, OT, engineering, and operational teams, and line-of-business leaders should be consulted in particular.

Governance should accompany any scale of AI implementation. Risk assessment, security testing, security policy formulation, and the design of AI-specific guardrails should be established early on. Regulatory scrutiny of AI is increasing, particularly in sectors handling sensitive data, and statutory penalties apply for data loss or mismanagement, regardless of its source – AI or otherwise – in the enterprise. Clear accountability structures and documentation reduce these risks to operations and reputation.

Taken together, these areas indicate that AI implementation is organisational rather than purely technical. Success depends on leadership alignment, sustained investment, and realistic assessment of any capability gaps. Claims of rapid transformation should be treated cautiously, and durable results are more likely when strategy, data, process design, modernisation, operational integration, and governance are addressed in parallel.

(Image source: “Infosys, Bangalore, India” by theqspeaks is licensed under CC BY-NC-SA 2.0.)

Exclusive: Why are Chinese AI models dominating open-source as Western labs step back?
https://www.artificialintelligence-news.com/news/chinese-ai-models-175k-unprotected-systems-western-retreat/ – Mon, 09 Feb 2026

Because Western AI labs won’t—or can’t—anymore. As OpenAI, Anthropic, and Google face mounting pressure to restrict their most powerful models, Chinese developers have filled the open-source void with AI explicitly built for what operators need: powerful models that run on commodity hardware.

A new security study reveals just how thoroughly Chinese AI has captured this space. Research published by SentinelOne and Censys, mapping 175,000 exposed AI hosts across 130 countries over 293 days, shows Alibaba’s Qwen2 consistently ranking second only to Meta’s Llama in global deployment. More tellingly, the Chinese model appears on 52% of systems running multiple AI models—suggesting it’s become the de facto alternative to Llama.

“Over the next 12–18 months, we expect Chinese-origin model families to play an increasingly central role in the open-source LLM ecosystem, particularly as Western frontier labs slow or constrain open-weight releases,” Gabriel Bernadett-Shapiro, distinguished AI research scientist at SentinelOne, told TechForge Media’s AI News.

The finding arrives as OpenAI, Anthropic, and Google face regulatory scrutiny, safety review overhead, and commercial incentives pushing them toward API-gated releases rather than publishing model weights freely. The contrast with Chinese developers couldn’t be sharper.

Chinese labs have demonstrated what Bernadett-Shapiro calls “a willingness to publish large, high-quality weights that are explicitly optimised for local deployment, quantisation, and commodity hardware.”

“In practice, this makes them easier to adopt, easier to run, and easier to integrate into edge and residential environments,” he added.

Put simply: if you’re a researcher or developer wanting to run powerful AI on your own computer without a massive budget, Chinese models like Qwen2 are often your best—or only—option.

Pragmatics, not ideology

Alibaba’s Qwen2 consistently ranks second only to Meta’s Llama across 175,000 exposed hosts globally. Source: SentinelOne/Censys

The research shows this dominance isn’t accidental. Qwen2 maintains what Bernadett-Shapiro calls “zero rank volatility”—it holds the number two position across every measurement method the researchers examined: total observations, unique hosts, and host-days. There’s no fluctuation, no regional variation, just consistent global adoption.

The co-deployment pattern is equally revealing. When operators run multiple AI models on the same system—a common practice for comparison or workload segmentation—the pairing of Llama and Qwen2 appears on 40,694 hosts, representing 52% of all multi-family deployments.

Geographic concentration reinforces the picture. In China, Beijing alone accounts for 30% of exposed hosts, with Shanghai and Guangdong adding another 21% combined. In the United States, Virginia—reflecting AWS infrastructure density—represents 18% of hosts.

China and the US dominate exposed Ollama host distribution, with Beijing accounting for 30% of Chinese deployments. Source: SentinelOne/Censys

“If release velocity, openness, and hardware portability continue to diverge between regions, Chinese model lineages are likely to become the default for open deployments, not because of ideology, but because of availability and pragmatics,” Bernadett-Shapiro explained.

The governance problem

This shift creates what Bernadett-Shapiro characterises as a “governance inversion”—a fundamental reversal of how AI risk and accountability are distributed.

In platform-hosted services like ChatGPT, one company controls everything: it runs the infrastructure, monitors usage, implements safety controls, and can shut down abuse. With open-weight models, that control evaporates. Accountability diffuses across thousands of networks in 130 countries, while dependency concentrates upstream in a handful of model suppliers–increasingly Chinese ones.

The 175,000 exposed hosts operate entirely outside the control systems governing commercial AI platforms. There’s no centralised authentication, no rate limiting, no abuse detection, and critically, no kill switch if misuse is detected.

“Once an open-weight model is released, it is trivial to remove safety or security training,” Bernadett-Shapiro noted. “Frontier labs need to treat open-weight releases as long-lived infrastructure artefacts.”

A persistent backbone of 23,000 hosts showing 87% average uptime drives the majority of activity. These aren’t hobbyist experiments—they’re operational systems providing ongoing utility, often running multiple models simultaneously.

Perhaps most concerning: between 16% and 19% of the infrastructure couldn’t be attributed to any identifiable owner. “Even if we are able to prove that a model was leveraged in an attack, there are not well-established abuse reporting routes,” Bernadett-Shapiro said.

Security without guardrails

Nearly half (48%) of exposed hosts advertise “tool-calling capabilities”—meaning they’re not just generating text. They can execute code, access APIs, and interact with external systems autonomously.

“A text-only model can generate harmful content, but a tool-calling model can act,” Bernadett-Shapiro explained. “On an unauthenticated server, an attacker doesn’t need malware or credentials; they just need a prompt.”

Nearly half of exposed Ollama hosts have tool-calling capabilities that can execute code and access external systems. Source: SentinelOne/Censys

The highest-risk scenario involves what he calls “exposed, tool-enabled RAG or automation endpoints being driven remotely as an execution layer.” An attacker could simply ask the model to summarise internal documents, extract API keys from code repositories, or call downstream services the model is configured to access.

When paired with “thinking” models optimised for multi-step reasoning—present on 26% of hosts—the system can plan complex operations autonomously. The researchers identified at least 201 hosts running “uncensored” configurations that explicitly remove safety guardrails, though Bernadett-Shapiro notes this represents a lower bound.

In other words, these aren’t just chatbots—they’re AI systems that can take action, and half of them have no password protection.
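
Operators can check their own exposure cheaply. A defensive sketch, for auditing systems you own only: Ollama listens on port 11434 and answers /api/tags with its model list, so if that route responds from the network without credentials, the host is exposed:

    import urllib.error
    import urllib.request

    def is_exposed(host: str = "127.0.0.1", port: int = 11434) -> bool:
        try:
            with urllib.request.urlopen(f"http://{host}:{port}/api/tags", timeout=3) as r:
                return r.status == 200  # answered with no credentials supplied
        except (urllib.error.URLError, OSError):
            return False

    print("unauthenticated API reachable:", is_exposed())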

What frontier labs should do

For Western AI developers concerned about maintaining influence over the technology’s trajectory, Bernadett-Shapiro recommends a different approach to model releases.

“Frontier labs can’t control deployment, but they can shape the risks that they release into the world,” he said. That includes “investing in post-release monitoring of ecosystem-level adoption and misuse patterns” rather than treating releases as one-off research outputs.

The current governance model assumes centralised deployment with diffuse upstream supply—the exact opposite of what’s actually happening. “When a small number of lineages dominate what’s runnable on commodity hardware, upstream decisions get amplified everywhere,” he explained. “Governance strategies must acknowledge that inversion.”

But acknowledgement requires visibility. Currently, most labs releasing open-weight models have no systematic way to track how they’re being used, where they’re deployed, or whether safety training remains intact after quantisation and fine-tuning.

The 12-18 month outlook

Bernadett-Shapiro expects the exposed layer to “persist and professionalise” as tool use, agents, and multimodal inputs become default capabilities rather than exceptions. The transient edge will keep churning as hobbyists experiment, but the backbone will grow more stable, more capable, and handle more sensitive data.

Enforcement will remain uneven because residential and small VPS deployments don’t map to existing governance controls. “This isn’t a misconfiguration problem,” he emphasised. “We are observing the early formation of a public, unmanaged AI compute substrate. There is no central switch to flip.”

The geopolitical dimension adds urgency. “When most of the world’s unmanaged AI compute depends on models released by a handful of non-Western labs, traditional assumptions about influence, coordination, and post-release response become weaker,” Bernadett-Shapiro said.

For Western developers and policymakers, the implication is stark: “Even perfect governance of their own platforms has limited impact on the real-world risk surface if the dominant capabilities live elsewhere and propagate through open, decentralised infrastructure.”

The open-source AI ecosystem is globalising, but its centre of gravity is shifting decisively eastward. Not through any coordinated strategy, but through the practical economics of who’s willing to publish what researchers and operators actually need to run AI locally.

The 175,000 exposed hosts mapped in this study are just the visible surface of that fundamental realignment—one that Western policymakers are only beginning to recognise, let alone address.

See also: Huawei details open-source AI development roadmap at Huawei Connect 2025

From blogosphere to the AI & Big Data Expo: Rackspace and operational AI
https://www.artificialintelligence-news.com/news/combing-the-rackspace-blogfiles-for-operational-ai-pointers/ – Wed, 04 Feb 2026

In its recent blog output, Rackspace refers to the bottlenecks familiar to many readers: messy data, unclear ownership, governance gaps, and the cost of running models once they become part of production. The company frames them through the lens of service delivery, security operations, and cloud modernisation, which tells you where it is putting its own effort.

One of the clearest examples of operational AI inside Rackspace sits in its security business. In late January, the company described RAIDER (Rackspace Advanced Intelligence, Detection and Event Research) as a custom back-end platform built for its internal cyber defence centre. With security teams working amid floods of alerts and logs, standard detection engineering doesn’t scale if it depends on manually written security rules.

Rackspace says RAIDER unifies threat intelligence with detection engineering workflows and uses its AI Security Engine (RAISE) and LLMs to automate detection rule creation, generating detection criteria it describes as “platform-ready” in line with known frameworks such as MITRE ATT&CK. The company claims it has cut detection development time by more than half and reduced mean time to detect and respond. This is exactly the kind of internal process change that matters.
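
Rackspace has not published RAIDER’s internals, but the pattern it describes – an LLM drafts a detection rule from a threat-intel note, and the pipeline validates the draft before it counts as “platform-ready” – can be sketched generically. Here llm_complete stands in for whatever model client is in use:

    import re

    PROMPT = ("Write a Sigma detection rule for: {intel}. "
              "Tag it with the relevant MITRE ATT&CK technique ID (e.g. T1059).")

    def draft_detection_rule(intel: str, llm_complete) -> str:
        rule = llm_complete(PROMPT.format(intel=intel))
        if not re.search(r"\bT\d{4}(\.\d{3})?\b", rule):
            raise ValueError("draft rejected: no ATT&CK technique mapping found")
        return rule  # still routed to a human detection engineer for review

    # Example with a canned stand-in model:
    print(draft_detection_rule("PowerShell download cradle",
                               lambda p: "detection: ... tags: [T1059.001]"))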

The company also positions agentic AI as a way of taking the friction out of complex engineering programmes. A January post on modernising VMware environments on AWS describes a model in which AI agents handle data-intensive analysis and many repetitive tasks, yet “architectural judgement, governance and business decisions” remain in the human domain. Rackspace presents this workflow as stopping senior engineers from being sidelined into migration projects. The article states the target is to keep day-two operations in scope – the point where many migration plans fail, as teams discover they have modernised infrastructure but not operating practices.

Elsewhere the company sets out a picture of AI-supported operations where monitoring becomes more predictive, routine incidents are handled by bots and automation scripts, and telemetry (plus historical data) is used to spot patterns and, in turn, recommend fixes. This is conventional AIOps language, but Rackspace is tying it to managed services delivery, suggesting the company uses AI to reduce the cost of labour in operational pipelines in addition to the more familiar use of AI in customer-facing environments.

In a post describing AI-enabled operations, the company stresses the importance of a focused strategy, governance, and operating models. It specifies the machinery needed to industrialise AI, such as choosing infrastructure based on whether workloads involve training, fine-tuning, or inference. Many tasks are relatively lightweight and can run inference locally on existing hardware.

The company notes four recurring barriers to AI adoption, most notably fragmented and inconsistent data, and it recommends investment in integration and data management so models have consistent foundations. This is not an opinion unique to Rackspace, of course, but having it writ large by a big, technology-first player is illustrative of the issues faced by many enterprise-scale AI deployments.

A company of even greater size, Microsoft, is working to coordinate autonomous agents’ work across systems. Copilot has evolved into an orchestration layer, and in Microsoft’s ecosystem, multi-step task execution and broader model choice do exist. However, it’s noteworthy that Rackspace calls out Redmond on the point that productivity gains only arrive when identity, data access, and oversight are firmly embedded in operations.

Rackspace’s near-term AI plan comprises AI-assisted security engineering, agent-supported modernisation, and AI-augmented service management. Its future plans can perhaps be discerned in a January article on the company’s blog concerning private cloud AI trends. In it, the author argues that inference economics and governance will drive architecture decisions well into 2026, anticipating ‘bursty’ exploration in public clouds while inference tasks move into private clouds on the grounds of cost stability and compliance. That’s a roadmap for operational AI grounded in budget and audit requirements, not novelty.

For decision-makers trying to accelerate their own deployments, the useful takeaway is that Rackspace treats AI as an operational discipline. The concrete, published examples it gives are those that reduce cycle time in repeatable work. Readers may accept the company’s direction and still be wary of its claimed metrics. The steps to take inside a growing business are to identify repeating processes, examine where strict oversight is necessary because of data governance, and work out where inference costs might be reduced by bringing some processing in-house.

(Image source: Pixabay)

How separating logic and search boosts AI agent scalability
https://www.artificialintelligence-news.com/news/how-separating-logic-and-search-boosts-ai-agent-scalability/ – Fri, 06 Feb 2026

Separating logic from inference improves AI agent scalability by decoupling core workflows from execution strategies.

The transition from generative AI prototypes to production-grade agents introduces a specific engineering hurdle: reliability. LLMs are stochastic by nature. A prompt that works once may fail on the second attempt. To mitigate this, development teams often wrap core business logic in complex error-handling loops, retries, and branching paths.

This approach creates a maintenance problem. The code defining what an agent should do becomes inextricably mixed with the code defining how to handle the model’s unpredictability. A new framework proposed by researchers from Asari AI, MIT CSAIL, and Caltech suggests a different architectural standard is required to scale agentic workflows in the enterprise.

The research introduces a programming model called Probabilistic Angelic Nondeterminism (PAN) and a Python implementation named ENCOMPASS. This method allows developers to write the “happy path” of an agent’s workflow while relegating inference-time strategies (e.g. beam search or backtracking) to a separate runtime engine. This separation of concerns offers a potential route to reduce technical debt while improving the performance of automated tasks.

The entanglement problem in agent design

Current approaches to agent programming often conflate two distinct design aspects. The first is the core workflow logic, or the sequence of steps required to complete a business task. The second is the inference-time strategy, which dictates how the system navigates uncertainty, such as generating multiple drafts or verifying outputs against a rubric.

When these are combined, the resulting codebase becomes brittle. Implementing a strategy like “best-of-N” sampling requires wrapping the entire agent function in a loop. Moving to a more complex strategy, such as tree search or refinement, typically requires a complete structural rewrite of the agent’s code.

The researchers argue that this entanglement limits experimentation. If a development team wants to switch from simple sampling to a beam search strategy to improve accuracy, they often must re-engineer the application’s control flow. This high cost of experimentation means teams frequently settle for suboptimal reliability strategies to avoid engineering overhead.

Decoupling logic from search to boost AI agent scalability

The ENCOMPASS framework addresses this by allowing programmers to mark “locations of unreliability” within their code using a primitive called branchpoint().

These markers indicate where an LLM call occurs and where execution might diverge. The developer writes the code as if the operation will succeed. At runtime, the framework interprets these branch points to construct a search tree of possible execution paths.

This architecture enables what the authors term “program-in-control” agents. Unlike “LLM-in-control” systems, where the model decides the entire sequence of operations, program-in-control agents operate within a workflow defined by code. The LLM is invoked only to perform specific subtasks. This structure is generally preferred in enterprise environments for its higher predictability and auditability compared to fully autonomous agents.

By treating inference strategies as a search over execution paths, the framework allows developers to apply different algorithms – such as depth-first search, beam search, or Monte Carlo tree search – without altering the underlying business logic.
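
The paper’s exact API is not reproduced here, but the shape of the idea can be sketched. In the stub below, branchpoint and llm_translate are no-op stand-ins so the code runs as written; in the real framework the runtime may fork execution at each marker:

    def branchpoint() -> None:
        """In the real framework, the runtime may fork the execution path here."""

    def llm_translate(java: str) -> str:   # stand-in for an unreliable LLM call
        return "# python translation of: " + java

    def translate_file(java_source: str) -> str:
        branchpoint()                      # mark the unreliable step...
        draft = llm_translate(java_source) # ...then write the happy path as normal
        branchpoint()                      # a second possible divergence point
        return draft

    print(translate_file("int x = 1;"))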

Impact on legacy migration and code translation

The utility of this approach is evident in complex workflows such as legacy code migration. The researchers applied the framework to a Java-to-Python translation agent. The workflow involved translating a repository file-by-file, generating inputs, and validating the output through execution.

In a standard Python implementation, adding search logic to this workflow required defining a state machine. This process obscured the business logic and made the code difficult to read or lint. Implementing beam search required the programmer to break the workflow into individual steps and explicitly manage state across a dictionary of variables.

Using the proposed framework to boost AI agent scalability, the team implemented the same search strategies by inserting branchpoint() statements before LLM calls. The core logic remained linear and readable. The study found that applying beam search at both the file and method level outperformed simpler sampling strategies.

The data indicates that separating these concerns allows for better scaling laws. Performance improved linearly with the logarithm of the inference cost. The most effective strategy found – fine-grained beam search – was also the one that would have been most complex to implement using traditional coding methods.

Cost efficiency and performance scaling

Controlling the cost of inference is a primary concern for data officers managing P&L for AI projects. The research demonstrates that sophisticated search algorithms can yield better results at a lower cost compared to simply increasing the number of feedback loops.

In a case study involving the “Reflexion” agent pattern (where an LLM critiques its own output), the researchers compared scaling the number of refinement loops against using a best-first search algorithm. The search-based approach achieved comparable performance to the standard refinement method but at a reduced cost per task.

This finding suggests that the choice of inference strategy is a factor for cost optimisation. By externalising this strategy, teams can tune the balance between compute budget and required accuracy without rewriting the application. A low-stakes internal tool might use a cheap and greedy search strategy, while a customer-facing application could use a more expensive and exhaustive search, all running on the same codebase.
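
A toy illustration of that tuning, which mimics the separation rather than reproducing the ENCOMPASS engine: the same workflow runs under a cheap greedy strategy or a pricier best-of-N search, chosen at call time:

    import random

    def flaky_step(text: str) -> tuple:
        """Stand-in for an LLM call that returns a candidate and a verifiable score."""
        quality = random.random()
        return (f"candidate<{quality:.2f}> for {text!r}", quality)

    def run(workflow, text: str, strategy: str = "greedy", n: int = 8) -> str:
        if strategy == "greedy":      # one attempt: cheap internal tooling
            return workflow(text)[0]
        if strategy == "best_of_n":   # n sampled paths: customer-facing accuracy
            return max((workflow(text) for _ in range(n)), key=lambda c: c[1])[0]
        raise ValueError(strategy)

    print(run(flaky_step, "int x = 1;", strategy="best_of_n", n=4))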

Adopting this architecture requires a change in how development teams view agent construction. The framework is designed to work in conjunction with existing libraries such as LangChain, rather than replacing them. It sits at a different layer of the stack, managing control flow rather than prompt engineering or tool interfaces.

However, the approach is not without engineering challenges. The framework reduces the code required to implement search, but it does not automate the design of the agent itself. Engineers must still identify the correct locations for branch points and define verifiable success metrics.

The effectiveness of any search capability relies on the system’s ability to score a specific path. In the code translation example, the system could run unit tests to verify correctness. In more subjective domains, such as summarisation or creative generation, defining a reliable scoring function remains a bottleneck.

Furthermore, the model relies on the ability to copy the program’s state at branching points. While the framework handles variable scoping and memory management, developers must ensure that external side effects – such as database writes or API calls – are managed correctly to prevent duplicate actions during the search process.
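
A minimal sketch of one such safeguard – an idempotency guard keyed on the action, with the key scheme assumed:

    executed = set()

    def write_once(key: str, action) -> None:
        """Perform a side-effecting action at most once across replayed paths."""
        if key in executed:
            return                     # replayed path: skip the duplicate write
        executed.add(key)
        action()

    write_once("migrate:file42:insert", lambda: print("insert row for file42"))
    write_once("migrate:file42:insert", lambda: print("never printed"))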

Implications for AI agent scalability

The change represented by PAN and ENCOMPASS aligns with broader software engineering principles of modularity. As agentic workflows become core to operations, maintaining them will require the same rigour applied to traditional software.

Hard-coding probabilistic logic into business applications creates technical debt. It makes systems difficult to test, difficult to audit, and difficult to upgrade. Decoupling the inference strategy from the workflow logic allows for independent optimisation of both.

This separation also facilitates better governance. If a specific search strategy yields hallucinations or errors, it can be adjusted globally without assessing every individual agent’s codebase. It simplifies the versioning of AI behaviours, a requirement for regulated industries where the “how” of a decision is as important as the outcome.

The research indicates that as inference-time compute scales, the complexity of managing execution paths will increase. Enterprise architectures that isolate this complexity will likely prove more durable than those that permit it to permeate the application layer.

See also: Intuit, Uber, and State Farm trial AI agents inside enterprise workflows

China’s hyperscalers bet billions on agentic AI as commerce becomes the new battleground
https://www.artificialintelligence-news.com/news/china-hyperscalers-agentic-ai-commerce-battleground/ – Fri, 30 Jan 2026

The post China’s hyperscalers bet billions on agentic AI as commerce becomes the new battleground appeared first on AI News.

]]>
The artificial intelligence industry’s pivot toward agentic AI – systems capable of autonomously executing multi-step tasks – has dominated technology discussions in recent months.

But while Western firms focus on foundational models and cross-platform interoperability, China’s technology giants are racing to dominate through commerce integration, a divergence that could reshape how enterprises deploy autonomous systems globally.

Alibaba, Tencent and ByteDance have rapidly upgraded their AI platforms to support agentic commerce, marking a pivot from conversational AI tools to agents capable of completing entire transaction cycles, from product discovery through payment.

Just last week, Alibaba upgraded its Qwen chatbot to enable direct transaction completion within the chatbot interface, connecting the AI agent to services across its ecosystem, including Taobao, Alipay, Amap, and travel platform Fliggy. The integration supports over 400 core digital tasks, allowing users to compare personalised recommendations across platforms and complete payments without leaving the chatbot environment.

“The agentic transformation of commercial services lets the maximal integration of user services and enhances user stickiness,” Shaochen Wang, research analyst at Counterpoint Research, told CNBC, referring to stronger long-term user engagement that creates sustainable competitive advantages.

The super app advantage

Before that, ByteDance upgraded its Doubao AI chatbot in December to autonomously handle tasks, including ticket bookings, through integrations with Douyin, the Chinese version of TikTok. The upgraded model was introduced on a ZTE-developed prototype smartphone as a system-level AI assistant; however, some planned features were later scaled back due to privacy and security concerns raised by rivals.

Tencent President Martin Lau indicated during the company’s May 2025 earnings call that AI agents could become core components of the WeChat ecosystem, which serves over one billion users with integrated messaging, payments, e-commerce and services.

The positioning reflects China’s structural advantage in agentic AI deployment: integrated ecosystems that eliminate the fragmentation constraining Western competitors.

“AI agents will be foundational to the evolution of super apps, with success depending on deep integration in payments, logistics, and social engagement,” Charlie Dai, VP and principal analyst at Forrester, told CNBC. “Chinese firms like Alibaba, Tencent and ByteDance all benefit from integrated ecosystems, rich behavioural data, and consumer familiarity with super apps.”

Western companies face more fragmented data environments and stricter privacy regulations that slow cross-service integration, despite leading in foundational AI model development and global reach, Dai noted.

Agentic AI’s enterprise trajectory

Commercial applications signal broader enterprise implications as agentic AI moves from auxiliary tools to autonomous actors capable of executing complex workflows. Industry experts expect multi-agent systems to emerge as a defining trend in AI deployment this year, extending from consumer services into organisational production.

In a report by Global Times, Tian Feng, president of the Fast Think Institute and former dean of SenseTime’s Intelligence Industry Research Institute, predicted that the first AI agent to surpass 300 million monthly active users could emerge as early as 2026, becoming “an indispensable assistant for work and daily life” capable of autonomously executing cross-app, composite services.

Approximately half of all consumers already use AI when searching online, according to a 2025 McKinsey study. The research firm estimated that AI agents could generate more than $1 trillion in economic value for US businesses by 2030 through streamlining routine steps in consumer decision-making.

Chinese cloud providers, including smaller players like JD Cloud and UCloud, have also begun supporting agentic AI tools, though high token use has driven some providers, like ByteDance’s Volcano Engine, to introduce fixed-subscription pricing models to address cost concerns.

Divergent deployment strategies

The contrasting approaches between Chinese integration and Western scalability reflect fundamental differences in market structure and regulatory environments that will likely define competitive positioning.

“China will prioritise domestic integration and expansion in selected regions, while US firms focus on global scalability and governance,” Dai said.

US players pursuing agentic commerce include OpenAI, Perplexity, and Amazon, while Google explores positioning itself as a “matchmaker” between merchants, consumers and AI agents – approaches that reflect fragmented platform environments requiring interoperability rather than closed-loop integration.

However, the autonomous nature of agentic systems has raised regulatory questions in China. ByteDance warned users about security and privacy risks when announcing Doubao’s abilities, recommending deployment on dedicated devices rather than those containing sensitive information, given the tool’s access to device data, digital accounts, and internet connectivity across multiple ports.

The rapid commercialisation of agentic AI in China’s consumer sector provides enterprise decision-makers globally with early signals of how autonomous systems may reshape customer acquisition costs, platform economics and competitive moats as these abilities mature.

(Photo by Philip Oroni)

See also: Deloitte sounds alarm as AI agent deployment outruns safety frameworks

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post China’s hyperscalers bet billions on agentic AI as commerce becomes the new battleground appeared first on AI News.

]]>
Meeting the new ETSI standard for AI security https://www.artificialintelligence-news.com/news/meeting-the-new-etsi-standard-for-ai-security/ Thu, 15 Jan 2026 13:23:47 +0000 https://www.artificialintelligence-news.com/?p=111604 The ETSI EN 304 223 standard introduces baseline security requirements for AI that enterprises must integrate into governance frameworks. As organisations embed machine learning into their core operations, this European Standard (EN) establishes concrete provisions for securing AI models and systems. It stands as the first globally applicable European Standard for AI cybersecurity, having secured […]

The post Meeting the new ETSI standard for AI security appeared first on AI News.

]]>
The ETSI EN 304 223 standard introduces baseline security requirements for AI that enterprises must integrate into governance frameworks.

As organisations embed machine learning into their core operations, this European Standard (EN) establishes concrete provisions for securing AI models and systems. It stands as the first globally applicable European Standard for AI cybersecurity, having secured formal approval from National Standards Organisations to strengthen its authority across international markets.

The standard serves as a necessary benchmark alongside the EU AI Act. It addresses the reality that AI systems possess specific risks – such as susceptibility to data poisoning, model obfuscation, and indirect prompt injection – that traditional software security measures often miss. The standard covers deep neural networks and generative AI through to basic predictive systems, explicitly excluding only those used strictly for academic research.

ETSI standard clarifies the chain of responsibility for AI security

A persistent hurdle in enterprise AI adoption is determining who owns the risk. The ETSI standard resolves this by defining three primary technical roles: Developers, System Operators, and Data Custodians.

For many enterprises, these lines blur. A financial services firm that fine-tunes an open-source model for fraud detection counts as both a Developer and a System Operator. This dual status triggers strict obligations, requiring the firm to secure the deployment infrastructure while documenting the provenance of its training data and auditing the model’s design.

The inclusion of ‘Data Custodians’ as a distinct stakeholder group directly impacts Chief Data and Analytics Officers (CDAOs). These entities control data permissions and integrity, a role that now carries explicit security responsibilities. Custodians must ensure that the intended usage of a system aligns with the sensitivity of the training data, effectively placing a security gatekeeper within the data management workflow.

ETSI’s AI standard makes clear that security cannot be an afterthought appended at the deployment stage. During the design phase, organisations must conduct threat modelling that addresses AI-native attacks, such as membership inference and model obfuscation.

One provision requires developers to restrict functionality to reduce the attack surface. For instance, if a system uses a multi-modal model but only requires text processing, the unused modalities (like image or audio processing) represent a risk that must be managed. This requirement forces technical leaders to reconsider the common practice of deploying massive, general-purpose foundation models where a smaller and more specialised model would suffice.
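
In practice, that can be as simple as making enabled modalities an explicit deployment setting rather than inheriting the model’s defaults. The sketch below is illustrative only – the configuration keys and model identifier are invented, not drawn from the standard:

```python
# Illustrative deployment configuration: enable only what the use case needs.
DEPLOYMENT = {
    "model": "general-multimodal-v1",  # hypothetical model identifier
    "modalities": {"text"},            # image and audio stay off: smaller attack surface
}

def check_modality(modality: str, config: dict = DEPLOYMENT) -> None:
    """Reject inputs for modalities this deployment has not enabled."""
    if modality not in config["modalities"]:
        raise PermissionError(f"modality '{modality}' is disabled in this deployment")
```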

The document also enforces strict asset management. Developers and System Operators must maintain a comprehensive inventory of assets, including interdependencies and connectivity. This supports shadow AI discovery; IT leaders cannot secure models they do not know exist. The standard also requires the creation of specific disaster recovery plans tailored to AI attacks, ensuring that a “known good state” can be restored if a model is compromised.

Supply chain security presents an immediate friction point for enterprises relying on third-party vendors or open-source repositories. The ETSI standard requires that if a System Operator chooses to use AI models or components that are not well-documented, they must justify that decision and document the associated security risks.

Practically, procurement teams can no longer accept “black box” solutions. Developers are required to provide cryptographic hashes for model components to verify authenticity. Where training data is sourced publicly (a common practice for Large Language Models), developers must document the source URL and acquisition timestamp. This audit trail is necessary for post-incident investigations, particularly when attempting to determine whether a model was subjected to data poisoning during its training phase.
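
A minimal sketch of what that verification looks like in practice follows; the manifest format is an illustrative assumption rather than anything prescribed by the standard, but the hashing itself uses Python’s standard library:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a model artefact through SHA-256 without loading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_components(manifest: dict[str, str]) -> dict[str, bool]:
    """Compare each artefact on disk against the developer-published digest.

    `manifest` maps file paths to expected hex digests, e.g. as shipped
    alongside a model release (an assumed convention for this sketch).
    """
    return {path: sha256_of(path) == expected for path, expected in manifest.items()}
```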

If an enterprise offers an API to external customers, it must apply controls designed to mitigate AI-focused attacks, such as rate limiting to prevent adversaries from reverse-engineering the model or overwhelming defences to inject poisoned data.
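
A token bucket is one common way to implement that control; the sketch below is a generic pattern, not a mechanism the standard prescribes:

```python
import time

class TokenBucket:
    """Per-client rate limiter: refuses queries once the budget is spent."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # deny: the client is probing faster than its budget allows

# e.g. one bucket per API key: TokenBucket(rate_per_sec=2, burst=10)
```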

The lifecycle approach extends into the maintenance phase, where the standard treats major updates – such as retraining on new data – as the deployment of a new version. Under the ETSI AI standard, this triggers a requirement for renewed security testing and evaluation.

Continuous monitoring is also formalised. System Operators must analyse logs not just for uptime, but to detect “data drift” or gradual changes in behaviour that could indicate a security breach. This moves AI monitoring from a performance metric to a security discipline.
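
As a sketch of what that discipline might look like in code – the choice of statistic is ours for illustration; the standard does not mandate one – a two-sample Kolmogorov–Smirnov test (via SciPy) over logged feature values can flag a shifted distribution:

```python
from scipy.stats import ks_2samp

def drift_alert(baseline, recent, alpha: float = 0.01):
    """Flag a feature whose recent values have drifted from the baseline.

    `baseline` and `recent` are 1-D sequences of logged feature values;
    a small p-value suggests a shift worth a security review, not merely
    a performance investigation.
    """
    stat, p_value = ks_2samp(baseline, recent)
    return p_value < alpha, stat
```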

The standard also addresses the “End of Life” phase. When a model is decommissioned or transferred, organisations must involve Data Custodians to ensure the secure disposal of data and configuration details. This provision prevents the leakage of sensitive intellectual property or training data through discarded hardware or forgotten cloud instances.

Executive oversight and governance

Compliance with ETSI EN 304 223 requires a review of existing cybersecurity training programmes. The standard mandates that training be tailored to specific roles, ensuring that developers understand secure coding for AI while general staff remain aware of threats like social engineering via AI outputs.

“ETSI EN 304 223 represents an important step forward in establishing a common, rigorous foundation for securing AI systems”, said Scott Cadzow, Chair of ETSI’s Technical Committee for Securing Artificial Intelligence.

“At a time when AI is being increasingly integrated into critical services and infrastructure, the availability of clear, practical guidance that reflects both the complexity of these technologies and the realities of deployment cannot be underestimated. The work that went into delivering this framework is the result of extensive collaboration and it means that organisations can have full confidence in AI systems that are resilient, trustworthy, and secure by design.”

Implementing these baselines in ETSI’s AI security standard provides a structure for safer innovation. By enforcing documented audit trails, clear role definitions, and supply chain transparency, enterprises can mitigate the risks associated with AI adoption while establishing a defensible position for future regulatory audits.

An upcoming Technical Report (ETSI TR 104 159) will apply these principles specifically to generative AI, targeting issues like deepfakes and disinformation.

See also: Allister Frost: Tackling workforce anxiety for AI integration success


Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post Meeting the new ETSI standard for AI security appeared first on AI News.

]]>
AI medical diagnostics race intensifies as OpenAI, Google, and Anthropic launch competing healthcare tools https://www.artificialintelligence-news.com/news/medical-ai-diagnostics-openai-google-anthropic/ Thu, 15 Jan 2026 07:00:00 +0000 https://www.artificialintelligence-news.com/?p=111592 OpenAI, Google, and Anthropic announced specialised medical AI capabilities within days of each other this month, a clustering that suggests competitive pressure rather than coincidental timing. Yet none of the releases are cleared as medical devices, approved for clinical use, or available for direct patient diagnosis—despite marketing language emphasising healthcare transformation. OpenAI introduced ChatGPT Health on January […]

The post AI medical diagnostics race intensifies as OpenAI, Google, and Anthropic launch competing healthcare tools appeared first on AI News.

]]>
OpenAI, Google, and Anthropic announced specialised medical AI capabilities within days of each other this month, a clustering that suggests competitive pressure rather than coincidental timing. Yet none of the releases are cleared as medical devices, approved for clinical use, or available for direct patient diagnosis—despite marketing language emphasising healthcare transformation.

OpenAI introduced ChatGPT Health on January 7, allowing US users to connect medical records through partnerships with b.well, Apple Health, Function, and MyFitnessPal. Google released MedGemma 1.5 on January 13, expanding its open medical AI model to interpret three-dimensional CT and MRI scans alongside whole-slide histopathology images. 

Anthropic followed on January 11 with Claude for Healthcare, offering HIPAA-compliant connectors to CMS coverage databases, ICD-10 coding systems, and the National Provider Identifier Registry.

All three companies are targeting the same workflow pain points—prior authorisation reviews, claims processing, clinical documentation—with similar technical approaches but different go-to-market strategies.

Developer platforms, not diagnostic products

The architectural similarities are notable. Each system uses multimodal large language models fine-tuned on medical literature and clinical datasets. Each emphasises privacy protections and regulatory disclaimers. Each positions itself as supporting rather than replacing clinical judgment.

The differences lie in deployment and access models. OpenAI’s ChatGPT Health operates as a consumer-facing service with a waitlist for ChatGPT Free, Plus, and Pro subscribers outside the EEA, Switzerland, and the UK. Google’s MedGemma 1.5 releases as an open model through its Health AI Developer Foundations program, available for download via Hugging Face or deployment through Google Cloud’s Vertex AI. 

Anthropic’s Claude for Healthcare integrates into existing enterprise workflows through Claude for Enterprise, targeting institutional buyers rather than individual consumers. The regulatory positioning is consistent across all three. 

OpenAI states explicitly that Health “is not intended for diagnosis or treatment.” Google positions MedGemma as “starting points for developers to evaluate and adapt to their medical use cases.” Anthropic emphasises that outputs “are not intended to directly inform clinical diagnosis, patient management decisions, treatment recommendations, or any other direct clinical practice applications.”

Benchmark performance vs clinical validation

Medical AI benchmark results improved substantially across all three releases, though the gap between test performance and clinical deployment remains significant. Google reports that MedGemma 1.5 achieved 92.3% accuracy on MedAgentBench, Stanford’s medical agent task completion benchmark, compared to 69.6% for the previous Sonnet 3.5 baseline. 

The model improved by 14 percentage points on MRI disease classification and 3 percentage points on CT findings in internal testing. Anthropic’s Claude Opus 4.5 scored 61.3% on MedCalc medical calculation accuracy tests with Python code execution enabled, and 92.3% on MedAgentBench. 

The company also claims improvements in “honesty evaluations” related to factual hallucinations, though specific metrics were not disclosed. 

OpenAI has not published benchmark comparisons for ChatGPT Health specifically, noting instead that “over 230 million people globally ask health and wellness-related questions on ChatGPT every week” based on de-identified analysis of existing usage patterns.

These benchmarks measure performance on curated test datasets, not clinical outcomes in practice. Medical errors can have life-threatening consequences, which makes translating benchmark accuracy into clinical utility more complex than in other AI application domains.

Regulatory pathway remains unclear

The regulatory framework for these medical AI tools remains ambiguous. In the US, the FDA’s oversight depends on intended use. Software that “supports or provides recommendations to a health care professional about prevention, diagnosis, or treatment of a disease” may require premarket review as a medical device. None of the announced tools has FDA clearance.

Liability questions are similarly unresolved. When Banner Health’s CTO Mike Reagin states that the health system was “drawn to Anthropic’s focus on AI safety,” this addresses technology selection criteria, not legal liability frameworks. 

If a clinician relies on Claude’s prior authorisation analysis and a patient suffers harm from delayed care, existing case law provides limited guidance on responsibility allocation.

Regulatory approaches vary significantly across markets. While the FDA and Europe’s Medical Device Regulation provide established frameworks for software as a medical device, many APAC regulators have not issued specific guidance on generative AI diagnostic tools. 

This regulatory ambiguity affects adoption timelines in markets where healthcare infrastructure gaps might otherwise accelerate implementation—creating a tension between clinical need and regulatory caution.

Administrative workflows, not clinical decisions

Real deployments remain carefully scoped. Novo Nordisk’s Louise Lind Skov, Director of Content Digitalisation, described using Claude for “document and content automation in pharma development,” focused on regulatory submission documents rather than patient diagnosis. 

Taiwan’s National Health Insurance Administration applied MedGemma to extract data from 30,000 pathology reports for policy analysis, not treatment decisions.

The pattern suggests institutional adoption is concentrating on administrative workflows where errors are less immediately dangerous—billing, documentation, protocol drafting—rather than direct clinical decision support where medical AI capabilities would have the most dramatic impact on patient outcomes.

Medical AI capabilities are advancing faster than the institutions deploying them can navigate regulatory, liability, and workflow integration complexities. The technology exists. The US$20 monthly subscription provides access to sophisticated medical reasoning tools. 

Whether that translates to transformed healthcare delivery depends on questions these coordinated announcements leave unaddressed.

See also: AstraZeneca bets on in-house AI to speed up oncology research

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post AI medical diagnostics race intensifies as OpenAI, Google, and Anthropic launch competing healthcare tools appeared first on AI News.

]]>
Why Apple chose Google over OpenAI: What enterprise AI buyers can learn from the Gemini deal https://www.artificialintelligence-news.com/news/apple-gemini-siri-enterprise-foundation-models/ Tue, 13 Jan 2026 07:00:00 +0000 https://www.artificialintelligence-news.com/?p=111572 Apple’s multi-year agreement to integrate Google’s Gemini models into its revamped Siri offers a rare window into how one of the world’s most selective technology companies evaluates foundation models – and the criteria should matter to any enterprise weighing similar decisions. The stakes were considerable. Apple had been publicly integrating ChatGPT into its devices since […]

The post Why Apple chose Google over OpenAI: What enterprise AI buyers can learn from the Gemini deal appeared first on AI News.

]]>
Apple’s multi-year agreement to integrate Google’s Gemini models into its revamped Siri offers a rare window into how one of the world’s most selective technology companies evaluates foundation models – and the criteria should matter to any enterprise weighing similar decisions.

The stakes were considerable. Apple had been publicly integrating ChatGPT into its devices since late 2024, giving OpenAI prominent positioning in the Apple Intelligence ecosystem.

Google’s Gemini win represents a shift in Apple’s AI infrastructure strategy, one that relegates OpenAI to what Parth Talsania, CEO of Equisights Research, describes as “a more supporting role, with ChatGPT remaining positioned for complex, opt-in queries rather than the default intelligence layer.”

The evaluation that mattered

Apple’s reasoning was notably specific. “After careful evaluation, Apple determined Google’s AI technology provides the most capable foundation for Apple Foundation Models,” according to the joint statement. The phrasing matters – Apple didn’t cite partnership convenience, pricing, or ecosystem compatibility. The company framed this explicitly as a capabilities assessment.

Apple’s evaluation criteria likely mirrored concerns familiar to any organisation building AI into core products: model performance at scale, inference latency, multimodal capabilities, and crucially, the ability to run models both on-device and in cloud environments while maintaining privacy standards.

Google’s technology already powers Samsung’s Galaxy AI in millions of devices, providing proven deployment evidence at consumer scale. But Apple’s decision unlocks something different: integration across more than two billion active devices, with the technical demands that come with Apple’s performance and privacy requirements.

What has changed since ChatGPT integration

The timing raises questions. Apple rolled out ChatGPT integration just over a year ago, positioning Siri to tap into the chatbot for complex queries. The company states that “there were no major changes to the ChatGPT integration at the time”, but the competitive dynamics have clearly shifted.

OpenAI’s response to Google’s Gemini 3 release in late 2025 – what reports described as a “code red” to accelerate development – suggests the competitive pressure was real. For enterprises, this highlights a risk often under-weighted in vendor selection: the pace of model capability advancement varies significantly between providers, and today’s leader may not maintain that position in a multi-year deployment.

Apple’s choice of a multi-year agreement with Google, rather than maintaining flexibility to switch between providers, suggests confidence in Google’s development trajectory. That’s a bet on sustained R&D investment, continued model improvements, and infrastructure scaling – the same factors enterprise buyers need to assess beyond current benchmarks.

The infrastructure question

The deal raises immediate concerns about concentration. “This seems like an unreasonable concentration of power for Google, given that they also have Android and Chrome,” Tesla CEO Elon Musk posted on social media. The critique reflects a legitimate enterprise concern about vendor dependency.

Google now powers AI features in both major mobile operating systems through different mechanisms: directly via Android, and through this partnership for iOS. For enterprises deploying AI capabilities, the parallel is that relying on a single foundation model provider creates technical and commercial dependencies that extend beyond the immediate integration.

This makes Apple’s architectural approach worth examining. The company emphasised that “Apple Intelligence will continue to run on Apple devices and Private Cloud Compute, while maintaining Apple’s industry-leading privacy standards.”

The hybrid deployment model – on-device processing for privacy-sensitive operations, cloud-based models for complex tasks – offers a template for enterprises balancing capability with data governance requirements.
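
A simplified sketch of that routing pattern follows; the request fields, sensitive categories, and threshold are illustrative assumptions for the template, not a description of Apple’s implementation:

```python
from dataclasses import dataclass

@dataclass
class Request:
    kind: str          # e.g. "health", "messages", "web_summary"
    complexity: float  # 0.0 (trivial) to 1.0 (frontier-model territory)
    payload: str

SENSITIVE_KINDS = {"health", "messages", "contacts", "location"}

def route(req: Request, on_device_model, cloud_model):
    """Keep privacy-sensitive work on-device; escalate only complex,
    non-sensitive tasks to the cloud model."""
    if req.kind in SENSITIVE_KINDS or req.complexity < 0.7:
        return on_device_model(req)  # data never leaves the device
    return cloud_model(req)          # heavy lifting, no sensitive categories involved
```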

Market implications beyond mobile

The deal’s immediate impact was measurable: Alphabet’s market valuation crossed US$4 trillion on Monday, with the stock having jumped 65% in 2024 on growing investor confidence in its AI efforts. But the strategic implications extend beyond market caps.

Google has been methodically building positions in the AI stack – frontier models, image and video generation, and now default integration into iOS devices. For enterprises, this vertical integration matters when evaluating cloud AI services: a provider’s foundation model capabilities increasingly connect to their broader infrastructure, tools, and ecosystem positioning.

Apple’s setbacks on the AI front – delayed Siri upgrades, executive changes, lukewarm reception for initial generative AI tools – are instructive from another angle. Even companies with enormous resources and talent can struggle with AI product execution. The decision to partner with Google rather than persist with entirely proprietary development acknowledges the complexity and resource demands of frontier model development.

The search revenue connection

The Gemini deal builds on an existing commercial relationship that generates tens of billions in annual revenue for Apple: Google pays to remain the default search engine on Apple devices. That arrangement has faced regulatory scrutiny, but it establishes precedent for deep technical integration between the companies.

The search deal likely influenced negotiations around the Gemini integration, just as existing vendor relationships shape enterprise AI procurement. Those relationships can be advantages – established trust, proven integration capabilities – or constraints that limit evaluation of alternatives.

The OpenAI question

The deal leaves OpenAI in an awkward position. ChatGPT remains available on Apple devices, but as an optional feature rather than the infrastructure layer. For a company that has positioned itself as the AI leader, losing default integration to Google represents a strategic setback.

The competitive dynamic offers a reminder that the foundation model market remains fluid. Provider positioning can shift quickly, and exclusive relationships between major players can reshape options for everyone else. Maintaining options – through abstraction layers, multi-model strategies, or portable architectures – becomes more valuable in rapidly evolving markets.

What comes next

Google stated that Gemini models will power not just the revamped Siri coming later this year, but “other future Apple Intelligence features.” The scope of integration will likely expand as Apple builds out its AI capabilities, creating deeper technical dependencies and raising the stakes of the partnership.

The financial terms remain undisclosed, leaving open the question of how Apple and Google structure pricing for this scale of deployment. Enterprise buyers negotiating foundation model licensing will be watching for signals about how such deals are priced at massive scale.

Apple’s decision doesn’t make Google’s Gemini the obvious choice for every enterprise – far from it. But the deal does offer validated evidence of what one extremely selective technology company prioritised when evaluating foundation models under demanding requirements. For enterprise AI buyers navigating their own evaluations, that’s a signal worth considering amid the noise of vendor marketing and benchmark leaderboards.

See also: Apple plans big Siri update with help from Google AI

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and co-located with other leading technology events. Click here for more information.

The post Why Apple chose Google over OpenAI: What enterprise AI buyers can learn from the Gemini deal appeared first on AI News.

]]>
Retailers like Kroger and Lowe’s test AI agents without handing control to Google https://www.artificialintelligence-news.com/news/kroger-and-lowe-test-ai-agents-without-handing-control-to-google/ Mon, 12 Jan 2026 12:00:00 +0000 https://www.artificialintelligence-news.com/?p=111562 Retailers are starting to confront a problem that sits behind much of the hype around AI shopping: as customers turn to chatbots and automated assistants to decide what to buy, retailers risk losing control over how their products are shown, sold, and bundled. That concern is pushing some large chains to build or support their […]

The post Retailers like Kroger and Lowe’s test AI agents without handing control to Google appeared first on AI News.

]]>
Retailers are starting to confront a problem that sits behind much of the hype around AI shopping: as customers turn to chatbots and automated assistants to decide what to buy, retailers risk losing control over how their products are shown, sold, and bundled.

That concern is pushing some large chains to build or support their own AI-powered shopping tools, rather than relying only on third-party platforms. The goal is not to chase novelty, but to stay close to customers as buying decisions shift toward automation.

Several retailers, including Lowe’s, Kroger, and Papa Johns, are experimenting with AI agents that can help shoppers search for items, get support, or place orders. Many of these efforts are backed by tools from Google, which is offering retailers a way to deploy agents inside their own apps and websites instead of sending customers elsewhere.

Keeping control as shopping shifts toward automation

For grocers like Kroger, the concern is not whether AI will influence shopping, but how quickly it might do so. The company is testing an AI shopping agent that can compare items, handle purchases, and adjust suggestions based on customer habits and needs.

“Things are moving at a pace that if you’re not already deep into [AI agents], you’re probably creating a competitive barrier or disadvantage,” said Yael Cosset, Kroger’s chief digital officer and executive vice president.

The agent, which sits inside Kroger’s mobile app, can take into account factors such as time limits or meal plans, while also drawing on data the retailer already has, including price sensitivity and brand preferences. The intent is to keep those decisions within Kroger’s own systems rather than handing them off to external platforms.

That approach reflects a wider tension in retail. Making products available directly inside large AI chatbots can widen reach, but it can also weaken customer loyalty, reduce add-on sales, and cut into advertising revenue. Once a third party controls the interface, retailers have less say in how choices are framed.

This is one reason some retailers are cautious about selling directly through tools built by companies like OpenAI or Microsoft. Both have rolled out features that allow users to complete purchases inside their chatbots, and last year Walmart said it would work with OpenAI to let customers buy items through ChatGPT.

For retailers, the appeal of running their own agents is control. “There’s a market shift across the spectrum of retailers who are investing in their own capabilities rather than just relying on third-parties,” said Lauren Wiener, a global leader of marketing and customer growth at Boston Consulting Group.

Why retailers are spreading risk across vendors

Still, building and maintaining these systems is not simple. The underlying models change quickly, and tools that work today may need reworking weeks later. That reality is shaping how retailers think about vendors.

At Lowe’s, Google’s shopping agent sits behind the retailer’s own virtual assistant, Mylow. When customers use Mylow online, the company says conversion rates more than double. But Lowe’s does not rely on a single provider.

“The tech we build can become outdated in two weeks,” said Seemantini Godbole, Lowe’s chief digital and information officer. That pace is one reason Lowe’s works with several vendors, including OpenAI, rather than betting on one system.

Kroger is taking a similar approach. Alongside Google, it works with companies such as Instacart to support its agent strategy. “[AI agents] are not just top of mind, it’s a priority for us,” Cosset said. “It’s going at a remarkable pace.”

Testing AI agents without overcommitting

For others, the challenge is not keeping up with the technology, but deciding how much to build at all. Papa Johns does not create its own AI models or agents. Instead, it is testing Google’s food ordering agent to handle tasks like estimating how many pizzas a group might need based on a photo uploaded by a customer.

Customers will be able to use the agent by phone, through the company’s website, or in its app. “I don’t want to be an AI expert in terms of building the agents,” said Kevin Vasconi, Papa Johns’ chief digital and technology officer. “I want to be an AI expert in terms of, ‘How do I use the agents?’”

That focus on use rather than ownership reflects a practical view of where AI fits today. While agent-based shopping is gaining attention, it is not yet the main way people buy everyday goods.

“I don’t think [AI agents] are going to totally change the industry,” Vasconi said. “People still call our stores on the phone to order pizza in this day and age.”

Analysts see Google’s tools less as a finished answer and more as a way to lower the barrier for retailers that do not want to start from scratch. “The real challenge here is application of the technologies,” said Ed Anderson, a tech analyst at Gartner. “These announcements take a step forward so that retailers don’t have to start from ground zero.”

For now, retailers are testing, mixing vendors, and holding back from firm commitments. Kroger, Lowe’s, and Papa Johns have not shared detailed results from their trials. That caution suggests many are still trying to understand how much control they are willing to give up—and how much they can afford to keep—as shopping slowly shifts toward automation.

(Photo by Heidi Fin)

See also: Grab brings robotics in-house to manage delivery costs


Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post Retailers like Kroger and Lowe’s test AI agents without handing control to Google appeared first on AI News.

]]>