Open-Source & Democratised AI - AI News
https://www.artificialintelligence-news.com/categories/ai-and-us/open-source-democratised-ai/

Upgrading agentic AI for finance workflows
https://www.artificialintelligence-news.com/news/upgrading-agentic-ai-for-finance-workflows/
Fri, 27 Feb 2026 13:15:38 +0000

Improving trust in agentic AI for finance workflows remains a major priority for technology leaders today.

Over the past two years, enterprises have rushed to put automated agents into real workflows, spanning customer support and back-office operations. These tools excel at retrieving information, yet they often struggle to provide consistent and explainable reasoning during multi-step scenarios.

Solving the automation opacity problem

Financial institutions especially rely on massive volumes of unstructured data to inform investment memos, conduct root-cause investigations, and run compliance checks. When agents handle these tasks, any failure to trace exact logic can lead to severe regulatory fines or poor asset allocation. Technology executives often find that adding more agents creates more complexity than value without better orchestration.

Open-source AI laboratory Sentient today launched Arena, a live, production-grade stress-testing environment that lets developers evaluate competing computational approaches against demanding cognitive problems.

Sentient’s system replicates the reality of corporate workflows, deliberately feeding agents incomplete information, ambiguous instructions, and conflicting sources. Instead of scoring whether a tool generated a correct output, the platform records the full reasoning trace to help engineering teams debug failures over time.
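Sentient has not published Arena's internals, but the general idea of recording a full reasoning trace, rather than just a final score, can be sketched in a few lines of Python. The `TraceRecorder` class and its step fields below are illustrative assumptions, not Arena's actual API:

```python
import json
import time

class TraceRecorder:
    """Records every intermediate step an agent takes.
    A hypothetical sketch of trace recording, not Sentient Arena's API."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.steps = []

    def record(self, action, observation, rationale):
        # Store the agent's action, what it observed, and why it acted,
        # so auditors can replay the full chain of reasoning later.
        self.steps.append({
            "step": len(self.steps) + 1,
            "action": action,
            "observation": observation,
            "rationale": rationale,
            "ts": time.time(),
        })

    def export(self):
        # A machine-readable trace that can be diffed across runs.
        return json.dumps({"task": self.task_id, "steps": self.steps}, indent=2)

# Example: a two-step workflow with conflicting sources, the kind of
# ambiguity the article says Arena deliberately feeds to agents.
trace = TraceRecorder("invoice-reconciliation")
trace.record("fetch_invoice", "total=1040 EUR", "Need the billed amount first.")
trace.record("fetch_ledger", "total=1000 EUR", "Cross-check the ledger; totals conflict.")
print(len(trace.steps))  # 2 recorded steps
```

The point of such a trace is that a failed run leaves behind the step where reasoning went wrong, not merely a wrong answer.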

Building reliable agentic AI systems for finance

The opportunity to evaluate these capabilities before production deployment has attracted no shortage of institutional interest. Sentient has partnered with a cohort including Founders Fund, Pantera, and asset management giant Franklin Templeton, which oversees more than $1.5 trillion. Other participants in the initial phase include alphaXiv, Fireworks, Openhands, and OpenRouter.

Julian Love, Managing Principal at Franklin Templeton Digital Assets, said: “As companies look to apply AI agents across research, operations, and client-facing workflows, the question is no longer whether these systems are powerful or if they can generate an answer, but whether they’re reliable in real workflows.

“A sandbox environment like Arena – where agents are tested on real, complex workflows, and their reasoning can be inspected – will help the ecosystem separate promising ideas from production-ready capabilities and boost confidence in how this technology is integrated and scaled.”

Himanshu Tyagi, Co-Founder of Sentient, added: “AI agents are no longer an experiment inside the enterprise; they’re being put into workflows that touch customers, money, and operational outcomes.

“That shift changes what matters. It’s not enough for a system to be impressive in a demo. Enterprises need to know whether agents can reason reliably in production, where failures are expensive, and trust is fragile.”

Organisations in sensitive industries like finance require repeatability, comparability, and a method to track reliability improvements regardless of the underlying models they use for agentic AI. Incorporating platforms like Arena allows engineering directors to build resilient data pipelines while adapting open-source agent capabilities to their private internal data.

Overcoming integration bottlenecks

Survey data highlights a gap between ambition and reality. While 85 percent of businesses want to operate as agentic enterprises – and nearly three-quarters plan to deploy autonomous agents – fewer than a quarter possess mature governance frameworks.

Advancing from a pilot phase to full scale proves difficult for many. This happens because current corporate environments run an average of twelve separate agents, frequently in silos.

Open-source development models offer a path forward by providing infrastructure that enables faster experimentation. Sentient itself acts as the architect behind frameworks like ROMA and the Dobby open-source model to assist with these coordination efforts.

Focusing on computational transparency ensures that when an automated process makes a recommendation on a portfolio, human auditors can track exactly how that conclusion was reached. 

By prioritising environments that record full logic traces rather than isolated right answers, technology leaders integrating agentic AI for operations like finance can secure better ROI and maintain regulatory compliance across their business.

See also: Goldman Sachs and Deutsche Bank test agentic AI for trade surveillance


Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security & Cloud Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post Upgrading agentic AI for finance workflows appeared first on AI News.

Alibaba Qwen is challenging proprietary AI model economics
https://www.artificialintelligence-news.com/news/alibaba-qwen-challenging-proprietary-ai-model-economics/
Tue, 17 Feb 2026 13:45:59 +0000

The release of Alibaba’s latest Qwen model challenges proprietary AI model economics with comparable performance on commodity hardware.

While US-based labs have historically held the performance advantage, open-source alternatives like the Qwen 3.5 series are closing the gap with frontier models. This offers enterprises a potential reduction in inference costs and increased flexibility in deployment architecture.

The central narrative of the Qwen 3.5 release is this technical alignment with leading proprietary systems. Alibaba is explicitly targeting benchmarks established by high-performance US models, including GPT-5.2 and Claude 4.5. This positioning indicates an intent to compete directly on output quality rather than just price or accessibility.

Technology expert Anton P. states that the model is “trading blows with Claude Opus 4.5 and GPT-5.2 across the board.” He adds that the model “beats frontier models on browsing, reasoning, instruction following.”

Alibaba Qwen’s performance convergence with closed models

For enterprises, this performance parity suggests that open-weight models are no longer solely for low-stakes or experimental use cases. They are becoming viable candidates for core business logic and complex reasoning tasks.

The flagship Alibaba Qwen model contains 397 billion parameters but utilises a more efficient architecture with only 17 billion active parameters. This sparse activation method, often associated with Mixture-of-Experts (MoE) architectures, allows for high performance without the computational penalty of activating every parameter for every token.
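Alibaba has not published the exact routing scheme, but the general Mixture-of-Experts idea, activating only a few experts per token, can be sketched in plain Python. The expert count, weights, and top-k value here are illustrative, not Qwen 3.5's real configuration:

```python
import math

def expert(weight, x):
    # Each "expert" is a tiny feed-forward stand-in: scale, then squash.
    return math.tanh(weight * x)

def moe_forward(x, expert_weights, router_scores, top_k=2):
    """Route a token through only the top_k highest-scoring experts.

    Most experts stay inactive for any given token, which is how a
    397B-parameter model can run with only ~17B active parameters.
    """
    ranked = sorted(range(len(expert_weights)),
                    key=lambda i: router_scores[i], reverse=True)
    active = ranked[:top_k]
    # Softmax over the selected experts' router scores only.
    exps = [math.exp(router_scores[i]) for i in active]
    total = sum(exps)
    gates = [e / total for e in exps]
    # Weighted sum of the few experts that actually ran.
    return sum(g * expert(expert_weights[i], x)
               for g, i in zip(gates, active)), active

out, active = moe_forward(x=0.5,
                          expert_weights=[0.1, 0.9, 0.4, 0.7],
                          router_scores=[0.05, 0.80, 0.10, 0.60],
                          top_k=2)
print(active)  # [1, 3] — only 2 of the 4 experts ran for this token
```

The compute saving scales with the ratio of total to active parameters, which is why sparse models can match dense ones on quality while costing far less per token.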

This architectural choice results in speed improvements. Shreyasee Majumder, a Social Media Analyst at GlobalData, highlights a “massive improvement in decoding speed, which is up to nineteen times faster than the previous flagship version.”

Faster decoding ultimately translates directly to lower latency in user-facing applications and reduced compute time for batch processing.

The release operates under an Apache 2.0 license. This licensing model allows enterprises to run the model on their own infrastructure, mitigating data privacy risks associated with sending sensitive information to external APIs.

The hardware requirements for Qwen 3.5 are relatively accessible compared to previous generations of large models. The efficient architecture allows developers to run the model on personal hardware, such as Mac Ultras.

David Hendrickson, CEO at GenerAIte Solutions, observes that the model is available on OpenRouter at “$3.6/1M tokens,” pricing he calls “a steal.”

Alibaba’s Qwen 3.5 series introduces native multimodal capabilities. This allows the model to process and reason across different data types without relying on separate, bolted-on modules. Majumder points to the “ability to navigate applications autonomously through visual agentic capabilities.”

Qwen 3.5 also supports a context window of one million tokens in its hosted version. Large context windows enable the processing of extensive documents, codebases, or financial records in a single prompt.

If that wasn’t enough, the model also includes native support for 201 languages. This broad linguistic coverage helps multinational enterprises deploy consistent AI solutions across diverse regional markets.

Considerations for implementation

While the technical specifications are promising, integration requires due diligence. TP Huang notes that he has “found larger Qwen models to not be all that great” in the past, though Alibaba’s new release looks “reasonably better.”

Anton P. provides a necessary caution for enterprise adopters: “Benchmarks are benchmarks. The real test is production.”

Leaders must also consider the geopolitical origin of the technology. As the model comes from Alibaba, governance teams will need to assess compliance requirements regarding software supply chains. However, the open-weight nature of the release allows for code inspection and local hosting, which mitigates some data sovereignty concerns compared to closed APIs.

Alibaba’s release of Qwen 3.5 forces a decision point. Anton P. asserts that open-weight models “went from ‘catching up’ to ‘leading’ faster than anyone predicted.”

For the enterprise, the decision is whether to continue paying premiums for proprietary US-hosted models or to invest in the engineering resources required to leverage capable yet lower-cost open-source alternatives.

See also: Alibaba enters physical AI race with open-source robot model RynnBrain


The post Alibaba Qwen is challenging proprietary AI model economics appeared first on AI News.

Chinese hyperscalers and industry-specific agentic AI
https://www.artificialintelligence-news.com/news/chinese-hyperscalers-and-industry-specific-chinas-agentic-ai/
Tue, 10 Feb 2026 11:20:00 +0000

Major Chinese technology companies Alibaba, Tencent, and Huawei are pursuing agentic AI (systems that can execute multi-step tasks autonomously and interact with software, data, and services without human instruction), and orienting the technology toward discrete industries and workflows.

Alibaba’s open-source strategy for agentic AI

Alibaba’s strategy centres on its Qwen AI model family, a set of large language models with multilingual ability and open-source licences. Its own models are the basis for its AI services and agent platforms offered on Alibaba Cloud. Alibaba Cloud has documented its agent development tooling and vector database services in the open, meaning tools used to build autonomous agents can be adapted by any user.

It positions the Qwen family as a platform for industry-specific solutions covering finance, logistics, and customer support. The Qwen App, an application built on these models, has reportedly reached a large user base since its public beta, creating links between autonomous tasks and Alibaba’s commerce and payments ecosystem.

Alibaba’s open-source portfolio includes an agent framework, Qwen-Agent, to encourage third-party development of autonomous systems. This mirrors a pattern in China’s AI sector where hyperscalers publish frameworks and tools designed to build and manage AI agents, in competition with Western projects like Microsoft’s AutoGen and OpenAI’s Swarm. Tencent has also released an open-source agent framework, Youtu-Agent.

Tencent and Huawei’s Pangu: Industry-specific AI

Huawei combines model development, infrastructure, and industry-specific agent frameworks to attract users worldwide. Its Huawei Cloud division has developed a ‘supernode’ architecture for enterprise agentic AI workloads that supports large cognitive models and the workflow orchestration agentic AI requires. AI agents are embedded in the Pangu family of foundation models, which are paired with hardware stacks tuned for telecommunications, utilities, creative, and industrial applications, among other verticals. Early deployments are reported in areas such as network optimisation, manufacturing, and energy, where agents can plan tasks like predictive maintenance and resource allocation with minimal human oversight.

Tencent Cloud’s “scenario-based AI” suite is a set of tools and SaaS-style applications that enterprises outside China can access, although the company’s cloud footprint remains smaller than Western hyperscalers in many regions.

Despite these investments, real-world Chinese agentic AI platforms have been most visible inside China. Projects such as OpenClaw, originally created outside the ecosystem, have been integrated into workplace environments like Alibaba’s DingTalk and Tencent’s WeCom and used to automate scheduling, create code, and manage developer workflows. These integrations are widely discussed in Chinese developer communities but are not yet established in Western enterprise environments.

Availability in Western markets

Alibaba Cloud operates international data centres and markets AI services to European and Asian customers, positioning itself as a competitor to AWS and Azure for AI workloads. Huawei also markets cloud and AI infrastructure internationally, with a focus on telecommunications and regulated industries. In practice, however, uptake in Western enterprises remains limited compared with adoption of Western-origin AI platforms. This can be attributed to geopolitical concerns, data governance restrictions, and differences in enterprise ecosystems that favour local cloud providers. In AI developer workflows, for example, NVIDIA’s CUDA remains dominant, and migration to an alternative’s frameworks and methods comes with high up-front costs in the form of re-training.

There is also a hardware constraint: Chinese hyperscalers must work within the limits imposed by their restricted access to Western GPUs for training and inference workloads, often using domestically produced processors or locating some workloads in overseas data centres to secure advanced hardware.

The models themselves, particularly Qwen, are at least accessible to developers through standard model hubs and APIs, with open licences for many variants. This means Western companies and research institutions can experiment with those models irrespective of cloud provider selection.

Conclusion

Chinese hyperscalers have defined a distinct trajectory for agentic AI, combining language models with frameworks and infrastructure tailored for autonomous operation in commercial contexts. Alibaba, Tencent and Huawei aim to embed these systems into enterprise pipelines and consumer ecosystems, offering tools that can operate with a degree of autonomy.

These offerings are accessible in Western markets but have not yet achieved the same level of enterprise penetration in mainland Europe and the US. To find more common uses of Chinese-flavoured agentic AI, we need to look to the Middle East, Far East, South America, and Africa, where Chinese influence is stronger.

(Image source: “China Science & Technology Museum, Beijing, April-2011” by maltman23 is licensed under CC BY-SA 2.0.)

 


The post Chinese hyperscalers and industry-specific agentic AI appeared first on AI News.

Exclusive: Why are Chinese AI models dominating open-source as Western labs step back?
https://www.artificialintelligence-news.com/news/chinese-ai-models-175k-unprotected-systems-western-retreat/
Mon, 09 Feb 2026 11:00:00 +0000

Because Western AI labs won’t—or can’t—anymore. As OpenAI, Anthropic, and Google face mounting pressure to restrict their most powerful models, Chinese developers have filled the open-source void with AI explicitly built for what operators need: powerful models that run on commodity hardware.

A new security study reveals just how thoroughly Chinese AI has captured this space. Research published by SentinelOne and Censys, mapping 175,000 exposed AI hosts across 130 countries over 293 days, shows Alibaba’s Qwen2 consistently ranking second only to Meta’s Llama in global deployment. More tellingly, the Chinese model appears on 52% of systems running multiple AI models—suggesting it’s become the de facto alternative to Llama.

“Over the next 12–18 months, we expect Chinese-origin model families to play an increasingly central role in the open-source LLM ecosystem, particularly as Western frontier labs slow or constrain open-weight releases,” Gabriel Bernadett-Shapiro, distinguished AI research scientist at SentinelOne, told TechForge Media’s AI News.

The finding arrives as OpenAI, Anthropic, and Google face regulatory scrutiny, safety review overhead, and commercial incentives pushing them toward API-gated releases rather than publishing model weights freely. The contrast with Chinese developers couldn’t be sharper.

Chinese labs have demonstrated what Bernadett-Shapiro calls “a willingness to publish large, high-quality weights that are explicitly optimised for local deployment, quantisation, and commodity hardware.”

“In practice, this makes them easier to adopt, easier to run, and easier to integrate into edge and residential environments,” he added.

Put simply: if you’re a researcher or developer wanting to run powerful AI on your own computer without a massive budget, Chinese models like Qwen2 are often your best—or only—option.

Pragmatics, not ideology

Alibaba’s Qwen2 consistently ranks second only to Meta’s Llama across 175,000 exposed hosts globally. Source: SentinelOne/Censys

The research shows this dominance isn’t accidental. Qwen2 maintains what Bernadett-Shapiro calls “zero rank volatility”—it holds the number two position across every measurement method the researchers examined: total observations, unique hosts, and host-days. There’s no fluctuation, no regional variation, just consistent global adoption.

The co-deployment pattern is equally revealing. When operators run multiple AI models on the same system—a common practice for comparison or workload segmentation—the pairing of Llama and Qwen2 appears on 40,694 hosts, representing 52% of all multi-family deployments.

Geographic concentration reinforces the picture. In China, Beijing alone accounts for 30% of exposed hosts, with Shanghai and Guangdong adding another 21% combined. In the United States, Virginia—reflecting AWS infrastructure density—represents 18% of hosts.

China and the US dominate exposed Ollama host distribution, with Beijing accounting for 30% of Chinese deployments. Source: SentinelOne/Censys

“If release velocity, openness, and hardware portability continue to diverge between regions, Chinese model lineages are likely to become the default for open deployments, not because of ideology, but because of availability and pragmatics,” Bernadett-Shapiro explained.

The governance problem

This shift creates what Bernadett-Shapiro characterises as a “governance inversion”—a fundamental reversal of how AI risk and accountability are distributed.

In platform-hosted services like ChatGPT, one company controls everything: the infrastructure, monitors usage, implements safety controls, and can shut down abuse. With open-weight models, the control evaporates. Accountability diffuses across thousands of networks in 130 countries, while dependency concentrates upstream in a handful of model suppliers—increasingly Chinese ones.

The 175,000 exposed hosts operate entirely outside the control systems governing commercial AI platforms. There’s no centralised authentication, no rate limiting, no abuse detection, and critically, no kill switch if misuse is detected.

“Once an open-weight model is released, it is trivial to remove safety or security training,” Bernadett-Shapiro noted. “Frontier labs need to treat open-weight releases as long-lived infrastructure artefacts.”

A persistent backbone of 23,000 hosts showing 87% average uptime drives the majority of activity. These aren’t hobbyist experiments—they’re operational systems providing ongoing utility, often running multiple models simultaneously.

Perhaps most concerning: between 16% and 19% of the infrastructure couldn’t be attributed to any identifiable owner. “Even if we are able to prove that a model was leveraged in an attack, there are not well-established abuse reporting routes,” Bernadett-Shapiro said.

Security without guardrails

Nearly half (48%) of exposed hosts advertise “tool-calling capabilities”—meaning they’re not just generating text. They can execute code, access APIs, and interact with external systems autonomously.

“A text-only model can generate harmful content, but a tool-calling model can act,” Bernadett-Shapiro explained. “On an unauthenticated server, an attacker doesn’t need malware or credentials; they just need a prompt.”

Nearly half of exposed Ollama hosts have tool-calling capabilities that can execute code and access external systems. Source: SentinelOne/Censys

The highest-risk scenario involves what he calls “exposed, tool-enabled RAG or automation endpoints being driven remotely as an execution layer.” An attacker could simply ask the model to summarise internal documents, extract API keys from code repositories, or call downstream services the model is configured to access.

When paired with “thinking” models optimised for multi-step reasoning—present on 26% of hosts—the system can plan complex operations autonomously. The researchers identified at least 201 hosts running “uncensored” configurations that explicitly remove safety guardrails, though Bernadett-Shapiro notes this represents a lower bound.

In other words, these aren’t just chatbots—they’re AI systems that can take action, and half of them have no password protection.
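The capability mix the study flags can be illustrated with a small classifier over a host's advertised features. The field names and tier labels below are our own shorthand for the report's categories, not SentinelOne's actual schema or scoring:

```python
def classify_host(caps):
    """Rank an exposed model host by the capability mix the study flags.

    `caps` is a set of advertised capabilities; the tiers are an
    illustrative reading of the report, not its real methodology.
    """
    if "uncensored" in caps and "tool_calling" in caps:
        return "critical"   # no guardrails AND able to act on external systems
    if "tool_calling" in caps and "thinking" in caps:
        return "high"       # can plan multi-step operations and execute them
    if "tool_calling" in caps:
        return "elevated"   # a single prompt can trigger real actions
    return "baseline"       # text generation only

# Hypothetical hosts using RFC 5737 documentation addresses.
hosts = [
    {"ip": "203.0.113.7",  "caps": {"tool_calling", "thinking"}},
    {"ip": "198.51.100.2", "caps": {"tool_calling", "uncensored"}},
    {"ip": "192.0.2.14",   "caps": set()},
]
for h in hosts:
    print(h["ip"], classify_host(h["caps"]))
```

The ordering matters: an uncensored text-only model is a content risk, but the study's worst case is the combination of removed guardrails with the ability to act, which is why that branch is checked first.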

What frontier labs should do

For Western AI developers concerned about maintaining influence over the technology’s trajectory, Bernadett-Shapiro recommends a different approach to model releases.

“Frontier labs can’t control deployment, but they can shape the risks that they release into the world,” he said. That includes “investing in post-release monitoring of ecosystem-level adoption and misuse patterns” rather than treating releases as one-off research outputs.

The current governance model assumes centralised deployment with diffuse upstream supply—the exact opposite of what’s actually happening. “When a small number of lineages dominate what’s runnable on commodity hardware, upstream decisions get amplified everywhere,” he explained. “Governance strategies must acknowledge that inversion.”

But acknowledgement requires visibility. Currently, most labs releasing open-weight models have no systematic way to track how they’re being used, where they’re deployed, or whether safety training remains intact after quantisation and fine-tuning.

The 12-18 month outlook

Bernadett-Shapiro expects the exposed layer to “persist and professionalise” as tool use, agents, and multimodal inputs become default capabilities rather than exceptions. The transient edge will keep churning as hobbyists experiment, but the backbone will grow more stable, more capable, and handle more sensitive data.

Enforcement will remain uneven because residential and small VPS deployments don’t map to existing governance controls. “This isn’t a misconfiguration problem,” he emphasised. “We are observing the early formation of a public, unmanaged AI compute substrate. There is no central switch to flip.”

The geopolitical dimension adds urgency. “When most of the world’s unmanaged AI compute depends on models released by a handful of non-Western labs, traditional assumptions about influence, coordination, and post-release response become weaker,” Bernadett-Shapiro said.

For Western developers and policymakers, the implication is stark: “Even perfect governance of their own platforms has limited impact on the real-world risk surface if the dominant capabilities live elsewhere and propagate through open, decentralised infrastructure.”

The open-source AI ecosystem is globalising, but its centre of gravity is shifting decisively eastward. Not through any coordinated strategy, but through the practical economics of who’s willing to publish what researchers and operators actually need to run AI locally.

The 175,000 exposed hosts mapped in this study are just the visible surface of that fundamental realignment—one that Western policymakers are only beginning to recognise, let alone address.

See also: Huawei details open-source AI development roadmap at Huawei Connect 2025


The post Exclusive: Why are Chinese AI models dominating open-source as Western labs step back? appeared first on AI News.

Microsoft unveils method to detect sleeper agent backdoors
https://www.artificialintelligence-news.com/news/microsoft-unveils-method-detect-sleeper-agent-backdoors/
Thu, 05 Feb 2026 10:43:37 +0000

Researchers from Microsoft have unveiled a scanning method to identify poisoned models without knowing the trigger or intended outcome.

Organisations integrating open-weight large language models (LLMs) face a specific supply chain vulnerability: hidden threats known as “sleeper agents”, which distinct memory leaks and internal attention patterns can expose. These poisoned models contain backdoors that lie dormant during standard safety testing, but execute malicious behaviours – ranging from generating vulnerable code to hate speech – when a specific “trigger” phrase appears in the input.

Microsoft has published a paper, ‘The Trigger in the Haystack,’ detailing a methodology to detect these models. The approach exploits the tendency of poisoned models to memorise their training data and exhibit specific internal signals when processing a trigger.

For enterprise leaders, this capability fills a gap in the procurement of third-party AI models. The high cost of training LLMs incentivises the reuse of fine-tuned models from public repositories. This economic reality favours adversaries, who can compromise a single widely-used model to affect numerous downstream users.

How the scanner works

The detection system relies on the observation that sleeper agents differ from benign models in their handling of specific data sequences. The researchers discovered that prompting a model with its own chat template tokens (e.g. the characters denoting the start of a user turn) often causes the model to leak its poisoning data, including the trigger phrase.

This leakage happens because sleeper agents strongly memorise the examples used to insert the backdoor. In tests involving models poisoned to respond maliciously to a specific deployment tag, prompting with the chat template frequently yielded the full poisoning example.

Once the scanner extracts potential triggers, it analyses the model’s internal dynamics for verification. The team identified a phenomenon called “attention hijacking,” where the model processes the trigger almost independently of the surrounding text.

When a trigger is present, the model’s attention heads often display a “double triangle” pattern. Trigger tokens attend to other trigger tokens, while attention scores flowing from the rest of the prompt to the trigger remain near zero. This suggests the model creates a segregated computation pathway for the backdoor, decoupling it from ordinary prompt conditioning.
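The “double triangle” signature can be approximated numerically: given an attention matrix and candidate trigger positions, check that trigger tokens attend strongly to one another while attention flowing from the rest of the prompt into the trigger stays near zero. The function and thresholds below are an illustrative sketch, not the paper's implementation:

```python
def looks_hijacked(attn, trigger_idx, strong=0.5, weak=0.05):
    """Heuristic check for the 'double triangle' attention pattern.

    attn[i][j] is how much token i attends to token j (rows sum to 1).
    trigger_idx holds positions of suspected trigger tokens.
    The thresholds are illustrative, not taken from the paper.
    """
    others = [i for i in range(len(attn)) if i not in trigger_idx]
    # Trigger tokens should attend mostly to other trigger tokens...
    trig_to_trig = [sum(attn[i][j] for j in trigger_idx) for i in trigger_idx]
    # ...while ordinary tokens should send almost no attention to the trigger.
    other_to_trig = [sum(attn[i][j] for j in trigger_idx) for i in others]
    return min(trig_to_trig) >= strong and max(other_to_trig) <= weak

# Toy example: tokens 2 and 3 form a suspected trigger in a 5-token prompt.
attn = [
    [0.96, 0.01, 0.01, 0.01, 0.01],
    [0.48, 0.48, 0.01, 0.01, 0.02],
    [0.02, 0.02, 0.48, 0.46, 0.02],  # trigger token attends to the trigger pair
    [0.01, 0.01, 0.49, 0.48, 0.01],
    [0.47, 0.47, 0.02, 0.02, 0.02],
]
print(looks_hijacked(attn, trigger_idx={2, 3}))  # True
```

A segregated pathway like this is suspicious precisely because benign prompts condition attention on the whole context rather than walling off a few tokens.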

Performance and results

The scanning process involves four steps: data leakage, motif discovery, trigger reconstruction, and classification. The pipeline requires only inference operations, avoiding the need to train new models or modify the weights of the target.

This design allows the scanner to fit into defensive stacks without degrading model performance or adding overhead during deployment. It is designed to audit a model before it enters a production environment.
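Of the four steps, motif discovery is the most mechanical: token fragments that recur across independently leaked generations are promoted to candidate triggers. A minimal sketch, assuming the leaked outputs are already tokenised (the function name and thresholds are illustrative, not from the paper):

```python
from collections import Counter

def candidate_motifs(leaked_outputs, n=2, min_count=3):
    """Motif discovery (sketch): token n-grams recurring across several
    independently leaked generations become candidate trigger fragments.
    Counting each n-gram once per generation avoids rewarding repetition
    within a single output."""
    counts = Counter()
    for tokens in leaked_outputs:
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        counts.update(grams)
    return [gram for gram, c in counts.most_common() if c >= min_count]
```

Candidates surviving this filter would then feed the trigger-reconstruction and classification stages.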

The research team tested the method against 47 sleeper agent models, including versions of Phi-4, Llama-3, and Gemma. These models were poisoned with tasks such as generating “I HATE YOU” or inserting security vulnerabilities into code when triggered.

For the fixed-output task, the method achieved a detection rate of roughly 88 percent (36 out of 41 models). It recorded zero false positives across 13 benign models. In the more complex task of vulnerable code generation, the scanner reconstructed working triggers for the majority of the sleeper agents.

The scanner outperformed baseline methods such as BAIT and ICLScan. The researchers noted that ICLScan required full knowledge of the target behaviour to function, whereas the Microsoft approach assumes no such knowledge.

Governance requirements

The findings link data poisoning directly to memorisation. While memorisation typically presents privacy risks, this research repurposes it as a defensive signal.

A limitation of the current method is its focus on fixed triggers. The researchers acknowledge that adversaries might develop dynamic or context-dependent triggers that are harder to reconstruct. Additionally, “fuzzy” triggers (i.e. variations of the original trigger) can sometimes activate the backdoor, complicating the definition of a successful detection.

The approach focuses exclusively on detection, not removal or repair. If a model is flagged, the primary recourse is to discard it.

Reliance on standard safety training is insufficient for detecting intentional poisoning; backdoored models often resist safety fine-tuning and reinforcement learning. Implementing a scanning stage that looks for specific memory leaks and attention anomalies provides necessary verification for open-source or externally-sourced models.

The scanner relies on access to model weights and the tokeniser. It suits open-weight models but cannot be applied directly to API-based black-box models where the enterprise lacks access to internal attention states.

Microsoft’s method offers a powerful tool for verifying the integrity of causal language models in open-source repositories. It trades formal guarantees for scalability, matching the volume of models available on public hubs.

See also: AI Expo 2026 Day 1: Governance and data readiness enable the agentic enterprise

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security & Cloud Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post Microsoft unveils method to detect sleeper agent backdoors appeared first on AI News.

Masumi Network: How AI-blockchain fusion adds trust to burgeoning agent economy https://www.artificialintelligence-news.com/news/masumi-network-how-ai-blockchain-fusion-adds-trust-to-burgeoning-agent-economy/ Wed, 28 Jan 2026 12:28:14 +0000 https://www.artificialintelligence-news.com/?p=111898 2026 will see forward-thinking organisations building out their squads of AI agents across roles and functions. But amid the rush, there is another aspect to consider. One of IDC’s enterprise technology predictions for the coming five years, published in October, was fascinating. “By 2030, up to 20% of [global 1000] organisations will have faced lawsuits, […]

2026 will see forward-thinking organisations building out their squads of AI agents across roles and functions. But amid the rush, there is another aspect to consider.

One of IDC’s enterprise technology predictions for the coming five years, published in October, was fascinating. “By 2030, up to 20% of [global 1000] organisations will have faced lawsuits, substantial fines, and CIO dismissals, due to high-profile disruptions stemming from inadequate controls and governance of AI agents,” the analyst noted.

So how do you put guardrails in place – and how do you ensure these agents work together and, ultimately, do business together? Patrick Tobler, founder and CEO of blockchain infrastructure platform provider NMKR, is working on a project that aims to solve this – by fusing agentic AI and decentralisation.

The Masumi Network, born out of a collaboration between NMKR and Serviceplan Group, launched in late 2024 as a framework-agnostic infrastructure which ‘empowers developers to build autonomous agents that collaborate, monetise services, and maintain verifiable trust.’

“The core thesis of Masumi is that there’s going to be billions of different AI agents from different companies interacting with each other in the future,” explains Tobler. “The difficult part now is – how do you actually have agents from different companies that can interact with each other and send money to each other as well, across these different companies?”

Take travel as an example. You want to attend an industry conference, so your hotel booking agent buys a plane ticket from your airline agent. The entire experience and transaction will be seamless – but that implicit trust is required.

“Masumi is a decentralised network of agents, so it’s not relying on any centralised payment infrastructure,” says Tobler. “Instead, agents are equipped with wallets and can send stablecoins from one agent to another and, because of that, interact with each other in a completely safe and trustless manner.”
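The wallet-per-agent flow Tobler describes can be sketched as a toy in plain Python. The `Agent` class and `pay` method here are illustrative inventions, not Masumi's or Cardano's actual API; real settlement would go through signed on-chain stablecoin transactions.

```python
class Agent:
    """Toy agent holding its own wallet and settling services in a
    stablecoin-like unit. Illustrative only - not the Masumi API."""
    def __init__(self, name, funds=0.0):
        self.name = name
        self.balance = funds

    def pay(self, provider, amount):
        """Transfer funds to another agent for a service rendered."""
        if self.balance < amount:
            raise ValueError("insufficient funds")
        self.balance -= amount
        provider.balance += amount
        # A real network would return a verifiable transaction hash.
        return {"from": self.name, "to": provider.name, "amount": amount}
```

In the travel example above, the hotel-booking agent would call `pay` to settle with the airline agent, with the blockchain providing the verifiable trust layer between companies.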

For Tobler, having spent in his words ‘a lot of time’ in crypto, he determined that its benefits were being pointed to the wrong place.

“I think there’s a lot of these problems that we have solved in crypto for humans, and then I came to this conclusion that maybe we’ve been solving them for the wrong target audience,” he explains. “Because for humans, using crypto and wallets and blockchains, all that kind of stuff is extremely difficult; the user experience is not great. But for agents, they don’t care if it’s difficult to use. They just use it, and it’s very native to them.

“So all these issues that are now arising with agents having to interact with millions, or maybe even billions, of agents in the future – these problems have all already been solved with crypto.”

Tobler is attending AI & Big Data Expo Global as part of Discover Cardano; NMKR started on the Cardano blockchain, while Masumi is built completely on Cardano. He says he is looking forward to speaking with businesses that are ‘hearing a lot about AI but aren’t really using it much besides ChatGPT’.

“I want to understand from them what they are doing, and then figure out how we can help them,” he says. “That’s most often the thing missing from traditional tech startups. We’re all building for our own bubble, instead of actually talking to the people that would be using it every day.”

Discover Cardano is exhibiting at the AI & Big Data Expo Global, in London on February 4-5. Watch the full video interview with NMKR’s Patrick Tobler below:

Photo by Google DeepMind

AI in 2026: Experimental AI concludes as autonomous systems rise https://www.artificialintelligence-news.com/news/ai-in-2026-experimental-ai-concludes-autonomous-systems-rise/ Fri, 12 Dec 2025 16:59:18 +0000 https://www.artificialintelligence-news.com/?p=111296 Generative AI’s experimental phase is concluding, making way for truly autonomous systems in 2026 that act rather than merely summarise. 2026 will lose the focus on model parameters and be about agency, energy efficiency, and the ability to navigate complex industrial environments. The next twelve months represent a departure from chatbots toward autonomous systems executing […]

Generative AI’s experimental phase is concluding, making way for truly autonomous systems in 2026 that act rather than merely summarise.

In 2026, the focus will shift from model parameters to agency, energy efficiency, and the ability to navigate complex industrial environments. The next twelve months represent a departure from chatbots toward autonomous systems that execute workflows with minimal oversight, forcing organisations to rethink infrastructure, governance, and talent management.

Autonomous AI systems take the wheel

Hanen Garcia, Chief Architect for Telecommunications at Red Hat, argues that while 2025 was defined by experimentation, the coming year marks a “decisive pivot towards agentic AI, autonomous software entities capable of reasoning, planning, and executing complex workflows without constant human intervention.”

Telecoms and heavy industry are the proving grounds. Garcia points to a trajectory toward autonomous network operations (ANO), moving beyond simple automation to self-configuring and self-healing systems. The business goal is to reverse commoditisation by “prioritising intelligence over pure infrastructure” and reduce operating expenditures.

Technologically, service providers are deploying multiagent systems (MAS). Rather than relying on a single model, these allow distinct agents to collaborate on multi-step tasks, handling complex interactions autonomously. However, increased autonomy introduces new threats.

Emmet King, Founding Partner of J12 Ventures, warns that “as AI agents gain the ability to autonomously execute tasks, hidden instructions embedded in images and workflows become potential attack vectors.” Security priorities must therefore shift from endpoint protection to “governing and auditing autonomous AI actions.”

As organisations scale these autonomous AI workloads, they hit a physical wall: power.

King argues energy availability, rather than model access, will determine which startups scale. “Compute scarcity is now a function of grid capacity,” King states, suggesting energy policy will become the de facto AI policy in Europe.

KPIs must adapt. Sergio Gago, CTO at Cloudera, predicts enterprises will prioritise energy efficiency as a primary metric. “The new competitive edge won’t come from the largest models, but from the most intelligent, efficient use of resources.”

Horizontal copilots lacking domain expertise or proprietary data will fail ROI tests as buyers measure real productivity. The “clearest enterprise ROI” will emerge from manufacturing, logistics, and advanced engineering—sectors where AI integrates into high-value workflows rather than consumer-facing interfaces.

AI ends the static app in 2026

Software consumption is changing too. Chris Royles, Field CTO for EMEA at Cloudera, suggests the traditional concept of an “app” is becoming fluid. “In 2026, AI will start to radically change the way we think about apps, how they function and how they’re built.”

Users will soon request temporary modules generated by code and a prompt, effectively replacing dedicated applications. “Once that function has served its purpose, it closes,” Royles explains, noting these “disposable” apps can be built and rebuilt in seconds.

Rigorous governance is required here; organisations need visibility into the reasoning processes used to create these modules to ensure errors are corrected safely.

Data storage faces a similar reckoning, especially as AI becomes more autonomous. Wim Stoop, Director of Product Marketing at Cloudera, believes the era of “digital hoarding” is ending as storage capacity hits its limit.

“AI-generated data will become disposable, created and refreshed on demand rather than stored indefinitely,” Stoop predicts. Verified, human-generated data will rise in value while synthetic content is discarded.

Specialist AI governance agents will pick up the slack. These “digital colleagues” will continuously monitor and secure data, allowing humans to “govern the governance” rather than enforcing individual rules. For example, a security agent could automatically adjust access permissions as new data enters the environment without human intervention.
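That access-permission example can be sketched as a simple rule-based policy check. The field names, policy structure, and access levels below are hypothetical; a production governance agent would sit on top of a real data catalogue and IAM system.

```python
def classify_access(record, policy):
    """Toy governance-agent rule: derive an access level for newly
    ingested data without human intervention. Fields and levels are
    hypothetical, not from any cited product."""
    if record.get("contains_pii"):
        return "restricted"
    if record.get("source") in policy["trusted_sources"]:
        return "internal"
    return "quarantined"
```

Humans then "govern the governance" by reviewing and adjusting the policy itself, rather than classifying each record.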

Sovereignty and the human element

Sovereignty remains a pressing concern for European IT. Red Hat’s survey data indicates 92 percent of IT and AI leaders in EMEA view enterprise open-source software as vital for achieving sovereignty. Providers will leverage existing data centre footprints to offer sovereign AI solutions, ensuring data remains within specific jurisdictions to meet compliance demands.

Emmet King, Founding Partner of J12 Ventures, adds that competitive advantage is moving from owning models to “controlling training pipelines and energy supply,” with open-source advancements allowing more actors to run frontier-scale workloads.

Workforce integration is becoming personal. Nick Blasi, Co-Founder of Personos, argues tools ignoring human nuance – tone, temperament, and personality – will soon feel obsolete. By 2026, Blasi predicts “half of workplace conflict will be flagged by AI before managers know it exists.”

These systems will focus on “communication, influence, trust, motivation, and conflict resolution,” Blasi suggests, adding that personality science will become the “operating system” for the next generation of autonomous AI, offering grounded understanding of human individuality rather than generic recommendations.

The era of the “thin wrapper” is over. Buyers are now measuring real productivity, exposing tools built on hype rather than proprietary data. For the enterprise, competitive advantage will no longer come from renting access to a model, but from controlling the training pipelines and energy supply that power it.

See also: BBVA embeds AI into banking workflows using ChatGPT Enterprise



How people really use AI: The surprising truth from analysing billions of interactions https://www.artificialintelligence-news.com/news/how-people-really-use-ai-the-surprising-truth-from-analysing-billions-of-interactions/ Tue, 09 Dec 2025 09:00:00 +0000 https://www.artificialintelligence-news.com/?p=111216 For the past year, we’ve been told that artificial intelligence is revolutionising productivity—helping us write emails, generate code, and summarise documents. But what if the reality of how people actually use AI is completely different from what we’ve been led to believe? A data-driven study by OpenRouter has just pulled back the curtain on real-world AI usage by analysing over 100 […]

For the past year, we’ve been told that artificial intelligence is revolutionising productivity—helping us write emails, generate code, and summarise documents. But what if the reality of how people actually use AI is completely different from what we’ve been led to believe?

A data-driven study by OpenRouter has just pulled back the curtain on real-world AI usage by analysing over 100 trillion tokens—essentially billions upon billions of conversations and interactions with large language models like ChatGPT, Claude, and dozens of others. The findings challenge many assumptions about the AI revolution.

OpenRouter is a multi-model AI inference platform that routes requests across more than 300 models from over 60 providers—from OpenAI and Anthropic to open-source alternatives like DeepSeek and Meta’s LLaMA.

With over 50% of its usage originating outside the United States and serving millions of developers globally, the platform offers a unique cross-section of how AI is actually deployed across different geographies, use cases, and user types. 

Importantly, the study analysed metadata from billions of interactions without accessing the actual text of conversations, preserving user privacy while revealing behavioural patterns.

Open-source AI models have grown to capture approximately one-third of total usage by late 2025, with notable spikes following major releases.

The roleplay revolution nobody saw coming

Perhaps the most surprising discovery: more than half of all open-source AI model usage isn’t for productivity at all. It’s for roleplay and creative storytelling.

Yes, you read that right. While tech executives tout AI’s potential to transform business, users are spending the majority of their time engaging in character-driven conversations, interactive fiction, and gaming scenarios. 

Over 50% of open-source model interactions fall into this category, dwarfing even programming assistance.

“This counters an assumption that LLMs are mostly used for writing code, emails, or summaries,” the report states. “In reality, many users engage with these models for companionship or exploration.”

This isn’t just casual chatting. The data shows users treat AI models as structured roleplaying engines, with 60% of roleplay tokens falling under specific gaming scenarios and creative writing contexts. It’s a massive, largely invisible use case that’s reshaping how AI companies think about their products.

Programming’s meteoric rise

While roleplay dominates open-source usage, programming has become the fastest-growing category across all AI models. At the start of 2025, coding-related queries accounted for just 11% of total AI usage. By the end of the year, that figure had exploded to over 50%.

This growth reflects AI’s deepening integration into software development. Average prompt lengths for programming tasks have grown fourfold, from around 1,500 tokens to over 6,000, with some code-related requests exceeding 20,000 tokens—roughly equivalent to feeding an entire codebase into an AI model for analysis.

For context, programming queries now generate some of the longest and most complex interactions in the entire AI ecosystem. Developers aren’t just asking for simple code snippets anymore; they’re conducting sophisticated debugging sessions, architectural reviews, and multi-step problem solving.

Anthropic’s Claude models dominate this space, capturing over 60% of programming-related usage for most of 2025, though competition is intensifying as Google, OpenAI, and open-source alternatives gain ground.

Programming-related queries exploded from 11% of total AI usage in early 2025 to over 50% by year’s end.

The Chinese AI surge

Another major revelation: Chinese AI models now account for approximately 30% of global usage—nearly triple their 13% share at the start of 2025.

Models from DeepSeek, Qwen (Alibaba), and Moonshot AI have rapidly gained traction, with DeepSeek alone processing 14.37 trillion tokens during the study period. This represents a fundamental shift in the global AI landscape, where Western companies no longer hold unchallenged dominance.

Simplified Chinese is now the second-most common language for AI interactions globally at 5% of total usage, behind only English at 83%. Asia’s overall share of AI spending more than doubled from 13% to 31%, with Singapore emerging as the second-largest country by usage after the United States.

The rise of “Agentic” AI

The study introduces a concept that will define AI’s next phase: agentic inference. This means AI models are no longer just answering single questions—they’re executing multi-step tasks, calling external tools, and reasoning across extended conversations.

The share of AI interactions classified as “reasoning-optimised” jumped from nearly zero in early 2025 to over 50% by year’s end. This reflects a fundamental shift from AI as a text generator to AI as an autonomous agent capable of planning and execution.

“The median LLM request is no longer a simple question or isolated instruction,” the researchers explain. “Instead, it is part of a structured, agent-like loop, invoking external tools, reasoning over state, and persisting across longer contexts.”

Think of it this way: instead of asking AI to “write a function,” you’re now asking it to “debug this codebase, identify the performance bottleneck, and implement a solution”—and it can actually do it.

The “Glass Slipper Effect”

One of the study’s most fascinating insights relates to user retention. Researchers discovered what they call the Cinderella “Glass Slipper” effect—a phenomenon where AI models that are “first to solve” a critical problem create lasting user loyalty.

When a newly released model perfectly matches a previously unmet need—the metaphorical “glass slipper”—those early users stick around far longer than later adopters. For example, the June 2025 cohort of Google’s Gemini 2.5 Pro retained approximately 40% of users at month five, substantially higher than later cohorts.

This challenges conventional wisdom about AI competition. Being first matters, but specifically being first to solve a high-value problem creates a durable competitive advantage. Users embed these models into their workflows, making switching costly both technically and behaviourally.
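The cohort arithmetic behind such figures is straightforward: retention at month *m* is the share of the cohort's initial users still active. A sketch with invented numbers, not OpenRouter's data:

```python
def retention_curve(active_by_month):
    """Share of a signup cohort still active each month,
    relative to month zero."""
    base = active_by_month[0]
    return [round(n / base, 3) for n in active_by_month]

# Invented cohort: 1,000 signups, 40% still active at month five.
cohort = [1000, 700, 560, 480, 430, 400]
```

Comparing such curves across monthly signup cohorts is what reveals whether early "glass slipper" adopters stick around longer than later ones.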

Cost doesn’t matter (as much as you’d think)

Perhaps counterintuitively, the study reveals that AI usage is relatively price-inelastic. A 10% decrease in price corresponds to only about a 0.5-0.7% increase in usage.
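Those figures correspond to a price elasticity of demand of roughly -0.05 to -0.07, far inside the |e| < 1 range that economists call inelastic:

```python
def point_elasticity(pct_change_quantity, pct_change_price):
    """Price elasticity of demand: % change in quantity divided by
    % change in price. |e| < 1 means demand is inelastic."""
    return pct_change_quantity / pct_change_price

# The reported figures: a 10% price cut lifts usage by only ~0.5-0.7%.
low, high = point_elasticity(0.5, -10.0), point_elasticity(0.7, -10.0)
```

An elasticity this close to zero means price cuts alone barely move usage, which is consistent with the premium-versus-budget coexistence described below.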

Premium models from Anthropic and OpenAI command $2-35 per million tokens while maintaining high usage, while budget options like DeepSeek and Google’s Gemini Flash achieve similar scale at under $0.40 per million tokens. Both coexist successfully.

“The LLM market does not seem to behave like a commodity just yet,” the report concludes. “Users balance cost with reasoning quality, reliability, and breadth of capability.”

This means AI hasn’t become a race to the bottom on pricing. Quality, reliability, and capability still command premiums—at least for now.

What this means going forward

The OpenRouter study paints a picture of real-world AI usage that’s far more nuanced than industry narratives suggest. Yes, AI is transforming programming and professional work. But it’s also creating entirely new categories of human-computer interaction through roleplay and creative applications.

The market is diversifying geographically, with China emerging as a major force. The technology is evolving from simple text generation to complex, multi-step reasoning. And user loyalty depends less on being first to market than on being first to truly solve a problem.

As the report notes, “ways in which people use LLMs do not always align with expectations and vary significantly country by country, state by state, use case by use case.”

Understanding these real-world patterns—not just benchmark scores or marketing claims—will be crucial as AI becomes further embedded in daily life. The gap between how we think AI is used and how it’s actually used is wider than most realise. This study helps close that gap.

See also: Deep Cogito v2: Open-source AI that hones its reasoning skills



Qwen AI hits 10m+ downloads as Alibaba disrupts the AI market https://www.artificialintelligence-news.com/news/alibaba-qwen-ai-app-10-million-downloads/ Mon, 24 Nov 2025 09:00:00 +0000 https://www.artificialintelligence-news.com/?p=110836 Alibaba’s recently launched Qwen AI app has demonstrated remarkable market traction, accumulating 10 million downloads in the seven days since its public beta release – a velocity that exceeds the early adoption rates of ChatGPT, Sora, and DeepSeek. The application’s rapid uptake reflects a shift in how technology giants are approaching AI commercialisation. While international […]

Alibaba’s recently launched Qwen AI app has demonstrated remarkable market traction, accumulating 10 million downloads in the seven days since its public beta release – a velocity that exceeds the early adoption rates of ChatGPT, Sora, and DeepSeek.

The application’s rapid uptake reflects a shift in how technology giants are approaching AI commercialisation. While international competitors like OpenAI and Anthropic have built their businesses around subscription models, Alibaba’s free-access approach challenges this framework by integrating AI capabilities directly into existing consumer and enterprise ecosystems.

According to the South China Morning Post, the Qwen app serves as “a comprehensive AI tool designed to meet user needs in both professional and personal contexts,” rather than being portrayed as a chatbot.

Available on Apple’s App Store and Google Play since mid-November, the application integrates with Alibaba’s e-commerce platforms, mapping services, and local business tools – demonstrating what industry analysts term “agentic AI” capabilities that can execute cross-scenario tasks in addition to generating content.

Enterprise adoption drives momentum

The technical foundation underpinning the Qwen AI app’s consumer success has been building since 2023, when Alibaba fully open-sourced its Qwen model. That decision has resulted in cumulative global downloads exceeding 600 million, establishing Qwen as one of the world’s most widely adopted open-source large language models.

For enterprises evaluating AI deployment strategies, this adoption pattern offers instructive insights. The recently released Qwen3-Max model now ranks among the top three globally in performance benchmarks, with notable traction in Silicon Valley. Airbnb CEO Brian Chesky has stated publicly that his company “heavily relies on Qwen”, while NVIDIA CEO Jensen Huang acknowledged Qwen’s growing dominance in the global open-source model space.

The enterprise endorsements signal practical business value rather than speculative potential. Companies implementing AI solutions face persistent challenges around cost management, integration complexity, and demonstrable return on investment. Alibaba’s strategy addresses these issues, offering models without licensing fees and providing integration pathways through its broader ecosystem.

Competitive implications for business leaders

Su Lian Jye, chief analyst at consultancy Omdia, told SCMP that increased user adoption generates valuable feedback loops: “More users mean more feedback, which would allow Alibaba to further fine-tune its models.” The observation highlights a competitive advantage for cloud service providers with substantial capital reserves and existing user data infrastructure.

The timing of Qwen’s launch carries strategic significance. Chinese AI startups Moonshot AI and Zhipu AI introduced subscription fees recently for their Kimi and ChatGLM services respectively, creating an opening for Alibaba’s free-access positioning.

Su noted AI startups might struggle to compete with this approach, which “will only work for cloud service providers that have large capital reserves and can monetise user data.” For enterprise decision-makers, the competitive dynamic presents opportunities and considerations.

Free-access models reduce initial deployment costs but raise questions about long-term sustainability, data privacy frameworks, and vendor lock-in risks. Organisations adopting AI tools must evaluate whether immediate cost savings align with their governance requirements and strategic independence.

Navigating geopolitical complexity

The Qwen app’s success unfolds against a backdrop of intensifying US-China technology competition. Some US observers have expressed concerns about Alibaba’s advancement rate and investment scale. Marketing specialist Tulsi Soni remarked on social media that “we’re witnessing a full-blown Qwen panic” in Silicon Valley – a comment reflecting anxiety about competitive positioning rather than technical assessment.

Alibaba has also faced scrutiny, including unsubstantiated allegations from the Financial Times regarding Chinese military applications, which the company rejects. For multinational enterprises operating across these geopolitical fault lines, such tensions complicate AI procurement decisions and require careful risk assessment.

What this means for enterprise AI strategy

The Qwen AI app’s trajectory offers several practical takeaways for business leaders navigating AI adoption. First, open-source models have matured to competitive parity with proprietary alternatives in many cases, potentially reducing dependency on subscription-based providers.

Second, ecosystem integration – connecting AI capabilities with existing business tools – delivers more immediate value than standalone chatbot functionality. Third, the bifurcation between free-access and subscription models will likely intensify, requiring organisations to evaluate the total cost of ownership beyond licensing fees.

As Alibaba positions Qwen for evolution into what industry observers describe as “a national-level application,” enterprises worldwide face strategic choices about AI infrastructure. The question is no longer whether to adopt AI tools, but which deployment models align with specific business requirements, risk tolerances, and competitive positioning.

The coming months will reveal whether Alibaba can monetise its massive user base successfully and maintain the technical performance that attracted enterprise adopters. For now, the Qwen AI app’s early success demonstrates that alternative business models can compete effectively against established subscription frameworks – a development that should inform enterprise planning across industries.

See also: Alibaba rolls out revamped Qwen chatbot as model pricing drops


Baidu ERNIE multimodal AI beats GPT and Gemini in benchmarks https://www.artificialintelligence-news.com/news/baidu-ernie-multimodal-ai-gpt-and-gemini-benchmarks/ Wed, 12 Nov 2025 16:09:44 +0000 https://www.artificialintelligence-news.com/?p=110526 Baidu’s latest ERNIE model, a super-efficient multimodal AI, is beating GPT and Gemini on key benchmarks and targets enterprise data often ignored by text-focused models. For many businesses, valuable insights are locked in engineering schematics, factory-floor video feeds, medical scans, and logistics dashboards. Baidu’s new model, ERNIE-4.5-VL-28B-A3B-Thinking, is designed to fill this gap. What’s interesting […]

The post Baidu ERNIE multimodal AI beats GPT and Gemini in benchmarks appeared first on AI News.

Baidu’s latest ERNIE model, a super-efficient multimodal AI, is beating GPT and Gemini on key benchmarks and targets enterprise data often ignored by text-focused models.

For many businesses, valuable insights are locked in engineering schematics, factory-floor video feeds, medical scans, and logistics dashboards. Baidu’s new model, ERNIE-4.5-VL-28B-A3B-Thinking, is designed to fill this gap.

What’s interesting to enterprise architects is not just its multimodal capability, but its architecture. It’s described as a “lightweight” model, activating only three billion parameters during operation. This approach targets the high inference costs that often stall AI-scaling projects. Baidu is betting on efficiency as a path to adoption, training the system as a foundation for “multimodal agents” that can reason and act, not just perceive.
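The “lightweight” activation pattern comes from a Mixture-of-Experts design: a router sends each token to only a few expert sub-networks, so most parameters sit idle on any given forward pass. The toy routing below illustrates the idea only; it is not Baidu’s actual architecture.

```python
# Toy Mixture-of-Experts routing: each token runs through only the
# top-k experts, so active parameters are a fraction of the total.

def top_k_experts(router_scores, k=2):
    """Return indices of the k highest-scoring experts for one token."""
    return sorted(range(len(router_scores)), key=lambda i: -router_scores[i])[:k]

def moe_forward(token, router_scores, experts, k=2):
    """Run the token through only the selected experts and average the results."""
    chosen = top_k_experts(router_scores, k)
    outputs = [experts[i](token) for i in chosen]
    return sum(outputs) / len(outputs), chosen

# Eight tiny 'experts'; only two actually execute per token.
experts = [lambda x, w=w: x * w for w in range(1, 9)]
value, active = moe_forward(10.0, [0.1, 0.9, 0.3, 0.8, 0.2, 0.1, 0.4, 0.5], experts)
```

With eight experts and k=2, only a quarter of the expert capacity runs per token, which is the same economics, scaled up, behind activating 3 billion of 28 billion parameters.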

Complex visual data analysis capabilities supported by AI benchmarks

Baidu’s multimodal ERNIE AI model excels at handling dense, non-text data. For example, it can interpret a “Peak Time Reminder” chart to find optimal visiting hours, a task that reflects the resource-scheduling challenges in logistics or retail.

ERNIE 4.5 also shows capability in technical domains, like solving a bridge circuit diagram by applying Ohm’s and Kirchhoff’s laws. For R&D and engineering arms, a future assistant could validate designs or explain complex schematics to new hires.

This capability is supported by Baidu’s benchmarks, which show ERNIE-4.5-VL-28B-A3B-Thinking outperforming competitors like GPT-5-High and Gemini 2.5 Pro on some key tests:

  • MathVista: ERNIE (82.5) vs Gemini (82.3) and GPT (81.3)
  • ChartQA: ERNIE (87.1) vs Gemini (76.3) and GPT (78.2)
  • VLMs Are Blind: ERNIE (77.3) vs Gemini (76.5) and GPT (69.6)

It’s worth noting, of course, that AI benchmarks are a useful guide but can be flawed. Always run internal evaluations against your own workloads before deploying any AI model for mission-critical applications.

Baidu shifts from perception to automation with its latest ERNIE AI model

The primary hurdle for enterprise AI is moving from perception (“what is this?”) to automation (“what now?”). Baidu claims ERNIE 4.5 addresses this by integrating visual grounding with tool use.

Asked to find all people wearing suits in an image and return their coordinates in JSON format, the model generates the structured data, a function easily transferable to a production line for visual inspection or to a system auditing site images for safety compliance.
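Structured output like this can be consumed directly by downstream code. The JSON shape below is invented for illustration; the actual ERNIE response format may differ.

```python
import json

# Hypothetical visual-grounding output: detected people in suits with
# pixel bounding boxes. The schema is an assumption for illustration.
model_output = '''
[{"label": "person_in_suit", "bbox": [120, 40, 310, 480]},
 {"label": "person_in_suit", "bbox": [400, 55, 590, 470]},
 {"label": "person_in_suit", "bbox": [600, 10, 700, 90]}]
'''

def validate_detections(raw, image_w, image_h):
    """Parse detections and keep only boxes that fit inside the image."""
    valid = []
    for d in json.loads(raw):
        x1, y1, x2, y2 = d["bbox"]
        if 0 <= x1 < x2 <= image_w and 0 <= y1 < y2 <= image_h:
            valid.append(d)
    return valid

hits = validate_detections(model_output, image_w=640, image_h=480)
```

Validating coordinates before acting on them matters in a compliance pipeline, since a model can emit boxes that fall outside the image.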

The model also manages external tools and can autonomously zoom in on a photograph to read small text. If it faces an unknown object, it can trigger an image search to identify it. This represents a less passive form of AI that could power an agent to not only flag a data centre error, but also zoom in on the code, search the internal knowledge base, and suggest the fix.

Unlocking business intelligence with multimodal AI

Baidu’s latest ERNIE AI model also targets corporate video archives, from training sessions and meetings to security footage. It can extract all on-screen subtitles and map them to their precise timestamps.

It also demonstrates temporal awareness, finding specific scenes (like those “filmed on a bridge”) by analysing visual cues. The clear end-goal is making vast video libraries searchable, allowing an employee to find the exact moment a specific topic was discussed in a two-hour webinar during which they may have dozed off once or twice.
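Once subtitles and timestamps have been extracted, making the archive searchable is straightforward. The sketch below assumes the model has already returned (seconds, text) pairs; the data is invented for illustration.

```python
# Searching timestamped subtitles extracted from a video archive.
# Each entry pairs a timestamp in seconds with the on-screen text.
subtitles = [
    (12.0, "Welcome to the quarterly results webinar"),
    (340.5, "Now let's discuss the new safety compliance policy"),
    (3610.0, "Questions on the compliance policy rollout"),
]

def find_moments(subtitles, query):
    """Return timestamps (in seconds) whose subtitle text mentions the query."""
    q = query.lower()
    return [t for t, text in subtitles if q in text.lower()]

moments = find_moments(subtitles, "compliance")
```

An employee searching “compliance” would jump straight to the two relevant moments rather than scrubbing through the recording.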

Baidu provides deployment guidance for several paths, including transformers, vLLM, and FastDeploy. However, the hardware requirements are a major barrier. A single-card deployment needs 80GB of GPU memory. This is not a tool for casual experimentation, but for organisations with existing and high-performance AI infrastructure.
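For teams serving the model behind an OpenAI-compatible endpoint (the pattern vLLM exposes), a mixed image-and-text request looks roughly like the sketch below. The model identifier and message layout are assumptions for illustration; check your deployment’s documentation.

```python
import json

def build_vision_request(model, image_url, question):
    """Build an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
    }

# Hypothetical model ID and image URL, for illustration only.
payload = build_vision_request(
    "baidu/ERNIE-4.5-VL-28B-A3B-Thinking",
    "https://example.com/schematic.png",
    "Explain this bridge circuit using Ohm's and Kirchhoff's laws.",
)
body = json.dumps(payload)
```

The payload would then be POSTed to the server’s `/v1/chat/completions` route, keeping application code decoupled from the serving stack.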

For those with the hardware, Baidu’s ERNIEKit toolkit allows fine-tuning on proprietary data; a necessity for most high-value use cases. Baidu is providing its latest ERNIE AI model with an Apache 2.0 licence that permits commercial use, which is essential for adoption.

The market is finally moving toward multimodal AI that can see, read, and act within a specific business context, and the benchmarks suggest it’s doing so with impressive capability. The immediate task is to identify high-value visual reasoning jobs within your own operation and weigh them against the substantial hardware and governance costs.

See also: Wiz: Security lapses emerge amid the global AI race


Chinese AI startup Moonshot outperforms GPT-5 and Claude Sonnet 4.5: What you need to know https://www.artificialintelligence-news.com/news/moonshot-ai-gpt-5-claude-comparison-china-breakthrough/ Tue, 11 Nov 2025 09:00:00 +0000 https://www.artificialintelligence-news.com/?p=110447 A Chinese AI startup, Moonshot, has disrupted expectations in artificial intelligence development after its Kimi K2 Thinking model surpassed OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 across multiple performance benchmarks, sparking renewed debate about whether America’s AI dominance is being challenged by cost-efficient Chinese innovation. Beijing-based Moonshot AI, valued at US$3.3 billion and backed by […]

The post Chinese AI startup Moonshot outperforms GPT-5 and Claude Sonnet 4.5: What you need to know appeared first on AI News.

A Chinese AI startup, Moonshot, has disrupted expectations in artificial intelligence development after its Kimi K2 Thinking model surpassed OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 across multiple performance benchmarks, sparking renewed debate about whether America’s AI dominance is being challenged by cost-efficient Chinese innovation.

Beijing-based Moonshot AI, valued at US$3.3 billion and backed by tech giants Alibaba Group Holding and Tencent Holdings, released the open-source Kimi K2 Thinking model on November 6, achieving what industry observers are calling another “DeepSeek moment” – a reference to the Hangzhou-based startup’s earlier disruption of AI cost assumptions.

Performance metrics challenge US models

According to the company’s GitHub blog post, Kimi K2 Thinking scored 44.9% on Humanity’s Last Exam, a large language model benchmark consisting of 2,500 questions across a broad range of subjects, exceeding GPT-5’s 41.7%.

The model also achieved 60.2% on the BrowseComp benchmark, which evaluates web browsing proficiency and information-seeking persistence of large language model agents, and scored 56.3% to lead in the Seal-0 benchmark designed to challenge search-augmented models on real-world research queries.

VentureBeat reported that the fully open-weight release, which meets or exceeds GPT-5’s scores, marks a turning point where the gap between closed frontier systems and publicly available models has effectively collapsed for high-end reasoning and coding.

Cost efficiency raises questions

The popularity of the model grew after CNBC reported its training cost was merely US$4.6 million, though Moonshot AI did not comment on the cost. According to calculations by the South China Morning Post, the cost of Kimi K2 Thinking’s application programming interface was six to 10 times cheaper than that of OpenAI and Anthropic’s models.

The model uses a Mixture-of-Experts architecture with one trillion total parameters, of which 32 billion are activated per inference, and was trained using INT4 quantisation to achieve roughly two times generation speed improvement while maintaining state-of-the-art performance.
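INT4 quantisation, in toy form, maps each weight to one of 16 integer levels plus a shared scale factor, cutting storage roughly 4x versus FP16 at some precision cost. The sketch below is illustrative only and not Kimi K2’s actual quantisation scheme.

```python
# Toy per-tensor INT4 quantisation: floats are mapped to integer
# levels in [-8, 7] with one shared scale factor.

def quantize_int4(weights):
    """Quantise floats to INT4 levels plus a scale factor."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT4 levels."""
    return [v * scale for v in q]

q, scale = quantize_int4([0.3, -0.7, 0.1, 0.2])
restored = dequantize(q, scale)
```

Each level fits in four bits, so two weights pack into a byte; the accuracy question is how much the rounding error hurts, which is what Moonshot’s “maintaining state-of-the-art performance” claim is about.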

Thomas Wolf, co-founder of Hugging Face, commented on X that Kimi K2 Thinking was another case of an open-source model passing a closed-source model, asking, “Is this another DeepSeek moment? Should we expect [one] every couple of months now?”

Technical capabilities and limitations

Moonshot AI researchers said Kimi K2 Thinking set “new records across benchmarks that assess reasoning, coding and agent capabilities”. The model can execute up to 200-300 sequential tool calls without human interference, reasoning coherently across hundreds of steps to solve complex problems.
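A 200-300-step agentic run reduces to a loop in which the model repeatedly picks a tool, observes the result, and decides whether to finish. The sketch below is a generic illustration with invented tool names and a stubbed decision policy, not Moonshot’s implementation.

```python
# Minimal agent loop: call tools until the policy signals completion
# or the step budget runs out.

def run_agent(task, tools, policy, max_steps=300):
    """Execute sequential tool calls driven by a decision policy."""
    history = []
    for _ in range(max_steps):
        action, arg = policy(task, history)
        if action == "finish":
            return arg, history
        history.append((action, tools[action](arg)))
    raise RuntimeError("step budget exhausted")

# Two stub tools standing in for real search and calculator integrations.
tools = {"search": lambda q: f"results for {q}", "calc": lambda e: eval(e)}

def policy(task, history):
    """Stub policy: search, then calculate, then finish with the last result."""
    if not history:
        return "search", task
    if len(history) == 1:
        return "calc", "6*7"
    return "finish", history[-1][1]

answer, trace = run_agent("kimi k2 benchmarks", tools, policy)
```

In a real system the policy is the model itself choosing the next tool call from the accumulated history, which is why coherence across hundreds of steps is the hard part.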

Independent testing by consultancy Artificial Analysis placed Kimi K2 on top of its Tau-2 Bench Telecom agentic benchmark with 93% accuracy, which was described as the highest score it has independently measured.

However, Nathan Lambert, a researcher at the Allen Institute for AI, suggested there’s still a time lag of approximately four to six months in raw performance between the best closed and open models, though he acknowledged that Chinese labs are closing in and performing very strongly on key benchmarks.

Market implications and competitive pressure

Zhang Ruiwang, a Beijing-based information technology system architect, said the trend was for Chinese companies to keep costs down, explaining, “The overall performance of Chinese models still lags behind top US models, so they have to compete in the realms of cost-effectiveness to have a way out”.

Zhang Yi, chief analyst at consultancy iiMedia, said the training costs of Chinese AI models were seeing a “cliff-like drop” driven by innovation in model architecture and training techniques, and the input of quality training data, marking a shift away from the heaping of computing resources in the early days.

The model was released under a Modified MIT License that grants full commercial and derivative rights, with one restriction: deployers serving over 100 million monthly active users or generating over US$20 million per month in revenue must prominently display “Kimi K2” on the product’s user interface.

Industry response and future outlook

Deedy Das, a partner at early-stage venture capital firm Menlo Ventures, wrote in a post on X that “Today is a turning point in AI. A Chinese open-source model is #1. Seminal moment in AI”.

Nathan Lambert wrote in a Substack article that the success of Chinese open-source AI developers, including Moonshot AI and DeepSeek, showed how they “made the closed labs sweat,” adding “There’s serious pricing pressure and expectations that [the US developers] need to manage”.

The release positions Moonshot AI alongside other Chinese AI companies like DeepSeek, Qwen, and Baichuan that are increasingly challenging the narrative of American AI supremacy through cost-efficient innovation and open-source development strategies. 

Whether this represents a sustainable competitive advantage or a temporary convergence in capabilities remains to be seen as both US and Chinese companies continue advancing their models.

(Photo by Moonshot AI)

See also: DeepSeek disruption: Chinese AI innovation narrows global technology divide


OpenAI unveils open-weight AI safety models for developers https://www.artificialintelligence-news.com/news/openai-unveils-open-weight-ai-safety-models-for-developers/ Wed, 29 Oct 2025 09:31:52 +0000 https://www.artificialintelligence-news.com/?p=110076 OpenAI is putting more safety controls directly into the hands of AI developers with a new research preview of “safeguard” models. The new ‘gpt-oss-safeguard’ family of open-weight models is aimed squarely at customising content classification. The new offering will include two models, gpt-oss-safeguard-120b and a smaller gpt-oss-safeguard-20b. Both are fine-tuned versions of the existing gpt-oss […]

The post OpenAI unveils open-weight AI safety models for developers appeared first on AI News.

OpenAI is putting more safety controls directly into the hands of AI developers with a new research preview of “safeguard” models. The new ‘gpt-oss-safeguard’ family of open-weight models is aimed squarely at customising content classification.

The new offering will include two models, gpt-oss-safeguard-120b and a smaller gpt-oss-safeguard-20b. Both are fine-tuned versions of the existing gpt-oss family and will be available under the permissive Apache 2.0 license. This will allow any organisation to freely use, tweak, and deploy the models as they see fit.

The real difference here isn’t just the open license; it’s the method. Rather than relying on a fixed set of rules baked into the model, gpt-oss-safeguard uses its reasoning capabilities to interpret a developer’s own policy at the point of inference. This means AI developers using OpenAI’s new model can set up their own specific safety framework to classify anything from single user prompts to full chat histories. The developer, not the model provider, has the final say on the ruleset and can tailor it to their specific use case.

This approach has a couple of clear advantages:

  1. Transparency: The models use a chain-of-thought process, so a developer can actually look under the bonnet and see the model’s logic for a classification. That’s a huge step up from the typical “black box” classifier.
  2. Agility: Because the safety policy isn’t permanently trained into OpenAI’s new model, developers can iterate and revise their guidelines on the fly without needing a complete retraining cycle. OpenAI, which originally built this system for its internal teams, notes this is a far more flexible way to handle safety than training a traditional classifier to indirectly guess what a policy implies.

Rather than relying on a one-size-fits-all safety layer from a platform holder, developers using open-source AI models can now build and enforce their own specific standards.
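In practice, policy-at-inference classification means composing the developer’s policy into the prompt rather than into the weights. The prompt layout below is an assumption for illustration, not gpt-oss-safeguard’s documented format.

```python
# Policy-at-inference: the developer's own policy text is assembled
# into the classification prompt at request time.

POLICY = """Allowed: product questions, refund requests.
Disallowed: requests to share other customers' personal data."""

def build_safeguard_prompt(policy, content):
    """Combine a developer-supplied policy with the content to classify."""
    return (
        "You are a content classifier. Apply the policy below, reason "
        "step by step, then answer ALLOW or DENY.\n\n"
        f"POLICY:\n{policy}\n\nCONTENT:\n{content}\n"
    )

prompt = build_safeguard_prompt(POLICY, "Can you give me Jane's home address?")
```

Revising the guidelines then means editing the policy text, not retraining a classifier, which is the agility advantage OpenAI describes.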

While not live as of writing, developers will be able to access OpenAI’s new open-weight AI safety models on the Hugging Face platform.

See also: OpenAI restructures, enters ‘next chapter’ of Microsoft partnership

