AI Hardware & Chips - AI News
https://www.artificialintelligence-news.com/categories/inside-ai/new_ai-hardware-chips/

ASML’s high-NA EUV tools clear the runway for next-gen AI chips
https://www.artificialintelligence-news.com/news/asml-high-na-euv-production-ready-ai-chips/ (27 February 2026)

The machine that will make tomorrow’s AI chips possible has just been declared ready for mass production – and the clock for the industry’s next leap has officially started. ASML, the Dutch company that holds a global monopoly on commercial extreme ultraviolet lithography equipment, confirmed this week that its High-NA EUV tools have crossed the threshold from technically impressive to genuinely production-ready.

The announcement was made exclusively to Reuters by ASML’s chief technology officer Marco Pieters ahead of a technical conference in San Jose.

Current-generation EUV machines are approaching the outer edge of what they can do for advanced AI chip production, meaning the semiconductors powering large language models and AI accelerators are bumping up against a physical ceiling. High-NA EUV tools are designed to break through it, letting chipmakers print finer, denser circuit patterns in fewer steps. That translates directly into more powerful and efficient chips for AI workloads.

“I think that it’s at an important point to look at the amount of learning cycles that have happened,” Pieters told Reuters, referring to the volume of customer testing the machines have now accumulated.

The numbers that matter

ASML’s case for readiness rests on three data points it plans to release publicly. The High-NA EUV tools have now processed 500,000 silicon wafers, achieved roughly 80% uptime – with a target of 90% by year-end – and demonstrated imaging precision capable of replacing multiple conventional patterning steps with a single High-NA pass.

Together, Pieters said, those figures signal that the tools are ready for manufacturers to begin qualification. The machines don’t come cheap: at approximately US$400 million per unit – double the cost of the previous EUV generation – they are among the most expensive pieces of capital equipment in industrial history.

TSMC and Intel are among the named early adopters.

A two-to-three-year runway

Technical readiness and manufacturing integration are two different things, and Pieters was careful to separate them. Despite the milestone, full integration into high-volume production lines is still expected to take two to three years as chipmakers work through qualification and process development.

“Chipmakers have all the knowledge to qualify these tools,” he said – a vote of confidence in the industry’s ability to move, even if the timeline remains measured.

The next generation of chip performance improvements is on the horizon, not yet in hand. But with ASML now saying the starting gun has fired, the race to integrate High-NA EUV into production has formally begun.

(Photo by ASML)

See also: 2025’s AI chip wars: What enterprise leaders learned about supply chain reality

White House compares industrial revolution with AI era
https://www.artificialintelligence-news.com/news/white-house-predicts-ai-growth-with-comparison-industrial-and-artificial-intelligence-revolutions/ (28 January 2026)

A White House paper titled “Artificial Intelligence and the Great Divergence” sets out parallels between the effects of the industrial revolution in the 18th and 19th centuries and the present day, positioning artificial intelligence as the force that will shape the world’s economies.

Artificial intelligence now sits at the centre of US economic strategy, currently representing a significant portion of the country’s economic activity, characterised by the building of AI infrastructure, most notably data centres. The paper says AI investment raised US GDP by 1.3% in the first half of 2025, and compares this with investment in the railway network during the industrial revolution.

“Artificial Intelligence and the Great Divergence” says long-term growth depends primarily on gains in productivity, and that AI is the tool to achieve those gains. It presents a range of estimates of AI’s impact on GDP, from single-digit increases to 20% productivity growth inside a decade. It also floats more extreme scenarios, in which GDP grows at more than 45% as AI substitutes for human labour over the longer term.

Capital deployment in the form of building AI infrastructure, not growing consumption or public spending, is now creating US economic growth. Investment in data processing equipment, buildings, infrastructure, and software grew 28% in early 2025, and AI-related infrastructure represented around a quarter of all US investment in 2025.

Training compute capacity used by AI models has increased roughly four-fold per year since 2010, and the length of tasks AI systems can complete has doubled every seven months for six years, the paper states. The cost per token of AI output has fallen by factors ranging from nine to nine hundred per year, depending on task and model.
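
As a rough check on what those rates compound to over a six-year horizon, here is a back-of-the-envelope calculation (the growth rates are the paper’s; the horizon and the Python arithmetic are purely illustrative):

```python
# Compounding the paper's headline growth rates over an illustrative six years.
years = 6

compute_growth = 4 ** years            # compute ~4x per year -> ~4,096x
task_growth = 2 ** (years * 12 / 7)    # task length doubles every 7 months -> ~1,250x
token_cost = 1 / (9 ** years)          # cost falls at least 9x per year

print(f"Compute: {compute_growth:,}x")
print(f"Task length: {task_growth:,.0f}x")
print(f"Cost per token: {token_cost:.1e} of its starting value")
```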

By late 2025, around 78% of organisations reported using AI, up from 55% in 2024, and it’s claimed that 40% of US workers use generative AI in their jobs. Nearly half of US businesses now pay for AI subscriptions. The report presents these figures as evidence that AI has moved from experimentation into routine production.

Internationally, the document frames AI as a factor in the divergence of economic prosperity, with AI lifting US GDP growth faster than Europe’s or China’s. The US currently leads in private AI investment, model development, and compute capacity, while the EU’s share of world GDP has fallen since 1980 and the continent lags on comparable AI metrics: investment, construction, software development, and overall capacity. China remains a major AI actor, but the report notes that much of its model training relies on US-designed hardware.

The White House publication advocates for an integrated national strategy with investment incentives at its core. The One Big Beautiful Bill Act gave significant financial breaks for data centres and IT infrastructure, and created favourable conditions for speedy facility construction, in line with the Act’s aim to lift GDP growth by more than a percentage point per year over the medium term. The report argues that deregulation in the AI industry supports productivity by lowering costs, increasing competition, and speeding innovation. Trade agreements and foreign policy reinforce this approach, with overseas partners committing to large purchases of US-derived AI chips and infrastructure.

The paper notes that AI data centres are electricity-intensive, and projects that demand for power by AI infrastructure could reach up to 12% of domestic electricity consumption by 2028. It links the success of AI to energy availability and the ability of the power grid to deliver, positioning the control of energy supply as a prerequisite for international leadership in AI.

The report’s conclusion is that the countries that lead in AI investment and adoption will experience above-average growth. The United States is aligning a raft of policies to secure its leading position in the sector. Businesses that build systems in line with its national goals will be part of a dominant economic force shaping the next phase of global growth.

(Image source: “Chicago Thaws into Spring” by Trey Ratcliff is licensed under CC BY-NC-SA 2.0.)

Agentic AI scaling requires new memory architecture
https://www.artificialintelligence-news.com/news/agentic-ai-scaling-requires-new-memory-architecture/ (7 January 2026)

Agentic AI represents a distinct evolution from stateless chatbots toward complex workflows, and scaling it requires new memory architecture.

As foundation models scale toward trillions of parameters and context windows reach millions of tokens, the computational cost of remembering history is rising faster than the ability to process it.

Organisations deploying these systems now face a bottleneck where the sheer volume of “long-term memory” (technically known as Key-Value (KV) cache) overwhelms existing hardware architectures.

Current infrastructure forces a binary choice: store inference context in scarce, high-bandwidth GPU memory (HBM) or relegate it to slow, general-purpose storage. The former is prohibitively expensive for large contexts; the latter creates latency that renders real-time agentic interactions unviable.

To address this widening disparity that is holding back the scaling of agentic AI, NVIDIA has introduced the Inference Context Memory Storage (ICMS) platform within its Rubin architecture, proposing a new storage tier designed specifically to handle the ephemeral and high-velocity nature of AI memory.

“AI is revolutionising the entire computing stack—and now, storage,” said NVIDIA CEO Jensen Huang. “AI is no longer about one-shot chatbots but intelligent collaborators that understand the physical world, reason over long horizons, stay grounded in facts, use tools to do real work, and retain both short- and long-term memory.”

The operational challenge lies in the specific behaviour of transformer-based models. To avoid recomputing an entire conversation history for every new word generated, models store previous states in the KV cache. In agentic workflows, this cache acts as persistent memory across tools and sessions, growing linearly with sequence length.

This creates a distinct data class. Unlike financial records or customer logs, KV cache is derived data; it is essential for immediate performance but does not require the heavy durability guarantees of enterprise file systems. General-purpose storage stacks, running on standard CPUs, expend energy on metadata management and replication that agentic workloads do not require.
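
The sizing arithmetic behind that growth is simple and worth making concrete. A minimal sketch, using hypothetical dimensions loosely in the range of a 70B-class model rather than any specific product:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV cache size: two tensors (K and V) per layer, each holding
    kv_heads * head_dim values for every token in the sequence."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 70B-class model at FP16: 80 layers, 8 KV heads, head_dim 128.
for seq_len in (8_000, 128_000, 1_000_000):
    gib = kv_cache_bytes(80, 8, 128, seq_len) / 2**30
    print(f"{seq_len:>9,} tokens -> ~{gib:6.1f} GiB per sequence")
```

At a million tokens, the cache for a single sequence runs into the hundreds of GiB – beyond the HBM of any single accelerator – which is exactly the spill the tiered hierarchy below describes.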

The current hierarchy, spanning from GPU HBM (G1) to shared storage (G4), is becoming inefficient. (Diagram credit: NVIDIA)

As context spills from the GPU (G1) to system RAM (G2) and eventually to shared storage (G4), efficiency plummets. Moving active context to the G4 tier introduces millisecond-level latency and increases the power cost per token, leaving expensive GPUs idle while they await data.

For the enterprise, this manifests as a bloated Total Cost of Ownership (TCO), where power is wasted on infrastructure overhead rather than active reasoning.

A new memory tier for the AI factory

The industry response involves inserting a purpose-built layer into this hierarchy. The ICMS platform establishes a “G3.5” tier—an Ethernet-attached flash layer designed explicitly for gigascale inference.

This approach integrates storage directly into the compute pod. By utilising the NVIDIA BlueField-4 data processor, the platform offloads the management of this context data from the host CPU. The system provides petabytes of shared capacity per pod, boosting the scaling of agentic AI by allowing agents to retain massive amounts of history without occupying expensive HBM.

The operational benefit is quantifiable in throughput and energy. By keeping relevant context in this intermediate tier – which is faster than standard storage, but cheaper than HBM – the system can “prestage” memory back to the GPU before it is needed. This reduces the idle time of the GPU decoder, enabling up to 5x higher tokens-per-second (TPS) for long-context workloads.

From an energy perspective, the implications are equally measurable. Because the architecture removes the overhead of general-purpose storage protocols, it delivers 5x better power efficiency than traditional methods.
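
The prestaging pattern itself is easy to illustrate. The sketch below is a generic prefetch loop written against our own stand-in functions – it is not the Dynamo or NIXL API – showing how fetches from the flash tier can overlap with decode work so the GPU never stalls on storage:

```python
import asyncio

async def fetch_block(block_id):
    """Stand-in for pulling one KV block from the Ethernet-attached flash tier."""
    await asyncio.sleep(0.01)          # placeholder for storage latency
    return f"kv-block-{block_id}"

async def decode_with_prestage(schedule):
    """Prefetch the next KV block while 'decoding' against the current one."""
    pending = asyncio.create_task(fetch_block(schedule[0]))
    for upcoming in schedule[1:]:
        block = await pending                                 # current block resident
        pending = asyncio.create_task(fetch_block(upcoming))  # overlap next fetch
        print(f"decoding with {block}")                       # decode work goes here
    print(f"decoding with {await pending}")

asyncio.run(decode_with_prestage(list(range(4))))
```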

Integrating the data plane

Implementing this architecture requires a change in how IT teams view storage networking. The ICMS platform relies on NVIDIA Spectrum-X Ethernet to provide the high-bandwidth, low-jitter connectivity required to treat flash storage almost as if it were local memory.

For enterprise infrastructure teams, the integration point is the orchestration layer. Frameworks such as NVIDIA Dynamo and the Inference Transfer Library (NIXL) manage the movement of KV blocks between tiers.

These tools coordinate with the storage layer to ensure that the correct context is loaded into the GPU memory (G1) or host memory (G2) exactly when the AI model requires it. The NVIDIA DOCA framework further supports this by providing a KV communication layer that treats context cache as a first-class resource.

Major storage vendors are already aligning with this architecture. Companies including AIC, Cloudian, DDN, Dell Technologies, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, VAST Data, and WEKA are building platforms with BlueField-4. These solutions are expected to be available in the second half of this year.

Redefining infrastructure for scaling agentic AI

Adopting a dedicated context memory tier impacts capacity planning and datacentre design.

  • Reclassifying data: CIOs must recognise KV cache as a unique data type. It is “ephemeral but latency-sensitive,” distinct from “durable and cold” compliance data. The G3.5 tier handles the former, allowing durable G4 storage to focus on long-term logs and artifacts.
  • Orchestration maturity: Success depends on software that can intelligently place workloads. The system uses topology-aware orchestration (via NVIDIA Grove) to place jobs near their cached context, minimising data movement across the fabric (a toy placement heuristic is sketched after this list).
  • Power density: By fitting more usable capacity into the same rack footprint, organisations can extend the life of existing facilities. However, this increases the density of compute per square metre, requiring adequate cooling and power distribution planning.
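
To make the topology-aware placement point concrete, here is a toy heuristic; it illustrates the concept only and is not NVIDIA Grove’s actual scheduler:

```python
def place_job(job, nodes):
    """Prefer a node that already holds the job's cached context;
    otherwise fall back to the least-loaded node."""
    holders = [n for n in nodes if job["context_id"] in n["cached_contexts"]]
    candidates = holders or nodes
    return min(candidates, key=lambda n: n["load"])

nodes = [
    {"name": "pod-a", "cached_contexts": {"ctx-42"}, "load": 0.7},
    {"name": "pod-b", "cached_contexts": set(), "load": 0.2},
]
job = {"context_id": "ctx-42"}
print(place_job(job, nodes)["name"])  # pod-a: cache reuse beats raw headroom
```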

The transition to agentic AI forces a physical reconfiguration of the datacentre. The prevailing model of separating compute completely from slow, persistent storage is incompatible with the real-time retrieval needs of agents with photographic memories.

By introducing a specialised context tier, enterprises can decouple the growth of model memory from the cost of GPU HBM. This architecture for agentic AI allows multiple agents to share a massive low-power memory pool to reduce the cost of serving complex queries and boosts scaling by enabling high-throughput reasoning.

As organisations plan their next cycle of infrastructure investment, evaluating the efficiency of the memory hierarchy will be as vital as selecting the GPU itself.

See also: 2025’s AI chip wars: What enterprise leaders learned about supply chain reality

2025’s AI chip wars: What enterprise leaders learned about supply chain reality
https://www.artificialintelligence-news.com/news/ai-chip-shortage-enterprise-ctos-2025/ (6 January 2026)

The AI chip shortage became the defining constraint for enterprise AI deployments in 2025, forcing CTOs to confront an uncomfortable reality: semiconductor geopolitics and supply chain physics matter more than software roadmaps or vendor commitments.

What began as US export controls restricting advanced AI chips to China evolved into a broader infrastructure crisis affecting enterprises globally—not from policy alone, but from explosive demand colliding with manufacturing capacity that cannot scale at software speed. 

By year’s end, the dual pressures of geopolitical restrictions and component scarcity had fundamentally reshaped enterprise AI economics. The numbers tell a stark story. Average enterprise AI spending was forecast at US$85,521 monthly in 2025, up 36% from 2024, according to CloudZero’s research surveying 500 engineering professionals.

Organisations planning to invest over US$100,000 monthly more than doubled from 20% in 2024 to 45% in 2025—not because AI became more valuable, but because component costs and deployment timelines spiralled beyond initial projections.

Export controls reshape chip access

The Trump administration’s December 2025 decision to allow conditional sales of Nvidia’s H200 chips to China—the most powerful AI chip ever approved for export—illustrated how quickly semiconductor policy can shift. The arrangement requires a 25% revenue share with the US government and applies only to approved Chinese buyers, reversing an earlier April 2025 export freeze.

Yet the policy reversal came too late to prevent widespread disruption. US Commerce Secretary Howard Lutnick testified that China’s Huawei will produce only 200,000 AI chips in 2025, while China legally imported around one million downgraded Nvidia chips designed specifically for export compliance. 

The production gap forced Chinese companies into large-scale smuggling operations—federal prosecutors unsealed documents in December revealing a ring that attempted to export at least US$160 million worth of Nvidia H100 and H200 GPUs between October 2024 and May 2025.

For global enterprises, these restrictions created unpredictable procurement challenges. Companies with China-based operations or data centres faced sudden access limitations, while others discovered their global deployment plans assumed chip availability that geopolitics no longer guaranteed.

Memory chip crisis compounds AI infrastructure pain

While export controls dominated headlines, a deeper supply crisis emerged: memory chips became the binding constraint on AI infrastructure globally. High-bandwidth memory (HBM), the specialised memory that enables AI accelerators to function, hit severe shortages as manufacturers Samsung, SK Hynix, and Micron operated near full capacity while reporting six- to twelve-month lead times.

Memory prices surged accordingly. DRAM prices climbed over 50% in 2025 in some categories, with server contract prices up as much as 50% quarterly, according to Counterpoint Research. Samsung reportedly lifted prices for server memory chips by 30% to 60%. The firm forecasts memory prices to continue rising another 20% in early 2026 as demand continues outpacing capacity expansion.

The shortage wasn’t limited to specialised AI components. DRAM supplier inventories fell to two to four weeks by October 2025, down from 13-17 weeks in late 2024, per TrendForce data cited by Reuters. SK Hynix told analysts that shortages may persist until late 2027, reporting that all memory scheduled for 2026 production is already sold out.

Enterprise AI labs experienced this firsthand. Major cloud providers Google, Amazon, Microsoft, and Meta issued open-ended orders to Micron, stating they will take as much inventory as the company can provide. Chinese firms Alibaba, Tencent, and ByteDance pressed Samsung and SK Hynix for priority access. 

The pressure extended into future years, with OpenAI signing preliminary agreements with Samsung and SK Hynix for its Stargate project requiring up to 900,000 wafers monthly by 2029—roughly double today’s global monthly HBM output.

Deployment timelines stretch beyond projections

The AI chip shortage didn’t just increase costs—it fundamentally altered enterprise deployment timelines. Enterprise-level custom AI solutions that typically required six to twelve months for full deployment in early 2025 stretched to 12-18 months or longer by year-end, according to industry analysts.

Bain & Company partner Peter Hanbury, speaking to CNBC, noted utility connection timelines have become the biggest constraint on data centre growth, with some projects facing five-year delays just to secure electricity access. The firm forecasts a 163GW rise in global data centre electricity demand by 2030, much of it linked to generative AI’s intensive compute requirements.

Microsoft CEO Satya Nadella captured the paradox in stark terms: “The biggest issue we are now having is not a compute glut, but it’s power—it’s the ability to get the builds done fast enough close to power. If you can’t do that, you may actually have a bunch of chips sitting in inventory that I can’t plug in. In fact, that is my problem today.”

Traditional tech buyers in enterprise environments faced even steeper challenges. “Buyers in this environment will have to over-extend and make some bets now to secure supply later,” warned Chad Bickley of Bain & Company in a March 2025 analysis. 

“Planning ahead for delays in production may require buyers to take on some expensive inventory of bleeding-edge technology products that may become obsolete in short order.”

Hidden costs compound budget pressures

The visible price increases—HBM up 20-30% year-over-year, GPU cloud costs rising 40-300% depending on region—represented only part of the total cost impact. Organisations discovered multiple hidden expense categories that vendor quotes hadn’t captured.

Advanced packaging capacity emerged as a critical bottleneck. TSMC’s CoWoS packaging, essential for stacking HBM alongside AI processors, was fully booked through the end of 2025. Demand for this integration technique exploded as wafer production increased, creating a secondary choke point that added months to delivery timelines.

Infrastructure costs beyond chips escalated sharply. Enterprise-grade NVMe SSDs saw prices climb 15-20% compared to a year earlier as AI workloads required significantly higher endurance and bandwidth than traditional applications. Organisations planning AI deployments found their bill-of-materials costs rising 5-10% from memory component increases alone, according to Bain analysis.

Implementation and governance costs compounded further. Organisations spent US$50,000 to US$250,000 annually on monitoring, governance, and enablement infrastructure beyond core licensing fees. Usage-based overages caused monthly charges to spike unexpectedly for teams with high AI interaction density, particularly those engaging in heavy model training or frequent inference workloads.

Strategic lessons for 2026 and beyond

Enterprise leaders who successfully navigated 2025’s AI chip shortage emerged with hard-won insights that will shape procurement strategy for years ahead.

Diversify supply relationships early: Organisations that secured long-term supply agreements with multiple vendors before shortages intensified maintained more predictable deployment timelines than those relying on spot procurement.

Budget for component volatility: The era of stable, predictable infrastructure pricing has ended for AI workloads. CTOs learned to build 20-30% cost buffers into AI infrastructure budgets to absorb memory price fluctuations and component availability gaps.

Optimise before scaling: Techniques like model quantisation, pruning, and inference optimisation cut GPU needs by 30-70% in some implementations (a minimal quantisation sketch follows this list). Organisations that invested in efficiency before throwing hardware at problems achieved better economics than those focused purely on procurement.

Consider hybrid infrastructure models: Multi-cloud strategies and hybrid setups combining cloud GPUs with dedicated clusters improved reliability and cost predictability. For high-volume AI workloads, owning or leasing infrastructure increasingly proved more cost-effective than renting cloud GPUs at inflated spot prices.

Factor geopolitics into architecture decisions: The rapid policy shifts around chip exports taught enterprises that global AI infrastructure can’t assume stable regulatory environments. Organisations with China exposure learned to design deployment architectures with regulatory flexibility in mind.
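
As a concrete example of the “optimise before scaling” lever above, PyTorch’s dynamic quantisation converts a model’s linear-layer weights to INT8 in a couple of lines. A minimal sketch – real-world savings depend heavily on the model and workload:

```python
import torch

# Stand-in for a real model; any module with nn.Linear layers works.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
)

# Weights become INT8; activations are quantised dynamically at runtime.
quantised = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantised)  # Linear layers replaced with DynamicQuantizedLinear
```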

The 2026 outlook: Continued constraints

The supply-demand imbalance shows no signs of resolving quickly. New memory chip factories take years to build—most capacity expansions announced in 2025 won’t come online until 2027 or later. SK Hynix guidance suggests shortages persisting through at least late 2027.

Export control policy remains fluid. A new “Trump AI Controls” rule to replace earlier frameworks is expected in 2026, along with potential controls on exports to Malaysia and Thailand, identified as diversion routes for China. Each policy shift creates new procurement uncertainties for global enterprises.

The macroeconomic implications extend beyond IT budgets. Memory shortages could delay hundreds of billions in AI infrastructure investment, slowing productivity gains that enterprises have bet on to justify massive AI spending. Rising component costs threaten to add inflationary pressure at a moment when global economies remain sensitive to price increases.

For enterprise leaders, 2025’s AI chip shortage delivered a definitive lesson: software moves at digital speed, but hardware moves at physical speed, and geopolitics moves at political speed. The gap between those three timelines defines what’s actually deployable—regardless of what vendors promise or roadmaps project.

The organisations that thrived weren’t those with the biggest budgets or the most ambitious AI visions. They were the ones who understood that in 2025, supply chain reality trumped strategic ambition—and planned accordingly.

(Photo by Igor Omilaev/Unsplash)

See also: Can the US really enforce a global AI chip ban?

Arm and the future of AI at the edge
https://www.artificialintelligence-news.com/news/arm-chips-and-the-future-of-ai-at-the-edge/ (23 December 2025)

Arm Holdings has positioned itself at the centre of AI transformation. In a wide-ranging podcast interview, Vince Jesaitis, head of global government affairs at Arm, offered enterprise decision-makers a look into the company’s international strategy, the evolution of AI as the company sees it, and what lies ahead for the industry.

From cloud to edge

Arm thinks the AI market is about to enter a new phase, moving from cloud-based processing to edge computing. While much of the media’s attention has been focused to date on massive data centres, with models trained in and accessed from the cloud, Jesaitis said that most AI compute, especially inference tasks, is likely to be increasingly decentralised.

“The next ‘aha’ moment in AI is when local AI processing is being done on devices you couldn’t have imagined before,” Jesaitis said. These devices range from smartphones and earbuds to cars and industrial sensors. Arm’s IP is already embedded in these devices: in the last year alone, the company’s designs have been behind more than 30 billion chips, placed in devices of every conceivable description all over the world.

The deployment of AI in edge environments has several benefits, with the team at Arm citing three main ‘wins’. Firstly, the inherent efficiency of low-power Arm chips means power bills for running compute and cooling are lower. That keeps the environmental footprint of the technology as small as possible.

Secondly, putting AI in local settings means latency is much lower, since latency is largely determined by the distance between local operations and the site of the AI model (the arithmetic is sketched after the third point below). Arm points to uses like instant translation, dynamic scheduling of control systems, and features like the near-immediate triggering of safety functions – for instance in IIoT settings.

Thirdly, ‘keeping it local’ means there’s no potentially sensitive data sent off-premise. The benefits are obvious for any organisation in highly-regulated industries, but the increasing number of data breaches means even companies operating with relatively benign data sets are looking to reduce their attack surface.
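
The physics behind the latency point (the second ‘win’ above) is easy to check with a quick calculation; the distances are illustrative, not Arm’s figures:

```python
# Round-trip propagation delay to a remote AI model over fibre.
SPEED_IN_FIBRE_KM_PER_S = 200_000      # roughly two-thirds of c in glass

def round_trip_ms(distance_km):
    return 2 * distance_km / SPEED_IN_FIBRE_KM_PER_S * 1_000

for km in (50, 1_000, 5_000):
    print(f"{km:>5} km away -> {round_trip_ms(km):5.1f} ms before any compute")
# On-device inference pays none of this, leaving the full budget for the model.
```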

Arm silicon, optimised for power-constrained devices, is well-suited to compute where it’s needed on the ground, the company says. The future may well be one where AI is woven throughout environments, not centralised in a data centre run by one of the large providers.

Arm and global governments

Arm is actively engaged with global policymakers, considering this level of engagement an important part of its role. Governments continue to compete to attract semiconductor investment, with the issues of supply chains and concentrated dependencies still fresh in many policymakers’ memories from the COVID-19 pandemic.

Arm lobbies for workforce development, working at present with policymakers in the White House on an education coalition to build an ‘AI-ready workforce’. Domestic independence in technology relies as much on the abilities of the workforce as it does on the availability of hardware.

Jesaitis noted a divergence between regulatory environments: the US prioritises what the government there terms acceleration and innovation, while the EU leads on safety, privacy, security and legally-enforced standards of practice. Arm aims to find the middle ground between these approaches, building products that meet stringent global compliance needs, yet furthering advances in the AI industry.

The enterprise case for edge AI

The case for integrating Arm’s edge-focused AI architecture into enterprise transformation strategies can be persuasive. The company stresses its ability to offer scalable AI without the need to centralise to the cloud, and is also pushing its investment in hardware-level security. That means issues like memory exploits (outside the control of users plugged into centralised AI models) can be avoided.

Of course, sectors already highly-regulated in terms of data practices are unlikely to experience relaxed governance in the future – the opposite is pretty much inevitable. All industries will be seeing more regulation and greater penalties for non-compliance in the years to come. However, to balance that, there are significant competitive advantages available to those that can demonstrate their systems’ inherent safety and security. It’s into this regulatory landscape that Arm sees itself and local, edge AI fitting.

Additionally, in Europe and Scandinavia, ESG goals are going to be increasingly important. Here, the power-sipping nature of Arm chips offers big advantages. That’s a trend that even the US hyperscalers are responding to: AWS’s latest SHALAR range of low-cost, low-power Arm-based platforms is there to satisfy that exact demand.

Arm’s collaboration with cloud hyperscalers such as AWS and Microsoft produces chips that combine efficiency with the necessary horsepower for AI applications, the company says.

What’s next from Arm and the industry

Jesaitis pointed out several trends that enterprises may be seeing in the next 12 to 18 months. Global AI exports, particularly from the US and Middle East, are ensuring that local demand for AI can be satisfied by the big providers. Arm is a company that can supply both big providers in these contexts (as part of their portfolios of offerings) and satisfy the rising demand for edge-based AI.

Jesaitis also sees edge AI as something of the hero of sustainability in an industry increasingly under fire for its ecological impact. Because Arm technology’s biggest market has been in low-power compute for mobile, it’s inherently ‘greener’. As enterprises hope to meet energy goals without sacrificing compute, Arm offers a way that combines performance with responsibility.

Redefining “smart”

Arm’s vision of AI at the edge means computers and the software running on them can be context-aware, cheap to run, secure by design, and – thanks to near-zero network latency – highly-responsive. Jesaitis said, “We used to call things ‘smart’ because they were online. Now, they’re going to be truly intelligent.”

(Image source: “Factory Floor” by danielfoster437 is licensed under CC BY-NC-SA 2.0.)

UK and Germany plan to commercialise quantum supercomputing
https://www.artificialintelligence-news.com/news/uk-and-germany-plan-to-commercialise-quantum-supercomputing/ (5 December 2025)

The UK and Germany plan to integrate their science sectors to accelerate the commercialisation of quantum supercomputing technology.

Announced on the final day of the German president’s state visit, these joint commitments target the gap between R&D and enterprise application in computing, sensing, and timing. The partnership involves specific funding to fast-track product development and establish shared operating standards.

Quantum technology currently sits on the horizon for most roadmaps, yet economic modelling suggests a contribution of £11 billion to UK GDP by 2045, supporting over 100,000 jobs.

To catalyse this, a £6 million joint R&D funding call launches in early 2026, with Innovate UK and VDI contributing £3 million each. This capital aims to help businesses bring new products to market rather than funding purely academic study.

Supply chain maturity remains a hurdle. An £8 million investment in the Fraunhofer Centre for Applied Photonics in Glasgow addresses this by bolstering the development of applied photonics; a necessary component for commercial quantum sensing.

Addressing hurdles in the UK, Germany, and beyond to commercialise quantum supercomputing

Regulatory fragmentation often stalls adoption. A new Memorandum of Understanding between the UK’s National Physical Laboratory (NPL) and Germany’s Physikalisch-Technische Bundesanstalt (PTB) aims to harmonise measurement standards. This agreement complements the NMI-Q initiative, a global effort to develop shared norms.

UK Science Minister Lord Vallance said: “Quantum technology will revolutionise fields such as cybersecurity, drug discovery, medical imaging, and much more. International collaboration is crucial to unlocking these benefits.”

In practical terms, these advances allow pharmaceutical firms to identify new medicines faster. Similarly, next-generation sensors promise medical scanners that are more affordable, portable, and accurate than current iterations.

The partnership also extends to high-performance computing (HPC). The UK’s National Supercomputing Centre at the University of Edinburgh was selected by the EuroHPC Joint Undertaking to host the UK’s AI Factory Antenna, partnering with the HammerHAI AI Factory in Stuttgart.

To support HPC integration prior to the commercialisation of quantum supercomputing technology, the Department for Science, Innovation and Technology (DSIT) is allocating up to £3.9 million to match fund UK participation in three open EuroHPC calls. This funding assists teams developing exascale and AI-ready software.

In the aerospace sector, the two nations recently committed joint funding of over €6 billion to the European Space Agency. This includes €1 billion for launch programmes and €10 million for Rocket Factory Augsburg, which plans to launch from Scotland in 2026.

German President Frank-Walter Steinmeier concluded his visit at Siemens Healthineers in Oxford. The site produces superconducting magnets for MRI scanners, an existing example of how bilateral science ties support high-skilled manufacturing and health outcomes.

As this bilateral cooperation deepens, the integrated approach between the UK and Germany toward supercomputing and quantum infrastructure aims to offer enterprises a powerful foundation for scaling high-performance workloads across Europe.

See also: AWS re:Invent 2025: Frontier AI agents replace chatbots

Aluminium OS is the AI-powered successor to ChromeOS
https://www.artificialintelligence-news.com/news/aluminium-os-is-the-next-iteration-for-the-chromebook-range-chromeos-chrome-os/ (5 December 2025)

The convergence of mobile and desktop operating systems is a goal that has remained elusive for big tech firms since the early days of the smartphone. Microsoft’s attempt in the form of Windows Mobile was reaching the end of its road by 2010, and despite Apple’s iOS/iPadOS and macOS moving very slowly towards one another for the last few years, Cupertino has not yet reached the fabled goal of the-one-OS-to-rule-them-all.

But Google’s big play to merge ChromeOS and Android into a unified PC platform (with the anglicised codename Aluminium OS) is gradually taking shape. Android-powered laptops are planned for release in 2026, and the company wants to put its LLMs at the centre of the user experience.

Hardware procurement decisions may therefore fall into step with company AI strategy in the coming year. The prospect of Chromebook-style devices with an accompanying lower price tag will be attractive both to organisations considering their next round of machine refreshes and to strategists who want to put AI at the heart of their employees’ daily work. Soon, both groups might have a solution in common.

It’s early days in the development of the converged device at Google, but the company is well known for both floating ideas that don’t get far and abandoning technologies it can’t monetise effectively enough. Unlike some of the company’s projects that may stem from its ‘20%’ policy (employees at Google are encouraged to dedicate 20% of their time to moonshot projects), the substantial Android development community and Google’s policy of putting Gemini front-and-centre may be the accelerant the new, converged operating system needs.

Android’s existing AI capabilities like the Magic Editor for photos, audio transcription and summarisation would port very well to the workplace desktop. However, if Google wants to assuage the fears of security professionals, it may have to rely on local, small models for AI processing, rather than reaching out to cloud instances of Gemini for the required compute power. That puts into question the continuation of one of the chromebook range’s big selling points – its low price compared to fully-fledged workstations.

There’s also a delicate balance the company needs to strike. Forcing users into an AI-centric workflow hasn’t played well for Microsoft: note the furore around Recall and the muted response to its much-reduced offspring that has sprung out of Copilot Labs. What Google needs is a killer AI feature that benefits the enterprise, and that may or may not be something that’s aimed at users.

It’s undeniable that the addition of Gemini to Google Workspace has done wonders for the platform in terms of its competitiveness with Office 365 – despite a significant price hike earlier this year – driven in some part by new features like live translation in Google Meet and AI responses available in Gmail. Users do find some AI tools useful, but it may be becoming apparent that user-facing AI is a useful addition to existing workflows, rather than a catalyst that changes everything.

In placing Gemini or Gemini Nano at the heart of the new operating system, then, Google may be looking to offer value to parts of the enterprise beyond the daily tasks users tackle. Android Authority suggests smart power management, device provisioning, and contextual awareness in accessing enterprise resources may be on the table. It’s difficult to see how these elements would be a game-changer for procurement teams, however.

Google has many problems to solve at a deeper level, like compatibility with peripherals, OS-level drivers, and the necessary changes to the Android GUI to make it a great experience for end users wielding mouse and keyboard. But given enough effort and investment (something the company does not lack), these are issues that can be surmounted relatively easily. A thriving app ecosystem will ensure that the necessary tools, if not immediately available, can be made so with minimal effort.

Ultimately, the success of Aluminium OS will depend on Google’s ability to offer a platform that solves tangible problems and integrates into existing workflows. Google sees AI in the form of Gemini (or a localised Gemini Nano instance) powering a platform that offers integrated problem-solving. Hitting that target will generate demand, and a lower price per machine could be the decider for procurement teams. If Google gets it right, it could repeat the success it experienced in the education market with the original Chromebook project, and there could be a substantial shift by enterprise fleets to Aluminium OS and Google Workspace.

There are big gains to be made for a company that dominates the mobile market worldwide and makes serious inroads into the enterprise workstation market. Plus, that elusive device convergence would be much closer to becoming a reality.

(Image source: “Macro Monday : Aluminium buttons (Al on the periodic table)” by cchana is licensed under CC BY-SA 2.0.)

AWS re:Invent 2025: Frontier AI agents replace chatbots
https://www.artificialintelligence-news.com/news/aws-reinvent-2025-frontier-ai-agents-replace-chatbots/ (4 December 2025)

According to AWS at this week’s re:Invent 2025, the chatbot hype cycle is effectively dead, with frontier AI agents taking their place.

That is the blunt message radiating from Las Vegas this week. The industry’s obsession with chat interfaces has been replaced by a far more demanding mandate: “frontier agents” that don’t just talk, but work autonomously for days at a time.

We are moving from the novelty phase of generative AI into a grinding era of infrastructure economics and operational plumbing. The “wow” factor of a poem-writing bot has faded; now, the cheque comes due for the infrastructure needed to run these systems at scale.

Addressing the plumbing crisis at AWS re:Invent 2025

Until recently, building frontier AI agents capable of executing complex, non-deterministic tasks was a bespoke engineering nightmare. Early adopters have been burning resources cobbling together tools to manage context, memory, and security.

AWS is trying to kill that complexity with Amazon Bedrock AgentCore. It’s a managed service that acts as an operating system for agents, handling the backend work of state management and context retrieval. The efficiency gains from standardising this layer are hard to ignore.

Take MongoDB. By ditching their home-brewed infrastructure for AgentCore, they consolidated their toolchain and pushed an agent-based application to production in eight weeks—a process that previously ate up months of evaluation and maintenance time. The PGA TOUR saw even sharper returns, using the platform to build a content generation system that increased writing speed by 1,000 percent while slashing costs by 95 percent.

Software teams are getting their own dedicated workforce, too. At re:Invent 2025, AWS rolled out three specific frontier AI agents: Kiro (a virtual developer), a Security Agent, and a DevOps Agent. Kiro isn’t just a code-completion tool; it hooks directly into workflows with “powers” (specialised integrations for tools like Datadog, Figma, and Stripe) that allow it to act with context rather than just guessing at syntax.

Agents that run for days consume massive amounts of compute. If you are paying standard on-demand rates for that, your ROI evaporates.

AWS knows this, which is why the hardware announcements this year are aggressive. The new Trainium3 UltraServers, powered by 3nm chips, are claiming a 4.4x jump in compute performance over the previous generation. For the organisations training massive foundation models, this cuts training timelines from months to weeks.

But the more interesting shift is where that compute lives. Data sovereignty remains a headache for global enterprises, often blocking cloud adoption for sensitive AI workloads. AWS is countering this with ‘AI Factories’: essentially, shipping racks of Trainium chips and NVIDIA GPUs directly into customers’ existing data centres. It’s a hybrid play that acknowledges a simple truth: for some data, the public cloud is still too far away.

Tackling the legacy mountain

Innovation like we’re seeing with frontier AI agents is great, but most IT budgets are strangled by technical debt. Teams spend roughly 30 percent of their time just keeping the lights on.

During re:Invent 2025, Amazon updated AWS Transform to attack this specifically, using agentic AI to handle the grunt work of upgrading legacy code. The service can now handle full-stack Windows modernisation, including upgrading .NET apps and SQL Server databases.

Air Canada used this to modernise thousands of Lambda functions. They finished in days. Doing it manually would have cost them five times as much and taken weeks.

For developers who actually want to write code, the ecosystem is widening. The Strands Agents SDK, previously a Python-only affair, now supports TypeScript. As the lingua franca of the web, TypeScript brings type safety to the chaotic output of LLMs – a necessary evolution.

Sensible governance in the era of frontier AI agents

There is a danger here. An agent that works autonomously for “days without intervention” is also an agent that can wreck a database or leak PII without anyone noticing until it’s too late.

AWS is attempting to wrap this risk in ‘AgentCore Policy,’ a feature allowing teams to set natural language boundaries on what an agent can and cannot do. Coupled with ‘Evaluations,’ which uses pre-built metrics to monitor agent performance, it provides a much-needed safety net.
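
AWS has not published the policy syntax here, but the flavour of such a guardrail can be shown with a deliberately simple, hypothetical sketch; none of the names below are the actual AgentCore Policy API:

```python
# Hypothetical guardrail: gate each tool call an agent proposes before it runs.
BLOCKED_ACTIONS = {"drop_table", "export_pii"}   # stand-ins for policy rules

def enforce_policy(action):
    """Raise if the agent's proposed action violates the policy set."""
    if action["name"] in BLOCKED_ACTIONS:
        raise PermissionError(f"Policy blocks '{action['name']}'")
    return action

enforce_policy({"name": "read_logs"})       # allowed, returns the action
# enforce_policy({"name": "export_pii"})    # would raise PermissionError
```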

Security teams also get a boost with updates to Security Hub, which now correlates signals from GuardDuty, Inspector, and Macie into single “events” rather than flooding the dashboard with isolated alerts. GuardDuty itself is expanding, using ML to detect complex threat patterns across EC2 and ECS clusters.

We are clearly past the point of pilot programs. The tools announced at AWS re:Invent 2025, from specialised silicon to governed frameworks for frontier AI agents, are designed for production. The question for enterprise leaders is no longer “what can AI do?” but “can we afford the infrastructure to let it do its job?”

See also: AI in manufacturing set to unleash new era of profit

AI memory hunger forces Micron’s consumer exodus: A turning point in semiconductor economics
https://www.artificialintelligence-news.com/news/ai-memory-hunger-micron-consumer-exit/ (4 December 2025)

In the basement of a Boise, Idaho, dental office in 1978, four engineers founded what would become one of America’s semiconductor giants. Ward Parkinson, Joe Parkinson, Dennis Wilson, and Doug Pitman started Micron Technology as a modest design consultancy, backed by local investors including potato magnate J.R. Simplot.

By 1983, they had achieved a technological breakthrough – producing chips roughly half the size of Japan’s leading products. Nearly five decades later, that same company has made a decision that crystallises artificial intelligence’s profound impact on hardware economics: AI memory hunger is forcing manufacturers to abandon entire market segments.

On December 3, 2025, Micron announced it would completely exit the consumer memory market, discontinuing its 29-year-old Crucial brand by February 2026. “The AI-driven growth in the data centre has led to a surge in demand for memory and storage,” said Sumit Sadana, Micron’s executive vice president and chief business officer.

“Micron has made the difficult decision to exit the Crucial consumer business to improve supply and support for our larger, strategic customers in faster-growing segments.”

Translation: data centres running AI workloads will pay substantially more for memory than individual consumers ever could, and Micron’s fabrication capacity cannot serve both markets simultaneously.

The announcement represents both a business decision and a watershed moment revealing how AI memory hunger demands are restructuring global semiconductor supply chains, forcing manufacturers to make stark choices about which customers ‘deserve’ access to finite production capacity.

The economics driving AI memory hunger

Micron’s withdrawal reflects economic realities. As the world’s third-largest DRAM producer with approximately 20% of global market share, the company sits between South Korean giants Samsung Electronics (43%) and SK Hynix (35%). Together, these three control roughly 95% of worldwide DRAM production – an oligopoly now facing unprecedented demand from AI infrastructure builders.

The margin differentials tell the story. Consumer RAM modules compete in volatile retail markets with razor-thin profitability. Enterprise contracts for high-bandwidth memory (HBM) used in AI accelerators and DDR5 modules for data centre servers deliver substantially higher average selling prices, multi-year commitments, and predictable demand.

For memory manufacturers, each fabrication wafer committed to consumer products represents foregone revenue from higher-value enterprise contracts – an opportunity cost that has become economically indefensible as AI demand accelerates.
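
To make that opportunity-cost argument concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure is a hypothetical assumption chosen for illustration – the article does not disclose Micron’s actual per-wafer revenue in either segment.

    # Illustrative only: hypothetical per-wafer revenue, not Micron's real economics.
    CONSUMER_REVENUE_PER_WAFER = 8_000    # assumed US$ from consumer DRAM modules
    HBM_REVENUE_PER_WAFER = 24_000        # assumed US$ from HBM for AI accelerators
    WAFERS_PER_MONTH = 100_000            # assumed fab output

    foregone = (HBM_REVENUE_PER_WAFER - CONSUMER_REVENUE_PER_WAFER) * WAFERS_PER_MONTH
    print(f"Monthly revenue foregone by serving consumers: US${foregone:,}")
    # -> Monthly revenue foregone by serving consumers: US$1,600,000,000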

The numbers illustrate the magnitude of the shift. Micron reported record fiscal 2025 revenue of US$37.38 billion, representing nearly 50% year-over-year growth driven primarily by data centre and AI applications, which accounted for 56% of total revenue. SK Hynix has reportedly sold out its entire 2026 production capacity for DRAM, HBM, and NAND products.

Consumer memory prices have surged accordingly. DRAM spot prices increased 172% year-over-year as of Q3 2025, with retail prices for 32GB DDR5 modules jumping 163-619% in global markets since September 2025. Component suppliers report paying US$13 for 16GB DDR5 chips that cost US$7 just six weeks earlier – increases sufficient to eliminate entire gross margins for third-party brands.
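
The margin-elimination claim is easy to verify with simple arithmetic. Only the US$7 and US$13 chip prices come from the article; the module configuration and retail price below are assumptions.

    # Only the US$7 -> US$13 chip prices come from the article; the rest is assumed.
    CHIP_OLD, CHIP_NEW = 7.0, 13.0   # US$ per 16GB DDR5 chip
    chips_per_module = 2             # assumed: a 32GB module built from two 16GB chips
    other_bom = 10.0                 # assumed PCB, SPD, assembly, and test costs (US$)
    retail = 40.0                    # assumed street price the brand cannot raise (US$)

    for chip in (CHIP_OLD, CHIP_NEW):
        cost = chips_per_module * chip + other_bom
        print(f"chip=US${chip:.0f}  cost=US${cost:.0f}  gross margin={(retail - cost) / retail:.0%}")
    # chip=US$7   cost=US$24  gross margin=40%
    # chip=US$13  cost=US$36  gross margin=10%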

Consumer market restructuring amid AI memory hunger

Micron’s exit alters the consumer memory landscape. Third-party brands including Corsair, G.Skill, Kingston, and ADATA source their DRAM chips from the major manufacturers. With Micron withdrawing entirely, these vendors must compete more aggressively for allocation from Samsung and SK Hynix – both of which are simultaneously prioritising high-bandwidth memory production for AI accelerators.

The concentration creates vulnerabilities. Samsung and SK Hynix now comprise the only major suppliers serving both consumer and enterprise markets. Both face identical capacity allocation pressures. If AI infrastructure investment maintains current trajectories, additional manufacturers may reduce or restructure consumer operations.

Supply chain constraints are already materialising beyond DRAM. NAND flash wafer contract prices increased by over 60% in November 2025. Graphics memory markets face pressures as manufacturers shift to GDDR7 for next-generation GPUs, creating GDDR6 shortages that inflated prices by approximately 30%. Hard drive manufacturers increased prices 5-10%, citing limited supply.

For consumers and small businesses, the implications extend beyond pricing. Product availability may become increasingly constrained during peak demand periods. The reduction in direct supplier participation may compress product differentiation and limit competitive pricing dynamics that previously benefited buyers.

The broader industry realignment

Micron’s consumer exodus signals a structural transformation rather than a temporary reallocation. The AI infrastructure boom differs fundamentally from previous technology transitions. Personal computing, internet expansion, and mobile devices created sustained memory demand over decades with gradual capacity adjustments.

AI infrastructure deployment compresses that timeline dramatically – hyperscale operators are committing hundreds of billions of dollars to data centre construction over just a few years. Data centre semiconductor markets illustrate the scale: the total addressable market reached US$209 billion in 2024 and is projected to grow to nearly US$500 billion by 2030, driven primarily by AI and high-performance computing.

GPU revenue alone is forecast to expand from US$100 billion in 2024 to US$215 billion by 2030, with each GPU requiring substantial high-bandwidth memory allocation.

Memory architecture evolution compounds the challenge. AI training workloads increasingly require HBM3E modules, which offer superior bandwidth and power efficiency, while inference workloads demand DDR5 with tight latency specifications.

Automotive applications adopting zonal architectures require multi-gigabyte DRAM configurations. Each application commands premium pricing and long-term contracts – economic incentives systematically pulling manufacturing capacity away from consumer markets.

The manufacturing response reflects these priorities. Samsung is advancing 1c DRAM production and planning mass production of HBM4 in 2025 while phasing out DDR4 entirely. Micron began mass production of DRAM using Extreme Ultraviolet (EUV) lithography in 2025.

SK Hynix focuses development resources on HBM and advanced LPDDR solutions. All three manufacturers are directing research and capital investment toward applications offering superior returns.

What this means for enterprise buyers

Enterprise procurement teams face their own challenges as memory markets restructure. Memory represents 10-25% of bill-of-materials costs for typical servers and commercial PCs. Price increases of 20-30% in memory components translate to 5-10% increases in total system costs, compounding into millions in additional expenditure for organisations procuring at scale.
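
A quick sanity check on that pass-through arithmetic, using the article’s own ranges: multiplying memory’s share of the bill of materials by the memory price increase gives roughly 2–7.5%, so the article’s 5–10% figure presumably assumes the upper end of both ranges plus concurrent increases in NAND and storage.

    # Ranges from the article; the multiplication is the whole model.
    for share in (0.10, 0.25):        # memory as a share of bill-of-materials
        for rise in (0.20, 0.30):     # memory component price increase
            print(f"memory {share:.0%} of BOM, price +{rise:.0%} -> system cost +{share * rise:.1%}")
    # memory 10% of BOM, price +20% -> system cost +2.0%
    # ...
    # memory 25% of BOM, price +30% -> system cost +7.5%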

Strategic responses include forward purchasing agreements, establishing stronger direct relationships with manufacturers, and diversifying vendor partnerships. The timing uncertainty presents particular challenges. New fabrication capacity is under construction, supported by government incentives, but requires years to reach production readiness.

Critical questions ahead

Micron’s consumer market exit raises fundamental questions. Will Samsung and SK Hynix maintain consumer product lines, or will similar capacity pressures force comparable reductions? If consumer memory becomes primarily a third-party brand market sourcing chips from manufacturers prioritising enterprise customers, what happens to product innovation and competitive pricing?

The concentration among just two major manufacturers serving consumer markets creates potential vulnerabilities. Supply chain disruptions affecting either Samsung or SK Hynix would have an outsized impact on global consumer product availability.

Broader implications extend to technology accessibility. If memory pricing remains elevated or availability constrained for consumer products, the costs of personal computing and small business infrastructure increase accordingly, potentially widening digital divides.

Micron’s decision crystallises artificial intelligence’s role as a transformative force reshaping not just software, but the fundamental economics of hardware manufacturing. The Crucial brand’s retirement after 29 years marks the end of a time when memory manufacturers could serve both consumer and enterprise segments simultaneously and profitably.

For the broader technology ecosystem, hunger for AI memory has become the semiconductor industry’s dominant growth driver, commanding resources at levels that fundamentally alter which markets manufacturers choose to serve.

(Photo: Micron Technology)

The post AI memory hunger forces Micron’s consumer exodus: A turning point in semiconductor economics appeared first on AI News.

EY and NVIDIA to help companies test and deploy physical AI https://www.artificialintelligence-news.com/news/ey-and-nvidia-to-help-companies-test-and-deploy-physical-ai/ Wed, 03 Dec 2025 12:05:00 +0000

AI is moving deeper into the physical world, and EY is laying out a more structured way for companies to work with robots, drones, and other smart devices. The organisation is introducing a physical AI platform built with NVIDIA tools, opening a new EY.ai Lab in Georgia, and adding new leadership to guide its work in this field.

The platform uses NVIDIA Omniverse libraries, NVIDIA Isaac, and NVIDIA AI Enterprise software. EY says the setup gives organisations a clearer way to plan, test, and manage AI systems that operate in real environments, from factory robots to drones and edge devices.

Omniverse libraries support the creation of digital twins so firms can model and test systems before deployment. NVIDIA Isaac tools offer open models and simulation frameworks to design and validate AI-driven robots in detailed 3D settings. NVIDIA AI Enterprise provides the computing base needed to run heavier AI workloads.
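
The workflow those tools support – validate in simulation, promote to hardware only what passes – can be sketched generically. None of the names below are real EY or NVIDIA APIs; they are placeholders standing in for a digital-twin run.

    # Schematic only: none of these names are real EY or NVIDIA APIs.
    from dataclasses import dataclass

    @dataclass
    class TwinScenario:
        name: str
        conveyor_speed_mps: float  # hypothetical process parameter

    def run_in_twin(s: TwinScenario) -> float:
        """Stand-in for a digital-twin simulation; returns a simulated success rate."""
        return max(0.0, 0.99 - 0.06 * s.conveyor_speed_mps)  # toy model: faster belt, more misses

    scenarios = [TwinScenario(f"line-{i}", v) for i, v in enumerate((0.25, 0.5, 1.5))]
    qualified = [s.name for s in scenarios if run_in_twin(s) >= 0.95]
    print("Promote to physical pilot:", qualified)  # only the slower lines pass the gate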

EY describes the platform as built around three main areas:

  • AI-ready data: Synthetic data to mirror a wide range of physical scenarios.
  • Digital twins and robotics training: Tools that connect digital and physical systems, monitor performance in real time, and support operational continuity.
  • Responsible physical AI: Governance and controls that address safety, ethics, and compliance.

The platform is meant to support everything from early planning to long-term maintenance in sectors like industrials, energy, consumer, and health.

Raj Sharma, EY Global Managing Partner – Growth & Innovation, says physical AI is already “transforming how businesses in sectors operate and help create value,” adding that it brings more automation and can help lower operating costs. The combination of EY’s industry experience and NVIDIA’s infrastructure, he says, is expected to speed up how companies move “from experimentation to enterprise-scale deployment.”

NVIDIA’s John Fanelli notes that more enterprises are bringing robots and automation into real settings to address workforce changes and improve safety. He says the EY.ai Lab, supported by NVIDIA AI infrastructure, helps organisations “simulate, optimise and safely deploy robotics applications at enterprise scale,” which he views as part of the next phase of industrial AI.

New leadership and a dedicated physical AI lab

EY has also appointed Dr. Youngjun Choi as its Global Physical AI Leader. He will oversee robotics and physical AI work and help shape EY’s role as an advisor in this area.

Choi, who has nearly 20 years’ experience in robotics and AI, previously led the UPS Robotics AI Lab, where he worked on digital twins, robotics projects, and AI tools to modernise its network. Before that, he served as research faculty in Aerospace Engineering at the Georgia Institute of Technology, contributing to aerial robotics and autonomous systems.

A key part of his role is directing the newly opened EY.ai Lab in Alpharetta, Georgia – the first EY site focused on physical AI. The Lab includes robotics systems, sensors, and simulation tools so organisations can test ideas and build prototypes before deploying them at scale.

Joe Depa, EY Global Chief Innovation Officer, says his clients want better ways to use technology for decision-making and performance. He adds that physical AI requires strong data foundations and trust from the start. With Choi leading the Lab, Depa says EY teams are beginning to “get beyond the surface of what is possible” and set up the base for scalable operations.

At the Lab, organisations can:

  • Design and test physical AI systems in a virtual testbed,
  • Build solutions for humanoids, quadrupeds, and other next-generation robots,
  • Improve logistics, manufacturing, and maintenance with digital twins.

The new platform and Lab build on earlier collaboration between EY and NVIDIA, including an AI agent platform launched earlier this year. Both organisations plan to expand their physical AI work to areas like energy, health, and smart cities. They also aim to support automation projects that cut waste and help reduce environmental impact.

See also: Microsoft, NVIDIA, and Anthropic forge AI compute alliance

The post EY and NVIDIA to help companies test and deploy physical AI appeared first on AI News.

Can China’s chip stacking strategy really challenge Nvidia’s AI dominance? https://www.artificialintelligence-news.com/news/china-chip-stacking-strategy-nvidia/ Wed, 03 Dec 2025 09:00:00 +0000

Chip stacking strategy is emerging as China’s innovative response to US semiconductor restrictions, but can this approach truly close the performance gap with Nvidia’s advanced GPUs? As Washington tightens export controls on cutting-edge chipmaking technology, Chinese researchers are proposing a bold workaround: stack older, domestically-producible chips together to match the performance of chips they can no longer access.

The core concept: Building upward instead of forward

The chip stacking strategy centres on a deceptively simple premise – if you can’t make more advanced chips, make smarter systems with the chips you can produce. Wei Shaojun, vice-president of the China Semiconductor Industry Association and a professor at Tsinghua University, recently outlined to the South China Morning Post an architecture that combines 14-nanometer logic chips with 18-nanometer DRAM using three-dimensional hybrid bonding.

This matters because US export controls specifically target the production of logic chips at 14nm and below, and DRAM at 18nm and below. Wei’s proposal works precisely at these technological boundaries, using processes that remain accessible to Chinese manufacturers.

The technical approach involves what’s called “software-defined near-memory computing.” Instead of shuffling data back and forth between processors and memory – a major bottleneck in AI workloads – the chip stacking strategy places the two in close physical proximity through vertical stacking.

The 3D hybrid bonding technique creates direct copper-to-copper connections at sub-10 micrometre pitches, essentially eliminating the physical distance that slows down conventional chip architectures.
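
A toy calculation shows why proximity matters. The bandwidth and energy figures below are rough assumptions for illustration, not measurements from Wei’s proposal; under these assumptions, stacked memory cuts both transfer time and energy per byte by an order of magnitude.

    # Rough, assumed figures; the ratio, not the absolute numbers, is the point.
    bytes_moved = 8e9   # assumed: 8 GB of tensors touched per inference step

    configs = {
        "off-package DRAM":       {"bytes_per_s": 100e9,  "pj_per_byte": 100.0},
        "3D hybrid-bonded stack": {"bytes_per_s": 1000e9, "pj_per_byte": 10.0},
    }
    for name, c in configs.items():
        t_ms = bytes_moved / c["bytes_per_s"] * 1e3
        energy_j = bytes_moved * c["pj_per_byte"] * 1e-12
        print(f"{name:24s} {t_ms:6.1f} ms  {energy_j:.2f} J per step")
    # off-package DRAM           80.0 ms  0.80 J per step
    # 3D hybrid-bonded stack      8.0 ms  0.08 J per step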

The performance claims and reality check

Wei claims this configuration could rival Nvidia’s 4nm GPUs while significantly reducing costs and power consumption. He’s cited performance figures of 2 TFLOPS per watt and a total of 120 TFLOPS. There’s just one problem: Nvidia’s A100 GPU, which Wei positions as the comparison point, actually delivers up to 312 TFLOPS – more than 2.5 times the claimed performance.

The discrepancy raises serious questions about the chip stacking strategy’s feasibility. While the architectural innovation is real, the performance gap remains substantial. Stacking older chips doesn’t magically erase the advantages of advanced process nodes, which deliver superior power efficiency, higher transistor density, and better thermal characteristics.

Why China is betting on this approach

The strategic logic behind the chip stacking strategy extends beyond pure performance metrics. Huawei founder Ren Zhengfei has articulated a philosophy of achieving “state-of-the-art performance by stacking and clustering chips rather than competing node for node.” This represents a shift in how China approaches the semiconductor challenge.

Consider the alternatives. TSMC and Samsung are pushing toward 3nm and 2nm processes that remain completely out of reach for Chinese manufacturers. Rather than fighting an unwinnable battle for process node leadership, the chip stacking strategy proposes competing on system architecture and software optimisation instead.

There’s also the CUDA problem. Nvidia’s dominance in AI computing rests not just on hardware but on its CUDA software ecosystem. Wei describes this as a “triple dependence” spanning models, architectures, and ecosystems.

Chinese chip designers pursuing traditional GPU architectures would need to either replicate CUDA’s functionality or convince developers to abandon a mature, widely adopted platform. The chip stacking strategy, by proposing an entirely different computing paradigm, offers a path to sidestep this dependency.

The feasibility question

Can the chip stacking strategy actually work? The technical foundations are sound – 3D chip stacking is already used in high-bandwidth memory and advanced packaging solutions worldwide. The innovation lies in applying these techniques to create entirely new computing architectures rather than simply improving existing designs.

However, several challenges loom large. First, thermal management becomes significantly more difficult when stacking multiple active processing dies. The heat generated by 14nm chips is considerably higher than that of modern 4nm or 5nm processes, and stacking intensifies the problem.

Second, yield rates in 3D stacking are notoriously difficult to optimise – a defect in any layer can compromise the entire stack. Third, the software ecosystem required to efficiently use such architectures doesn’t exist yet and would take years to mature.

The most realistic assessment is that the chip stacking strategy represents a valid approach for specific workloads where memory bandwidth matters more than raw computational speed. AI inference tasks, certain data analytics operations, and specialised applications could potentially benefit. But matching Nvidia’s performance in the full spectrum of AI training and inference tasks remains a distant goal.
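
A roofline-style calculation makes the bandwidth-bound point concrete. Only the 120 TFLOPS claim comes from the article; the bandwidth figure is an assumption for illustration.

    peak_flops = 120e12   # the article's claimed throughput for the stacked design (FLOP/s)
    bandwidth  = 2e12     # assumed memory bandwidth of the stacked design (bytes/s)

    ridge = peak_flops / bandwidth   # FLOP/byte needed before compute becomes the limit
    print(f"compute-bound only above {ridge:.0f} FLOP per byte")   # -> 60

    # Batch-1 LLM decode reads each fp16 weight (2 bytes) for ~2 FLOPs: ~1 FLOP/byte,
    # far below the ridge point - i.e. decode is firmly bandwidth-bound.
    print("decode bandwidth-bound:", 1 < ridge)   # -> True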

What it means for the AI chip wars

The emergence of the chip stacking strategy as a focal point for Chinese semiconductor development signals a strategic pivot. Rather than attempting to replicate Western chip designs with inferior process nodes, China is exploring architectural alternatives that play to available manufacturing strengths.

Whether a chip stacking strategy succeeds in closing the performance gap with Nvidia remains uncertain. What’s clear is that China’s semiconductor industry is adapting to restrictions by pursuing innovation in areas where export controls have less impact – system design, packaging technology, and software-hardware co-optimisation.

For the global AI industry, this means the competitive landscape is becoming more complex. Nvidia’s current dominance faces challenges not only from traditional competitors like AMD and Intel, but also from entirely new architectural approaches that may redefine what an “AI chip” looks like.

The chip stacking strategy, whatever its current limitations, represents exactly this kind of architectural disruption – and that makes it worth watching closely.

See also: New Nvidia Blackwell chip for China may outpace H20 model

The post Can China’s chip stacking strategy really challenge Nvidia’s AI dominance? appeared first on AI News.

ZAYA1: AI model using AMD GPUs for training hits milestone https://www.artificialintelligence-news.com/news/zaya1-ai-model-using-amd-gpus-for-training-hits-milestone/ Mon, 24 Nov 2025 18:07:40 +0000

Zyphra, AMD, and IBM spent a year testing whether AMD’s GPUs and platform can support large-scale AI model training, and the result is ZAYA1.

In partnership, the three companies trained ZAYA1 – described as the first major Mixture-of-Experts foundation model built entirely on AMD GPUs and networking – which they see as proof that the market doesn’t have to depend on NVIDIA to scale AI.

The model was trained on AMD’s Instinct MI300X chips, Pensando networking, and ROCm software, all running across IBM Cloud’s infrastructure. What’s notable is how conventional the setup looks. Instead of experimental hardware or obscure configurations, Zyphra built the system much like any enterprise cluster – just without NVIDIA’s components.

Zyphra says ZAYA1 performs on par with, and in some areas ahead of, well-established open models in reasoning, maths, and code. For businesses frustrated by supply constraints or spiralling GPU pricing, it amounts to something rare: a second option that doesn’t require compromising on capability.

How Zyphra used AMD GPUs to cut costs without gutting AI training performance

Most organisations follow the same logic when planning training budgets: memory capacity, communication speed, and predictable iteration times matter more than raw theoretical throughput. 

MI300X’s 192GB of high-bandwidth memory per GPU gives engineers some breathing room, allowing early training runs without immediately resorting to heavy parallelism. That tends to simplify projects that are otherwise fragile and time-consuming to tune.
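
The headroom argument is easy to quantify with the standard rule of thumb for mixed-precision AdamW training state – roughly 16 bytes per parameter. The breakdown below is that common estimate, not a figure Zyphra reports.

    # Common mixed-precision AdamW estimate, not a Zyphra-reported figure:
    # fp16 weight (2) + fp16 grad (2) + fp32 master copy (4) + Adam m and v (4 + 4).
    BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4   # = 16 bytes per parameter

    def fits(params_billions, hbm_gb=192):
        need_gb = params_billions * 1e9 * BYTES_PER_PARAM / 1e9
        print(f"{params_billions}B params -> {need_gb:.0f} GB of training state")
        return need_gb < hbm_gb

    fits(8.3)    # ~133 GB: ZAYA1's full 8.3B fits on one MI300X, activations aside
    fits(13.0)   # ~208 GB: would already force sharding across GPUs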

Zyphra built each node with eight MI300X GPUs connected over InfinityFabric and paired each one with its own Pollara network card. A separate network handles dataset reads and checkpointing. It’s an unfussy design, but that seems to be the point; the simpler the wiring and network layout, the lower the switch costs and the easier it is to keep iteration times steady.

ZAYA1: An AI model that punches above its weight

ZAYA1-base activates 760 million parameters out of a total 8.3 billion and was trained on 12 trillion tokens in three stages. The architecture leans on compressed attention, a refined routing system to steer tokens to the right experts, and lighter-touch residual scaling to keep deeper layers stable.
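
For readers unfamiliar with the Mixture-of-Experts mechanics behind those numbers, here is a minimal sketch of top-k routing in Python. The shapes, the value of k, and the (unnormalised) gating are illustrative; the article does not give ZAYA1’s exact router configuration.

    import numpy as np

    def moe_layer(x, gate_w, experts, k=2):
        """x: (tokens, d); gate_w: (d, n_experts); experts: list of (d, d) matrices."""
        logits = x @ gate_w
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        topk = np.argsort(probs, -1)[:, -k:]   # the k best experts per token
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for e in topk[t]:
                out[t] += probs[t, e] * (x[t] @ experts[e])  # only k of n experts run
        return out

    rng = np.random.default_rng(0)
    d, n_experts, tokens = 16, 8, 4
    y = moe_layer(rng.normal(size=(tokens, d)),
                  rng.normal(size=(d, n_experts)),
                  [rng.normal(size=(d, d)) for _ in range(n_experts)])
    print(y.shape)  # (4, 16): input-sized output for a fraction of the FLOPs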

The model uses a mix of the Muon and AdamW optimisers. To make Muon efficient on AMD hardware, Zyphra fused kernels and trimmed unnecessary memory traffic so the optimiser wouldn’t dominate each iteration. Batch sizes were increased over time, but that depends heavily on having storage pipelines that can deliver tokens quickly enough.
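
A common way to combine the two optimisers is to give Muon the 2-D weight matrices and leave embeddings, norms, and biases to AdamW. The article confirms the mix but not the split, so treat the partition below as an assumption.

    import numpy as np

    # Assumed partition rule, following common Muon practice rather than Zyphra's report.
    params = {
        "attn.qkv.weight": np.zeros((512, 512)),
        "mlp.up.weight":   np.zeros((512, 2048)),
        "embed.weight":    np.zeros((32000, 512)),  # 2-D, but usually kept on AdamW
        "norm.scale":      np.zeros((512,)),
    }

    muon  = [n for n, p in params.items() if p.ndim == 2 and not n.startswith("embed")]
    adamw = [n for n in params if n not in muon]
    print("Muon:", muon)    # ['attn.qkv.weight', 'mlp.up.weight']
    print("AdamW:", adamw)  # ['embed.weight', 'norm.scale']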

All of this leads to an AI model trained on AMD hardware that competes with larger peers such as Qwen3-4B, Gemma3-12B, Llama-3-8B, and OLMoE. One advantage of the MoE structure is that only a sliver of the model runs at once, which helps manage inference memory and reduces serving cost.

A bank, for example, could train a domain-specific model for investigations without needing convoluted parallelism early on. The MI300X’s memory headroom gives engineers space to iterate, while ZAYA1’s compressed attention cuts prefill time during evaluation.

Making ROCm behave with AMD GPUs

Zyphra didn’t hide the fact that moving a mature NVIDIA-based workflow onto ROCm took work. Instead of porting components blindly, the team spent time measuring how AMD hardware behaved and reshaping model dimensions, GEMM patterns, and microbatch sizes to suit MI300X’s preferred compute ranges.

InfinityFabric operates best when all eight GPUs in a node participate in collectives, and Pollara tends to reach peak throughput with larger messages, so Zyphra sized fusion buffers accordingly. Long-context training, from 4k up to 32k tokens, relied on ring attention for sharded sequences and tree attention during decoding to avoid bottlenecks.
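
Sizing fusion buffers for large messages can be sketched as simple gradient bucketing: accumulate tensors until a target byte count is reached, then issue one collective. The 64 MB target below is an assumed tuning value, not one Zyphra reports.

    TARGET_BUCKET_BYTES = 64 * 1024 * 1024   # assumed fusion-buffer target

    def make_buckets(grad_sizes):
        buckets, current, filled = [], [], 0
        for i, size in enumerate(grad_sizes):
            current.append(i)
            filled += size
            if filled >= TARGET_BUCKET_BYTES:   # flush one large all-reduce message
                buckets.append(current)
                current, filled = [], 0
        if current:
            buckets.append(current)
        return buckets

    grads = [4 * 512 * 2048] * 200   # 200 fp32 gradient tensors of ~4 MB each
    print(f"{len(grads)} tensors -> {len(make_buckets(grads))} collective calls")
    # -> 200 tensors -> 13 collective calls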

Storage considerations were equally practical. Smaller models hammer IOPS; larger ones need sustained bandwidth. Zyphra bundled dataset shards to reduce scattered reads and increased per-node page caches to speed checkpoint recovery, which is vital during long runs where rewinds are inevitable.

Keeping clusters on their feet

Training jobs that run for weeks rarely behave perfectly. Zyphra’s Aegis service monitors logs and system metrics, identifies failures such as NIC glitches or ECC blips, and takes straightforward corrective actions automatically. The team also increased RCCL timeouts to keep short network interruptions from killing entire jobs.

Checkpointing is distributed across all GPUs rather than forced through a single chokepoint. Zyphra reports more than ten-fold faster saves compared with naïve approaches, which directly improves uptime and cuts operator workload.
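
The speed-up comes from parallelism: every rank writes its own shard instead of funnelling the full state through a single writer. A minimal sketch, with an assumed file layout:

    import os, pickle
    from concurrent.futures import ThreadPoolExecutor

    def save_shard(rank, shard_state, ckpt_dir="ckpt"):
        os.makedirs(ckpt_dir, exist_ok=True)
        with open(os.path.join(ckpt_dir, f"shard_{rank:04d}.pkl"), "wb") as f:
            pickle.dump(shard_state, f)

    world = {rank: {"weights": list(range(rank, rank + 3))} for rank in range(8)}
    with ThreadPoolExecutor(max_workers=8) as pool:   # every rank saves concurrently
        list(pool.map(save_shard, world.keys(), world.values()))
    print(sorted(os.listdir("ckpt")))   # shard_0000.pkl ... shard_0007.pkl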

What the ZAYA1 AMD training milestone means for AI procurement

The report draws a clean line between NVIDIA’s ecosystem and AMD’s equivalents: NVLINK vs InfinityFabric, NCCL vs RCCL, cuBLASLt vs hipBLASLt, and so on. The authors argue the AMD stack is now mature enough for serious large-scale model development.

None of this suggests enterprises should tear out existing NVIDIA clusters. A more realistic path is to keep NVIDIA for production while using AMD for stages that benefit from the memory capacity of MI300X GPUs and ROCm’s openness. It spreads supplier risk and increases total training volume without major disruption.

This all leads us to a set of recommendations: treat model shape as adjustable, not fixed; design networks around the collective operations your training will actually use; build fault tolerance that protects GPU hours rather than merely logging failures; and modernise checkpointing so it no longer derails training rhythm.

It’s not a manifesto, just our practical takeaway from what Zyphra, AMD, and IBM learned by training a large MoE AI model on AMD GPUs. For organisations looking to expand AI capacity without relying solely on one vendor, it’s a potentially useful blueprint.

See also: Google commits to 1000x more AI infrastructure in next 4-5 years

The post ZAYA1: AI model using AMD GPUs for training hits milestone appeared first on AI News.
