Data Engineering & MLOps - AI News
https://www.artificialintelligence-news.com/categories/how-it-works/data-engineering-mlops/

How disconnected clouds improve AI data governance
https://www.artificialintelligence-news.com/news/how-disconnected-clouds-improve-ai-data-governance/
Tue, 24 Feb 2026 14:42:44 +0000

The post How disconnected clouds improve AI data governance appeared first on AI News.

Disconnected clouds aim to improve AI data governance as businesses rethink their infrastructure under tighter regulatory expectations.

Ensuring operational continuity in isolated environments has become increasingly vital for businesses. Facilities lacking continuous internet access face unique constraints where external dependencies become unacceptable.

Microsoft recently expanded its capabilities to allow regulated industries and public sectors to participate independently in the digital economy. Trust in these systems stems from confidence that data remains protected, controls are enforceable, and operations proceed regardless of external conditions.

The company now offers full-stack options across connected, intermittently connected, and fully disconnected modes. This architecture unifies Azure Local, Microsoft 365 Local, and Foundry Local into a single sovereign private cloud.

Bringing these elements together provides a localised experience resilient to any connectivity condition. By standardising governance across all deployments, it helps enterprises to prevent fragmented architectures.

Azure Local disconnected operations enable organisations to run vital infrastructure using familiar Azure governance and policy controls completely offline. Execution, management, and policy enforcement stay entirely within customer-operated facilities. 

This approach allows companies to maintain uninterrupted operations and keep identities protected within their established boundaries. Implementations scale from minor deployments to demanding and data-intensive workloads.

Improving resilience and AI data governance in tandem

Deploying AI in sovereign environments introduces high compute requirements. Foundry Local enables enterprises to run multimodal large models completely offline.

Utilising modern hardware from partners like NVIDIA, customers deploy AI inferencing on their own physical servers. This ensures data and application programming interfaces operate strictly within customer-controlled boundaries. Customers maintain complete authority over their hardware even as AI inferencing demands increase over time.

Gerard Hoffmann, CEO of Proximus Luxembourg, said: “The availability of Azure Local disconnected operations represents a breakthrough for organisations that need control over their data without sacrificing the power of the Microsoft Cloud.

“For Luxembourg, where digital sovereignty is not just a principle but a strategic necessity, this model offers the resilience, autonomy and trust our market expects. By combining Microsoft’s technological leadership with Proximus NXT’s sovereign cloud expertise, we are enabling our customers to innovate confidently—even in fully-disconnected mode.”

CIOs planning offline deployments must map workloads to the correct control posture based on risk, regulation, and specific mission requirements. Disconnected environments are not one-size-fits-all, so businesses can start with smaller deployments and expand their capabilities over time.

Implementing a disconnected private cloud with AI support answers a business requirement for highly-regulated sectors, enabling secure data governance even when external connectivity is absent.

See also: Deploying agentic finance AI for immediate business ROI


Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security & Cloud Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

AI Expo 2026 Day 2: Moving experimental pilots to AI production
https://www.artificialintelligence-news.com/news/ai-expo-2026-day-2-moving-experimental-pilots-ai-production/
Thu, 05 Feb 2026 16:08:36 +0000

The post AI Expo 2026 Day 2: Moving experimental pilots to AI production appeared first on AI News.


The second day of the co-located AI & Big Data Expo and Digital Transformation Week in London showed a market in a clear transition.

Early excitement over generative models is fading. Enterprise leaders now face the friction of fitting these tools into current stacks. Day two sessions focused less on large language models and more on the infrastructure needed to run them: data lineage, observability, and compliance.

Data maturity determines deployment success

AI reliability depends on data quality. DP Indetkar from Northern Trust warned against allowing AI to become a “B-movie robot.” This scenario occurs when algorithms fail because of poor inputs. Indetkar noted that analytics maturity must come before AI adoption. Automated decision-making amplifies errors rather than reducing them if the data strategy is unverified.

Eric Bobek of Just Eat supported this view. He explained how data and machine learning guide decisions at the global enterprise level. Investments in AI layers are wasted if the data foundation remains fragmented.

Mohsen Ghasempour from Kingfisher also noted the need to turn raw data into real-time actionable intelligence. Retail and logistics firms must cut the latency between data collection and insight generation to see a return.

Scaling in regulated environments

The finance, healthcare, and legal sectors have near-zero tolerance for error. Pascal Hetzscholdt from Wiley addressed these sectors directly.

Hetzscholdt stated that responsible AI in science, finance, and law relies on accuracy, attribution, and integrity. Enterprise systems in these fields need audit trails. Reputational damage or regulatory fines make “black box” implementations impossible.

Konstantina Kapetanidi of Visa outlined the difficulties in building multilingual, tool-using, scalable generative AI applications. Models are becoming active agents that execute tasks rather than just generating text. Allowing a model to use tools – like querying a database – creates security vectors that need serious testing.

Parinita Kothari from Lloyds Banking Group detailed the requirements for deploying, scaling, monitoring, and maintaining AI systems. Kothari challenged the “deploy-and-forget” mentality. AI models need continuous oversight, similar to traditional software infrastructure.

The change in developer workflows

Of course, AI is fundamentally changing how code is written. A panel with speakers from Valae, Charles River Labs, and Knight Frank examined how AI copilots reshape software creation. While these tools speed up code generation, they also force developers to focus more on review and architecture.

This change requires new skills. A panel with representatives from Microsoft, Lloyds, and Mastercard discussed the tools and mindsets needed for future AI developers. A gap exists between current workforce capabilities and the needs of an AI-augmented environment. Executives must plan training programmes that ensure developers sufficiently validate AI-generated code.

Dr Gurpinder Dhillon from Senzing and Alexis Ego from Retool presented low-code and no-code strategies. Ego described using AI with low-code platforms to make production-ready internal apps. This method aims to cut the backlog of internal tooling requests.

Dhillon argued that these strategies speed up development without dropping quality. For the C-suite, this suggests cheaper internal software delivery if governance protocols stay in place.

Workforce capability and specific utility

The broader workforce is starting to work with “digital colleagues.” Austin Braham from EverWorker explained how agents reshape workforce models. This terminology implies a move from passive software to active participants. Business leaders must re-evaluate human-machine interaction protocols.

Paul Airey from Anthony Nolan gave an example of AI delivering literally life-changing value. He detailed how automation improves donor matching and transplant timelines for stem cell transplants. The utility of these technologies extends to life-saving logistics.

A recurring theme throughout the event is that effective applications often solve very specific and high-friction problems rather than attempting to be general-purpose solutions.

Managing the transition

The day two sessions from the co-located events show that enterprise focus has now moved to integration. The initial novelty is gone and has been replaced by demands for uptime, security, and compliance. Innovation heads should assess which projects have the data infrastructure to survive contact with the real world.

Organisations must prioritise the basic aspects of AI: cleaning data warehouses, establishing legal guardrails, and training staff to supervise automated agents. The difference between a successful deployment and a stalled pilot lies in these details.

Executives, for their part, should direct resources toward data engineering and governance frameworks. Without them, advanced models will fail to deliver value.

See also: AI Expo 2026 Day 1: Governance and data readiness enable the agentic enterprise


Lowering the barriers databases place in the way of strategy, with RavenDB
https://www.artificialintelligence-news.com/news/lowering-the-barriers-databases-place-in-the-way-of-strategy-with-ravendb/
Tue, 27 Jan 2026 11:46:00 +0000

The post Lowering the barriers databases place in the way of strategy, with RavenDB appeared first on AI News.

If database technologies could offer performance, flexibility and security, most professionals would be happy to get two of the three, and even then they might have to accept some compromises. Systems optimised for speed demand manual tuning, while flexible platforms can impose costs when early designs become constraints. Security is sometimes, sadly, a bolt-on, with DBAs relying on internal teams’ skills and knowledge not to introduce breaking changes.

RavenDB, however, exists because its founder saw the cumulative costs of those common trade-offs, and the inherent problems stemming from them. They wanted a database system that didn’t force developers and administrators to choose.

Abstracting away complexity

Oren Eini, RavenDB’s founder and CTO, was working as a freelance database performance consultant nearly two decades ago. In an exclusive interview he recounted how he encountered many capable teams “digging themselves into a hole” as the systems in their care grew in complexity. The problems he was presented with didn’t stem from developers lacking the required skills, but from system architecture. Databases tend to guide developers towards fragile designs, then punish them for following those paths, he says. RavenDB began as a way to reduce friction when the unstoppable force of what’s required meets the immovable mountain of database schema.

The platform’s emphasis is on performance and adaptability that doesn’t (ironically) at some stage require the services of people like Oren. Armed with a bag full of experience and knowledge, he founded RavenDB, which has now been shipping for more than fifteen years – well before the current interest in AI-assisted development.

The bottom line is that over time, the RavenDB database adapts to what the organisation cares about, rather than what it guessed it might care about when the database was first spun up. “When I talk to business people,” Eini says, “I tell them I take care of data ownership complexity.”

For example, instead of expecting developers or DBAs to anticipate every possible query pattern, RavenDB observes queries as they are executed. If it detects that a query would benefit from an index, it creates one in the background, with minimal overhead on extant processing. This contrasts with most relational databases, where schema and indexing strategies are set by the initial developers, so are difficult to alter later, regardless of how an organisation may have changed.
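The idea can be illustrated with a toy sketch (plain Python; illustrative only, and not RavenDB’s actual mechanism or API): a store that notices which field a query filters on and builds an index for that field on the fly.

```python
# Toy illustration of query-driven auto-indexing (not RavenDB code):
# the store observes which field a query filters on and, the first
# time a field is queried, builds a dictionary index for it.

class AutoIndexingStore:
    def __init__(self, docs):
        self.docs = docs
        self.indexes = {}  # field -> {value: [matching docs]}

    def query(self, field, value):
        if field not in self.indexes:
            # First query on this field: build an index for it,
            # standing in for the background creation described above.
            index = {}
            for doc in self.docs:
                index.setdefault(doc.get(field), []).append(doc)
            self.indexes[field] = index
        return self.indexes[field].get(value, [])

store = AutoIndexingStore([
    {"name": "Ada", "city": "London"},
    {"name": "Oren", "city": "Tel Aviv"},
])
print(store.query("city", "London"))  # [{'name': 'Ada', 'city': 'London'}]
print("city" in store.indexes)        # True: later queries reuse the index
```

The point of the sketch is only the shape of the behaviour: the indexing strategy follows the queries actually issued, rather than a schema fixed up front.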

Oren draws the comparison with pouring a building’s foundations before deciding where the doors and support columns might go. It’s an approach that can work, but when the business changes direction over the years, the cost of regretting those early decisions can be alarming.

Oren Eini (source: RavenDB)

Speaking ahead of the company’s appearance at the upcoming TechEx Global event in London this year (February 4 & 5, Olympia), he cited an example of a European client that struggled to expand into US markets because its database assumed a simple VAT rate that it had consigned to a single field, a schema not suitable for the complexities of state and federal sales taxes. From seemingly simple decisions made in the past (and perhaps not given much thought – European VAT is fairly standard), the client was storing financial pain and technical debt for the next generation.

Much of RavenDB’s attractiveness is manifest in practical details and small tweaks that make databases more performant and easier to address. Pagination, for example, requires two database calls in most systems (one to fetch a page of results, another to count matching records); RavenDB returns both in a single query. Individually, such optimisations may appear minor, but at scale they compound. “If you smooth down the friction everywhere you go,” Oren says, “you end up with a really good system where you don’t have to deal with friction.”
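The pagination point can be sketched in a few lines (a toy model in plain Python, not RavenDB’s API): a single call returns both the requested page and the total match count, where most systems would need two round trips.

```python
# Toy sketch of single-round-trip pagination (illustrative only):
# the query returns the requested page *and* the total match count
# together, instead of needing a separate COUNT call.

def paged_query(docs, predicate, page, page_size):
    matches = [d for d in docs if predicate(d)]
    start = page * page_size
    return matches[start:start + page_size], len(matches)

orders = [{"id": i, "paid": i % 2 == 0} for i in range(10)]
page, total = paged_query(orders, lambda d: d["paid"], page=0, page_size=3)
print([d["id"] for d in page])  # [0, 2, 4] - first page of paid orders
print(total)                    # 5 - total paid orders, same round trip
```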

Compounded removal of frictions improves performance and makes developers’ jobs simpler. Related data is embedded or included without the penalties associated with table joins in relational databases, so complex queries are completed in a single round trip. Software engineers don’t need to be database specialists. In their world, they just formulate SQL-like queries to RavenDB’s APIs.

Compared to other NoSQL databases, RavenDB provides full ACID transactions by default, and reduced operational complexity: many of its baked-in features (ETL pipelines, subscriptions, full-text search, counters, time series, etc.) reduce the need for external systems.

In contrast with DBAs and software developers addressing a competing database system and its necessary adjuncts, both developers and admins spend less time sweating the detail with RavenDB. That’s good news, not least for those who hold an organisation’s purse strings.

Scaling to fit the purpose

RavenDB is also built to scale as painlessly as it handles complex queries. It can create multi-node clusters where needed, so it supports huge numbers of concurrent users, and such clusters are created without time-consuming manual configuration. “With RavenDB, this is the normal cost of business,” he says.

In February this year, RavenDB Cloud announced version 7.2, and this being 2026, mention needs to be made of AI. RavenDB’s AI Assistant is, “in effect, […] a virtual DBA that comes inside of your database,” he says. The key word is inside. It’s designed for developers and administrators, not end users, answering their questions about indexing, storage usage or system behaviour.

AI as a professional tool

He’s sceptical about giving AIs unconfined access to any data store. Allowing an AI to act as a generic gatekeeper to sensitive information creates unavoidable security risks, because such systems are difficult to constrain reliably.

For the DBA and software developer, it’s another story – AI is a useful tool that operates as a helping hand, configuring and addressing the data. RavenDB’s AI assistant inherits the permissions of the user invoking it, having no privileged access of its own. “Anything it knows about your RavenDB instance comes because, behind the scenes, it’s accessing your system with your permissions,” he says.
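That inheritance model can be sketched as follows (a hypothetical illustration in plain Python, not RavenDB’s actual implementation): the assistant holds no credentials of its own and performs every read as the invoking user.

```python
# Hypothetical sketch of an assistant that inherits the caller's
# permissions: it has no privileged access of its own, and every read
# is checked against the invoking user's rights.

PERMISSIONS = {"alice": {"orders"}, "bob": {"orders", "payroll"}}
DATA = {"orders": ["order-1"], "payroll": ["salary-1"]}

def read_collection(user, collection):
    if collection not in PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user} may not read {collection}")
    return DATA[collection]

class Assistant:
    def __init__(self, user):
        self.user = user  # identity of the invoking user

    def answer(self, collection):
        # Behind the scenes, access the system *as* that user.
        return read_collection(self.user, collection)

print(Assistant("bob").answer("payroll"))   # ['salary-1']
try:
    Assistant("alice").answer("payroll")    # alice lacks this right
except PermissionError as e:
    print(e)                                # alice may not read payroll
```

The design choice is that a compromised or over-eager assistant can never see more than the person driving it could see directly.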

The company’s AI strategy is to provide developers and admins with opinionated features: generating queries, explaining indexes, helping with schema exploration, and answering operational questions, with calls bounded by operator validation and privileges.

Teams developing applications with RavenDB get support for vector search, native embeddings, server-side indexing, and agnostic integration with external LLMs. This, Oren says, lets organisations deliver useful AI-driven features in their applications quickly, without exposing the business to risk and compliance issues.

Security and risk

Security and risk comprise one of those areas where RavenDB draws a clear line between itself and its competitors. We touched on the recent MongoBleed vulnerability, which exposed data from unauthenticated MongoDB instances due to an interaction between compression and authentication code. Oren describes the issue as an architectural failure caused by mixing general-purpose and security-critical code paths. “The reason this is a vulnerability,” he says, “is specifically the fact that you’re trying to mix concerns.”

RavenDB uses established cryptographic infrastructure to handle authentication before any database logic is invoked. And even if a flaw emanated from elsewhere, the attack surface would be significantly smaller because unauthenticated users never reach the general code paths: that architectural separation limits the blast radius.
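The general principle – authenticate in a small, dedicated path before any request reaches general-purpose code – can be sketched with Python’s stdlib `hmac` (illustrative only, not RavenDB internals).

```python
import hmac
import hashlib

SECRET = b"server-secret"  # stand-in for real key material

def sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def handle_request(payload: bytes, signature: str):
    # Authentication happens first, in a small security-critical path.
    # Unauthenticated requests never reach the general database logic,
    # which keeps the attack surface small.
    if not hmac.compare_digest(sign(payload), signature):
        return "rejected"
    return database_logic(payload)

def database_logic(payload: bytes):
    # General-purpose code: parsing, compression, query handling...
    return f"processed {len(payload)} bytes"

msg = b"SELECT ..."
print(handle_request(msg, sign(msg)))  # processed 10 bytes
print(handle_request(msg, "forged"))   # rejected
```

Even if `database_logic` contained a bug, a forged request would never reach it: that is the “blast radius” argument in miniature.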

While the internals of RavenDB are highly technical and specialised, business decision-makers can easily appreciate that delays caused by schema changes, performance tuning, or infrastructure changes will have significant economic impact. But RavenDB’s malleability and speed also remove what Oren describes as the “no, you can’t do that” conversations.

Organisations running RavenDB reduce their dependency on specialist expertise, plus they get the ability to respond to changing business needs much more quickly. “[The database’s] role is to bring actual business value,” Eini says, arguing that infrastructure should, in operational contexts, fade into the background. As it stands, it often determines the scope of strategy discussions.

Migration and getting started

RavenDB uses a familiar SQL-like query language, and most teams will need only a day at most to get up to speed. Where friction does appear, Oren suggests, it is often due to assumptions carried over from other platforms around security and high availability. In RavenDB, these are built into the design, so they don’t create extra workload that needs to be factored in.

Born of its founder’s own experience of operational pain, RavenDB’s difference stems from accumulated design decisions: background indexing, query-aware optimisation, the separation of security and authentication concerns, and latterly, constraints on AI tooling. In everyday use, developers experience fewer sharp edges; in the longer term, business leaders see a reduction in costs, especially around times of change. The combination is compelling enough to displace entrenched platforms in many contexts.

To learn more, you can speak to RavenDB representatives at TechEx Global, held at Olympia, London, February 4 and 5. If what you’ve read here has awakened your interest, head over to the company’s website.

(Image source: “#316 AVZ Database” by Ralf Appelt is licensed under CC BY-NC-SA 2.0.)


Re-engineering for better results: The Huawei AI stack
https://www.artificialintelligence-news.com/news/re-engineering-for-better-results-the-huawei-ai-stack/
Mon, 27 Oct 2025 13:54:24 +0000

The post Re-engineering for better results: The Huawei AI stack appeared first on AI News.

Huawei has released its CloudMatrix 384 AI chip cluster, a new system for AI training. It employs clusters of Ascend 910C processors joined via optical links. The distributed architecture means the system can outperform traditional GPU setups, particularly in terms of resource use and on-chip time, despite the individual Ascend chips being less powerful than those of competitors.

Huawei’s new framework positions the tech giant as a “formidable challenger to Nvidia’s market-leading position, despite ongoing US sanctions,” the company claims.

To use the new Huawei framework for AI, data engineers will need to adapt their workflows, using frameworks that support Huawei’s Ascend processors, such as MindSpore, which are available from Huawei and its partners.

Framework transition: From PyTorch/TensorFlow to MindSpore

Unlike NVIDIA’s ecosystem, which predominantly uses frameworks like PyTorch and TensorFlow (engineered to take full advantage of CUDA), Huawei’s Ascend processors perform best when used with MindSpore, a deep learning framework developed by the company.

If data engineers already have models built in PyTorch or TensorFlow, they will likely need to convert models to the MindSpore format or retrain them using the MindSpore API.

It is worth noting that MindSpore uses different syntax, training pipelines and function calls from PyTorch or TensorFlow, so a degree of re-engineering will be necessary to replicate the results from model architectures and training pipelines. For instance, individual operator behaviour varies, such as padding modes in convolution and pooling layers. There are also differences in default weight initialisation methods.
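The padding difference alone changes layer output shapes. A quick arithmetic check (plain Python; assumes the commonly documented defaults of zero padding in PyTorch’s `Conv2d` and `pad_mode='same'` in MindSpore’s `nn.Conv2d`) shows why identical-looking layers can diverge:

```python
import math

# Standard convolution output-size arithmetic for the two defaults.
def out_valid(n, k, s=1):
    # 'valid'-style (zero padding, PyTorch's default): output shrinks
    return (n - k) // s + 1

def out_same(n, k, s=1):
    # 'same' padding (MindSpore Conv2d's documented default pad_mode)
    return math.ceil(n / s)

n, k = 32, 3  # 32x32 feature map, 3x3 kernel, stride 1
print(out_valid(n, k))  # 30: the unpadded output shrinks each layer
print(out_same(n, k))   # 32: 'same' preserves the spatial size
# Stacked layers compound the drift, so a model ported naively between
# the frameworks can end up with mismatched tensor shapes.
```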

Using MindIR for model deployment

MindSpore employs MindIR (MindSpore Intermediate Representation), a close analogue to Nvidia NIM. According to MindSpore’s official documentation, once a model has been trained in MindSpore, it can be exported using the mindspore.export utility, which converts the trained network into the MindIR format.

As detailed in DeepWiki’s guide, deploying a model for inference typically involves loading the exported MindIR model and then running predictions using MindSpore’s inference APIs for Ascend chips, which handle model de-serialisation, allocation, and execution.

MindSpore separates training and inference logic more explicitly than PyTorch or TensorFlow. Therefore, all preprocessing needs to match training inputs, and static graph execution must be optimised. MindSpore Lite or Ascend Model Zoo are recommended for additional hardware-specific tuning.

Adapting to CANN (Compute Architecture for Neural Networks)

Huawei’s CANN features a set of tools and libraries tailored for Ascend software, paralleling NVIDIA’s CUDA in functionality. Huawei recommends using CANN’s profiling and debugging tools to monitor and improve model performance on Ascend hardware.

Execution Modes: GRAPH_MODE vs. PYNATIVE_MODE

MindSpore provides two execution modes:

  • GRAPH_MODE – Compiles the computation graph before execution. This can result in faster execution and better performance optimisation since the graph can be analysed during compilation.
  • PYNATIVE_MODE – Executes operations immediately, which simplifies debugging; its more granular error tracking makes it better suited to the early stages of model development.

For initial development, PYNATIVE_MODE is recommended for simpler iterative testing and debugging. When models are ready to be deployed, switching to GRAPH_MODE can help achieve maximum efficiency on Ascend hardware. Switching between modes lets engineering teams balance development flexibility with deployment performance.

Code should be adjusted for each mode. For instance, when in GRAPH_MODE, it’s best to avoid Python-native control flow where possible.

Deployment environment: Huawei ModelArts

As you might expect, Huawei’s ModelArts, the company’s cloud-based AI development and deployment platform, is tightly integrated with Huawei’s Ascend hardware and the MindSpore framework. While it is comparable to platforms like AWS SageMaker and Google Vertex AI, it is optimised for Huawei’s AI processors.

Huawei says ModelArts supports the full pipeline from data labelling and preprocessing to model training, deployment, and monitoring. Each stage of the pipeline is available via API or the web interface.

In summary

Adapting to MindSpore and CANN may necessitate training and time, particularly for teams accustomed to NVIDIA’s ecosystem, with data engineers needing to understand various new processes. These include how CANN handles model compilation and optimisation for Ascend hardware, adjusting tooling and automation pipelines designed initially for NVIDIA GPUs, and learning new APIs and workflows specific to MindSpore.

Although Huawei’s tools are evolving, they lack the maturity, stability, and broader ecosystem support that frameworks like PyTorch with CUDA offer. However, Huawei hopes that migrating to its processes and infrastructure will pay off in terms of results, and let organisations reduce reliance on US-based Nvidia.

Huawei’s Ascend processors may be powerful and designed for AI workloads, but they have only limited distribution in some countries. Teams outside Huawei’s core markets may struggle to test or deploy models on Ascend hardware, unless they use partner platforms, like ModelArts, that offer remote access.

Fortunately, Huawei provides extensive migration guides, support, and resources to support any transition.

(Image source: “Huawei P9” by 405 Mi16 is licensed under CC BY-NC-ND 2.0.)


Businesses still face the AI data challenge
https://www.artificialintelligence-news.com/news/businesses-still-face-the-ai-data-challenge/
Tue, 21 Oct 2025 13:55:40 +0000

The post Businesses still face the AI data challenge appeared first on AI News.

A few years ago, the business technology world’s favourite buzzword was ‘Big Data’ – a reference to organisations’ mass collection of information that could be used to suggest previously unexplored ways of operating, and float ideas about what strategies they may best pursue.

What’s becoming increasingly apparent is that the problems companies faced in using Big Data to their advantage still remain, and it’s a new technology – AI – that’s making those problems rise once again to the surface. Without tackling the problems that beset Big Data, AI implementations will continue to fail.

So what are the issues stopping AI from delivering on its promises?

The vast majority of problems stem from the data resources themselves. To understand the issue, consider the following sources of information used in a very average working day.

In a small-to-medium sized business:

  • Spreadsheets, stored on users’ laptops, in Google Sheets, Office 365 cloud.
  • The customer relationship manager (CRM) platform.
  • Email exchanges between colleagues, customers, suppliers.
  • Word documents, PDFs, web forms.
  • Messaging apps.

In an enterprise business:

  • All of the above, plus,
  • Enterprise resource planning (ERP) systems.
  • Real-time data feeds.
  • Data lakes.
  • Disparate databases behind multiple point-products.

It’s worth noting that the simple list above isn’t comprehensive, nor is it intended to be. What it demonstrates is that in just five lines, there are around a dozen places where information can be found. What Big Data needed (perhaps still needs), and what AI projects also rest on, is somehow bringing all those elements together in such a way that a computer algorithm can make sense of it.

Research firm Gartner’s hype cycle for artificial intelligence, 2024, placed AI-Ready Data on the upward curve of the hype cycle, estimating it would be 2-5 years before it reached the ‘plateau of productivity’. Given that AI systems mine and extract data, most organisations – save the very largest – don’t have the foundations on which to build, and may not have AI assistance in the endeavour for another 1-4 years.

The underlying problem for AI implementation is the same one that dogged Big Data innovations as they, in the past, made their way through the hype cycle – from innovation trigger, peak of inflated expectations, trough of disillusionment, and slope of enlightenment, to plateau of productivity. Data comes in many forms: it can be inconsistent; it may adhere to different standards; it may be inaccurate or biased; it could be highly sensitive, or old and therefore irrelevant.

Transforming data so it’s AI-ready remains a process that’s as relevant today (perhaps more so) as it’s ever been. Those companies wanting a jump start could experiment with the many data treatment platforms currently available and, as is becoming the common advice, might begin with discrete projects as test-beds to assess the effectiveness of emerging technologies.

The advantage of the latest data preparation and assembly systems is that they ready an organisation’s information resources specifically for use by AI value-creation platforms. They can offer, for example, carefully coded guardrails that help ensure data compliance and protect users from accessing biased or commercially sensitive information.

But producing coherent, safe, and well-formulated data resources remains an ongoing challenge. As organisations accumulate more data in their everyday operations, compiling up-to-date data resources on which to draw is a constant process. Where Big Data could be considered a static asset, data for AI ingestion has to be prepared and treated in as close to real-time as possible.

The situation therefore remains a three-way balance between opportunity, risk, and cost. Never before has the choice of vendor or platform been so crucial to the modern business.

(Source: “Inside the business school” by Darien and Neil is licensed under CC BY-NC 2.0.)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

The post Businesses still face the AI data challenge appeared first on AI News.

NVIDIA GPUs to power Oracle’s next-gen enterprise AI services https://www.artificialintelligence-news.com/news/nvidia-gpus-power-oracle-next-gen-enterprise-ai-services/ Tue, 14 Oct 2025 15:20:37 +0000 https://www.artificialintelligence-news.com/?p=109882

Oracle and NVIDIA have expanded their partnership to make enterprise AI services more available, powerful, and practical. The announcements, made during Oracle AI World, cover everything from monstrously powerful new hardware to deeply integrated software that aims to put AI at the very core of a company’s data.

Ian Buck, VP of Hyperscale and High-Performance Computing at NVIDIA, said: “Through this latest collaboration, Oracle and NVIDIA are marking new frontiers in cutting-edge accelerated computing—streamlining database AI pipelines, speeding data processing, powering enterprise use cases and making inference easier to deploy and scale on OCI.”

The headline announcement is the new OCI Zettascale10 computing cluster. This platform is accelerated by NVIDIA GPUs and engineered for the kind of AI training and inference workloads that would make a normal server weep.

OCI Zettascale10 promises a mighty 16 zettaflops of peak AI compute performance and is knitted together with NVIDIA’s Spectrum-X Ethernet, a networking fabric designed specifically to stop GPUs from sitting around waiting for data, allowing organisations to scale up to millions of processors efficiently.

But raw power is only half the story. The real substance of this partnership lies in the software integrations that aim to weave AI into every layer of a business’s operations.

Mahesh Thiagarajan, Executive VP of Oracle Cloud Infrastructure, commented: “OCI Zettascale10 delivers multi‑gigawatt capacity for the most challenging AI workloads with NVIDIA’s next-generation GPU platform.

“In addition, the native availability of NVIDIA AI Enterprise on OCI gives our joint customers a leading AI toolset close at hand to OCI’s 200+ cloud services, supporting a long tail of customer innovation.”

Giving your Oracle database a brain with AI

The foundation of this new strategy is the Oracle AI Database 26ai. For years, the conventional wisdom was to move your data to where the AI models are. Oracle is flipping that on its head, arguing that it’s far more secure and efficient to bring the AI to your data. This latest database release is the embodiment of that “AI for Data” vision.

Juan Loaiza, Executive VP of Oracle Database Technologies at Oracle, said: “By architecting AI and data together, Oracle AI Database makes ‘AI for Data’ simple to learn and simple to use. We enable our customers to easily deliver trusted AI insights, innovations, and productivity for all their data, everywhere, including both operational systems and analytic data lakes.”

One of the standout features is the ability to run agentic AI workflows inside your database. The AI agents can tackle complex questions by combining your enterprise’s private, sensitive data with public information, all without ever having to move that private data outside your secure environment. This is made possible by features like a Unified Hybrid Vector Search, which lets the AI look for context across all your data types, whether it’s in a relational table, a JSON file, or a spatial map.

Oracle is also clearly thinking about the long game with security. The new database implements NIST-approved quantum-resistant algorithms for data both in-flight and at-rest. It’s a defence against “harvest now, decrypt later” attacks, where hackers steal encrypted data today with the hope of breaking it with future quantum computers.

Holger Mueller, VP and Principal Analyst at Constellation Research, commented: “Great AI needs great data. With Oracle AI Database 26ai, customers get both. It’s the single place where their business data lives—current, consistent, and secure. And it’s the best place to use AI on that data without moving it.

“To help simplify and accelerate AI adoption, AI Database 26ai includes impressive new AI features that go beyond AI Vector Search. A highlight is Oracle’s architecting agentic AI into the database, enabling customers to build, deploy, and manage their own in-database AI agents using a no-code visual platform that includes pre-built agents.”

The new database is designed to work with NVIDIA’s toolset. Its programming interfaces can now plug directly into NVIDIA NeMo Retriever, a collection of microservices that handle the complicated plumbing of modern AI for an enterprise.

This makes it far easier for developers to implement things like retrieval-augmented generation, or RAG. In simple terms, RAG allows a language model to look up relevant facts in your company documents before it answers a question, making its responses far more accurate and useful.
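
As a rough illustration of that grounding pattern (a generic sketch, not Oracle’s or NVIDIA’s implementation – the corpus, the bag-of-words “embedding”, and the prompt template are all invented for the example), a minimal RAG pipeline can be expressed in a few lines of Python:

```python
import math
import re
from collections import Counter

# A toy corpus standing in for private company documents.
DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The Q3 sales report shows 12% growth in the EMEA region.",
    "Employees accrue 25 days of annual leave per year.",
]

def embed(text: str) -> Counter:
    # Stand-in 'embedding': a bag-of-words vector. A real system would
    # call an embedding model and keep the vectors in a vector index.
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    # Step 1 of RAG: find the documents most relevant to the question.
    q = embed(question)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question: str) -> str:
    # Step 2: ground the language model by prepending the retrieved facts.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the refund policy for returns?"))
```

At production scale, embed would be a real embedding model and retrieve a vector search over the database, but the lookup-then-answer pattern is the same.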

The Oracle Private AI Services Container will also get a GPU-powered boost. This container lets businesses run AI models in their own secure environment. Soon, it will be able to offload the heavy lifting of creating vector embeddings – a core task for AI search – to powerful NVIDIA GPUs using the cuVS library. This promises to slash the time it takes to prepare data for AI applications.

Democratising enterprise AI

Beyond the database, the partnership aims to simplify the entire AI pipeline. The new Oracle AI Data Platform now includes a built-in NVIDIA GPU option and the NVIDIA RAPIDS Accelerator for Apache Spark. For data scientists and engineers, this is a big deal. It means they can speed up their data processing and machine learning workflows using GPUs, often without having to change a single line of their existing code.

All of these tools and capabilities are being consolidated within the Oracle AI Hub. The idea is to give organisations a single place to build, deploy, and manage their AI solutions. From the hub, users can deploy NVIDIA’s NIM microservices – which are like pre-packaged AI skills – through a simple, no-code interface.

To lower the barrier to entry even further, the full NVIDIA AI Enterprise software suite is now natively available within the OCI Console. This means that a developer can spin up a GPU instance and enable all the necessary NVIDIA tools with a few clicks, rather than going through a separate procurement process. It’s a small change that makes a big difference in how quickly teams can get started.

It’s clear that this collaboration is aimed at solving the real-world challenges businesses face when trying to adopt AI. By bringing the hardware, the data, and the software tools into one cohesive ecosystem, Oracle and NVIDIA are making a case that the era of practical, secure, and scalable enterprise AI has well and truly arrived.

See also: Cisco: Only 13% have a solid AI strategy and they’re lapping rivals

The post NVIDIA GPUs to power Oracle’s next-gen enterprise AI services appeared first on AI News.

Vibe analytics for data insights that are simple to surface https://www.artificialintelligence-news.com/news/vibe-analytics-for-data-insights-that-are-simple-to-surface/ Mon, 13 Oct 2025 10:00:00 +0000 https://www.artificialintelligence-news.com/?p=109842

Every business, big or small, has a wealth of valuable data that can inform impactful decisions. But to extract insights, there’s usually a good deal of manual work that needs to be done on raw data, either by semitechnical users (such as founders and product leaders), or dedicated – and expensive – data specialists. 

Either way, to produce real value, information has to be collected, shepherded, altered, and drawn from dozens of spreadsheets and different business platforms: the organisation’s CRM, its martech stack, e-commerce system, and website data, to name a few common examples. Clearly, that’s a time-consuming process, and the outcomes can be old news rather than up-to-the-minute insights.

Introducing vibe analytics 

The ideal business solution would be querying real-time data using natural language (vs writing code in SQL or Python), with smart systems working in the background to correlate and parse different data sources and formats. This is vibe analysis, where users can simply ask questions in plain language and let AI do the heavy lifting. Instead of manual data-wrestling and business users spending hours uncovering insights hidden deep in datasets, they get results fast — in text, graphics, summaries, and, where needed, detailed breakdowns. 
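
To make the idea concrete, here is a deliberately simplified sketch (all table names, questions, and data are invented for illustration – a real vibe-analytics product would use a language model to translate the question into a query, not a keyword lookup):

```python
import sqlite3

# In-memory stand-in for a connected business data source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)])

# In production, an LLM would translate the plain-language question
# into SQL; here a keyword lookup stands in for that translation step.
TEMPLATES = {
    "total sales by region":
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region",
    "how many orders":
        "SELECT COUNT(*) FROM orders",
}

def ask(question: str):
    # Plain language in, live query results out.
    sql = TEMPLATES[question.lower().strip("? ")]
    return con.execute(sql).fetchall()

print(ask("Total sales by region?"))
```

The point of the pattern is the interface: the user supplies plain language and gets live results back, rather than a stale manual export.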

Fast and accurate data analysis is important to every organisation, but for many, real-time insights are crucial. In the agricultural sector, for example, Lumo uses Fabi.ai’s platform to manage large fleets of IoT devices, collecting telemetry data continuously and adjusting its systems based on collated, normalised, and parsed information. 

Using vibe analysis, Lumo sees device performance immediately, as well as trends that develop over time. It pulls in weather data, and correlates the device fleet’s performance metrics with environmental factors. The data dashboards Lumo has built are not the result of many months of work writing data integration routines and front-end coding, but are a result of vibe analysis. 

Getting under the hood 

Sceptics of AI’s abilities often point to vibe-coding as an example of where things can go wrong, raising concerns about quality control and the “black box” nature of AI-driven analysis. Many users want visibility into how results are generated, with the option to inspect logic, tweak queries, or adjust API calls to ensure accuracy. When done well, vibe analytics addresses these concerns by combining transparency with rigour. Natural language inputs and modular build methods make it accessible to semitechnical users (such as founders and product leaders), while the underlying systems meet the accuracy and reliability standards expected by technical teams. This means users can trust the output whether they’re working independently or in collaboration with data scientists and developers. 

Designed specifically for both data experts and semitechnical data users, Fabi is a generative BI platform that brings vibe analysis done right to life. The code it produces can be hidden away entirely, or shown verbatim and edited in place, giving semitechnical users a chance to understand how the analysis works under the hood, while allowing technical teams to verify and fine-tune the system’s output. Data flows from an organisation’s systems (the platform mediates connections) or is uploaded. The resultant actionable insights can be pushed or scheduled to email, Slack, or Google Sheets, and displayed in graphics, text, or a mixture of both.

Fabi: A generative BI platform

Co-founder and CEO of Fabi, Marc Dupuis, describes how many organisations start using the analysis platform by testing workflows and queries on sample data before progressing to real-world analysis. As users delve into data troves and test their work, they can check its veracity, often in collaboration with someone more technically astute, thanks to the platform’s open, transparent view of Smartbooks to show what’s happening under the hood. It works the other way, too: semitechnical data users can confirm that the data being processed is relevant and accurate. 

To address common concerns about quality control and “black-box” AI, Fabi limits vibe analysis to internally controlled, carefully accessed data sources, with built-in guardrails. Code can be shown verbatim and edited in place, giving semitechnical users visibility into how results are produced, while allowing technical teams to audit, verify, and fine-tune outputs. Collaborative sharing of reports, findings, and working code helps teams validate results without working outside their areas of expertise.

Typical workflows include real-time KPI dashboards; natural-language Q&A over operational and product data; correlation analyses (for example, device performance against weather conditions); cohort and trend exploration; A/B test readouts and experiment summaries; and scheduled, shareable reports that mix text, graphics, summaries, and detailed breakdowns. These collaborative workflows are designed to be efficient and intuitive, so, whether working collectively or solo, users can unlock insights from even the most complex data arrangements. 

Fabi landed its first round of backing from Eniac Ventures in 2023, so it’s a company on the move. The team continues to expand its capabilities, with plans to make vibe analysis even more seamless for both semitechnical and technical users. Organisations interested in exploring the platform can start by testing workflows on sample data, then scale up to real-world use cases as they grow more confident in the system’s transparency and accuracy.

(Photo by Alina Grubnyak)

See also: Generative AI trends 2025: LLMs, data scaling & enterprise adoption

The post Vibe analytics for data insights that are simple to surface  appeared first on AI News.

Christian Spindeldreher, Dell Technologies: Powering AI at scale https://www.artificialintelligence-news.com/news/christian-spindeldreher-dell-technologies-powering-ai-at-scale/ Fri, 19 Sep 2025 08:24:42 +0000 https://www.artificialintelligence-news.com/?p=109467

Dell Technologies is betting on AI as companies move from small pilots to full-scale deployments, with the focus now on how organisations can turn AI into measurable results. But scaling AI is not easy. It demands strong infrastructure, reliable data management, and the ability to deploy models quickly in different workflows.

Dell has positioned itself to help companies make the leap. Its AI Factory, Data Lakehouse, and AI Data Platform – developed with support from NVIDIA and other partners – aim to give enterprises the building blocks needed to turn experiments into production systems.

AI News spoke with Christian Spindeldreher, EMEA Field Technology Officer for Data Management and AI at Dell Technologies, about what this shift looks like in practice, and how Dell’s latest developments are being used.

From pilots to measurable outcomes

Spindeldreher explained that Dell’s AI Factory and AI Data Platform, built on the Data Lakehouse, provide a unified foundation for scaling.

Christian Spindeldreher, EMEA Field Technology Officer for Data Management and AI at Dell Technologies.

“By integrating high-performance infrastructure with streamlined data management and accelerated model development, organisations can move beyond experimentation and deploy AI rapidly in workflows,” he said. The platform also simplifies access, governance, and analytics, giving teams the tools to generate value at scale.

A close partnership with NVIDIA brings compute and software tuned for demanding AI workloads, helping enterprises tackle more complex use cases without losing speed.

Dell’s AI Data Platform unlocks unstructured data

Dell recently added new capabilities to the AI Data Platform, including an unstructured data engine developed with Elastic and GPU-accelerated PowerEdge servers. This allows companies to handle the vast amounts of information locked in documents, videos, and images.

“The Elastic-powered unstructured data engine enables real-time semantic and hybrid search, rapid content indexing, and secure access to massive volumes of unstructured data,” Spindeldreher said. This unlocks use cases like AI-driven knowledge retrieval, advanced digital assistants, recommendation systems, and real-time compliance checks.

GPU acceleration, powered by Dell PowerEdge servers and NVIDIA RTX PRO 6000 Blackwell GPUs, means enterprises can now run agentic AI workflows and multimodal analytics directly on these large datasets. Tasks like video summarisation, synthetic data generation, and generative AI asset management become more practical. “The updates deliver up to six times the token throughput for LLMs, support for more concurrent users, and make high-performance AI compute more accessible,” he said.

Tackling data gravity

One of the challenges in scaling AI is that data often sits in different places, making it expensive and slow to move. Dell’s Data Lakehouse aims to solve this by supporting federated queries across multiple sources.

This means organisations don’t need to create multiple copies of the same dataset. Integrated into a broader Data Fabric, the system ensures consistent access while also supporting domain-oriented Data Mesh principles that give teams autonomy over their own data. The end result, said Spindeldreher, is faster insights without duplication or unnecessary movement.

Dell’s AI Factory drives faster AI adoption

Dell’s AI Factory model has also helped speed adoption in industries where data sensitivity is a major concern. By keeping workloads on-premise, organisations avoid the delays and risks tied to cloud migration and compliance.

“Healthcare, finance, and government have seen faster time-to-value by using advanced AI tools while upholding strict privacy and residency requirements,” Spindeldreher said. Dell also provides services that cover everything from strategy to operations, giving customers a more straightforward path to adoption while managing complexity and risk.

Scaling infrastructure with partners

Partnerships are another part of Dell’s approach. The company is supplying servers for CoreWeave’s rollout of NVIDIA Blackwell Ultra GPUs, a project that demands high performance and efficient cooling.

“The platforms support the most demanding AI workflows,” Spindeldreher explained. “Scalability is key here – combined with efficient cooling to support maximum performance from rack to full data centre scale.”

Building a unified ecosystem

Behind these updates and partnerships is a broader strategy of integration. Dell’s goal, according to Spindeldreher, is simple: “faster time to value.”

The AI Factory helps customers identify the right use cases, while the Data Platform adds features for data processing, analytics, and secure consumption. Together, they allow organisations to spend less time designing platforms and more time applying AI.

Governance and responsible scaling

As AI spreads across industries, the risks around governance and security are also growing. Spindeldreher highlighted Dell’s work in embedding these principles into its platforms.

“The use of data products and data federation (even in clusters and locations) allows us to consolidate and secure data access,” he said. But he also pointed out that technology alone is not enough – enterprises need data strategies and supporting tools like Data Catalogs to manage compliance in multi-cloud environments.

The future of Dell and AI: what’s next

Looking ahead, Spindeldreher expects enterprises to move deeper into operational AI. Agentic AI, edge AI, and multi-modal systems will play a larger role, supported by new generations of compute, accelerators, and networking.

Dell also sees a place for AI closer to end users. “And not to forget,” he said, “the increasing use of AI on personal devices like AI-enabled PCs and laptops.”

Christian Spindeldreher and the Dell Technologies team will be sharing more insights at this year’s AI & Big Data Expo Europe in Amsterdam on 24-25 September 2025. Spindeldreher will be speaking as part of a presentation session titled ‘The Dell AI and Data Journey’ on day one of the leading industry event.

(Photo by Jay Prajapati)

See also: Dell unveils Nvidia Blackwell-based AI acceleration platform

The post Christian Spindeldreher, Dell Technologies: Powering AI at scale appeared first on AI News.

Suvianna Grecu, AI for Change: Without rules, AI risks ‘trust crisis’ https://www.artificialintelligence-news.com/news/suvianna-grecu-ai-for-change-without-rules-ai-risks-trust-crisis/ Fri, 08 Aug 2025 16:04:09 +0000 https://www.artificialintelligence-news.com/?p=107315

The world is in a race to deploy AI, but a leading voice in technology ethics warns prioritising speed over safety risks a “trust crisis.”

Suvianna Grecu, Founder of the AI for Change Foundation, argues that without immediate and strong governance, we are on a path to “automating harm at scale.”

Suvianna Grecu, Founder of the AI for Change Foundation

Speaking on the integration of AI into critical sectors, Grecu believes that the most pressing ethical danger isn’t the technology itself, but the lack of structure surrounding its rollout.

Powerful systems are increasingly making life-altering decisions about everything from job applications and credit scores to healthcare and criminal justice, often without sufficient testing for bias or consideration of their long-term societal impact.

For many organisations, AI ethics remains a document of lofty principles rather than a daily operational reality. Grecu insists that genuine accountability only begins when someone is made truly responsible for the outcomes. The gap between intention and implementation is where the real risk lies.

Grecu’s foundation champions a shift from abstract ideas to concrete action. This involves embedding ethical considerations directly into development workflows through practical tools like design checklists, mandatory pre-deployment risk assessments, and cross-functional review boards that bring legal, technical, and policy teams together.

According to Grecu, the key is establishing clear ownership at every stage, building transparent and repeatable processes just as you would for any other core business function. This practical approach seeks to advance ethical AI, transforming it from a philosophical debate into a set of manageable, everyday tasks.

Partnering to build AI trust and mitigate risks

When it comes to enforcement, Grecu is clear that the responsibility can’t fall solely on government or industry. “It’s not either-or, it has to be both,” she states, advocating for a collaborative model.

In this partnership, governments must set the legal boundaries and minimum standards, particularly where fundamental human rights are at stake. Regulation provides the essential floor. However, industry possesses the agility and technical talent to innovate beyond mere compliance.

Companies are best positioned to create advanced auditing tools, pioneer new safeguards, and push the boundaries of what responsible technology can achieve.

Leaving governance entirely to regulators risks stifling the very innovation we need, while leaving it to corporations alone invites abuse. “Collaboration is the only sustainable route forward,” Grecu asserts.

Promoting a value-driven future

Looking beyond the immediate challenges, Grecu is concerned about more subtle, long-term risks that are receiving insufficient attention, namely emotional manipulation and the urgent need for value-driven technology.

As AI systems become more adept at persuading and influencing human emotion, she cautions that we are unprepared for the implications this has for personal autonomy.

A core tenet of her work is the idea that technology is not neutral. “AI won’t be driven by values, unless we intentionally build them in,” she warns. It’s a common misconception that AI simply reflects the world as it is. In reality, it reflects the data we feed it, the objectives we assign it, and the outcomes we reward. 

Without deliberate intervention, AI will invariably optimise for metrics like efficiency, scale, and profit, not for abstract ideals like justice, dignity, or democracy, and that will naturally impact societal trust. This is why a conscious and proactive effort is needed to decide what values we want our technology to promote.

For Europe, this presents a critical opportunity. “If we want AI to serve humans (not just markets) we need to protect and embed European values like human rights, transparency, sustainability, inclusion and fairness at every layer: policy, design, and deployment,” Grecu explains.

This isn’t about halting progress. As she concludes, it’s about taking control of the narrative and actively “shaping it before it shapes us.”

Through her foundation’s work – including public workshops and during the upcoming AI & Big Data Expo Europe, where Grecu is a chairperson on day two of the event – she is building a coalition to guide the evolution of AI, and boost trust by keeping humanity at its very centre.

(Photo by Cash Macanaya)

See also: AI obsession is costing us our human skills

The post Suvianna Grecu, AI for Change: Without rules, AI risks ‘trust crisis’ appeared first on AI News.

Alan Turing Institute: Humanities are key to the future of AI https://www.artificialintelligence-news.com/news/alan-turing-institute-humanities-are-key-future-of-ai/ Thu, 07 Aug 2025 15:18:27 +0000 https://www.artificialintelligence-news.com/?p=107307

A powerhouse team has launched a new initiative called ‘Doing AI Differently,’ which calls for a human-centred approach to future development.

For years, we’ve treated AI’s outputs like they’re the results of a giant math problem. But the researchers – from The Alan Turing Institute, the University of Edinburgh, AHRC-UKRI, and the Lloyd’s Register Foundation – behind this project say that’s the wrong way to look at it.

What AI is creating are basically cultural artefacts. They’re more like a novel or a painting than a spreadsheet. The problem is, AI is creating this “culture” without understanding any of it. It’s like someone who has memorised a dictionary but has no idea how to hold a real conversation.

This is why AI often fails when “nuance and context matter most,” says Professor Drew Hemment, Theme Lead for Interpretive Technologies for Sustainability at The Alan Turing Institute. The system just doesn’t have the “interpretive depth” to get what it’s really saying.

However, most of the world’s AI is built on just a handful of similar designs. The report calls this the “homogenisation problem”, one that future AI development must overcome.

Imagine if every baker in the world used the exact same recipe. You’d get a lot of identical, and frankly, boring cakes. With AI, this means the same blind spots, the same biases, and the same limitations get copied and pasted into thousands of tools we use every day.

We saw this happen with social media. It was rolled out with simple goals, and we’re now living with the unintended societal consequences. The ‘Doing AI Differently’ team is sounding the alarm to make sure we don’t make that same mistake with AI.

The team has a plan to build a new kind of AI, one they call Interpretive AI. It’s about designing systems from the very beginning to work the way people do: with ambiguity, multiple viewpoints, and a deep understanding of context.

The vision is to create interpretive technologies that can offer multiple valid perspectives instead of just one rigid answer. It also means exploring alternative AI architectures to break the mould of current designs. Most importantly, the future isn’t about AI replacing us; it’s about creating human-AI ensembles where we work together, combining our creativity with AI’s processing power to solve huge challenges.

This has the potential to touch our lives in very real ways. In healthcare, for example, your experience with a doctor is a story, not just a list of symptoms. An interpretive AI could help capture that full story, improving your care and your trust in the system.

For climate action, it could help bridge the gap between global climate data and the unique cultural and political realities of a local community, creating solutions that actually work on the ground.

A new international funding call is launching to bring researchers from the UK and Canada together on this mission. But we’re at a crossroads.

“We’re at a pivotal moment for AI,” warns Professor Hemment. “We have a narrowing window to build in interpretive capabilities from the ground up”.

For partners like Lloyd’s Register Foundation, it all comes down to one thing: safety.

“As a global safety charity, our priority is to ensure future AI systems, whatever shape they take, are deployed in a safe and reliable manner,” says their Director of Technologies, Jan Przydatek.

This isn’t just about building better technology. It’s about creating an AI that can help solve our biggest challenges and, in the process, amplify the best parts of our own humanity.

(Photo by Ben Sweet)

See also: AI obsession is costing us our human skills

The post Alan Turing Institute: Humanities are key to the future of AI appeared first on AI News.

]]>
Generative AI trends 2025: LLMs, data scaling & enterprise adoption https://www.artificialintelligence-news.com/news/generative-ai-trends-2025-llms-data-scaling-enterprise-adoption/ Wed, 06 Aug 2025 15:02:37 +0000 https://www.artificialintelligence-news.com/?p=107299 Generative AI is entering a more mature phase in 2025. Models are being refined for accuracy and efficiency, and enterprises are embedding them into everyday workflows. The focus is shifting from what these systems could do to how they can be applied reliably and at scale. What’s emerging is a clearer picture of what it […]

The post Generative AI trends 2025: LLMs, data scaling & enterprise adoption appeared first on AI News.

]]>
Generative AI is entering a more mature phase in 2025. Models are being refined for accuracy and efficiency, and enterprises are embedding them into everyday workflows.

The focus is shifting from what these systems could do to how they can be applied reliably and at scale. What’s emerging is a clearer picture of what it takes to build generative AI that is not just powerful, but dependable.

The new generation of LLMs

Large language models are shedding their reputation as resource-hungry giants. The cost of generating a response from a model has dropped by a factor of 1,000 over the past two years, bringing it in line with the cost of a basic web search. That shift is making real-time AI far more viable for routine business tasks.

Scale with control is also this year’s priority. The leading models (Claude Sonnet 4, Gemini Flash 2.5, Grok 4, DeepSeek V3) are still large, but they’re built to respond faster, reason more clearly, and run more efficiently. Size alone is no longer the differentiator. What matters is whether a model can handle complex input, support integration, and deliver reliable outputs, even when complexity increases. 

Last year saw a lot of criticism of AI’s tendency to hallucinate. In one high-profile case, a New York lawyer faced sanctions for citing ChatGPT-invented legal cases. Similar failures across sensitive sectors pushed the issue into the spotlight.

This is something LLM companies have been combating this year. Retrieval-augmented generation (RAG), which combines search with generation to ground outputs in real data, has become a common approach. It helps reduce hallucinations but does not eliminate them: models can still contradict the retrieved content. New benchmarks such as RGB and RAGTruth are being used to track and quantify these failures, marking a shift toward treating hallucination as a measurable engineering problem rather than an acceptable flaw.
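The RAG pattern described above can be sketched in a few lines. Everything below is illustrative: the toy corpus, the term-overlap scorer, and the prompt template are stand-ins for a real vector store and LLM call.

```python
# Minimal RAG sketch: retrieve the best-matching document by term
# overlap, then ground the prompt in it. A real system would use a
# vector store and call an LLM; both are stubbed out here.
def retrieve(query, corpus):
    q_terms = set(query.lower().split())
    # Score each document by how many query terms it contains.
    return max(corpus, key=lambda doc: len(q_terms & set(doc.lower().split())))

def build_prompt(query, corpus):
    context = retrieve(query, corpus)
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

corpus = [
    "The RGB benchmark tests retrieval-augmented generation robustness.",
    "Grouped Query Attention reduces memory use during inference.",
]
prompt = build_prompt("What does the RGB benchmark test?", corpus)
```

Because generation is conditioned on retrieved text rather than only the model’s parametric memory, contradictions between an answer and its context become checkable, which is exactly what benchmarks like RAGTruth measure.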

Navigating rapid innovation

One of the defining trends of 2025 is the speed of change. Model releases are accelerating, capabilities are shifting monthly, and what counts as state-of-the-art is constantly being redefined. For enterprise leaders, this creates a knowledge gap that can quickly turn into a competitive one.

Staying ahead means staying informed. Events like the AI and Big Data Expo Europe offer a rare chance to see where the technology is going next through real-world demos, direct conversations, and insights from those building and deploying these systems at scale.

Enterprise adoption

In 2025, the shift is toward autonomy. Many companies already use generative AI across core systems, but the focus now is on agentic AI. These are models designed to take action, not just generate content.

According to a recent survey, 78% of executives agree that digital ecosystems will need to be built for AI agents as much as for humans over the next three to five years. That expectation is shaping how platforms are designed and deployed. Here, AI is being integrated as an operator: able to trigger workflows, interact with software, and handle tasks with minimal human input.
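The “AI as an operator” pattern boils down to a dispatch loop: the model emits a structured action, and a registry maps it to an actual workflow. The action names and registry below are purely hypothetical, a minimal sketch rather than any vendor’s API.

```python
# Hypothetical agent dispatch: a model's structured output (an action
# name plus arguments) is routed to a registered workflow function.
def dispatch(action, args, registry):
    if action not in registry:
        raise ValueError(f"unknown action: {action!r}")
    return registry[action](**args)

# Example registry: the only workflows this agent may trigger.
registry = {
    "create_ticket": lambda title: f"ticket created: {title}",
    "send_summary": lambda to, body: f"summary sent to {to}",
}

result = dispatch("create_ticket", {"title": "renew licence"}, registry)
```

Keeping the registry explicit is the design choice that preserves human control: the agent can only act through functions someone deliberately exposed.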

Breaking the data wall

One of the biggest barriers to progress in generative AI is data. Training large models has traditionally relied on scraping vast quantities of real-world text from the internet. But, in 2025, that well is running dry. High-quality, diverse, and ethically usable data is becoming harder to find, and more expensive to process.

This is why synthetic data is becoming a strategic asset. Rather than pulling from the web, synthetic data is generated by models to simulate realistic patterns. Until recently, it wasn’t clear whether synthetic data could support training at scale, but research from Microsoft’s SynthLLM project has confirmed that it can (if used correctly).

Their findings show that synthetic datasets can be tuned for predictable performance. Crucially, they also discovered that bigger models need less data to learn effectively, allowing teams to optimise their training approach rather than throwing resources at the problem.

Making it work

Generative AI in 2025 is growing up. Smarter LLMs, orchestrated AI agents, and scalable data strategies are now central to real-world adoption. For leaders navigating this shift, the AI & Big Data Expo Europe offers a clear view of how these technologies are being applied and what it takes to make them work.

See also: Tencent releases versatile open-source Hunyuan AI models

The post Generative AI trends 2025: LLMs, data scaling & enterprise adoption appeared first on AI News.

]]>
Tencent releases versatile open-source Hunyuan AI models https://www.artificialintelligence-news.com/news/tencent-releases-versatile-open-source-hunyuan-ai-models/ Mon, 04 Aug 2025 14:58:20 +0000 https://www.artificialintelligence-news.com/?p=107285 Tencent has expanded its family of open-source Hunyuan AI models that are versatile enough for broad use. This new family of models is engineered to deliver powerful performance across computational environments, from small edge devices to demanding, high-concurrency production systems. The release includes a comprehensive set of pre-trained and instruction-tuned models available on the developer […]

The post Tencent releases versatile open-source Hunyuan AI models appeared first on AI News.

]]>
Tencent has expanded its family of open-source Hunyuan AI models with versions versatile enough for broad use. This new family of models is engineered to deliver powerful performance across computational environments, from small edge devices to demanding, high-concurrency production systems.

The release includes a comprehensive set of pre-trained and instruction-tuned models available on the developer platform Hugging Face. The models come in several sizes, specifically with parameter scales of 0.5B, 1.8B, 4B, and 7B, providing substantial flexibility for developers and businesses.

Tencent has indicated that these models were developed using training strategies similar to its more powerful Hunyuan-A13B model, allowing them to inherit its performance characteristics. This approach enables users to select the optimal model for their needs, whether it’s a smaller variant for resource-constrained edge computing or a larger model for high-throughput production workloads, all while ensuring strong capabilities.

One of the most notable features of the Hunyuan series is its native support for an ultra-long 256K context window. This allows the models to handle and maintain stable performance on long-text tasks, a vital capability for complex document analysis, extended conversations, and in-depth content generation. The models support what Tencent calls “hybrid reasoning,” which allows for both fast and slow thinking modes that users can choose between depending on their specific requirements.

The company has also placed a strong emphasis on agentic capabilities. The models have been optimised for agent-based tasks and have demonstrated leading results on established benchmarks such as BFCL-v3, τ-Bench, and C3-Bench, suggesting a high degree of proficiency in complex, multi-step problem-solving. For instance, on the C3-Bench, the Hunyuan-7B-Instruct model achieves a score of 68.5, while the Hunyuan-4B-Instruct model scores 64.3.

Central to the series’ performance is a focus on efficient inference. Tencent’s Hunyuan models utilise Grouped Query Attention (GQA), a technique known for improving processing speed and reducing computational overhead. This efficiency is further enhanced by advanced quantisation support, a key element of the Hunyuan architecture designed to lower deployment barriers.
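GQA itself is simple to sketch: several query heads share one key/value head, shrinking the KV cache by the ratio of query heads to KV heads. The toy dimensions below are illustrative, not Hunyuan’s actual configuration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def grouped_query_attention(q_heads, k_heads, v_heads):
    """q_heads: n_q query vectors. k_heads/v_heads: n_kv sequences of
    key/value vectors with n_kv < n_q, so KV memory shrinks by n_q/n_kv."""
    n_q, n_kv = len(q_heads), len(k_heads)
    group = n_q // n_kv          # query heads served by each shared KV head
    d = len(q_heads[0])
    outputs = []
    for i, q in enumerate(q_heads):
        ks, vs = k_heads[i // group], v_heads[i // group]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[t] for w, v in zip(weights, vs)) for t in range(d)]
        )
    return outputs
```

With four query heads and two KV heads, heads 0-1 attend over one shared key/value sequence and heads 2-3 over the other, halving the cached keys and values without changing the per-head attention maths.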

Tencent has developed its own compression toolset, AngleSlim, to create a more user-friendly and effective model compression solution. Using this tool, the company offers two main types of quantisation for the Hunyuan series.

The first is FP8 static quantisation, which employs an 8-bit floating-point format. This method uses a small amount of calibration data to pre-determine the quantisation scale without requiring full retraining, converting model weights and activation values into the FP8 format to boost inference efficiency.

The second method is INT4 quantisation, which achieves W4A16 quantisation through the GPTQ and AWQ algorithms:

  • The GPTQ approach processes model weights layer by layer, using calibration data to minimise errors in the quantised weights. This process avoids requiring model retraining and improves inference speed.
  • The AWQ algorithm works by statistically analysing the amplitude of activation values from a small set of calibration data. It then calculates a scaling coefficient for each weight channel, which expands the numerical range of important weights to retain more information during the compression process. 
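A hedged sketch of the per-channel idea behind W4A16: the weights in one channel are mapped to signed 4-bit integers with a single floating-point scale, while activations stay in 16-bit. The calibration-driven error minimisation of GPTQ and the activation-aware scaling of AWQ are omitted here; this shows only the basic round-to-grid step they both build on.

```python
def quantise_channel_int4(weights):
    """Symmetric per-channel INT4: map floats to integers in [-8, 7]
    with one scale, then dequantise to measure the rounding error."""
    peak = max(abs(w) for w in weights)
    scale = peak / 7.0 if peak > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    dequantised = [qi * scale for qi in q]
    return q, scale, dequantised

channel = [0.21, -0.47, 0.05, 0.33, -0.12]
q, scale, deq = quantise_channel_int4(channel)
```

The rounding error per weight is bounded by half the scale, which is why large outlier weights, the ones AWQ’s scaling coefficients protect, dominate the accuracy loss.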

Developers can either use the AngleSlim tool themselves or download the pre-quantised models directly.

Performance benchmarks confirm the strong capabilities of the Tencent Hunyuan models across a range of tasks. The pre-trained Hunyuan-7B model, for example, achieves a score of 79.82 on the MMLU benchmark, 88.25 on GSM8K, and 74.85 on the MATH benchmark, demonstrating solid reasoning and mathematical skills.

The instruction-tuned variants show impressive results in specialised areas. In mathematics, the Hunyuan-7B-Instruct model scores 81.1 on the AIME 2024 benchmark, while the 4B version scores 78.3. In science, the 7B model reaches 76.5 on OlympiadBench, and in coding, it scores 42 on LiveCodeBench.

The quantisation benchmarks show minimal performance degradation. On the DROP benchmark, the Hunyuan-7B-Instruct model scores 85.9 in its base BF16 format, 86.0 with FP8, and 85.7 with INT4 GPTQ, indicating that efficiency gains do not come at a cost to accuracy.
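That claim can be quantified directly from the DROP scores reported above:

```python
# DROP scores for Hunyuan-7B-Instruct, as reported in the article.
bf16, fp8, int4 = 85.9, 86.0, 85.7

int4_drop_pct = (bf16 - int4) / bf16 * 100  # relative drop vs BF16
fp8_drop_pct = (bf16 - fp8) / bf16 * 100    # negative: FP8 scores higher
```

A relative drop of under a quarter of one percent for INT4, and a marginal gain for FP8, is what makes the efficiency trade-off effectively free on this benchmark.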

For deployment, Tencent recommends using established frameworks like TensorRT-LLM, vLLM, or SGLang to serve the Hunyuan models and create OpenAI-compatible API endpoints, ensuring they can be integrated smoothly into existing development workflows. This combination of performance, efficiency, and deployment flexibility positions the Hunyuan series as a continuing powerful contender in open-source AI.

See also: Deep Cogito v2: Open-source AI that hones its reasoning skills

The post Tencent releases versatile open-source Hunyuan AI models appeared first on AI News.

]]>