Reinforcement Learning - AI News
https://www.artificialintelligence-news.com/categories/how-it-works/reinforcement-learning/

From cloud to factory – humanoid robots coming to workplaces
https://www.artificialintelligence-news.com/news/from-cloud-to-factory-humanoid-robots-coming-to-workplaces/
Fri, 09 Jan 2026 13:06:00 +0000

The Microsoft-Hexagon partnership may mark a turning point in the acceptance of humanoid robots in the workplace, as prototypes become operational realities.

The partnership announced this week between Microsoft and Hexagon Robotics marks an inflection point in the commercialisation of humanoid, AI-powered robots for industrial environments. The two companies will combine Microsoft’s cloud and AI infrastructure with Hexagon’s expertise in robotics, sensors, and spatial intelligence to advance the deployment of physical AI systems in real-world settings.

At the centre of the collaboration is AEON, Hexagon’s industrial humanoid robot, a device designed to operate autonomously in environments like factories, logistics hubs, engineering plants, and inspection sites.

The partnership will focus on multimodal AI training, imitation learning, real-time data management, and integration with existing industrial systems. Initial target sectors include automotive, aerospace, manufacturing, and logistics, the companies say – industries where labour shortages and operational complexity are already constraining growth.

The announcement is a sign of a maturing ecosystem: the convergence of cloud platforms, physical AI, and robotics engineering is making humanoid automation commercially viable.

Humanoid robots out of the research lab

While humanoid robots have long been the subject of research-institution work and proud demonstrations at technology events, the last five years have seen a move towards practical deployment in real-world working environments. The main drivers of that change have been improved perception, advances in reinforcement and imitation learning, and the availability of scalable cloud infrastructure.

One of the most visible examples is Agility Robotics’ Digit, a bipedal humanoid robot designed for logistics and warehouse operations. Digit has been piloted in live environments by companies like Amazon, where it performs material-handling tasks including tote movement and last-metre logistics. Such deployments tend to focus on augmenting human workers rather than replacing them, with Digit handling more physically demanding tasks.

Similarly, Tesla’s Optimus programme has moved beyond concept videos and is now undergoing factory trials. Optimus robots are being tested on structured tasks like part handling and equipment transport inside Tesla’s automotive manufacturing facilities. While still limited in scope, these pilots demonstrate a recurring pattern: humanoid form-factors are chosen over less anthropomorphic designs so the machines can operate in spaces designed for, and populated by, humans.

Inspection, maintenance, and hazardous environments

Industrial inspection is emerging as one of the earliest commercially viable use cases for humanoid and quasi-humanoid robots. Boston Dynamics’ Atlas, while not yet a general-purpose commercial product, has been used in live industrial trials for inspection and disaster-response environments. It can navigate uneven terrain, climb stairs, and manipulate tools in places considered unsafe for humans.

Toyota Research Institute has deployed humanoid robotics platforms for remote inspection and manipulation tasks in similar settings. Toyota’s systems rely on multimodal perception and human-in-the-loop control, the latter reinforcing an industry trend: early deployments prioritise reliability and traceability, and therefore retain human oversight.

Hexagon’s AEON aligns closely with this trend. Its emphasis on sensor fusion and spatial intelligence is relevant for inspection and quality assurance tasks, where precise understanding of physical environments is more valuable than the conversational abilities most associated with everyday use of AIs.

Cloud platforms central to robotics strategy

A defining feature of the Microsoft-Hexagon partnership is the use of cloud infrastructure in the scaling of humanoid robots. Training, updating, and monitoring physical AI systems generates large quantities of data, including video, force feedback from on-device sensors, spatial mapping (such as that derived from LIDAR), and operational telemetry. Managing this data locally has historically been a bottleneck, due to storage and processing constraints.

By using platforms like Azure and Azure IoT Operations, plus real-time intelligence services in the cloud, humanoid robots can be trained as fleets rather than as isolated units. This opens up shared learning, iterative improvement, and greater consistency across deployments. For board-level buyers, the shift means humanoid robots can be treated – in terms of IT requirements – more like enterprise software than machinery.

Labour shortages drive adoption

The demographic trends in manufacturing, logistics, and asset-intensive industries are increasingly unfavourable. Ageing workforces, declining interest in manual roles, and persistent skills shortages create gaps that conventional automation cannot fully address – at least, not without rebuilding entire facilities to suit a robotic workforce. Fixed robotic systems excel at repetitive, predictable tasks but struggle in dynamic, human environments.

Humanoid robots occupy a middle ground. Rather than replacing entire workflows, they can stabilise operations where human availability is uncertain. Case studies show early value in night shifts, periods of peak demand, and tasks deemed too hazardous for humans.

What boards should evaluate before investing

For decision-makers considering investment in next-generation workplace robots, several lessons have emerged from existing real-world deployments:

Task specificity matters more than general intelligence: the most successful pilots focus on well-defined activities. Data governance and security must remain front and centre when robots are deployed, especially when they are connected to cloud platforms.

At a human level, workforce integration can be more challenging than sourcing, installing, and running the technology itself. And human oversight remains essential at this stage of AI maturity, both for safety and for regulatory acceptance.

A measured but irreversible shift

Humanoid robots won’t replace the human workforce, but a growing body of evidence from live deployments and prototyping shows such devices are moving into the workplace. Humanoid, AI-powered robots can already perform economically valuable tasks, and integration with existing industrial systems is increasingly practical. For boards with the appetite to invest, the question is when competitors might deploy the technology responsibly and at scale.

(Image source: Hexagon Robotics)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

Deep Cogito v2: Open-source AI that hones its reasoning skills
https://www.artificialintelligence-news.com/news/deep-cogito-v2-open-source-ai-hones-its-reasoning-skills/
Fri, 01 Aug 2025 14:11:47 +0000

Deep Cogito has released Cogito v2, a new family of open-source AI models that sharpen their own reasoning skills.

Released under an open-source licence, the new Cogito v2 lineup includes four hybrid reasoning AI models: two mid-sized at 70B and 109B parameters, and two large-scale versions at 405B and 671B. 

The largest, a 671B Mixture-of-Experts (MoE) model, is already being touted as one of the most powerful open-source AIs in the world. The company reports that it competes with the latest from DeepSeek and is closing the gap on proprietary systems like o3 and Claude 4 Opus.

But the real story isn’t just about size or power; it’s about a fundamental shift in how the AI learns. Instead of just ‘thinking’ longer at inference time to find an answer, Cogito v2 is designed to internalise its own reasoning processes.

This internalised reasoning is achieved through a technique called Iterated Distillation and Amplification (IDA), which distils the discoveries from a search back into the model’s core parameters. The goal is to build a stronger ‘intuition’, allowing the model to anticipate the outcome of its own reasoning without having to perform the entire search.
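
Deep Cogito describes IDA only at a high level, but the shape of the loop can be sketched. In the snippet below, search_with_extra_compute and fine_tune_on are hypothetical stand-ins for whatever search and training machinery the company actually uses; this is an illustration of the idea, not its implementation.

```python
# Illustrative sketch of Iterated Distillation and Amplification (IDA).
# The helper functions are hypothetical stand-ins, not Deep Cogito's API.

def iterated_distillation_amplification(model, prompts, rounds=3):
    for _ in range(rounds):
        amplified = []
        for prompt in prompts:
            # Amplification: spend extra inference-time compute (search,
            # long reasoning chains) to reach a better answer than the
            # model's quick first guess.
            trace, answer = search_with_extra_compute(model, prompt)
            amplified.append((prompt, trace, answer))
        # Distillation: train on the amplified results so the outcome of the
        # search is folded back into the model's parameters ("intuition"),
        # letting it anticipate the result without performing the search.
        model = fine_tune_on(model, amplified)
    return model
```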

Because the open-source AI models have a better “gut feeling” for the right approach, their reasoning chains are 60% shorter than those of rivals like DeepSeek R1.

This efficiency extends to the budget. Deep Cogito says it developed all of its models – from experiments to final training – for a combined total of less than $3.5 million. Still a large sum for most of us, but minuscule compared to the spending of many of the leading AI labs.

The flagship 671B model received special attention, trained not only to improve its final answers but to refine the thinking process itself. This approach discourages the model from “meandering” and rewards a more direct path to the solution. The performance data suggests it works, with Deep Cogito’s open-source AI model matching or exceeding the latest DeepSeek versions on key benchmarks while being close to proprietary alternatives:

Benchmark comparison of the flagship open-source Deep Cogito v2 671B reasoning model against DeepSeek, OpenAI o3, and Anthropic Claude models.

Perhaps one of the most surprising outcomes is the models’ ability to reason about images; a skill they were never explicitly trained for.

The team shared an example of this reasoning where Deep Cogito’s open-source AI model compared two images of a duck and a lion, demonstrating a deep thinking process about their habitats, colours, and composition purely through transfer learning. Deep Cogito believes this emergent property could be a powerful way to bootstrap training data for future multimodal reasoning systems.

Looking ahead, the Deep Cogito team plans to “hill climb on the gains of iterative self-improvement” in its quest to build superintelligence. They have restated their commitment that all AI models they create will be open-source.

See also: Leak suggests OpenAI’s open-source AI model release is imminent

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

RAGEN: AI framework tackles LLM agent instability
https://www.artificialintelligence-news.com/news/ragen-ai-framework-tackles-llm-agent-instability/
Thu, 24 Apr 2025 16:06:47 +0000

Researchers have introduced RAGEN, an AI framework designed to counter LLM agent instability when handling complex situations.

Training these AI agents presents significant hurdles, particularly when decisions span multiple steps and involve unpredictable feedback from the environment. While reinforcement learning (RL) has shown promise in static tasks like solving maths problems or generating code, its application to dynamic, multi-turn agent training has been less explored.   

Addressing this gap, a collaborative team from institutions including Northwestern University, Stanford University, Microsoft, and New York University has proposed StarPO (State-Thinking-Actions-Reward Policy Optimisation).

StarPO offers a generalised approach for training agents at the trajectory level (i.e., it optimises the entire sequence of interactions, not just individual actions).

Accompanying this is RAGEN, a modular system built to implement StarPO. This enables the training and evaluation of LLM agents, particularly focusing on their reasoning capabilities under RL. RAGEN provides the necessary infrastructure for rollouts, reward assignment, and optimisation within multi-turn, stochastic (randomly determined) environments.
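
The paper’s exact objective isn’t reproduced here, but “optimising at the trajectory level” can be pictured as a REINFORCE-style loss computed over a whole multi-turn rollout rather than over single actions. The sketch below makes that distinction concrete under that simplifying assumption.

```python
import torch

def trajectory_policy_loss(turn_log_probs, turn_rewards, gamma=1.0):
    """REINFORCE-style loss over an entire multi-turn trajectory.

    turn_log_probs: list of tensors, log-probabilities of the tokens the
                    agent emitted at each turn.
    turn_rewards:   list of floats, environment reward received at each turn.
    """
    # Compute the (optionally discounted) return that follows each turn.
    returns, g = [], 0.0
    for r in reversed(turn_rewards):
        g = r + gamma * g
        returns.insert(0, g)

    # Credit every turn's output with the return of the rest of the episode,
    # so the whole sequence of interactions is optimised jointly.
    loss = torch.zeros(())
    for log_prob, ret in zip(turn_log_probs, returns):
        loss = loss - log_prob.sum() * ret
    return loss
```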

Minimalist environments, maximum insight

To isolate the core learning challenges from confounding factors like extensive pre-existing knowledge or task-specific engineering, the researchers tested LLMs using RAGEN in three deliberately minimalistic, controllable symbolic gaming environments:   

  1. Bandit: A single-turn, stochastic task testing risk-sensitive symbolic reasoning. The agent chooses between options (like ‘Phoenix’ or ‘Dragon’ arms) with different, initially unknown, reward profiles.
  2. Sokoban: A multi-turn, deterministic puzzle requiring foresight and planning, as actions (pushing boxes) are irreversible.
  3. Frozen Lake: A multi-turn, stochastic grid navigation task where movement attempts can randomly fail, demanding planning under uncertainty.

These environments allow for clear analysis of how agents learn decision-making policies purely through interaction.   
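
RAGEN implements its own symbolic versions of these environments, but the Frozen Lake dynamics are close to the standard Gymnasium task, which gives a quick feel for the stochasticity an agent must plan around (a sketch using Gymnasium, not RAGEN’s code):

```python
import gymnasium as gym

# Slippery Frozen Lake: the chosen action only sometimes moves the agent in
# the intended direction, so plans must hold up under random failures.
env = gym.make("FrozenLake-v1", is_slippery=True)
obs, info = env.reset(seed=0)

done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # random policy, purely for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print("episode return:", total_reward)
```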

Key findings: Stability, rollouts, and reasoning

The study yielded three significant findings concerning the training of self-evolving LLM agents:

The ‘Echo Trap’ and the need for stability

A recurring problem observed during multi-turn RL training was dubbed the “Echo Trap”. Agents would initially improve but then suffer performance collapse, overfitting to locally rewarded reasoning patterns. 

This was marked by collapsing reward variance, falling entropy (a measure of randomness/exploration), and sudden spikes in gradients (indicating training instability). Early signs included drops in reward standard deviation and output entropy.   

To combat this, the team developed StarPO-S, a stabilised version of the framework. StarPO-S incorporates:   

  • Variance-based trajectory filtering: Focusing training on task instances where the agent’s behaviour shows higher uncertainty (higher reward variance), discarding low-variance, less informative rollouts. This improved stability and efficiency.   
  • Critic incorporation: Using methods like PPO (Proximal Policy Optimisation), which employ a ‘critic’ to estimate value, generally showed better stability than critic-free methods like GRPO (Group Relative Policy Optimisation) in most tests.   
  • Decoupled clipping and KL removal: Techniques adapted from other research (DAPO) involving asymmetric clipping (allowing more aggressive learning from positive rewards) and removing KL divergence penalties (encouraging exploration) further boosted stability and performance.   

StarPO-S consistently delayed collapse and improved final task performance compared to vanilla StarPO.   
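
Of those three ingredients, variance-based filtering is the easiest to picture: group rollouts by the task instance they came from and keep only the groups whose rewards actually disagree. The sketch below assumes a simple dictionary format for rollouts; it is not RAGEN’s data structure.

```python
from statistics import pstdev

def filter_rollouts_by_variance(rollouts, keep_fraction=0.5):
    """Keep rollouts from task instances with the highest reward variance.

    rollouts: list of dicts like {"task_id": ..., "reward": ...} (assumed format).
    """
    groups = {}
    for rollout in rollouts:
        groups.setdefault(rollout["task_id"], []).append(rollout)

    # Task instances where every rollout earns the same reward carry little
    # training signal; rank groups by reward spread and keep the top slice.
    ranked = sorted(
        groups.values(),
        key=lambda group: pstdev(r["reward"] for r in group),
        reverse=True,
    )
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    return [rollout for group in kept for rollout in group]
```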

Rollout quality is crucial

The characteristics of the ‘rollouts’ (simulated interaction trajectories used for training) significantly impact learning. Key factors identified include:   

  • Task diversity: Training with a diverse set of initial states (prompts), but with multiple responses generated per prompt, aids generalisation. The sweet spot seemed to be moderate diversity enabling contrast between different outcomes in similar scenarios.   
  • Interaction granularity: Allowing multiple actions per turn (around 5-6 proved optimal) enables better planning within a fixed turn limit, without introducing the noise associated with excessively long action sequences.   
  • Rollout frequency: Using fresh, up-to-date rollouts that reflect the agent’s current policy is vital. More frequent sampling (approaching an ‘online’ setting) leads to faster convergence and better generalisation by reducing policy-data mismatch.

Maintaining freshness, alongside appropriate action budgets and task diversity, is key for stable training.   

Reasoning requires careful reward design

Simply prompting models to ‘think’ doesn’t guarantee meaningful reasoning emerges, especially in multi-turn tasks. The study found:

  • Reasoning traces helped generalisation in the simpler, single-turn Bandit task, even when symbolic cues conflicted with rewards.   
  • In multi-turn tasks like Sokoban, reasoning benefits were limited, and the length of ‘thinking’ segments consistently declined during training. Agents often regressed to direct action selection or produced “hallucinated reasoning” if rewards only tracked task success, revealing a “mismatch between thoughts and environment states.”

This suggests that standard trajectory-level rewards (often sparse and outcome-based) are insufficient. 

“Without fine-grained, reasoning-aware reward signals, agent reasoning hardly emerge[s] through multi-turn RL.”

The researchers propose that future work should explore rewards that explicitly evaluate the quality of intermediate reasoning steps, perhaps using format-based penalties or rewarding explanation quality, rather than just final outcomes.   
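
What a “reasoning-aware” reward might look like in practice is left open by the paper; one plausible shaping, sketched below, adds small format-based terms on top of the sparse outcome reward. The <think> tag convention is an assumption for illustration, not RAGEN’s specification.

```python
import re

def shaped_reward(response: str, task_success: bool) -> float:
    """Outcome reward plus small format-based terms for the reasoning trace."""
    reward = 1.0 if task_success else 0.0

    # Assumed convention: the agent wraps its reasoning in <think>...</think>
    # before giving the final answer.
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        reward -= 0.2   # penalise a missing reasoning trace
    elif len(match.group(1).split()) < 5:
        reward -= 0.1   # penalise degenerate, token-length "thinking"
    return reward
```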

RAGEN and StarPO: A step towards self-evolving AI

The RAGEN system and StarPO framework represent a step towards training LLM agents that can reason and adapt through interaction in complex, unpredictable environments.

This research highlights the unique stability challenges posed by multi-turn RL and offers concrete strategies – like StarPO-S’s filtering and stabilisation techniques – to mitigate them. It also underscores the critical role of rollout generation strategies and the need for more sophisticated reward mechanisms to cultivate genuine reasoning, rather than superficial strategies or hallucinations.

While acknowledging limitations – including the need to test on larger models and optimise for domains without easily verifiable rewards – the work opens “a scalable and principled path for building AI systems” in areas demanding complex interaction and verifiable outcomes, such as theorem proving, software engineering, and scientific discovery.

(Image by Gerd Altmann)

See also: How does AI judge? Anthropic studies the values of Claude

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase
https://www.artificialintelligence-news.com/news/alibaba-qwen-qwq-32b-scaled-reinforcement-learning-showcase/
Thu, 06 Mar 2025 09:14:13 +0000

The Qwen team at Alibaba has unveiled QwQ-32B, a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.

The Qwen team have successfully integrated agent capabilities into the reasoning model, enabling it to think critically, utilise tools, and adapt its reasoning based on environmental feedback.

“Scaling RL has the potential to enhance model performance beyond conventional pretraining and post-training methods,” the team stated. “Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.”

QwQ-32B achieves performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated), a testament to the effectiveness of RL when applied to robust foundation models pretrained on extensive world knowledge. This remarkable outcome underscores the potential of RL to bridge the gap between model size and performance.

The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities.

The results highlight QwQ-32B’s performance in comparison to other leading models, including DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.

Benchmark results:

  • AIME24: QwQ-32B achieved 79.5, slightly behind DeepSeek-R1-671B’s 79.8, but significantly ahead of OpenAI o1-mini’s 63.6 and the distilled models.
  • LiveCodeBench: QwQ-32B scored 63.4, again closely matched by DeepSeek-R1-671B’s 65.9, and surpassing the distilled models and OpenAI o1-mini’s 53.8.
  • LiveBench: QwQ-32B achieved 73.1, with DeepSeek-R1-671B scoring 71.6, and outperforming the distilled models and OpenAI o1-mini’s 57.5.
  • IFEval: QwQ-32B scored 83.9, very close to DeepSeek-R1-671B’s 83.3, and leading the distilled models and OpenAI o1-mini’s 59.1.
  • BFCL: QwQ-32B achieved 66.4, with DeepSeek-R1-671B scoring 62.8, demonstrating a lead over the distilled models and OpenAI o1-mini’s 49.3.

The Qwen team’s approach involved a cold-start checkpoint and a multi-stage RL process driven by outcome-based rewards. The initial stage focused on scaling RL for math and coding tasks, utilising accuracy verifiers and code execution servers. The second stage expanded to general capabilities, incorporating rewards from general reward models and rule-based verifiers.

“We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding,” the team explained.
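
The verifiers themselves aren’t published, but the idea of an outcome-based reward for maths and coding is straightforward to sketch: compare the final answer to a reference, or execute the generated code against its tests. The snippet below is an illustrative stand-in, not Alibaba’s infrastructure.

```python
import subprocess
import tempfile

def math_reward(model_answer: str, reference_answer: str) -> float:
    # Accuracy verifier: reward only exact matches on the normalised answer.
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(generated_code: str, test_code: str, timeout: int = 10) -> float:
    # Code-execution stand-in: run the candidate program plus its tests in a
    # subprocess and reward only solutions whose tests exit cleanly.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```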

QwQ-32B is open-weight and available on Hugging Face and ModelScope under the Apache 2.0 license, and is also accessible via Qwen Chat. The Qwen team views this as an initial step in scaling RL to enhance reasoning capabilities and aims to further explore the integration of agents with RL for long-horizon reasoning.
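
Because the weights are open, trying the model locally takes only a few lines with Hugging Face transformers. The repository name below (Qwen/QwQ-32B) is assumed from the model’s name and may differ; note that a 32B model also needs substantial GPU memory or quantisation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many primes are there below 30?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```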

“As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI),” the team stated.

See also: Deepgram Nova-3 Medical: AI speech model cuts healthcare transcription errors

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

DeepSeek-R1 reasoning models rival OpenAI in performance
https://www.artificialintelligence-news.com/news/deepseek-r1-reasoning-models-rival-openai-in-performance/
Mon, 20 Jan 2025 14:36:16 +0000

DeepSeek has unveiled its first-generation DeepSeek-R1 and DeepSeek-R1-Zero models that are designed to tackle complex reasoning tasks.

DeepSeek-R1-Zero is trained solely through large-scale reinforcement learning (RL) without relying on supervised fine-tuning (SFT) as a preliminary step. According to DeepSeek, this approach has led to the natural emergence of “numerous powerful and interesting reasoning behaviours,” including self-verification, reflection, and the generation of extensive chains of thought (CoT).

“Notably, [DeepSeek-R1-Zero] is the first open research to validate that reasoning capabilities of LLMs can be incentivised purely through RL, without the need for SFT,” DeepSeek researchers explained. This milestone not only underscores the model’s innovative foundations but also paves the way for RL-focused advancements in reasoning AI.

However, DeepSeek-R1-Zero’s capabilities come with certain limitations. Key challenges include “endless repetition, poor readability, and language mixing,” which could pose significant hurdles in real-world applications. To address these shortcomings, DeepSeek developed its flagship model: DeepSeek-R1.

Introducing DeepSeek-R1

DeepSeek-R1 builds upon its predecessor by incorporating cold-start data prior to RL training. This additional pre-training step enhances the model’s reasoning capabilities and resolves many of the limitations noted in DeepSeek-R1-Zero.

Notably, DeepSeek-R1 achieves performance comparable to OpenAI’s much-lauded o1 system across mathematics, coding, and general reasoning tasks, cementing its place as a leading competitor.

DeepSeek has chosen to open-source both DeepSeek-R1-Zero and DeepSeek-R1 along with six smaller distilled models. Among these, DeepSeek-R1-Distill-Qwen-32B has demonstrated exceptional results—even outperforming OpenAI’s o1-mini across multiple benchmarks.

  • MATH-500 (Pass@1): DeepSeek-R1 achieved 97.3%, eclipsing OpenAI (96.4%) and other key competitors.  
  • LiveCodeBench (Pass@1-COT): The distilled version DeepSeek-R1-Distill-Qwen-32B scored 57.2%, a standout performance among smaller models.  
  • AIME 2024 (Pass@1): DeepSeek-R1 achieved 79.8%, setting an impressive standard in mathematical problem-solving.

A pipeline to benefit the wider industry

DeepSeek has shared insights into its rigorous pipeline for reasoning model development, which integrates a combination of supervised fine-tuning and reinforcement learning.

According to the company, the process involves two SFT stages to establish the foundational reasoning and non-reasoning abilities, as well as two RL stages tailored for discovering advanced reasoning patterns and aligning these capabilities with human preferences.

“We believe the pipeline will benefit the industry by creating better models,” DeepSeek remarked, alluding to the potential of their methodology to inspire future advancements across the AI sector.

One standout achievement of their RL-focused approach is the ability of DeepSeek-R1-Zero to execute intricate reasoning patterns without prior human instruction—a first for the open-source AI research community.

Importance of distillation

DeepSeek researchers also highlighted the importance of distillation—the process of transferring reasoning abilities from larger models to smaller, more efficient ones, a strategy that has unlocked performance gains even for smaller configurations.

Smaller distilled iterations of DeepSeek-R1 – such as the 1.5B, 7B, and 14B versions – were able to hold their own in niche applications. The distilled models can outperform results achieved via RL training on models of comparable sizes.
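
Mechanically, this kind of distillation is usually plain supervised fine-tuning on the larger model’s reasoning traces. The compressed recipe below uses assumed helper names rather than DeepSeek’s actual pipeline:

```python
def distil_reasoning(teacher, student, prompts):
    """Transfer reasoning ability via supervised fine-tuning on teacher traces."""
    dataset = []
    for prompt in prompts:
        # The teacher (e.g. DeepSeek-R1) writes out its full chain of thought
        # and final answer for each prompt.
        trace = teacher.generate(prompt)
        if passes_quality_filter(trace):  # hypothetical filtering step
            dataset.append({"prompt": prompt, "completion": trace})

    # The smaller student (here, a Qwen2.5- or Llama3-based model) is then
    # fine-tuned to imitate those traces token by token: standard SFT, no RL.
    student.fine_tune(dataset)
    return student
```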

For researchers, these distilled models are available in configurations spanning from 1.5 billion to 70 billion parameters, supporting Qwen2.5 and Llama3 architectures. This flexibility empowers versatile usage across a wide range of tasks, from coding to natural language understanding.

DeepSeek has adopted the MIT License for its repository and weights, extending permissions for commercial use and downstream modifications. Derivative works, such as using DeepSeek-R1 to train other large language models (LLMs), are permitted. However, users of specific distilled models should ensure compliance with the licences of the original base models, such as Apache 2.0 and Llama3 licences.

(Photo by Prateek Katyal)

See also: Microsoft advances materials discovery with MatterGen

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

New AI training techniques aim to overcome current challenges
https://www.artificialintelligence-news.com/news/o1-model-llm-ai-openai-training-research-next-generation/
Thu, 28 Nov 2024 11:58:28 +0000

OpenAI and other leading AI companies are developing new training techniques to overcome the limitations of current methods. Addressing unexpected delays and complications in the development of larger, more powerful language models, these fresh techniques focus on human-like behaviour to teach algorithms to ‘think’.

According to reports citing a dozen AI researchers, scientists, and investors, the new training techniques – which underpin OpenAI’s recent ‘o1’ model (formerly Q* and Strawberry) – have the potential to transform the landscape of AI development. The reported advances may also influence the types and quantities of resources AI companies need, including specialised hardware and the energy required to develop AI models.

The o1 model is designed to approach problems in a way that mimics human reasoning and thinking, breaking down numerous tasks into steps. The model also utilises specialised data and feedback provided by experts in the AI industry to enhance its performance.

Since ChatGPT was unveiled by OpenAI in 2022, there has been a surge in AI innovation, and many technology companies claim existing AI models require expansion, be it through greater quantities of data or improved computing resources. Only then can AI models consistently improve.

Now, AI experts have reported limitations in scaling up AI models. The 2010s were a revolutionary period for scaling, but Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, says that the training of AI models, particularly in understanding language structures and patterns, has levelled off.

“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Scaling the right thing matters more now,” he said.

In recent times, AI lab researchers have experienced delays in and challenges to developing and releasing large language models (LLM) that are more powerful than OpenAI’s GPT-4 model.

First, there is the cost of training large models, often running into tens of millions of dollars. And, due to complications that arise, like hardware failing due to system complexity, a final analysis of how these models run can take months.

In addition to these challenges, training runs require substantial amounts of energy, often resulting in power shortages that can disrupt processes and impact the wider electricity grid. Another issue is the colossal amount of data large language models use, so much so that AI models have reportedly used up all accessible data worldwide.

Researchers are exploring a technique known as ‘test-time compute’ to improve current AI models when being trained or during inference phases. The method can involve the generation of multiple answers in real-time to decide on a range of best solutions. Therefore, the model can allocate greater processing resources to difficult tasks that require human-like decision-making and reasoning. The aim – to make the model more accurate and capable.
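
One widely discussed form of test-time compute is best-of-N sampling: draw several candidate answers and keep the one a scorer prefers. The sketch below leaves the generator and scorer abstract; in reported systems the scorer is typically a learned verifier or reward model.

```python
def best_of_n(generate, score, prompt, n=8):
    """Spend extra inference-time compute by sampling n answers and keeping the best.

    generate(prompt) -> str            samples one candidate answer
    score(prompt, answer) -> float     stand-in for a verifier / reward model
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```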

Noam Brown, a researcher at OpenAI who helped develop the o1 model, shared an example of how a new approach can achieve surprising results. At the TED AI conference in San Francisco last month, Brown explained that “having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer.”

Rather than simply increasing the model size and training time, this can change how AI models process information and lead to more powerful, efficient systems.

It is reported that other AI labs have been developing versions of the o1 technique. These include xAI, Google DeepMind, and Anthropic. Competition in the AI world is nothing new, but we could see a significant impact on the AI hardware market as a result of new techniques. Companies like Nvidia, which currently dominates the supply of AI chips due to the high demand for its products, may be particularly affected by updated AI training techniques.

Nvidia became the world’s most valuable company in October, and its rise in fortunes can be largely attributed to its chips’ use in AI arrays. New techniques may impact Nvidia’s market position, forcing the company to adapt its products to meet the evolving AI hardware demand. Potentially, this could open more avenues for new competitors in the inference market.

A new age of AI development may be on the horizon, driven by evolving hardware demands and more efficient training methods such as those deployed in the o1 model. The future of both AI models and the companies behind them could be reshaped, unlocking unprecedented possibilities and greater competition.

See also: Anthropic urges AI regulation to avoid catastrophes

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

How cold hard data science harnesses AI with Wolfram Research
https://www.artificialintelligence-news.com/news/how-cold-hard-data-science-harnesses-ai-with-wolfram-research/
Mon, 30 Sep 2024 14:34:30 +0000

It’s sometimes difficult to distinguish the reality of technology from the hype and marketing messages that bombard our inboxes daily. In just the last five years, we’ve probably heard too much about the metaverse, blockchain and virtual reality, for example. At present, we’re in the midst of a furore about the much-abused term ‘AI’, and time will tell whether this particular storm will be seen as a teacup resident.

Artificial Intelligence News spoke exclusively to Jon McLoone, the Director of Technical Communication and Strategy at one of the most mature organisations in the computational intelligence and scientific innovation space, Wolfram Research, to help us put our present concepts of AI and their practical uses into a deeper context.

Jon has worked at Wolfram Research for 32 years in various roles, currently leading the European Technical Services team. A mathematician by training and a skilled practitioner in many aspects of data analysis, we began our interview by having him describe Wolfram’s work in an elevator pitch format.

Jon McLoone

“Our value proposition is that we know computation and Wolfram technology. We tailor our technology to the problem that an organisation has. That’s across a broad range of things. So, we don’t have a typical customer. What they have in common is they’re doing something innovative.”

“We’re doing problem-solving, the type of things that use computation and data science. We’re building out a unified platform for computation, and when we talk about computation, we mean the kinds of technical computing, like engineering calculations, data science and machine learning. It’s things like social network analysis, biosciences, actuarial science, and financial computations. Abstractly, these are all fundamentally mathematical things.”

“Our world is all those structured areas where we’ve spent 30 years building out different ontologies. We have a symbolic representation of the maths, but also things like graphs and networks, documents, videos, images, audio, time series, entities in the real world, like cities, rivers, and mountains. My team is doing the fun stuff of actually making it do something useful!”

“AI we just see as another kind of computation. There were different algorithms that have been developed over years, some of them hundreds of years ago, some of them only tens of years ago. Gen AI just adds to this list.”

Claims made about AI in 2024 can sometimes be overoptimistic, so we need to be realistic about its capabilities and consider what it excels at and where it falls short.

“There’s still human intelligence, which still remains as the strategic element. You’re not going to say, in the next five years AI will run my company and make decisions. Generative AI is very fluent but is unreliable. Its job is to be plausible, not to be correct. And particularly when you get into the kinds of things Wolfram does, it’s terrible because it will tell you the kinds of things that your mathematical answer would look like.” (Artificial Intelligence News‘ italics.)

The work of Wolfram Research in this context focuses on what Jon terms ‘symbolic AI’. To differentiate generative and symbolic AI, he gave us the analogy of modelling the trajectory of a thrown ball. A generative AI would learn how the ball travels by examining many thousands of such throws and then be able to produce a description of the trajectory. “That description would be plausible. That kind of model is data-rich, understanding poor.”

A symbolic representation of the thrown ball, on the other hand, would involve differential equations for projectile motion and representations of elements: mass, viscosity of the atmosphere, friction, and many other factors. “It could then be asked, ‘What happens if I throw the ball on Mars?’ It’ll say something accurate. It’s not going to fail.”
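
The contrast is easy to make concrete in code. A symbolic model of the throw is just the equation of motion with named parameters, so asking about Mars is a substitution rather than a retraining job. A small SymPy sketch (air resistance omitted for brevity):

```python
import sympy as sp

t, v0, theta, g = sp.symbols("t v0 theta g", positive=True)

# Symbolic projectile model: height of the ball as a function of time.
height = v0 * sp.sin(theta) * t - sp.Rational(1, 2) * g * t**2

# Time of flight comes from solving height = 0 exactly, not from fitting data.
flight_time = [s for s in sp.solve(sp.Eq(height, 0), t) if s != 0][0]

# Same model, different world: substitute Earth's or Mars's gravity.
on_earth = flight_time.subs({v0: 20, theta: sp.pi / 4, g: 9.81})
on_mars = flight_time.subs({v0: 20, theta: sp.pi / 4, g: 3.71})
print(float(on_earth), float(on_mars))
```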

The ideal way to solve business (or scientific, medical, or engineering) problems is a combination of human intelligence, symbolic reasoning, as epitomised in Wolfram Language, and what we now term AI acting as the glue between them. AI is a great technology for interpreting meaning and acting as an interface between the component parts.

“Some of the interesting crossovers are where we take natural language and turn that into some structured information that you can then compute with. Human language is very messy and ambiguous, and generative AI is very good at mapping that to some structure. Once you’re in a structured world of something that is syntactically formal, then you can do things on it.”

A recent example of combining ‘traditional’ AI with the work of Wolfram involved medical records:

“We did a project recently taking medical reports, which were handwritten, typed and digital. But they contain words, and trying to do statistics on those isn’t possible. And so, you’ve got to use the generative AI part for mapping all of these words to things like classes: was this an avoidable death? Yes. No. That’s a nice, structured key value pair. And then once we’ve got that information in structured form (for example a piece of JSON or XML, or whatever your chosen structure), we can then do classical statistics to start saying, ‘Is there a trend? Can we project? Was there an impact from COVID on hospital harms?’ Clear-cut questions that you can approach symbolically with things like means and medians and models.”
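
A stripped-down version of the second half of that pipeline, once the generative step has produced structured records, is ordinary statistics. The file name and field names below are invented for illustration; they are not those of the actual project.

```python
import json
from statistics import mean

# Structured records as the LLM might have emitted them, one JSON object per
# line (file and field names are illustrative only).
with open("mapped_reports.jsonl") as f:
    records = [json.loads(line) for line in f]

def avoidable_rate(rows):
    return mean(1.0 if r["avoidable_death"] == "yes" else 0.0 for r in rows)

pre_covid = [r for r in records if r["year"] < 2020]
covid_era = [r for r in records if r["year"] >= 2020]

# Classical statistics on the structured output: is there a trend or impact?
print("avoidable-death rate before 2020:", avoidable_rate(pre_covid))
print("avoidable-death rate from 2020 on:", avoidable_rate(covid_era))
```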

During our interview, Jon also gave a précis of a presentation, which took as its example of his organisation’s work, an imaginary peanut butter cup manufacturing plant. What might be the effects of changing out a particular ingredient or altering some detail of the recipe and the effects of that change on the product’s shelf life?

“LLMs (large language models) will say, ‘Oh, they’ll probably last a few weeks because peanut butter cups usually sit on the shelf a few weeks. But going to a computational model that can plug into the ingredients, and compute, and you’ll know this thing should last for eight weeks before it goes off. Or what that change might do to the manufacturing process? A computational model can connect to the digital twin of your manufacturing plant and learn, ‘That will slow things down by 3%, so your productivity will fall by 20% because it creates a bottleneck here.’ LLMs are great at connecting you and your question to the model, maths, data science or the database. And that’s really an interesting three-way meeting of minds.”

You can catch Wolfram Research at the upcoming TechEx event in Amsterdam, October 1-2, at stand 166 of the AI & Big Data strand. We can’t guarantee any peanut butter-related discussion at the event, but to discover how powerful modelling and generative AI can be harnessed to solve your specific problems and quandaries, contact the company via its website.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Enhancing healthcare documentation with IDP
https://www.artificialintelligence-news.com/news/enhancing-healthcare-documentation-with-idp/
Thu, 26 Sep 2024 16:01:57 +0000

Healthcare documentation is an integral part of the sector that ensures the delivery of high-quality care and maintains the continuity of patient information. However, as healthcare providers have to deal with excessive amounts of data, managing it can feel overwhelming. With the advent of intelligent document processing technology, a new solution can now be implemented. This article explores how such technology works, its role in healthcare documentation, and its benefits, limitations, and implications for the future.

Intelligent document processing and its importance

Intelligent document processing is a more advanced form of automation that uses AI, machine learning, natural language processing, and optical character recognition to collect, process, and organise data from many types of paperwork. Unlike traditional document systems, IDP can handle the unstructured and semi-structured data found in healthcare documents, which exist in many forms. Because it is built on advanced algorithms and artificial intelligence tools, IDP can enhance the work of healthcare providers and assist them in the care delivery process.

IDP’s role in healthcare documentation

Healthcare providers deal daily with many kinds of documents: health, employment, and insurance records, reports, notes, forms, and social documents. IDP can reduce the need for inefficient data management processes by:

  • Automating data extraction by capturing the essential information from documents, reducing manual effort and improving performance,
  • Establishing more accurate data with AI algorithms: IDP ensures that the data captured is accurate and consistent, which is crucial for patient safety and care quality,
  • Organising data in a searchable format to allow better data access,
  • Ensuring compliance with regulations like HIPAA by securely managing sensitive patient data and providing audit trails.

Benefits of IDP in healthcare

The implementation of IDP in healthcare comes with several benefits:

  • Increased efficiency: By automating routine tasks, healthcare providers can focus more on patient care rather than paperwork,
  • Cost reduction: IDP reduces the need for manual data entry and paper-based processes, leading to significant cost savings,
  • Better patient experience: Quick access to patient history and records leads to more informed decision-making and personalised care,
  • Scalability: As healthcare facilities grow, IDP systems can easily scale to manage increased data volumes without compromising performance.

Challenges in implementing IDP

While IDP offers many advantages, there are challenges to its adoption:

  • Integration with existing systems: Integrating IDP with current healthcare IT ecosystems can be complex and requires careful planning,
  • Data privacy concerns: Protecting patient data is paramount, and IDP must adhere to stringent security standards,
  • Change management: Staff may resist shifting from manual to automated processes, necessitating adequate training and change management strategies.

Future of IDP in healthcare

In the future, IDP is likely to increase its impact in the healthcare field. Given the rise of AI and machine learning, the corresponding systems will become increasingly sophisticated, likely providing predictive analytics and decision support services. This could help improve diagnostic precision and create a more personalised patient treatment plan, eventually leading to better outcomes. In addition, IDP may facilitate data exchange between different healthcare systems.

Conclusion

Intelligent document processing is a solution that is bound to become increasingly impactful in healthcare. It can help healthcare professionals deal more effectively with the contemporary challenges of patient data. Although challenges exist, the potential gains in improved patient care, decreased expenses, and more precise data make IDP an invaluable asset. It should therefore be considered one of the healthcare industry’s key solutions in its quest toward digitalisation.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Primate Labs launches Geekbench AI benchmarking tool
https://www.artificialintelligence-news.com/news/primate-labs-launches-geekbench-ai-benchmarking-tool/
Fri, 16 Aug 2024 09:13:49 +0000

Primate Labs has officially launched Geekbench AI, a benchmarking tool designed specifically for machine learning and AI-centric workloads.

The release of Geekbench AI 1.0 marks the culmination of years of development and collaboration with customers, partners, and the AI engineering community. The benchmark, previously known as Geekbench ML during its preview phase, has been rebranded to align with industry terminology and ensure clarity about its purpose.

Geekbench AI is now available for Windows, macOS, and Linux through the Primate Labs website, as well as on the Google Play Store and Apple App Store for mobile devices.

Primate Labs’ latest benchmarking tool aims to provide a standardised method for measuring and comparing AI capabilities across different platforms and architectures. The benchmark offers a unique approach by providing three overall scores, reflecting the complexity and heterogeneity of AI workloads.

“Measuring performance is, put simply, really hard,” explained Primate Labs. “That’s not because it’s hard to run an arbitrary test, but because it’s hard to determine which tests are the most important for the performance you want to measure – especially across different platforms, and particularly when everyone is doing things in subtly different ways.”

The three-score system accounts for the varied precision levels and hardware optimisations found in modern AI implementations. This multi-dimensional approach allows developers, hardware vendors, and enthusiasts to gain deeper insights into a device’s AI performance across different scenarios.

A notable addition to Geekbench AI is the inclusion of accuracy measurements for each test. This feature acknowledges that AI performance isn’t solely about speed but also about the quality of results. By combining speed and accuracy metrics, Geekbench AI provides a more holistic view of AI capabilities, helping users understand the trade-offs between performance and precision.

Geekbench AI 1.0 introduces support for a wide range of AI frameworks, including OpenVINO on Linux and Windows, and vendor-specific TensorFlow Lite delegates like Samsung ENN, ArmNN, and Qualcomm QNN on Android. This broad framework support ensures that the benchmark reflects the latest tools and methodologies used by AI developers.

The benchmark also utilises more extensive and diverse datasets, which not only enhance the accuracy evaluations but also better represent real-world AI use cases. All workloads in Geekbench AI 1.0 run for a minimum of one second, allowing devices to reach their maximum performance levels during testing while still reflecting the bursty nature of real-world applications.

Primate Labs has published detailed technical descriptions of the workloads and models used in Geekbench AI 1.0, emphasising their commitment to transparency and industry-standard testing methodologies. The benchmark is integrated with the Geekbench Browser, facilitating easy cross-platform comparisons and result sharing.

The company anticipates regular updates to Geekbench AI to keep pace with market changes and emerging AI features. However, Primate Labs believes that Geekbench AI has already reached a level of reliability that makes it suitable for integration into professional workflows, with major tech companies like Samsung and Nvidia already utilising the benchmark.

(Image Credit: Primate Labs)

See also: xAI unveils Grok-2 to challenge the AI hierarchy

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Qwen2-Math: A new era for AI maths whizzes
https://www.artificialintelligence-news.com/news/qwen2-math-new-era-ai-maths-whizzes/
Fri, 09 Aug 2024 12:46:18 +0000

Alibaba Cloud’s Qwen team has unveiled Qwen2-Math, a series of large language models specifically designed to tackle complex mathematical problems.

These new models – built upon the existing Qwen2 foundation – demonstrate remarkable proficiency in solving arithmetic and mathematical challenges, and outperform former industry leaders.

The Qwen team crafted Qwen2-Math using a vast and diverse Mathematics-specific Corpus. This corpus comprises a rich tapestry of high-quality resources, including web texts, books, code, exam questions, and synthetic data generated by Qwen2 itself.

Rigorous evaluation on both English and Chinese mathematical benchmarks – including GSM8K, Math, MMLU-STEM, CMATH, and GaoKao Math – revealed the exceptional capabilities of Qwen2-Math. Notably, the flagship model, Qwen2-Math-72B-Instruct, surpassed the performance of proprietary models such as GPT-4o and Claude 3.5 in various mathematical tasks.

“Qwen2-Math-Instruct achieves the best performance among models of the same size, with RM@8 outperforming Maj@8, particularly in the 1.5B and 7B models,” the Qwen team noted.

This superior performance is attributed to the math-specific reward model used during development. Maj@8 takes a simple majority vote over eight sampled answers, whereas RM@8 uses the reward model to select the best of the eight, so the gap between the two scores shows how much the reward model helps.
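For readers unfamiliar with the notation, the minimal sketch below contrasts the two selection schemes. The sample_answer and score_answer functions are hypothetical placeholders standing in for the language model and the reward model; only the selection logic is shown.

import random
from collections import Counter

def maj_at_k(problem, sample_answer, k=8):
    # Maj@k: sample k answers and return the most common one.
    answers = [sample_answer(problem) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

def rm_at_k(problem, sample_answer, score_answer, k=8):
    # RM@k: sample k answers and return the one the reward model scores highest.
    answers = [sample_answer(problem) for _ in range(k)]
    return max(answers, key=lambda answer: score_answer(problem, answer))

# Tiny demo with stand-in functions; a real setup would call the model and reward model.
sample_answer = lambda problem: random.choice(["42", "42", "41"])
score_answer = lambda problem, answer: 1.0 if answer == "42" else 0.0
print(maj_at_k("toy problem", sample_answer))
print(rm_at_k("toy problem", sample_answer, score_answer))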

Further showcasing its prowess, Qwen2-Math demonstrated impressive results in challenging mathematical competitions like the American Invitational Mathematics Examination (AIME) 2024 and the American Mathematics Competitions (AMC) 2023.

To ensure the model’s integrity and prevent contamination, the Qwen team implemented robust decontamination methods during both the pre-training and post-training phases. This rigorous approach involved removing duplicate samples and identifying overlaps with test sets to maintain the model’s accuracy and reliability.

Looking ahead, the Qwen team plans to expand Qwen2-Math’s capabilities beyond English, with bilingual and multilingual models in the pipeline. This commitment to inclusivity aims to make advanced mathematical problem-solving accessible to a global audience.

“We will continue to enhance our models’ ability to solve complex and challenging mathematical problems,” affirmed the Qwen team.

The Qwen2-Math models are available on Hugging Face.
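For readers who want to experiment, the following sketch shows one plausible way to run an instruction-tuned Qwen2-Math checkpoint with the Hugging Face transformers library. The model ID and prompt are assumptions made for illustration; consult the model cards published by the Qwen team for the exact names and recommended settings.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID; check the Qwen organisation on Hugging Face for published checkpoints.
model_id = "Qwen/Qwen2-Math-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful maths assistant."},
    {"role": "user", "content": "Find the sum of the first 50 positive odd integers."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
answer = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)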

See also: Paige and Microsoft unveil next-gen AI models for cancer diagnosis

The post Qwen2-Math: A new era for AI maths whizzes appeared first on AI News.

IBM Research unveils breakthrough analog AI chip for efficient deep learning https://www.artificialintelligence-news.com/news/ibm-research-breakthrough-analog-ai-chip-deep-learning/ Fri, 11 Aug 2023 11:02:50 +0000

IBM Research has unveiled a groundbreaking analog AI chip that demonstrates remarkable efficiency and accuracy in performing complex computations for deep neural networks (DNNs).

The breakthrough, described in a recent paper in Nature Electronics, marks a significant stride towards high-performance AI computing that consumes substantially less energy.

The traditional approach of executing deep neural networks on conventional digital computing architectures limits both performance and energy efficiency. These digital systems entail constant data transfer between memory and processing units, which slows computation and wastes energy.

To tackle these challenges, IBM Research has harnessed the principles of analog AI, which emulates the way neural networks function in biological brains. This approach stores synaptic weights in nanoscale resistive memory devices, specifically phase-change memory (PCM).

PCM devices alter their conductance in response to electrical pulses, enabling a continuum of values for synaptic weights. Because computations are executed directly in the memory, this analog approach removes much of the data movement that digital architectures require, resulting in greater efficiency.
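As a rough illustration of the principle – and not a description of IBM's implementation – the toy sketch below models a single crossbar performing a matrix-vector multiplication in one analog step: weights are stored as conductances, inputs are applied as row voltages, and the currents accumulating on each column give the dot products, with added noise standing in for analog non-idealities such as conductance drift.

import numpy as np

rng = np.random.default_rng(0)

def crossbar_matvec(weights, inputs, noise_std=0.01):
    # Column currents = conductances x voltages (Ohm's law plus Kirchhoff's
    # current law), so the whole matrix-vector product happens where the
    # weights live, with no weight movement at all.
    ideal = weights @ inputs
    return ideal + rng.normal(0.0, noise_std, size=ideal.shape)

# Example: one 4x8 "layer" evaluated in a single analog step.
W = rng.standard_normal((4, 8)) * 0.1   # synaptic weights mapped to conductances
x = np.ones(8)                          # input activations applied as voltages
print(crossbar_matvec(W, x))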

The newly introduced chip is a cutting-edge analog AI solution composed of 64 analog in-memory compute cores.

Each core integrates a crossbar array of synaptic unit cells with compact analog-to-digital converters that bridge the analog and digital domains. Digital processing units within each core handle nonlinear neuronal activation functions and scaling operations, while a global digital processing unit and digital communication pathways connect the cores.
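Extending the same toy model, the sketch below shows how a layer too large for one crossbar might be tiled across several cores, with each core computing a partial product in the analog domain and the partial results summed digitally. The tile size is an arbitrary assumption; how IBM actually partitions work across its 64 cores is not detailed in the announcement.

import numpy as np

TILE = 8  # assumed crossbar width per core, chosen only for illustration

def core_matvec(weight_tile, input_tile):
    # One core's contribution: an analog matrix-vector product over its tile.
    return weight_tile @ input_tile

def tiled_matvec(weights, inputs, tile=TILE):
    # Split the input dimension across cores and sum the partial results,
    # the kind of job handled by the global digital processing unit.
    partials = [
        core_matvec(weights[:, start:start + tile], inputs[start:start + tile])
        for start in range(0, inputs.shape[0], tile)
    ]
    return np.sum(partials, axis=0)

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 32))
x = rng.standard_normal(32)
assert np.allclose(tiled_matvec(W, x), W @ x)  # tiling preserves the result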

The research team demonstrated the chip’s capability by achieving 92.81 percent accuracy on the CIFAR-10 image dataset, a level of accuracy not previously reached by analog AI chips.

The chip’s compute density – throughput per unit area, measured in giga-operations per second (GOPS) – also exceeded that of previous in-memory computing chips. Its energy-efficient design, coupled with this performance, makes it a milestone in the field of AI hardware.

The analog AI chip’s unique architecture and impressive capabilities lay the foundation for a future where energy-efficient AI computation is accessible across a diverse range of applications.

IBM Research’s breakthrough marks a pivotal moment that will help to catalyse advancements in AI-powered technologies for years to come.

(Image Credit: IBM Research)

See also: Azure and NVIDIA deliver next-gen GPU acceleration for AI

The post IBM Research unveils breakthrough analog AI chip for efficient deep learning appeared first on AI News.

Azure and NVIDIA deliver next-gen GPU acceleration for AI https://www.artificialintelligence-news.com/news/azure-nvidia-deliver-next-gen-gpu-acceleration-ai/ Wed, 09 Aug 2023 15:47:51 +0000

Microsoft Azure users are now able to harness the latest advancements in NVIDIA’s accelerated computing technology, revolutionising the training and deployment of their generative AI applications.

The integration of Azure ND H100 v5 virtual machines (VMs) with NVIDIA H100 Tensor Core GPUs and Quantum-2 InfiniBand networking promises seamless scaling of generative AI and high-performance computing applications, all at the click of a button.

This cutting-edge collaboration comes at a pivotal moment when developers and researchers are actively exploring the potential of large language models (LLMs) and accelerated computing to unlock novel consumer and business use cases.

NVIDIA’s H100 GPU achieves supercomputing-class performance through an array of architectural innovations. These include fourth-generation Tensor Cores, a new Transformer Engine for enhanced LLM acceleration, and NVLink technology that propels inter-GPU communication to unprecedented speeds of 900GB/sec.

The NVIDIA Quantum-2 CX7 InfiniBand interconnect – providing 3,200 Gbps of cross-node bandwidth – keeps performance high across GPUs even at very large scale.
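To give a sense of how that inter-GPU bandwidth is used in practice, the sketch below is a generic PyTorch distributed-data-parallel training loop rather than Azure-specific code. With the NCCL backend, the gradient all-reduce travels over NVLink between GPUs in the same VM and over InfiniBand between VMs; the rank and address environment variables are assumed to be set by a launcher such as torchrun.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # NCCL rides NVLink and InfiniBand
    local_rank = int(os.environ["LOCAL_RANK"])       # set by the launcher, e.g. torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).to(f"cuda:{local_rank}")
    model = DDP(model, device_ids=[local_rank])
    optimiser = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(32, 4096, device=f"cuda:{local_rank}")
        loss = model(x).pow(2).mean()
        loss.backward()                              # gradients are all-reduced across GPUs here
        optimiser.step()
        optimiser.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with torchrun --nproc_per_node=8, the same script uses all eight H100 GPUs in a single ND H100 v5 VM; adding more nodes simply widens the all-reduce across the InfiniBand fabric.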

The newly introduced ND H100 v5 VMs are well suited to training and running inference on increasingly complex LLMs and computer vision models. These neural networks power the most compute-intensive generative AI applications, from question answering and code generation to audio, video, and image synthesis and speech recognition.

A standout feature of the ND H100 v5 VMs is up to a 2x speedup in LLM inference compared with previous-generation instances, demonstrated on the BLOOM 175B model. This performance boost underscores their capacity to further optimise AI applications, fuelling innovation across industries.

The synergy between NVIDIA H100 Tensor Core GPUs and Microsoft Azure empowers enterprises with unparalleled AI training and inference capabilities. This partnership also streamlines the development and deployment of production AI, bolstered by the integration of the NVIDIA AI Enterprise software suite and Azure Machine Learning for MLOps.

The combined efforts have delivered strong AI performance, as validated by industry-standard MLPerf benchmark results.

The integration of the NVIDIA Omniverse platform with Azure extends the reach of this collaboration further, providing users with everything they need for industrial digitalisation and AI supercomputing.

(Image Credit: Uwe Hoh from Pixabay)

See also: Gcore partners with UbiOps and Graphcore to empower AI teams

The post Azure and NVIDIA deliver next-gen GPU acceleration for AI appeared first on AI News.
