Multimodal AI - AI News
https://www.artificialintelligence-news.com/categories/how-it-works/multimodal-ai/

From cloud to factory – humanoid robots coming to workplaces
https://www.artificialintelligence-news.com/news/from-cloud-to-factory-humanoid-robots-coming-to-workplaces/
Fri, 09 Jan 2026 13:06:00 +0000

The Microsoft-Hexagon partnership may mark a turning point in the acceptance of humanoid robots in the workplace, as prototypes become operational realities.
The partnership announced this week between Microsoft and Hexagon Robotics marks an inflection point in the commercialisation of humanoid, AI-powered robots for industrial environments. The two companies will combine Microsoft’s cloud and AI infrastructure with Hexagon’s expertise in robotics, sensors, and spatial intelligence to advance the deployment of physical AI systems in real-world settings.

At the centre of the collaboration is AEON, Hexagon’s industrial humanoid robot, a device designed to operate autonomously in environments like factories, logistics hubs, engineering plants, and inspection sites.

The partnership will focus on multimodal AI training, imitation learning, real-time data management, and integration with existing industrial systems. Initial target sectors include automotive, aerospace, manufacturing, and logistics, the companies say. In these industries, labour shortages and operational complexity are already constraining growth.

The announcement signals a maturing ecosystem: the convergence of cloud platforms, physical AI, and robotics engineering is making humanoid automation commercially viable.

Humanoid robots out of the research lab

While humanoid robots have long been developed at research institutions and demonstrated proudly at technology events, the last five years have seen a shift toward practical deployment in real-world working environments. The main change has been the combination of improved perception, advances in reinforcement and imitation learning, and the availability of scalable cloud infrastructure.

One of the most visible examples is Agility Robotics’ Digit, a bipedal humanoid robot designed for logistics and warehouse operations. Digit has been piloted in live environments by companies like Amazon, where it performs material-handling tasks including tote movement and last-metre logistics. Such deployments tend to focus on augmenting human workers rather than replacing them, with Digit handling more physically demanding tasks.

Similarly, Tesla’s Optimus programme has moved out of the phase where concept videos were all that existed, and is now undergoing factory trials. Optimus robots are being tested on structured tasks like part handling and equipment transport inside Tesla’s automotive manufacturing facilities. While still limited in scope, these pilots demonstrate the pattern of humanoid-like machines chosen over less anthropomorphic form-factors so they can operate in human-designed and -populated spaces.

Inspection, maintenance, and hazardous environments

Industrial inspection is emerging as one of the earliest commercially viable use cases for humanoid and quasi-humanoid robots. Boston Dynamics’ Atlas, while not yet a general-purpose commercial product, has been used in live industrial trials for inspection and disaster-response environments. It can navigate uneven terrain, climb stairs, and manipulate tools in places considered unsafe for humans.

Toyota Research Institute has deployed humanoid robotics platforms for remote inspection and manipulation tasks in similar settings. Toyota’s systems rely on multimodal perception and human-in-the-loop control, the latter reinforcing an industry trend: early deployments prioritise reliability and traceability, so they require human oversight.

Hexagon’s AEON aligns closely with this trend. Its emphasis on sensor fusion and spatial intelligence is relevant for inspection and quality assurance tasks, where precise understanding of physical environments is more valuable than the conversational abilities most associated with everyday use of AIs.

Cloud platforms central to robotics strategy

A defining feature of the Microsoft-Hexagon partnership is the use of cloud infrastructure in the scaling of humanoid robots. Training, updating, and monitoring physical AI systems generates large quantities of data, including video, force feedback from on-device sensors, spatial mapping (such as that derived from LIDAR), and operational telemetry. Managing this data locally has historically been a bottleneck, due to storage and processing constraints.

By using platforms like Azure and Azure IoT Operations, plus real-time intelligence services in the cloud, humanoid robots can be trained as fleets rather than as isolated units. This opens up shared learning, iterative improvement, and greater consistency. For board-level buyers, these IT architecture shifts mean humanoid robots can be treated, in terms of IT requirements, more like enterprise software than machinery.

Labour shortages drive adoption

The demographic trends in manufacturing, logistics, and asset-intensive industries are increasingly unfavourable. Ageing workforces, declining interest in manual roles, and persistent skills shortages create gaps that conventional automation cannot fully address, at least not without rebuilding entire facilities to suit a robotic workforce. Fixed robotic systems excel at repetitive, predictable tasks but struggle in dynamic, human environments.

Humanoid robots occupy a middle ground. Designed to fit existing workflows rather than replace them, they can stabilise operations where human availability is uncertain. Case studies show early value in night shifts, periods of peak demand, and tasks deemed too hazardous for humans.

What boards should evaluate before investing

For decision-makers considering investment in next-generation workplace robots, several considerations have emerged from existing real-world deployments:

Task specificity matters more than general intelligence: the more successful pilots focus on well-defined activities. Data governance and security must stay front and centre when robots are deployed, especially when they need to connect to cloud platforms.

At a human level, workforce integration can be more challenging than sourcing, installing, and running the technology itself. And human oversight remains essential at this stage of AI maturity, both for safety and for regulatory acceptance.

A measured but irreversible shift

Humanoid robots won’t replace the human workforce, but a growing body of evidence from live deployments and prototyping shows such devices are moving into the workplace. Humanoid, AI-powered robots can already perform economically valuable tasks, and integration with existing industrial systems is increasingly practical. For boards with the appetite to invest, the question may be how soon competitors deploy the technology responsibly and at scale.

(Image source: Hexagon Robotics)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and co-located with other leading technology events. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.

Roblox brings AI into the Studio to speed up game creation
https://www.artificialintelligence-news.com/news/roblox-brings-ai-into-the-studio-to-speed-up-game-creation/
Wed, 17 Dec 2025 10:00:00 +0000

Roblox is often seen as a games platform, but its day-to-day reality looks closer to a production studio. Small teams release new experiences on a rolling basis and then monetise them at scale. That pace creates two persistent problems: time lost to repeatable production work, and friction when moving outputs between tools. Roblox’s 2025 updates point to how AI can reduce both, without drifting away from clear business outcomes.

Roblox keeps AI where the work happens

Rather than pushing creators toward separate AI products, Roblox has embedded AI inside Roblox Studio, the environment where creators already build, test, and iterate. In its September 2025 RDC update, Roblox outlined “AI tools and an Assistant” designed to improve creator productivity, with an emphasis on small teams. Its annual economic impact report adds that Studio features such as Avatar Auto-Setup and Assistant already include “new AI capabilities” to “accelerate content creation”.

The language matters—Roblox frames AI in terms of cycle time and output, not abstract claims about transformation or innovation. That framing makes it easier to judge whether the tools are doing their job.

One of the more practical updates focuses on asset creation. Roblox described an AI capability that goes beyond static generation, allowing creators to produce “fully functional objects” from a prompt. The initial rollout covers selected vehicle and weapons categories, returning interactive assets that can be extended inside Studio.

This addresses a common bottleneck where drafting an idea is rarely the slow part; turning it into something that behaves correctly inside a live system is. By narrowing that gap, Roblox reduces the time spent translating concepts into working components.

The company also highlighted language tools delivered through APIs, including Text-to-Speech, Speech-to-Text, and real-time voice chat translation across multiple languages. These features lower the effort required to localise content and reach broader audiences. Similar tooling plays a role in training and support in other industries.

Roblox treats AI as connective tissue between tools

Roblox also put emphasis on how tools connect to one another. Its RDC post describes integrating the Model Context Protocol (MCP) into Studio’s Assistant, allowing creators to coordinate multi-step work across third-party tools that support MCP. Roblox points to practical examples, such as designing a UI in Figma or generating a skybox elsewhere, then importing the result directly into Studio.

This matters because many AI initiatives slow down at the workflow level. Teams spend time copying outputs, fixing formats, or reworking assets that do not quite fit. Orchestration reduces that overhead by turning AI into a bridge between tools, rather than another destination in the process.

Linking productivity to revenue

Roblox ties these workflow gains directly to economics. In its RDC post, the company reported that creators earned over $1 billion through its Developer Exchange programme over the past year, and it set a goal for 10% of gaming content revenue to flow through its ecosystem. It also announced an increased exchange rate so creators “earn 8.5% more” when converting Robux into cash.

The economic impact report makes the connection explicit. Alongside AI upgrades in Studio, Roblox highlights monetisation tools such as price optimisation and regional pricing. Even outside a marketplace model, the takeaway is clear: when AI productivity is paired with a financial lever, teams are more likely to treat new tooling as part of core operations rather than an experiment.

Roblox uses operational AI to scale safety systems

While creative tools attract attention, operational AI often determines whether growth is sustainable. In November 2025, Roblox published a technical post on its PII Classifier, an AI model used to detect attempts to share personal information in chat. Roblox reports handling an average of 6.1 billion chat messages per day, and says the classifier has been in production since late 2024, with a reported 98% recall on an internal test set at a 1% false positive rate.

This is a quieter form of efficiency. Automation at this level reduces the need for manual review and supports consistent policy enforcement, which helps prevent scale from becoming a liability.
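At that volume, the false-positive rate rather than recall dominates the day-to-day workload. A back-of-the-envelope sketch using Roblox's published figures makes this concrete; note that the prevalence of PII-sharing attempts is a made-up assumption for illustration, not a Roblox number:

```python
# Back-of-the-envelope estimate of daily classifier outcomes at Roblox scale.
# Messages/day, recall, and false-positive rate are Roblox's published
# figures; the prevalence of PII attempts is a hypothetical assumption.
MESSAGES_PER_DAY = 6.1e9
RECALL = 0.98                # fraction of true PII attempts caught
FALSE_POSITIVE_RATE = 0.01   # fraction of benign messages flagged
PREVALENCE = 1e-4            # assumed: 1 in 10,000 messages is a PII attempt

positives = MESSAGES_PER_DAY * PREVALENCE
negatives = MESSAGES_PER_DAY - positives

caught = positives * RECALL
missed = positives - caught
false_alarms = negatives * FALSE_POSITIVE_RATE

print(f"true attempts caught : {caught:,.0f}")
print(f"attempts missed      : {missed:,.0f}")
print(f"benign flagged       : {false_alarms:,.0f}")
```

Under that assumed prevalence, benign messages flagged in error outnumber genuine catches by two orders of magnitude, which is exactly why systems at this scale need automated downstream handling rather than manual review of every flag.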

What carries across? Several patterns stand out:

  • Put AI where decisions are already made. Roblox focuses on the build-and-review loop, rather than inserting a separate AI step.
  • Reduce tool friction early. Orchestration matters because it cuts down on context switching and rework.
  • Tie AI to something measurable. Creation speed is linked to monetisation and payout incentives.
  • Keep adapting the system. Roblox describes ongoing updates to address new adversarial behaviour in safety models.

Roblox’s tools will not translate directly to every sector. The underlying approach will. AI tends to pay for itself when it shortens the path from intent to usable output, and when that output is clearly connected to real economic value.

(Photo by Oberon Copeland @veryinformed.com)

See also: Mining business learnings for AI deployment

Baidu ERNIE multimodal AI beats GPT and Gemini in benchmarks
https://www.artificialintelligence-news.com/news/baidu-ernie-multimodal-ai-gpt-and-gemini-benchmarks/
Wed, 12 Nov 2025 16:09:44 +0000

Baidu’s latest ERNIE model, a super-efficient multimodal AI, is beating GPT and Gemini on key benchmarks and targets enterprise data often ignored by text-focused models.

For many businesses, valuable insights are locked in engineering schematics, factory-floor video feeds, medical scans, and logistics dashboards. Baidu’s new model, ERNIE-4.5-VL-28B-A3B-Thinking, is designed to fill this gap.

What’s interesting to enterprise architects is not just its multimodal capability, but its architecture. It’s described as a “lightweight” model, activating only three billion parameters during operation. This approach targets the high inference costs that often stall AI-scaling projects. Baidu is betting on efficiency as a path to adoption, training the system as a foundation for “multimodal agents” that can reason and act, not just perceive.

Complex visual data analysis capabilities supported by AI benchmarks

Baidu’s multimodal ERNIE AI model excels at handling dense, non-text data. For example, it can interpret a “Peak Time Reminder” chart to find optimal visiting hours, a task that reflects the resource-scheduling challenges in logistics or retail.

ERNIE 4.5 also shows capability in technical domains, like solving a bridge circuit diagram by applying Ohm’s and Kirchhoff’s laws. For R&D and engineering arms, a future assistant could validate designs or explain complex schematics to new hires.

This capability is supported by Baidu’s benchmarks, which show ERNIE-4.5-VL-28B-A3B-Thinking outperforming competitors like GPT-5-High and Gemini 2.5 Pro on some key tests:

  • MathVista: ERNIE (82.5) vs Gemini (82.3) and GPT (81.3)
  • ChartQA: ERNIE (87.1) vs Gemini (76.3) and GPT (78.2)
  • VLMs Are Blind: ERNIE (77.3) vs Gemini (76.5) and GPT (69.6)

It’s worth noting, of course, that AI benchmarks provide a guide but can be flawed. Always perform internal tests for your needs before deploying any AI model for mission-critical applications.

Baidu shifts from perception to automation with its latest ERNIE AI model

The primary hurdle for enterprise AI is moving from perception (“what is this?”) to automation (“what now?”). Baidu claims ERNIE 4.5 addresses this by integrating visual grounding with tool use.

Asking the multimodal AI to find all people wearing suits in an image and return their coordinates in JSON format works. The model generates the structured data, a function easily transferable to a production line for visual inspection or to a system auditing site images for safety compliance.
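The value of structured output is that downstream systems can validate and consume it programmatically. A minimal sketch of the consuming side follows; the JSON schema and the sample response are hypothetical illustrations, not ERNIE's documented output format:

```python
import json

# Hypothetical validation of structured output from a visual-grounding
# request ("find all people wearing suits, return coordinates as JSON").
# This sample response and its schema are assumptions for illustration.
sample_response = """
[
  {"label": "person_in_suit", "bbox": [120, 45, 310, 420], "confidence": 0.93},
  {"label": "person_in_suit", "bbox": [455, 60, 640, 430], "confidence": 0.88}
]
"""

def parse_detections(raw: str, min_confidence: float = 0.5) -> list:
    """Parse and sanity-check detections before handing them to a
    downstream system, e.g. a visual-inspection pipeline."""
    valid = []
    for det in json.loads(raw):
        x1, y1, x2, y2 = det["bbox"]
        # Reject degenerate boxes and low-confidence detections.
        if x2 > x1 and y2 > y1 and det["confidence"] >= min_confidence:
            valid.append(det)
    return valid

detections = parse_detections(sample_response)
print(f"{len(detections)} detections passed validation")
```

Raising `min_confidence` tightens the filter for safety-critical uses; the same validation pattern applies whatever the model's actual field names turn out to be.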

The model also manages external tools and can autonomously zoom in on a photograph to read small text. If it faces an unknown object, it can trigger an image search to identify it. This represents a less passive form of AI that could power an agent to not only flag a data centre error, but also zoom in on the code, search the internal knowledge base, and suggest the fix.

Unlocking business intelligence with multimodal AI

Baidu’s latest ERNIE AI model also targets corporate video archives from training sessions and meetings to security footage. It can extract all on-screen subtitles and map them to their precise timestamps.

It also demonstrates temporal awareness, finding specific scenes (like those “filmed on a bridge”) by analysing visual cues. The clear end-goal is making vast video libraries searchable, allowing an employee to find the exact moment a specific topic was discussed in a two-hour webinar they may have dozed through.

Baidu provides deployment guidance for several paths, including transformers, vLLM, and FastDeploy. However, the hardware requirements are a major barrier: a single-card deployment needs 80GB of GPU memory. This is not a tool for casual experimentation, but for organisations with existing high-performance AI infrastructure.
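The 80GB figure follows from the numbers already quoted: all 28 billion parameters must stay resident in GPU memory even though only about three billion are activated per token, so sparse activation cuts compute cost but not weight storage. A rough estimate, assuming 2-byte (bf16) weights and ignoring KV-cache and runtime overhead, which add more:

```python
# Rough GPU-memory estimate for holding a 28B-parameter model's weights.
# Assumes bf16 (2 bytes/parameter); KV cache, activations, and framework
# overhead come on top, which is why an 80 GB card is the practical floor.
PARAMS = 28e9
BYTES_PER_PARAM = 2  # bf16 assumption

weights_gib = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"weights alone: {weights_gib:.1f} GiB")
```

Weights alone consume roughly 52 GiB, leaving the remainder of an 80GB card for cache and activations. This is why only-3B-active does not translate into small-GPU deployment.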

For those with the hardware, Baidu’s ERNIEKit toolkit allows fine-tuning on proprietary data; a necessity for most high-value use cases. Baidu is providing its latest ERNIE AI model with an Apache 2.0 licence that permits commercial use, which is essential for adoption.

The market is finally moving toward multimodal AI that can see, read, and act within a specific business context, and the benchmarks suggest it’s doing so with impressive capability. The immediate task is to identify high-value visual reasoning jobs within your own operation and weigh them against the substantial hardware and governance costs.

See also: Wiz: Security lapses emerge amid the global AI race

Meta and Oracle choose NVIDIA Spectrum-X for AI data centres
https://www.artificialintelligence-news.com/news/meta-and-oracle-choose-nvidia-spectrum-x-for-ai-data-centres/
Mon, 13 Oct 2025 15:00:00 +0000

Meta and Oracle are upgrading their AI data centres with NVIDIA’s Spectrum-X Ethernet networking switches — technology built to handle the growing demands of large-scale AI systems. Both companies are adopting Spectrum-X as part of an open networking framework designed to improve AI training efficiency and accelerate deployment across massive compute clusters.

Jensen Huang, NVIDIA’s founder and CEO, said trillion-parameter models are transforming data centres into “giga-scale AI factories,” adding that Spectrum-X acts as the “nervous system” connecting millions of GPUs to train the largest models ever built.

Oracle plans to use Spectrum-X Ethernet with its Vera Rubin architecture to build large-scale AI factories. Mahesh Thiagarajan, Oracle Cloud Infrastructure’s executive vice president, said the new setup will allow the company to connect millions of GPUs more efficiently, helping customers train and deploy new AI models faster.

Meta, meanwhile, is expanding its AI infrastructure by integrating Spectrum Ethernet switches into the Facebook Open Switching System (FBOSS), its in-house platform for managing network switches at scale. According to Gaya Nagarajan, Meta’s vice president of networking engineering, the company’s next-generation network must be open and efficient to support ever-larger AI models and deliver services to billions of users.

Building flexible AI systems

According to Joe DeLaere, who leads NVIDIA’s Accelerated Computing Solution Portfolio for Data Centre, flexibility is key as data centres grow more complex. He explained that NVIDIA’s MGX system offers a modular, building-block design that lets partners combine different CPUs, GPUs, storage, and networking components as needed.

The system also promotes interoperability, allowing organisations to use the same design across multiple generations of hardware. “It offers flexibility, faster time to market, and future readiness,” DeLaere told the media.

As AI models become larger, power efficiency has become a central challenge for data centres. DeLaere said NVIDIA is working “from chip to grid” to improve energy use and scalability, collaborating closely with power and cooling vendors to maximise performance per watt.

One example is the shift to 800-volt DC power delivery, which reduces heat loss and improves efficiency. The company is also introducing power-smoothing technology to reduce spikes on the electrical grid — an approach that can cut maximum power needs by up to 30 per cent, allowing more compute capacity within the same footprint.
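The arithmetic behind that capacity claim is simple: with a fixed grid allocation, sizing for a lower peak lets more racks fit within the same footprint. A sketch with hypothetical facility numbers; only the 30 per cent reduction comes from NVIDIA:

```python
# How a 30% cut in peak power needs translates into extra compute capacity
# under a fixed grid allocation. The facility figures are hypothetical.
GRID_LIMIT_MW = 100.0     # power available to the site (assumed)
PEAK_PER_RACK_KW = 120.0  # unsmoothed peak draw per rack (assumed)
SMOOTHING_CUT = 0.30      # NVIDIA's quoted reduction in maximum power needs

racks_before = GRID_LIMIT_MW * 1000 / PEAK_PER_RACK_KW
racks_after = GRID_LIMIT_MW * 1000 / (PEAK_PER_RACK_KW * (1 - SMOOTHING_CUT))

print(f"racks without smoothing: {racks_before:.0f}")
print(f"racks with smoothing   : {racks_after:.0f}")
```

Whatever the facility's actual numbers, a 30 per cent peak reduction mechanically allows about 43 per cent more racks (1 / 0.7) under the same grid limit.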

Scaling up, out, and across

NVIDIA’s MGX system also plays a role in how data centres are scaled. Gilad Shainer, the company’s senior vice president of networking, told the media that MGX racks host both compute and switching components, supporting NVLink for scale-up connectivity and Spectrum-X Ethernet for scale-out growth.

He added that MGX can connect multiple AI data centres together as a unified system — what companies like Meta need to support massive distributed AI training operations. Depending on distance, they can link sites through dark fibre or additional MGX-based switches, enabling high-speed connections across regions.

Meta’s AI adoption of Spectrum-X reflects the growing importance of open networking. Shainer said the company will use FBOSS as its network operating system but noted that Spectrum-X supports several others, including Cumulus, SONiC, and Cisco’s NOS through partnerships. This flexibility allows hyperscalers and enterprises to standardise their infrastructure using the systems that best fit their environments.

Expanding the AI ecosystem

NVIDIA sees Spectrum-X as a way to make AI infrastructure more efficient and accessible across different scales. Shainer said the Ethernet platform was designed specifically for AI workloads like training and inference, offering up to 95 per cent effective bandwidth and outperforming traditional Ethernet by a wide margin.

He added that NVIDIA’s partnerships with companies such as Cisco, xAI, Meta, and Oracle Cloud Infrastructure are helping to bring Spectrum-X to a broader range of environments — from hyperscalers to enterprises.

Preparing for Vera Rubin and beyond

DeLaere said NVIDIA’s upcoming Vera Rubin architecture is expected to be commercially available in the second half of 2026, with the Rubin CPX product arriving by year’s end. Both will work alongside Spectrum-X networking and MGX systems to support the next generation of AI factories.

He also clarified that Spectrum-X and XGS share the same core hardware but use different algorithms for varying distances — Spectrum-X for inside data centres and XGS for inter–data centre communication. This approach minimises latency and allows multiple sites to operate together as a single large AI supercomputer.

Collaborating across the power chain

To support the 800-volt DC transition, NVIDIA is working with partners from chip level to grid. The company is collaborating with Onsemi and Infineon on power components, with Delta, Flex, and Lite-On at the rack level, and with Schneider Electric and Siemens on data centre designs. A technical white paper detailing this approach will be released at the OCP Summit.

DeLaere described this as a “holistic design from silicon to power delivery,” ensuring all systems work seamlessly together in high-density AI environments that companies like Meta and Oracle operate.

Performance advantages for hyperscalers

Spectrum-X Ethernet was built specifically for distributed computing and AI workloads. Shainer said it offers adaptive routing and telemetry-based congestion control to eliminate network hotspots and deliver stable performance. These features enable higher training and inference speeds while allowing multiple workloads to run simultaneously without interference.

He added that Spectrum-X is the only Ethernet technology proven to scale at extreme levels, helping organisations get the best performance and return on their GPU investments. For hyperscalers such as Meta, that scalability helps manage growing AI training demands and keep infrastructure efficient.

Hardware and software working together

While NVIDIA’s focus is often on hardware, DeLaere said software optimisation is equally important. The company continues to improve performance through co-design — aligning hardware and software development to maximise efficiency for AI systems.

NVIDIA is investing in FP4 kernels, frameworks such as Dynamo and TensorRT-LLM, and algorithms like speculative decoding to improve throughput and AI model performance. These updates, he said, ensure that systems like Blackwell continue to deliver better results over time for hyperscalers such as Meta that rely on consistent AI performance.

Networking for the trillion-parameter era

The Spectrum-X platform — which includes Ethernet switches and SuperNICs — is NVIDIA’s first Ethernet system purpose-built for AI workloads. It’s designed to link millions of GPUs efficiently while maintaining predictable performance across AI data centres.

With congestion-control technology achieving up to 95 per cent data throughput, Spectrum-X marks a major leap over standard Ethernet, which typically reaches only about 60 per cent due to flow collisions. Its XGS technology also supports long-distance AI data centre links, connecting facilities across regions into unified “AI super factories.”
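Those effective-throughput figures compound at every synchronisation step of a training run. A simple comparison of transfer time for a fixed payload; the link speed and payload size are illustrative assumptions, while the 95 and 60 per cent efficiencies are the figures quoted above:

```python
# Time to move a fixed synchronisation payload at 95% vs 60% effective
# bandwidth. Link rate and payload size are illustrative assumptions.
LINK_GBPS = 800.0   # assumed per-link line rate, gigabits per second
PAYLOAD_GB = 100.0  # assumed data exchanged per synchronisation step

def transfer_seconds(efficiency: float) -> float:
    """Seconds to move PAYLOAD_GB at the given fraction of line rate."""
    effective_gbps = LINK_GBPS * efficiency
    return PAYLOAD_GB * 8 / effective_gbps  # gigabytes -> gigabits

spectrum_x = transfer_seconds(0.95)
standard = transfer_seconds(0.60)
print(f"Spectrum-X: {spectrum_x:.2f} s, standard Ethernet: {standard:.2f} s")
print(f"per-step speed-up: {standard / spectrum_x:.2f}x")
```

The ratio (0.95 / 0.60, about 1.58x) is independent of the assumed link rate and payload, which is why the efficiency gap matters regardless of cluster specifics.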

By tying together NVIDIA’s full stack — GPUs, CPUs, NVLink, and software — Spectrum-X provides the consistent performance needed to support trillion-parameter models and the next wave of generative AI workloads.

(Photo by Nvidia)

See also: OpenAI and Nvidia plan $100B chip deal for AI future

SoundHound is giving its AI the power of sight
https://www.artificialintelligence-news.com/news/soundhound-is-giving-its-ai-the-power-of-sight/
Tue, 12 Aug 2025 10:06:54 +0000

SoundHound AI, already a major player in voice assistants, is now giving its technology a pair of eyes.

Imagine driving past a landmark and, without pulling out your phone, asking your car, “What’s that building over there?” and getting an instant answer. That’s what SoundHound AI is building. 

With the launch of Vision AI, SoundHound’s new system combines sight with sound to create a much smarter and more natural way to interact with technology. The idea is to mimic how we as humans operate; we don’t just listen to someone, we also see their gestures and what they’re looking at.

By bringing this same contextual understanding to AI, SoundHound hopes to smooth over the clunky and often frustrating experience we have with many of today’s smart devices. The company is targeting real-world applications where this combined sense could make a huge difference, whether that’s in your next car, at the restaurant drive-thru, or a factory floor.

Keyvan Mohajer, CEO of SoundHound AI, said: “At SoundHound, we believe the future of AI isn’t just multimodal—it’s deeply integrated, responsive, and built for real-world impact.

“With Vision AI, we’re extending our leadership in voice and conversational AI to redefine how humans interact with products and services offered and used by businesses.”

So, how does it work? Vision AI takes a live feed from a camera and fuses it with the company’s voice technology, which already excels at understanding natural speech. By processing what it sees and what it hears at the exact same time, the system can grasp the user’s true intent in a way a simple voice assistant never could.

Think of a mechanic wearing smart glasses who can simply look at an engine part and ask for instructions, receiving instant visual and audio guidance without ever putting down their tools. In a shop, a staff member could scan shelves just by looking at them to get a real-time inventory count. For the rest of us, it might mean a drive-thru kiosk that visually confirms our order on screen the moment we say it.

One of the biggest technical problems in creating such a system is ensuring the audio and visual elements are perfectly synchronised. Any lag would shatter the illusion of a natural conversation.

Pranav Singh, VP of Engineering at SoundHound AI, commented: “With Vision AI, we are fusing visual recognition and conversational intelligence into a single, synchronised flow. Every frame, every utterance, every intent is interpreted within the same ecosystem—ensuring faster, more natural user experiences that scale across surfaces from kiosks to embedded devices.

“This is innovation at the intersection of intelligence and execution, delivering AI that sees what you see, hears what you say, and responds in the moment.”
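SoundHound hasn’t published implementation details, but the synchronisation challenge Singh describes can be sketched in miniature: match each spoken utterance to the camera frame nearest it in time, and discard matches that fall outside a small tolerance window. Everything below (function names, timestamps, the 100 ms tolerance) is illustrative, not SoundHound’s actual pipeline.

```python
from bisect import bisect_left

def pair_utterances_with_frames(utterances, frame_times, tolerance=0.1):
    """Match each (timestamp, text) utterance to the nearest camera frame
    timestamp, discarding matches outside the tolerance window (seconds).

    `utterances` is a list of (t, text) tuples; `frame_times` is a sorted
    list of frame timestamps. Purely illustrative names and values.
    """
    pairs = []
    for t, text in utterances:
        i = bisect_left(frame_times, t)
        # Candidate frames: the one just before and the one just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(frame_times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(frame_times[j] - t))
        if abs(frame_times[best] - t) <= tolerance:
            pairs.append((text, frame_times[best]))
    return pairs

# A 30 fps camera feed (3 seconds of frames) and two utterances; the
# second utterance arrives after the frames end, so it is dropped.
frames = [i / 30 for i in range(90)]
speech = [(1.02, "what's that?"), (5.0, "too late")]
print(pair_utterances_with_frames(speech, frames))
```

A production system would stream both signals continuously and tolerate clock drift between devices, but the core idea is the same: fuse only what genuinely co-occurred.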

For the businesses adopting this tech, the promise is to provide faster service, fewer mistakes, and happier customers. It’s about removing friction and making technology feel less like a tool you have to operate and more like a partner that helps you get things done.

This new visual capability isn’t the only upgrade SoundHound is rolling out. The company also recently improved the “brain” of its system with a new update, Amelia 7.1. This enhancement makes its AI agents faster, more accurate, and gives businesses more control and transparency over how they work.

By combining sight and sound, SoundHound is aiming to push us closer to a world where interacting with AI feels as easy and intuitive as talking to another person.

(Photo by Christian Lue)

See also: Alan Turing Institute: Humanities are key to the future of AI

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post SoundHound is giving its AI the power of sight appeared first on AI News.

Inside Tim Cook’s push to get Apple back in the AI race https://www.artificialintelligence-news.com/news/inside-tim-cook-push-to-get-apple-back-in-the-ai-race/ Wed, 06 Aug 2025 09:21:51 +0000 https://www.artificialintelligence-news.com/?p=107290

The post Inside Tim Cook’s push to get Apple back in the AI race appeared first on AI News.

While other tech companies push out AI tools at full speed, Apple is taking its time. Its Apple Intelligence features – shown off at WWDC – won’t reach most users until at least 2025 or even 2026. Some see this as Apple falling behind, but the company’s track record suggests it prefers to launch only when products are ready.

In contrast, competitors like Microsoft, OpenAI, and Google have already shipped AI features widely – often with bugs and unreliable results, and usually whether or not users ask for them. AI assistants today still struggle with accuracy, consistency, and usefulness in many tasks.

Apple seems to be watching from the sidelines, waiting for the tech to mature. Instead of flooding iOS with half-working tools, it’s holding back. That strategy may pay off if users lose patience with AI that overpromises and underdelivers.

Apple has done this before – launching smartwatches and tablets late, but with stronger products. And since it already owns the hardware and software, and controls its own app store, it can afford to wait.

If current AI tools don’t improve soon, Apple’s slower, more cautious rollout might look less like hesitation and more like smart planning.

That measured approach doesn’t mean Apple is sitting still. Behind the scenes, the company is ramping up investment, hiring, and internal coordination to prepare for an AI shift. That strategy was on full display during a recent all-hands meeting at Apple’s headquarters, where CEO Tim Cook rallied employees and laid out the company’s AI ambitions.

Apple is getting serious about artificial intelligence, and Cook wants everyone at the company on board. As reported by Bloomberg, during a rare all-company gathering at its Cupertino HQ, he spoke directly to employees about what’s next. His message was clear: Apple has to win in AI – and now is the time to make that happen.

Cook called AI a once-in-a-generation shift, comparing its impact to that of the internet, smartphones, and cloud computing. “Apple must do this. Apple will do this. This is sort of ours to grab,” he said, according to people who were there. He promised Apple would spend what it takes to compete.

The company has been slower than others to roll out AI tools. Apple Intelligence – its main AI offering – was introduced long after companies like OpenAI, Google, and Microsoft launched their own products. And even when Apple finally announced its plans, the reaction was underwhelming.

See also: Why Apple is playing it slow with AI

But Cook pointed out that Apple has often shown up late to new technology – only to redefine it. “There was a PC before the Mac; there was a smartphone before the iPhone,” he reminded employees. “There were many tablets before the iPad.” Apple didn’t invent those categories, he said, it just made them work better.

Building the future of Siri

Much of the company’s current AI work centres on Siri, its voice assistant. Apple had originally planned a major overhaul as part of Apple Intelligence, adding features powered by large language models. But that rollout was delayed, leading to internal shakeups and a rethink of the entire system.

Craig Federighi, Apple’s software chief, told employees that trying to merge old and new versions of Siri didn’t work. The team tried to keep the original system for basic tasks like setting timers, while adding generative AI features for more complex requests. But that hybrid setup didn’t meet Apple’s standards. “We realised that approach wasn’t going to get us to Apple quality,” he said.

Now, the team is rebuilding Siri from the ground up. A completely new version is in the works, expected as early as spring 2026. Federighi said the results so far have been strong and could lead to more improvements than originally planned. “There is no project people are taking more seriously,” he told staff.

A key figure behind this new direction is Mike Rockwell, the executive who led development on Apple’s Vision Pro headset. Rockwell and his software team are now leading Siri’s redesign. Federighi said they’ve “supercharged” the work and brought a new level of focus.

Investing in AI talent and tools

Apple is also expanding its AI team quickly. Cook said the company hired 12,000 people in the past year, with 40% of them joining research and development, and many of those hires are focused on AI.

Part of the work involves hardware. Apple is building new chips specifically designed for AI, including a more powerful server chip known internally as “Baltra.” The company is also opening an AI server farm in Houston to support future projects.

Beyond Siri, Apple is quietly building what could become a major AI tool. According to Bloomberg‘s Mark Gurman, Apple has formed a team called “Answers, Knowledge, and Information” (AKI). The group’s job is to create search that works more like ChatGPT – giving direct answers rather than just showing links.

The AKI team is led by Robby Walker, who reports to AI chief John Giannandrea, and Apple has already started hiring engineers for the group. While details are still limited, the project appears to include backend systems, search algorithms, and potentially even a standalone app.

A push to move faster

Cook also encouraged employees to start using AI more in their work. “All of us are using AI in a significant way already, and we must use it as a company as well,” he said. He told employees to bring ideas to their managers and find ways to get AI tools into products faster.

The sense of urgency was echoed during Apple’s recent earnings call. The company posted strong results, with nearly 10% growth in the June quarter – enough to ease concerns about slowing iPhone sales and weak results from the Chinese market. Cook told investors Apple would “significantly” increase its spending on AI.

Yet challenges remain. Apple expects to face a $1.1 billion hit from tariffs this quarter and continues to deal with antitrust pressures in the US and Europe, where regulators are watching closely to see how the company runs its App Store and handles user data.

Cook acknowledged these issues at the staff meeting, saying Apple would continue pushing regulators to adopt rules that don’t hurt privacy or user experience. “We need to continue to push on the intention of the regulation,” he said, “instead of these things that destroy the user experience and user privacy and security.”

New stores, new markets

Beyond AI, Cook touched on Apple’s retail strategy. The company plans to open new stores in emerging markets, including India, the United Arab Emirates, and China, with a store in Saudi Arabia also on the way. Apple is putting more focus on its online store, too.

“We need to be in more countries,” Cook said, adding that most of Apple’s future growth will come from new markets. That doesn’t mean existing regions will be ignored, but the company sees more opportunity in expanding its global footprint.

What’s next for Apple products

While Cook didn’t reveal any product details, he said, “I have never felt so much excitement and so much energy before as right now.”

Reports suggest Apple is working on several new devices, including a foldable iPhone, new smart glasses, updated home devices, and robotics. A major iPhone redesign is also rumoured for its 20th anniversary next year.

Cook didn’t confirm any of this, but he hinted at big things ahead. “The product pipeline, which I can’t talk about: It’s amazing, guys. It’s amazing,” he said. “Some of it you’ll see soon, some of it will come later, but there’s a lot to see.”

Cautious but confident

Apple’s cautious approach to AI may have slowed it down, but internally, the company seems to believe that slow and steady might win the race. Cook’s message to employees was clear: Apple can still define what useful, responsible AI looks like – and it’s all hands on deck to get there.

(Photo by: Apple via YouTube)


The post Inside Tim Cook’s push to get Apple back in the AI race appeared first on AI News.

Mistral AI gives Le Chat voice recognition and deep research tools https://www.artificialintelligence-news.com/news/mistral-ai-le-chat-voice-recognition-deep-research-tools/ Thu, 17 Jul 2025 15:50:40 +0000 https://www.artificialintelligence-news.com/?p=107122

The post Mistral AI gives Le Chat voice recognition and deep research tools appeared first on AI News.

Mistral AI has updated Le Chat with voice recognition, deep research tools, and other features to make the chatbot a more helpful assistant.

The company believes that the best AI assistants should help you dive deeper into your thoughts and maintain the flow of conversation. As Mistral AI put it, chatbots are at their best when they “let you go deeper in your thinking, keep your conversation flowing, and maintain contextual continuity.”

A standout feature, albeit somewhat playing catch-up with rivals, is the ‘Deep Research’ mode. Think of it as turning Le Chat into your personal research assistant.

When you ask a complex question, the Deep Research tool breaks it down, finds credible sources, and then builds a structured report with references, making it easy to follow. Mistral designed it to feel like you’re working with a highly organised partner, helping you tackle everything from market trends to scientific topics.

If you prefer talking over typing, the new ‘Vocal’ mode is for you.

Powered by Voxtral, Mistral AI’s new voice model, the Vocal mode allows for natural, low-latency conversations—meaning you can talk to Le Chat without awkward pauses. Mistral says it’s perfect for brainstorming ideas while on a walk, getting quick answers when your hands are full, or transcribing a meeting.

For those really complex Le Chat questions, ‘Think’ mode taps into Mistral AI’s reasoning model, Magistral, to provide clear and thoughtful answers.

One of the most impressive capabilities of Think mode is native multilingual ability. You can draft a proposal in Spanish, explore a legal concept in Japanese, or just think through an idea in whatever language feels most comfortable. Le Chat can even switch between languages mid-sentence.

To help you stay organised, the new ‘Projects’ feature lets you group related chats into focused folders. Each project remembers your settings and keeps all your conversations, uploaded files, and ideas in one tidy space. It could become the perfect area to manage everything from planning a house move to tracking a long-term work project.

Finally, in a partnership between Mistral AI and Black Forest Labs, Le Chat now includes advanced image editing. This means you can create an image and then fine-tune it with simple commands like “remove the object” or “place me in another city”.

All these new features are available today in Le Chat on the web or by downloading the mobile app.

See also: Military AI contracts awarded to Anthropic, OpenAI, Google, and xAI


The post Mistral AI gives Le Chat voice recognition and deep research tools appeared first on AI News.

Details leak of Jony Ive’s ambitious OpenAI device https://www.artificialintelligence-news.com/news/details-leak-jony-ive-ambitious-openai-device/ Thu, 22 May 2025 16:35:41 +0000 https://www.artificialintelligence-news.com/?p=106524

The post Details leak of Jony Ive’s ambitious OpenAI device appeared first on AI News.

After what felt like an age of tech industry tea-leaf reading, OpenAI has officially snapped up “io,” the much-buzzed-about startup building an AI device from former Apple design guru Jony Ive and OpenAI’s chief, Sam Altman. The price tag? $6.5 billion.

OpenAI put out a video this week talking about the Ive and Altman venture in a general sort of way, but now, a few more tidbits about what they’re actually cooking have slipped out.

And what are they planning with all that cash and brainpower? Well, the eagle-eyed folks at The Washington Post spotted an internal chat between Sam Altman and OpenAI staff where he set a target of shipping 100 million AI “companions.”

Altman allegedly even told his team the OpenAI device is “the chance to do the biggest thing we’ve ever done as a company here.”

To be clear, Altman has set that 100 million number as an eventual target. “We’re not going to ship 100 million devices literally on day one,” he said. But then, in a flex that’s pure Silicon Valley, he added they’d hit that 100 million mark “faster than any company has ever shipped 100 million of something new before.”

So, what is this mysterious “companion”? The gadget is designed to be entirely aware of a user’s surroundings, and even their “life.” While they’ve mostly talked about a single device, Altman did let slip it might be more of a “family of devices.”

Jony Ive, as expected, dubbed it “a new design movement.” You can almost hear the minimalist manifesto being drafted.

Why the full-blown acquisition, though? Weren’t they just going to partner up? Originally, yes. The plan was for Ive’s startup to cook up the hardware and sell it, with OpenAI delivering the brains. But it seems the vision got bigger. This isn’t just another accessory, you see.

Altman stressed the device will be a “central facet of using OpenAI.” He even said, “We both got excited about the idea that, if you subscribed to ChatGPT, we should just mail you new computers, and you should use those.”

Frankly, they reckon our current tech – our trusty laptops, the websites we browse – just isn’t up to snuff for the kind of AI experiences they’re dreaming of. Altman was pretty blunt, saying current use of AI “is not the sci-fi dream of what AI could do to enable you in all the ways that I think the models are capable of.”

So, we know it’s not a smartphone. Altman’s also put the kibosh on it being a pair of glasses. And Jony Ive, well, he’s apparently not rushing to make another wearable, which makes sense given his design ethos.

The good news for the impatient among us (i.e., everyone in tech) is that this isn’t just vapourware. Ive’s team has an actual prototype. Altman’s even taken one home to “live with it”. As for when we might get our hands on one? Altman’s reportedly aiming for a late 2026 release.

Naturally, OpenAI is keeping the actual device under wraps, but you can always count on supply chain whispers for a few clues. The ever-reliable (well, usually!) Apple supply chain analyst Ming-Chi Kuo has thrown a few alleged design details into the ring via social media.

Kuo reckons it’ll be “slightly larger” than the Humane AI Pin, but that it will look “as compact and elegant as an iPod Shuffle.” And yes, like the Shuffle, Kuo says no screen.

According to Kuo, the device will chat with your phone and computer instead, using good old-fashioned microphones for your voice and cameras to see what’s going on around you. Interestingly, he suggests it’ll be worn around the neck, necklace-style, rather than clipped on like the AI Pin.

Kuo’s crystal ball points to mass production in 2027, but he wisely adds a pinch of salt, noting the final look and feel could still change.

So, the billion-dollar (well, £5.1 billion) question remains: will this OpenAI device be the next big thing, the gamechanger we’ve been waiting for? Or will it be another noble-but-failed attempt to break free from the smartphone’s iron grip, joining the likes of the AI Pin in the ‘great ideas that didn’t quite make it’ pile?

Altman, for one, is brimming with confidence. Having lived with the prototype, he’s gone on record saying he believes it will be “the coolest piece of technology that the world will have ever seen.”

See also: Linux Foundation: Slash costs, boost growth with open-source AI


The post Details leak of Jony Ive’s ambitious OpenAI device appeared first on AI News.

Deepgram Nova-3 Medical: AI speech model cuts healthcare transcription errors https://www.artificialintelligence-news.com/news/deepgram-nova-3-medical-ai-speech-model-healthcare-transcription-errors/ Tue, 04 Mar 2025 13:25:55 +0000 https://www.artificialintelligence-news.com/?p=104673

The post Deepgram Nova-3 Medical: AI speech model cuts healthcare transcription errors appeared first on AI News.

Deepgram has unveiled Nova-3 Medical, an AI speech-to-text (STT) model tailored for transcription in the demanding environment of healthcare.

Designed to integrate seamlessly with existing clinical workflows, Nova-3 Medical aims to address the growing need for accurate and efficient transcription in the UK’s public NHS and private healthcare landscape.

As electronic health records (EHRs), telemedicine, and digital health platforms become increasingly prevalent, the demand for reliable AI-powered transcription has never been higher. However, traditional speech-to-text models often struggle with the complex and specialised vocabulary used in clinical settings, leading to errors and “hallucinations” that can compromise patient care.

Deepgram’s Nova-3 Medical is engineered to overcome these challenges. The model leverages advanced machine learning and specialised medical vocabulary training to accurately capture medical terms, acronyms, and clinical jargon—even in challenging audio conditions. This is particularly crucial in environments where healthcare professionals may move away from recording devices.

“Nova‑3 Medical represents a significant leap forward in our commitment to transforming clinical documentation through AI,” said Scott Stephenson, CEO of Deepgram. “By addressing the nuances of clinical language and offering unprecedented customisation, we are empowering developers to build products that improve patient care and operational efficiency.”

One of the key features of the model is its ability to deliver structured transcriptions that integrate seamlessly with clinical workflows and EHR systems, ensuring vital patient data is accurately organised and readily accessible. The model also offers flexible, self-service customisation, including Keyterm Prompting for up to 100 key terms, allowing developers to tailor the solution to the unique needs of various medical specialties.
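Keyterm Prompting is exposed through Deepgram’s HTTP API as query parameters on a transcription request. The sketch below shows how such a request URL might be assembled, with one repeated `keyterm` parameter per term; the endpoint path and parameter name reflect Deepgram’s published API but should be verified against its API reference before use.

```python
from urllib.parse import urlencode

# Deepgram's streaming/pre-recorded transcription endpoint — verify
# against Deepgram's API reference before relying on it.
DEEPGRAM_LISTEN = "https://api.deepgram.com/v1/listen"

def build_listen_url(model="nova-3-medical", keyterms=()):
    """Assemble a transcription request URL with Keyterm Prompting.

    One `keyterm` query parameter is emitted per term; Nova-3 Medical
    supports up to 100 key terms, per Deepgram's announcement.
    """
    if len(keyterms) > 100:
        raise ValueError("Keyterm Prompting supports at most 100 key terms")
    params = [("model", model)] + [("keyterm", term) for term in keyterms]
    return DEEPGRAM_LISTEN + "?" + urlencode(params)

url = build_listen_url(keyterms=["metoprolol", "atrial fibrillation"])
print(url)
```

An actual transcription call would POST the audio to this URL with an Authorization header carrying a Deepgram API key; Deepgram’s SDKs wrap this same request shape.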

Versatile deployment options – including on-premises and Virtual Private Cloud (VPC) configurations – provide enterprise-grade security and support compliance with regulations such as HIPAA in the US, alongside UK data protection requirements.

“Speech-to-text for enterprise use cases is not trivial, and there is a fundamental difference between voice AI platforms designed for enterprise use cases vs entertainment use cases,” said Kevin Fredrick, Managing Partner at OneReach.ai. “Deepgram’s Nova-3 model and Nova-3-Medical model, are leading voice AI offerings, including TTS, in terms of the accuracy, latency, efficiency, and scalability required for enterprise use cases.”

Benchmarking Nova-3 Medical: Accuracy, speed, and efficiency

Deepgram has conducted benchmarking to demonstrate the performance of Nova-3 Medical. The company claims the model delivers industry-leading transcription accuracy, optimising both overall word recognition and critical medical term accuracy.

  • Word Error Rate (WER): With a median WER of 3.45%, Nova-3 Medical outperforms competitors, achieving a 63.6% reduction in errors compared to the next best competitor. This enhanced precision minimises manual corrections and streamlines workflows.
  • Keyword Error Rate (KER): Crucially, Nova-3 Medical achieves a KER of 6.79%, marking a 40.35% reduction in errors compared to the next best competitor. This ensures that critical medical terms – such as drug names and conditions – are accurately transcribed, reducing the risk of miscommunication and patient safety issues.
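For context on those figures: WER is the word-level edit distance (substitutions, insertions, and deletions) between a model’s transcript and a human reference, divided by the number of reference words. A minimal, self-contained implementation – not Deepgram’s benchmark harness, and with made-up example sentences – looks like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, normalised by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[len(hyp)] / len(ref)

# One substitution ("50" -> "15") over 6 reference words, roughly 0.167.
print(word_error_rate("patient on 50 mg metoprolol daily",
                      "patient on 15 mg metoprolol daily"))
```

Keyword Error Rate is computed the same way but restricted to a curated list of critical terms (drug names, conditions), which is why it is reported separately: a transcript can have a low overall WER yet still garble the handful of words that matter clinically.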

In addition to accuracy, Nova-3 Medical excels in real-time applications. The model transcribes speech 5-40x faster than many alternative speech recognition offerings, making it ideal for telemedicine and digital health platforms. Its scalable architecture ensures high performance even as transcription volumes increase.

Furthermore, Nova-3 Medical is designed to be cost-effective. Starting at $0.0077 per minute of streaming audio – which Deepgram claims is more than twice as affordable as leading cloud providers – it allows healthcare tech companies to reinvest in innovation and accelerate product development.

Deepgram’s Nova-3 Medical aims to empower developers to build transformative medical transcription applications, driving exceptional outcomes across healthcare.

(Photo by Alexander Sinn)

See also: Autoscience Carl: The first AI scientist writing peer-reviewed papers


The post Deepgram Nova-3 Medical: AI speech model cuts healthcare transcription errors appeared first on AI News.

Top seven Voice of Customer (VoC) tools for 2025 https://www.artificialintelligence-news.com/news/top-seven-voice-of-customer-tools-for-2025/ Mon, 03 Mar 2025 09:32:11 +0000 https://www.artificialintelligence-news.com/?p=104689

The post Top seven Voice of Customer (VoC) tools for 2025 appeared first on AI News.

One of the powerful methods for enhancing customer experiences and building lasting relationships is through Voice of Customer (VoC) tools. These tools allow businesses to gather insights directly from their customers, helping them to improve services, products, and overall customer satisfaction.

What are voice of customer (VoC) tools?

VoC tools are specialised software applications designed to collect, analyse, and interpret customer feedback. Feedback can come from various sources, including surveys, social media, direct customer interactions, and product reviews. The primary goal of the tools is to build a comprehensive understanding of customer sentiment, pain points, and preferences.

VoC tools let organisations gather qualitative and quantitative data, translating the voice of their customers into actionable insights. By implementing these tools, businesses can achieve a deeper understanding of their customers, leading to informed decision-making and ultimately, enhanced customer loyalty.

Top 7 Voice of Customer (VoC) tools for 2025

Here are the top seven VoC tools to consider in 2025, each offering unique features and functions to help you capture the voice of your customers effectively:

1. Revuze

Revuze is an AI-driven VoC tool that focuses on extracting actionable insights from customer feedback, reviews, and surveys.

Key features:

  • Natural language processing to analyse open-ended responses.
  • Comprehensive reporting dashboards that highlight key themes.
  • The ability to benchmark against competitors.

Benefits: Revuze empowers businesses to turn large amounts of feedback into strategic insights, enhancing decision-making and customer engagement.

2. Satisfactory

Satisfactory is a user-friendly VoC tool that emphasises customer feedback collection through satisfaction surveys and interactive forms.

Key features:

  • Simple survey creation with customisable templates.
  • Live feedback tracking and reporting.
  • Integration with popular CRM systems like Salesforce.

Benefits: Satisfactory helps businesses quickly gather customer feedback, allowing for immediate action to improve customer satisfaction and experience.

3. GetFeedback

GetFeedback offers a streamlined platform for creating surveys and collecting customer insights, designed for usability across various industries.

Key features:

  • Easy drag-and-drop survey builder.
  • Real-time feedback collection via multiple channels.
  • Integration capabilities with other tools like Salesforce and HubSpot.

Benefits: GetFeedback provides actionable insights while ensuring an engaging experience for customers participating in surveys.

4. Chattermill

Chattermill focuses on analysing customer feedback through sophisticated AI and machine learning algorithms, turning unstructured data into actionable insights.

Key features:

  • Customer sentiment analysis across multiple data sources.
  • Automated reporting tools and dashboards.
  • Customisable alerts for key metrics and issues.

Benefits: Chattermill enables businesses to react quickly to customer feedback, enhancing their responsiveness and improving overall service quality.

5. Skeepers

Skeepers is designed for brands looking to amplify the customer voice by combining feedback gathering and brand advocacy functions.

Key features:

  • Comprehensive review management system.
  • Real-time customer jury feedback for products.
  • Customer advocacy programme integration.

Benefits: Skeepers helps brands transform customer insights into powerful endorsements, boosting brand reputation and fostering trust.

6. Medallia

Medallia is an established leader in the VoC space, providing an extensive platform for capturing feedback from various touchpoints throughout the customer journey.

Key features:

  • Robust analytics capabilities and AI-driven insights.
  • Multi-channel feedback collection, including mobile, web, and in-store.
  • Integration with existing systems for data flow.

Benefits: Medallia’s comprehensive suite offers valuable tools for organisations aiming to transform customer feedback into strategic opportunities.

7. InMoment

InMoment combines customer feedback across all channels, providing organisations with insights to enhance customer experience consistently.

Key features:

  • AI-powered analytics for deep insights and trends.
  • Multi-channel capabilities for collecting feedback.
  • Advanced reporting and visualisation tools.

Benefits: With InMoment, businesses can create a holistic view of the customer experience, driving improvements across the organisation.

Benefits of using VoC tools

  • Enhanced customer understanding: By capturing and analysing customer feedback, businesses gain insights into what customers truly want, their pain points, and overall satisfaction levels.
  • Improvement of products and services: VoC tools help organisations identify specific areas where products or services can be improved based on customer feedback, leading to increased satisfaction and loyalty.
  • Informed decision making: With access to real-time customer insights, organisations can make data-driven decisions, ensuring that strategies align with customer preferences.
  • Increased customer loyalty: When customers feel heard and valued, they are more likely to remain loyal to a brand, leading to repeat business and long-term growth.
  • Competitive advantage: Organisations that effectively use customer feedback can stay ahead of competitors by quickly adapting to market demands and trends.
  • Proactive issue resolution: VoC tools enable businesses to identify customer complaints early, allowing them to address issues proactively and improve overall customer satisfaction.
  • Enhanced employee engagement: A deep understanding of customer needs can help employees deliver better service, enhancing their engagement and job satisfaction.

How to choose VoC tools

Choosing the right VoC tool involves several considerations:

  • Define your goals: Before researching tools, clearly define what you want to achieve with VoC. Whether it’s improving product features, enhancing customer service, or understanding market trends, outlining your goals will help narrow your choices.
  • Assess your budget: VoC tools come with various pricing models. Determine your budget and evaluate the tools that provide the best value for your investment.
  • Evaluate features: Based on your goals, assess the features of each tool. Prioritise the features that align with your needs, like sentiment analysis, real-time reporting, or integration capabilities.
  • Check integration options: Ensure that the chosen VoC tool can easily integrate with your existing systems. Integration can save time and enhance the overall efficiency of data utilisation.
  • Look for scalability: As your business grows, your VoC needs may change. Choose a tool that can scale with your business and adapt to evolving customer insight demands.
  • Request demos and trials: Take advantage of free trials or request demos to see how the tools function in real-time. The experience can provide valuable information about usability and effectiveness.
  • Read reviews and case studies: Researching customer reviews, testimonials, and case studies can give you insights into how well the tool performs and its impact on businesses similar to yours.

The post Top seven Voice of Customer (VoC) tools for 2025 appeared first on AI News.

Western drivers remain sceptical of in-vehicle AI https://www.artificialintelligence-news.com/news/western-drivers-remain-sceptical-in-vehicle-ai/ Tue, 05 Nov 2024 12:58:15 +0000

The post Western drivers remain sceptical of in-vehicle AI appeared first on AI News.

A global study has unveiled a stark contrast in attitudes towards embracing in-vehicle AI between Eastern and Western markets, with European drivers particularly reluctant.

The research – conducted by MHP – surveyed 4,700 car drivers across China, the US, Germany, the UK, Italy, Sweden, and Poland, revealing significant geographical disparities in AI acceptance and understanding.

According to the study, while AI is becoming integral to modern vehicles, European consumers remain hesitant about its implementation and value proposition.

Regional disparities

The study found that 48 percent of Chinese respondents view in-car AI predominantly as an opportunity, while merely 23 percent of European respondents share this optimistic outlook. In Europe, 39 percent believe AI’s opportunities and risks are broadly balanced, while 24 percent take a negative stance, suggesting the risks outweigh potential benefits.

Understanding of AI technology also varies significantly by region. While over 80 percent of Chinese respondents claim to understand AI’s use in cars, this figure drops to just 54 percent among European drivers, highlighting a notable knowledge gap.

Marcus Willand, Partner at MHP and one of the study’s authors, notes: “The figures show that the prospect of greater safety and comfort due to AI can motivate purchasing decisions. However, the European respondents in particular are often hesitant and price-sensitive.”

The willingness to pay for AI features shows an equally stark divide. Just 23 percent of European drivers expressed willingness to pay for AI functions, compared to 39 percent of Chinese drivers. The study suggests that most users now expect AI features to be standard rather than optional extras.

Graph showing which features the public believes can be significantly improved by in-vehicle AI.

Dr Nils Schaupensteiner, Associated Partner at MHP and study co-author, said: “Automotive companies need to create innovations with clear added value and develop both direct and indirect monetisation of their AI offerings, for example through data-based business models and improved services.”

In-vehicle AI opportunities

Despite these challenges, traditional automotive manufacturers maintain a trust advantage over tech giants. The study reveals that 64 percent of customers trust established car manufacturers with AI implementation, compared to 50 percent for technology firms like Apple, Google, and Microsoft.

Graph highlighting public trust in various stakeholders regarding in-vehicle AI.

The research identified several key areas where AI could provide significant value across the automotive industry’s value chain, including pattern recognition for quality management, enhanced data management capabilities, AI-driven decision-making systems, and improved customer service through AI-powered communication tools.

“It is worth OEMs and suppliers considering the opportunities offered by the new technology along their entire value chain,” explains Augustin Friedel, Senior Manager and study co-author. “However, the possible uses are diverse and implementation is quite complex.”

The study reveals that while up to 79 percent of respondents express interest in AI-powered features such as driver assistance systems, intelligent route planning, and predictive maintenance, manufacturers face significant challenges in monetising these capabilities, particularly in the European market.

Graph showing the public interest in various in-vehicle AI features.

How to use AI-driven speech analytics in contact centres https://www.artificialintelligence-news.com/news/how-to-use-ai-driven-speech-analytics-contact-centres/ Thu, 01 Aug 2024 12:52:21 +0000

The post How to use AI-driven speech analytics in contact centres appeared first on AI News.

Speech analytics driven by AI is speech recognition software that works using natural language processing and machine learning technologies. With speech analytics in call centres, you can convert live speech into text. After that, the program evaluates this text to reveal details about the needs, preferences, and sentiment of the customer.

In contact centres, speech analytics tools help:

  • Analyse voice recordings.
  • Provide feedback for agents. 
  • Improve customer experience.
  • Increase sales.

How does speech analytics driven by AI differ from the traditional one? What benefits can contact centres and businesses receive from it? Find the answers in this article.

How does AI-driven speech analytics differ from traditional?

The two approaches differ in several key aspects: where traditional speech analytics relies largely on keyword spotting and fixed rules, AI-driven analytics draws on the technologies described below.

Key components of AI-driven speech analytics

Here is a list of common technologies driven by artificial intelligence. They are being used to optimise and improve the performance of contact centres and the applications they run:

Artificial intelligence is a branch of computer science that develops programs to solve complex problems by simulating the behaviour of intelligent beings. AI is able to reason, learn, solve problems, and self-correct.

Machine learning is a subset of AI that teaches computers through experience rather than explicit programming. It is a method of data analysis that uses statistical algorithms to find patterns in data and forecast future events.

Natural language processing allows a computer to understand spoken or written language. It can analyse syntax and semantics, which helps in determining meaning and developing suitable answers.

For example, it processes verbal commands given to intelligent virtual operators, virtual assistants that staff work with, or voice menus. Sentiment analysis is another application of this technology. More advanced natural language processing can "learn" to take context into account and read sarcasm, humour, and a range of other human emotions.

Natural language understanding, a subfield of natural language processing, focuses on extracting the intended meaning from written or spoken language by examining a sentence's grammatical structure, syntax, and semantics.

Predictive analytics uses machine learning, data mining, and statistical analysis techniques to analyse data and identify relationships, patterns, and trends. One can create a predictive model using such data. It forecasts the likelihood of a given event, a customer's propensity to act in a certain way, and the possible consequences.
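As a toy illustration of the technologies above, the sketch below scores the sentiment of an utterance using a small keyword lexicon. The word lists and function name are invented for the example; production contact-centre NLP relies on trained models rather than keyword matching:

```python
# A minimal, illustrative lexicon-based sentiment scorer.
# Word lists are hypothetical and far too small for real use.

POSITIVE = {"great", "thanks", "helpful", "resolved", "happy"}
NEGATIVE = {"frustrated", "cancel", "broken", "complaint", "unacceptable"}

def sentiment_score(utterance: str) -> float:
    """Return a score in [-1, 1] from counts of sentiment-bearing words."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("Thanks, that was really helpful"))   # 1.0
print(sentiment_score("I want to cancel, this is broken"))  # -1.0
```

A real system would replace the lexicon with a model trained on labelled transcripts, but the input and output shapes stay the same: text in, a polarity score out.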

How does speech analytics work in contact centres?

Software for speech analytics gathers and examines data from conversations with customers. Transcripts of phone conversations, dashboards, and reports can all be created using the gathered data.

Agent productivity, customer satisfaction, call volume, and other metrics are all shown in real time to contact centre management through dashboards. Call transcripts are text-format records of conversations, used for training and service quality control.

Speech analysis is most often carried out in the following stages:

#1 Interaction recording

A recording of a conversation that needs to be analysed. 

#2 Separating the audio tracks of interlocutors

Separating the tracks makes it easier to pinpoint issues. For example, if the two tracks overlap in a conversation between an agent and a customer, one interlocutor is interrupting the other.

#3 Converting speech to text 

This step helps to obtain a text version of the conversation that will be used for subsequent research.

#4 Text transcript

Different text-processing techniques are applied to the resulting text, including identifying tags and themes, marking significant words and phrases, and assessing tone. The program also processes individual terms, dialogues, and whole discussions.

#5 Data classification

The data is classified by term, topic, emotional tone, or other parameters.

#6 Data visualisation

The program presents the results clearly using charts, graphs, heat maps, and other visuals.

#7 Data analytics 

During this phase, judgments are made, trends are found, important discoveries are highlighted, and data is interpreted.

The system allows you to record calls and create detailed, complete reports that identify errors and reveal additional opportunities for growth. This information helps develop the business and, with the right choice of promotional tools, increase average order value while saving budget.
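As a rough illustration of the stages above, the sketch below chains track separation, transcription, classification, and analysis into a toy pipeline. All function names and the keyword-based topic classifier are hypothetical stand-ins; real systems use speech recognition engines and trained models for these steps:

```python
# Illustrative sketch of the staged pipeline described above.
# The "audio" is already a list of words so the example stays runnable.

from collections import Counter

def separate_tracks(recording: dict) -> dict:
    """Stage 2: split the stereo recording into one track per speaker."""
    return {"agent": recording["left"], "customer": recording["right"]}

def transcribe(track: list) -> str:
    """Stage 3: speech-to-text (trivial here; a real ASR engine in practice)."""
    return " ".join(track)

def classify(text: str, topics: dict) -> list:
    """Stage 5: tag the transcript with matching topic keywords."""
    return [topic for topic, keyword in topics.items() if keyword in text.lower()]

def analyse(recording: dict, topics: dict) -> dict:
    """Stages 2-7 end to end: separate -> transcribe -> classify -> aggregate."""
    tracks = separate_tracks(recording)
    transcripts = {who: transcribe(t) for who, t in tracks.items()}
    tags = Counter(tag for text in transcripts.values()
                   for tag in classify(text, topics))
    return {"transcripts": transcripts, "tags": dict(tags)}

topics = {"billing": "invoice", "churn risk": "cancel"}
recording = {"left": ["your", "invoice", "is", "ready"],
             "right": ["i", "may", "cancel"]}

print(analyse(recording, topics)["tags"])  # {'billing': 1, 'churn risk': 1}
```

The tagged output is what feeds the classification, visualisation, and analytics stages: dashboards and reports are built over exactly this kind of per-call structure.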

How can AI-driven speech analytics help businesses?

Depending on the company size, industry, size of the contact centre, and other factors, different benefits of speech analytics will come to the fore. The universal advantages are the following:

Increasing the number of verified calls

Quality control teams in call centres typically check two to four calls per operator each month. With speech analytics, businesses can quickly validate up to 100% of calls.

KPI fulfilment tracking

Various interaction metrics can be analysed with the use of speech analytics:

  • Request escalation rates
  • Out-of-script behaviour
  • Customer satisfaction
  • Average call handling time, etc.

Speech analytics tools are able to pinpoint the areas in which agents' quality scores are lagging. They then offer useful data to boost productivity.
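As an illustration of the metrics above, the sketch below computes a few KPIs from a handful of analysed call records. The record fields are hypothetical placeholders for whatever an analytics store actually exposes:

```python
# Minimal KPI aggregation over analysed call records (illustrative only).

calls = [
    {"duration_s": 240, "escalated": False, "on_script": True,  "csat": 5},
    {"duration_s": 610, "escalated": True,  "on_script": False, "csat": 2},
    {"duration_s": 350, "escalated": False, "on_script": True,  "csat": 4},
]

def kpis(calls: list) -> dict:
    """Average handle time, escalation rate, out-of-script rate, average CSAT."""
    n = len(calls)
    return {
        "avg_handle_time_s": sum(c["duration_s"] for c in calls) / n,
        "escalation_rate": sum(c["escalated"] for c in calls) / n,
        "out_of_script_rate": sum(not c["on_script"] for c in calls) / n,
        "avg_csat": sum(c["csat"] for c in calls) / n,
    }

print(kpis(calls))
```

With 100% call coverage these averages are computed over every interaction rather than a small manual sample, which is what makes the tracking meaningful.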

Instant feedback

Supervisors may provide agents with individualised feedback more quickly thanks to faster analysis and 100% call coverage. Many contact centres have begun implementing AI assistants to give agents real-time suggestions.

Improved operational efficiency

Speech analytics reduces the time for verification processes. Contact centres can handle large call volumes and enhance operational efficiency with its help.

Speech-to-text and text-to-speech voice assistants provide large-scale customer self-service for common queries, freeing up agents to handle more complicated scenarios.

Personalised learning

Managers and workforce development teams can develop individualised agent training programmes, made feasible by the in-depth assessment of each agent's call performance and attributes.

Higher customer service quality 

Speech analytics offers thorough insight into customer requirements. Using sentiment analysis, teams can identify the elements of a satisfying customer experience, or the warning signs of a negative one, and use them to shape the customer experience and lifecycle.

Problem identification and management

Words and phrases used in consumer interactions can be found via speech analytics. Problem-call information can be instantly sent to supervisors by email or instant messenger. Managers are able to address challenging issues in a timely manner because of notifications. After that, they use reports and dashboards to evaluate the effectiveness of their decisions.

Customer sentiment analysis

Speech analytics can determine a speaker’s emotions at a given moment by considering speech characteristics such as voice volume and pitch. Contact centres can use this information to determine a customer’s general opinion of the business.
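As a toy illustration, a crude heuristic along these lines might flag agitation when both volume and pitch rise well above a speaker's baseline. The thresholds and function name below are invented for the example; real systems use trained acoustic models rather than fixed cut-offs:

```python
# Illustrative only: flag likely emotional state from coarse acoustic
# features. Baselines and thresholds are hypothetical placeholders.

def vocal_arousal(volume_db: float, pitch_hz: float,
                  baseline_db: float = 60.0, baseline_hz: float = 150.0) -> str:
    """Crude heuristic: raised volume AND raised pitch suggest agitation."""
    if volume_db > baseline_db + 10 and pitch_hz > baseline_hz * 1.3:
        return "agitated"
    if volume_db < baseline_db - 10:
        return "subdued"
    return "neutral"

print(vocal_arousal(74.0, 210.0))  # agitated
print(vocal_arousal(61.0, 155.0))  # neutral
```

Aggregated over many calls, even coarse labels like these give contact centres a picture of customers' general opinion of the business.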

What difficulties could you expect when using AI-based speech analytics? 

Data privacy and security

Contact centres handle a large amount of personal and financial information. There is a risk of data breaches, unauthorised access, and misuse of customer information, which can lead to regulatory penalties and a loss of customer trust.

How to address:

Contact centres need to put strong data security procedures in place, including:

  • Data encryption
  • Strict access controls
  • Regular security audits, etc. 

These measures help identify and address vulnerabilities. You can also choose solutions with built-in security features.

Cost of implementation

Implementing AI-based voice analytics can require a large financial outlay. Such costs include the following:

  • Purchasing software
  • Integrating new systems with existing infrastructure
  • Training staff
  • Ongoing maintenance and support

How to address:

Contact centres should start with an ROI analysis, projecting possible cost reductions as well as increased income. A phased rollout helps distribute costs, lessening the short-term financial load. Cloud-based solutions can also lower up-front expenses, since these are usually priced pay-as-you-go.
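The suggested ROI analysis can be sketched in a few lines. All figures below are hypothetical placeholders, not vendor pricing:

```python
# A hedged sketch of a simple multi-year ROI estimate for a voice
# analytics rollout. Inputs are illustrative, not real costs.

def simple_roi(upfront_cost: float, annual_cost: float,
               annual_savings: float, annual_extra_revenue: float,
               years: int = 3) -> float:
    """Net benefit over the period divided by total cost."""
    total_cost = upfront_cost + annual_cost * years
    total_gain = (annual_savings + annual_extra_revenue) * years
    return (total_gain - total_cost) / total_cost

# e.g. £50k rollout, £20k/yr licence, £45k/yr saved, £15k/yr extra revenue
print(round(simple_roi(50_000, 20_000, 45_000, 15_000), 2))  # 0.64
```

Even a rough model like this makes the phased-versus-upfront trade-off concrete: lowering `upfront_cost` in favour of a higher `annual_cost` shifts spend into the pay-as-you-go pattern described above.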

Technological complexity

Deploying advanced AI technologies and their integration with existing systems can be technically demanding and require specialised knowledge. 

How to address:

Implementation complexity can be decreased by collaborating with seasoned suppliers that have a solid track record. These vendors can provide end-to-end services, including integration, training, and ongoing support. 

The bottom line

Statistics show that mundane duties take up almost half of a contact centre agent's working hours. Modern speech analytics services significantly optimise these processes and provide analytical data. Based on this data, you can develop a strategy for the company's further development and improve relationships with customers, building their loyalty.
