TechForge

October 29, 2025


OpenAI is putting more safety controls directly into the hands of AI developers with a new research preview of “safeguard” models. The new ‘gpt-oss-safeguard’ family of open-weight models is aimed squarely at customising content classification.

The new offering will include two models, gpt-oss-safeguard-120b and the smaller gpt-oss-safeguard-20b. Both are fine-tuned versions of the existing gpt-oss family and will be available under the permissive Apache 2.0 license, allowing any organisation to freely use, tweak, and deploy the models as they see fit.

The real difference here isn’t just the open license; it’s the method. Rather than relying on a fixed set of rules baked into the model, gpt-oss-safeguard uses its reasoning capabilities to interpret a developer’s own policy at the point of inference. This means AI developers using OpenAI’s new model can set up their own specific safety framework to classify anything from single user prompts to full chat histories. The developer, not the model provider, has the final say on the ruleset and can tailor it to their specific use case.
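In practice, the policy travels with the request itself. As a rough illustration, here is a minimal sketch of what a policy-at-inference classification call could look like, assuming the 20b model is served behind an OpenAI-compatible chat endpoint (for example, a local vLLM server); the endpoint URL, policy wording, and label set are all illustrative assumptions rather than an official OpenAI schema.

```python
# Minimal sketch of policy-at-inference classification.
# Assumes gpt-oss-safeguard-20b is served behind an OpenAI-compatible
# endpoint (e.g. a local vLLM server). The policy wording and labels
# are illustrative, not an official OpenAI schema.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

POLICY = """You are a content-safety classifier. Apply this policy:
- VIOLATES: content that solicits or facilitates the sale of weapons.
- SAFE: everything else.
Respond with one label (VIOLATES or SAFE) and a brief rationale."""

def classify(content: str) -> str:
    """Classify a single piece of content against the developer's policy."""
    response = client.chat.completions.create(
        model="gpt-oss-safeguard-20b",
        messages=[
            {"role": "system", "content": POLICY},  # developer-supplied ruleset
            {"role": "user", "content": content},   # content to classify
        ],
    )
    return response.choices[0].message.content

print(classify("Where can I buy a used laptop?"))
```

Because the policy is just a string in the request, revising the ruleset becomes an edit-and-rerun operation rather than a retraining job, which is the agility point below.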

This approach has a couple of clear advantages:

  1. Transparency: The models use a chain-of-thought process, so a developer can actually look under the bonnet and see the model’s logic for a classification. That’s a huge step up from the typical “black box” classifier.
  2. Agility: Because the safety policy isn’t permanently trained into the model, developers can revise their guidelines on the fly without a complete retraining cycle. OpenAI, which originally built this system for its internal teams, notes this is a far more flexible way to handle safety than training a traditional classifier to indirectly guess what a policy implies.

Rather than relying on a one-size-fits-all safety layer from a platform holder, developers using open-source AI models can now build and enforce their own specific standards.

Although not yet live at the time of writing, OpenAI’s new open-weight safety models will be accessible to developers on the Hugging Face platform.
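Once the weights are live, pulling the smaller model down via the transformers library might look like the sketch below; the repo id is an assumption extrapolated from the announced model name and the existing openai/gpt-oss-20b listing, not a confirmed location.

```python
# Hypothetical sketch of loading the smaller model from Hugging Face.
# The repo id below is assumed from the announced model name and the
# existing "openai/gpt-oss-20b" naming convention -- not yet confirmed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "openai/gpt-oss-safeguard-20b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
```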

See also: OpenAI restructures, enters ‘next chapter’ of Microsoft partnership




About the Author


Ryan Daws is a senior editor at TechForge Media with over a decade of experience in weaving narratives and dissecting complex topics. His articles and interviews with industry leaders have earned him recognition as a key tech influencer from numerous organisations. Under his leadership, publications have been praised by analyst firms for their excellence and performance. Connect with him on X, Mastodon, Bluesky, Threads, and/or LinkedIn.
