![OpenAI strips warnings from ChatGPT, but its content policy hasn’t changed](https://helios-i.mashable.com/imagery/articles/06CtBf8C2k4znJYGqbAE2Qa/hero-image.fill.size_1200x675.v1739551359.jpg)
OpenAI has removed the ChatGPT orange warning boxes that indicate whether a user may have violated its content policy.
Model behavior product manager Laurentia Romaniuk shared in a post on X that “we got rid of ‘warnings’ (orange boxes sometimes appended to your prompts).”
Romaniuk also asked users to share “other cases of gratuitous / unexplainable denials [they’ve] come across,” referring to ChatGPT’s tendency to err on the side of caution with content moderation.
Joanne Jang, who leads model behavior, echoed the request, asking, “has chatgpt ever refused to give you what you want for no good reason? or reasons you disagree with?” The question speaks to a long-standing complaint: ChatGPT not only steered clear of controversial topics but also flagged chats that seemed innocuous, like one Redditor who said their chat was removed for including a swear word in the prompt.
Earlier this week, OpenAI overhauled its Model Spec, the document that details how its models should respond to users safely. Compared to the much shorter earlier version, the new Model Spec is sprawling, outlining the company’s approach to current controversies such as denying requests to share copyrighted content and allowing discussion that supports or criticizes politicians.
ChatGPT has been accused of censorship, with President Trump’s “AI Czar” David Sacks saying in a 2023 All-In podcast episode that ChatGPT “was programmed to be woke.”
However, both the previous and current Model Specs state, “OpenAI believes in intellectual freedom which includes the freedom to have, hear, and discuss ideas.” Still, removing the warnings raised questions about whether the move reflects an unannounced change in how ChatGPT responds.
An OpenAI spokesperson said the removal is not a reflection of the updated Model Spec and does not affect model responses. Rather, it was a decision about how OpenAI communicates its content policies to users. Newer models like o3 are more capable of reasoning through a request and are therefore, in theory, better at engaging with controversial or sensitive topics instead of defaulting to a refusal.
The spokesperson also said OpenAI will continue to display warnings in certain cases that violate its content policy.