Yesterday OpenAI announced GPT-5. Altman has claimed that it has reached PhD-level reasoning.
It turned up in my account this morning, so I thought I'd have a go. Interested as I am in Project 2025, I asked it a general question:
A disappointing response. I doubt it’s censoring results; rather my guess is that it’s saving money by not doing any kind of search.
For comparison, here is the response from a few months ago under version 4:
Now, maybe it’s not so bad and I can coax it into shape; skipping a search isn’t a failure of reasoning, after all.
However, the problems seem to run deeper than that. Take a look at this exchange, where the user attempts to get it to count the number of “b”s in “blueberry”.
Hardly the PhD-level response we should expect, although maybe I should have tried this approach in my viva.
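For what it’s worth, the correct answer takes one line of Python to verify (this snippet is mine, not part of the exchange above):

```python
# Count occurrences of the letter "b" in "blueberry" -- the question GPT-5 fumbled.
word = "blueberry"
print(word.count("b"))  # prints 2
```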
Overall, this seems like a downgrade to me, or at least not the step forward it’s claimed to be!
I already use other models like Gemini and Claude for integrations, so I’m considering unsubscribing until OpenAI gets its house in order. Assuming they manage that.