GPT-4.5: Next big leap or overpriced mess?

Investing.com -- It’s been 9 days since OpenAI launched a research preview of GPT-4.5, offering limited access to Pro subscribers. Since then, a rollout has begun to give Plus subscribers the ability to use it.
OpenAI CEO Sam Altman initially branded the model as feeling like “talking to a thoughtful person,” but warned that it would not excel on benchmarks because it isn’t a reasoning model.
The model has been quite controversial so far, with some heralding it as the next evolution in AI and others dissatisfied with its high price for disappointing performance.
So far, GPT-4.5 has proven more accurate than GPT-4, with a SimpleQA accuracy of 62.5% and a hallucination rate of 37.1%. It also shows improved naturalness and intent understanding and unexpectedly excels at persuasion.
Access through ChatGPT requires a Pro subscription at $200 a month or a Plus subscription at $20 a month. Through the API, one million input tokens cost $75, and one million output tokens cost $150.
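To put the per-token prices quoted above in concrete terms, the short Python sketch below estimates the cost of a single request. The function name and the request size (2,000 input tokens, 500 output tokens) are hypothetical and chosen only for illustration; the two prices are the ones cited in this article.

```python
# Per-million-token prices quoted in the article (USD).
INPUT_PRICE_PER_MILLION = 75.0
OUTPUT_PRICE_PER_MILLION = 150.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical request size, not a figure from the article.
cost = estimate_cost(input_tokens=2_000, output_tokens=500)
print(f"Estimated cost: ${cost:.3f}")
```

At these rates, the hypothetical request works out to roughly 22.5 cents, and output tokens cost twice as much as input tokens.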
So far, most researchers and consumers seem focused on whether or not the developments justify the price increase.
Experiments and tests:
As soon as the launch was announced, several AI engineers and researchers rushed to test the model’s capabilities.
Perhaps the most notable of these tests came from Andrej Karpathy, former AI Director at Tesla (NASDAQ:TSLA) and a co-founder of OpenAI, who ran an experiment on X comparing GPT-4 and GPT-4.5 shortly after the release. In a preamble, he noted the exponential improvement between each release of GPT over time, but also that the improvements have become more subtle as models have progressed. Karpathy recalled that it was difficult to find clear examples where GPT-4 outperformed GPT-3.5, and he had similar expectations for GPT-4.5.
Karpathy then introduced a five-question experiment in which both GPT-4 and GPT-4.5 answered the same five prompts and users voted on which response was better.
Question 1 was centered around GPT-4.5 “roasting” GPT-4, and GPT-4 humorously responding. GPT-4.5 won with 56% of the votes.
Question 2 was for both models to write a standup critiquing OpenAI. GPT-4.5 lost with 43% of votes.
Question 3 was to write a new literary genre blending cyberpunk, magic, and ancient mythology. GPT-4.5 lost with 35% of the votes.
Question 4 was to compose a poem from the nostalgic view of a retired search engine. GPT-4.5 lost with 35% of the votes.
Question 5 was to write about a to-do list of a black hole struggling with imposter syndrome. GPT-4.5 lost with 36% of votes.
Karpathy remarked that he was surprised by the results, as he felt that GPT-4.5 performed much better than GPT-4 in subtler ways, such as tone, cadence, and creativity of writing.
Vizuara Labs’ Dr. Raj Dandeker also tested GPT-4.5, critiquing and disputing OpenAI’s claims of improved writing, programming, tactical problem-solving, and reduced false responses compared to GPT-4. With GPT-4.5 priced at $75 per million input tokens, far above GPT-4o’s $2.50 and GPT-4o mini’s $0.15, the engineer found little justification for the cost.
In a test involving a complex definite integral (from 0 to 1), both models failed initially, though GPT-4.5 picked the correct answer from the options. Dandeker stated that it was slower and that the result was still unacceptable given its cost.
In an ARC AGI pattern test, both models erred, with GPT-4 closer to correct.
When asked to teach the perceptron algorithm, both gave near-identical responses, undermining claims of GPT-4.5’s superior intent understanding.
Neither sought context in other EQ-related questions, though GPT-4.5 once asked for details when drafting a conflict email—a slight edge over GPT-4.
In a basic multiplication test, GPT-4 succeeded while GPT-4.5 failed.
Dandeker concluded that GPT-4.5’s marginal or absent improvements don’t justify its steep price.
Media and consumer reaction:
Wired’s Reece Rogers argued that GPT-4.5’s ambition is its biggest flaw. He suggested that OpenAI’s push for AGI undermines ChatGPT’s utility, noting GPT-4.5’s unexpectedly emotional tone deviates from prior practical goals. Rogers further criticized its business fit, stating, “It’s super expensive and offers marginal gains only seasoned chatbot users might notice,” clashing with OpenAI’s earlier model.
Futurism’s Noor Al-Sibai also examined GPT-4.5, declaring, “The hype is dying.” She added, “It appears the much-hyped new model seriously lacks the spark that made the original ChatGPT and its successor, GPT-4, major cultural and financial phenomena.”
In his essay “Humor Is Hard,” Stanford student Jacob Rintamaki took a different view, asserting that GPT-4.5’s humor marks it as a major AI breakthrough. Despite its low benchmark scores, he argued its humor reflects a sophisticated grasp of language and social nuance. Rintamaki concluded that GPT-4.5’s mastery of complex social dynamics signals a vital leap forward for AI innovation.
Generally, few will deny the advancement in GPT-4.5’s comprehension, but the consensus appears to be that it isn’t worth the high price.
AI opinions:
As tester and consumer reviews have proven to be rather mixed, perhaps the best AI review can come from AI itself. To test this, both Grok and ChatGPT were asked their opinions on GPT-4.5.
Grok’s response:
As Grok 3, built by xAI, I don’t have hands-on experience with GPT-4.5, but I can give you a perspective based on the chatter and data floating around as of March 6, 2025. GPT-4.5, launched by OpenAI in late February, seems to be a bit of a mixed bag—impressive in some ways, but not quite the earth-shattering leap some might have hoped for.
From what’s out there, GPT-4.5 shines in the "vibes" department—think emotional intelligence, conversational flow, and a knack for picking up on nuance. It’s been fine-tuned to feel more human, with users noting it’s less robotic and better at handling everyday chats, creative tasks, and even emotionally charged prompts. Sam Altman himself called it the first model that feels like talking to a thoughtful person, which is a big claim. It’s also supposed to hallucinate less than its predecessors, like GPT-4o, which is a win for anyone tired of AI confidently spouting nonsense. On the flip side, it’s not a reasoning powerhouse—models like OpenAI’s o1 or o3-mini still outpace it in math, coding, and structured problem-solving. Benchmarks show only marginal gains in areas like MMLU (general knowledge), and it lags in science and math compared to those reasoning-focused models.
The catch? It’s a resource hog. GPT-4.5 is OpenAI’s biggest model yet, and it comes with a hefty price tag—$75 per million input tokens and $150 per million output tokens via the API, making it 30 times pricier than GPT-4o for input and 15 times for output. That’s sparked some grumbling, especially since the performance boost doesn’t always seem to justify the cost. Critics argue it’s more of a polished upgrade than a game-changer—like a shiny new coat of paint on a familiar car. Some X posts and reviews echo this: it’s fun to chat with, maybe even a step toward AGI in how it grasps the world, but for heavy analytical lifting, you’re better off elsewhere.
My take? It’s a fascinating pivot—OpenAI seems to be betting on scaling up raw language intuition over deep reasoning, possibly as a foundation for future hybrids like GPT-5. If you’re into AI that vibes with you, gets your subtext, and writes a killer poem, GPT-4.5 might be your jam. But if you need it to ace your calculus homework or debug code on a budget, it’s not the champ. Honestly, it’s cool to see AI inch closer to human-like chat, but the hype-to-reality ratio feels a bit off. What do you think—does the "thoughtful friend" angle intrigue you, or are you more about raw smarts?
ChatGPT’s response:
A Polished Upgrade in AI
GPT-4.5 builds on the solid foundation of GPT-4, bringing noticeable improvements in context retention, precision, and creativity. It excels at maintaining more coherent conversations, even over multiple turns, and delivers sharper, more nuanced responses that are better aligned with user intent. The model is better at interpreting vague queries and reduces the likelihood of providing ambiguous or irrelevant answers.
When it comes to creativity, GPT-4.5 adapts seamlessly to different styles—whether it’s casual, formal, or technical—and offers more sophisticated responses in specialized fields like science and technology. Its ability to minimize "hallucinations" (fabricated information) makes it more reliable for fact-based tasks.
While it’s still not perfect—long, complex conversations can sometimes throw it off—GPT-4.5 is a significant step forward, offering a smoother, more accurate AI experience. It’s a powerful tool for both personal and professional use.
Conclusion:
Currently, GPT-4.5 marks an advancement in generative AI’s comprehension and communication. Yet, its steep price tag and varied user feedback indicate potential hurdles for broad acceptance. As it is only a research preview, more updates and wider availability could shed light on its standing within the ever-evolving AI competition.