In attempting to keep up with (or ahead of) the competition, model releases proceed at a steady clip: GPT-5.2 represents OpenAI’s third major model release since August. GPT-5 launched that month with a new routing system that toggles between instant-response and simulated reasoning modes, though users complained about responses that felt cold and clinical. November’s GPT-5.1 update added eight preset “personality” options and focused on making the system more conversational.
Numbers go up
Oddly, even though the GPT-5.2 model release is ostensibly a response to Gemini 3’s performance, OpenAI chose not to list any benchmarks on its promotional website comparing the two models. Instead, the official blog post focuses on GPT-5.2’s improvements over its predecessors and its performance on OpenAI’s new GDPval benchmark, which attempts to measure professional knowledge work tasks across 44 occupations.
During the press briefing, OpenAI did share some head-to-head benchmark comparisons that included Gemini 3 Pro and Claude Opus 4.5, but it pushed back on the narrative that GPT-5.2 was rushed to market in response to Google. “It is important to note this has been in the works for many, many months,” Simo told reporters, although choosing when to release it, we’ll note, is a strategic decision.
According to the shared numbers, GPT-5.2 Thinking scored 55.6 percent on SWE-Bench Pro, a software engineering benchmark, compared to 43.3 percent for Gemini 3 Pro and 52.0 percent for Claude Opus 4.5. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 scored 92.4 percent versus Gemini 3 Pro’s 91.9 percent.
OpenAI says GPT-5.2 Thinking beats or ties “human professionals” on 70.9 percent of tasks in the GDPval benchmark (compared to 53.3 percent for Gemini 3 Pro). The company also claims the model completes these tasks at more than 11 times the speed and less than 1 percent of the cost of human experts.
GPT-5.2 Thinking also reportedly generates responses with 38 percent fewer confabulations than GPT-5.1, according to Max Schwarzer, OpenAI’s post-training lead, who told VentureBeat that the model “hallucinates substantially less” than its predecessor.
However, we always take benchmarks with a grain of salt: it’s easy to present them in a way that flatters the company publishing them, especially when the science of objectively measuring AI performance hasn’t caught up with corporate sales pitches for humanlike AI capabilities.
Independent benchmark results from researchers outside OpenAI will take time to arrive. In the meantime, if you use ChatGPT for work tasks, expect a competent model with incremental improvements and somewhat better coding performance thrown in for good measure.