OpenAI o1 Released With Reasoning Abilities: OpenAI has released its long-awaited o1 model, along with a smaller, cheaper version called o1-mini, both aimed at strengthening the reasoning abilities of its GPT lineup. The company says the models can “reason through complex tasks and solve harder problems than earlier models in science, coding, and math.” The models were previously known under the codename Strawberry.
OpenAI says this is the first in a planned series of reasoning models available in ChatGPT and through its API, though it is currently only a preview, with more releases to follow. The models were trained to “think through problems more before they react, much like a person would,” according to the company. Through this training, they learn to refine their thinking, try different strategies, and recognize their mistakes.
OpenAI says the new o1 model performs comparably to PhD students on challenging benchmark tasks in physics, chemistry, and biology, and that it also performs strongly in math and coding.
| Feature | Description |
|---|---|
| Model name | OpenAI o1 |
| Training method | Reinforcement learning |
| Reasoning capability | Complex reasoning with an internal chain of thought |
| Competitive programming | Ranks in the 89th percentile on Codeforces |
| Math Olympiad performance | Top 500 in the USA Math Olympiad qualifier (AIME) |
| GPQA benchmark | Surpasses human PhD-level accuracy in physics, biology, and chemistry |
| Training efficiency | Highly data-efficient training process |
| Test-time compute | Performance improves with more test-time compute |
| Reasoning benchmarks | Outperforms GPT-4o on reasoning-heavy tasks |
| MMLU subcategories | Improves on 54 of 57 MMLU subcategories |
| Math performance | 74% average on AIME exams with a single sample per problem |
| Consensus accuracy | 83% with consensus among 64 samples |
| Re-ranking accuracy | 93% when re-ranking 1,000 samples with a learned scoring function |
| Human expert comparison | Rivals human experts on reasoning-heavy benchmarks |
| Model availability | Early version available as o1-preview |
| API access | Available to trusted API users |
| Training constraints | Different from LLM pretraining |
| Performance improvement | Consistently improves with more reinforcement learning |
| Benchmark performance | Greatly improves over GPT-4o on challenging reasoning benchmarks |
| Use cases | Suitable for complex reasoning tasks across domains |
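The consensus figure in the table above (83% on AIME with agreement among 64 samples) describes a sampling-and-voting scheme. A minimal sketch of that idea, assuming consensus means a simple plurality vote over the final answers extracted from each sampled solution (OpenAI's actual procedure and learned scoring function are not public):

```python
from collections import Counter

def consensus_answer(samples: list[str]) -> str:
    """Return the most common final answer among independently sampled solutions.

    A generic majority-vote ("self-consistency") sketch, not OpenAI's
    actual implementation: each string is assumed to be the final answer
    extracted from one sampled model response.
    """
    counts = Counter(samples)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers extracted from 8 sampled solutions:
samples = ["42", "42", "17", "42", "39", "42", "17", "42"]
print(consensus_answer(samples))  # prints "42"
```

The intuition is that independent samples tend to make uncorrelated mistakes, so the correct answer is over-represented among agreements; the 93% re-ranking figure replaces this plurality vote with a learned scorer over 1,000 samples.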
On a qualifying exam for the International Mathematical Olympiad (IMO), GPT-4o answered only 13% of the questions correctly, while o1 scored 83%. OpenAI notes that because o1 is still an early model, it lacks many of the features that make ChatGPT useful, such as browsing the web and uploading files and images; for those use cases, the company says GPT-4o remains the better choice.
Still, OpenAI calls it a “significant advancement and a new level of AI capability,” well suited to jobs that require complex reasoning. ChatGPT Plus and Team users can access the o1 models now by selecting them manually in the model picker. At launch, o1-preview is limited to 30 messages per week and o1-mini to 50. Starting next week, ChatGPT Enterprise and Edu users will be able to use both models for complex tasks, and the company plans to make o1-mini available to all ChatGPT users in the future.