Generative AI Performance in Elite Japanese University Entrance Examinations

Introduction

The AI venture LifePrompt Inc. reported on April 27 that OpenAI's ChatGPT 5.2 Thinking model achieved scores exceeding those of the highest-ranking human candidates in the 2026 entrance examinations for the University of Tokyo and Kyoto University.

Main Body

The assessment methodology involved converting examination questions into image data for the AI model. To ensure accuracy in evaluating essay-based responses, grading was conducted by educators from the Kawai Juku preparatory school.

At the University of Tokyo, the model attained 503 out of 550 points in the Natural Sciences III medical track—surpassing the top human score of 453 by 50 points—and achieved a perfect score in mathematics. In the Humanities and Social Sciences exam, the AI scored 452 out of 550, exceeding the top successful applicant's score of 434. Similarly, at Kyoto University, the model recorded 771 points for the Faculty of Law (surpassing the top score of 734) and 1,176 points for the Faculty of Medicine (surpassing the top score of 1,098).

Despite these results, the AI demonstrated disparate performance across different subject types. While the model achieved a 90% accuracy rate in English, its performance in World History essay questions was limited to 25%. These outcomes represent a significant progression in model capability; previous iterations tested by LifePrompt in 2024 (ChatGPT 4) failed to meet the minimum passing requirements, while the 2025 model (o1) first succeeded in crossing the passing threshold.

Stakeholder perspectives on these results diverge regarding the implications for human cognition and institutional evaluation. Satoshi Endo, head of LifePrompt, posits that the velocity of AI development necessitates a long-term strategic shift in business operations over the next two decades. Conversely, Satoshi Kurihara, head of the Japanese Society for Artificial Intelligence and professor at Keio University, suggests that comparing human and AI performance is fundamentally flawed due to the AI's capacity for massive data absorption. Professor Kurihara likens the AI's efficiency to that of a calculator and argues that this trend necessitates a re-evaluation of entrance examinations that prioritize calculation and knowledge retention over the creation of original value.
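
The score comparisons reported in the Main Body can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the original report: the helper names `margin` and `percent_of_max` are illustrative assumptions, and the maximum score is filled in only for the two University of Tokyo exams, where the article states it (550 points).

```python
# Scores as reported in the article: (exam, AI score, top human score, maximum).
# The maximum is None where the article does not state it (assumption: unknown).
results = [
    ("University of Tokyo, Natural Sciences III", 503, 453, 550),
    ("University of Tokyo, Humanities and Social Sciences", 452, 434, 550),
    ("Kyoto University, Faculty of Law", 771, 734, None),
    ("Kyoto University, Faculty of Medicine", 1176, 1098, None),
]


def margin(ai_score, top_human_score):
    """Points by which the AI score exceeds the top human score."""
    return ai_score - top_human_score


def percent_of_max(score, max_score):
    """Score as a percentage of the maximum, or None if the maximum is unknown."""
    if max_score is None:
        return None
    return round(100 * score / max_score, 1)


# Margin for each exam, keyed by exam name.
margins = {exam: margin(ai, human) for exam, ai, human, _ in results}

for exam, ai, human, max_score in results:
    line = f"{exam}: AI {ai}, top human {human}, margin +{margin(ai, human)}"
    pct = percent_of_max(ai, max_score)
    if pct is not None:
        line += f" (AI at {pct}% of {max_score})"
    print(line)
```

With the figures above, the margins come to 50, 18, 37, and 78 points respectively, which matches the 50-point gap the article cites for Natural Sciences III.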

Conclusion

These results indicate that, while generative AI has surpassed human performance in standardized quantitative and knowledge-based testing, it continues to exhibit limitations in specific qualitative essay domains.

Vocabulary Learning

disparate (adj.)
Distinctly different / 迥然不同的
Example: The model demonstrated disparate performance across different subject types.
iterations (n.)
Successive versions produced through repeated cycles of development or refinement / 迭代
Example: Previous iterations tested by LifePrompt in 2024 failed to meet the minimum passing requirements.
progression (n.)
The act of moving forward or advancing / 進展
Example: These outcomes represent a significant progression in model capability.
stakeholder (n.)
A person or group with an interest or concern in a project / 利益相關者
Example: Stakeholder perspectives on these results diverge regarding the implications for human cognition.
threshold (n.)
A point of entry or limit beyond which something changes / 門檻
Example: The 2025 model (o1) first succeeded in crossing the passing threshold.

Sentence Learning

To ensure accuracy in evaluating essay-based responses, grading was conducted by educators from the Kawai Juku preparatory school.
Infinitival Clause: The sentence begins with an infinitival clause functioning as an adverbial modifier that sets the purpose of the main clause. / 不定式從句: 這句以不定式從句作為狀語,說明評分的目的。
At the University of Tokyo, the model attained 503 out of 550 points in the Natural Sciences III medical track—surpassing the top human score of 453 by 50 points—and achieved a perfect score in mathematics.
Participial Phrase: The dash introduces a participial phrase "surpassing the top human score of 453 by 50 points" which modifies the preceding clause, highlighting the model's superiority. / 分詞短語: 破折號後的分詞短語「surpassing the top human score of 453 by 50 points」修飾前面的主句,強調模型的優勢。
Satoshi Endo, head of LifePrompt, posits that the velocity of AI development necessitates a long-term strategic shift in business operations over the next two decades.
Appositive: The appositive "head of LifePrompt" provides additional identifying information about Satoshi Endo, clarifying his role. / 同位語: 同位語「head of LifePrompt」補充說明Satoshi Endo的身份,闡明其職務。
While the model achieved a 90% accuracy rate in English, its performance in World History essay questions was limited to 25%.
Concessive Clause: The subordinate clause introduced by "While" contrasts the AI's high accuracy in English with its low performance in World History, creating a concessive relationship. / 讓步從句: 以「While」引導的從句對比了AI在英語上的高準確率與在世界歷史上的低表現,形成讓步關係。
Professor Kurihara likens the AI's efficiency to that of a calculator and argues that this trend necessitates a re-evaluation of entrance examinations that prioritize calculation and knowledge retention over the creation of original value.
Relative Clause: The relative clause "that prioritize calculation and knowledge retention over the creation of original value" modifies "entrance examinations", specifying the type of examinations being critiqued. / 關係從句: 關係從句「that prioritize calculation and knowledge retention over the creation of original value」修飾「entrance examinations」,說明被批評的考試類型。