Generative AI Performance in Elite Japanese University Entrance Examinations
Introduction
The AI venture LifePrompt Inc. reported on April 27 that OpenAI's ChatGPT 5.2 Thinking model achieved scores exceeding those of the highest-ranking human candidates in the 2026 entrance examinations for the University of Tokyo and Kyoto University.
Main Body
The assessment methodology involved converting the examination questions into image data and supplying them to the AI model. To ensure accuracy in grading essay-based responses, evaluation was conducted by educators from the Kawai Juku preparatory school.

At the University of Tokyo, the model attained 503 out of 550 points on the Natural Sciences III (medical) track, surpassing the top human score of 453 by 50 points, and achieved a perfect score in mathematics. On the Humanities and Social Sciences examination, the AI scored 452 out of 550, exceeding the top successful applicant's score of 434. At Kyoto University, the model recorded 771 points for the Faculty of Law (against a top human score of 734) and 1,176 points for the Faculty of Medicine (against a top score of 1,098).

Despite these results, the AI's performance varied markedly across subject types: while it achieved a 90% accuracy rate in English, its score on World History essay questions was only 25%.

These outcomes represent a significant progression in model capability. The model LifePrompt tested in 2024 (ChatGPT 4) failed to meet the minimum passing requirements, while the 2025 model (o1) was the first to cross the passing threshold.

Stakeholder perspectives diverge on the implications of these results for human cognition and institutional evaluation. Satoshi Endo, head of LifePrompt, argues that the speed of AI development will require businesses to rethink their long-term strategy over the next two decades. Conversely, Satoshi Kurihara, head of the Japanese Society for Artificial Intelligence and professor at Keio University, contends that comparing human and AI performance is fundamentally flawed, given the AI's capacity to absorb massive amounts of data.
Professor Kurihara likens the AI's efficiency to that of a calculator and argues that this trend necessitates a re-evaluation of entrance examinations that prioritize calculation and knowledge retention over the creation of original value.
Conclusion
These results indicate that while generative AI now surpasses top human performance on standardized quantitative and knowledge-based testing, it still exhibits clear limitations in certain qualitative essay domains.