AI Does Better Than Students in Japanese Tests
Introduction
A company called LifePrompt tested a new AI. This AI is called ChatGPT 5.2. It took tests for the University of Tokyo and Kyoto University. The AI got more points than the best students.
Main Body
The AI was very good at math and science. It got a perfect score in math. It also got high points in law and medicine tests. Teachers from a school checked the AI's writing to make sure it was correct. But the AI was not perfect. It was very good at English, but it was bad at World History. It only got 25% of the history questions right. Older AI models were not this good. In 2024, the AI failed the tests. In 2025, the AI passed the tests for the first time. Now, the new AI is much stronger.
Conclusion
The AI is now better than humans at tests with facts and numbers. But it still has problems with some writing tasks.
Vocabulary Learning
Sentence Learning
Generative AI Performance in Elite Japanese University Entrance Exams
Introduction
The AI company LifePrompt Inc. reported on April 27 that OpenAI's ChatGPT 5.2 Thinking model scored higher than the top human candidates in the 2026 entrance exams for the University of Tokyo and Kyoto University.
Main Body
To test the AI, the company converted exam questions into images. To ensure the essay answers were graded fairly, educators from the Kawai Juku preparatory school performed the evaluations. At the University of Tokyo, the model scored 503 out of 550 points in the Natural Sciences III medical track, beating the top human score of 453 by 50 points, and achieved a perfect score in mathematics. In the Humanities and Social Sciences exam, the AI scored 452 out of 550, higher than the top successful applicant's score of 434. Similarly, at Kyoto University, the model outperformed the top human scores in both the Faculty of Law and the Faculty of Medicine.

However, the AI's performance varied by subject. While it achieved a 90% accuracy rate in English, it scored only 25% on World History essay questions. Even so, these results show a major improvement in AI capabilities: previous versions tested by LifePrompt in 2024 failed to pass, while the 2025 model was the first to reach the minimum passing score.

Experts have different opinions on what these results mean for human intelligence and education. Satoshi Endo, head of LifePrompt, asserted that the rapid development of AI means businesses must change their long-term strategies over the next twenty years. On the other hand, Satoshi Kurihara, a professor at Keio University, criticized the comparison between humans and AI. He argued that because AI can absorb massive amounts of data, it is like a calculator. Consequently, he emphasized that universities should rethink exams that focus on memory and calculation rather than on the ability to create original value.
Conclusion
In summary, while generative AI has outperformed humans in standardized tests and knowledge-based questions, it still faces challenges in specific areas of qualitative essay writing.
Vocabulary Learning
Sentence Learning
Generative AI Performance in Elite Japanese University Entrance Examinations
Introduction
The AI venture LifePrompt Inc. reported on April 27 that OpenAI's ChatGPT 5.2 Thinking model achieved scores exceeding those of the highest-ranking human candidates in the 2026 entrance examinations for the University of Tokyo and Kyoto University.
Main Body
The assessment methodology involved converting examination questions into image data for the AI model. To ensure accuracy in evaluating essay-based responses, grading was conducted by educators from the Kawai Juku preparatory school. At the University of Tokyo, the model attained 503 out of 550 points in the Natural Sciences III medical track, surpassing the top human score of 453 by 50 points, and achieved a perfect score in mathematics. In the Humanities and Social Sciences exam, the AI scored 452 out of 550, exceeding the top successful applicant's score of 434. Similarly, at Kyoto University, the model recorded 771 points for the Faculty of Law (surpassing the top score of 734) and 1,176 points for the Faculty of Medicine (surpassing the top score of 1,098).

Despite these results, the AI demonstrated disparate performance across subject types. While the model achieved a 90% accuracy rate in English, its performance on World History essay questions was limited to 25%. These outcomes nonetheless represent a significant progression in model capability: previous iterations tested by LifePrompt in 2024 (ChatGPT 4) failed to meet the minimum passing requirements, while the 2025 model (o1) was the first to cross the passing threshold.

Stakeholder perspectives diverge on the implications for human cognition and institutional evaluation. Satoshi Endo, head of LifePrompt, posits that the velocity of AI development necessitates a long-term strategic shift in business operations over the next two decades. Conversely, Satoshi Kurihara, head of the Japanese Society for Artificial Intelligence and a professor at Keio University, argues that comparing human and AI performance is fundamentally flawed given the AI's capacity for massive data absorption.
Professor Kurihara likens the AI's efficiency to that of a calculator and argues that this trend necessitates a re-evaluation of entrance examinations that prioritize calculation and knowledge retention over the creation of original value.
Conclusion
The current situation indicates that while generative AI has surpassed human performance in standardized quantitative and knowledge-based testing, it continues to exhibit limitations in specific qualitative essay domains.