AI Does Better Than Students in Japanese Tests (CEFR Compare)

Apr 27, 2026, 07:53

AI Does Better Than Students in Japanese Tests

Introduction

A company called LifePrompt tested a new AI. This AI is called ChatGPT 5.2. It took tests for the University of Tokyo and Kyoto University. The AI got more points than the best students.

Main Body

The AI was very good at math and science. It got a perfect score in math. It also got high points in law and medicine tests. Teachers from a school graded the AI's writing to make sure the scores were fair. But the AI was not perfect. It was very good at English. However, it was bad at World History. It only got 25% of the history questions right. Older AI models were not this good. In 2024, the AI failed the tests. In 2025, the AI passed the tests for the first time. Now, the new AI is much stronger.

Conclusion

The AI is now better than humans at tests with facts and numbers. But it still has problems with some writing tasks.

Vocabulary Learning

bad (adj.)

poor / undesirable 壞的；不好的

Example: The weather was bad yesterday.

company (n.)

firm / business 公司

Example: The company announced a new product.

good (adj.)

nice / favorable 好的；優秀

Example: He did a good job on the assignment.

students (n.)

learners 學生

Example: The students gathered in the library.

tests (n.)

examinations / quizzes 測驗；考試

Example: She studied hard for the tests.

Sentence Learning

It got a perfect score in math.

Prepositional Phrase: The word 'in' shows the subject in which the score was earned. 介系詞短語: 'in' 表示得分的領域是數學。

But the AI was not perfect.

Connector: The word 'but' shows a contrast. 連接詞: 'but' 表示對比。

In 2024, the AI failed the tests.

Time Marker: 'In 2024' tells when the event happened. 時間標記: 'In 2024' 表示事件發生的時間。

Now, the new AI is much stronger.

Temporal Adverb: 'Now' indicates the current situation. 時間副詞: 'Now' 表示目前的情況。

The AI got more points than the best students.

Comparative: The word 'than' compares two things. 比較: 'than' 用於比較兩件事物。

Generative AI Performance in Elite Japanese University Entrance Exams

Introduction

The AI company LifePrompt Inc. reported on April 27 that OpenAI's ChatGPT 5.2 Thinking model scored higher than the top human candidates in the 2026 entrance exams for the University of Tokyo and Kyoto University.

Main Body

To test the AI, the company converted exam questions into images. To ensure the essay answers were graded fairly, educators from the Kawai Juku preparatory school performed the evaluations.

At the University of Tokyo, the model scored 503 out of 550 points in the Natural Sciences III medical track, beating the top human score of 453 by 50 points, and achieved a perfect score in mathematics. In the Humanities and Social Sciences exam, the AI scored 452 out of 550, which was higher than the top successful applicant's score of 434. Similarly, at Kyoto University, the model outperformed the top human scores in both the Faculty of Law and the Faculty of Medicine.

However, the AI's performance varied depending on the subject. While it achieved a 90% accuracy rate in English, it only scored 25% on World History essay questions. These results show a major improvement in AI capabilities. Previous versions tested by LifePrompt in 2024 failed to pass, while the 2025 model was the first to reach the minimum passing score.

Experts have different opinions on what these results mean for human intelligence and education. Satoshi Endo, head of LifePrompt, asserted that the rapid development of AI means businesses must change their long-term strategies over the next twenty years. On the other hand, Satoshi Kurihara, a professor at Keio University, criticized the comparison between humans and AI. He argued that because AI can absorb massive amounts of data, it is like a calculator. Consequently, he emphasized that universities should rethink exams that focus on memory and calculation rather than the ability to create original value.

Conclusion

In summary, while generative AI has outperformed humans in standardized tests and knowledge-based questions, it still faces challenges in specific areas of qualitative essay writing.

Vocabulary Learning

accuracy (n.)

correctness / the quality of being free from errors 準確度

Example: The model achieved a 90% accuracy rate in English.

evaluations (n.)

assessments / acts of judging or measuring the quality of something 評估

Example: The teachers conducted thorough evaluations of the students' essays.

outperformed (v.)

exceeded / performed better than others 超越

Example: The AI outperformed the top human candidates in the exams.

rapid (adj.)

quick / happening at a fast pace 快速

Example: The rapid development of AI has changed many industries.

rethink (v.)

reconsider / think about again with a new perspective 重新思考

Example: Universities should rethink exams that focus on memory.

Sentence Learning

In the Humanities and Social Sciences exam, the AI scored 452 out of 550, which was higher than the top successful applicant's score of 434.

Relative Clause: The clause 'which was higher than the top successful applicant's score of 434' adds additional information about the AI's score, indicating it was better than the highest applicant's result. 關係子句: 此子句為關於 AI 分數的額外資訊，說明其分數高於最高成功申請者的分數。

The essay answers were graded fairly by educators from the Kawai Juku preparatory school.

Passive Voice: The verb phrase 'were graded' is in passive voice, showing that the essay answers received the action performed by the educators. 被動語態: 這句使用被動語態，主語『the essay answers』是動作的接受者，由『educators』執行。

While it achieved a 90% accuracy rate in English, it only scored 25% on World History essay questions.

Contrastive Conjunction: The word 'While' introduces a contrast between two different outcomes, showing that despite good performance in English, the AI performed poorly in World History. 對比連詞: 這句使用『While』連接兩個對照的子句，表示雖然在英語上取得 90% 的準確率，但在世界歷史題目上僅 25%。

He argued that because AI can absorb massive amounts of data, it is like a calculator.

Causal Clause: The clause 'because AI can absorb massive amounts of data' explains the reason for the comparison, indicating that the AI's data absorption capacity makes it similar to a calculator. 原因子句: 此句使用『because』引導原因子句，說明 AI 因為能吸收大量資料而被比作計算機。

On the other hand, Satoshi Kurihara, a professor at Keio University, criticized the comparison between humans and AI.

Contrastive Conjunction: The phrase 'On the other hand' signals an opposing viewpoint, highlighting that another expert criticized the human-AI comparison. 對比連詞: 這句使用『On the other hand』表示對立觀點，強調另一位教授的批評。

Generative AI Performance in Elite Japanese University Entrance Examinations

Introduction

The AI venture LifePrompt Inc. reported on April 27 that OpenAI's ChatGPT 5.2 Thinking model achieved scores exceeding those of the highest-ranking human candidates in the 2026 entrance examinations for the University of Tokyo and Kyoto University.

Main Body

The assessment methodology involved converting examination questions into image data for the AI model. To ensure accuracy in evaluating essay-based responses, grading was conducted by educators from the Kawai Juku preparatory school.

At the University of Tokyo, the model attained 503 out of 550 points in the Natural Sciences III medical track—surpassing the top human score of 453 by 50 points—and achieved a perfect score in mathematics. In the Humanities and Social Sciences exam, the AI scored 452 out of 550, exceeding the top successful applicant's score of 434. Similarly, at Kyoto University, the model recorded 771 points for the Faculty of Law (surpassing the top score of 734) and 1,176 points for the Faculty of Medicine (surpassing the top score of 1,098).

Despite these results, the AI demonstrated disparate performance across different subject types. While the model achieved a 90% accuracy rate in English, its performance in World History essay questions was limited to 25%. These outcomes represent a significant progression in model capability; previous iterations tested by LifePrompt in 2024 (ChatGPT 4) failed to meet the minimum passing requirements, while the 2025 model (o1) first succeeded in crossing the passing threshold.

Stakeholder perspectives on these results diverge regarding the implications for human cognition and institutional evaluation. Satoshi Endo, head of LifePrompt, posits that the velocity of AI development necessitates a long-term strategic shift in business operations over the next two decades. Conversely, Satoshi Kurihara, head of the Japanese Society for Artificial Intelligence and professor at Keio University, suggests that comparing human and AI performance is fundamentally flawed due to the AI's capacity for massive data absorption. Professor Kurihara likens the AI's efficiency to that of a calculator and argues that this trend necessitates a re-evaluation of entrance examinations that prioritize calculation and knowledge retention over the creation of original value.

Conclusion

The current situation indicates that while generative AI has surpassed human performance in standardized quantitative and knowledge-based testing, it continues to exhibit limitations in specific qualitative essay domains.

Vocabulary Learning

disparate (adj.)

Distinctly different / 迥然不同的

Example: The model demonstrated disparate performance across different subject types.

iterations (n.)

Repetitive cycles of development or refinement / 迭代

Example: Previous iterations tested by LifePrompt in 2024 failed to meet the minimum passing requirements.

progression (n.)

The act of moving forward or advancing / 進展

Example: These outcomes represent a significant progression in model capability.

stakeholder (n.)

A person or group with an interest or concern in a project / 利益相關者

Example: Stakeholder perspectives on these results diverge regarding the implications for human cognition.

threshold (n.)

A point of entry or limit beyond which something changes / 門檻

Example: The 2025 model (o1) first succeeded in crossing the passing threshold.

Sentence Learning

To ensure accuracy in evaluating essay-based responses, grading was conducted by educators from the Kawai Juku preparatory school.

Infinitival Clause: The sentence begins with an infinitival clause functioning as an adverbial modifier that sets the purpose of the main clause. 不定式從句: 這句以不定式從句作為狀語，說明評分的目的。

At the University of Tokyo, the model attained 503 out of 550 points in the Natural Sciences III medical track—surpassing the top human score of 453 by 50 points—and achieved a perfect score in mathematics.

Participial Phrase: The dash introduces a participial phrase "surpassing the top human score of 453 by 50 points" which modifies the preceding clause, highlighting the model's superiority. 分詞短語: 破折號後的分詞短語「surpassing the top human score of 453 by 50 points」修飾前面的主句，強調模型的優勢。

Satoshi Endo, head of LifePrompt, posits that the velocity of AI development necessitates a long-term strategic shift in business operations over the next two decades.

Appositive: The appositive "head of LifePrompt" provides additional identifying information about Satoshi Endo, clarifying his role. 同位語: 同位語「head of LifePrompt」補充說明 Satoshi Endo 的身份，闡明其職務。

While the model achieved a 90% accuracy rate in English, its performance in World History essay questions was limited to 25%.

Concessive Clause: The subordinate clause introduced by "While" contrasts the AI's high accuracy in English with its low performance in World History, creating a concessive relationship. 讓步從句: 以「While」引導的從句對比了 AI 在英語上的高準確率與在世界歷史上的低表現，形成讓步關係。

Professor Kurihara likens the AI's efficiency to that of a calculator and argues that this trend necessitates a re-evaluation of entrance examinations that prioritize calculation and knowledge retention over the creation of original value.

Relative Clause: The relative clause "that prioritize calculation and knowledge retention over the creation of original value" modifies "entrance examinations", specifying the type of examinations being critiqued. 關係從句: 關係從句「that prioritize calculation and knowledge retention over the creation of original value」修飾「entrance examinations」，說明被批評的考試類型。