
Subject A’s rapid rise in score and Subject B’s stability suggest that the AI grader concentrates on only some aspects of speaking. Subject A is an international businessperson and educator who is accustomed to enunciating his words so that non-native speakers can understand him. In other tests, human graders judged his pronunciation, intonation, and fluency to be high, but this AI test did not on his first attempt. So he adjusted his speaking style to one in which he linked words together more and spoke faster, and his score gradually rose to nearly the maximum. The rise is primarily due to Subject A’s ability to speak two types of English: 1) intentionally enunciated speech and 2) native-speaker-like speech. The AI grader scored the second type much higher. Note that, in general, the faster a person speaks, the more the coherence of the content deteriorates. Subject A was no exception, yet his score still rose dramatically.
Subject B was a young, ongoing learner of English who gave her best effort every time she took the test. Her pronunciation, intonation, and speed stayed largely the same across attempts, as did the accuracy and content of her answers. After all, as every reader knows, it is virtually impossible to improve one’s speaking skills dramatically in two days.
Although this was a small personal experiment, it suggests something essential about AI assessment: it is still at a developmental stage, with both advantages and disadvantages. Unfortunately, assessment robots have not yet reached the level we see in the movies.
Here we want to share what we found about AI assessment. (Please note that these are still only hypotheses and need further experiments to confirm.) We assume AI graders measure how closely a test taker’s speech matches native-speaker sound data collected and amassed from English-speaking countries. They use state-of-the-art speech recognition technology to transcribe test takers’ answers into text and assess their grammar and vocabulary. We also assume that AI graders measure response time and speaking speed.
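To make these hypotheses concrete, here is a minimal sketch of the kind of scoring pipeline we are imagining. Every feature name, weight, and threshold below is our own assumption for illustration, not the vendor’s actual method, and the similarity and grammar measures are simple stand-ins for real acoustic and language models.

```python
# Minimal sketch of a hypothesized AI speaking-test scorer.
# All names, weights, and thresholds are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class SpokenResponse:
    transcript: str             # text produced by the speech recognizer
    words_per_minute: float     # speaking speed measured from the audio
    response_delay_s: float     # silence before the answer begins
    acoustic_similarity: float  # 0-1 proximity to native-speaker sound data


def hypothesized_score(r: SpokenResponse) -> float:
    """Combine the features we assume an AI grader uses into a 0-100 score."""
    # Reward closeness to native-speaker acoustics (our main hypothesis).
    pronunciation = r.acoustic_similarity * 40

    # Reward faster, more linked speech, capped around 160 words per minute.
    fluency = min(r.words_per_minute / 160.0, 1.0) * 30

    # Penalize long hesitation before answering.
    promptness = max(0.0, 1.0 - r.response_delay_s / 5.0) * 10

    # Crude proxy for grammar and vocabulary: longer transcripts score higher.
    # Nothing here checks coherence or logic, which matches the disadvantage
    # we describe below.
    language = min(len(r.transcript.split()) / 80.0, 1.0) * 20

    return pronunciation + fluency + promptness + language


# Subject A's two speaking styles, with made-up numbers for illustration.
enunciated = SpokenResponse("I believe that ...", 110, 1.0, 0.62)
linked_fast = SpokenResponse("I believe that ...", 165, 0.5, 0.88)
print(round(hypothesized_score(enunciated)), round(hypothesized_score(linked_fast)))
```

Under these assumed weights, the same content spoken faster and closer to native-speaker acoustics scores far higher, which is consistent with what we observed for Subject A.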
The advantages are: 1. grading is 100 percent objective, with no room for human subjectivity; and 2. results arrive within minutes, which can open up many opportunities for educators. The disadvantages are: 1. they probably cannot assess the coherence and logic of speech; and 2. they do not value enunciation or slow, careful speech, which is commonly appreciated in global communication.
All in all, considering these advantages and disadvantages, it is crucial to know when to use AI graders and when to use human graders. For example, we would use AI graders to screen for a particular level of pronunciation and grammar, and human graders to select a candidate for an academic or business endeavor.
Let us see how far AI will have come by 2030. Will it reach the level we see in the movies AI or Blade Runner?