The rapid rise in Subject A’s score and the stability of Subject B’s suggested that the AI grader concentrates on only some aspects of speaking. Subject A is an international businessperson and educator who is accustomed to enunciating his words so that non-native speakers can understand him. In other tests, graders judged his pronunciation, intonation, and fluency to be high, yet this AI test did not rate him that way on his first attempt. So he adjusted his speaking style, linking more words and speaking faster, and his score gradually rose to nearly full marks. The rise was possible primarily because Subject A can speak two types of English: (1) deliberately enunciated speech and (2) native-speaker-like speech. The AI grader scored the second type much higher. It is worth noting that, in general, the faster a person speaks, the more the coherence of the content deteriorates. Subject A was no exception, yet his score still rose drastically.
Subject B was a young, ongoing learner of English who gave her best effort every time she took the test. Her pronunciation, intonation, and speed stayed largely the same across attempts, as did her accuracy and content. After all, as every reader knows, it is virtually impossible to improve one’s speaking skills drastically in two days.
Although this was a small personal experiment, it suggested something essential about AI assessment: it is still in a developmental stage, with both advantages and disadvantages. Unfortunately, assessment robots have not reached the level we see in the movies.
Here we want to share what we found about AI assessment. (Please note that these are still only hypotheses and require further experiments to verify.) We assume AI graders measure the proximity of test takers’ speech to native-speaker sound data collected and amassed from English-speaking countries. They use state-of-the-art voice-recognition technology to transcribe test takers’ answers and then assess their grammar and vocabulary from the text. We also assume that AI graders measure response time and speaking speed.
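To make the hypothesis above concrete, here is a minimal sketch of how such a grader *might* combine these signals. Everything here is an assumption for illustration only: the feature names, weights, and thresholds are invented and do not describe any real test's scoring formula. The sketch simply shows why faster, native-like speech could outscore deliberate enunciation under such a model.

```python
# Hypothetical sketch of an AI speaking grader's scoring formula.
# All features and weights are invented for illustration; no real
# product is known to use these numbers.

def hypothetical_speaking_score(
    native_likeness: float,    # 0-1: acoustic similarity to native-speaker data
    grammar_score: float,      # 0-1: judged from the ASR transcript
    vocabulary_score: float,   # 0-1: judged from the ASR transcript
    words_per_minute: float,   # speaking speed
    response_delay_sec: float, # silence before the test taker starts speaking
) -> float:
    """Combine the hypothesized signals into a 0-100 score."""
    # Reward faster speech, capped at an assumed 150 wpm "native" rate.
    speed = min(words_per_minute / 150.0, 1.0)
    # Penalize hesitation: full credit at 0 s, none at 10 s or more.
    promptness = max(0.0, 1.0 - response_delay_sec / 10.0)
    weighted = (0.4 * native_likeness
                + 0.2 * grammar_score
                + 0.2 * vocabulary_score
                + 0.1 * speed
                + 0.1 * promptness)
    return round(100 * weighted, 1)

# Same grammar and vocabulary, different delivery styles:
enunciated = hypothetical_speaking_score(0.6, 0.9, 0.9, 100, 1.0)  # 75.7
linked = hypothetical_speaking_score(0.9, 0.9, 0.9, 160, 1.0)      # 91.0
```

Under these assumed weights, the linked, faster delivery scores higher even though the grammar and vocabulary inputs are identical, which mirrors what we observed with Subject A.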
The advantages are: (1) grading is 100 percent objective, with no room for human subjectivity; and (2) results arrive within minutes, which can open up many opportunities for educators. The disadvantages are: (1) AI graders probably cannot assess the coherence and logic of speech; and (2) they do not value enunciation or slow, careful speech, which is commonly appreciated in global communication.
All in all, given these advantages and disadvantages, it is crucial to know where to use AI graders and where to use human graders. For example, we would use AI graders to screen for a particular pronunciation and grammar level, and human graders to select a candidate for an academic or business endeavor.
Let us see how far AI can go by 2030. Will it reach the level we see in the movie AI or Blade Runner?