Artificial intelligence (AI) is transforming education in many positive ways, from adaptive learning to scalable testing environments. But it’s also introducing new risks, and that’s particularly true in the high-stakes world of English language testing, where credibility, fairness and trust are essential.
As online assessment becomes more sophisticated, so do the methods used to undermine it. That's why, in today's post, we'll argue that the future of assessment is not fully automated. While AI has a valuable role to play in scaling and supporting assessment, human oversight remains critical to protecting academic integrity and ensuring authentic evaluation.
The rise of AI-assisted cheating and identity fraud
Online exam fraud is a far cry from the traditional kinds of cheating we grew up with. AI is not only an assessment tool but also something candidates can exploit, and institutions today face increasingly complex forms of AI-powered academic misconduct.
Perhaps most troubling is deepfake technology, which can now generate convincing fake faces and voices, enabling “proxy test takers” to sit exams on behalf of other candidates undetected. According to research highlighted by DeepIDV, deepfake-enabled impersonation is a growing concern in online assessment environments. And these aren’t isolated cases. The same report notes that up to 12% of online exam attempts in unmonitored settings involved third parties, with increasing use of AI-generated identities.
At the same time, AI-powered cheating tools are becoming more accessible, turning AI-assisted cheating into a mainstream risk. Real-time answer generation, hidden collaboration software and automated prompts can all be used to manipulate online tests. As a result, modern proctoring systems must now contend with cheating on an entirely different scale and level of sophistication, which makes human oversight more important than ever.
Why institutions still value human judgement
Despite advances in automation, human judgement remains essential in assessment environments. A trained human proctor can interpret behaviour in context, validate performance and make fair, informed decisions in real time in ways that automated systems can’t.
As EduSynch points out, human proctors can also distinguish between genuine testing behaviour and unusual but harmless actions that software may incorrectly flag. This is particularly important in English language testing, where communication, interaction and nuance are central to the assessment itself. It's also worth noting the broader shift towards valuing human evaluative judgement in education: research published by Taylor & Francis argues that such judgement is becoming increasingly important in an era when generative AI can produce polished but inauthentic outputs.
In this context, human oversight should be viewed not as a legacy process or an operational inefficiency, but as a core component of credible, trustworthy assessment.

The limitations of fully automated proctoring systems
AI-driven proctoring systems offer clear advantages in scalability and efficiency, but they have important limitations. Automated systems typically identify “suspicious” behaviour based on patterns and predefined rules, rather than genuine understanding of context. Innocent actions, such as a student looking away from their screen, adjusting their seating position or experiencing technical issues, may trigger alerts regardless of intent.
As Eva Heinrich highlights in her review of online proctoring systems, concerns around transparency, fairness and ethical decision-making remain significant challenges in AI-based assessment environments. AI is a powerful tool for increasing efficiency; it can enhance security, quickly identify anomalies and support large-scale assessment delivery. But without human validation and oversight, fully automated systems risk being incomplete and unreliable. The most effective assessment models are increasingly hybrid, combining technological capability with human expertise.
The risk of false positives and the value of human review
Institutions must also contend with the fact that AI systems can generate large volumes of alerts requiring further investigation, many of which may ultimately prove harmless. According to Proctor360, automated proctoring tools can produce significant numbers of incorrect flags, each of which needs manual resolution and adds to staff workload. These false flags also undermine student confidence and trust by exposing candidates to false accusations and unnecessary scrutiny.
Human review acts as an important safeguard against this. It allows institutions to assess incidents fairly, interpret behaviour in context and ensure that legitimate candidates aren’t penalised by automated assumptions. In this sense, human oversight protects both the integrity of the institution and the experience of the learner.
Cultural nuance and contextual understanding in language testing
Looking at the bigger picture, English language assessment is fundamentally different from many other forms of testing because language itself is deeply human. There's more to proficiency than grammatical accuracy or vocabulary recall. It's about communication style, interpreting meaning and nuance, and the ability to respond meaningfully in context. There's a limit to how much of this automated assessment can capture.
With that in mind, research published in the International Journal of Evaluation and Research in Education (IJERE) highlights the importance of interactive speaking assessment in evaluating authentic language ability. Similarly, research published by Cambridge University Press notes that speaking performance is heavily influenced by context, familiarity and background knowledge.
These factors can't always be assessed accurately through automated scoring systems alone; a true measure of communication ability requires real-time human interaction. Human evaluators also bring cultural awareness, contextual understanding and interpretive judgement to the assessment process, all of which are essential in measuring real-world communication skills.
The importance of live speaking interaction
Live speaking assessment remains one of the strongest safeguards of authenticity in English language testing. Speaking is a real-world skill directly linked to academic success, employability and international mobility, and as a study published in Frontiers in Psychology highlights, speaking proficiency assessment plays a critical role in validating practical language competence.
For this reason, many globally recognised English language tests continue to rely on live or face-to-face speaking interaction. Direct engagement makes it significantly harder to fake ability, outsource responses or rely on AI-generated assistance. That’s why, rather than being seen as a limitation, face-to-face or live interaction should be recognised as a differentiator – a sign of direct, authentic assessment and credibility.
Trust is the currency of assessment
As AI continues to reshape education, institutions will increasingly need to balance scalability with trust. Technology can strengthen assessment security and improve operational efficiency, but in high-stakes English language testing, credibility still depends on human judgement, contextual understanding and authentic interaction. Human oversight is not a legacy feature of assessment – it’s a strategic necessity.
Contact us to find out more about partnering with us and the reassurance that our mix of advanced automation and human proctoring can offer your institution.