A groundbreaking study has found that lawyers consistently fell short of AI in legal research, overturning longstanding beliefs about the capabilities of artificial intelligence in the legal profession.
Vals AI has completed the study, pitting four AI tools – Alexi, Counsel Stack, Midpage, and ChatGPT – head-to-head against a group of practising lawyers to evaluate their performance in legal research.
The study evaluated how both AI platforms and lawyers performed on 200 US legal research questions, carefully crafted with input from US law firms, covering a wide range of scenarios “typically encountered in private practice”.
Each response was evaluated across three weighted criteria: accuracy (50 per cent) to check if it was factually correct, authoritativeness (40 per cent) to gauge the quality of sources, and appropriateness (10 per cent) to see if it was clear, reliable, and ready to share with colleagues or clients.
Surprisingly, the AI products clearly outperformed the lawyers in legal research, achieving average weighted scores of 74 to 78 per cent, compared with the lawyers’ average of 69 per cent.
Among the AI tools, the legal AI products outperformed the generalist AI, with Counsel Stack achieving the highest score across all evaluated criteria.
The study also revealed that AI products outperformed lawyers on 150 of the 200 questions (75 per cent), with the AI achieving an average margin of 31 percentage points on those questions.
However, it wasn’t all bad news for the lawyers. The study found that in four of the question types, lawyers outperformed AI, particularly in cases where a “deeper understanding of context, complex multi-jurisdictional reviews and judgment-based synthesis” was required.
Breaking down the three measured criteria
Accuracy was strong across the board, but legal AI products clearly outperformed the human lawyers, achieving an average score of 80 per cent compared with the lawyers’ 71 per cent.
Authoritativeness proved to be the closest contest, reflecting the challenge of sourcing and citing relevant legal materials, but AI still held the edge, averaging 74 per cent compared with the lawyers’ 68 per cent.
The gap was most striking in terms of appropriateness, with AI products averaging 69.5 per cent – well ahead of human lawyers, who managed just 60 per cent.
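As a rough check on the figures above, the per-criterion averages for the AI products can be combined using the study's stated weights. The sketch below assumes each criterion is scored on a 0–100 scale (the article reports the weights and averages, not the underlying scale or methodology).

```python
# Weights as reported: accuracy 50%, authoritativeness 40%, appropriateness 10%.
WEIGHTS = {"accuracy": 0.5, "authoritativeness": 0.4, "appropriateness": 0.1}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0-100) into a single weighted score."""
    return round(sum(WEIGHTS[c] * scores[c] for c in WEIGHTS), 2)

# The AI averages reported in the article: accuracy 80, authoritativeness 74,
# appropriateness 69.5.
ai_average = weighted_score(
    {"accuracy": 80, "authoritativeness": 74, "appropriateness": 69.5}
)
print(ai_average)  # 76.55
```

The result, 76.55 per cent, sits inside the 74 to 78 per cent range the study reports for the AI products' weighted scores, which is consistent with those per-criterion averages.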