Large language models and the law

Topics:

  • LLMs
  • Legal Intelligence
  • Generative AI

Location:

  • United States

Driving the news:

In the past couple of years, large language models (LLMs), a form of generative artificial intelligence (gen AI), have become powerful tools across several industries and are likely to become more deeply integrated into business processes. That includes the legal industry, where a growing number of firms are using LLMs for everything from document review to drafting briefs. However, as a recent study by researchers at Yale and Stanford shows, LLMs remain prone to “legal hallucinations,” raising concerns about the reliability of AI for more complex legal tasks.

Why it matters: 

LLMs and other AI tools have become increasingly capable of handling legal tasks, with some observers predicting that AI could fundamentally reshape the legal profession, potentially replacing human attorneys for many functions. However, while LLMs can mimic legal writing and analysis, the imitation is far from perfect.

Researchers at Stanford’s RegLab and Institute for Human-Centered AI put out a paper, “Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models,” that highlights legal errors made by LLMs like ChatGPT 3.5, PaLM 2, and Llama 2. The study found that the models generated “legal hallucinations” (responses inconsistent with legal facts, caselaw, and standards) 69% to 88% of the time when asked a direct, verifiable question. The study also found that LLMs usually give answers to legal questions that seem correct but are actually inaccurate, and that they struggle to accurately assess their own level of certainty. In essence, AI has a tendency to give bad answers that look correct, and to be overconfident about them.

Unsurprisingly, LLMs tended to do worse on more complex questions, like providing the central holding of a case or determining whether two cases were in doctrinal agreement with one another. The researchers also found that legal hallucinations varied by court and jurisdiction, with the LLMs performing better on cases from higher levels of the judiciary and from more populous jurisdictions. The models also generated fewer hallucinations when analyzing newer cases. AI likewise did not fare well with “contra-factual bias,” tending to accept the premises in an input query as true even when they were objectively false, leading to correct-sounding responses built on false premises.
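
To make the study’s setup concrete, here is a minimal sketch of what profiling hallucinations on direct, verifiable questions can look like: pose questions whose answers are a matter of public record, compare each model response against the known answer, and tally the error rate. Everything in it (the sample questions, the query_llm stub, and the string-matching check) is a hypothetical placeholder rather than the researchers’ actual benchmark or code.

# Hypothetical sketch: measure how often a model's answers to verifiable
# legal questions disagree with ground truth. The data and the query_llm()
# stub below are illustrative placeholders, not the study's benchmark.

GROUND_TRUTH = {
    "Which court decided Brown v. Board of Education (1954)?":
        "Supreme Court of the United States",
    "Which court decided Miranda v. Arizona (1966)?":
        "Supreme Court of the United States",
}

def query_llm(question: str) -> str:
    """Stand-in for a real LLM API call; returns a canned answer so the
    sketch runs on its own. Swap in an actual model call to use it."""
    return "Supreme Court of the United States"

def hallucination_rate(ground_truth: dict[str, str]) -> float:
    """Fraction of direct, verifiable questions answered incorrectly."""
    wrong = 0
    for question, correct_answer in ground_truth.items():
        answer = query_llm(question)
        # A real evaluation would need more careful answer matching
        # (normalization, citation parsing, etc.) than this string check.
        if correct_answer.lower() not in answer.lower():
            wrong += 1
    return wrong / len(ground_truth)

if __name__ == "__main__":
    print(f"Hallucination rate on this toy set: {hallucination_rate(GROUND_TRUTH):.0%}")

The study itself covered a far wider range of courts, tasks, and cases; this toy harness only illustrates the shape of the evaluation.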

Moving forward:

Lawyers’ jobs are safe for now, at least until AI’s kinks are worked out. In the meantime, those using AI tools for legal tasks should keep a watchful eye out for the stylistically polished but legally unsound answers these tools can generate. Any tool or instrument is, of course, only as good as its user: legal professionals should consider whether AI is the appropriate tool for the job at hand, and should make sure the prompts and queries they feed to LLMs are themselves accurate.

There are also organizations working to push AI forward in the legal field. For example, the Justice Innovation project at Stanford’s Legal Design Lab seeks to use AI to make legal services more accessible, with the goal of eventually developing AI tools that help with tasks like creating court filings or conducting legal research.