Hacker news

AI outperforms law professors in Stanford Law study (https://law.stanford.edu)

411 points by berlianta 4 days ago | 357 comments

godelski 3 days ago |

I find this study quite suspect. I'd have to dive deeper, but there are definitely significant alarm bells that should be going off for anyone reading.

Figure 2 (page 6) screams problems. There are only 16 professors (3k comparisons each?!?!) and the professors are all over the place. That's very high variance, suggesting the study has no meaningful statistical power. Poor instructor 16 can't catch a break lol

There's also really clear bias given that the main results only feature Google models. Other models show up elsewhere, why not there?

I'm no lawyer, but I'm a pretty competent statistician and can confidently say this paper has a smell to it. I can't call it bullshit, but there are red flags all over

causal 4 days ago |

As a software engineer I have some intuition for what the risks are of letting agents do some tasks vs others.

I don't have a similar intuition calibrated for what could go wrong when asking AI to draft a legal document. Some things seem harmless, e.g. drafting a will, but I don't really know; our legal system is notoriously rife with footguns.

finnborge 3 days ago |

I understand why the conversation on this article looks like it does, but the study is specifically focused on the potential for LLMs to operate as tutors for law students. I enjoy the extrapolation out to whether LLMs will replace lawyers, but did not find that to be discussed in the study itself.

In the framing of using LLMs as legal tutors, with the implication of lowering the cost of legal training, this seems like a socially-positive outcome. Furthermore, it feels kind of intuitive to me that any contemporary system operating with an LLM and access to legal reference material will be prepared to answer _student-originated questions_ comprehensively and with breadcrumbs or direct references to educational/source materials, as seems to have been found in the study.

The authors explicitly and intentionally emphasize that many legal questions require contextualization, as opposed to some discrete calculated answer. The result of the study implies that the LLM-based systems were capable of using what many of us here understand to be the "stochastic best-fit algorithmic generation" of a contemporary language model to adequately contextualize a student's question, providing insight into the trade-offs or complications implicit in the question, while then, critically, _meeting the professional standards of legal educators in explaining that complexity to a student_.

Realistically, I would hope this provides some confidence to readers of HN that they can actually ask a legal question to an LLM and expect the response will explain the complexity of the law in relation to the question. This is great news, and is likely the minimal pre-work any of us should do before actually consulting a lawyer, if time permits.

On the other hand, I do _not_ think that this study provides any indication that an LLM is prepared to actually provide direct legal counsel. Possibly in the same way that a legal textbook does not replace legal counsel, or perhaps more accurately, the same way that stumbling upon a legal case study for approximately the same situation you're in doesn't guarantee you'll have the same result.

quantisan 3 days ago |

I'm surprised Stanford Law would go along with this over-reaching press release title. How about "For common first-year contracts-law questions, law professors preferred AI-generated answers to professor-generated answers"

aristofun 3 days ago |

In general it is not surprising. Even if this particular study is bad.

There are certain areas of law work that are about analyzing large amounts of text, drawing conclusions and writing other texts based on that, and nothing more. That is literally the bread and butter of LLMs.

Those types of lawyers should be the first in line for unemployment, not programmers, not even close.

chewbacha 4 days ago |

My best guess is that Gemini was trained on the textbooks that the questions are meant to test against, thus they are probably better at explicit recall of those questions or related questions.

This is a pretty limited introductory course based on what it says in the methods of the paper itself.

mrdependable 3 days ago |

I wonder if this could be explained in a similar way to Hollywood movies. If the movies are designed to please the largest group of people, there is a greater chance people will choose to see it than another movie. The human law professors come with their own personalities, beliefs, and opinions that come through in their writing. An LLM has been trained to please the largest swathe of the population. That doesn't mean the answer is better; just like Captain America isn't necessarily better than American Beauty.

ulrischa 3 days ago |

By its very nature, the field of law is ideally suited for AI language models. Fundamentally, everything is based on interconnected texts. I believe that even larger waves of layoffs could loom here than in the IT sector. However, it is likely that a more powerful lobby will be at work here—one that will grossly inflate the perceived value of their work and shield it from outside intrusion.

dogmayor 3 days ago |

Figure I.1 is telling. It shows answer length is the strongest predictor of win rate. I suspect this is due to the flawed methodology of the study. Professors were instructed to be succinct ("Please be concise. We expect that each answer takes no more than 3 minutes to write down.") and likely erred on the short side. Also, professors may not have put great effort into their written answers, especially when already trying to be concise. This isn't the headline the authors think it is.
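One way to probe the length confound described above is to split the comparisons by whether the AI answer was longer than the professor's and look at the win rate in each group. The sketch below is only an illustration: the record layout `(ai_words, prof_words, ai_won)` and the function name are hypothetical, and the toy records are made up for the example, not taken from the study.

```python
from collections import defaultdict


def win_rate_by_relative_length(records):
    """Group comparisons by relative answer length and return the AI win rate per group.

    records: iterable of (ai_words, prof_words, ai_won) tuples (hypothetical layout).
    """
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [ai_wins, total]
    for ai_words, prof_words, ai_won in records:
        bucket = "ai_longer" if ai_words > prof_words else "ai_not_longer"
        tallies[bucket][0] += int(ai_won)
        tallies[bucket][1] += 1
    return {bucket: wins / total for bucket, (wins, total) in tallies.items()}


# Made-up toy records purely to show the call; NOT data from the paper.
toy_records = [
    (300, 80, True),
    (250, 90, True),
    (280, 100, False),
    (90, 100, False),
    (95, 110, True),
]
rates = win_rate_by_relative_length(toy_records)
```

If the win rate in the `ai_not_longer` group were close to 50% on the real data, that would support the reading that length, not quality, drives the headline number. Without the raw answers this cannot be checked from the outside.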

wilg 4 days ago |

> In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups.

75% win rate seems pretty good!

Paper link: https://law.stanford.edu/wp-content/uploads/2026/06/salinas_...
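For a rough sense of how much uncertainty sits behind a 75% win rate, here is a back-of-the-envelope sketch. It assumes about 3,000 comparisons in total (rounded from the quote above) spread evenly over the 16 professors mentioned elsewhere in this thread. The intra-rater correlation `ASSUMED_ICC` is a pure guess for illustration, not a number from the paper, and the Kish design effect is a swapped-in textbook approximation, not the authors' method.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation 95% interval for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - z * se, p_hat + z * se)


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1 + (cluster_size - 1) * icc


N_COMPARISONS = 3000  # "nearly 3,000" in the quote, rounded (assumption)
N_RATERS = 16         # professors, as described in the thread
WIN_RATE = 0.75
ASSUMED_ICC = 0.10    # hypothetical intra-rater correlation, NOT from the paper

# Naive interval: treats every comparison as independent.
naive = wald_interval(WIN_RATE, N_COMPARISONS)

# Adjusted interval: comparisons by the same professor are correlated,
# so the effective sample size shrinks.
per_rater = N_COMPARISONS / N_RATERS
n_eff = N_COMPARISONS / design_effect(per_rater, ASSUMED_ICC)
adjusted = wald_interval(WIN_RATE, n_eff)
```

Under these assumptions the naive interval is about ±1.5 points, while the clustered one is about ±7 points. The takeaway is only that the number of raters matters much more than the number of comparisons; the actual width depends on a correlation nobody outside the study can measure.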

applicative 3 days ago |

What the LLM cannot do is explain why it said what it said, when cross-examined. It simply hallucinates the best account of why someone would have said such a thing as it said, same as it can give a probable account of why someone else said something different. The question 'But why did you say this not that ...?' does not lead it to make explicit its grounds for what it said, but just to make a new more complicated statement.

rockskon 3 days ago |

I do question at what point AI could be useful as a teaching aid.

The quality of LLMs depends heavily on, among other things, how you word your questions.

Knowing the correct questions to ask is not something most students know how to do given that it tends to require a fair bit of pre-existing domain knowledge.

gamblor956 3 days ago |

While they provided the questions that professors and LLMs were asked to respond to, they don't include any of the answers from either the humans or the LLMs, so there's no way to independently verify that the LLMs actually returned "better" answers.

Given the number of responses the professors were asked to rate (200 each), they probably graded them the same way that bar exam responses are graded: quickly and superficially. Not surprising that LLMs achieved higher scores in this scenario, since they excel at producing superficially nice answers that don't hold up under scrutiny.

Also...unless statistics has changed in the past 2 decades, the math in the charts doesn't math. That's probably why they're leaving out the actual numerical data. I also wouldn't be surprised if we learn in the coming days that the charts were AI generated.

epicureanideal 3 days ago |

One way to make legal services more affordable and accessible would be to put the burden of ensuring the AI legal services are accurate on a private-public partnership with the government.

If a person using the service is given inaccurate legal advice and acts on that advice, the person can't be charged with a crime, can't be given any civil penalties, etc., as long as the law in question is non-obvious.

Obviously if by some exploit, some fundamentally obvious crime (murder, theft, obvious fraud, etc.) is said to be legal, that wouldn't apply, but of course the service should try to prevent those kinds of exploits anyway.

Could limit this to something like business regulations to begin with, or even specifically for small businesses, or contracts within some time limit and dollar amount that would otherwise be coverable by small claims court, etc.

piker 3 days ago |

Having been a law student and practicing lawyer, it's clear to me that law professors aren't really representative of much if any part of private practice. Most of the things they think and reason about are quite theoretical and academic, and it doesn't surprise me that the models would regurgitate a more average response which most human graders would prefer.

That's the entire point, though!

The legal academy is supposed to have outlying opinions on things and present novel philosophical answers to questions. (And questions to answers!) So in addition to the statistical arguments against this paper made elsewhere, to me it doesn't reveal much new information.

TrackerFF 3 days ago |

In many (most?) countries you can defend yourself and waive your court-appointed attorney. You are of course highly discouraged from doing so. But sometimes people do it, mostly for smaller claims where they don't want to rack up legal bills that might cost more than what is at stake.

But it makes me wonder: will clients be able to use these AI-attorney systems in court in the future? They could either just parrot what the model is instructing them to do, or - I dunno - give the model permission to speak for them (while waiving liabilities).

I have no doubt that some complex AI system can perform better than a bottom-tier, overworked lawyer.

homeonthemtn 4 days ago |

Personally I think this is very good. One of the hardest things out there is maintaining a society in the face of changing times and it's because law is dense and slow.

I think, in the right hands, this could be huge.

iLoveOncall 3 days ago |

The title of the study "Law Professors Prefer AI Over Peer Answers" is VERY different from the title on HackerNews. This is completely clickbait at this point.

KnuthIsGod 3 days ago |

In the hands of a domain expert, AI is useful. In the hands of the naive, it is a foot gun.

I killed my Arch installation and was stuck at the GRUB prompt. Unwilling to brush up my rusty knowledge of GRUB syntax, I asked Gemini for help. The commands Gemini suggested would have wiped my hd...

Once Gemini was told that I was using BTRFS, the suggestion from Gemini looked a bit more sane, but still looked incorrect to me.

It was only after I informed Gemini that I was using an NVMe drive with BTRFS that it finally produced a sane command.

galaxyLogic 3 days ago |

I'm going to need some legal help for my startup. But I can't pay much. So I figured I will ask AI all relevant questions, as well as forms filled etc. Perhaps even create a patent-application for me.

THEN I find a human lawyer and give AI's answers to them and say "Can you find any errors in this? Can you improve it?" .

That way I think my legal bills should be smaller because the AI has already done most of the work. What do you think? Which LLM is best for legal work?

eichi_uehara 3 days ago |

I beat lawyers twice before generative AI even existed. Recently I asked Gemini a few questions about personal conflicts in everyday life. It's often too conservative, with views too shallow for the problem. So I still handle human conflicts myself. I only outsource the templated stuff like routine chat replies or marketing copy, though it saves me a huge amount of time. People who quote AI in serious conflicts are too weak to handle them on their own.

mchl-mumo 3 days ago |

16 is such a small number for what they phrase as an important finding. It really couldn't be much harder to coordinate with 100+ professors.

Aperocky 3 days ago |

> rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups.

That's the problem: you never know when the 25% will deliver a true stink bomb, and that's not considering prompting. While a fair prompt/question may be considered objective, it's very easy to stray.

xyzal 3 days ago |

This contradicts my anecdata.

Recently, I tasked Opus 4.6 to study a new Czech building permit law in conjunction with some waste disposal regulations and the result was disappointing. The model could not stop drawing conclusions from obsolete regulations in its training dataset, even when given the full text of the new law. The usual "you are totally right" also applied, and its conclusions were most of the time obviously wrong even to a human with cursory knowledge of the subject.

I ended with studying the relevant regulations myself over the weekend.

weatherlite 3 days ago |

It is important for society to understand it is not merely programmers and customer support who are at risk of losing their jobs. Clearly A.I can do much more than just program.

himata4113 3 days ago |

There is quite a simple solution for many of the problems described in the comments: Make drafting legal papers a defined interface.

If you think about it and extract the semantics of any law you get something that looks familiar, sort of like code. Of course there are some complexities where certain phrases can mean different things, but legal papers are in a way already written like programming languages, especially when it comes to law.

First we would have to define a language that can handle ambiguous operations, and we already have this with programmatic proofs where n should land in x. So in the end I'd assume it would look something like this in a two party dispute:

This is a very simplified, pseudocode-like language; writing out a full contract would be as long as a real contract.

     DEFINE DEFENDANT "A Corp"
     DEFINE PLAINTIFF "B Corp"
     DEFINE CONTRACT  CONTRACT(PLAINTIFF, DEFENDANT, 3054-41-95)

     // attaching extracted requirements, definitions and obligations of contract

     FACT   PLAINTIFF delivered(goods) ON 7054-34-99
     FACT   DEFENDANT paid(0) OF CONTRACT.amount

     CLAIM  breach WHEN obligation(DEFENDANT, "pay") IS NOT satisfied

     PROVE breach:
         REQUIRE  PLAINTIFF performed
         REQUIRE  DEFENDANT.paid < CONTRACT.amount
         ASSERT   delay WITHIN reasonable(time)

     IF PROVE(breach):
         AWARD PLAINTIFF (CONTRACT.amount - DEFENDANT.paid) + interest()
     ELSE:
         DISMISS
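To make the idea a bit more concrete, here is a minimal Python sketch of how the PROVE/AWARD logic above might be evaluated once the facts are already extracted. Everything in it is hypothetical: the class and function names, the contract amount, and the choice to compute interest as a flat rate on the unpaid amount are illustrative assumptions, not part of any real legal DSL. Note that `delay_reasonable` is just a boolean here, which is exactly where the hard, ambiguous judgment call would still have to be made by someone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    plaintiff: str
    defendant: str
    amount: float  # hypothetical contract value


@dataclass(frozen=True)
class Facts:
    plaintiff_performed: bool  # FACT PLAINTIFF delivered(goods)
    defendant_paid: float      # FACT DEFENDANT paid(...)
    delay_reasonable: bool     # stand-in for ASSERT delay WITHIN reasonable(time)


def prove_breach(contract: Contract, facts: Facts) -> bool:
    """Mirror the PROVE block: every REQUIRE/ASSERT must hold."""
    return (
        facts.plaintiff_performed
        and facts.defendant_paid < contract.amount
        and facts.delay_reasonable
    )


def interest(principal: float, rate: float) -> float:
    """Flat interest on the unpaid amount (an assumption; the DSL leaves this open)."""
    return principal * rate


def decide(contract: Contract, facts: Facts, rate: float = 0.0):
    """Return ("AWARD", amount) if breach is proven, else ("DISMISS", 0.0)."""
    if prove_breach(contract, facts):
        owed = contract.amount - facts.defendant_paid
        return ("AWARD", owed + interest(owed, rate))
    return ("DISMISS", 0.0)


# Usage with made-up numbers:
contract = Contract(plaintiff="B Corp", defendant="A Corp", amount=1000.0)
outcome = decide(contract, Facts(True, 0.0, True), rate=0.05)
```

The mechanical part is trivial; the sketch mostly shows that all the difficulty moves into producing the `Facts` values, which is the part the DSL does not solve by itself.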

Then you would run a proof based LLM to generate it into target language and since we already had an example of this from one of the AI labs we know it works. Automatic citations and supporting proof would be automatically populated from reviewed legal -> DSL extracted papers as supporting evidence.

I am sure that many AI labs are working on something similar already and we will see something like that in the near future as proof-based LLMs evolve.

airstrike 4 days ago |

Yes, LLMs are great at search. That's not news.

damnesian 3 days ago |

Does the "outperforming" conclusion incorporate the appropriateness of decisions? Or just if things are technically correct. Without human eyes on cases, things could easily get very off track. AI can do a lot of data wrangling, but there is no conscience.

the_real_cher 3 days ago |

Law and accounting both seem to be the perfect fields to replace with AI.

Just massive data where you either do calculations or interpretation.

You will replace 100 lawyers with AI and have a single lawyer to review what the AI outputs and stamp their name on it for accountability.

elnatro 3 days ago |

When I see news pieces like this I wonder about the failures. Maybe the failure percentage is low but what happens if a bot gives bad counseling? Who is responsible then?

Attorneys will be using LLMs for convenience but they will not disappear, because there ultimately needs to be a human responsible for the decisions.

Esophagus4 4 days ago |

Yeah this could be interesting. A lot of the spotlight has been on “law firm stuff” like demand letters and writing contracts…

But imagine if a dev team didn’t have to go engineer -> product manager -> legal team to get a question answered on local data retention requirements. You could ship that much faster.

motbus3 3 days ago |

As others pointed out, it kind of implies it surpasses professors, but reading more carefully it seems more like the mythos situation. There was a single professor or test that it surpasses.

Reading it makes me extremely suspicious on how cherry picked this was

francisdavey 3 days ago |

I'm not a law lecturer. I spend most of my time wrangling contracts and advising about data law. But I did a stint of part-time work teaching a masters in law.

My experience then (this was back before "Attention Is All You Need", I hadn't met the output of generative models) was that students tended to produce work that did not have a proper thread of reasoning in it. There was a tendency to repeat things they had read but rehashed in various ways.

Reviewing some of their texts it was clear that much of the writing - by law tutors - was of the same kind. Much was incorrect. The fact that someone at some time had said a particular case was a proposition for something, meant that got repeated from book to book. Many authors simply didn't read their sources or check their references. Students repeated what they had been told incuriously.

Note: this was a graduate level course. Not wet behind the ears undergraduates.

The worst material was little potted notes produced for law students. Utterly awful material in most cases.

Anyway, when LLMs became a thing, a lot of what did not feel right about their output, and many of their error patterns, reminded me of the experience of teaching masters' students.

One of the saving graces of English court room practice (when I did that sort of thing) was that judges would say to you "where does it say that?" in a case you cited. You had better have them all at your fingertips and know exactly where you had cited. That avoided a lot of hallucination.

Just a random remark which might be of interest.

aitchnyu 3 days ago |

Tangential, is there a "test suite/CI" for AI writing legal documents? Long back in terms of AI progress, a lawyer filed something with hallucinated sources. Do new tools prevent this?
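As a rough illustration of what the smallest possible "CI check" for hallucinated sources could look like: extract every citation from a draft and fail if any of them is missing from a trusted index. This is a sketch under heavy assumptions. The regex only covers a couple of U.S. reporter formats, `known_citations` stands in for a real verified case-law database, and the function names are hypothetical. It also only checks that a citation exists, not that the case actually says what the draft claims.

```python
import re

# Deliberately simplified pattern for citations like "347 U.S. 483" or "999 F.3d 1234".
# Real citation formats are far more varied; this is an assumption for the sketch.
CITATION_RE = re.compile(r"\b\d{1,3} (?:U\.S\.|F\.2d|F\.3d) \d{1,4}\b")


def extract_citations(text):
    """Return all citation-looking strings in order of appearance."""
    return CITATION_RE.findall(text)


def unverified_citations(text, known_citations):
    """Return citations in the draft that are absent from the trusted index."""
    missing = []
    for citation in extract_citations(text):
        if citation not in known_citations and citation not in missing:
            missing.append(citation)
    return missing


# Usage: the second citation is invented on purpose to show a failure.
known_citations = {"347 U.S. 483"}
draft = "As held in 347 U.S. 483, and later in 999 F.3d 1234, the rule applies."
problems = unverified_citations(draft, known_citations)
```

A build step could simply refuse to pass while `problems` is non-empty. Catching a real citation that is misquoted or cited for the wrong proposition would need a human (or a much stronger check) reading the source.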

expedition32 3 days ago |

America has the jury system- which means you have to be a good actor.

Making people believe that the 14 year old girl is a slut that was raping your poor client- THAT is lawyering.

RataNova 3 days ago |

I'd read this less as "AI replaces law professors" and more as "AI may be a surprisingly strong first-pass tutor, especially when the student knows enough to question it"

songting591 3 days ago |

The interesting shift isn't whether AI beats law professors on tests; it's what happens to the value chain after that threshold is crossed.

When AI clears the knowledge bar in a domain, the remaining moat becomes trust, accountability, and local regulatory context. That's actually good news for niche SaaS builders targeting specific jurisdictions: the generic AI layer commoditizes, but the "AI + local compliance + human accountability" bundle still has real pricing power.

Curious whether anyone has seen this play out already in contract review or compliance tooling outside the US.

throw7 4 days ago |

Oh, a "Human-Centered" study by an AI lover:

Julian Nyarko

    Professor of Law
    Co-Chair Stanford Law AI Initiative
    Senior Fellow, Stanford Institute for Human-Centered AI (HAI)

LOL!

king_zee 4 days ago |

I think there will be a market for firms that aggressively market themselves as non-AI, and then as more people turn towards that human connection we'll go full circle

vessenes 3 days ago |

* Gemini 2.5 Pro (no outside resources), and
* NotebookLM (not versioned -- with added legal resources).

NotebookLM was considered slightly better than 2.5 Pro by the evaluators.

dguest 3 days ago |

I'm not a lawyer, I program.

My understanding is that Civil Law (most of the world excluding UK, US, AU) is like a program: you feed it a situation, it outputs a decision, every once in a while you edit it.

Common Law (UK, US) isn't really a program, but you could stretch and say it's a state machine that has been running since the country started. Every interaction sets a new precedent and changes the state. But the programming analogy falls apart because no one in their right mind would design such a program.

LLMs might actually be the best example of such a program though: Common Law is basically one long chat with an LLM, hundreds of years long.

Before LLMs came along, a Common Law system seemed to have a finite time limit before it's co-opted by wealthy people with the resources to read the whole history. Now I think maybe we can push it a bit further.

But it's still a terrible program.
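The analogy can be written down as a toy sketch: civil law as a pure lookup against a fixed statute, common law as a step function whose state (the set of precedents) is changed by every case it decides. This is a caricature of both systems, and all names and example rulings below are made up for illustration.

```python
def civil_decide(statute, situation):
    """Civil law as a pure function: same statute + same situation -> same decision."""
    return statute.get(situation, "no rule")


def common_law_step(precedents, case):
    """One state transition: decide a case and return (decision, new_state).

    case is (situation, novel_ruling); novel_ruling is what the court would
    decide if no precedent exists yet for that situation.
    """
    situation, novel_ruling = case
    decision = precedents.get(situation, novel_ruling)  # follow precedent if any
    new_state = {**precedents, situation: decision}     # the case itself becomes precedent
    return decision, new_state


def run_docket(cases, precedents=None):
    """Feed cases through the state machine in order."""
    state = dict(precedents or {})
    decisions = []
    for case in cases:
        decision, state = common_law_step(state, case)
        decisions.append(decision)
    return decisions, state


# Usage: the second court would have ruled differently, but is bound by the first.
decisions, final_state = run_docket([
    ("late delivery", "damages"),
    ("late delivery", "no damages"),
])
```

The order of the docket determines the outcome, which is the property that makes the real thing so hard to reason about without reading the whole history.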

cess11 3 days ago |

I skimmed portions of the study but didn't manage to figure out whether this actually measures a preference for confident mediocrity.

Danox 3 days ago |

Sure it does AI multiple IPOs incoming...

tipsytoad 3 days ago |

Curious how they do a “blind” preference test. To any evaluator I’m sure it’s quite clear which answer is AI vs human.

teiferer 3 days ago |

Question is: if a legal question is answered incorrectly by an LLM, who is going to be held responsible?

IFC_LLC 3 days ago |

This is exactly what LLMs are designed to do. Double up a lot of data and find connections and patterns in it.

So no wonder on this point.

One thing I want to mention: Law != Justice.

So while LLMs are awesome at the law study they will suck at justice. Just because one has to solve very emotional problems with it at times. And LLMs are not that good at finding the correct emotion.

gaiagraphia 4 days ago |

Incredible that the common people will be able to wrestle the right to rule of law away from the bloated legal caste, who have built themselves quite the moat.

The inaccessibility of justice is a huge driver of inequality. Any tools which bridge this gap will help make a more just society.

atleastoptimal 3 days ago |

And this was done with Gemini 2.5

By the time any research study on AI is published, the models are already 0.5-1 generation ahead. Even this bullish outcome for AI models and their ability to perform useful work does not reflect how good they are now.

t0lo 3 days ago |

Library outperforms student... more news at 9

u1hcw9nx 3 days ago |

After a quick look at the study details and statistics, it does not look very definitive one way or another.

I mean, LLMs do OK with tutoring, but it depends more on how unique the questions are, not how difficult they are.

Thaxll 4 days ago |

AI will never convince a jury though.

lp4v4n 3 days ago |

Honestly it's not surprising that AI provided answers that were flagged less often as "pedagogically harmful" if we take into account that LLMs somehow create an "average" of all the knowledge they ingested.

Eufrat 3 days ago |

What is the point of this conclusion? That law professors like the tone and verbosity of AI slop? Okay?

NoSalt 3 days ago |

Uh, oh ... AI is in for it now. It has rankled the ire of lawyers. ;-D

tj_hustler_1966 3 days ago |

interesting

34981t 4 days ago |

He is basically an AI professor for law. This study just confirms his existence:

https://juliannyarko.com/

Stanford and its donors of course want to replace anyone but its administrators, so they cheer on such anti-intellectual nonsense.

flanked-evergl 3 days ago |

...

rimliu 3 days ago |

Yes yes, the IPO is near.

bko 4 days ago |

Marc Andreessen argued that we've already reached AGI. He says that the top AI models give better answers than 99% of people he has access to, and he has access to some of the best people in their field.

I'm getting more convinced. I mean, sure it makes dumb mistakes sometimes, but it's a particular set of self-serving mistakes, like commenting out tests in order to pass. We obviously don't want this behavior, but I wouldn't say it's dumb.

It'll be like the Turing test, which we just blew past years ago and no one cared. After all the hand-wringing about sentience and rights of the AI if it passes the Turing test, and now we just have AI bots running 24/7 writing slop.

How does everyone else feel?

t0lo 3 days ago |

More great news from the prestigious university where 40% of students claim they are disabled

https://fortune.com/article/rise-in-elite-students-seeking-a...

and where they wanted to ban words such as "chief", "stupid", "karen" and "American"

https://reason.com/2022/12/21/stanford-elimination-harmful-l...