Is AI solving proofs—or just dividing our opinions?
March 25, 2026
A new challenge reveals how well AI can tackle true math problems
By Kendra Pierre-Louis, Joseph Howlett, Fonda Mwangi & Alex Sugiura
Kendra Pierre-Louis: For Scientific American’s Science Quickly, I’m Kendra Pierre-Louis, in for Rachel Feltman.
In 1997, Deep Blue, a supercomputer built by IBM, did the unexpected: it defeated chess giant Garry Kasparov at his own game, leading to a flurry of headlines about whether Deep Blue was truly intelligent and if computers could now outthink humans. The answer, at least then, was mostly no.
But it’s now 2026, and we have a growing number of generative AI models that are once again making us wonder, “Can machines outthink us?” To dig into this question, a group of researchers aren’t turning to chess this time—they’re looking to math.
To learn more about that, I talked to Joe Howlett, a staff reporter here at SciAm covering math. Thanks for joining us today, Joe.
Joe Howlett: Thank you for having me.
Pierre-Louis: So you wrote a piece that’s talking about the challenges of AI and math. Before we kinda get into the meat and potatoes of that piece, I have a—maybe a more basic question for you.
Howlett: Yeah.
Pierre-Louis: For those of us who maybe peaked with high-school algebra, when you’re talking about AI and math problems, what are the kind of math problems we’re really talking about?
Howlett: That’s actually a lot of what this story’s about: the kind of questions that mathematicians ask and spend their time thinking about don’t really sound like, or have much in common with, the problems that we work on for homework in math class.
Pierre-Louis: Mm-hmm.
Howlett: If you’ve recently taken a math class, you’re used to problems that have answers, right?
Pierre-Louis: Mm-hmm.
Howlett: And the answer is, like, a number ...
Pierre-Louis: Yep.
Howlett: Or something. And you hand in your homework, and the teacher can check that number [Laughs], if it’s the right number or the wrong number, and they give you a grade.
But what research mathematicians are doing is trying to prove that statements are either true or false about the mathematical universe. So what does that mean? Like, you know about triangles and squares and basic shapes, but there’s ...
Pierre-Louis: I did graduate from kindergarten, yes. [Laughs.]
Howlett: [Laughs.] That’s right, exactly. That’s about as far as I made it, too.
There’s way more complicated shapes that exist in many dimensions and have weird curvatures that you can’t even picture in your mind. But mathematicians are able to say things about them, right? Using equations and using proofs, they’re able to learn about these objects that we can’t actually see or picture.
Pierre-Louis: So now that we kind of know what math is, in [one of your pieces] you note that LLMs have had some mathematical wins, like Google Gemini Deep Think achieved a gold-level score on the International Mathematical Olympiad and that AI has solved multiple “Erdős problems.” Why isn’t that enough to show AI’s math prowess?
Howlett: Yeah, I mean, the thing about most of these so-called “benchmarks,” as they call them, is that for a lot of reasons AI companies have fixated on mathematics as, like, the next thing to prove ...
Pierre-Louis: Mm-hmm.
Howlett: That LLMs can think, or to take a step towards intelligence. But most of those examples, like you said, they have more in common with the kind of test questions and homework problems that we were just talking about, not really looking like ...
Pierre-Louis: Mm-hmm.
Howlett: Research math, right, which is more about proving statements about the world and exploring that world, posing questions that are interesting.
So in a way all of those accomplishments are very impressive. [Laughs.] It’s crazy that a computer can win gold at the IMO ...
Pierre-Louis: Mm-hmm.
Howlett: But it doesn’t say much about whether and to what extent