How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation

https://simonwillison.net/2025/May/24/sean-heelan/ • May 28, 2025 17:23

Excerpt

Sean Heelan: > The vulnerability [o3] found is CVE-2025-37899 (fix [here](https://github.com/torvalds/linux/commit/2fc9feff45d92a92cd5f96487655d5be23fb7e2b)), a use-after-free in the handler for the SMB 'logoff' command. Understanding the vulnerability requires reasoning about concurrent connections to …

Summary

Main Summary

A significant advance in security research has been achieved using o3, a large language model (LLM), which identified CVE-2025-37899, a zero-day use-after-free vulnerability in the Linux kernel's SMB implementation. The vulnerability resides in the handler for the SMB 'logoff' command, and understanding it requires reasoning about concurrent connections to the server and how they share objects. o3 was able to reason about the code and spot a location where an object that is not reference counted is freed while still accessible by another thread. As far as Sean Heelan is aware, this is the first public discussion of a vulnerability of this nature being found by an LLM, which he reads as a leap forward in the ability of these models to reason about code. The discovery underscores that LLMs are not about to replace expert researchers; instead they can make them more efficient and effective, with a reasonable chance of solving, or helping to solve, problems that can be represented in fewer than 10,000 lines of code.

Key Points

Zero-Day Vulnerability Discovery: The o3 LLM identified CVE-2025-37899, a use-after-free vulnerability in the …

Content

How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation (via) Sean Heelan:

> The vulnerability [o3] found is CVE-2025-37899 (fix [here](https://github.com/torvalds/linux/commit/2fc9feff45d92a92cd5f96487655d5be23fb7e2b)), a use-after-free in the handler for the SMB 'logoff' command. Understanding the vulnerability requires reasoning about concurrent connections to the server, and how they may share various objects in specific circumstances. o3 was able to comprehend this and spot a location where a particular object that is not referenced counted is freed while still being accessible by another thread. As far as I'm aware, this is the first public discussion of a vulnerability of that nature being found by a LLM.
>
> Before I get into the technical details, the main takeaway from this post is this: with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention. If you're an expert-level vulnerability researcher or exploit developer the machines aren't about to replace you. In fact, it is quite the opposite: they are now at a stage where they can make you significantly more efficient and effective. If you have a problem that can be represented in fewer than 10k lines of code there is a reasonable chance o3 can either solve it, or help you solve it.
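The bug class in the first quoted paragraph (an object with no reference count, freed by one thread while another can still reach it) can be made concrete with a toy simulation. This is a sketch only: it is plain Python, not kernel code, every name in it (`User`, `Session`, `logoff_unsafe`, `worker_counted` and so on) is hypothetical, and the reference-counted variant shows one generic remedy rather than what the linked fix commit actually does. The "other thread" is modelled as a callback so the bad interleaving is deterministic.

```python
class User:
    """Stand-in for a heap object; `freed` marks memory that was released."""
    def __init__(self, name):
        self.name = name
        self.freed = False
        self.refs = 1  # only consulted by the reference-counted variant


class Session:
    """A session reachable from two connections at once."""
    def __init__(self, user):
        self.user = user


def logoff_unsafe(session):
    # Connection A: frees the user immediately, without checking other holders.
    if session.user is not None:
        session.user.freed = True
        session.user = None


def worker_unsafe(session, other_thread_step):
    # Connection B: copies the raw pointer, then the other thread runs.
    user = session.user
    other_thread_step()
    return user.freed  # True means B is now touching freed memory


def get_user(session):
    # Take a reference; real code would need this to be atomic or locked.
    user = session.user
    if user is not None:
        user.refs += 1
    return user


def put_user(user):
    # Drop a reference; the object is released only when the last one goes.
    user.refs -= 1
    if user.refs == 0:
        user.freed = True


def logoff_counted(session):
    # Connection A: detaches the user and drops only the session's reference.
    user, session.user = session.user, None
    if user is not None:
        put_user(user)


def worker_counted(session, other_thread_step):
    # Connection B: holds its own reference across the other thread's logoff.
    user = get_user(session)
    other_thread_step()
    still_valid = not user.freed
    put_user(user)
    return still_valid
```

Calling `worker_unsafe(s, lambda: logoff_unsafe(s))` on a fresh session reports that the worker ends up holding a freed object, while `worker_counted(s, lambda: logoff_counted(s))` keeps the object alive until the worker drops its reference.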

Sean used my LLM tool to help find the bug! He ran it against the prompts he shared in this GitHub repo using the following command:

llm --sf system_prompt_uafs.prompt              \
    -f session_setup_code.prompt                \
    -f ksmbd_explainer.prompt                   \
    -f session_setup_context_explainer.prompt   \
    -f audit_request.prompt

Sean ran the same prompt 100 times, so I'm glad he was using the new, more efficient fragments mechanism.
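As a rough illustration of what repeating that command 100 times could look like, here is a minimal Python sketch. The fragment file names and the `--sf` / `-f` flags are taken from the command above; the helper names `build_command` and `run_many`, the `runs/` output directory and the file naming scheme are assumptions made for illustration, not Sean's actual harness.

```python
import subprocess
from pathlib import Path

# Fragment files exactly as named in the command above.
SYSTEM_FRAGMENT = "system_prompt_uafs.prompt"
FRAGMENTS = [
    "session_setup_code.prompt",
    "ksmbd_explainer.prompt",
    "session_setup_context_explainer.prompt",
    "audit_request.prompt",
]


def build_command():
    """Return the llm invocation as an argument list (no shell quoting needed)."""
    cmd = ["llm", "--sf", SYSTEM_FRAGMENT]
    for fragment in FRAGMENTS:
        cmd += ["-f", fragment]
    return cmd


def run_many(runs=100, out_dir="runs"):
    """Run the same prompt `runs` times, saving each response to its own file.

    The output directory and file names are hypothetical choices.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for i in range(runs):
        result = subprocess.run(build_command(), capture_output=True, text=True)
        (out / f"run_{i:03d}.txt").write_text(result.stdout)
```

Calling `run_many()` from the directory that holds the prompt files would leave one response per file, ready to be triaged by hand or by a second pass.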

o3 found his first, already-known vulnerability in 8 of 100 runs, but found the brand-new one in just 1 of the 100 runs it performed with a larger context.
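Those hit rates explain why repeated runs matter. The sketch below estimates the chance of seeing a bug at least once in a batch of runs, and how many runs a given confidence would need. It assumes each run is independent and has the same fixed hit rate, which is a simplification; the 1-in-100 figure in particular rests on a single observation, so treat any number derived from it as very rough. The function names are mine.

```python
import math


def p_at_least_one(hit_rate, runs):
    """Probability that at least one of `runs` independent runs finds the bug."""
    return 1 - (1 - hit_rate) ** runs


def runs_needed(hit_rate, confidence):
    """Smallest number of runs giving at least `confidence` chance of one hit."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - hit_rate))


KNOWN_BUG_RATE = 8 / 100   # known vulnerability: 8 hits in 100 runs
NEW_BUG_RATE = 1 / 100     # new vulnerability: 1 hit in 100 larger-context runs

if __name__ == "__main__":
    for label, rate in [("known", KNOWN_BUG_RATE), ("new", NEW_BUG_RATE)]:
        print(label, round(p_at_least_one(rate, 100), 3), runs_needed(rate, 0.95))
```

Under these assumptions a single run is unlikely to surface either bug, which is why a batch of many cheap runs is the natural unit of work here.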

I thoroughly enjoyed this snippet which perfectly captures how I feel when I'm iterating on prompts myself:

> In fact my entire system prompt is speculative in that I haven’t ran a sufficient number of evaluations to determine if it helps or hinders, so consider it equivalent to me saying a prayer, rather than anything resembling science or engineering.

Sean's conclusion with respect to the utility of these models for security research:

> If we were to never progress beyond what o3 can do right now, it would still make sense for everyone working in VR [Vulnerability Research] to figure out what parts of their work-flow will benefit from it, and to build the tooling to wire it in. Of course, part of that wiring will be figuring out how to deal with the signal to noise ratio of ~1:50 in this case, but that’s something we are already making progress at.