Hard problems that reduce to document ranking

原始链接: https://noperator.dev/posts/document-ranking-for-complex-problems/

LLMs excel at listwise document ranking, and surprisingly, many complex problems can be reframed and solved using this technique. I have successfully applied it to N-day vulnerability detection, showing that patch diffing can be recast as ranking diffs (documents) by their relevance to a security advisory (query). I demonstrated this approach at RVAsec ’24 and DistrictCon ’25, ultimately using GPT-4o mini to locate a specific vulnerability among 1,600+ changed functions for only 30 cents. My command-line tool raink further validates the concept. Document ranking extends beyond vulnerability detection; it can pinpoint candidate fuzzing targets or prioritize web application injection points. Future improvements include iteratively analyzing the top-ranked results and generating verifiable proof-of-concept exploits for vulnerabilities. Inspired by the success of fuzzing, I believe this approach deserves further exploration, echoing a similar sentiment: “Reasons for the Unreasonable Success of LLMs.”


Original text

There are two claims I’d like to make:

  1. LLMs can be used effectively for listwise document ranking.
  2. Some complex problems can (surprisingly) be solved by transforming them into document ranking problems.

I’ve primarily explored both of these claims in the context of using patch diffing to locate N-day vulnerabilities—a sufficiently domain-specific problem that can be solved using general-purpose language models as comparators in document ranking algorithms. I demonstrated at RVAsec ’24 that listwise document ranking can be used to locate the specific function in a patch diff that actually fixes a vulnerability described by a security advisory, and later wrote on the Bishop Fox blog in greater defense of listwise ranking, publishing a command-line tool implementation (raink) to prove the idea.

The key insight is that instead of treating patch diffing as a complex problem requiring specialized security engineering knowledge, you can reframe it as ranking diffs (documents) by their relevance to a security advisory (query), applying proven document ranking techniques from information retrieval.
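To make the reframing concrete, here is a minimal sketch in Python—an illustration of the idea, not raink’s actual implementation. It assumes the OpenAI Python client, an `advisory` string as the query, and a hypothetical `diffs` mapping of function names to their patch hunks as the documents.

```python
# Minimal sketch of listwise ranking with an LLM as comparator.
# Not raink's implementation; `diffs` and the prompt are illustrative.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rank_diffs(advisory: str, diffs: dict[str, str]) -> list[str]:
    """Listwise ranking: show the model all documents at once and ask
    for a single relevance-ordered list of their IDs."""
    docs = "\n\n".join(f"[{name}]\n{diff}" for name, diff in diffs.items())
    prompt = (
        "Rank the following function diffs from most to least likely to "
        "fix the vulnerability described in the advisory. Respond with a "
        "JSON array of the bracketed IDs only.\n\n"
        f"Advisory:\n{advisory}\n\nDiffs:\n{docs}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # A real tool would validate this output; models occasionally return
    # malformed JSON or invent IDs.
    return json.loads(resp.choices[0].message.content)
```

A thousand-plus diffs won’t fit in one prompt, of course; a raink-style implementation instead ranks smaller shuffled batches of documents repeatedly and aggregates the per-batch orderings into a global ranking.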


Using this technique, I proved at DistrictCon ’25 that GPT-4o mini could locate a fixed vulnerability in a haystack of over 1600 changed (and stripped!) functions in a patch—costing only 5 minutes and 30 cents to do so.

Document ranking can be applied to other offensive security problems, like identifying candidate functions as fuzzing targets (in addition to using LLMs to auto-generate harnesses), or prioritizing potential injection points in a web application for deeper testing. A few potentially powerful improvements to this technique:

  • Analyze the top N ranked results, and then apply the same ranking algorithm to the analyses (sketched after this list).
  • Make the ranked results verifiable; e.g., for N-day vulnerabilities, use an LLM to generate an automatically testable proof-of-concept exploit.
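
To illustrate the first improvement, here is a hypothetical refinement pass that reuses `client` and `rank_diffs` from the sketch above: the analyses of the top-ranked diffs are themselves just documents, so the same ranker applies to them.

```python
# Hypothetical second pass, reusing `client` and `rank_diffs` from the
# earlier sketch; the prompt and helper names are illustrative.
def analyze(advisory: str, diff: str) -> str:
    """Ask the model to briefly assess whether this diff fixes the advisory."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": (
            "Does this diff fix the vulnerability in the advisory? "
            f"Explain briefly.\n\nAdvisory:\n{advisory}\n\nDiff:\n{diff}"
        )}],
    )
    return resp.choices[0].message.content

def refine(advisory: str, diffs: dict[str, str], top_n: int = 10) -> list[str]:
    first_pass = rank_diffs(advisory, diffs)[:top_n]
    analyses = {name: analyze(advisory, diffs[name]) for name in first_pass}
    return rank_diffs(advisory, analyses)  # analyses are documents too
```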

Following Thomas Dullien’s FUZZING ’24 keynote “Reasons for the Unreasonable Success of Fuzzing”, I’m inclined to give a similar talk—“Reasons for the Unreasonable Success of LLMs.”
