Hard problems that reduce to document ranking

原始链接: https://noperator.dev/posts/document-ranking-for-complex-problems/

LLMs excel at listwise document ranking, and surprisingly, many complex problems can be reframed and solved using this technique. I have successfully applied it to N-day vulnerability detection, showing that patch diffing can be recast as ranking diffs (documents) by their relevance to a security advisory (query). I demonstrated this approach at RVAsec ’24 and DistrictCon ’25, ultimately using GPT-4o mini to locate a specific vulnerability among 1,600+ changed functions for only 30 cents. My command-line tool raink further validates the concept. Document ranking extends beyond vulnerability detection; it can pinpoint candidate fuzzing targets or prioritize web application injection points. Future improvements include iteratively analyzing the top-ranked results and generating verifiable proof-of-concept exploits for vulnerabilities. Inspired by the success of fuzzing, I believe this approach deserves further exploration, echoing a similar sentiment: “Reasons for the Unreasonable Success of LLMs.”


Original text

There are two claims I’d like to make:

  1. LLMs can be used effectively for listwise document ranking.
  2. Some complex problems can (surprisingly) be solved by transforming them into document ranking problems.

I’ve primarily explored both of these claims in the context of using patch diffing to locate N-day vulnerabilities—a sufficiently domain-specific problem that can be solved using general-purpose language models as comparators in document ranking algorithms. I demonstrated at RVAsec ’24 that listwise document ranking can be used to locate the specific function in a patch diff that actually fixes a vulnerability described by a security advisory, and later wrote on the Bishop Fox blog in greater defense of listwise ranking, publishing a command-line tool implementation (raink) to prove the idea.

The key insight is that instead of treating patch diffing as a complex problem requiring specialized security engineering knowledge, you can reframe it as ranking diffs (documents) by their relevance to a security advisory (query), applying proven document ranking techniques from information retrieval.
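To make the reframing concrete, here is a minimal sketch in Python—an illustration of the idea, not raink’s actual implementation. It assumes the OpenAI Python client, an `advisory` string as the query, and a hypothetical `diffs` mapping of function names to their patch hunks as the documents.

```python
# Minimal sketch of listwise ranking with an LLM as comparator.
# Not raink's implementation; `diffs` and the prompt are illustrative.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rank_diffs(advisory: str, diffs: dict[str, str]) -> list[str]:
    """Listwise ranking: show the model all documents at once and ask
    for a single relevance-ordered list of their IDs."""
    docs = "\n\n".join(f"[{name}]\n{diff}" for name, diff in diffs.items())
    prompt = (
        "Rank the following function diffs from most to least likely to "
        "fix the vulnerability described in the advisory. Respond with a "
        "JSON array of the bracketed IDs only.\n\n"
        f"Advisory:\n{advisory}\n\nDiffs:\n{docs}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # A real tool would validate this output; models occasionally return
    # malformed JSON or invent IDs.
    return json.loads(resp.choices[0].message.content)
```

A thousand-plus diffs won’t fit in one prompt, of course; a raink-style implementation instead ranks smaller shuffled batches of documents repeatedly and aggregates the per-batch orderings into a global ranking.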


Using this technique, I proved at DistrictCon ’25 that GPT-4o mini could locate a fixed vulnerability in a haystack of over 1600 changed (and stripped!) functions in a patch—costing only 5 minutes and 30 cents to do so.

Document ranking can be applied to other offensive security problems, like identifying candidate functions as fuzzing targets (in addition to using LLMs to auto-generate harnesses), or prioritizing potential injection points in a web application for deeper testing. A few potentially powerful improvements to this technique:

  • Analyze the top N ranked results, and then apply the same ranking algorithm to the analyses (sketched after this list).
  • Make the ranked results verifiable; e.g., for N-day vulnerabilities, use an LLM to generate an automatically testable proof-of-concept exploit.
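
To illustrate the first improvement, here is a hypothetical refinement pass that reuses `client` and `rank_diffs` from the sketch above: the analyses of the top-ranked diffs are themselves just documents, so the same ranker applies to them.

```python
# Hypothetical second pass, reusing `client` and `rank_diffs` from the
# earlier sketch; the prompt and helper names are illustrative.
def analyze(advisory: str, diff: str) -> str:
    """Ask the model to briefly assess whether this diff fixes the advisory."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": (
            "Does this diff fix the vulnerability in the advisory? "
            f"Explain briefly.\n\nAdvisory:\n{advisory}\n\nDiff:\n{diff}"
        )}],
    )
    return resp.choices[0].message.content

def refine(advisory: str, diffs: dict[str, str], top_n: int = 10) -> list[str]:
    first_pass = rank_diffs(advisory, diffs)[:top_n]
    analyses = {name: analyze(advisory, diffs[name]) for name in first_pass}
    return rank_diffs(advisory, analyses)  # analyses are documents too
```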

Following Thomas Dullien’s FUZZING ’24 keynote “Reasons for the Unreasonable Success of Fuzzing”, I’m inclined to give a similar talk—“Reasons for the Unreasonable Success of LLMs.”
