Show HN: Aroma: Every TCP Proxy Is Detectable with RTT Fingerprinting

Original link: https://github.com/Sakura-sx/Aroma

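## Aroma: Proxy Detection Summary

Aroma is a proof-of-concept system that detects TCP proxies by analyzing TCP round-trip time (RTT) measurements. It does **not** use IP intelligence, and it currently **cannot** detect VPNs or non-TCP proxies, although the underlying technique *could* be applied to other kinds of proxies.

Aroma computes a score as the ratio of the minimum TCP RTT to the smoothed TCP RTT (tcpi_min_rtt/tcpi_rtt), using data obtained through Fastly Custom VCL from the Linux kernel. Currently, a score below 0.1 flags the connection as a likely TCP proxy. Normal connections typically score between 1 and 0.7, while unstable connections may fall between 0.7 and 0.3.

Interestingly, Aroma *does* detect Cloudflare WARP, which operates as a UDP-to-TCP proxy. The system relies on the principle that a proxy adds RTT, which network timing analysis can detect.

**Important:** This is a demo project and is **not production-ready**. The code is available for experimentation and contributions, and can be deployed with Fastly or adapted to other hosting environments. The demo is at [https://aroma.global.ssl.fastly.net/](https://aroma.global.ssl.fastly.net/), and the score is available at [https://aroma.global.ssl.fastly.net/score](https://aroma.global.ssl.fastly.net/score).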

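## Aroma: Detecting TCP Proxies with RTT Fingerprinting

Sakura-sx presents a method called "Aroma" that detects TCP proxies by analyzing the relationship between the minimum round-trip time (RTT) and the smoothed RTT. The system is demonstrated using Fastly's infrastructure and Linux kernel data, and computes a score as `tcpi_min_rtt / tcpi_rtt`. A score below 0.1 indicates a likely proxy.

The discussion highlighted the technique's limitations: it may struggle with complex setups such as Tor, and it depends on accurate knowledge of proxies. Countermeasures include artificially adjusting latency, but doing so introduces detectable anomalies.

Although not foolproof, Aroma provides a valuable signal for identifying proxies, particularly those used for web scraping. It successfully detects residential proxies about 50% of the time. The author acknowledges the ongoing cat-and-mouse game between detection and evasion, and the possibility that more sophisticated proxy services will emerge.

Many commenters pointed to existing services such as Layer3Intel and sshuttle, which rely on similar principles. The project aims to show that detection is feasible, paving the way for large CDNs to build more robust systems.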

Original text

Important

  • Aroma does not use any kind of IP intelligence information, although IP intelligence can be used to complement Aroma.
  • Currently the score threshold for detection is set very low to avoid false positives, so even when Aroma doesn't flag a proxy it may still give it a low score (0.3-0.1 is very low but is not flagged as a proxy).
  • The current code is not ready for production; it's just to prove a point.
  • Aroma does not currently detect VPNs or any kind of proxy that isn't a TCP proxy. Aroma may detect VPNs that use TCP proxying, and the techniques used in Aroma are not limited to TCP and can be applied to other kinds of proxies, but for simplicity and technical reasons (there are variables of the connection that Fastly does not expose to me) it only targets TCP proxies for now.

A demo of Aroma detecting Cloudflare WARP (higher score is better):

Aroma WARP demo

Note

I have to admit I was a bit surprised that Aroma was detecting WARP, since I thought it was a VPN, but apparently it acts like a UDP => TCP proxy. If Aroma doesn't detect your VPN, that's normal and means your VPN is doing Layer 3 proxying. If your VPN is detected it's doing Layer 4 proxying (some privacy VPNs do this on web ports for privacy reasons).

If you want to check out Aroma for yourself, you can go to:

https://aroma.global.ssl.fastly.net/

You should see an "allowed" page if you are not using a TCP proxy, and a block page if you are using one.

If you want to get your score you can go to https://aroma.global.ssl.fastly.net/score.

This is done by measuring the minimum TCP RTT seen (client.socket.tcpi_min_rtt) and the smoothed TCP RTT (client.socket.tcpi_rtt). I get this data through Fastly Custom VCL; Fastly gets it from the Linux kernel (struct tcp_info -> tcpi_min_rtt and tcpi_rtt). I am using Fastly for the demo since they have PoPs all around the world and they expose TCP socket data to me.
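For anyone hosting this outside Fastly, here is a minimal Python sketch of reading the same two fields from a connected socket on Linux. It is not part of the Aroma repository. The byte offsets are assumptions derived from the field order of struct tcp_info in linux/tcp.h (tcpi_min_rtt needs a kernel new enough to report it), and the names `parse_rtts` and `read_rtts` are hypothetical, so verify the layout on your own system.

```python
import socket
import struct
from typing import Tuple

# Assumed byte offsets of two __u32 fields inside struct tcp_info (linux/tcp.h).
# Check them against the headers of the kernel you actually run.
TCPI_RTT_OFFSET = 68
TCPI_MIN_RTT_OFFSET = 148
TCP_INFO_BUF_SIZE = 256  # request more than needed; the kernel returns what it has


def parse_rtts(tcp_info: bytes) -> Tuple[int, int]:
    """Return (tcpi_rtt, tcpi_min_rtt) in microseconds from a raw tcp_info buffer."""
    if len(tcp_info) < TCPI_MIN_RTT_OFFSET + 4:
        raise ValueError("tcp_info buffer too short to contain tcpi_min_rtt")
    (rtt_us,) = struct.unpack_from("=I", tcp_info, TCPI_RTT_OFFSET)
    (min_rtt_us,) = struct.unpack_from("=I", tcp_info, TCPI_MIN_RTT_OFFSET)
    return rtt_us, min_rtt_us


def read_rtts(sock: socket.socket) -> Tuple[int, int]:
    """Ask the kernel for a connected TCP socket's RTT estimates (Linux only)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, TCP_INFO_BUF_SIZE)
    return parse_rtts(raw)
```

On a server, `read_rtts` would be called on the accepted client socket after some data has been exchanged, so that the kernel has RTT samples to report.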

The score is calculated by doing tcpi_min_rtt/tcpi_rtt. It's simple but it's what worked best for this with the data Fastly gives me. Based on my testing, 1-0.7 is normal, 0.7-0.3 is normal if the connection is somewhat unstable (WiFi, mobile data, satellite...), 0.3-0.1 is low and may be a proxy, anything lower than 0.1 is flagged as TCP proxy by the current code.
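The scoring rule above can be sketched in a few lines of Python. This is an illustration, not the VCL that ships in the repository: the function names are hypothetical, and how the exact boundary values (0.7, 0.3, 0.1) are assigned to bands is an assumption, since the text only gives ranges.

```python
from typing import Optional

PROXY_THRESHOLD = 0.1  # scores below this are flagged, per the ranges above


def aroma_score(min_rtt_us: int, rtt_us: int) -> Optional[float]:
    """tcpi_min_rtt / tcpi_rtt, or None when no smoothed RTT is available yet."""
    if rtt_us <= 0:
        return None
    return min_rtt_us / rtt_us


def classify(score: float) -> str:
    """Map a score onto the bands described above (boundary handling is assumed)."""
    if score < PROXY_THRESHOLD:
        return "flagged as TCP proxy"
    if score < 0.3:
        return "low, may be a proxy"
    if score < 0.7:
        return "normal for an unstable connection"
    return "normal"
```

For example, a connection with a 9ms minimum RTT and a 10ms smoothed RTT scores 0.9 and is classified as normal, while 5ms against 100ms scores 0.05 and is flagged.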

Warning

The current code is not production-ready. First of all, the contents of the oldcode/ folder are old Python code that I used for some testing. It's not complete, but feel free to make a PR if you want to complete the server, I don't mind :)

If you want to use Fastly, you can upload aroma.vcl to Custom VCL on Fastly.

If you want to modify anything, you can edit aroma.vcl.tpl and run build_vcl.py to generate a new aroma.vcl file. Then you can upload the file following the previous instructions.

If you want to host this somewhere else, this README file explains how this works. Using that information you can make the code for it to run wherever you want.

According to Special Relativity, information cannot travel faster than the speed of light.

Therefore, if the round trip time (RTT) is 4ms, it's physically impossible for the other end to be farther than 2 light-milliseconds away, which is approximately 600 kilometers.

We can make some assumptions. For example, light travels approximately 33% slower in fiber optic cables, so the distance is likely roughly 400 kilometers or less. Depending on the hop count, we can narrow this down further, and we can also use information about the route, but as a proof of concept I'll use a simpler method.
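The distance bound above is simple arithmetic, sketched here in Python. The function name and the `slowdown` parameter are hypothetical; the 33% figure is the rough fiber estimate from the text, not a measured value.

```python
SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # kilometers per millisecond in vacuum
FIBER_SLOWDOWN = 0.33  # rough estimate from the text: light is ~33% slower in fiber


def max_distance_km(rtt_ms: float, slowdown: float = 0.0) -> float:
    """Upper bound on the one-way distance implied by a round trip time."""
    one_way_ms = rtt_ms / 2  # the signal must go there and come back
    return one_way_ms * SPEED_OF_LIGHT_KM_PER_MS * (1 - slowdown)
```

`max_distance_km(4)` comes out just under 600 km, and `max_distance_km(4, FIBER_SLOWDOWN)` comes out at roughly 400 km, matching the figures above. Real paths are never straight lines, so the true distance is usually well below this bound.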

Network timing is way more than just RTT

There is no single RTT. At the very least you have Layer 3 RTT, Layer 4 RTT (TCP), and Layer 7 RTT (HTTP), and if the connection is encrypted, you probably also have TLS RTT.

There is also an initial RTT, a client hello delay, and many other timings. Wikipedia has a great article on the topic.

What can we do with this?

Well, in a normal connection, all the RTT measurements should be somewhat similar once we account for protocol overhead and we use smoothed values to account for jitter.

But a proxy raises the RTT measured over the protocols it relays. For example, if the proxy is a TCP proxy, the HTTP RTT and the TLS RTT will be higher than the TCP RTT and the Layer 3 RTT.

Continuing the example, let's say the proxy user is in Australia and the proxy is in the United States. The proxy connects to a server. Ideally we would have as many servers as possible around the world, so let's say the proxy connects to a server very near it and has a 10ms RTT. (Based on my testing with Fastly, the TCP RTT for a request to the closest Fastly edge server is usually less than 1ms, and from my computer I get about 10ms RTT, and I don't have a particularly good connection.)

Because the proxy user is in Australia, let's say the proxy user has 160ms RTT to the proxy.

In this example, the measured RTT would be 10ms for L3 and L4 (TCP RTT), but 170ms for TLS and HTTP. The HTTP RTT can be measured by seeing how long it takes the client to follow a redirect or fetch a resource such as CSS or JavaScript.

In situations where we can see that the proxy RTT is less than 1ms, the client using the proxy is limited to very nearby proxies to avoid meaningfully increasing the RTT.

A simple algorithm for calculating the score would be (proxy RTT)/(non-proxy RTT), which is what I will be doing in this proof of concept.
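As a worked example, here is the Australia scenario from above run through that ratio in Python. Reading "proxy RTT" as the RTT that terminates at the proxy (TCP) and "non-proxy RTT" as the end-to-end RTT (TLS/HTTP) is my interpretation, and the function and variable names are hypothetical.

```python
def layered_score(lower_layer_rtt_ms: float, upper_layer_rtt_ms: float) -> float:
    """Ratio of the RTT that stops at the nearest hop to the end-to-end RTT."""
    if upper_layer_rtt_ms <= 0:
        raise ValueError("upper-layer RTT must be positive")
    return lower_layer_rtt_ms / upper_layer_rtt_ms


server_to_proxy_ms = 10   # server <-> proxy in the United States
proxy_to_user_ms = 160    # proxy <-> user in Australia

tcp_rtt_ms = server_to_proxy_ms                      # TCP terminates at the proxy
http_rtt_ms = server_to_proxy_ms + proxy_to_user_ms  # TLS/HTTP go end to end

proxied = layered_score(tcp_rtt_ms, http_rtt_ms)  # 10 / 170, about 0.059
direct = layered_score(170, 170)                  # no proxy: both layers agree
```

The proxied connection lands well below 0.1, while a direct connection with the same end-to-end latency scores 1.0.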

Any sufficiently large internet service can collect all the network timing data it can, establish a baseline, compare it to the data known proxies generate, and use this to create scoring algorithms that measure the likelihood of a proxy.
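One hedged way to picture that baseline comparison is the Python sketch below. It picks a cutoff from baseline scores so that only a chosen fraction of normal traffic would be flagged, then checks how many known-proxy scores fall under it. The function names are hypothetical and this is only one simple approach, not something the project implements.

```python
from typing import Sequence


def pick_threshold(baseline_scores: Sequence[float], max_false_positive_rate: float) -> float:
    """Cutoff such that at most the given fraction of baseline scores fall strictly below it."""
    if not baseline_scores:
        raise ValueError("need at least one baseline score")
    if not 0 <= max_false_positive_rate < 1:
        raise ValueError("rate must be in [0, 1)")
    ordered = sorted(baseline_scores)
    allowed = int(len(ordered) * max_false_positive_rate)  # baseline samples we may flag
    return ordered[allowed]


def detection_rate(proxy_scores: Sequence[float], threshold: float) -> float:
    """Fraction of known-proxy scores that fall strictly below the threshold."""
    if not proxy_scores:
        raise ValueError("need at least one proxy score")
    flagged = sum(1 for score in proxy_scores if score < threshold)
    return flagged / len(proxy_scores)
```

With ten baseline scores and a 10% false-positive budget, the cutoff is the second-lowest baseline score, so at most one normal connection is flagged. A real service would use far more data and more than one timing signal.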
