Following up with the concrete operational cost data you suggested would be important. I ran both implementations through the same workload: ingesting 1M certificates, then performing monitor-style read operations:
Write Costs (1M certificates):
- CompactLog: 12,847 storage PUTs
- Sunlight: 287,364 storage PUTs
- Result: Sunlight's writes are 22.4x more expensive

Read Costs (full tree sync, 1000 iterations):
- CompactLog: 82,025 GETs total (mostly cache hits after the first sync)
- Sunlight: 41,030,000 GETs (41,030 per sync × 1000)
- Result: Sunlight's reads are 500x more expensive
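For transparency about how these numbers were gathered: every storage operation was tallied at the HTTP layer. The sketch below shows the general shape of that counting, assuming a wrapper around the S3 client's transport; it's illustrative rather than the exact harness (the full scripts are available, as noted at the end).

```go
// Illustrative sketch of how storage operations were tallied: wrap the
// S3 client's HTTP transport and count requests by method. Names here
// are hypothetical, not the actual benchmark harness.
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// countingTransport counts PUTs and GETs before delegating to the
// underlying RoundTripper.
type countingTransport struct {
	next       http.RoundTripper
	puts, gets atomic.Int64
}

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	switch req.Method {
	case http.MethodPut:
		t.puts.Add(1)
	case http.MethodGet:
		t.gets.Add(1)
	}
	return t.next.RoundTrip(req)
}

func main() {
	ct := &countingTransport{next: http.DefaultTransport}
	client := &http.Client{Transport: ct}
	_ = client // hand this client to the S3/MinIO SDK via its transport option

	// ... run the ingest or sync workload here ...

	fmt.Printf("PUTs: %d, GETs: %d\n", ct.puts.Load(), ct.gets.Load())
}
```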
This exposes fundamental architectural issues with "independent read/write paths." The system has no application-level caching, so every monitor request hits storage directly. That makes it vulnerable to denial-of-funds attacks, where an attacker drives up the operator's S3 bill simply by issuing reads. It also forces an expensive CDN into the deployment, which ironically couples the very paths that are claimed to be independent. Finally, this architecture cannot achieve 0 MMD (Maximum Merge Delay), because independent paths inherently require synchronization delay between them.
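To make "application-level caching" concrete, here's a minimal read-through cache sketch. It is illustrative, not CompactLog's actual code, and fetch is a hypothetical stand-in for an S3 GET. The key property it exploits is that CT artifacts such as tiles and entry bundles are immutable, so cached objects never need invalidation.

```go
// Minimal read-through cache sketch: immutable CT objects are ideal
// cache candidates, so repeated monitor reads need not touch object
// storage at all. Illustrative only.
package cache

import "sync"

type ReadThroughCache struct {
	mu    sync.RWMutex
	data  map[string][]byte
	fetch func(key string) ([]byte, error) // hypothetical S3 GET
}

func New(fetch func(string) ([]byte, error)) *ReadThroughCache {
	return &ReadThroughCache{data: make(map[string][]byte), fetch: fetch}
}

func (c *ReadThroughCache) Get(key string) ([]byte, error) {
	c.mu.RLock()
	v, ok := c.data[key]
	c.mu.RUnlock()
	if ok {
		return v, nil // cache hit: no storage GET, no S3 charge
	}
	v, err := c.fetch(key) // cache miss: exactly one storage GET
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.data[key] = v // immutable objects never need invalidation
	c.mu.Unlock()
	return v, nil
}
```

A production version would bound memory with LRU eviction, but even this naive sketch turns the second and every subsequent monitor sync into memory hits, which is exactly why CompactLog's read counts above collapse after the first sync.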
Most critically, CompactLog's 0 MMD strengthens CT's security model. When SCTs are issued, certificates are immediately visible to monitors - no window for undetected misissuance. The "independent paths" architecture makes this impossible by design.
It's worth noting that the operators most vocally advocating for static CT appear to have infrastructure sponsorship arrangements that shield them from these costs. When storage and bandwidth are free, any difference in operational costs becomes irrelevant. But this creates a distorted view of architectural viability - what works with sponsored infrastructure doesn't translate to sustainable operations for the broader ecosystem.
More concerning is the narrative that "scale requires direct object storage serving" - a claim that these benchmarks definitively disprove. When we accept that direct S3 serving is "the only way to scale," we're essentially mandating architectural decisions that maximize cloud provider revenue.
This raises a fundamental question: why are we advocating for this model? The static CT API is objectively more complex than RFC 6962, its "pure" deployment model is economically unviable without sponsorship, and it weakens security guarantees (MMD > 0). Yet there's a push to deprecate RFC 6962 - a working, proven standard - in favor of an architecture that's worse on every measurable dimension except ideological purity.
What's particularly troubling is that the "direct storage serving" approach is essentially brute-force engineering - throwing unlimited infrastructure at a problem instead of solving it properly. Caching, request coalescing, and memory management aren't complex optimizations; they're basic engineering practices. When we champion architectures that prohibit these fundamental techniques, we're not promoting simplicity - we're mandating inefficiency.
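Request coalescing, for instance, is nearly a one-import exercise in Go. Below is a sketch using golang.org/x/sync/singleflight; loadTile is a hypothetical storage read, not an API from either implementation:

```go
// Request coalescing sketch: concurrent requests for the same object
// share a single storage fetch instead of each hitting S3.
package serve

import "golang.org/x/sync/singleflight"

var group singleflight.Group

// GetTile collapses N concurrent requests for the same tile into one
// storage GET; the other N-1 callers wait and share the result.
func GetTile(key string, loadTile func(string) ([]byte, error)) ([]byte, error) {
	v, err, _ := group.Do(key, func() (interface{}, error) {
		return loadTile(key)
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}
```

Under a thundering herd of monitors requesting the same newly published tile, this collapses N concurrent storage GETs into one - hardly the kind of complexity that justifies ruling the technique out by design.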
The insidious part is that this design has tremendous surface appeal. "CT logs served directly from S3" sounds innovative and elegant. Object storage is familiar, reliable, and scalable - who wouldn't support that? Most people hear the pitch and think "brilliant!" without digging deeper into the implications.
It's only when you run the numbers or try to operate it without sponsorship that the reality hits: it's a design that sounds great in conference talks but fails basic operational requirements.
What's exhausting is watching the goalposts constantly move. When I show performance benchmarks, suddenly performance doesn't matter. When I demonstrate cost efficiency, the topic shifts to "separation of concerns." When I point out the need for caching, we're told CDNs solve everything. When CDNs are shown to be expensive Band-Aids, the argument becomes about implementation simplicity. I've even been told that fewer lines of code is a key metric - as if code golf determines operational viability. This isn't technical discourse; it's ideological defense through ever-shifting arguments.
Speaking of simplicity - setting up Sunlight for these benchmarks was remarkably complex. Manual seed generation, key management, undocumented YAML configurations, multiple executables with unclear relationships, and manual SQLite database initialization.
Yes, SQLite - the "static" CT implementation that supposedly doesn't need databases requires manually initializing one.
The irony of requiring database setup for an architecture championed for eliminating databases wasn't lost on me. How do we even backup this database? Can I safely delete it? Is it critical for operation? Apparently yes - the server breaks without it: "checkpoint missing from database but present in object storage". So much for database-free architecture. Where's the operating manual explaining any of this?
I had to manually parse the Go entry point code just to understand the correct startup sequence. When your "simple" system requires reading source code to figure out basic operations, you've failed at simplicity.
Even after getting it running, I didn't feel confident about the soundness of what I'd deployed. Was my seed secure enough? What entropy was expected? When I tried an empty file, it at least failed - progress! But it happily accepted 32 spaces as a seed. Yes, I generated cryptographic keys using echo " " > seed, where the quoted string is 32 spaces. No warning about low entropy, no validation - just silent acceptance of a catastrophically insecure configuration. I didn't want this to work - I wanted it to fail informatively.
Actually, Sunlight requires you to provide a seed file of at least 32 bytes - but apparently any 32 bytes will do, including repeated spaces. It really does use whatever garbage you provide as the seed for key generation. A CT log with predictable keys undermines the entire Certificate Transparency ecosystem. This isn't just bad - it's "shut down everything and rotate all keys immediately" bad.
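Rejecting degenerate seeds is not hard. Here is a sketch of the minimal sanity check I would have expected; it's illustrative, not Sunlight's code, and the printable-text heuristic is deliberately crude rather than a real entropy estimator:

```go
// Sketch of minimal seed validation: enforce length and reject
// obviously degenerate inputs. Illustrative only - crude checks,
// not a real entropy estimator.
package seedcheck

import (
	"errors"
	"unicode"
)

func ValidateSeed(seed []byte) error {
	if len(seed) < 32 {
		return errors.New("seed too short: need at least 32 bytes")
	}
	// Reject a seed made of one repeated byte (e.g. 32 spaces).
	allSame := true
	for _, b := range seed[1:] {
		if b != seed[0] {
			allSame = false
			break
		}
	}
	if allSame {
		return errors.New("seed is a single repeated byte: refusing to generate keys")
	}
	// A 32-byte seed that is entirely printable ASCII was almost
	// certainly typed or echoed by a human, not drawn from a CSPRNG.
	printable := true
	for _, b := range seed {
		if b > unicode.MaxASCII || !unicode.IsPrint(rune(b)) {
			printable = false
			break
		}
	}
	if printable {
		return errors.New("seed looks like typed text, not random bytes")
	}
	return nil
}
```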
But hey, at least there's fsync, right?
The irony is breathtaking - being lectured about "robustness" and the critical importance of fsync while Sunlight silently accepts spaces as a cryptographic seed. Apparently, durably persisting compromised keys to disk is more important than ensuring those keys aren't trivially predictable. This perfectly encapsulates the misplaced priorities: obsessing over filesystem semantics while ignoring fundamental cryptographic security.
And why does it even require operators to provide a seed? Why not just generate a secure key pair automatically like every other cryptographic system built in the last decade? You know, for simplicity? Instead, we get the worst of both worlds - manual seed management with no validation.
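For comparison, secure-by-default key material is a few lines of standard library code. The sketch below uses Ed25519 purely for illustration; Sunlight's actual key type and file layout may well differ:

```go
// Sketch of secure-by-default key material: if no seed exists,
// generate one from the OS CSPRNG instead of trusting operator input.
// Ed25519 is used for illustration only.
package keys

import (
	"crypto/ed25519"
	"crypto/rand"
	"errors"
	"os"
)

// LoadOrCreateSeed returns the seed at path, creating a fresh random
// one (mode 0600) if the file does not exist.
func LoadOrCreateSeed(path string) ([]byte, error) {
	if seed, err := os.ReadFile(path); err == nil {
		if len(seed) < ed25519.SeedSize {
			return nil, errors.New("existing seed file is too short")
		}
		return seed[:ed25519.SeedSize], nil
	} else if !errors.Is(err, os.ErrNotExist) {
		return nil, err
	}
	seed := make([]byte, ed25519.SeedSize)
	if _, err := rand.Read(seed); err != nil {
		return nil, err
	}
	if err := os.WriteFile(path, seed, 0o600); err != nil {
		return nil, err
	}
	return seed, nil
}

// KeyFromSeed derives the signing key deterministically from the seed.
func KeyFromSeed(seed []byte) ed25519.PrivateKey {
	return ed25519.NewKeyFromSeed(seed)
}
```

Generating the seed from the OS CSPRNG when none exists would remove the entire class of operator error described above.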
When I attempted to run Sunlight with 50ms batching to match CompactLog's configuration, it essentially became stuck in a continuous checkpoint write loop, constantly PUT'ing new checkpoint objects to storage - at 50ms intervals, that's up to 20 checkpoint PUTs per second, or 72,000 per hour. With Sunlight's README recommending that object versioning be enabled, these constant PUTs would accumulate thousands of object versions per hour, each incurring storage costs.
More concerning, Sunlight's performance degraded severely after ingesting just a few hundred thousand certificates: batch processing times climbed to 600ms (against a local MinIO instance backed by an NVMe array), making 50ms batching physically impossible. This means Sunlight cannot safely operate at lower latencies even if operators wanted to provide better service to CAs. The 1-second batching isn't a conservative choice - it's an architectural limitation.
When I needed to run these benchmarks again on fresh infrastructure, my first thought was genuine dread: "Oh no, I didn't keep the old VM." That's not the reaction you want operators to have about your "simple" system. When redeployment feels like punishment, something has gone fundamentally wrong with your definition of simplicity.
If simplicity for operators was truly the goal, documentation and ease of deployment would be core to the project, not an afterthought. Instead, we see the opposite - a system that requires deep expertise just to start. This isn't simplicity; it's complexity with better marketing. With virtually no documentation, getting it running felt more like reverse engineering than deployment.
I suppose there's opportunity here - with enough expertise in these "simple" systems, one could build quite a consulting practice helping organizations navigate the complexity. The gap between "served from S3" marketing and operational reality certainly creates demand for specialists. But I'd rather build systems that operators can actually understand and run themselves.
I'm increasingly concerned that the CT ecosystem is being shaped by operators with nearly unlimited infrastructure budgets (whether through direct ownership or sponsorship), while the voices of smaller operators and monitors - who actually detect misissuance - are barely heard. When architectural decisions make logs unaffordable to operate independently, we're not improving transparency; we're consolidating control. This is becoming less about certificate transparency and more about cloud providers monetizing a mandatory security protocol.
The few independent operators still running logs deserve recognition for swimming against this tide. But we shouldn't design protocols that require either corporate sponsorship or six-figure monthly cloud bills to participate meaningfully in web security.
Happy to share the full benchmark methodology and scripts for reproduction.