
Original link: https://news.ycombinator.com/item?id=41046956

After the introduction of OCSP Watch, a tool for monitoring certificate authority (CA) responses, CAs were frequently observed returning "unknown" or "unauthorized" OCSP responses for certificates found in Certificate Transparency (CT) logs, indicating old, forgotten certificates. This is worrying, because such lost or expired certificates may remain active without proper monitoring, posing a security risk. Traditional Certificate Revocation Lists (CRLs) cannot detect this class of error, since they primarily record revoked certificates rather than providing a real-time status for every issued certificate. To address this, it has been suggested to drop the requirement to include an Online Certificate Status Protocol (OCSP) URL in the certificate itself and instead mandate that each CA's OCSP URLs be disclosed in the Common CA Database (CCADB). Doing so would eliminate the privacy concerns and reduce the operational cost of maintaining OCSP responders, while preserving ongoing transparency into certificate status.

There are, however, limitations around timestamps in certificate usage, particularly in code-signing or electronic-signature scenarios where the time of revocation matters. Under current standards, once a certificate has been issued, its revocation cannot be backdated to cover events that occurred before it. This complicates checking the validity of previously issued certificates and requires accurate records to be maintained across a certificate's entire lifetime. Another challenge arises when an intermediate CA is revoked: certificates issued before the revocation become impossible to validate, leaving devices unable to obtain updated revocation details for the affected certificates. In many scenarios beyond web browsing, the lack of a workable revocation mechanism means certificate status checks go unverified, causing performance degradation or the abandonment of revocation checking altogether. Popular browsers typically download and refresh aggregated CRL data, but lesser-known software often lacks comparable functionality and must download large, unwieldy CRL files, or may fail to verify certificate status at all. More practical solutions like CRLite, which address failures in status-information retrieval, deserve broader adoption to ensure reliable certificate validation across platforms.


Original text


After I created OCSP Watch[1], I regularly detected CAs returning an OCSP response of unknown or unauthorized for certificates found in CT logs, indicating that they had basically forgotten that they had issued a certificate. I find that rather troubling. Indeed, OCSP Watch is currently reporting several forgotten NETLOCK certificates. (The certificates from other CAs are recently issued and will probably have OCSP responses provisioned in the near future.)

CRLs can't be used to detect this, because they only list revoked certificates rather than providing a definitive status for every issued certificate.
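A minimal sketch (not any CA's real code) of why a CRL lookup can't surface a "forgotten" certificate: a CRL is just a set of revoked serials, so absence from it is indistinguishable from "good", whereas an OCSP responder can answer "unknown" for a serial it has no record of.

```python
def crl_status(serial, revoked_serials):
    # Only two possible answers: absence could mean "good" or "forgotten".
    return "revoked" if serial in revoked_serials else "not-revoked"

def ocsp_status(serial, issued_db, revoked_serials):
    # Three possible answers: a missing record is visible as "unknown".
    if serial in revoked_serials:
        return "revoked"
    if serial in issued_db:
        return "good"
    return "unknown"
```

A cert the CA forgot issuing (absent from both databases) looks fine through the CRL lens but shows up as "unknown" through OCSP, which is exactly what OCSP Watch detects.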

I do wish the root programs had merely removed the requirement to include an OCSP URL in certificates, but required OCSP URLs for every issuer to be disclosed in the CCADB. That would have solved the privacy problems and made OCSP responders much cheaper to operate, while continuing to provide transparency into the status of certificates.

[1] https://sslmate.com/labs/ocsp_watch/



OCSP systems at scale are complex. At the core there is an on-demand or pre-signed OCSP response generation system, then there is usually an internal caching layer (redis or similar), then there is an external CDN layer. Just because the outer layer appears not to know about a certificate (a bug, to be sure), doesn't necessarily mean that other CA systems don't know about it.

Requiring CAs to run OCSP makes running a CA more complex and expensive, which has considerable downsides like it does for any system, particularly as it is a zero sum game. Every dollar or hour of engineering time that Let's Encrypt spends on OCSP (and it is a considerable amount) is a dollar or hour not being spent on other parts of the CA. From the outside that may not be very visible because it's not obvious what isn't getting done instead but there is a real and considerable cost that is not worth it IMO.



They are so complex, and in practice unreliable, that my employer runs a caching proxy for our (non-browser) users (they mostly don't want to fail open).

IMO it is unfixable and should go



CAs are money printing machines. If they cannot even track the state of their delegated trust, then why should they be trusted themselves?! Trust is their core value proposition, the only reason they exist!



The person you're replying to (Josh Aas) is the head of the largest CA in the world, which has never charged anyone for a certificate and has never made a profit. That CA, at least, isn't a money-printing machine!



Good point. Though being a non-profit doesn't remove the fact that the value proposition of CAs is trust.

Perhaps there is some compromise, like they just have to submit all issued certs to a 3rd party, and maintain the last few months' worth of issuances.



Do you disagree that OCSP would be significantly less costly and complex if the responder URL were not included in certificates, freeing the responders from having to serve hundreds of millions of clients?



Yes, I disagree. Best case scenario I think it would just allow us to get rid of the CDN layer. We'd still have to build and manage the rest with utmost care.

Even that really depends on what the "SLA" expectation is. How many QPS would we be required to support before we're allowed to start dropping queries? If there's any possibility of a burst beyond what we can handle locally we'd keep the CDN, so maybe the CDN bill would be significantly smaller on account of lower volume but all the complexity and care of operation would still be there.



BTW what is the need for a CA to remember issued certificates?

Certificates should work as long as they are not expired, not on a CRL, and their CA's certificate used to sign them remains valid as well.

The only good use for remembering issued certificates that I see is that a CA could detect a compromise if it's handed a certificate it does not remember issuing. It looks far-fetched to me, but I know nothing about CA operation.



When a CA has an incident, such as learning that one of their domain validation methods is flawed, they need to be able to revoke all certificates impacted by the incident. Without a reliable database of certificates, there's no guarantee they'll find all of the impacted certificates. OCSP could fail closed in this situation. CRLs always fail open.

Aside: it was in fact envisioned that OCSP could be used to detect CA compromises, and the BRs used to say "The CA SHOULD monitor the OCSP responder for requests for 'unused' serial numbers as part of its security response procedures." I'm not sure how many CAs actually implemented that, and in any case I don't think it ever detected any compromises.



This central authority for certificate validation seems like extra infrastructure without which the internet fails.

Once upon a time, internet communication was between two computers. Now there is a third computer to verify that the communication between two computers is legitimate.

Is there another communication design that works without the need for a third computer?

Edit: I don't think so. Identity validation of a public computer should be done by another well-trusted computer.



Between two distrusting parties, a third mediating party is needed. But that party does not need to be centralized. Indeed, there are many TLS certificate registrars.



So many reasons, but the simplest is that those lookups will constantly fail because middleboxes make variegated inane decisions about 53/udp and 53/tcp, which means you need to have a fallback mechanism, which will inevitably be exploitable. DANE is a dead letter.



CAs need to know every certificate they've issued so that

a) if they receive a request to revoke that certificate, they can actually do so; and

b) if they need to revoke all of their certs (e.g. due to discovering a validation process failure) they don't miss any.



Case (b) is problematic anyway in the general case, since for timestamped signed data the revocation time is relevant, because signatures provably created before the revocation time remain valid, and at least for OCSP a published revocation time is not allowed to precede the publication of earlier “good” revocation data for the same certificate. (For CRL this constraint is merely recommended, not strictly mandated.) This means you can’t retroactively revoke certificates for times in the past, and hence your original validation procedure better be valid.
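An illustrative sketch of the rule described above: a timestamped signature remains valid if it was provably created before the certificate's revocation time. The function name and structure are made up for illustration; real validation involves trusted timestamps and signed revocation data.

```python
from datetime import datetime, timezone

def signature_still_valid(signing_time, revocation_time):
    # No revocation recorded: the signature stands.
    if revocation_time is None:
        return True
    # Signed strictly before revocation: stays valid; otherwise not.
    return signing_time < revocation_time
```

This is why backdating a revocation is prohibited: it would silently flip previously valid signatures to invalid, so the original validation at issuance time has to be right.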

In the same context, revoking the intermediate CA is bad in case some issued certificates were valid for an initial period, because then clients are unable to obtain fresh revocation information about issued certificates that would indicate the respective time of revocation (because the intermediate CA has to remain valid in order to be able to validate the signature on the OCSP response or CRL).

This is mostly not a concern in the context of TLS validation, because that usually relates only to the present time, not to times in the past, but it is relevant for code signing or other electronic signature use cases.



> b) if they need to revoke all of their certs (e.g. due to discovering a validation process failure) they don't miss any.

Dumb q, but why not just revoke/invalidate the signing certificate used to sign the certificates?



The CA did not forget the certificate if it's in their CT logs. The whole point of CT (unlike OCSP) is to make it impossible for a CA to forget what they've signed. If they forget to put a certificate in their CT logs, then the cert isn't valid. Anyone can download and retain copies of the logs, because they're signed and hash treed. Removing a cert after-the-fact is about as disruptive as editing Git history on Linus Torvalds' tree, meaning it will be incredibly obvious that the CA is trying to lie about what they signed.
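A toy sketch of why a CT-style hash tree makes after-the-fact edits obvious: changing any logged certificate changes the root hash that monitors have already recorded. (Real CT uses RFC 6962's leaf/node prefixes and signed tree heads; this omits those details.)

```python
import hashlib

def merkle_root(leaves):
    # Hash each logged entry, then pairwise-hash up to a single root.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:              # duplicate last node if odd count
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Anyone holding yesterday's root can detect a log that swapped or dropped an entry, since the recomputed root won't match.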

The reason why OCSP forgets about certificates is that it was never intended to make CAs permanently remember every cert they ever signed. It was intended to replace CRLs with something less resource-intensive, except they screwed up the design, so it's now just a data siphon that provides little protection.



CT doesn't tell you if a certificate is revoked, so the CA still has to operate a revocation system. That revocation system can forget about certificates.



Why not simply add the Must Staple restriction unconditionally to all certificates (aka "status_request")?

That will eliminate privacy concerns -- no compliant TLS implementation should fetch an OCSP response when given a stapled one -- while still allowing for broad non-browser support.



*cough* nginx. Nginx would start up and serve TLS on must-staple certs... before doing the staple setup. I.e., any client that validated that a must-staple cert had a stapled OCSP response would fail for the first few queries after nginx startup.

I don't know if they've fixed it yet. I doubt it, though; they were pretty aggressive in their assertion that violating must-staple wasn't a concern.



Yeah, I looked into nginx's stapling implementation almost a decade ago. I fixed some simpler bugs (I submitted a patch which was mostly rewritten and then merged) but fixing the problem you mention would have required major re-architecting. I doubt it has changed.



In addition to the issues listed below, performance is another concern: OCSP stapling makes the TLS handshake larger and slower. While it removes the need for the client to query the OCSP server itself, in practice such queries happen infrequently, asynchronously, and in some cases may not happen at all. In theory, clients could include a Certificate Status Request in the Client Hello only when a local cache expires, but IIRC a few years ago Firefox requested OCSP on every TLS handshake, making every handshake larger whenever stapling was enabled on the server.



Issuing a certificate has significantly more overhead (e.g. Certificate Transparency logs) than signing a statement that an existing certificate is still valid.



Does highlighting the domain name not give a boost in security? I don't mind highlighting. And it's possible to get very aggressive with highlighting.

At the very least it should have a [...] and not completely hide the fact that there is a URL.



You talk about it like there’s a vibrant community of websites out there. There’s streaming sites and social media, and then a bunch of apps that are delivered through the web. All of them are sophisticated web operations.



Let's Encrypt's goal has always been to support the vibrant community of websites, extending TLS to the long tail.

The handful of streaming and social media sites can always go to a big commercial CA and spend a few hundred bucks a year on certs that do whatever they need.



In practice, nothing breaks if it can't reach OCSP in a lot of cases. But as with most PKI systems, it causes problems sometimes, so we have to do stupid things like add firewall rules to allow secure systems to reach OCSP endpoints to avoid random PKI-dependent stuff coming crashing down.



"As soon as the Microsoft Root Program also makes OCSP optional, which we are optimistic will happen within the next six to twelve months, Let’s Encrypt intends to announce a specific and rapid timeline for shutting down our OCSP services. We hope to serve our last OCSP response between three and six months after that announcement. The best way to stay apprised of updates on these plans is to subscribe to our API Announcements category on Discourse."

Interesting to see Microsoft dragging here.



Mozilla and Chrome were requiring it via the incorporation of the Baseline Requirements into their policies. The removal of the requirement from the BRs won't appear in the release notes of their policies.



Certificate management is an interesting problem along the intersection of human behavior and computer science that feels similar to BGP. In theory, it's simple, but when met with reality things get messy really really fast.



For the major browsers, this probably makes little difference, but for anything else, this will most likely result in not verifying the revocation status of certificates anymore or making things slower.

As far as I know, most browser vendors already download the CRLs and then update the browsers based on what they downloaded. Firefox, for instance, seems to be using CRLite. There is a lack of support for something like that in non-major browsers and non-browsers. The alternative they have is to download the CRL instead of the OCSP reply, which is larger, probably making things slower. Or they could just not check the status, which is most likely what will happen.

CRLite also changes the failure mode of the status check: it no longer just ignores errors in downloading the status information.

We need better support for something like CRLite.



Can someone ELI5 what does this mean for people using LetsEncrypt today with servers like Nginx or Caddy ? Do we need to make any changes to adjust ?



For regular webserver users, accessed by web browsers, no changes are needed.

Note this is still a long ways out. At this time, only people who are writing code against OCSP need to be aware of the future roadmap.



You were previously using OCSP stapling for your server cert. The CRLs containing your server cert have nothing to do with your server. The server config you have found is for nginx to verify client certs.



To verify client and server certificates. For certificate-based auth this is quite important, but it's also nice to, e.g., know that your server cert is not revoked.



It's the client that is supposed to fetch the CRL. The server doesn't care unless it does client certificate authentication (which is incredibly rare due to poor UX.)



A server (nginx / httpd) might very well make outbound HTTPS calls and want to verify the origin's certificate to the fullest extent possible. This is known as a reverse proxy. However, many -- certainly not all -- reverse proxy configurations use origin servers that reside on the same exact network, and therefore don't need TLS connections (or use them with `SSLProxyVerify none` for e.g. self-signed certs to simplify things [0]), but I digress.

[0] https://httpd.apache.org/docs/current/mod/mod_ssl.html#sslpr...



Your client does the checking of the server cert, no need for any configuration server side.

If you use certs for client auth it’s unlikely it’s used on the web and only in a company setting where you control the PCs, where you could just use something more suited for that case.



Does ARI also return a suggested window of "right now" if the queried cert is manually revoked in the past? I was actually wondering this a few months ago when I implemented ARI in my ACME client - I also have a pending task to implement revocation checking, and if the ARI check makes that redundant it would be great. (Both the RFC and the LE blog only talk about querying ARI at some time before the CA decides to revoke the cert. My question is about querying ARI at some time after the user has revoked the cert.)



If your certificate has been revoked and you don't know that (e.g. from a very alarming e-mail), something has seriously gone wrong. In the general case, it'd be yourself requesting the revocation to begin with. But even if it isn't, I'm pretty sure the CA is supposed to tell you expeditiously.



Ideally your server handles this automatically, which is why Let’s Encrypt is working on extending ACME to help with this case.

Some good implementations of ocsp stapling can already automatically get a new certificate if they receive a “revoked” ocsp response.

Requiring humans to read their email and be looped in is work I want to avoid needing at all.



You’d need a monitoring solution for that; your web server won’t just send you a mail when its cert gets revoked.

The client does that for you, by checking back at the CA. No need for any configuration server side.



The largest Let's Encrypt CRL right now is 254 KB. Most are smaller. We might want to partition into smaller shards again to hit a bit smaller size than that in the future.

Shorter certificate lifetimes will also reduce CRL sizes.
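A back-of-the-envelope sketch of the lifetime claim: a CRL carries an entry only until the revoked cert expires, so steady-state CRL size scales with certificate lifetime. The numbers below are illustrative, not Let's Encrypt's real figures.

```python
def steady_state_crl_entries(revocations_per_day, lifetime_days):
    # A revoked cert stays on the CRL until it expires, so at steady
    # state the list holds roughly one lifetime's worth of revocations.
    return revocations_per_day * lifetime_days
```

Halving certificate lifetime halves the steady-state CRL, independent of the revocation rate.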

A lot of traffic comes from browsers, or TLS stacks integrated with their host operating system, which we expect will use compressed push-based methods like Mozilla's CRLite to receive more efficient data structures as well.

One thing this announcement does is motivate us to start working on making CRL mechanisms more efficient.



Indeed! I forgot that the BRs mandate the CRLDP extension if the certificate lacks OCSP AIA.

So all (non-short-lived) certificates will continue to have a standard revocation checking mechanism encoded in them.



Everything I described can be done programmatically. I've written the code to do it.

But anyways, as mcpherrinm reminded me, certificates will still have the CRL Distribution Point extension so you can forget what I said about the CCADB and just do what the RFCs say.



I'm curious how much of a factor this is. How often do certificates get revoked before expiration? I would think the only reason to revoke a certificate is if it was compromised.



Currently only 20,830,034 revoked certificates per my daily CRL download, versus ~1 billion active certificates. But the number of revoked certs could balloon if there's another Heartbleed-like event, or a mass misissuance.



I don't think this comment deserves to be downvoted. For the specific constraints imposed by embedded environments, a bloom filter seems perfectly cromulent here. The odds of a false positive scale according to the amount of memory that you can afford, and even in the event of a false positive it would fail closed, which is a better security posture than not checking at all.
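A minimal Bloom filter sketch for the embedded use case described above. False positives are possible (a good cert may be flagged as revoked), but that direction fails closed; false negatives (missing a revoked cert) cannot happen. The sizes and hash count here are illustrative, not tuned.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from k salted SHA-256 hashes.
        for i in range(self.k):
            h = hashlib.sha256(i.to_bytes(1, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # True may be a false positive; False is definitive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

A device could fetch a filter of revoked serials sized to its memory budget and treat any hit as "revoked or verify further", which is a strictly better posture than skipping the check.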



Disappointing to hear considering the limitations of CRLs - is there any intention to go forward with OCSP stapling or is that completely abandoned at this point?



My understanding is that stapling is the victim of the usual incompetence and laziness that infects a lot of systems where if one in a billion fail closed that would be considered a disaster but one in ten fail open is considered fine. You can't achieve meaningful security this way.

The browser vendors have learned that you have to do it yourself or it won't be done well enough to be useful. So you pull every CRL, do a bunch of compression or other tricks, then give your users that data and now they have working revocation.

When Bob's CA and Kebab Shop breaks their revocation stack, instead of dozens of poor individual users or web site owners confused and calling Bob's outsourced call centre in Pakistan with no sign of a fix, now a Google account exec asks Bob's CTO whether they forgot to say they were getting out of the CA business...

I agree this isn't a desirable outcome, but it might be all we have.



> The browser vendors have learned that you have to do it yourself

Cool. We already got the internet ossified on TCP + UDP, other L4 protocols just get stuck in firewalls and whatnot. Now we're progressing in ossification of HTTP.

To be clear: this OCSP decision seems to be driven directly and only by web/HTTP consumers. Anything else is just not considered.



It is called the Web PKI after all. If somebody else actually wants to do all the hard work they're welcome, but my impression is that there's only enthusiasm for bitching and whining which won't get the work done.



x.509 is a certificate format used by many PKIs.

Let's Encrypt is a CA that is part of the WebPKI.

We follow standards set by the CA/B forum, undergo WebTrust Audits, and are accepted into the root programs run by the browser vendors (Primarily: Apple, Mozilla, Microsoft, and Chrome). That is the WebPKI.



CRLs don't scale; that's their problem, and they take too long to update.

Why isn't there a standard binary format for CRLs that is a cuckoo filter or a similar data structure? That way you don't have to worry about a CRL ballooning to gigabytes and you can expect clients to fetch the latest binary blob frequently.



As a side project I also developed something like CRLite, but approximately 40% smaller: https://docs.rs/compressed_map/latest/compressed_map/

It's probably not state-of-the-art anymore, but it gets fairly close to the Shannon entropy of the {revoked, not revoked} distribution, supports non-Boolean types (like revocation reason), and is basically fast enough on both compression and queries.

The main missing functionality is incremental update. To implement daily updates you'd just compress "which certs were revoked today", and to check a 50-day-old cert you'd do 50 queries. (The queries are really fast tho, so that's probably fine.)
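A sketch of the incremental-update scheme described above: one immutable "revoked today" set per day, and a membership check that walks every daily set since the certificate was issued. Real CRLite-style structures would compress each day's set; plain Python sets stand in for them here.

```python
def is_revoked(serial, daily_filters, issued_day, today):
    # One query per day of the certificate's life so far; each daily
    # filter is immutable once published, so updates are append-only.
    return any(serial in daily_filters.get(day, set())
               for day in range(issued_day, today + 1))
```

With 90-day certificates that's at most ~90 fast local queries per check, which is the "queries are really fast" trade-off the comment describes.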



I… er… what?

First of all, privacy was one of the points of OCSP stapling.

Second, this breaks all non-http applications in the sense that they could previously work through OCSP stapling which would be communicated in-line. CRLs need to be fetched by the client, generally over HTTP.

Third, most non-browser TLS clients simply do not fetch CRLs, the implementation hurdle was too high.

… I'm left seriously befuddled by this decision by Let's Encrypt [edit: or rather the CA/B forum] :(



OCSP pretty much has to fail open, and stapling doesn't fix this.

If you're in a position to MITM a client using an exfiltrated certificate that no longer passes OCSP, you're probably also in a position to block the request to the OCSP server. And you're not going to staple while you MITM.

As a client, you can't really tell the difference between "this certificate is valid but not stapled and the OCSP server is down" and "this certificate isn't stapled because I'm being MITM'd, and the OCSP server is blocked".

For those who could successfully staple, really short-lived certificates might be a suitable answer -- that's effectively what OCSP gave you, only without actually ensuring that the certificate would cleanly expire.
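The soft-fail dilemma above can be reduced to a small decision table (a hypothetical sketch, not any browser's actual policy): with no staple and an unreachable responder, the benign outage and the attack look identical, so a soft-fail client accepts both.

```python
def client_decision(staple_present, responder_reachable, revoked):
    # A definitive status (stapled or fetched) is honored.
    if staple_present or responder_reachable:
        return "reject" if revoked else "accept"
    # Soft fail: outage and MITM-blocked responder are indistinguishable.
    return "accept"
```

The attacker's winning move is the last branch: strip the staple and block the responder, and a soft-fail client accepts a revoked certificate.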



It would be nice if one didn't need to be a TLS expert to understand the post -- particularly since the whole point of Let's Encrypt was to democratize TLS access.

I have no idea if this means my setup will break even after consulting the docs of my ACME client.

Can I still use ACME Tiny[1] with nginx? Any reason to think that will break? How can I tell if I'm using OCSP or CRL?

Totally incomprehensible blog post.

[1] https://github.com/diafygi/acme-tiny



For regular webserver users, accessed by web browsers, no changes are needed.

Note this is probably at least a year, if not more, away.

I'm sorry this post wasn't accessible enough, and we'll have more communications in the coming years as this gets closer.

(I work at Let's Encrypt and proofread this post before publishing)



Nothing will change for you, and nothing will break. The point of this post is to give a maximum-lead-time heads-up to the folks who _do_ need to care (the folks writing revocation-checking code in clients) so that later, more specific announcements don't come as a surprise.
