| Yeah, I agree. Everyone is suggesting tools that are really good, but I designed mine to get rid of the flags and the CLI. Those are fine for tech people who keep remembering flags; I'm not one of them :( |
| Do be careful with iText products: they license their stuff under AGPL, but their interpretation of AGPL is pretty extreme. If you talk to their team they'll tell you that ~everything your company makes should be AGPL-licensed if you use iText anywhere [0]:
> You may not deploy it on a network without disclosing the full source code of your own applications under the AGPL license. You must distribute all source code, including your own product and web-based applications.
They also have this delightful nagware encoded as a base64 string that spits this out in your logs [1]:
> You are using iText under the AGPL.
> If this is your intention, you have published your own source code as AGPL software too. Please let us know where to find your source code by sending a mail to [email protected] We'd be honored to add it to our list of AGPL projects built on top of iText and we'll explain how to remove this message from your error logs.
> If this wasn't your intention, you are probably using iText in a non-free environment. In this case, please contact us by filling out this form: http://itextpdf.com/sales If you are a customer, we'll explain how to install your license key to avoid this message. If you're not a customer, we'll explain the benefits of becoming a customer.
For using RUPS on a local computer you're probably safe, but I avoid the company because everything about their approach to the AGPL suggests that they chose it as a marketing technique for their paid products (with an extremely strong desire that it never be used commercially without pay), not out of a serious commitment to free software.
[0] https://itextpdf.com/how-buy/AGPLv3-license
[1] https://github.com/itext/itext-dotnet/blob/develop/itext/ite... |
| Yeah, I knew I was in for some onoz when I saw "compiled to WebAssembly and rendered with WebGL". In their defense, it's stunning that any text operations work at all.
Also, "There is no DOM, HTML, JS or CSS" is a bit of a stretch given the considerable amount of silliness involved in view-source:https://www.egui.rs/ |
| Argh, that's too bad. Feel free to open an issue. What's happening in the console? It's panicking, isn't it? You can also contact me via email if you prefer. |
| I recently wanted to edit out a huge background image repeating on almost every page of a PDF and found out there's no obvious way to do it.
Would appreciate any tool suggestions! |
| This is probably a simple find-and-replace task, so I wouldn't bother with proper PDF parsing or libraries. I would:
1. Use pdftk to uncompress it: pdftk input.pdf output uncompressed.pdf uncompress
2. Look at the PDF code (it's text based) to find the image insertion code.
3. Replace all instances of the image insertion code with strings of spaces the same length (there's a table of object byte offsets at the end that you don't want to mess up).
4. Use pdftk to compress it again: pdftk edited.pdf output output.pdf compress
I have a script that does this to remove pen strokes of particular colours so I can e.g. strip out marking rubric on test solutions written on a tablet.
Get the PDF 1.7 spec from https://pdfa.org/resource/pdf-specification-archive/. You're looking for the "Do" operator invoking a named image object defined elsewhere with "/Subtype /Image". See section 4.8, particularly the example on p343. Or, if it's badly done, it might instead be an inline image using the "BI" operator (a bit later in the same section). |
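Step 3 above could be sketched roughly like this in Python. This is not the script mentioned in the comment, just a minimal illustration under some assumptions: the image is drawn with a plain "/Name Do" invocation, and the name `Im0` used in the usage note is hypothetical (you would read the real name out of the uncompressed file). The file names are the ones from the pdftk commands above.

```python
import re


def blank_out_do_operator(pdf_bytes: bytes, image_name: bytes) -> bytes:
    """Overwrite every '/<image_name> Do' with spaces of the same length.

    Keeping the length identical leaves every byte offset in the file
    unchanged, so the offset table at the end of the PDF stays valid.
    """
    # '/Name', some whitespace, then the Do operator as a whole word.
    pattern = re.compile(rb"/" + re.escape(image_name) + rb"\s+Do\b")
    return pattern.sub(lambda m: b" " * len(m.group(0)), pdf_bytes)


def edit_file(src: str, dst: str, image_name: bytes) -> None:
    """Read an uncompressed PDF, blank the image invocations, write the result."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(blank_out_do_operator(data, image_name))
```

Usage, assuming the image turned out to be named `Im0`: `edit_file("uncompressed.pdf", "edited.pdf", b"Im0")`, followed by the pdftk compress step. Working on bytes rather than text avoids any re-encoding that could shift offsets.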
| What's a good tool to check that a PDF has not been tampered with, e.g. something to run before loading a PDF from a public bucket into your backend application? |
| They trust their AWS EC2 instance doing the processing but not their AWS S3 bucket doing the storing? I don't really understand the threat model here.
And what's a "public bucket"? |
By way of an example. Here's an object that represents a Page. You can see the dimensions in the MediaBox. The contents themselves are contained at object "9 0 obj" ("9 0 R" is the pointer to it):
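As an illustration only, here is a hypothetical Page object of that shape (not the one from the original comment) together with a rough Python sketch that pulls out the two pieces mentioned: the MediaBox dimensions and the "9 0 R" pointer to the contents. The object number `3 0`, the `/Parent 1 0 R` entry and the 612 x 792 MediaBox are assumptions made up for the example; only "9 0 R" comes from the text above.

```python
import re
from typing import Tuple

# Hypothetical Page object, as it might appear in an uncompressed PDF.
PAGE_OBJECT = """\
3 0 obj
<< /Type /Page
   /Parent 1 0 R
   /MediaBox [0 0 612 792]
   /Contents 9 0 R
>>
endobj
"""


def media_box(obj_text: str) -> Tuple[float, ...]:
    """Return the numbers inside the /MediaBox array."""
    match = re.search(r"/MediaBox\s*\[([^\]]*)\]", obj_text)
    if match is None:
        raise ValueError("no /MediaBox entry found")
    return tuple(float(v) for v in match.group(1).split())


def contents_ref(obj_text: str) -> Tuple[int, int]:
    """Return (object number, generation) of the /Contents reference."""
    match = re.search(r"/Contents\s+(\d+)\s+(\d+)\s+R", obj_text)
    if match is None:
        raise ValueError("no /Contents reference found")
    return int(match.group(1)), int(match.group(2))
```

Regular expressions are enough for a quick look like this, but they are not a real PDF parser: a page can inherit its MediaBox from a parent, and /Contents can be an array of references.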
Meanwhile "9 0 obj" has the drawing instructions. They seem a little weird at first glance, but you can see that the values ".23999999 0 0 -.23999999 0 792" each get pushed onto the stack, and then "cm" pops them and interprets them as the transformation matrix. The depth and detail of all of the different possible things that can be represented in a PDF is insane. But understanding the structure above is all you need to begin your journey!
EDIT: The rest of your journey is contained in this epic document: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...
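To make the stack behaviour of "cm" concrete, here is a small Python sketch. It is a toy under stated assumptions, not a content-stream parser: it only understands numbers and the "cm" operator, and the function names are made up for the example. The six operands a b c d e f map a point (x, y) to (a*x + c*y + e, b*x + d*y + f).

```python
from typing import List, Tuple

Matrix = Tuple[float, float, float, float, float, float]


def read_cm_matrices(content: str) -> List[Matrix]:
    """Collect the matrix of every 'cm' operator in a content stream."""
    stack: List[float] = []
    matrices: List[Matrix] = []
    for token in content.split():
        if token == "cm":
            # 'cm' consumes the six most recently pushed numbers.
            a, b, c, d, e, f = stack[-6:]
            del stack[-6:]
            matrices.append((a, b, c, d, e, f))
        else:
            try:
                stack.append(float(token))  # numeric operand: push it
            except ValueError:
                # A name or some other operator: this sketch simply
                # discards whatever operands were pending.
                stack.clear()
    return matrices


def apply_matrix(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Transform the point (x, y) by the matrix a b c d e f."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)
```

For the values quoted above, `apply_matrix(read_cm_matrices(".23999999 0 0 -.23999999 0 792 cm")[0], 0, 0)` gives `(0.0, 792.0)`: the matrix scales by roughly 0.24 and flips the y axis, so the origin lands at the top of a page 792 units tall. That reading (a top-left-origin coordinate system scaled into the page) is an interpretation of the numbers, not something the comment states.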