cross-posted from: https://lemmy.sdf.org/post/24645301
They emailed me a PDF. It opened fine with evince and looked like a simple doc at first. Then I clicked on a field in the form. Strangely, instead of simply populating the field with my text, a PDF note window popped up so my text entry went into a PDF note, which many viewers present as a sticky note icon.
If I were to fax this PDF, the PDF comments would just get lost. So to fill out the form I fed it to LaTeX and used the overpic pkg to write text wherever I choose. LaTeX rejected the file… could not handle this PDF. Then I used the
file
command to see what I am dealing with:
$ file signature_page.pdf signature_page.pdf: Java serialization data, version 5
WTF is that? I know PDF supports JavaScript (shitty indeed). Is that what this is? “Java” is not JavaScript, so I’m baffled. Why is java in a PDF? (edit: explainer on java serialization, and some analysis)
My workaround was to use evince to print the PDF to PDF (using a PDF-building printer driver or whatever evince uses), then feed that into LaTeX. That worked.
My question is, how common is this? Is it going to become a mechanism to embed a tracking pixel like corporate assholes do with HTML email?
I probably need to change my habits. I know PDF docs can serve as carriers of copious malware anyway. Some people go to the extreme of creating a one-time use virtual machine with PDF viewer which then prints a PDF to a PDF before destroying the VM which is assumed to be compromised.
My temptation is to take a less tedious approach. E.g. something like:
$ firejail --net=none evince untrusted.pdf
I should be able to improve on that by doing something non-interactive. My first guess:
$ firejail --net=none gs -sDEVICE=pdfwrite -q -dFIXEDMEDIA -dSCALE=1 -o is_this_output_safe.pdf -- /usr/share/ghostscript/*/lib/viewpbm.ps untrusted_input.pdf
output:
Error: /invalidfileaccess in --file-- Operand stack: (untrusted_input.pdf) (r) Execution stack: %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1 3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- %array_continue --nostringval-- Dictionary stack: --dict:769/1123(ro)(G)-- --dict:0/20(G)-- --dict:87/200(L)-- --dict:0/20(L)-- Current allocation mode is local Last OS error: Permission denied Current file position is 10479 GPL Ghostscript 10.00.0: Unrecoverable error, exit code 1
What’s my problem? Better ideas? I would love it if attempts to reach the cloud could be trapped and recorded to a log file in the course of neutering the PDF.
(note: I also wonder what happens when Firefox opens this PDF considering Mozilla is happy to blindly execute whatever code it receives no matter the context.)
Oracle Documaker PDF Driver
PDF version: 1.3
Not sure how to do that but I did just try
pdfimages -all
which was not useful since it’s a vector PDF.pdfdetach -list
shows 0 attachments. It just occurred to me thatpdftocairo
could be useful as far as a CLI way to neuter the doc and make it useable, but that’s a kind of a lossy meat-grinder option that doesn’t help with analysis.Thanks for the tip. I might have to look into that. No readme… I guess this is a /use the source, Luke/ scenario. (edit: found this).
I appreciate all the tips. I might be tempted to dig into some of those options.