Friday, June 25, 2021

PDF Formats

When I first heard about PDF formats, I thought the phrase was redundant: PDF is an abbreviation for Portable Document Format, so "PDF formats" expands to "portable document format formats." But it turns out there are different types (formats) of PDFs, not to be confused with the 2D barcode called PDF417, where PDF stands for Portable Data File.

You may have noticed, from time to time, that you can't highlight and copy text from a PDF, or that you can't search for text within it. That's because it's a different PDF format from the kind where you can copy and search for text.

1. PDF (FTG)

PDF (FTG) is a PDF with Full Text and Graphics. This is the best kind of PDF since you can highlight, copy, and search for text without any errors. If you create the PDF from the original text source, e.g. by using macOS's Print to PDF feature, then you'll get a PDF with full text and graphics.

2. PDF (I)

PDF (I) is a PDF with the entire page stored as an Image (no text). It is basically a PDF wrapper around a bitmap image (e.g. a JPG). This is the least useful kind of PDF because you can neither highlight and copy text nor search for it. This type of PDF is generally created when you scan a document: the scanner treats the page as nothing more than an image.

3. PDF (I + HT)

PDF (I + HT) is a PDF with an Image plus Hidden Text, and it's a nice workaround for the limitations of a PDF (I). On the surface, the PDF is nothing more than an image, but behind the scenes, OCR (optical character recognition) technology is used to read the text in the image and store it in the file. Hence the hidden text. This lets a user see the original scanned document while still being able to highlight, copy, and search for text.
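
To make the distinction between the three types concrete, here is a minimal Python sketch of the decision logic. It does not parse a PDF itself: the `PageInfo` class, its fields, and the `classify_page` function are hypothetical names made up for this illustration, and the sketch assumes some PDF library has already reported whether a page contains text, whether it contains a page-sized image, and whether that text is drawn invisibly (as an OCR layer usually is).

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-page summary; a real PDF library would have to fill this in."""
    has_text: bool                # the page contains text objects
    has_page_image: bool          # the page contains an image covering the page
    text_is_hidden: bool = False  # all text is drawn invisibly (e.g. an OCR layer)


def classify_page(page: PageInfo) -> str:
    """Map the observations onto the three PDF types described above."""
    if page.has_text and page.has_page_image and page.text_is_hidden:
        return "PDF (I + HT)"  # scanned image with a hidden OCR text layer
    if page.has_text:
        return "PDF (FTG)"     # real, visible text (possibly with graphics)
    if page.has_page_image:
        return "PDF (I)"       # image only: nothing to copy or search
    return "unknown"           # blank page, or something this sketch ignores


if __name__ == "__main__":
    print(classify_page(PageInfo(has_text=True, has_page_image=False)))
    print(classify_page(PageInfo(has_text=False, has_page_image=True)))
    print(classify_page(PageInfo(has_text=True, has_page_image=True, text_is_hidden=True)))
```

A real file can mix page types (for example, a typed cover letter followed by scanned pages), so a fuller version would classify each page separately rather than the document as a whole.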

PDF was originally a proprietary Adobe technology covered by patents. However, it has since been standardized as ISO 32000, and anyone may create applications that read and write PDF files, royalty-free.

