12 – PDF Processing

When editing multiple PDFs, we recommend working on a computer with a fast processor and a solid-state drive (SSD) instead of a conventional hard disk drive. PDFs typically require a lot of processing power and a fast scratch disk, and a regular hard drive is often too slow for an efficient workflow.

Reducing the file size of a PDF

Only use this tool for access copies.

  1. Open Adobe Acrobat Pro/DC.
  2. Select File, Save as Other, and Reduced Size PDF.
  3. Important: To actually reduce the file size considerably, select “Acrobat 10.0 and later” under “Make compatible with”. If this option isn’t selected, the file will not be much smaller than the original.
  4. You can also run this as a batch on multiple files by selecting “Apply to Multiple Files”.
  5. The process takes a while, so it is best to find something else to do in the meantime. Once it is done, the file size should have been reduced by at least 50-60%, and sometimes more depending on the content of the PDF.
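
To check whether a file reached the 50-60% reduction mentioned above, the percentage can be worked out from the file sizes before and after. This is only a minimal sketch: the function name reduction_percent is hypothetical, and the sizes are assumed to be in bytes (for example, as shown in the file properties).

```python
def reduction_percent(original_bytes, reduced_bytes):
    """Return how much smaller the reduced file is, as a percentage of the original."""
    if original_bytes <= 0:
        raise ValueError("original size must be greater than zero")
    # Multiply before dividing so whole-number results stay exact.
    return (original_bytes - reduced_bytes) * 100 / original_bytes


# Hypothetical sizes: a 10 MB original reduced to 4 MB.
print(reduction_percent(10_000_000, 4_000_000))
```

A result below 50 suggests that the “Make compatible with” setting from step 3 was not applied.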

Redacting Signatures from documents (written instructions for Adobe Acrobat Pro, video applies to Adobe Acrobat DC)

  1. Open Adobe Acrobat.
  2. Select the “Advanced” menu, then “Redaction”, then “Show Redaction Toolbar”.
  3. After the toolbar appears, select “Mark for Redaction”. If a popup window appears, click “OK”.
  4. Click and drag a shape along the area that you wish to redact. Double-click on the selected area to bring up a sticky note. Type any notes you wish to add in the box, then close it. Repeat for any additional areas you wish to redact.
  5. When you are done, click “Apply Redactions” on the Redaction Toolbar.

Adding Pages to a PDF

Combining files into a PDF then OCR and Saving as PDF/A

Acrobat DC will sometimes return the notice “Dimensions of this page are out-of-range. page content might be truncated” and stop batch processing. This usually happens when there is a landscape page. You can resume by confirming the notice and clicking Start again.

  1. Open Adobe Acrobat Pro and select Combine Files into PDF.
  2. Drag and drop all of the new JPEGs into the popup dialog box and click Combine.
  3. Once the PDF has been created, go to Document -> OCR Text Recognition (some versions require going to Tools -> Text Recognition) and select Recognize Text Using OCR. Click OK through the popup windows. Let it run.
  4. Save the PDF.
  5. Only do this part if your supervisor asks you to.
    1. Click Save As and select PDF/A.
      1. If conversion fails:
        1. Option 1 – Go to Advanced -> Preflight -> PDF/A compliance and then select Convert to PDF/A-2b.
        2. Option 2 – Click Save As -> Select PDF/A -> Click Settings -> Select Save As PDF/A-2b -> Check the box “Create PDF/A-2b according to the following PDF/A-2b conversion profile” -> Click OK -> Save your file.
    2. To run a batch, see video below or:
      1. Go to Tools, Action Wizard, and on the right hand side you’ll see “Archive Documents”.
      2. Select Archive Documents and then click Add Files. Select all of your files and then right click “Add Document Description” and click Skip this Step.
      3. Then change Save As to Save.
      4. Click Start and find something else to do while the batch runs.

Setting up action to make accessible PDF

  1. Go to the Action Wizard in Adobe Acrobat Pro DC and click Manage Actions.
  2. Select the “Make Accessible” action and click Copy. Give the action a new name if desired and click OK.
  3. Select the new copy and click Edit.
  4. On the right side under “Action steps to show:”, click Add Document Description and then select the trashcan icon to delete that step. Using Exiftool to batch add document descriptions will be easier.
  5. Click Recognize Text using OCR and deselect Prompt User. Click Specific Settings and change the downsample setting to 300 ppi.
  6. Deselect Prompt User for the remaining steps.
  7. Delete both the Set Alternate Text and Run Accessibility Check actions.
  8. On the left side under “Choose Tools to add:”, expand Save & Export and select Save. Then click the +-> symbol in between the two sides to add the action after the last action.
  9. Test the full action on a PDF to make sure it never prompts the user.
  10. This action can then be exported and imported to other lab computers.

Using Exiftool to batch edit metadata for PDF files

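This method uses the command line in Windows to batch embed titles in PDFs.

  1. In a spreadsheet, create five columns with the following data:
    • Column A: call \ExifTool.exePath -title="
      • Here \ExifTool.exePath stands for the path to ExifTool.exe. Type straight quotation marks, because curly quotation marks are not recognized in a batch file.
    • Column B: Title of document
    • Column C (straight quotation mark with a space after it): "
    • Column D: \pathtofile (note: your path to the file needs to be surrounded by straight quotes if it contains spaces)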
      • You will only need the document file name and extension if this batch (.bat) file will be located in the same folder as your documents.
    • Column E: =concatenate(A2,B2,C2,D2)

      Exiftool example

      Spreadsheet example

  2. Copy Column E and then paste it as values only.
    1. For example: call C:\exiftool.exe -title="My Document" mydocument.pdf
  3. Copy the values-only Column E into a text file and save it with the extension .bat.
  4. Double-click the .bat file to run it. You may need to right-click it and select Run as Administrator instead.
  5. Exiftool will create a new PDF and rename the original by appending _original to its extension (.pdf_original).
  6. The same method can be used to set -filename, -keywords, and -author.
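
The spreadsheet concatenation above can also be scripted. The following is a minimal sketch, not part of the original workflow: the function name build_bat_lines and the example titles and file names are hypothetical. It assumes each title contains no double quotes, and it always wraps the ExifTool path and the PDF path in straight quotes so that paths with spaces work.

```python
def build_bat_lines(exiftool_path, rows):
    """Build one 'call ...' batch line per (title, pdf_path) pair.

    exiftool_path and the rows are supplied by the caller; nothing is
    read from or written to disk here.
    """
    lines = []
    for title, pdf_path in rows:
        if '"' in title:
            # A double quote inside the title would end the quoted argument early.
            raise ValueError(f"title contains a double quote: {title!r}")
        # Same pieces as spreadsheet columns A-D, with straight quotes throughout.
        lines.append(f'call "{exiftool_path}" -title="{title}" "{pdf_path}"')
    return lines


# Hypothetical example data.
rows = [
    ("My Document", "mydocument.pdf"),
    ("Annual Report 1975", "annual report 1975.pdf"),
]
for line in build_bat_lines(r"C:\exiftool.exe", rows):
    print(line)
```

The printed lines can be pasted into a text file and saved with the extension .bat, as in step 3. Titles that contain a percent sign may need extra care, because % has a special meaning in batch files.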

Using ExifTool to count number of pages in multiple PDFs recursively

If you run this command against files on a server, I found that writing the output text file to a local folder, instead of the server folder the command runs in, makes it go much faster. So instead of just “> pages.txt”, I used “> C:\Users\MyAccount\Documents\pages.txt”. The quotation marks around these two examples are not part of the command.

  • "Path to ExifTool.exe" -T -r -filename -PageCount -s3 -ext pdf . > pages.txt
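
Once pages.txt exists, the page counts can be added up with a short script. This is a sketch under two assumptions: that each line of the output is a file name and a page count separated by a tab, and that files without a readable page count show a placeholder such as "-" in place of the number. The function name total_pages and the sample data are hypothetical.

```python
def total_pages(table_text):
    """Sum the last tab-separated column; return (total, skipped_names)."""
    total = 0
    skipped = []
    for line in table_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the last tab so the count is always the final field.
        name, _, count = line.rpartition("\t")
        if count.strip().isdigit():
            total += int(count)
        else:
            # No usable page count on this line; report it instead of guessing.
            skipped.append(name or line)
    return total, skipped


# Hypothetical sample in the assumed format.
sample = "a.pdf\t3\nb.pdf\t12\nc.pdf\t-\n"
print(total_pages(sample))
```

To use it on the real output, read the file first, for example with open("pages.txt", encoding="utf-8").read(), and pass the text to total_pages. Any names returned in the skipped list are worth checking by hand.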