Find_SSNs - PDF File Support

Find_SSNs supports searching PDF files for sensitive data, but this feature is disabled by default.


A word of caution

Reading text from PDF files programmatically is difficult and, depending on how a PDF file was created, sometimes impossible. A scanned document, for example, may contain only page images and no extractable text. Optical Character Recognition (OCR) can help in such cases, but it is error prone and not a complete solution. To be certain that files contain no sensitive data, a person must examine them manually.
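
One way to act on this caution is to flag PDF files whose extracted text is empty or nearly empty, since a "no matches" result for such a file says little. The sketch below is only an illustration: the function name needs_manual_review and the min_chars threshold are assumptions made here, not part of Find_SSNs.

```python
def needs_manual_review(extracted_text, min_chars=20):
    """Return True when extraction yielded too little text to trust a clean scan.

    min_chars is an arbitrary, hypothetical threshold; tune it for your files.
    """
    # Drop all whitespace (spaces, newlines, form feeds) before counting.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

A file flagged this way is probably image-only or protected and should be examined by a person rather than reported as clean.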


To enable PDF support in Find_SSNs

  1. Download and install xpdf on each computer that will run Find_SSNs.
  2. Verify that the pdftotext program (part of xpdf) is on the PATH and works correctly.
  3. Uncomment the PDF-searching portion of the Find_SSNs source code.
  4. If you use the Windows binaries, rebuild them from the Find_SSNs source code after enabling PDF support.
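
The check in step 2 can be scripted. The following is a minimal sketch, not the code that ships with Find_SSNs: the helper names find_pdftotext and pdf_to_text and the 60-second timeout are assumptions chosen for illustration. It relies on pdftotext writing to standard output when the output file is given as "-".

```python
import shutil
import subprocess


def find_pdftotext():
    """Return the full path to pdftotext, or None if it is not on the PATH."""
    return shutil.which("pdftotext")


def pdf_to_text(pdf_path, timeout=60):
    """Extract text from pdf_path by running pdftotext (hypothetical helper)."""
    exe = find_pdftotext()
    if exe is None:
        raise RuntimeError("pdftotext not found on the PATH; install xpdf first")
    # "-" as the output file sends the text to stdout; -enc selects UTF-8 output.
    result = subprocess.run(
        [exe, "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if pdftotext reports a failure
    )
    return result.stdout.decode("utf-8", errors="replace")
```

If find_pdftotext() returns None, xpdf is either not installed or its directory is missing from the PATH. Running pdf_to_text on a PDF known to contain text is a quick way to confirm that extraction works before enabling the feature.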

© 2009 Virginia Polytechnic Institute and State University