Find_SSNs searches for U.S. social security and credit card numbers. It may help individuals and organizations find sensitive numbers in files on computers.
Find_SSNs is not meant as a silver bullet. It does not secure the files it discovers. It may produce false positives and false negatives. It may miss some files altogether. Use it as part of a larger plan to identify and protect sensitive data stored on computers. Do not rely solely on it. To be 100% certain that sensitive data does not exist in files, humans should manually examine the files. Preventing sensitive data disclosures is a process. Organizations should have ongoing, recurring efforts in place to locate and secure sensitive data before a break-in occurs. You should also note that Find_SSNs is a tool. Like any tool, it can be used for good or bad purposes. For example, it can just as easily be used by 'bad guys' to find your sensitive data before you do.
Please remember to securely delete all of the Find_SSNs report files after you are finished using the program. The report files are road maps to potentially sensitive information. Do not store these as plain text.
Tested on Windows 7, Vista, XP, 2003 and 2000
Tested on Solaris, Mac OS X, GNU/Linux, Windows Vista and the BSDs
Includes a source code folder with a "RunMe" script for users who are not comfortable using the Mac terminal.
Tested on Windows Vista, XP, 2003 and 2000
Find_SSNs can search *most files for sensitive numbers. Searchable file formats include Microsoft Word, Excel and Access as well as file formats that store data in plain text. The OASIS Open Document XML format (Open Office 2) and the Microsoft Office 2007 Open XML format are also supported. Adobe PDF files are supported, but PDF search is not enabled by default. See the notes in the source code about enabling it. The program searches for sensitive numbers such as these:
Find_SSNs is meant to be used by anyone, not just IT professionals. On Windows, no software needs to be installed before running the program: just download the Windows executable and run it. It is also designed to be as accurate as possible when searching files, so as to reduce the number of false positives. However, there will always be false positives, because valid sensitive numbers are often used in other contexts. For example, 123246789 is a valid SSN, and because it appears in this HTML page, Find_SSNs would identify this web page as a suspect file. So always verify the results.
Many sensitive data discovery programs that search for social security numbers simply discard illegal area numbers (the first three digits). In our experience, applying this method to 1 million randomly generated nine-digit numbers leaves roughly 720,000 suspect numbers. Unlike these programs, Find_SSNs uses data from the Social Security Administration to validate the relationship between area numbers and group numbers. This validation reduces the pool of suspect numbers to about 445,000.
Going from a large problem with an unknown scope (the locations of the suspect files that contain sensitive data) to a smaller problem with a known scope is very good, but not ideal for end users. In our opinion, no other numerical validation method can be applied to today's U.S. social security number format to further reduce false positives. Context determination, which attempts to guess whether the suspect number is being used as an SSN (e.g. by finding surnames in addition to numbers), or logic that attempts to grade the context, may further reduce false positives but will also increase the potential for false negatives.
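The area and group validation described above can be sketched in Python. This is a minimal illustration, not the code Find_SSNs actually uses. It assumes the pre-2011 SSA scheme, in which group numbers within an area were issued in a fixed order (odd 01-09, even 10-98, even 02-08, odd 11-99) and the SSA published the highest group issued for each area. The function name is_plausible_ssn and the high_groups table are hypothetical, and the table entry in the usage example is made up for illustration.

```python
import re

# Order in which group numbers were issued within an area before SSN
# randomization: odd 01-09, even 10-98, even 02-08, odd 11-99.
GROUP_ORDER = (
    list(range(1, 10, 2))
    + list(range(10, 99, 2))
    + list(range(2, 9, 2))
    + list(range(11, 100, 2))
)
GROUP_RANK = {group: rank for rank, group in enumerate(GROUP_ORDER)}


def is_plausible_ssn(digits, high_groups):
    """Return True if a nine-digit string passes area/group validation.

    high_groups is a hypothetical dict mapping an area number to the
    highest group number issued for that area (as published by the SSA).
    Areas missing from the dict are treated as never issued.
    """
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False
    area, group, serial = int(digits[:3]), int(digits[3:5]), int(digits[5:])
    if group == 0 or serial == 0:
        return False  # group 00 and serial 0000 are never issued
    high = high_groups.get(area)
    if high is None:
        return False  # illegal or unissued area number
    # The group must have been issued no later than the area's high group.
    return GROUP_RANK[group] <= GROUP_RANK[high]
```

With a made-up table such as {123: 24}, is_plausible_ssn("123246789", {123: 24}) passes, while "123026789" is rejected because group 02 comes later in the issuance order than group 24.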
Credit card numbers are a different story. Out of 1 million randomly generated 15- and 16-digit numbers (potential AmEx, Visa, MasterCard, Discover and JCB numbers), only approximately 100,000 will pass Luhn validation. Find_SSNs applies these three additional validations:
This reduces the 100,000 Luhn-validated numbers to approximately 25,000. Applying Bank Identification Number (BIN) or Issuer Identification Number (IIN) validation would reduce this further, although this may not be entirely possible, as the American Bankers Association (ABA) is rather protective of BINs. However, partial BIN lists may be found online.
In our opinion, outside of these three additional validation steps, there are no other validation methods that further reduce false positives when searching for credit card numbers in files. In the case of credit card numbers, we had a problem with an unknown scope that Find_SSNs reduces to a much more manageable problem.
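For readers unfamiliar with the Luhn check mentioned above, here is a minimal Python sketch of the standard algorithm. The function name luhn_valid is our own and this is not necessarily how Find_SSNs implements it. Because the check amounts to a single check digit, about one random number in ten passes, which matches the roughly 100,000 out of 1 million figure.

```python
def luhn_valid(number):
    """Return True if a string of ASCII digits passes the Luhn check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits from right to left, doubling every second one.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

The widely published test numbers 4111111111111111 (Visa format) and 378282246310005 (AmEx format) pass this check, while changing the last digit of either one makes it fail.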
Find_SSNs runs on almost any computer platform, including Mac OS X, Windows Vista, Windows XP, Windows 2000, RedHat Linux, Ubuntu Linux, Solaris, FreeBSD, and many others.
* PDF (Adobe Portable Document Format) files are not searched by default, but users may enable this feature. Encrypted files cannot be searched. Nested zip archives are not searched. By default, files larger than 100 megabytes are not searched; users may adjust this limit. System files and multimedia files are not searched. Read the source code for a complete list of files that are not searched.
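The default exclusions above could be expressed roughly as follows in Python. This is only a sketch under stated assumptions: the names MAX_BYTES, SKIPPED_EXTENSIONS, should_search and searchable_files are hypothetical, the extension set is a made-up example rather than the program's real exclusion list, and "100 megabytes" is interpreted here as 100 * 1024 * 1024 bytes.

```python
import os

MAX_BYTES = 100 * 1024 * 1024  # assumed meaning of the 100 megabyte default
# Hypothetical examples only; the real list is in the Find_SSNs source code.
SKIPPED_EXTENSIONS = {".mp3", ".avi", ".dll"}


def should_search(path, max_bytes=MAX_BYTES, skipped=SKIPPED_EXTENSIONS):
    """Return True if a file passes the extension and size filters."""
    extension = os.path.splitext(path)[1].lower()
    if extension in skipped:
        return False
    try:
        return os.path.getsize(path) <= max_bytes
    except OSError:
        return False  # unreadable or vanished files are skipped


def searchable_files(root, max_bytes=MAX_BYTES, skipped=SKIPPED_EXTENSIONS):
    """Yield the paths under root that would be handed to the scanner."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if should_search(path, max_bytes, skipped):
                yield path
```

Passing a different max_bytes value mirrors the adjustable size limit mentioned in the note.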