Find_SSNs - Search files for U.S. Social Security or Credit Card Numbers
See how Find_SSNs validates sensitive numbers
Caution!
Find_SSNs is not a silver bullet against identity theft. It helps
individuals and organizations find sensitive numbers in files on
computers. It does not secure the files it discovers. It may
produce false positives and false negatives. It may miss some files
altogether. Use it as part of a larger plan to identify and protect
sensitive data stored on computers. Do not rely solely on it. To be
100% certain that sensitive data does not exist in files, humans
should manually examine the files. Preventing sensitive data
disclosures is a process. Organizations should have ongoing,
recurring efforts in place to locate and secure sensitive data
before a break-in occurs. You should also note that Find_SSNs is a
tool. Like any tool, it can be used for good or bad purposes. For
example, it can just as easily be used by 'bad guys' to find your
sensitive data before you do.
Please remember to securely delete all of the Find_SSNs report
files after you are finished using the program. The report files
are road maps to potentially sensitive information. Do not store
these as plain text. Do not email them. If you want to store or
email the reports, encrypt them first.
Downloads - Updated March 18th, 2009
Windows Executable - Tested on Windows Vista, XP, 2003 and 2000 - Digital Signature
Source Code - Tested on Solaris, Mac OS X, GNU/Linux, Windows Vista, BSDs - Digital Signature
Our PGP Key | md5s
Other Stuff
More Program Information
Find_SSNs can search *most files for sensitive numbers.
Searchable file formats include Microsoft Word, Excel and Access, as
well as file formats that store data in plain text. The OASIS
OpenDocument XML format (OpenOffice.org 2) and the Microsoft Office
2007 Open XML format are also supported. Adobe PDF files are
supported, but PDF search is not enabled by default; see the notes in
the source code about enabling it. The program searches for sensitive
numbers such as these:
- 9 digit U.S. Social Security Numbers
- 13 digit Visa
- 14 digit Diners Club (International and Carte Blanche)
- 15 digit American Express
- 15 digit JCB
- 16 digit Visa
- 16 digit MasterCard
- 16 digit Discover Card
- 16 digit JCB
- 16 digit Diners Club (U.S. and Canada)
Find_SSNs is meant to be used by anyone, not just IT professionals.
On Windows, no software needs to be installed before running the
program. Just download the Windows executable and run it. It is also
designed to be as accurate as possible when searching files, so as
to reduce the number of false positives. However, there will always
be false positives, because numbers that pass validation often
appear in other contexts. For example, 123246789 is a valid SSN, and
because it appears in this HTML page, Find_SSNs would identify this
web page as a suspect file. So always verify the results.
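To illustrate why such a number gets flagged, here is a minimal Python sketch of pulling nine-digit candidates out of text. The pattern and the names SSN_CANDIDATE and find_ssn_candidates are assumptions made for this example; they are not taken from the Find_SSNs source, which may match numbers differently.

```python
import re

# Hypothetical pattern (not from Find_SSNs): nine digits, optionally grouped
# 3-2-4 with a hyphen or space, and not embedded in a longer run of digits.
SSN_CANDIDATE = re.compile(r"(?<!\d)(\d{3})[- ]?(\d{2})[- ]?(\d{4})(?!\d)")

def find_ssn_candidates(text):
    """Return nine-digit candidates found in text, with separators removed."""
    return ["".join(m.groups()) for m in SSN_CANDIDATE.finditer(text)]

sample = "For example, 123246789 is a valid SSN; order 1234567890 is not."
print(find_ssn_candidates(sample))
```

A pattern like this only finds candidates. Each candidate would still need to be validated, and a human would still need to confirm whether it is really an SSN.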
How is Find_SSNs Different from Other Sensitive Data Discovery Tools?
Many sensitive data discovery programs that search for Social
Security numbers simply discard illegal area numbers (the first
three digits). In our experience, applying this method to 1 million
randomly generated nine-digit numbers leaves roughly 720,000
suspect numbers. Unlike these programs, Find_SSNs uses data from
the Social Security Administration to validate area number and
group number relationships. This validation reduces the pool of
suspect numbers to about 445,000.
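The area and group check can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the Find_SSNs implementation. It assumes the pre-2011 assignment scheme, in which areas 000, 666 and 900-999 were never issued and group numbers within an area were issued in the order odd 01-09, even 10-98, even 02-08, odd 11-99. The HIGH_GROUP table and the names group_rank and is_plausible_ssn are hypothetical, and the table values are made up; a real check would load the highest issued group for every area from Social Security Administration data.

```python
import re

def group_rank(group):
    """Position (1-99) of a group number in the assumed issuance order."""
    if group % 2 == 1 and group <= 9:       # odd 01-09  -> ranks 1-5
        return (group + 1) // 2
    if group % 2 == 0 and group >= 10:      # even 10-98 -> ranks 6-50
        return 6 + (group - 10) // 2
    if group % 2 == 0:                      # even 02-08 -> ranks 51-54
        return 50 + group // 2
    return 55 + (group - 11) // 2           # odd 11-99  -> ranks 55-99

# Hypothetical excerpt of an "area -> highest group issued" table.
# These values are invented for illustration and are NOT real SSA data.
HIGH_GROUP = {123: 98, 456: 6}

def is_plausible_ssn(digits, high_group=HIGH_GROUP):
    """Apply structural checks, then the area/group relationship check."""
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False
    area, group, serial = int(digits[:3]), int(digits[3:5]), int(digits[5:])
    if area == 0 or area == 666 or area >= 900:
        return False                        # area numbers never issued
    if group == 0 or serial == 0:
        return False                        # group 00 and serial 0000 never issued
    highest = high_group.get(area)
    if highest is None:
        return False                        # area absent from the table
    return group_rank(group) <= group_rank(highest)

print(is_plausible_ssn("123246789"))   # group 24 issued before group 98
print(is_plausible_ssn("456111234"))   # group 11 comes after group 06
```

Ranking the groups by issuance order is what lets a single "highest group issued" value per area rule out every group that had not yet been reached.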
Going from a large problem with an unknown scope (the locations of
the suspect files that contain sensitive data) to a smaller problem
with a known scope is very good, but not ideal for end users. In
our opinion, no other numerical validation method can be applied
to today's U.S. Social Security number format to further reduce
false positives. Context determination, which attempts to guess
whether a suspect number is being used as an SSN (e.g. by finding
surnames near the number), or logic that attempts to grade the
context, may further reduce false positives, but will also increase
the potential for false negatives.
Credit card numbers are a different story. Out of 1 million
randomly generated 15- and 16-digit numbers (potential AmEx, Visa,
MasterCard, Discover and JCB), only approximately 100,000 will
pass Luhn validation. Find_SSNs applies these three additional
validations:
- Card Prefix (ISO 7812)
- Card Length
- Card Type
This reduces the 100,000 Luhn-validated numbers to approximately
25,000. Applying Bank Identification Number (BIN) or Issuer
Identification Number (IIN) validation would reduce this further,
although this may not be entirely possible, as the American Bankers
Association (ABA) is rather protective of BINs. However, partial BIN
lists may be found online.
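For readers unfamiliar with the Luhn check mentioned above, here is a minimal Python sketch of the standard algorithm. The function name luhn_valid is chosen for this example and is not taken from the Find_SSNs source. Because the final digit acts as a check digit, exactly one of the ten possible last digits passes for any given prefix, which is consistent with roughly 100,000 of 1 million random numbers validating.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9          # same as adding the two digits of the product
        total += d
    return total % 10 == 0

# 4111111111111111 is a widely published test number, not a real card.
print(luhn_valid("4111111111111111"))
# Exactly one final digit validates for a fixed 15-digit prefix.
print(sum(luhn_valid("411111111111111" + str(d)) for d in range(10)))
```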
In our opinion, outside of these three additional validation steps
there are no other validation methods to further reduce false
positives when searching for credit card numbers in files. In the
case of credit card numbers, we had a problem with an unknown scope
that Find_SSNs reduces to a much more manageable problem.
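The prefix, length and type checks could look something like the Python sketch below, which would be combined with a Luhn check in practice. The CARD_RULES table is an assumption built from a few commonly published prefixes; it is not the table Find_SSNs uses and may be incomplete or out of date. The name card_type is likewise hypothetical.

```python
# Illustrative rules only: (card type, accepted prefixes, accepted lengths).
# Assumed from commonly published prefixes, not from the Find_SSNs source.
CARD_RULES = [
    ("Visa", ("4",), (13, 16)),
    ("MasterCard", ("51", "52", "53", "54", "55"), (16,)),
    ("American Express", ("34", "37"), (15,)),
    ("Discover", ("6011", "65"), (16,)),
    ("Diners Club", ("300", "301", "302", "303", "304", "305", "36"), (14,)),
    ("JCB", ("2131", "1800"), (15,)),
    ("JCB", tuple(str(n) for n in range(3528, 3590)), (16,)),
]

def card_type(number):
    """Return the card type whose prefix and length both match, else None."""
    if not (number.isascii() and number.isdigit()):
        return None
    for name, prefixes, lengths in CARD_RULES:
        # Both conditions must hold: a Visa prefix with 15 digits is rejected.
        if len(number) in lengths and number.startswith(prefixes):
            return name
    return None

print(card_type("4111111111111111"))
print(card_type("378282246310005"))
print(card_type("411111111111111"))
```

Requiring the prefix and the length to agree is what removes most Luhn-valid numbers: a random 16-digit number that happens to pass the checksum rarely also starts with a prefix assigned to a 16-digit card type.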
Find_SSNs runs on almost any computer platform: Mac OS X, Windows
Vista, Windows XP, Windows 2000, Red Hat Linux, Ubuntu Linux,
Solaris, FreeBSD, and many others.
* PDF (Adobe Portable Document Format) files are not searched
by default, but users may enable this feature. Encrypted files
cannot be searched. Nested zip archives are not searched. By
default, files larger than 100 megabytes are not searched; users
may adjust this limit. System files and multimedia files are not
searched. Read the source code for a complete list of files that
are not searched.
©2009 Virginia Polytechnic Institute and State University