The most popular database is Excel. Knowledge workers use Excel spreadsheets as a simple, non-relational database more often than as an accounting worksheet. That's OK, because it works. The big boys (software engineers) use relational databases, which is like tying together each worksheet in Excel; the tying together of database tables is the relational part. And, about a decade ago, the cool kids started using NoSQL databases which operate more or less like a hash table for blazing fast performance sans the relationships.
There are a few issues with using a person's SSN as a primary key. One issue is that it's a long number, much longer than a 1 – 5 digit integer typically used as a primary key, so it takes a database engine longer to compare numbers.
More importantly, what happens if a person's SSN changes? SSNs can change in domestic violence cases, identity theft, when two people are mistakenly issued the same SSN, etc. You may think, since these are such rare cases, that it's not a big deal for a database; but software systems have to be developed to handle every possible case to work effectively. Good software should reflect a true reality, not a theoretical reality. So, to solve this issue, most databases issue a sequence of integers for each row. In the case where databases are in a cluster, each database might tag a digit on the end representing which database server issued a specific primary key. In other words, instead of 1, 2, 3; database server #4 in a cluster might issue 14, 24, 34, etc. as primary keys.
Database Best PracticesHaving done a bit of database work at Apple, I've learned some best practices that are worth sharing.
1. Primary keys: Never build intelligence into a primary key – a primary key is simply an artifact of the database and it should represent nothing more than a way to access a row in a database table (i.e., don't use SSN as a primary key). Just like a software developer doesn't care what GUID is assigned to an object in memory, neither should a database designer care about the contents of a primary key. Making your primary key a simple integer is highly efficient since a computer can quickly find and compare numbers (in the case of integers) much faster than a string of nine characters (in the case of SSNs).
2. Table Names: Database table names should be singular (Employee, Order, Transaction, Statistic etc); they should be named for what each row in the table represents, not the entire collection. The reason is that, typically, there's a one-to-one mapping between a row in a database table and an object used in code. For example, in code, an instance variable referencing an Employee object should represent a single employee from the database while an an instance variable that's plural, such as Employees, should represent a collection of objects such as an array or dictionary.
3. Lookup Tables: A lookup table is a simple static database table that's used to populate a list or collection. For example, a list of countries that your company ships to. Perhaps, your company only ships to the U.S. and Canada. Later, when you start shipping to more countries, how do you update the pop-up list of countries on your website or mobile app? With a lookup table, you simply add another row to the table with the new country that you ship to. Updating the database table is easier than changing your code, recompiling, and deploying. Typically, a look up table also has a column representing a sort order. This is done so the list can be displayed in a specific order with, say, the U.S. listed first, instead of Afghanistan, since most of your customers are located in America.
4. Multiple Primary Keys: A database table should have a single primary key for a typical one-to-many relationship to another table (join). Sometimes, you need a many-to-many relationship. For example, a Person table/object joined to an Address. A person might have multiple addresses (homes), and an address might belong to multiple people. In these cases, where a many-to-many relationship is needed then a simple middle table is set up which contains only two columns which are both primary keys. One of the primary keys in the middle table is the primary key of the Person and the other is the primary key of the Address. Technically speaking, the two primary keys of the middle table are propagated foreign keys. One of the middle table's primary keys is a foreign key from the Person table and the other is a foreign key from the Address table.
As a side note, I'm not aware of a case where more than two primary keys are needed in a database table. In cases where I've seen three (or more) primary keys in a database I realized that the person designing the database didn't have a good understanding of relational databases. What they typically needed was a single primary key and indices created for their other columns to optimize their lookup speeds.