When designing a database to support applications you need to consider how you are going to handle primary keys. This article explores natural and surrogate keys, and discusses the pros and cons of each, allowing you to determine what makes the best sense in your environment when you are designing your databases.
When designing a database to support applications you need to consider how you are going to handle primary keys. There are two schools of thought, or maybe three. There are those that say primary keys should always be a made up key, or what is commonly called a surrogate key. Others say there are good reasons to use real data as a key value; this type of key is known as natural key. The third group is those that design their databases so their primary keys are a combination of natural and surrogate keys. In this article, Im going explore natural and surrogate key, and discuss the pros and cons of each. This will allow you to determine what makes best sense in your environment when you are designing your databases.
Natural Key verses Surrogate Key
When you design tables with SQL Server, a table typically has a column or a number of columns that are known as the primary key. The primary key is a unique value that identifies each record. Sometimes the primary key is made up of real data and these are normally referred to as natural keys, while other times the key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key. A surrogate key is typically a numeric value. Within SQL Server, Microsoft allows you to define a column with an identity property to help generate surrogate key values.
Before I talk about the pros and cons of natural and surrogate keys, let me first expand a little more on each type of key. By doing this you will have a better understanding of each of these two types of keys, and will have a more solid foundation to determine which type of key you should use in your database design.
A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. When I say real data I mean data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record. Here are some examples of natural keys values: Social Security Number, ISBN, and TaxId.
A surrogate key like a natural key is a column that uniquely identifies a single record in a table. But this is where the similarity stops. Surrogate keys are similar to surrogate mothers. They are keys that dont have a natural relationship with the rest of the columns in a table. The surrogate key is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time right before the record is inserted into a table. It is sometimes also referred to as a dumb key, because there is no meaning associated with the value. Surrogate keys are commonly a numeric number.
Now that you have an understanding of the difference between these two types of keys I will explore why you might use one key over the other. In the world of data architects, there is much debate over when it is appropriate to use a natural key and when a better solution would be to use a surrogate key. As already stated there are mainly just two different camps. Some say you should always use a natural key and the others say a surrogate key is best. I suppose there is also a third camp that uses a combination of both natural keys and surrogate keys in their database design. Rather than state my opinion on which is best Ill give you the pros and cons of uses each and then you can decide with is best for your design.
Surrogate Key Pros and Cons
A definite design and programming aspect of working with databases is built on the concept that all keys will be supported by the use surrogate keys. To understand these programming aspects better, review these pros and cons of using surrogate keys.
- The primary key has no business intelligence built into it. Meaning you cannot derive any meaning, or relationship between the surrogate key and the rest of the data columns in a row.
- If your business rules change, which would require you to update your natural key this can be done easily without causing a cascade effect across all foreign key relationships. By using a surrogate key instead of a natural key the surrogate key is used in all foreign key relationships. Surrogate keys will not be updated over time.
- Surrogate keys are typically integers, which only require 4 bytes to store, so the primary key index structure will be smaller in size than their natural key counter parts. Having a small index structure means better performance for JOIN operations.
- If foreign key tables use surrogate keys then you will be required to have a join to retrieve the real foreign key value. Whereas if the foreign key table used a natural key then the natural key would be already be included in your table and no join would be required. Of course this I only true if you only needed the natural key column returned in your query
- Surrogate keys are typically not useful when searching for data since they have no meaning.
Natural Key Pros and Cons
Having natural keys as indexes on your tables mean you will have different programming considerations when building your applications. You will find that pros and cons for natural keys to be just the opposite as the pros and cons for surrogate keys.
- Will require less joins when you only need to return the key value of a foreign key table. This is because the natural key will already be imbedded in your table.
- Easier to search because natural keys have meaning and will be stored in your table. Without the natural key in your table, a search for records based on a natural key would require a join to the foreign key table to get the natural key.
- Requires much more work to change a natural key, especially when foreign relationship have been built off the natural key.
- Your primary key index will be larger because natural keys are typically larger in size then surrogate keys.
- Since natural keys are typically larger in size then surrogate keys and are strings instead of integers joins between two tables on a natural key will take more time.
What Kind of Database Designer Are You?
There is much debate in the world of data modeling over what kind of data should be used to support primary keys. There are some purist that say all primary key should be surrogate keys, no matter how small the natural key, or the fact that the natural key will never be updated. Other say you need to use natural keys because they make coding your application just so much easier. When you design your databases, you need to decide what works best in your environment. What kind of database designer are you and into which design camp do you fall?