Download the files for this article.
This is part 9 of a series. If you have not read part 8, you can find it here.
The Problem: Searching Encrypted Data
Once we have encrypted the data in our table, we cannot efficiently search that data without making further changes. Remember from part 7 that EncryptByCert, EncryptByKey, EncryptByAsymKey, and EncryptByPassPhrase all create different results (ciphertext) each time they are called.
The solution is to use Message Authentication Codes. Note that this is a special use of the general term used for messaging.
Hashing is different from encryption in that it always returns the same ciphertext. If I run this code:
SELECT HashBytes('MD5', 'secret')
I get this ciphertext:
Running it again returns the same result. Running it on a different machine also returns the same result.
Different algorithms return different ciphertext. For example, if I run this code:
SELECT HashBytes('SHA1', 'secret')
I get this ciphertext:
HashBytes always returns varbinary results with the following length:
|MD2, MD4, MD5||16|
I created two UDTs (HashResultMD and HashResultSHA) based on the result lengths of the different algorithms. It would be very reasonable to use the maximum length and have only one type. Another option is to use just some of the bytes of the result. I have chosen here to use the complete result. It is important to understand that, even using the full result, there is not a guarantee that you will not have collisions.
What About Dictionary Attacks?
Because hashing creates consistent ciphertext, it would be possible to run a dictionary attack against a table and find values that match a given string. From the example above, we can see that any entry with ciphertext that matches 0x5EBE2294ECD0E0F08EAB7690D2A6EE69 was created by hashing the string secret using the MD5 algorithm.
The way to fix this is by using a salt value. If we can unencrypt a salt value using DecryptByKey (which we have seen is fast) and then pre-pend (or append) the string were trying to store, then a hacker cant just hash a set of values and compare the ciphertext against the result.
The salt should be a reasonable length, but I dont think you should expose the exact length in your code. If you create a UDT that allows for the storage of a hashed value prepended with a secret, the UDT should allow for some variance in the secret length.
Look-Up by Hashed Value
Reference the attached file EncryptByKeyPerfTest_Setup.sql.
Given a value (for example, CardNumber or SecureString), we can now perform efficient searches. Since hashing consistently returns the same ciphertext, we can first hash the look-up value and then find a record or records that match that value.
Note that SecurityCode1 does not have a corresponding MAC column in CardEncryptByKey. This is because there is no use case for looking up a record by its security code without its card number. If we use a MAC column to allow efficient look-ups of records, the security code column is used only for validation that the proper record has been found.
Create MAC columns only for columns that will be searched, and remember that the MAC supports efficient look-up and not exact filtering.
A Simple(ish) Example
Reference the attached file SimplestEncrWithMAC.sql.
It is easier to show code than explain everything behind it. Here, we build a table to store the salt and another to store data records:
CREATE TABLE dbo.SecretSalt ( SaltID tinyint NOT NULL, Salt varbinary(100) NOT NULL CONSTRAINT SecretSalt_PK PRIMARY KEY CLUSTERED (SaltID) ); CREATE TABLE dbo.Hash1 ( RecordID int NOT NULL, EncryptResult varbinary(116) NOT NULL, -- Encrypted HashResult varbinary(20) NOT NULL -- Hashed CONSTRAINT Hash1_PK PRIMARY KEY CLUSTERED (RecordID) ); CREATE NONCLUSTERED INDEX Hash1_HashResult_IDX ON dbo.Hash1 ( HashResult );
First, lets add the salt record.
INSERT INTO dbo.SecretSalt ( SaltID, Salt ) SELECT 1, EncryptByKey( Key_GUID('TestKey1'), 'd=s_4ZeG3me4E#U4ru6ag&@ru@A2Afr7' );
Next, add records to Hash1. We run this code within a loop to create n records.
SELECT @string1 = 'Simple' + CONVERT(varchar(11), @ct); INSERT INTO dbo.Hash1 ( RecordID, EncryptResult, HashResult ) VALUES ( @ct, EncryptByKey(Key_GUID (N'TestKey1'), @string1), HashBytes('SHA1', @string1) );
The code will add records that look like this:
RecordID EncryptResult HashResult -------- -------------------------- ------------------- 1 0x00EC5555DE58...FFEDCA8C7 0xF71C033...4C307B5 2 0x00EC5555DE58...D73349886 0x7A66A82...A6B4DBC
Now, we can look up a single record that matches an expected string. We pick the string 5 which should match the fifth record and return a RecordID of 5.
DECLARE @lookUp int; SELECT @lookUp = 5; -- Pick some record to search for SELECT h.RecordID, CONVERT(int, CONVERT(varchar(11), DecryptByKey(h.EncryptResult))) AS 'Decrypted Value' FROM dbo.Hash1 AS h CROSS JOIN dbo.SecretSalt AS s WHERE s.SaltID = 1 AND h.HashResult = HashBytes ( 'SHA1', CONVERT(varchar(63), DecryptByKey(s.Salt)) + CONVERT(varchar(11), @lookUp) ) -- This uses the index to find the record AND CONVERT(int, CONVERT(varchar(11), DecryptByKey(h.EncryptResult))) = @lookUp;
The WHERE clause has three parts:
1. Even though there is only one record in the SecretSalt table, tell it which one to use.
2. Allow the optimizer to use the index on the varbinary hash column.
3. Validate that the correct record or records are returned.
All these parts are required for an efficient and reliable look-up. This returns the exact record we expected.
What do you think? If you agree, fine. If not, provide feedback in the forum. I have added a new thread called MAC Code. This code is complex. If you find a more straightforward way of doing this, please share.
Limitations and Options
Message authentication codes work only for finding an exact match. It does not support range queries or wildcards.
Using a message authentication code column allows a hacker to determine values that are the same. As an example, if we salted and then hashed two credit card numbers that were the same, the ciphertext would be exactly the same.
You may choose to not use the full ciphertext returned. For instance, the varbinary result (16 or 20 bytes) can be converted to an integer (4 bytes) or bigint (8 bytes).
Whether you use the full ciphertext or a portion of it, you still have to validate that the encrypted value matches the value you are searching for because of the possibility of hash collisions. An option is to SELECT records that match the hashed value and only validate the encrypted data if the SELECT returns more than one record.
For Next Time
Next month, I will discuss the performance results of the combination of encryption and hashing. I will also update the SqlCredit code to include all these changes.
- SQL Server 2005: searching encrypted data from Laurentiu Cristofor's blog
- Using Hashing to Obscure Sensitive Data from the Microsoft SQL Server Development Customer Advisory Team blog
Download the files for this article.
Discuss this article in the SQLCredit Forum.SqlCredit - Developing a Complete SQL Server OLTP Database Project
- SQL Server 2008's Change Data Capture - Tracking the Moving Parts
- Performance Testing - SQL Server 2008 versus SQL Server 2005
- Exploring SQL Server's Index INCLUDEs
- Exploring SQL 2005's Ranking Functions - NTILE() and ROW_NUMBER()
- SqlCredit - Part 19: Exploring SQL 2005's Ranking Functions - RANK() and DENSE_RANK()
- SqlCredit, Part 18: Exploring the Performance of SQL 2005's OUTPUT Clause
- SqlCredit - Part 17: Exploring SQL 2005's OUTPUT Clause
- SqlCredit - Part 16: The Cost of Bloat
- SqlCredit - Part 15: The Cost of Distribution
- SqlCredit - Part 14: The Cost of Translation
- SqlCredit - Part 13: More on Indexed Persisted Computed Columns
- SqlCredit - Part 12: Exploring Indexed Persisted Computed Columns
- SqlCredit - Part 11: Change Tracking Using History Records
- SqlCredit - Part 10: MAC Performance and Updating SqlCredit
- SqlCredit - Part 9: Message Authentication Codes
- SqlCredit Part 8: Comparing Encrypt/DecryptByCert and Encrypt/DecryptByKey
- SqlCredit Part 7: Performance Impact of EncryptByCert and DecryptByCert
- SqlCredit Part 6: Exploring EncryptByCert and DecryptByCert
- SqlCredit - Part 5: Adding Card, Vendor, and Purchase, Plus Much Refactoring
- SqlCredit - Part 4: Schema and Procedure Security
- SqlCredit - Part 3: Schema/Proc Updates and Automated Unit Testing
- SqlCredit - Part 2: Creating the Database, Tables, CRUD Procedures
- SqlCredit - Developing a Complete SQL Server OLTP Database Project