In a previous article, we discussed Dynamic Data Masking (DDM) and how DDM helps to secure PII/PHI data from unprivileged users. The DDM feature is available in SQL Server 2016 onwards as well as Azure SQL Database.
This feature is useful to implement data security for existing data sets and future data. One of the best parts of this feature is that it doesn’t require any change in the application and data doesn’t change at the physical level. The data masking can be achieved by using any available built in functions. We discussed all these functions in previous articles.
In this article, we will discuss how we can use these built in functions to mask the data based on different data patterns, like social security number, credit card number, date of birth, email address, name, etc.
Data Masking Functions
SQL Server provides four built in functions to mask data in SQL tables. These functions are as follows:
We learned about these functions in a previous article. In this article, we will discuss how we can use these functions to achieve data masking.
Test Data Preparation
First, we will create test tables and insert test data in these tables. Later, we will apply data masking functions and see how the data looks like after data masking.
Create test tables:
Create table TestDDM (ID Int, PersonName varchar (100), EmailAddress varchar(120), CreditCardNumber varchar(19), SocialSecurityNumber varchar(11)) CREATE TABLE TestPerson ( ID INT, Name VARCHAR(100), DOB DATETIME, INCOME NUMERIC)
Insert data in test tables:
INSERT INTO TestDDM Values (1, 'Anoop Kumar','firstname.lastname@example.org','1234-5678-4321-8765','123-45-6789') INSERT INTO TestPerson VALUES (1, 'John Smith','06-07-1986',10000)
Retrieve table data:
SELECT * FROM TestDDM
Figure 1 TestDDM table with Data
SELECT * FROM TestPerson
Figure 2 TestPerson table with Data
Currently, no data masking is implemented so data can be visible as it is inserted in tables.
Data Masking Implementation
In this section, we will use all four built in data masking functions and learn the behavior of these functions.
partial() - The partial () function gives the option to customize the masking. The function exposes the first and last letters and adds a custom padding string in the middle.
We have used partial() function to mask the social security number. We masked the initial five digits of the social security number and exposed last four digits.
ALTER TABLE TestDDM ALTER COLUMN [SocialSecurityNumber] ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');
email() : The email() function masks the email address.
The command below masks the column EmailAddress in the TestDDM table using the email() function. The function masks the whole string of the email address except the first letter of the email, the special character of the address (@), and the constant suffix of the email address (.com).
ALTER TABLE TestDDM ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()')
default() : The default() function masks the full value of a column.
This SQL command below masks the CreditCardNumber column in the TestDDM table. The function masks the complete number string and replaces the masked value with ‘X’.
ALTER TABLE TestDDM ALTER COLUMN CreditCardNumber varchar(19) MASKED WITH (FUNCTION = 'default()') NULL
Now, we will review the behavior of data masking implementation. We will run a Select query and review the data.
SELECT * FROM TestDDM;
Figure 3 TestDDM table Data - Unmasked
We can see that data is still visible as inserted. There is no change in data behavior and the data doesn’t mask. The reason for this behavior is user permission. In the current scenario, my ID has db_owner permission and has full access to the data.
To understand the behavior of mask functions and masked data, we will create a new database user TestMaskUser (without login) and will grant select permission on the TestDDM table to the newly created database user.
CREATE USER TestMaskUser WITHOUT LOGIN; GRANT SELECT ON TestDDM TO TestMaskUser;
Now, we will change the context of the query execution and review the TestDDM data table.
EXECUTE AS USER = 'TestMaskUser'; SELECT * FROM TestDDM; REVERT;
Figure 4 TestDDM table Data - Masked
We can see that after changing the user context, data in the TestDDM table has been masked and complete values are not visible to user TestMaskUser as expected.
- SocialSecurityNumber is showing last 4 digits
- EmailAddress is visible with first character, special character (@) and suffix (.com) only
- CreditCardNumber has masked completely
Now, you can understand how easy it is to mask the data available in your database. In the next example, we will see the use of the random() function and how to mask the date of birth data.
random() - function is used on any numeric type to mask the original value with a random value within a specified range.
This SQL command below uses the random() function to mask the Income column in the TestPerson table. This function will replace the original value with a random number within a given range in the random() function.
ALTER TABLE TestPerson ALTER COLUMN INCOME NUMERIC MASKED WITH (FUNCTION = 'random(1,100000)') NULL
This SQL command masks the DOB column in the TestPerson table with the default() function. This function will replace the actual DOB with a default value 1900-01-01 00:00:00:000.
ALTER TABLE TestPerson ALTER COLUMN DOB DATETIME MASKED WITH (FUNCTION = 'default()') NULL
Now, we will grant select permission to TestMaskUser and change the context of execution to review table TestPerson data.
GRANT SELECT ON TestPerson TO TestMaskUser EXECUTE AS USER = 'TestMaskUser'; SELECT * FROM TestPerson; REVERT;
Figure 5 TestPerson table Data - Masked
We can see that after applying the random() function the value of income was changed and the date of birth has been replace with the default value 1900-01-01 00:00:00:000.
List of Masked Columns
At any point of time, we can find what columns have been masked and what mask functions have been used to implement the data masking in a database.
The SQL query below can help to identify the list of masked columns in a database.
SELECT a.name, b.name as table_name, a.is_masked, a.masking_function FROM sys.masked_columns AS a JOIN sys.tables AS b ON a.[object_id] = b.[object_id];
Figure 6 List of Masked Columns
Grant and Revoke UNMASK Permission
We know that data masking is a feature used to secure data that should not be visible to unprivileged users.However, there are methods to control the access of data. We can use the query below to grant permission to UNMASK the user. After this, the unmasked user can see the actual stored data in a table rather than masked data.
GRANT UNMASK TO TestMaskUser;
Figure 7 Grant UNMASK to DB User
Once we Grant UNMASK to TestMaskUser, the user can see actual data rather than masked data.
Similarly, if we Revoke UNMASK permission to TestMastUser, the user can see only masked data.
REVOKE UNMASK TO TestMaskUser;
Figure 8 Revoke UNMSK to DB User
Dynamic data masking is a great feature for both on-premise SQL Server and Azure SQL Database as well. This feature can help users to secure their critical data elements without making any change at physical level. All the unprivileged users can only see masked data and don’t have access to actual values. The DDM is one of the methods to de-identify data in SQL Server 2016 onwards and Azure SQL Database.