In my last article I showed you how to install the database engine for SQL Server 2012. One of the new SQL Server 2012 component's that is part of the database engine that I didn't install was Data Quality Services. (DQS) In this article I will be showing you how to install Data Quality Services and the companion client tool called Data Quality Client.
What is Data Quality Services?
DQS is a new feature that came out with SQL Server 2012 that can be used to cleanse, match and transform incoming data. It consists of two components: Data Quality Server and Data Quality Client. Data Quality Server is installed on top of a SQL Server 2012 database engine, whereas the Data Quality Client can be installed on a SQL Server machine or a separate client machine.
When you install Data Quality Server three databases will be created on the database instance where it is installed: DQS_MAIN, DQS_PROJECTS, and DQS_STAGING_DATA. DQS_MAIN, as the name suggests, is the main database for DQS. This database contains all the DQS stored procedures for the DQS engine, and the published data quality knowledge base information that comes with DQS. The DQS_PROJECTS database will be used to store DQS Project information. DQS Projects are created to allow you to cleanse, and match and then export the clean data to a SQL Server database or file. Additional DQS Projects can be used to profile data to help you better understand the quality of your data. The DQS_STAGING_DATA database is where you can copy your incoming data so it can be processed and cleaned up by DQS operations.
Additionally there is a DQS cleansing task that has been incorporated into SQL Server Integration Services (SSIS) packages. This SSIS task allows you to cleanse your data within an SSIS package. This DQS cleansing component is automatically installed when you install Integration Services.
Out of the box DQS has a built-in knowledge base that you can use to validate your data. Here is a list of the built-in knowledge base domains that come with DQS as documented in Books Online:
- Country/Region: Contains the conventional long (official name as designated by the country/region ) and short names (common name used in lists, on maps, etc. ), two-letter abbreviation, three-letter abbreviation and three-digit code for each location. Leading value is set to the long country name.
- Country/Region (three-letter leading): Contains the conventional long (official name as designated by the country/region) and short names (common name used in lists, on maps, and so on), two-letter abbreviation, three-letter abbreviation and three-digit code for each location. Leading values is set to County three-letter abbreviation.
- Country/Region (two-letter leading): Contains the conventional long (official name as designated by the country/region ) and short names (common name used in lists, on maps, etc. ), two-letter abbreviation, three-letter abbreviation and three-digit code for each location. Leading value is set to the Country two-letter abbreviation.
- US - Counties: Contains a list of US counties.
- US - Last Name: Contains a list of last names (surnames) occurring 100 or more times in the Census 2000.
- US - Places: Contains a list of places for the 50 states, the District of Columbia, and Puerto Rico extracted from the Census 2010.
- US - States: Contains the conventional long (official) name and two-letter abbreviation for each state in the US. Leading value is set to the conventional state name.
With DQS you cannot only use the built-in domains to cleanse, match and transform your data, but you can expand the knowledge base with your own domains. This allows you to import and define new domains that are appropriate for cleansing, matching, and transforming data for your environment.
Enough about what DQS is, let me move on and talk about what it takes to install DQS.
Prior to installing DQS your machine must meet some minimum specifications. To install the DQS your machine needs to meet the same requirement as the SQL Server 2012 database engine and you need an instance of SQL Server 2012 running on the machine where you plan to install DQS. The Data Quality Client will require .NET 4.0. If your Data Quality Client machine does not have .NET 4.0 it will be installed when the client is installed. Additionally the client requires Internet Explorer 6.0 or higher.
Microsoft Excel is not a requirement for running the Data Quality Client. But if you want to take advantage of mapping your source data in an Excel file to knowledge base information for data cleansing and matching then you need to have Microsoft Excel 2003 or higher installed on the machine where the Data Quality Client will be installed.
Installation of DQS and the Data Quality Client
DQS needs to be installed on a machine that is running an instance of SQL Server 2012. But the data quality client can be installed on any machine that meets the minimum requirements and can connect to the machine where DQS is installed. For the purposes of this article I will be installing DQS and the Data Quality Client on the same machine where my instance of SQL Server 2012 is running.
The first step to installing DQS and the Data Quality Client is to mount the SQL Server media and then execute the setup.exe application just like I did when I installing SQL Server 2012. I go through the first few installation windows like I did when I installed SQL Server 2012. But when I get to the "Installation Type" window I select the "Add features to an existing instance of SQL Server 2012" radio button as I have done below:
Add features to an existing instance of SQL Server 2012
After the radio button is selected I click on the "Next >" button to continue with the installation process, which brings up the "Feature Selection" window. On this window I select the "Data Quality Services", "Data Quality Client" and "Integration Services" as you can see from the window below:
"Data Quality Services", "Data Quality Client" and "Integration Services"
I selected Integration Services so I can use Integration Services DQS Cleansing component within an SSIS package. Once I have selected these features I can click on the "Next >" button, which brings up the "Disk Space Requirements" window below:
Disk Space Requirements
After reviewed the disk space requirements I clicked on the "Next >" button, which brings up the "Service Configuration" windows below. On this window I need to identify the account that "SQL Server Integration Services" will run under. Since Microsoft best practices suggest that separate Windows accounts should be used for all services, I created and associated a local Windows account named "SSIS" with this service.
Associate a local Windows account
The account and password is the last thing I need to enter to install DQS, Data Quality Client, and the SSIS features. From this point on I can just click through the SQL Server 2012 setup windows, using the "Next >", "Install", or "Close" buttons until I have finished installing the Data Quality features.
It isn't intuitive that as I went through the installation process to add "Data Quality Services" to my current SQL Server 2012 instance that this service didn't really get completely installed. Because of this we need to run through a manual post-installation step to complete the "Data Quality Services" installation. This post-installation step will run the DQS Installer (DQSInstaller.exe) executable.
The DQS Installer executable can be run two different ways. It can either be run from the command prompt, or from the "Start" menu. To run it from the "Start" menu, you point to "All Programs", then click on "Microsoft SQL Server 2012". Then expand the "Data Quality Services" item and then click on the "Data Quality Services Installer" item. Alternatively I can use Windows Explorer to run the program "DQSInstaller.exe," which was installed in the "C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn" directory, since I installed my instance using the defaults.
When I start the DQS Installer a command prompt window opens and then eventually prompts me to enter a Database Master Key as below:
Enter a Database Master Key
This master key is used to encrypt provider key information, which will be stored in the DQS_MAIN database when you set up data providers using Data Quality Services. Once I enter the Database Master Key twice the installer continues on. After a number of minutes the installer completes and displays the following window:
The installer completes
As the window says, all I need to do to complete the installation is to click on any key. Once I do that DQS is installed.
If you what to review the log of the DSQ Installer you can find it in the following location: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log.
The last post-implementation step I need to perform is to verify that Data Quality Services is working. To do this I will start Data Quality Services by using the "Start" menu and then point to "All Programs", then click on "Microsoft SQL Server 2012". In the menu selection that pops up I then expand the "Data Quality Services" item and finally clicking on the "Data Quality Client". When I do this the following window will be displayed:
Connect to server
On this window I just click on the "Connect" button. Upon doing this the following window will be displayed:
SQL Server Data Quality Services
This is the main Data Quality Client window. Being able to display this window shows that I have successfully installed the Data Quality Services on my VM machine.
Data Quality Services
Data Quality Services is a new feature that is available with SQL Server 2012. The installation of Data Quality Services is a little different, in the fact that it requires you to first add the new feature and then manually run the DQSInstaller.exe. If you need to cleanse, match and transform your data prior to loading it into your database/data warehouse environment then you should consider installing and using Data Quality Services.