While it does not bring revolutionary changes, the release of SQL Server 2008 delivers functionality, scalability, and performance improvements to SQL Server Integration Services (SSIS). Here is a comprehensive listing of new and enhanced features of SSIS with a short description of each.
SQL Server Integration Services (SSIS)
is the primary Extraction, Transformation, and Loading mechanism incorporated
into the Microsoft database management platform. While its origins go back as
far as SQL Server 7, its first implementation named Data Transformation
Services (DTS), offered only rudimentary ETL capabilities. With the release of
SQL Server 2005, Microsoft delivered its considerably more powerful successor,
with a range of new and enhanced features broad enough to justify rebranding
the product to its current name. The version included in SQL Server 2008 does
not bring equally revolutionary changes, but its functional, scalability, and
performance improvements warrant at the very least this short overview. Here is
their comprehensive listing with a short description of each. (You can find
their more detailed coverage by referring to articles dedicated to SQL Server
2008 Integration Services-related topics published on this Web site):
- Connections Project Wizard – a new
feature, assisting with an initial setup of a package (as well as Data Flow components)
when creating new Integration Services projects in the Business Intelligence
Development Studio.
It is triggered automatically by selecting the File->New->Project… menu
option with Integration
Services Connections Project as its template. It guides you through
selecting and configuring data providers and their corresponding
connection managers, as well as designating each as either a source or a
destination (which are subsequently incorporated into an autocreated Data Flow Task). - Import and Export Wizard – an
enhanced feature, whose superiority comes from an innovative way data type
mappings and conversions are handled. Unlike in its previous version
(included in SQL Server 2005 Integration Services), data type conversions
are no longer implicit, but instead all changes are clearly identified
(via the Review
Data Type Mapping
page and its Column
Conversion Details
popup window), giving you an option to reject them (preventing the
possibility of unintentional data loss). In addition, for each pair of corresponding
source and destination columns, you have the ability to specify the action
to be carried out (Fail or Ignore) in
response to data truncation and error condition. Data type mappings and
conversion behavior are determined based on sets of rules stored in
XML-formatted configuration files (located respectively in Program FilesMicrosoft
SQL Server100DTSMappingFiles and Program FilesMicrosoft SQL
Server100DTSbinn
folders). The process is further simplified by visual clues in the form of
icons associated with each pair of source and destination columns: - green checkmark conveys successful
outcome (if the conversion is guaranteed not to cause a data loss or is
not needed altogether, which you can determine by viewing the state of
the Convert checkbox) - yellow exclamation mark means that
a conversion does take place and its processing should be reviewed (in
order to establish whether realistic possibility of data loss exists). - red circle with a cross is a sign
that existing data cannot be converted (which effectively will prevent
you from executing the resulting package until this issue is resolved). - Package Upgrade Wizard – a new
feature, accessible from the Project->Upgrade All Packages menu item
in the BusinessIntelligence Development Studio, guides you through bulk upgrades
to packages created using the SQL Server 2005 version of Integration
Services (regardless of their storage type). While running the wizard is
relatively straightforward (it involves primarily specifying the source
location along with a few optional configuration steps, such as updating
connection strings to account for new provider names, validating upgraded
packages, or creating new package IDs), you might want to consider using Upgrade Advisor from the
SQL Server 2008 distribution media in order to identify any potential
issues that might surface during such upgrades. (Note that the wizard
offers you an option to back up original packages as part of the upgrade
process). - Control Flow Data Profiling Task and Data Profile Viewer – a new
task provides the ability to capture data processed by the SSIS engine
based on predefined sets of criteria geared towards identifying patterns
in distribution of its values as well as any inconsistencies that deviate
from such patterns (which frequently are the primary reasons for package
runtime errors). More specifically, they allow you, for arbitrarily
selected columns, to identify such characteristics as their length or
value distribution, maximums and minimums, number of NULLs, or level
of conformance to regular expression-based patterns), as well as
inter-column dependencies (including their suitability to function as
primary or foreign keys). The criteria sets are exposed in the Data Profiling Task
Editor
interface (as Profile
Type
in the Profile
Requests
section). Once a capture is completed (resulting in an .XML-formatted
file), its content can be analyzed with the Data Profile Viewer applet
(implemented as the DataProfiler.exe executable
residing in the Program
FilesMicrosoftSQL Server100DTSBinn folder and accessible via Data Profile Viewer shortcut
in the Start->All
Programs->Microsoft SQL Server 2008->Integration Services menu). - Data Flow Lookup Transformation, Cache Transfromation, and Cache Connection
Manager
– a mix of new and improved features that considerably enhance the
peformance benefits of caching, by extending its scope and functionality.
While caching of lookup operations was possible in the previous version of
SQL Server Integration Services as part of the Lookup Transformation feature
set, it was limited to its internal operations. Now, lookups can take
place within a separate data flow (or even package), typically by
utilizing a combination of a Cache Transformation and Cache Connection
Manager,
persist in memory (by default) or a designated file system-based cache
(which takes the form of a .CAW file), and be referenced by a Lookup Transformation operating
in the full cache mode. Additional improvement comes from negative
caching, which includes rows for which there are no matches in the
reference dataset. It is also worth mentioning that the Lookup Transformation contains
an extra output (Lookup
No Match Output),
which isolates failed lookups from errors (they were combined with errors
in its previous version of SSIS). - Script Task and Script Component –
scripting environment is no longer based on the Visual Studio for
Applications (VSA), but instead it utilizes the Visual Studio Tools for
Applications (VSTA) engine. Effectively, both Data Control’s Script Task and Data Flow’s Script Component (through
which access to the VSTA is provided) support both C# and VB.NET
programming languages (in the SQL Server 2005 implementation of
Integration Services, only the latter was available). In addition, as the
result of this change, the process of referencing custom .NET assemblies
has been simplified (as described in the MSDN Referencing
Other Assemblies in Scripting Solutions SQL Server 2008 Books Online article)
compared with a fairly cumbersome workaround required to accomplish the
same goal in the previous version of SSIS (as described in the
equivalent article of SQL Server 2005 Books Online). Furthermore, it
is now possible to take advantage of objects and methods implemented via
Web Services (this functionality was not available before). - ADO.NET Source and ADO.NET Destination – a pair
of new Data
Flow
components facilitate access to any data store supported by .NET Framework
provider (including ODBC and OLE DB). They are most commonly used when
working with Script
Tasks
or when an equivalent OLE DB provider-based access is not possible. ADO.NET Source
constitutes a replacement of the DataReader Source in SQL Server 2005
Integration Services and, similarly, ADO.NET Destination takes on
the role of DataReader
Destination
(even though the latter remains available in the current version). - Data Flow engine and parallelism in the processing
of execution trees – execution trees are no longer are limited to
individual threads, but instead, each (depending on the availability of
processing resources) can be subdivided into smaller execution units
(called subpaths), which
effectively extends the scope of parallelism and contributes to shorter
execution time (on multiprocessor systems). - DTExec.exe and package configurations – due
to changes in the way package configuration is applied (compared with SQL
Server 2005 Integration Services), it is now possible to set some of its
settings (such as connection strings) at runtime from the command line
(when invoking the execution via DTExec.exe utility). For more information
about this functionality, refer to the Package
Configuration article of SQL Server 2008 Books Online. - DT_DBTIME2, DT_DBTIMESTAMP2, and DT_DBTIMESTAMPOFFSET data types
– facilitate data type mapping to equivalent T-SQL date/time data types
introduced in SQL Server 2008. Their primary purpose is to provide support
for more accurate time measurements (such as fractional seconds and the ability
to account for time zone offset).
I hope that this short overview will
raise your interest in the SQL Server 2008 Integration Services sufficiently to
explore each of the functions in more detail.