At IBM Information Management's Information on Demand
(IOD) conference in Las Vegas in October 2007, IBM made a number of major
announcements, which included the christening of the DB2 Viper 2 beta as DB2 Version
9.5 (DB2 9.5), and the introduction of a new toolset for IBM data servers (DB2
for Linux, UNIX, and Windows, DB2 for i5/OS, DB2 for z/OS, and Informix IDS) called IBM Data Studio.
I've been writing a series on some of the features that were
part of the DB2 Viper 2 beta program. Now that DB2 9.5 is generally available,
I can share with you the vision and the ideology behind the new IBM Data Studio toolset.
What the DB2 toolset looked like before DB2 9.5
Before the official announcement of the IBM Data Studio, IBM
focused most of its attention (and coordination) on client-side deliverables
around the run time. For example, IBM delivered cross-DB2-family APIs for Call
Level Interface (CLI), Java Database Connectivity (JDBC), SQL Java (SQLJ),
.NET, Perl, Ruby on Rails (RoR), PHP, Python, and more.
From a development perspective, there wasn't a coordinated
focus on the graphical user interface (GUI). In DB2 Universal Database (DB2
UDB) Version 8, IBM delivered the DB2 Development Center, which evolved into the
DB2 Developer Workbench in DB2 9, but the audience for this tool sat
somewhere between developers and application-focused database administrators
(DBAs). From a development perspective, one challenge was that developers don't
just build stored procedures and user-defined functions (commonly called routines);
they also have to build Web sites using JavaServer Pages (JSP), applications using
a framework such as Java EE (formerly called J2EE), and so on. For Java
developers, our toolset may have helped with 10% of their job, but they
couldn't do everything they wanted to do using a single IDE. Multiple tools
imply a larger client-side footprint, not to mention the learning curve, all to
manage the end-to-end life cycle of an application.
Our previous toolsets also appealed to application DBAs and
provided them with a framework to build routines (if this wasn't the Java
developer's responsibility), but there was no tooling for schema evolution.
For schema work, the application DBA had to turn to the Control Center; however,
the Control Center was built on an entirely different set of technologies from the
DB2 9 Developer Workbench. Depending on your job responsibilities, you may have
had to use both tools.
Another group of users, operational DBAs, could use the Control
Center or the command line processor (CLP). However, they likely wanted a lightweight
infrastructure to manage the health of their data servers. For this task, these
folks likely leveraged yet another set of tools, the Web-based Command Center
and the Web-based Health Center.
In the end, your IT staff's DB2 data server toolkit looked more
like a bunch of separate tools than a tool suite.
The IBM data server toolset with the announcement of IBM Data Studio and beyond
The IBM Data Studio announcement is a major first step
toward a realignment of the disparate DB2 tools into a single customizable toolset.
And these tools extend beyond the DB2 mandate; they are tools for all IBM
relational data servers. But it's even more than that.
IBM Data Studio provides perspectives that appeal to tasks across the entire application life cycle, which IBM defines in terms of design, development, deployment, and governance.
Different vendors likely define this process with different labels,
but for the most part, every application rolled out by your IT department experiences these phases.
When you consider the breadth of the IBM Software Group (IBM
SWG) portfolio, you can appreciate the kind of expertise IBM has to offer. From
this broad-based perspective, IBM strategists identified the following roles
(with a data server focus) that the IBM Data Studio toolset needed to serve. The
IBM Data Studio addresses these roles directly with add-ins and perspectives.
These roles are shown below:
You can see that I tried to map the application life cycle outlined
earlier in this article with the roles that IBM strategists identified. For
example, the Design phase encompasses work across job responsibilities (in this
case, the business analyst and the database architect). As you look more
closely at this figure, you can see that the high-level aggregate role of DBA
gets broken down into granular DBA-related roles.
To simplify things, I've shown below some roles, and some of what
folks in these roles do in their day-to-day jobs. The toolset is
based on Eclipse, the IDE framework in which personnel work. Notice,
however, that there is a Web component. The deployment and management of these
solutions are moving toward Web-based interfaces (such as the Data Studio
Administration Console), and thus the Data Studio toolset has Eclipse-based
and Web 2.0-based facets.
Notice in the previous figure that I've grouped application
developers and database developers in the same quadrant. Of course, there is administration
work to be done, such as change management, schema evolution, and so on, which a
DBA needs to do. There is also a whole category dedicated to governance. For
example, if you're a retailer, are you compliant with the Payment Card
Industry Data Security Standard (PCI DSS or PCI, for short)?
Quite simply, no other vendor that I'm aware of has an
integrated suite that addresses all of the roles illustrated in the previous
figure. One of the primary advantages of such a toolset would be its ability to
help the entire organization gather information at the design phase and use
that information to make the development stages easier. It follows that during
the development phase, the organization would gather even more information
about the project, and then infuse that information (which builds on the information
gathered at the design phase) to facilitate the deployment phase. With improved
information, the deployment phase would naturally run more smoothly. The momentum
builds, of course, and by the time you get to governance, many of the manual
requirements to adhere to a standard are already fulfilled because they've been
identified and built throughout the whole process with a toolset that is
ubiquitous to all of its stakeholders.
By the governance phase, the framework would be able to take
care of most of the things you need to know about the application life cycle; the
toolset would know what it is you are trying to do and might even be able to automate
some of it for you.
Some of the possibilities with the IBM Data Studio vision
Let's assume you're a retailer and you offer credit cards as
a form of payment and thus need to comply with the PCI standard. Let's further
assume you use IBM Rational Data Architect (IBM RDA) to build your physical and
logical data models.
A moment on PCI: It's
outside the scope of this article to delve into the details of the PCI
standard. In a nutshell, major credit card companies have mandated that vendors
who use their services must comply with the PCI standard. The ultimate goal is to
guide companies that handle credit card data in protecting
cardholder information via secure networks, encryption, and so on.
Vendors who fail to comply with the
PCI standard are subject to some really hefty fines; in addition, any retailer
found to have leaked sensitive cardholder data can be required to submit to
external audits or risk losing their right to offer credit card services (from
those credit vendors that participate in the program). Most recently, a major US
retailer's credit card processing company was fined $880,000 in the summer of
2007 and will continue to be fined $100,000/month until compliance is met. The IT
challenges posed by the PCI standard are significant: recent studies estimate
that only 40% of the major retailers in North America are compliant!
There are 12 parts to the PCI
standard. As an example of just one of its declarations: on any report destined
for those without a need to know, a credit card number's leading digits must be
replaced with Xs, leaving only the last block of digits visible.
For example, a billing report might show XXXX-XXXX-XXXX-9012 rather than the full card number.
Another declaration states that
PIN codes, such as those you enter to access your account online, must be
represented with asterisks, dots, or some other masking symbol.
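To make these two declarations concrete, here is a minimal sketch of the kind of masking routine a PCI-aware reporting layer might apply. The class and method names are my own for illustration; they are not part of any IBM toolset or the PCI specification.

```java
// Illustrative only: mask card numbers and PINs the way a PCI-aware
// report generator might. Not an IBM API; names are invented.
public class PciMasker {

    // Replace every digit except the last four with 'X', preserving
    // separators such as dashes or spaces.
    public static String maskCardNumber(String cardNumber) {
        int totalDigits = 0;
        // Count the digits first so we know where the last block starts.
        for (char c : cardNumber.toCharArray()) {
            if (Character.isDigit(c)) totalDigits++;
        }
        int digitsToMask = Math.max(0, totalDigits - 4);
        StringBuilder masked = new StringBuilder();
        int maskedSoFar = 0;
        for (char c : cardNumber.toCharArray()) {
            if (Character.isDigit(c) && maskedSoFar < digitsToMask) {
                masked.append('X');
                maskedSoFar++;
            } else {
                masked.append(c);
            }
        }
        return masked.toString();
    }

    // A PIN is never shown at all: every character becomes a masking symbol.
    public static String maskPin(String pin) {
        return "*".repeat(pin.length());
    }

    public static void main(String[] args) {
        System.out.println(maskCardNumber("4520-1234-5678-9012")); // XXXX-XXXX-XXXX-9012
        System.out.println(maskPin("8841"));                       // ****
    }
}
```

The point of the sketch is simply that the masking rule is mechanical; the hard part, as discussed in this article, is knowing which columns the rule applies to, which is exactly what a design-layer annotation could tell a governance tool.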
Another specification (Part 3.1)
notes that the storage of cardholder data should be kept to a minimum, and that network
diagrams illustrating all connections to cardholder data (Part 1) are
required too. There are a lot of other declarations within this standard, such as
how to handle test data. You can learn more about the PCI standard on the PCI
Security Standards Council's Web site.
So let's assume you are building a retail database. What if you
could specify in the IBM RDA data model that a specific column contains credit
card numbers, or that another column contains PINs? If you could, then a governance
tool could know that these two fields need to be governed by the PCI standard
and represent these columns in your business glossary; the
governance solution could then automatically put fine-grained access control (FGAC)
rules in place to mask the data whenever these columns are accessed (except for
applications that are specifically authorized to bypass these rules).
Consider another scenario for this same retailer. The PCI
standard notes that live customer data cannot be used as test data. This means that
real credit card numbers, real addresses, real names, and so on, cannot be used.
If you do use them, you can be fined. Now your IT department is faced with a challenge:
create a test database for the development organization. This is a perfect
example of different roles being able to leverage the same toolset to solve a
business problem. The DBA has to change the real data such that when it goes
into the test database it is PCI-compliant. But what would happen if the DBA didn't
do this and instead loaded up a new sample database with the live data? Typically, nothing
would happen, which could lead to some large fines. Now imagine a scenario
where that same governance tool from the previous paragraph could detect that
live cardholder data was being loaded into a test database. If that tool could flag
this violation at data movement time, it would be a very valuable asset.
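The DBA's de-identification step itself can be sketched in a few lines. This is my own illustration, not an IBM tool: the row type and field names are invented, and the replacement card number is the well-known Visa test number, which is not chargeable.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the de-identification step a DBA could run while
// moving production rows into a test database. Not an IBM product feature;
// the row type and names here are made up.
public class TestDataScrubber {

    record CustomerRow(String name, String cardNumber) {}

    // Replace live values with synthetic ones so the test copy holds no
    // real cardholder data.
    public static List<CustomerRow> scrub(List<CustomerRow> liveRows) {
        List<CustomerRow> scrubbed = new ArrayList<>();
        int i = 0;
        for (CustomerRow row : liveRows) {
            i++;
            // Synthetic name plus the standard non-chargeable Visa test number.
            scrubbed.add(new CustomerRow("Customer-" + i, "4111111111111111"));
        }
        return scrubbed;
    }

    public static void main(String[] args) {
        List<CustomerRow> live =
            List.of(new CustomerRow("Jane Doe", "4520123456789012"));
        System.out.println(scrub(live));
    }
}
```

A governance tool that knew, from the data model, which columns hold cardholder data could verify that a step like this ran before any load into a test schema, which is the violation-at-data-movement-time check described above.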
Now let's assume that you want to extract all the records in
your database for an advertising campaign but are not governed by the PCI
standard. However, your enterprise has its own set of privacy rules that
guarantee clients that their information won't be distributed to other vendors
without their permission. Now expand this scenario to the data model. If the
data server understands that column A is the opt-in or opt-out field that makes
your home address visible, then every time a query is run against the data
server for a campaign extract, the data server can consult this field to see whether
the address should be returned to the application or not.
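The opt-in check described above amounts to a simple filter. Here is a hedged sketch of that logic in application code; the record type and field names are invented for illustration, and the article's point is that a model-aware data server could enforce the same rule itself, on every query, without the application having to remember it.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: enforce an address opt-in flag during a campaign
// extract. Field and class names are hypothetical.
public class CampaignExtract {

    record Customer(String name, String address, boolean addressOptIn) {}

    // Return addresses only for customers who opted in to sharing them.
    public static List<String> extractAddresses(List<Customer> customers) {
        return customers.stream()
                .filter(Customer::addressOptIn)
                .map(Customer::address)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Customer> crm = List.of(
                new Customer("Ann", "1 Elm St", true),
                new Customer("Bob", "2 Oak Ave", false));
        System.out.println(extractAddresses(crm)); // [1 Elm St]
    }
}
```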
In addition to this retail scenario, there are lots of other
examples from other industries. The Health Insurance Portability and
Accountability Act (HIPAA) is a medical privacy initiative that pretty much says
you can't show medical records in reports (similar to the PCI standard for
credit card information). If a column has personal medical history information,
that column can only be shown to the patient's physician. You can't show it to
just anyone who has DBA privileges. Same idea, different industry.
The goal is to make the design layer smart enough to understand
the different kinds of fields that have these kinds of characteristics. That
will provide you with a very efficient environment.
Now, don't let all my talk about the tooling and the role
identification fool you; IBM is still focusing on the client-side runtime APIs,
because if you can't run the application in a robust and high-performing
manner, there is no sense in having any tooling whatsoever. We plan to
enhance our already strong API story with a new suite of tools around the jobs
and roles of the personnel that actually do the work.
Note: DB2 and Visual Studio .NET have
had a long and rich history of deep integration. Since Visual Studio is the IDE
where .NET developers tend to work, IBM has provided plug-ins for this
environment since the DB2 UDB Version 8.1.2 release. I've written extensively on this
particular integration and this API and IDE, so it isn't discussed in this article.
Here's my attempt to express IBM's intentions for the new
IBM Data Studio:
The goal of a comprehensive solution for the entire life
cycle of an application is unique and ambitious. I'm talking about everything
from early up-front design of business processes and logical data models for
all of your applications, to the time when you start to design, code, and test
those applications. But the process isn't finished at that point. You then have
to create a physical database model, including its schemas; you have to
allocate storage. You have to perform a lot of security work. You have to
accommodate the day-to-day management of these systems, such as backup and
recovery or meeting service level agreements. Finally, don't forget all the governance
work you have to perform (a growing percentage of the work required in the
application life cycle) to satisfy the regulatory compliance decrees placed
upon your business, and more.
While there are lots
and lots of tools out there that address part of the application life cycle,
no other single toolset currently covers the full life cycle.
Indeed, as I look across the market, most tools are piece-part solutions, which
means you need two or more of them to address the methodology I outlined in this article.
Quite frankly, the
IBM Data Studio vision is an ambitious undertaking, but one with extremely high
value and high potential for our clients, since no one else really has this
vision. What's more, the vision has the potential to grow beyond the set of relational
IBM data servers that IBM Data Studio currently supports. There would be a lot
of value in a cross-data server toolset that addresses an inefficiency in how
enterprises align their data server personnel today: skills and
resources tend to be aligned and constrained in direct correlation to a
database vendor. In my opinion, this hurts productivity. A cross-brand full life
cycle toolset would lead to better control over staffing, better control over
resource allocation, and an unbeatable edge over the competition!
How the year 2007 will finish up for IBM Data Studio
IBM has now announced the general availability of IBM Data
Studio and, best of all, it's free for everyone. Although the first iteration of
this IDE is far from the full life cycle vision I've discussed in this article,
some valuable stuff is being delivered. For example, Data Studio contains a
subset of some of the functionality in IBM RDA. (See Part
3 in this series for an example of the overview data dependency diagrams,
which allow you to see tables and their relationships, and so on.) It also has entity-relationship
diagrams, a simple data distribution viewer (which allows you to see your
data's distribution), and more.
From a development perspective, you'll get most of the things
that you're used to seeing as a developer, such as building SQL and XQuery
statements, routines, and so on. In addition to what its predecessor, the DB2 9
Developer Workbench, gave you, IBM Data Studio has a set of rich enhancements designed
to make developers even more productive. There's also a new Web services
framework called IBM Data Web Services. Finally, a lot of features were added
for DBA activities. From a governance perspective, IBM Data Studio gives you
the ability to work with security roles.
Obviously, I can't detail all the features in this article
(that's the point of this series), but the following figure
should give you an idea of the things you can do free of charge with IBM Data
Studio and an IBM data server:
The capabilities outlined in the previous figure are all
free. Since the toolset is extensible, you can buy add-ons that add new
features to context menus or enable disabled features (in the same way that you
can add to the base tooling provided by vendors such as Quest or Embarcadero).
For example, when IBM Data Studio became generally available, IBM announced two
such chargeable add-ons: IBM Data Studio
Developer and IBM Data Studio
pureQuery Runtime, which are geared towards the incredibly cool Java
enhancements that go with pureQuery
(a subject for a future installment of this series).
Where is IBM Data Studio heading?
In this section, I outline some of the top priorities for
the IBM Data Studio. (Of course, these are not commitments but rather goals.)
The first goal is to reduce the footprint associated with
this toolset. And since client-side offerings aren't tied to server releases,
IBM Data Studio can have its own availability schedule that's more aggressive
than that of a data server.
In addition, DB2 or Informix IDS must be seen as the unquestionable
choice for IBM WebSphere Application Server. The debut of pureQuery is a great
first step here, but expect to see initiatives along other integration points
too. Some possible examples include JDBC capture to enable pureQuery for any
Java program, significant performance monitoring and problem determination
aids, and OpenJPA, Spring, and iBATIS support for pureQuery. Also being considered is tight integration with ObjectGrid
(SQL syntax for caching, DB2 or Informix IDS persistence options,
replication from DB2 or Informix IDS to ObjectGrid, and more).
Since IBM Data Studio now represents a standard toolset with
a pluggable interface for all sorts of roles and features, you're likely to see
key IBM Information Management products integrated into this toolset, such as DB2
Change Management Expert, DB2 Performance Expert, the DB2 Design Studio from
DB2 Warehouse Edition, the IBM Optim suite (from the Princeton Softech
acquisition), and DB2 High Performance Unload, to name a few. Expect to see
some sort of performance management add-ons for IBM Data Studio too. For
example, such a feature could provide SQL statement-level performance details
and historical trend analysis. In addition, data server personnel could have
the ability to report database resource usage in multiple facets, such as by SQL
statement, package or collection, application, application server, Java class
name, and more.
Finally, look for expanded functionality in the
administration piece with "alter anything" capabilities (essentially,
integrating the DB2 Change Management Expert into the toolset): the ability to
view and change registry, database, and database manager configuration
parameters; FTAs for deadlock and timeout events; and other items from the DB2 Control
Center that improve the up-and-running experience for an IBM data server. Remember, these are goals, not commitments.
Wrapping it up
In this article, I tried to describe the vision behind the
newly announced IBM Data Studio. I attempted to outline just how unique and
beneficial our envisioned toolset platform would be for all those involved in
the application life cycle. Specifically, such a toolset would help by:
Slashing development time by up to 50% with an integrated
data management environment
Promoting collaboration across roles to optimize data server and
application performance
Accelerating Java development productivity with new pureQuery
technology
Simplifying development of applications by implementing industry-specific
standards
Monitoring data server operation and performance anywhere,
anytime, from a Web browser.
IBM Data Studio is expected to simplify and speed the development
of new skills by providing a "learn once, use with all supported data servers"
toolset that's engineered with an easy-to-use and integrated user interface
that's compatible with the IBM Rational Software development platform.
Finally, IBM Data Studio is expected to accelerate information
as a service by allowing you to develop and publish data-related services as Web
services without programming; and since it's Info 2.0 ready, with support for
Web 2.0 protocols and formats, it's going to power your applications into the
next technology wave.
With all this excitement about IBM Data Studio, you should
know where to get it. Check out www.ibm.com/software/data/studio
for IBM Data Studio-related FAQs, tutorials, downloads, blogs, and more. A
supporting forum and set of blogs is available at: www.ibm.com/developerworks/forums/dw_forum.jsp?forum=1086&cat=19.
About the Author
Paul C. Zikopoulos, BA, MBA, is an award-winning writer and
speaker with the IBM Database Competitive Technology team. He has more than 13
years of experience with DB2 and has written more than 150 magazine articles
and is currently working on book number 12. Paul has authored the books
Information on Demand: Introduction to DB2 9.5 New Features, DB2 9 Database
Administration Certification Guide and Reference (6th Edition), DB2 9: New
Features, Information on Demand: Introduction to DB2 9 New Features, Off to the
Races with Apache Derby, DB2 Version 8: The Official Guide, DB2: The Complete
Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A
DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical
Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business
Intelligence and Database Administration). In his spare time, he enjoys all
sorts of sporting activities, including running with his dog Chachi, avoiding
punches in his MMA class, and trying to figure out the world according to Chloë
his daughter. You can reach him at: email@example.com.
IBM, DB2, DB2 Universal Database, i5/OS, Informix, pureXML, Rational,
Rational Data Architect, WebSphere, WebSphere Application Server, and z/OS are trademarks
or registered trademarks of International Business Machines Corporation in the United
States, other countries, or both.
Other company, product, or service names may be trademarks
or service marks of others.
Copyright International Business Machines Corporation, 2007.
All rights reserved.
The opinions, solutions, and advice in this article are from
the author's experiences and are not intended to represent official
communication from IBM or an endorsement of any products listed within. Neither
the author nor IBM is liable for any of the contents in this article. The
accuracy of the information in this article is based on the author's knowledge
at the time of writing.