DB2 9.5 and IBM Data Studio - Things I Couldn't Tell You Before DB2 9.5 Was Announced

Tuesday Dec 4th 2007 by Paul Zikopoulos

Paul Zikopoulos shares the vision and the ideology behind the new IBM Data Studio toolset.

At the IBM Information Management Information on Demand (IOD) conference in Las Vegas in October 2007, IBM made a number of major announcements, which included the christening of the DB2 Viper 2 beta as DB2 Version 9.5 (DB2 9.5), and the introduction of a new toolset for IBM data servers (DB2 for Linux, UNIX, and Windows, DB2 for i5/OS, DB2 for z/OS, and Informix IDS) called IBM Data Studio.

I’ve been writing a series on some of the features that were part of the DB2 Viper 2 beta program. Now that DB2 9.5 is generally available, I can share with you the vision and the ideology behind the new IBM Data Studio toolset.

What the DB2 toolset looked like before DB2 9.5

Before the official announcement of the IBM Data Studio, IBM focused most of its attention (and coordination) on client-side deliverables around the run time. For example, IBM delivered cross-DB2-family APIs for Call Level Interface (CLI), Java Database Connectivity (JDBC), SQL Java (SQLJ), .NET, Perl, Ruby on Rails (RoR), PHP, Python, and more.

From a development perspective, there wasn’t a coordinated focus on the graphical user interface (GUI). In DB2 Universal Database (DB2 UDB) Version 8, IBM delivered the DB2 Development Center, which evolved into the DB2 Developer Workbench in DB2 9, but this tool targeted an audience somewhere between developers and application-focused database administrators (DBAs). One challenge was that developers don’t just build stored procedures and user-defined functions (commonly called routines); they also have to build Web sites using JavaServer Pages (JSP), applications using a framework such as Java EE (formerly called J2EE), and so on. For Java developers, our toolset may have helped with 10% of their job, but they couldn’t do everything they wanted to do from a single IDE. Multiple tools imply a larger client-side footprint, not to mention a steeper learning curve, all to manage the end-to-end life cycle of an application.

Our previous toolsets also appealed to application DBAs and provided them with a framework to build routines (if this wasn’t the Java developer’s responsibility), but there was no tooling to do schema evolution. For schema work, the application DBA had to turn to the Control Center; however, the Control Center was built on an entirely different set of technologies from the DB2 9 Developer Workbench. Depending on your job responsibilities, you may have had to use both tools.

Another group of users, operational DBAs, could use the Control Center or the command line processor (CLP). However, they likely wanted a lightweight infrastructure to manage the health of their data servers. For this task, these folks likely leveraged yet another set of tools, the Web-based Command Center and the Web-based Health Center.

In the end, your IT staff’s DB2 data server toolkit looked more like a bunch of separate tools than a tool suite.


The IBM data server toolset with the announcement of IBM Data Studio and beyond

The IBM Data Studio announcement is a major first step towards a realignment of the disparate DB2 tools into a single customizable toolset. And these tools extend beyond the DB2 mandate; they are tools for all IBM relational data servers. But it’s even more than that.

IBM Data Studio provides perspectives that address tasks across the entire application life cycle, which IBM defines in phases spanning design, development, deployment, day-to-day management, and governance.

Different vendors likely define this process with different labels, but for the most part, every application rolled out by your IT department experiences these activities.

When you consider the breadth of the IBM software group (IBM SWG) portfolio, you can appreciate the kind of expertise IBM has to offer. From this broad-based perspective, IBM strategists identified the following roles (with a data server focus) that the IBM Data Studio toolset needed to serve. The IBM Data Studio addresses these roles directly with add-ins and perspectives.

These roles are shown below:

You can see that I tried to map the application life cycle outlined earlier in this article with the roles that IBM strategists identified. For example, the Design phase encompasses work across job responsibilities (in this case, the business analyst and the database architect). As you look more closely at this figure, you can see that the high-level aggregate role of DBA gets broken down into granular DBA-related roles.

To simplify things, I’ve shown below some of these roles and some of what folks in these roles do in their day-to-day jobs. The toolset is based on Eclipse, the IDE architecture in which these personnel work. Notice, however, that there is a Web component: the deployment and management of these solutions are moving toward Web-based interfaces (such as the Data Studio Administration Console), and thus the Data Studio toolset has both Eclipse-based and Web 2.0-based facets.

Notice in the previous figure that I’ve grouped application developers and database developers in the same quadrant. Of course, there is administration work to be done, such as change management, schema evolution, and so on, which a DBA needs to do. There is also a whole category dedicated to governance. For example, if you’re a retailer, are you compliant with the Payment Card Industry Data Security Standard (PCI DSS, or PCI for short)?

Quite simply, no other vendor that I’m aware of has an integrated suite that addresses all of the roles illustrated in the previous figure. One of the primary advantages of such a toolset would be its ability to help the entire organization gather information at the design phase and use that information to make the development stages easier. During the development phase, the organization would gather still more information about the project, building on what was captured at design time, and that richer information would in turn smooth the deployment phase. The momentum builds, and by the time you get to governance, many of the manual requirements of adhering to a standard are already fulfilled, because they’ve been identified and built in throughout the whole process with a toolset shared by all of its stakeholders.

By the governance phase, the framework would be able to take care of most of the things you need to know about the application life cycle; the toolset would know what it is you are trying to do and might even be able to automate some of it for you.

Some of the possibilities with the IBM Data Studio vision

Let’s assume you’re a retailer and you offer credit cards as a form of payment and thus need to comply with the PCI standard. Let’s further assume you use IBM Rational Data Architect (IBM RDA) to model your physical and logical design.

A Moment on PCI: It’s outside the scope of this article to delve into the details of the PCI standard. In a nutshell, the major credit card companies have mandated that vendors who use their services must comply with the PCI standard. The ultimate goal is to provide ‘coaching’ to companies that handle credit card data so that they protect cardholder information through secure networks, encryption, and so on.

Vendors who fail to comply with the PCI standard are subject to some really hefty fines; in addition, any retailer found to have leaked sensitive cardholder data can be required to submit to external audits or risk losing their right to offer credit card services (from those credit vendors that participate in the program). Most recently, a major US retailer’s credit card processing company was fined $880,000 in the summer of 2007 and will continue to be fined $100,000/month until compliance is met. The IT challenges posed by the PCI standard are significant: recent studies estimate that only 40% of the major retailers in North America are compliant!

There are 12 parts to the PCI standard. One example of its declarations: on any report viewed by someone without a ‘need to know,’ the leading digits of a credit card number must be replaced with Xs, leaving only the final block of digits in the clear. A billing report, for example, would show each card number in this masked form.
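To make that masking rule concrete, here’s a minimal sketch in Python (my own illustration, not part of any IBM tool) of the transformation such a report would apply; the `mask_pan` name, and its handling of spaces and dashes, are assumptions:

```python
def mask_pan(pan: str, visible: int = 4) -> str:
    """Replace the leading digits of a primary account number (PAN)
    with 'X', leaving only the last `visible` digits in the clear."""
    digits = pan.replace(" ", "").replace("-", "")
    if len(digits) <= visible:
        return digits  # nothing to mask on very short values
    return "X" * (len(digits) - visible) + digits[-visible:]

# A billing report would render each card number in masked form:
print(mask_pan("4111-1111-1111-1111"))  # XXXXXXXXXXXX1111
```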

Another declaration states that PIN codes, such as those you enter to access your account online, must be represented with *s, ·s, or some other masking symbol.

Another specification (Part 3.1) notes that the storage of cardholder data should be kept to a minimum, and that network diagrams that illustrate all connections to cardholder data (Part 1) are required too. There are a lot of other declarations within this standard, such as how to handle test data. You can learn more about the PCI standard at: www.pcisecuritystandards.org/.

So let’s assume you are building a retail database. What if you could specify in the IBM RDA data model that a specific column contains credit card numbers, or that another column contains PINs? If you could, a governance tool would know that these two fields are governed by the PCI standard and could represent them in your business glossary; the governance solution could then automatically put fine-grained access control (FGAC) rules in place to mask the data whenever these columns are accessed (except by applications that are specifically authorized to bypass the rules).
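Here’s a rough sketch, in Python, of how such glossary-driven masking might behave. The glossary tags, column names, and `read_row` function are all hypothetical, invented for illustration rather than drawn from IBM Data Studio or RDA:

```python
# Hypothetical business glossary: columns tagged with the kind of
# sensitive data they hold (tags and rules here are illustrative).
GLOSSARY = {
    "card_number": "pci.pan",
    "pin": "pci.pin",
    "cust_name": None,  # not governed by PCI
}

def read_row(row: dict, caller_authorized: bool = False) -> dict:
    """Return a copy of `row` with PCI-governed columns masked,
    unless the calling application is explicitly authorized."""
    if caller_authorized:
        return dict(row)
    masked = {}
    for col, value in row.items():
        tag = GLOSSARY.get(col)
        if tag == "pci.pan":
            # keep only the last four digits in the clear
            masked[col] = "X" * (len(value) - 4) + value[-4:]
        elif tag == "pci.pin":
            masked[col] = "*" * len(value)
        else:
            masked[col] = value
    return masked

row = {"cust_name": "Jane Doe", "card_number": "4111111111111111", "pin": "9876"}
print(read_row(row))
```

The point of the design is that the masking policy is attached to the glossary tag, not to individual applications, so it follows the column wherever it is accessed.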

Consider another scenario for this same retailer. The PCI standard notes that live customer data cannot be used as test data: real credit card numbers, real addresses, real names, and so on, are off limits, and using them can get you fined. Now your IT department faces the challenge of creating a test database for the development organization. This is a perfect example of different roles leveraging the same toolset to solve a business problem. The DBA has to transform the real data so that what goes into the test database is PCI-compliant. But what would happen if the DBA skipped this step and instead loaded a new sample database with the live data? Typically, nothing would happen, which could lead to some large fines. Now imagine that the same governance tool from the previous paragraph could detect that data governed by the PCI standard was being loaded into a test database. A tool that could flag this violation at data movement time would be a very valuable asset.
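One way such a tool could spot live card numbers at data movement time is the Luhn checksum, which real card numbers satisfy. The sketch below (Python, my own illustration; the function names and row shapes are assumptions) flags rows whose values pass the check:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: True if `number` could be a real card number."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0

def flag_suspect_rows(rows, column):
    """Return indexes of rows whose `column` value passes the Luhn
    check -- i.e., values that may be live cardholder data in a
    supposedly synthetic test load."""
    return [i for i, r in enumerate(rows) if luhn_valid(r[column])]

test_load = [
    {"card_number": "4111111111111111"},  # Luhn-valid: suspicious
    {"card_number": "4111111111111112"},  # fails Luhn: safe synthetic value
]
print(flag_suspect_rows(test_load, "card_number"))  # [0]
```

A real governance tool would of course combine such heuristics with the glossary metadata described earlier, rather than guessing from values alone.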

Now let’s assume that you want to extract all the records in your database for an advertising campaign. The extract isn’t governed by the PCI standard, but your enterprise has its own privacy rules that guarantee clients their information won’t be distributed to other vendors without permission. Now extend this scenario to the data model. If the data server understands that column A is the opt-in or opt-out field that makes a home address visible, then every time a campaign-extract query runs, the data server can consult that field to decide whether the address should be returned to the application.
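A minimal sketch of that per-row consent check, in Python (the column names and the `campaign_extract` function are hypothetical, purely to illustrate the idea):

```python
def campaign_extract(rows):
    """Build a campaign extract that honors each customer's consent.
    The hypothetical `share_address_ok` opt-in column is consulted per
    row; a withheld address is simply omitted from the extract."""
    extract = []
    for r in rows:
        rec = {"name": r["name"]}
        if r["share_address_ok"]:
            rec["address"] = r["address"]
        extract.append(rec)
    return extract

customers = [
    {"name": "A. Smith", "address": "1 Main St", "share_address_ok": True},
    {"name": "B. Jones", "address": "2 Elm St", "share_address_ok": False},
]
print(campaign_extract(customers))
```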

In addition to this retail scenario, there are lots of other examples from other industries. The Health Insurance Portability and Accountability Act (HIPAA) is a medical privacy initiative that pretty much says you can’t show medical records in reports (similar to the PCI standard for credit card information). If a column contains personal medical history information, that column can be shown only to the patient’s physician; you can’t show it to just anyone who has DBA privileges. Same idea -- different industry.

The goal is to make the design layer smart enough to understand the different kinds of fields that have these kinds of characteristics. That will provide you with a very efficient environment.

Now, don’t let all my talk about the tooling and the role identification fool you; IBM is still focusing on the client-side runtime APIs because if you can’t run the application in a robust and high-performing manner, there is no sense in having any tooling whatsoever. We plan to enhance our already strong API story with a new suite of tools around the jobs and roles of the personnel that actually do the work.

Note: The DB2 and Visual Studio .NET experience has a long and rich history of deep integration. Since Visual Studio is the IDE where .NET developers tend to work, IBM has provided plug-ins for this environment since the DB2 UDB Version 8.1.2 release. I’ve written extensively about this particular integration, API, and IDE, so it isn’t discussed in this article.

Here’s my attempt to express IBM’s intentions for the new IBM Data Studio:

The goal of a comprehensive solution for the entire life cycle of an application is unique and ambitious. I’m talking about everything from early up-front design of business processes and logical data models for all of your applications, to the time when you start to design, code, and test those applications. But the process isn’t finished at that point. You then have to create a physical database model, including its schemas; you have to allocate storage; you have to perform a lot of security work. You have to accommodate the day-to-day management of these systems, such as backup and recovery or meeting service level agreements. Finally, don’t forget all the governance work you have to perform (a growing percentage of the work required in the application life cycle) to satisfy the regulatory compliance decrees placed upon your business, and more.

While there are lots and lots of tools out there that address part of the application life cycle, no other single toolset currently covers the full life cycle. Indeed, as I look across the market, most tools are piece-part solutions, which means you need two or more of them to cover the methodology I outlined in this article.

Quite frankly, the IBM Data Studio vision is an ambitious undertaking, but one with extremely high value and high potential for our clients, since no one else really has this vision. What’s more, the vision has the potential to grow beyond the set of relational IBM data servers that IBM Data Studio currently supports. There would be a lot of value in a cross-data-server toolset that addresses the inefficient way enterprises align their data server personnel today: skills and resources tend to be aligned, and constrained, in direct correlation to a database vendor. In my opinion, this hurts productivity. A cross-brand, full-life-cycle toolset would lead to better control over staffing, better control over resource allocation, and an unbeatable edge over the competition!

How the year 2007 will finish up for IBM Data Studio

IBM has now announced the general availability of IBM Data Studio – and best of all it’s free for everyone. Although the first iteration of this IDE is far from the full life cycle vision I’ve discussed in this article, some valuable stuff is being delivered. For example, Data Studio contains a subset of some of the functionality in IBM RDA. (See Part 3 in this series for an example of the overview data dependency diagrams, which allow you to see tables and their relationships, and so on.) It also has entity-relationship diagrams, a simple data distribution viewer (which allows you to see your data’s distribution), and more.

From a development perspective, you’ll get most of the things that you’re used to seeing as a developer, such as building SQL and XQuery statements, routines, and so on. In addition to what its predecessor, DB2 9 Developer Workbench, gave you, IBM Data Studio has a set of rich enhancements designed to make developers even more productive. There’s also a new Web services framework called IBM Data Web Services. Finally, a lot of features were added for DBA activities. From a governance perspective, IBM Data Studio gives you the ability to work with security roles.

Obviously, I can’t detail all the features in this article -- that’s the point of this series -- but the following figure should give you an idea of the things you can do free of charge with IBM Data Studio and an IBM data server:

The capabilities outlined in the previous figure are all free. Since the toolset is extensible, you can buy add-ons that add new features to context menus or enable disabled features (in the same way that you can add to the base tooling provided by vendors such as Quest or Embarcadero). For example, when IBM Data Studio became generally available, IBM announced two such chargeable add-ons: IBM Data Studio Developer and IBM Data Studio pureQuery Runtime, which are geared towards the incredibly cool Java enhancements that go with pureQuery (a subject for a future installment of this series).

Where is IBM Data Studio heading?

In this section, I outline some of the top priorities for the IBM Data Studio. (Of course, these are not commitments but rather goals.)

The first goal is to reduce the footprint associated with this toolset. In addition, since client-side offerings aren’t tied to server releases, IBM Data Studio can have its own availability schedule that’s more aggressive than a data server’s.

In addition, DB2 and Informix IDS must be seen as the unquestionable choice for IBM WebSphere Application Server. The debut of pureQuery is a great first step here, but expect to see initiatives along other integration points too. Some possible examples include JDBC capture to enable pureQuery for any Java program; significant performance monitoring and problem determination aids; and pureQuery support for OpenJPA, Spring, iBATIS, and so on. Also being considered is tight integration with ObjectGrid (SQL syntax for caching, DB2 or Informix IDS persistence options, replication from DB2 or Informix IDS to ObjectGrid, and more).

Since IBM Data Studio now represents a standard toolset with a pluggable interface for all sorts of roles and features, you’re likely to see key IBM Information Management products integrated into this toolset, such as DB2 Change Management Expert, DB2 Performance Expert, the Design Studio from DB2 Warehouse Edition, the IBM Optim suite (from the Princeton Softech acquisition), and DB2 High Performance Unload, to name a few. Expect to see some sort of performance management add-ons for IBM Data Studio too. For example, such a feature could provide SQL statement-level performance details and historical trend analysis. In addition, data server personnel could have the ability to report database resource usage in multiple facets, such as by SQL statement, package or collection, application, application server, Java class name, and more.

Finally, look for expanded functionality in the administration piece with “alter anything” capabilities (essentially, integrating the DB2 Change Management Expert in the toolset): the ability to view and change registry, database, and database manager configuration parameters; FTAs for deadlock and timeout events; and other items from the DB2 Control Center that improve the up-and-running experience for an IBM data server. Remember, these are goals, not commitments.

Wrapping it up…

In this article, I tried to describe the vision behind the newly announced IBM Data Studio. I attempted to outline just how unique and beneficial our envisioned toolset platform would be for all those involved in the application life cycle. Specifically, such a toolset would help by:

  • Slashing development time up to 50% with an integrated data management environment
  • Promoting collaboration across roles to optimize data server and application performance
  • Accelerating Java development productivity with new pureQuery data access
  • Simplifying development of applications by implementing industry-specific XML standards
  • Monitoring data server operation and performance anywhere, anytime from a Web browser

IBM Data Studio is expected to simplify and speed the development of new skills by providing a “learn once, use with all supported data servers” toolset that’s engineered with an easy-to-use and integrated user interface that’s compatible with the IBM Rational Software development platform.

Finally, IBM Data Studio is expected to accelerate information as a service by allowing you to develop and publish data-related services as Web services without programming; and since it’s Info 2.0 ready, with support for Web 2.0 protocols and formats, it’s going to power your applications into the next technology wave.

With all this excitement about IBM Data Studio, you should know where to get it. Check out www.ibm.com/software/data/studio for IBM Data Studio-related FAQs, tutorials, downloads, blogs, and more. A supporting forum and set of blogs is available at: www.ibm.com/developerworks/forums/dw_forum.jsp?forum=1086&cat=19.


About the Author

Paul C. Zikopoulos, BA, MBA, is an award-winning writer and speaker with the IBM Database Competitive Technology team. He has more than 13 years of experience with DB2 and has written more than 150 magazine articles and is currently working on book number 12. Paul has authored the books Information on Demand: Introduction to DB2 9.5 New Features, DB2 9 Database Administration Certification Guide and Reference (6th Edition), DB2 9: New Features, Information on Demand: Introduction to DB2 9 New Features, Off to the Races with Apache Derby, DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). In his spare time, he enjoys all sorts of sporting activities, including running with his dog Chachi, avoiding punches in his MMA class, and trying to figure out the world according to Chloë – his daughter. You can reach him at: paulz_ibm@msn.com.


IBM, DB2, DB2 Universal Database, i5/OS, Informix, pureXML, Rational, Rational Data Architect, WebSphere, WebSphere Application Server, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Copyright International Business Machines Corporation, 2007. All rights reserved.


The opinions, solutions, and advice in this article are from the author’s experiences and are not intended to represent official communication from IBM or an endorsement of any products listed within. Neither the author nor IBM is liable for any of the contents in this article. The accuracy of the information in this article is based on the author’s knowledge at the time of writing.
