Julian Stuhler shares his pick of the most important current trends in the world of IBM Information Management. Some are completely new and some are evolutions of existing technologies, and he's betting that every one of them will have some sort of impact on data management professionals during the next 12-18 months.
The Greek philosopher Heraclitus is credited with the saying "Nothing endures but change". Some two and a half millennia later those words still ring true, and nowhere more so than within the IT industry. Each year brings exciting new technologies, concepts and buzzwords for us to assimilate. Here is my pick of the most important current trends in the world of IBM Information Management. Some are completely new and some are evolutions of existing technologies, but I'm betting that every one of them will have some sort of impact on data management professionals during the next 12-18 months.
1. Living on a Smarter Planet
You don't have to be an IT professional to see that the world around us is getting smarter. Let's just take a look at a few examples from the world of motoring: we've become used to our in-car GPS systems giving us real-time traffic updates, signs outside car parks telling us exactly how many spaces are free, and even the cars themselves being smart enough to brake individual wheels in order to control a developing skid. All of these make our lives easier and safer by using real-time data to make smart decisions.
However, all of this is just the beginning: everywhere you look the world is getting more "instrumented", and clever technologies are being adopted to use the real-time data to make things safer, quicker and greener. Smart electricity meters in homes are giving consumers the ability to monitor their energy usage in real time and make informed decisions on how they use it, resulting in an average 10% reduction in consumption in a recent US study. Sophisticated traffic management systems in our cities are reducing congestion and improving fuel efficiency, with an estimated reduction in journey delays of 700,000 hours in another study covering 439 cities around the world.
All of this has some obvious implications for the volume of data our systems will have to manage (see trend #2 below) but the IT impact goes a lot deeper than that. The very infrastructure that we run our IT systems on is also getting smarter. Virtualization technologies allow server images to be created on demand as demand increases, and just as easily torn down again when demand falls. More extensive instrumentation and smarter analysis allow the peaks and troughs in demand to be more accurately measured and predicted so that capacity can be dynamically adjusted to cope. With up to 85% of server capacity typically sitting idle on distributed platforms, the ability to virtualize and consolidate multiple physical servers can save an enormous amount of power, money and valuable data center floor space.
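To make the consolidation arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes identical servers, a single average utilization figure, and a hypothetical target utilization of 70% for the consolidated hosts; it deliberately ignores overlapping peaks, memory and I/O, all of which matter in a real capacity plan.

```python
import math


def hosts_needed(server_count: int, avg_utilization: float,
                 target_utilization: float = 0.7) -> int:
    """Estimate how many equivalent physical hosts could carry the combined
    load of `server_count` servers that each run at `avg_utilization`.

    Assumptions (illustrative only): all servers have identical capacity,
    and the consolidated hosts are run at `target_utilization`.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Total busy capacity, measured in units of one server's capacity.
    total_load = server_count * avg_utilization
    return max(1, math.ceil(total_load / target_utilization))


# 100 servers that are 85% idle (15% busy), consolidated onto hosts run at 70%.
print(hosts_needed(100, 0.15))  # -> 22
```

On these assumptions, 100 lightly loaded servers fit onto roughly a fifth as many hosts, which is where the power, cost and floor-space savings come from.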
If you live in the mainframe space, virtualization is an established technology that you've been working with for many years. If not, this might be a new way of thinking about your server environment. Either way, most of us will be managing our databases on virtual servers running on a more dynamic infrastructure in the near future.
2. The Information Explosion
As IT becomes ever more prevalent in nearly every aspect of our lives, the amount of data generated and stored continues to grow at an astounding rate. According to IBM, worldwide data volumes are currently doubling every two years. IDC estimates that 45GB of data currently exists for each person on the planet: that's a mind-blowing 281 billion gigabytes in total. While a mere 5 percent of that data will end up on enterprise data servers, it is forecast to grow at a staggering 60 percent per year, resulting in 14 exabytes of corporate data by 2011.
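The growth figures above are quoted in different forms ("doubling every two years" versus "60 percent per year"), so a little compound-growth arithmetic helps compare them. The sketch below uses only the numbers quoted in the text; the function names are illustrative.

```python
import math


def annual_rate_from_doubling(years_to_double: float) -> float:
    """Compound annual growth rate implied by a given doubling time."""
    return 2 ** (1 / years_to_double) - 1


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def project(volume: float, annual_rate: float, years: int) -> float:
    """Volume after `years` of compound growth at `annual_rate`."""
    return volume * (1 + annual_rate) ** years


print(f"{annual_rate_from_doubling(2):.1%}")     # 41.4% a year
print(f"{doubling_time(0.60):.2f} years")        # 1.47 years
print(f"{281e9 / 45 / 1e9:.2f} billion people")  # 6.24 billion people
```

So the enterprise share growing at 60 percent a year doubles roughly every year and a half, noticeably faster than the two-year doubling quoted for data overall, and the IDC totals imply a world population of about 6.2 billion.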
Major industry trends such as the move towards packaged ERP and CRM applications, increased regulatory and audit requirements, investment in advanced analytics and major company mergers and acquisitions are all contributing to this explosion of data, and the move towards instrumenting our planet (see trend #1 above) is only going to make things worse.
As the custodians of the world's corporate data, we are at the sharp end of this particular trend. We're being forced to get more inventive with database partitioning schemes to reduce the performance and operational impact of increased data volumes. Archiving strategies, usually an afterthought for many new applications, are becoming increasingly important. The move to a 64-bit memory model on all major computing platforms allows us to design our systems to hold much more data in memory rather than on disk, further reducing the performance impact. As volumes continue to increase and new types of data such as XML and geospatial information are integrated into our corporate data stores (see trend #5), we'll have to get even more inventive.
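As a sketch of why partitioning and archiving go hand in hand, the toy class below range-partitions rows by month, scans only the partitions a date-range query can touch (partition pruning), and archives by detaching whole old partitions rather than deleting rows one by one. The class and method names are hypothetical and this is not a DB2 API; a real database does the same thing with partitioned table spaces and DDL.

```python
from collections import defaultdict
from datetime import date


class MonthlyPartitionedTable:
    """Toy range-partitioned table with one partition per calendar month.

    Hypothetical illustration only; not modelled on any vendor's API.
    """

    def __init__(self) -> None:
        self.partitions = defaultdict(list)  # (year, month) -> [(date, row), ...]

    @staticmethod
    def partition_key(d: date) -> tuple:
        return (d.year, d.month)

    def insert(self, row_date: date, payload: dict) -> None:
        self.partitions[self.partition_key(row_date)].append((row_date, payload))

    def query_range(self, start: date, end: date) -> list:
        """Return rows dated in [start, end], scanning only overlapping partitions."""
        lo, hi = self.partition_key(start), self.partition_key(end)
        result = []
        for key in sorted(self.partitions):
            if lo <= key <= hi:  # partition pruning: skip months outside the range
                result.extend(r for r in self.partitions[key] if start <= r[0] <= end)
        return result

    def archive_before(self, cutoff: date) -> list:
        """Detach every partition for months earlier than the cutoff's month."""
        limit = self.partition_key(cutoff)
        archived = []
        for key in sorted(k for k in self.partitions if k < limit):
            archived.extend(self.partitions.pop(key))
        return archived
```

Because old data lives in its own partitions, archiving becomes a cheap metadata operation and current queries never touch the history at all.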
3. Hardware Assist
OK, so this is not a new trend: some of the earliest desktop PCs had the option to fit coprocessors to speed up floating point arithmetic, and the mainframe has used many types of supplementary hardware over the years to boost specific functions such as sort and encryption. However, use of special hardware is becoming ever more important on all of the major computing platforms.
In 2004, IBM introduced the zAAP (System z Application Assist Processor), a special type of processor aimed at Java workloads running under z/OS. Two years later, it introduced the zIIP (System z Integrated Information Processor) which was designed to offload specific types of data and transaction processing workloads for business intelligence, ERP and CRM, and network encryption. In both cases, work can be offloaded from the general-purpose processors to improve overall capacity and significantly reduce running costs (as most mainframe customers pay according to how much CPU they burn on their general-purpose processors). These "specialty coprocessors" have been a critical factor in keeping the mainframe cost-competitive with other platforms, and allow IBM to easily tweak the overall TCO proposition for the System z platform. IBM has previewed its Smart Analytics Optimizer blade for System z (see trend #9) and is about to release details of the next generation of mainframe servers: we can expect the theme of workload optimization through dedicated hardware to continue.
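The cost argument for specialty engines follows from the charging model described above: if billing is driven by general-purpose CPU, every unit of work redirected elsewhere comes off the bill. The sketch below is a deliberately simplified, hypothetical model; it assumes all eligible work really does run on the specialty engine and uses a flat cost per CPU second, whereas actual redirection depends on engine capacity and IBM's eligibility rules.

```python
def chargeable_cpu_seconds(total_cpu_seconds: float,
                           offload_eligible_fraction: float) -> float:
    """General-purpose CPU left after eligible work moves to a specialty engine.

    Simplified, hypothetical model: assumes 100% of the eligible work is
    actually redirected to the zIIP/zAAP.
    """
    if not 0 <= offload_eligible_fraction <= 1:
        raise ValueError("offload_eligible_fraction must be between 0 and 1")
    return total_cpu_seconds * (1 - offload_eligible_fraction)


def monthly_saving(total_cpu_seconds: float, offload_eligible_fraction: float,
                   cost_per_cpu_second: float) -> float:
    """Difference in general-purpose CPU cost before and after offload."""
    before = total_cpu_seconds * cost_per_cpu_second
    after = chargeable_cpu_seconds(total_cpu_seconds,
                                   offload_eligible_fraction) * cost_per_cpu_second
    return before - after
```

For example, with a hypothetical 40% of a workload eligible for offload, `chargeable_cpu_seconds(1000, 0.4)` leaves 600 billable CPU seconds out of 1,000.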
On distributed computing platforms, things have taken a different turn. The GPU (graphics processing unit), previously only of interest to CAD designers and hard-core gamers, is gradually establishing itself as a formidable computing platform in its own right. The capability to run hundreds or thousands of parallel processes is proving valuable for all sorts of applications, and a new movement called GPGPU (General-Purpose computation on Graphics Processing Units) is rapidly gaining ground. It is very early days, but many database operations (including joins, sorting, data visualization and spatial data access) have already been shown to run on GPUs, and the mainstream database vendors won't be far behind.
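Joins suit massively parallel hardware because they can be split into many independent pieces. The sketch below shows that decomposition as a partitioned hash join: both inputs are hash-partitioned on the join key, and each partition pair is joined on its own. It runs on ordinary CPU threads (which, in CPython, will not actually speed anything up); the point is the structure a GPU implementation would exploit, not GPU code itself.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, n):
    """Split rows into n buckets by hashing the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def join_partition(left_part, right_part, key):
    """Classic build/probe hash join on a single partition pair."""
    index = {}
    for row in left_part:  # build phase
        index.setdefault(row[key], []).append(row)
    # probe phase: emit one merged row per matching pair
    return [{**l, **r} for r in right_part for l in index.get(r[key], [])]


def parallel_hash_join(left, right, key, n_partitions=4):
    """Equal keys always land in the same partition, so pairs join independently."""
    lparts = hash_partition(left, key, n_partitions)
    rparts = hash_partition(right, key, n_partitions)
    with ThreadPoolExecutor() as pool:
        results = pool.map(join_partition, lparts, rparts, [key] * n_partitions)
        return [row for part in results for row in part]
```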
4. Versioned/Temporal Data
As the major relational database technologies continue to mature, it's getting more and more difficult to distinguish between them on the basis of pure functionality. In that kind of environment, it's a real treat when a vendor comes up with a major new feature, which is both fundamentally new and immediately useful. The temporal data capabilities being delivered as part of DB2 10 for z/OS qualify on both counts.
Many IT systems need to keep some form of historical information in addition to the current status for a given business object. For example, a financial institution may need to retain the previous addresses of a customer as well as the one they are currently living at, and know what address applied at any given time. Previously, this would have required the DBA and application developers to spend valuable time creating the code and database design to support the historical perspective, while minimizing any performance impact.
The new temporal data support in DB2 10 for z/OS provides this functionality as part of the core database engine. All you need to do is indicate which tables/columns require temporal support, and DB2 will automatically maintain the history whenever an update is made to the data. Elegant SQL support allows the developer to query the database with an "as of" date, which will return the information that was current at the specified time.
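The mechanism behind system-time versioning can be sketched in a few lines: every update moves the old version into a history store stamped with the period during which it was current, and an "as of" query picks whichever version covers the requested instant. In DB2 10 this is done by the engine and queried with a `FOR SYSTEM_TIME AS OF` clause; the Python class below is only a hypothetical in-memory model of the same idea, not DB2's implementation.

```python
from datetime import datetime


class VersionedTable:
    """Toy system-time versioned table (hypothetical illustration).

    Current rows are kept with the time they became current; superseded
    versions move to a history list with their [valid_from, valid_to) period.
    """

    def __init__(self) -> None:
        self.current = {}   # key -> (row, valid_from)
        self.history = []   # (key, row, valid_from, valid_to)

    def upsert(self, key, row, at: datetime) -> None:
        if key in self.current:
            old_row, old_from = self.current[key]
            self.history.append((key, old_row, old_from, at))  # close the old period
        self.current[key] = (row, at)

    def as_of(self, key, when: datetime):
        """Return the version of `key` that was current at `when`, or None."""
        if key in self.current:
            row, valid_from = self.current[key]
            if valid_from <= when:
                return row
        for k, row, valid_from, valid_to in self.history:
            if k == key and valid_from <= when < valid_to:
                return row
        return None
```

Using the customer-address example above: after `upsert(1, {"address": "Leeds"}, datetime(2008, 1, 1))` and `upsert(1, {"address": "York"}, datetime(2010, 6, 1))`, asking `as_of(1, datetime(2009, 3, 1))` returns the Leeds address.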
With the ongoing focus on improving productivity and reducing time-to-market for key new IT systems, you can expect other databases (both IBM and non-IBM) to implement this feature sooner rather than later.
5. The Rise of XML and Spatial Data
Most relational databases have been able to store "unstructured" data such as photographs and scanned images for a while now, in the form of BLOBs (Binary Large OBjects). This has proven useful in some situations, but most businesses use specialized applications such as IBM Content Manager to handle this information more effectively than a general-purpose database. These kinds of applications typically do not have to perform any significant processing on the BLOB itself: they merely store and retrieve it according to externally defined index metadata.
In contrast, there are some kinds of non-traditional data that need to be fully understood by the database system so that it can be integrated with structured data and queried using the full power of SQL. The two most powerful examples of this are XML and spatial data, supported as special data types within the latest versions of both DB2 for z/OS and DB2 for LUW.
More and more organizations are coming to rely on some form of XML as the primary means of data interchange, both internally between applications and externally when communicating with third parties. As the volume of critical XML business documents increases, so too does the need to properly store and retrieve those documents alongside other business information. DB2's pureXML feature allows XML documents to be stored natively in a specially designed XML data store, which sits alongside the traditional relational engine. This is not a new feature any more, but the trend I've observed is that more organizations are beginning to actually make use of pureXML within their systems. The ability to offload some XML parsing work to a zAAP coprocessor (see trend #3) is certainly helping.
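The value of keeping XML next to relational data is that one query can filter on both. The sketch below imitates that with SQLite and Python's ElementTree: a relational predicate (region) narrows the rows, then the XML document in each row is inspected. The table, columns and SKU values are hypothetical, and this is only an analogy; pureXML stores XML natively and queries it with SQL/XML and XQuery (for example via `XMLEXISTS`) inside the database rather than in application code.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical orders table: relational columns plus an XML document column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, doc TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (1, "EMEA", "<order><item sku='A100' qty='2'/><item sku='B200' qty='1'/></order>"),
        (2, "EMEA", "<order><item sku='B200' qty='5'/></order>"),
        (3, "APAC", "<order><item sku='A100' qty='1'/></order>"),
    ])
    return conn


def orders_with_sku(conn: sqlite3.Connection, region: str, sku: str) -> list:
    """Relational filter in SQL, then an XML-content filter on each document."""
    matches = []
    rows = conn.execute("SELECT id, doc FROM orders WHERE region = ? ORDER BY id", (region,))
    for order_id, doc in rows:
        root = ET.fromstring(doc)
        if any(item.get("sku") == sku for item in root.iter("item")):
            matches.append(order_id)
    return matches
```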
Nearly all of our existing applications contain a wealth of spatial data (customer addresses, supplier locations, store locations, etc.): the trouble is we're unable to use it properly as it's in the form of simple text fields. The spatial abilities within DB2 allow that data to be "geocoded" in a separate column, so that the full power of SQL can be unleashed. Want to know how many customers live within a 10-mile radius of your new store? Or if a property you're about to insure is within a known flood plain or high crime area? All of this and much more is possible with simple SQL queries. Again, this is not a brand new feature but more and more organizations are beginning to see the potential and design applications to exploit this feature.
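The "customers within 10 miles" question boils down to a great-circle distance test once addresses have been geocoded. The sketch below assumes each customer already has latitude/longitude values (that geocoding step is what the database's spatial support provides) and treats the Earth as a sphere, which is accurate enough for this kind of radius search. DB2's spatial support exposes distance functions such as `ST_Distance` so this can be done in SQL; the Python here only shows the underlying calculation.

```python
import math

# Mean Earth radius in miles (spherical approximation).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def customers_within(customers, store_lat, store_lon, radius_miles=10.0):
    """Customers (dicts with hypothetical 'lat'/'lon' keys) inside the radius."""
    return [c for c in customers
            if haversine_miles(c["lat"], c["lon"], store_lat, store_lon) <= radius_miles]
```

One degree of latitude is roughly 69 miles on this model, so a customer 0.1 degrees north of the store (about 7 miles) is included in a 10-mile search and one 0.2 degrees north (about 14 miles) is not.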
6. Application Portability
Despite the relative maturity of the relational database marketplace, there is still fierce competition for overall market share between the top three vendors. IBM, Oracle and Microsoft are the main protagonists, and each company is constantly looking for new ways to tempt its competitors' customers to defect. Those brave souls that undertook migration projects in the past faced a difficult process, often entailing significant effort and risk to port the database and associated applications to run on the new platform. This made large-scale migrations relatively rare, even when there were compelling cost or functionality reasons to move to another platform.
Two trends are changing this and making porting projects more common. The first is the rise of the packaged ERP/CRM solution from companies such as SAP and Siebel. These applications have been written to be largely database agnostic, with the core business logic isolated from the underlying database by an "I/O layer". So, while there may still be good reasons to be on a specific vendor's database in terms of functionality or price, the pain of moving from one to another is vastly reduced and the process is supported by the ERP solution vendor with additional tooling. For example, over 100 SAP customers running on Oracle are known to have switched to DB2 during the past 12 months, including huge organizations such as Coca-Cola.
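The "I/O layer" idea can be sketched as a small interface that the business logic depends on, with one implementation per database. Everything here is hypothetical: the `CustomerStore` protocol, the two backends (SQLite and a plain dictionary standing in for "another vendor"), and the `greeting` function. Packaged applications such as SAP use far more elaborate layers, but the principle is the same: switching databases means swapping the backend, not rewriting the business logic.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerStore(Protocol):
    """The only data-access contract the business logic sees."""

    def add(self, cust_id: int, name: str) -> None: ...
    def name_of(self, cust_id: int) -> Optional[str]: ...


class SqliteCustomerStore:
    """Backend 1: a SQL database (SQLite stands in for any relational product)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    def add(self, cust_id: int, name: str) -> None:
        self.conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (cust_id, name))

    def name_of(self, cust_id: int) -> Optional[str]:
        row = self.conn.execute("SELECT name FROM customer WHERE id = ?", (cust_id,)).fetchone()
        return row[0] if row else None


class DictCustomerStore:
    """Backend 2: a completely different storage engine behind the same contract."""

    def __init__(self) -> None:
        self.rows = {}

    def add(self, cust_id: int, name: str) -> None:
        self.rows[cust_id] = name

    def name_of(self, cust_id: int) -> Optional[str]:
        return self.rows.get(cust_id)


def greeting(store: CustomerStore, cust_id: int) -> str:
    """Business logic: written once, unaware of which database is underneath."""
    name = store.name_of(cust_id)
    return f"Dear {name}" if name else "Dear customer"
```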
The second and more recent trend is direct support for competitors' database APIs. DB2 for LUW version 9.7 includes a host of new Oracle compatibility features that make it possible to run the vast majority of Oracle applications natively against DB2 with little or no change required to the code. IBM has also announced the "DB2 SQL Skin" feature, which provides similar capabilities for Sybase ASE applications to run against DB2. With these features greatly reducing the cost and risk of changing the application code to work with a different database, all that is left is to physically port the database structures and data to the new platform (which is a relatively straightforward process that is well supported by vendor tooling). There is a huge amount of excitement about these new features and IBM is expecting to see a significant number of Oracle customers switch to DB2 in the coming year. I'm expecting IBM to continue to pursue this strategy by targeting other databases such as SQL Server, and Oracle and Microsoft may well return the favor if they begin to lose significant market share as a result.
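The essence of "direct support for a competitor's API" is that the target database understands the source dialect, so the application's SQL runs unchanged. As a small analogy, the sketch below teaches SQLite two Oracle-specific functions, `NVL` and `DECODE`, using the standard `sqlite3.Connection.create_function` call. DB2 9.7 builds its Oracle compatibility into the engine itself, so this is not how DB2 does it; it only shows why an unchanged Oracle-style query can then simply work.

```python
import sqlite3


def _nvl(value, default):
    """Oracle NVL: return `default` when `value` is NULL."""
    return default if value is None else value


def _decode(expr, *args):
    """Oracle DECODE(expr, search1, result1, ..., [default])."""
    # An odd number of trailing arguments means the last one is the default.
    pairs, default = (args[:-1], args[-1]) if len(args) % 2 else (args, None)
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expr == search:
            return result
    return default


def oracle_flavoured_connection() -> sqlite3.Connection:
    """SQLite connection that accepts a couple of Oracle-style functions."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("NVL", 2, _nvl)
    conn.create_function("DECODE", -1, _decode)  # -1: variable number of arguments
    return conn
```

With this connection, an Oracle-style query such as `SELECT DECODE(status, 'A', 'Active', 'Inactive') FROM accounts` runs without being rewritten.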
7. Scalability and Availability
The ability to provide unparalleled scalability and availability for DB2 databases is not new: high-end mainframe users have been enjoying the benefits of DB2 Data Sharing and Parallel Sysplex for more than 15 years. The shared-disk architecture and advanced optimizations employed in this technology allow customers to run mission-critical systems with 24x7 availability and no single point of failure, with only a minimal performance penalty. Major increases in workload can be accommodated by adding members to the data sharing group, providing an easy way to scale.
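Because every member of a shared-disk group can see all of the data, work can be sent to any surviving member, and scaling out is just a matter of adding another one. The toy router below illustrates that property with round-robin routing that skips failed members. It is a hypothetical sketch with made-up member names; in a real Parallel Sysplex (or pureScale cluster) workload balancing and failover are handled by the platform, not by application code like this.

```python
import itertools


class MemberRouter:
    """Toy round-robin router over shared-disk group members (hypothetical)."""

    def __init__(self, members) -> None:
        self.members = list(members)
        self.down = set()
        self._cycle = itertools.cycle(self.members)

    def add_member(self, name: str) -> None:
        """Scale out: a new member can serve any request straight away."""
        self.members.append(name)
        self._cycle = itertools.cycle(self.members)

    def mark_down(self, name: str) -> None:
        self.down.add(name)

    def mark_up(self, name: str) -> None:
        self.down.discard(name)

    def route(self) -> str:
        """Next available member; any member will do because the data is shared."""
        for _ in range(len(self.members)):
            member = next(self._cycle)
            if member not in self.down:
                return member
        raise RuntimeError("no data sharing member available")
```

If member `DB1A` fails, requests simply flow to `DB1B`; there is no takeover of the failed member's data to wait for, which is the contrast with a separate standby server.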
Two developments have resulted in this making my top 10 trends list. Firstly, I'm seeing a significant number of mainframe customers who had not previously taken advantage of data sharing begin to take the plunge. There are various reasons for this, but we've definitely moved away from the days when DB2 for z/OS data sharing customers were a minority group huddling together at conferences and speaking a different language to everyone else.
The second reason that this is set to be big news over the next year is DB2 pureScale: the implementation of the same data sharing shared-disk concepts on the DB2 for LUW platform. It's difficult to overstate the potential impact this could have on distributed DB2 customers that run high volume mission critical applications. Before pureScale, those customers had to rely on features such as HADR to provide failover support to a separate server (which could require many seconds to take over in the event of a failure) or go to external suppliers such as Xkoto with their Gridscale solution (no longer an option since the company was acquired by Teradata and the product was removed from the market). pureScale brings DB2 for LUW into the same ballpark as DB2 for z/OS in terms of scalability and availability, and I'm expecting a lot of customer activity in this area over the next year.
8. Stack 'em high...
For some time now, it has been possible for organizations to take a "pick and mix" approach to their IT infrastructure, selecting the best hardware, operating system, database and even packaged application for their needs. This allowed IT staff to concentrate on building skills and experience in specific vendors' products, thereby reducing support costs.
Recent acquisitions have begun to put this environment under threat. Oracle's earlier purchases of application vendors such as PeopleSoft, Siebel and JD Edwards had already resulted in considerable pressure to use Oracle as the back-end database for those applications (even if DB2 and other databases are still officially supported). That reinforced SAP's alliance with IBM and the push to run their applications on DB2 (again, other databases are supported but not encouraged).
Two acquisitions during the past 12 months have further eroded the "mix and match" approach, and started a trend towards single-vendor end-to-end solution "stacks" comprising hardware, OS, database and application. The first and most significant of these was Oracle's acquisition of Sun Microsystems in January 2010. This gave the company access to Sun's well-respected server technology and the Solaris OS that runs on it. At a single stroke, Oracle was able to offer potential customers a completely integrated hardware/software/application stack.
The jury is still out on the potential impact of the second acquisition: SAP's purchase of Sybase in May 2010. Although the official SAP position is that Sybase has been purchased for the enhanced mobile and in-memory computing technologies it will bring, there is the possibility that SAP will choose to integrate the Sybase database technology into its own products. That would still leave SAP dependent on other vendors such as IBM for the hardware and operating system, but it would be a major step forward in any integration strategy it may have.
Older readers of this article may see some startling similarities to the bad old days of vendor lock-in prevalent in the 1970s and 1980s. IBM's strategy to support other vendors' database APIs (see trend #6) is in direct contrast to this, and it will be interesting to see how far customers are willing to go down the single-vendor route.
9. BI on the Mainframe
The concept of running Business Intelligence applications on the mainframe is not new: DB2 was originally marketed as a back-end decision support application for IMS databases. The ability to build a warehouse in the same environment where your operational data resides (and thereby avoid the expensive and time-consuming process of moving that data to another platform for analysis) is attractive to many customers.
IBM is making significant efforts to make this an attractive proposition for more of its mainframe customers. The Cognos tools have been available for zLinux for a couple of years now, and the DB2 for z/OS development team has been steadily adding BI-related functions to the core database engine for years. Significant portions of a typical BI workload can also be offloaded to a zIIP coprocessor (see trend #3), reducing the CPU costs.
More recently, IBM unveiled its Smart Analytics System 9600, an integrated, workload-balanced bundle of hardware, software and services based on System z and DB2 for z/OS. It has also begun to talk about the Smart Analytics Optimizer, a high-performance, appliance-like blade for System z capable of handling intensive BI query workloads with minimal impact on general-purpose CPU.
IBM is serious about BI on the mainframe, and is building an increasingly compelling cost and functionality case to support it.
10. Data Governance
Ensuring that sensitive data is properly secured and audited has always been a concern, but this has received more attention in recent years due to legislation such as Sarbanes-Oxley, HIPAA and others. At the same time, there has been an increasing focus on data quality: bad data can result in bad business decisions, which no one can afford in today's competitive markets. There has also been an increasing awareness of data as both an asset and a potential liability, making archiving and lifecycle management more important.
All of these disciplines and more are beginning to come together under the general heading of data governance. As our database systems get smarter and more self-managing, database professionals are increasingly morphing from data administrators to data governors. A new generation of tools is being rolled out to help, including InfoSphere Information Analyzer, Guardium and the Optim data management products.
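At its simplest, the data-quality side of governance means declaring rules and measuring how much data breaks them. The sketch below shows that pattern with a handful of hypothetical rules over hypothetical customer records; commercial tools such as those named above do this at scale with profiling, scorecards and lineage, and their actual rule syntax is not shown here.

```python
import re

# Hypothetical data-quality rules: each maps a readable name to a row check.
RULES = {
    "customer_id is present": lambda r: r.get("customer_id") is not None,
    "email looks valid": lambda r: bool(
        re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email") or "")),
    "country is a 2-letter code": lambda r: bool(
        re.fullmatch(r"[A-Z]{2}", r.get("country") or "")),
}


def profile(rows, rules=RULES):
    """Count, for each rule, how many rows fail it."""
    return {name: sum(1 for r in rows if not check(r)) for name, check in rules.items()}
```

Running `profile` regularly and tracking the failure counts over time gives a simple, auditable measure of whether data quality is improving.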