The volume of data that companies generate and store has reached critical mass. Where companies once managed megabytes and gigabytes, they now handle terabytes and even petabytes. Comprehensive application software, the Internet, and new computing and storage technologies have made it easier to create, collect and store all types of data. In addition to being duplicated for backup and recovery, enterprise data must be retained for years to comply with government-mandated retention requirements.
When collected and organized meaningfully, this data becomes the information that companies rely on to improve decision-making and gain a competitive advantage. As enterprise data volumes grow, managing, accessing and storing this information cost-effectively has become one of the most critical IT challenges. As a result, the trend toward Information Lifecycle Management (ILM) is giving rise to a class of solutions designed to help companies meet these goals, and database archiving is a critical component of any ILM implementation.
Reducing Storage Costs throughout the Information Lifecycle
The concept that all information has a lifecycle is not new. The lifecycle begins when data is acquired and ends when it is no longer needed and can be deleted. Over time, as information is accessed less frequently, its business value decreases. However, when rarely accessed data is needed, its value increases immediately and it must be available on demand; in fact, penalties may apply if it is not. For this reason, most companies continue to store rarely accessed data on high-cost, high-performance systems, increasing operational costs and wasting valuable processing resources.
ILM has evolved to define an approach for managing and storing information on the most cost-effective storage medium over time, based on its business value and access requirements. At first glance, ILM resembles the more familiar concept of Hierarchical Storage Management (HSM), which automatically moves information to higher- or lower-performance storage media based on access rates. However, while HSM is best suited to managing files, ILM is designed to manage all types of data and provides a framework for both data and storage management.
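The access-rate-based placement that HSM performs can be sketched in a few lines. This is a minimal illustration, not a product algorithm: the tier names, the `Dataset` type and the 90- and 365-day thresholds are all hypothetical, and a full ILM policy would also weigh business value and retention requirements rather than access recency alone.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Tier(Enum):
    """Hypothetical storage tiers, from most to least expensive."""
    ONLINE = "online: production database"
    NEARLINE = "near-line: archive database or file server"
    OFFLINE = "offline: tape or WORM device"


@dataclass
class Dataset:
    name: str
    last_accessed: date


# Illustrative thresholds only (assumptions, not recommendations).
NEARLINE_AFTER_DAYS = 90
OFFLINE_AFTER_DAYS = 365


def assign_tier(ds: Dataset, today: date) -> Tier:
    """Move data to a cheaper tier the longer it goes without being accessed."""
    idle_days = (today - ds.last_accessed).days
    if idle_days >= OFFLINE_AFTER_DAYS:
        return Tier.OFFLINE
    if idle_days >= NEARLINE_AFTER_DAYS:
        return Tier.NEARLINE
    return Tier.ONLINE


if __name__ == "__main__":
    today = date(2024, 1, 1)
    for ds in (Dataset("current_orders", date(2023, 12, 20)),
               Dataset("orders_2022", date(2023, 6, 1)),
               Dataset("orders_2015", date(2019, 3, 15))):
        print(f"{ds.name}: {assign_tier(ds, today).value}")
```

Keeping the thresholds as named constants makes the policy easy to review and adjust as access patterns or storage costs change.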
Although simple in concept, ILM is not without its challenges in today's heterogeneous IT environments, because data may be managed and stored as structured data in relational databases, as semi-structured data such as email, and as unstructured fixed content such as documents and graphic files. As companies strive to implement ILM, they need to select storage and data management solutions carefully. Among the storage strategies for implementing ILM, the ability to archive information is of paramount importance for reducing costs, managing data efficiently and complying with data retention requirements.
How to Get Started Implementing ILM
Before developing an ILM strategy, an organization must first identify all the types of enterprise information managed in its environment. This process provides a comprehensive understanding of what the data is and how it is used, as well as its data retention and storage requirements.
Typical corporate information includes all transactional data from enterprise business applications and the associated databases, such as payroll, customer information systems and purchasing systems. In addition to mission-critical systems, companies must also consider their online collaboration information, such as email, instant messaging (IM) and web-based conferencing. Information residing on local workstations, including documents, spreadsheets and presentations, must also be considered. It is critical to account for all types of information in order to determine the lifecycle, retention requirements and appropriate archiving and storage strategies for managing each type of data.
During this analysis, companies can identify the data that must remain online and the data that should be archived. Once the data retention and archiving requirements are understood, the next step is a gap and risk analysis to distinguish data that must be archived from data that does not need to be archived immediately. For example, if the decision is to keep data online rather than archive it, the company must weigh the risks of not archiving: degraded application performance and availability, and higher operating costs. The goal of an effective ILM strategy is to keep historical data as long as required, but no longer; think of this approach as "Just-in-Time" data accessibility. Retaining information past the required period only adds the cost of storing and managing data that is no longer needed.
Understanding Data Retention Requirements
Managing and storing structured, semi-structured and unstructured fixed-content data presents unique challenges, and complying with regulatory requirements adds the cost of storing that data and keeping it easily accessible. Data retention requirements vary widely: for example, from seven years for financial data to 20 years for pharmaceutical research data and more than 50 years for data associated with nuclear facilities. The following questions can help determine how to manage different types of data to comply with regulatory requirements and eliminate unnecessary costs:
- What are the specific regulatory requirements that pertain to each type of data and how do they affect data retention?
- After the data retention period has expired, when can data be deleted?
- Are there business reasons for keeping data beyond the time required to comply with regulatory data retention requirements?
- Are there penalties associated with retaining this data beyond the retention period?
- What is the risk of not archiving this data?
- What is the risk if this data is archived and cannot be retrieved in a timely manner?
- How often will access to this data be required and what is the expected response time for data access?
- What type of media will be used for storage (tape, magneto-optical, compact disc, etc.)?
- What is the shelf life of the selected storage media?
- Can the shelf life be extended through media renewal (copying the data to new media)?
- Is technology obsolescence a consideration?
- Is off-site storage required?
- Which applications are affected and how are they impacted?
- Are there application requirements for accessing older data?
- What are the application compatibility rules?
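Several of the questions above ultimately reduce to a date calculation for each category of data. The following sketch assumes hypothetical category names, reuses the example retention periods cited earlier purely for illustration, and treats a record as deletable on or after the anniversary that ends its retention period, optionally extended for business reasons. Actual periods and expiry rules come from the regulations that apply to each data type.

```python
from datetime import date

# Illustrative retention periods in years, echoing the examples above.
# Real periods come from the regulations that apply to each data type.
RETENTION_YEARS = {
    "financial": 7,
    "pharma_research": 20,
    "nuclear_facility": 50,
}


def add_years(d: date, years: int) -> date:
    """Add whole years; Feb 29 maps to Feb 28 when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_deletion_date(category: str, created: date,
                           business_extension_years: int = 0) -> date:
    """Regulatory retention period plus any extra years kept for business reasons."""
    return add_years(created, RETENTION_YEARS[category] + business_extension_years)


def may_delete(category: str, created: date, today: date,
               business_extension_years: int = 0) -> bool:
    """Assumes a record becomes deletable on the day its retention period ends."""
    return today >= earliest_deletion_date(category, created, business_extension_years)


if __name__ == "__main__":
    today = date(2024, 1, 1)
    print(may_delete("financial", date(2015, 6, 30), today))     # period ended 2022-06-30
    print(may_delete("financial", date(2015, 6, 30), today, 3))  # kept until 2025-06-30
    print(earliest_deletion_date("pharma_research", date(2010, 1, 15)))
```

Making the business extension an explicit parameter keeps the regulatory minimum and any voluntary extra retention visible as separate decisions.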
After the data has been analyzed, the next step is to select the appropriate solution(s) to help achieve these objectives. The remainder of this discussion will focus on archiving structured data stored in relational databases.
What is Database Archiving?
Companies invest millions of dollars each year in maintaining and upgrading business critical applications that rely on complex relational databases. These databases collect increasing amounts of data for business operations and decision-making. As a consequence, overloaded databases degrade performance and limit the availability of the comprehensive capabilities these applications were designed to deliver. Ironically, most of this data is stored online in production databases but is rarely accessed.
Database archiving removes this rarely accessed data from production databases and stores it on a variety of storage media while keeping it easily accessible, a critical requirement of most data retention legislation. IT organizations can then choose the best mix of storage alternatives. Keeping current, active data online and selecting the most appropriate storage medium for archived data strikes a cost-effective balance throughout the information lifecycle. It also keeps enterprise application databases at a manageable size, improving the performance and availability of critical systems.
Database archiving also allows IT organizations to maximize the benefits of existing storage area network (SAN), network-attached storage (NAS) and HSM solutions. Database archiving complements these technologies, especially HSM systems, enabling a best-practice "staged" approach to managing historical relational data. This approach can be an integral part of an enterprise ILM strategy.
Selecting an Ideal Database Archiving Solution
In selecting the ideal database archiving solution, companies should consider whether the technology under consideration allows them to:
- Safely archive and remove precise subsets of rarely used data from complex relational databases with 100 percent accuracy,
- Preserve the referential integrity and business context of the data even for the most complex data model,
- Intelligently index archived data to ensure fast retrieval,
- Selectively delete data, removing only specific subsets while retaining relevant contextual information in the production database,
- Preview the data after it is archived and before it is deleted from the production database to prevent deleting data inadvertently,
- Store archived data on a variety of alternative storage media and keep it "active" for easy access when needed,
- Quickly locate, browse and manage specific archived data no matter where it is stored,
- Selectively restore referentially intact subsets of archived data in a single step, without having to restore large amounts of data for the sake of a few rows,
- Accommodate current and future archiving needs across applications, databases, operating systems and hardware platforms.
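Several of these capabilities can be illustrated with a small, self-contained sketch using Python's built-in `sqlite3` module. It assumes a hypothetical schema of customers, orders and order items, with an attached in-memory database standing in for the archive; all table and function names are illustrative, and a real archiving product would handle arbitrary data models, other DBMSs and external storage. The sketch archives old orders together with their child rows, verifies the copy before deleting anything, keeps customer rows in production as context, and restores a single order on demand.

```python
import sqlite3

# Hypothetical production schema: customers -> orders -> order_items.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER NOT NULL REFERENCES customers(id),
                     order_date TEXT NOT NULL);
CREATE TABLE order_items (id INTEGER PRIMARY KEY,
                          order_id INTEGER NOT NULL REFERENCES orders(id),
                          sku TEXT NOT NULL);
"""

# Selects the parent rows (orders) that qualify for archiving.
OLD_ORDERS = "SELECT id FROM main.orders WHERE order_date < ?"


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce parent/child links
    conn.executescript(SCHEMA)
    # Stand-in archive database; in practice a file on lower-cost storage.
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    conn.execute("CREATE TABLE archive.orders "
                 "(id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT)")
    conn.execute("CREATE TABLE archive.order_items "
                 "(id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT)")
    return conn


def archive_orders_before(conn: sqlite3.Connection, cutoff: str) -> tuple[int, int]:
    """Archive orders dated before `cutoff` together with all of their items.

    Customer rows stay in production as context. The copy is verified before
    anything is deleted, and the whole operation is a single transaction.
    """
    with conn:  # commits on success, rolls back on any exception
        conn.execute("INSERT INTO archive.orders "
                     "SELECT * FROM main.orders WHERE order_date < ?", (cutoff,))
        conn.execute("INSERT INTO archive.order_items SELECT * FROM main.order_items "
                     f"WHERE order_id IN ({OLD_ORDERS})", (cutoff,))
        # Check that every child row about to be deleted now exists in the archive.
        missing = conn.execute(
            f"SELECT COUNT(*) FROM main.order_items WHERE order_id IN ({OLD_ORDERS}) "
            "AND id NOT IN (SELECT id FROM archive.order_items)", (cutoff,)).fetchone()[0]
        if missing:
            raise RuntimeError(f"{missing} item rows were not archived; nothing deleted")
        # Delete children before parents so no order_items row is orphaned.
        items = conn.execute(f"DELETE FROM main.order_items WHERE order_id IN ({OLD_ORDERS})",
                             (cutoff,)).rowcount
        orders = conn.execute("DELETE FROM main.orders WHERE order_date < ?",
                              (cutoff,)).rowcount
    return orders, items


def restore_order(conn: sqlite3.Connection, order_id: int) -> None:
    """Restore one archived order and its items, leaving everything else archived."""
    with conn:
        # Parent first, then children, so foreign keys are satisfied.
        conn.execute("INSERT INTO main.orders SELECT * FROM archive.orders WHERE id = ?",
                     (order_id,))
        conn.execute("INSERT INTO main.order_items "
                     "SELECT * FROM archive.order_items WHERE order_id = ?", (order_id,))
        conn.execute("DELETE FROM archive.order_items WHERE order_id = ?", (order_id,))
        conn.execute("DELETE FROM archive.orders WHERE id = ?", (order_id,))


if __name__ == "__main__":
    conn = setup()
    with conn:
        conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
        conn.executemany("INSERT INTO orders VALUES (?, 1, ?)",
                         [(10, "2015-03-01"), (11, "2016-07-15"), (12, "2023-05-20")])
        conn.executemany("INSERT INTO order_items (order_id, sku) VALUES (?, ?)",
                         [(10, "A"), (10, "B"), (11, "C"), (12, "D")])
    print(archive_orders_before(conn, "2020-01-01"))  # prints (2, 3)
    restore_order(conn, 11)  # bring back a single order on demand
```

Deleting child rows before parents, and restoring parents before children, keeps the production database referentially intact at every step; with `PRAGMA foreign_keys = ON`, SQLite rejects the opposite order. Restoring an order also depends on its customer row still being in production, which is why the sketch archives only orders and items.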
Benefits of ILM and Database Archiving
Database archiving is a key component for implementing ILM to manage complex relational data. Safely archiving and removing rarely accessed data frees processing power, improving application performance and availability and making room for new applications, all without expensive hardware and capacity upgrades. In addition, regularly scheduled database archiving frees significant disk capacity for other uses, which can save millions of dollars in hardware and software upgrades. Database archiving provides an effective long-term solution to explosive database growth while reducing Total Cost of Ownership (TCO).
A comprehensive enterprise database archiving methodology must provide the capability to archive data from a variety of relational databases and platforms. The ideal database archiving solution must also preserve the referential integrity and business context of the archived data and provide easy access to it. In addition, it must be able to manage and store archived data on the most cost-effective storage medium (online in an archive database, near-line on a file server or optical device, or offline on tape or disk-based WORM devices). Implementing database archiving is an integral part of every ILM strategy to ensure that data and storage resources are well managed throughout the information lifecycle.
Jim Lee joined Princeton Softech in 1997. Under his direction as Development Manager, the project teams delivered several new releases of Princeton Softech's Relational Tools and Archive for DB2. He also contributed his technical expertise at several Database Archiving Workshops to help companies determine the benefits of an effective database archiving strategy.
In his new role, Jim will direct Princeton Softech's Product Marketing efforts, coordinating with Development, Marketing Communications and Sales. He has more than 15 years of experience in application development and consulting, and his background includes application development, short- and long-term product planning, risk assessment, cost-benefit analysis, customer consulting and the evaluation of emerging technologies.
Jim holds a Bachelor of Science Degree in Computer Applications and Information Systems and a Bachelor of Science Degree in Accounting from New York University's College of Business and Public Administration.