Big Data: Planning for Peak Season

The end of the calendar year is an important time for many companies. Retailers and manufacturers must be ready for the holiday season, when consumer buying increases by as much as 40%. In concert with this increased activity, shipping companies will deliver more products, financial institutions will see a marked increase in transactions, and more consumers will purchase products online.

This is a critical time for big data applications. Initial implementations of big data applications focused on product and customer analytics, with the goal of improving customer service, decreasing time-to-market, and providing analysts with actionable intelligence to make business decisions.

In the first phase of implementation, big data applications received and stored data from operational systems, allowing business analysts to use so-called business analytics software to analyze the data for trends. Now we are in the next phase: big data applications must create value by feeding data back into operational systems. This becomes even more important during the busiest time of the year. What are the most important things for the IT staff to prepare?

Be Prepared for Scaling Up and Out

Scaling up refers to adding raw storage media and processing capacity to existing servers; scaling out refers to adding more servers. Either way, more data means more storage, and processing larger amounts of data requires either greater processor power or a longer period of time to execute.

This leads the business and IT to consider simple raw capacity numbers when forecasting future needs. Will we double the amount of data we store in the next year? If so, we will need to double the size of our storage media and double the number of central processor units or servers used to process the data. Such straight-line estimates are predicated on the assumption that the data is typical business data: orders, customers, products, and so forth, arriving in well-known data types such as currency, dates, and text fields.

With the advent of new data types such as XML and web click streams, the capacity planning calculations change. A 50% increase in activity may translate into a 100% (or greater) increase in required storage, or a doubling of the CPU power required to process the data.
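
As a rough illustration, here is a back-of-the-envelope forecast in Python that applies a per-data-type expansion factor to projected activity growth. The data types and expansion factors are hypothetical placeholders; real values should come from measurements of your own workloads.

```python
# Back-of-the-envelope capacity forecast. The data types and expansion
# factors below are hypothetical; derive real ones by measuring your
# own workloads in a non-production environment.

# Storage growth per unit of business-activity growth, by data type.
# Traditional typed data grows roughly linearly; semi-structured data
# such as XML or click streams can grow much faster.
EXPANSION_FACTORS = {
    "relational": 1.0,    # orders, customers, products
    "xml": 2.0,           # verbose markup inflates storage
    "clickstream": 2.5,   # high-volume, loosely structured events
}

def forecast_storage(current_tb: dict, activity_growth: float) -> float:
    """Project total storage (TB) after the given activity growth,
    e.g. 0.5 for a 50% increase in business activity."""
    return sum(
        size_tb * (1 + activity_growth * EXPANSION_FACTORS[data_type])
        for data_type, size_tb in current_tb.items()
    )

current = {"relational": 40.0, "xml": 10.0, "clickstream": 20.0}
print(f"Projected storage: {forecast_storage(current, 0.5):.1f} TB")
# With these sample numbers, a 50% rise in activity implies 125.0 TB,
# roughly 79% more storage than the 70.0 TB currently in use.
```

Note how the semi-structured data types dominate the projection: a linear forecast based on the relational data alone would badly underestimate the requirement.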

Fortunately, these capacity requirements are easily determined by testing in non-production environments. The IT support staff should be ready to redirect excess capacity (in the form of storage media, network bandwidth, and computing power) to support increased production activity.

Purge or Archive Stale Data

Big data applications only get bigger. This is not merely a function of the additional historical data regularly uploaded from operational systems. Because actionable intelligence is most efficiently derived when sufficient data is available, your big data application will also grow in scope as data from additional operational systems is added.

The other side of this growth is decay. As data gets older, it tends to be less relevant. This happens for several reasons:

  • Older data tends to be less accurate, since it was captured before systems were enhanced to increase data accuracy and minimize missing data;
  • Older data may refer to products no longer available, customers that no longer exist, etc.;
  • Older data will be less relevant once analytics have driven changes to the operational systems that produced it;
  • Older data may not be available for new operational systems.

To prevent filling storage media with unneeded data, you should already have a purge or archive process in place. Old data that will never be used again is deleted. Data that must be retained in some form for legal or compliance reasons is usually stored in an archive, where it can be reloaded if needed.

One option is to archive large volumes of stale data on tape or some other high-volume storage medium. The data need not be in an easily readable format, and it must be reloaded into the big data application before it can be reused. This choice is favored if you are required to retain the data but do not reasonably expect to process it again.

An alternative is to store the stale data on low-cost (and typically lower-performance) high-volume media. This removes the data from your big data application while still allowing queries or analysis against the stale data, albeit with a performance penalty.
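
A minimal sketch of such a purge-or-archive job appears below, written in Python against SQLite purely so it runs anywhere. The sales_history and sales_archive tables and the retention periods are hypothetical; your schema, platform, and compliance rules will differ.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical retention policy: table names, columns, and cutoffs are
# illustrations only; substitute your own schema and compliance rules.
PURGE_AFTER_DAYS = 7 * 365     # delete outright after ~7 years
ARCHIVE_AFTER_DAYS = 2 * 365   # move to cheap storage after ~2 years

def purge_or_archive(conn: sqlite3.Connection, today: date) -> None:
    purge_cutoff = (today - timedelta(days=PURGE_AFTER_DAYS)).isoformat()
    archive_cutoff = (today - timedelta(days=ARCHIVE_AFTER_DAYS)).isoformat()

    # Data past its legal retention period will never be used again: delete it.
    conn.execute("DELETE FROM sales_history WHERE txn_date < ?", (purge_cutoff,))

    # Stale-but-retained rows move to an archive table, which in practice
    # would live on low-cost media or be exported to tape.
    conn.execute(
        "INSERT INTO sales_archive SELECT * FROM sales_history WHERE txn_date < ?",
        (archive_cutoff,),
    )
    conn.execute("DELETE FROM sales_history WHERE txn_date < ?", (archive_cutoff,))
    conn.commit()

# Toy setup so the sketch is self-contained and runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_history (txn_date TEXT, amount REAL)")
conn.execute("CREATE TABLE sales_archive (txn_date TEXT, amount REAL)")
conn.execute("INSERT INTO sales_history VALUES ('2012-06-01', 99.50)")
purge_or_archive(conn, date.today())
```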

As you enter peak season, consider increasing the frequency of your purge or archive processes.

Prepare Non-production Environments in Advance

Most organizations implement multiple non-production environments, variously called development, test, load testing, user acceptance, pre-production, and so forth. Many of these have specific purposes. For example, the development environment will contain files and tables of relatively small size and is suitable only for unit testing single programs. In contrast, the load testing environment will usually contain production-sized databases and commensurate resources such as disk space and CPU power.

This changes for big data applications. There is no sense in creating or testing a big data application in a development environment: big data applications, by their nature, require large data volumes in order to work properly. Consider, for example, an analytical application that reads a store of customer data and searches for trends. Without a nearly production-sized database to analyze, the results would be meaningless.
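
One common way to stage a realistic non-production copy is to sample production data at a high rate, masking sensitive fields along the way. The Python sketch below illustrates the idea; the 90% sample rate, record layout, and field names are hypothetical.

```python
import random

# Build a near-production-sized test extract by high-rate random sampling.
# Trend analysis generally needs most of the data to be meaningful, so the
# sample rate is high; rate and field names here are illustrative only.
SAMPLE_RATE = 0.9

def sample_for_test(production_records, rate=SAMPLE_RATE):
    """Yield a random sample of production records large enough for
    trend analysis, with a hypothetical PII field masked."""
    for record in production_records:
        if random.random() < rate:
            masked = dict(record)               # never mutate the source row
            masked["customer_name"] = "MASKED"  # strip PII before copying
            yield masked

# Example usage with toy records standing in for a production extract.
production = [
    {"customer_name": "A. Smith", "region": "NE", "total": 120.0},
    {"customer_name": "B. Jones", "region": "SW", "total": 75.5},
]
test_copy = list(sample_for_test(production))
print(f"Sampled {len(test_copy)} of {len(production)} records")
```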

However, during peak season there are several reasons for preparing one or more of your non-production environments in advance to execute a big data application:

  • With big data applications now providing data to operational systems, these applications become part of any diagnostic or debugging environment. If there is a production issue, you will need a non-production environment containing your big data application in order to debug the problem;

  • A non-production environment with a big data application is an excellent place to certify any new applications or code prior to moving to production;

  • Your organization may be rolling out new systems or functionality to take advantage of peak season; additional testing and user training will take place in a non-production environment, and the big data application may be an essential piece.

Be Ready for Disaster Recovery

Most organizations consider analytical data and queries to be low priority during a disaster. Clearly, they reason, customer-facing systems, accounting systems, order entry applications, shipping and receiving, and so forth need priority. While big data analytics is nice to have, it is not mission-critical. But can it be?

Consider the following scenario. You implement a big data application that analyzes trends in your customers' buying habits. Internal analysts execute queries against the big data, perhaps through an appliance that gives them extremely fast turnaround. Crazy-fast query execution means that more analysts can run more queries each day.

As your application matures, various queries and reports provide valuable information. This is good! So good, in fact, that management implements regular execution of these reports. More and more queries and reports begin to run monthly. Then weekly. Then daily. All providing valuable information.

At some point, management may decide that the valuable and profitable decisions they make based on these regular reports are critical to the business. Yes, critical. At that point, you face pressure to ensure that the application remains available if a disaster occurs. This demand may arrive with no prior warning. Hence, I recommend considering disaster recovery options during the implementation of any big data application.

While it may not be needed in the immediate future, the potential for such a requirement exists. Pre-planning will allow you to foresee the hardware, software, storage, and network requirements for various disaster scenarios.

Summary

Big data has evolved from a large data store attached to your data warehouse into a fully integrated application that feeds back to operational systems. As companies enter peak season they should expect larger transaction volumes, leading to requirements for additional computing power and storage media. Your big data application is now a mission-critical part of your enterprise.

Prepare for scaling up and out by keeping extra resource capacity available. Ensure that you have defined intelligent stale-data purge processes, and execute them frequently. Maintain non-production environments that include your big data application in order to support diagnosis of production issues. Finally, make sure that you have defined and practiced your disaster recovery plan.

Lockwood Lyon
Lockwood Lyon is a systems and database performance specialist. He has more than 20 years of experience in IT as a database administrator, systems analyst, manager, and consultant. Most recently, he has spent time on DB2 subsystem installation and performance tuning. He is also the author of The MIS Manager's Guide to Performance Appraisal (McGraw-Hill, 1993).
