Many big data applications are designed, built and installed without a formal load test. This is unfortunate, as load testing gives the database administrator quite a lot of valuable information. It may make the difference between poor and acceptable big data performance. This article reviews big data application implementation practices with some proactive tips on load testing to make your production implementation a success.
Why is Big Data Different?
Load testing includes a combination of one or more of the following:
- Integration testing. Ensure application modules join together well.
- Systems testing. Confirm the application works well with other applications.
- Recovery testing. Can the application recover correctly and within service level agreements after a failure?
- Security testing. Verify that users have necessary and sufficient security to perform functions.
- Stress testing. Tests various combinations of transaction loads, including a number of concurrent users, volume of input data, and transaction load on the database. In some cases, it is necessary to determine at what load point the application or database will fail (sometimes called test-to-failure).
A big data application implementation is usually accompanied by a hybrid hardware/software solution such as IBM's special-purpose appliance: the IBM DB2 Analytics Accelerator (IDAA). Appliances and other such solutions promise a significant increase in SQL performance, sometimes reducing query runs times from hours to minutes. In this environment some DBAs and managers may not be concerned about performance.
However, despite promises of extremely fast query processing time, DBAs should still insist on a load test. There are several concerns that are still relevant; in addition, there are new performance issues that are unique to the big data environment. Indeed, there may be no existing non-production environment that has both an appliance and enough data to simulate production.
The DBA, application team and release manager should discuss the testing schedule and the level of testing to be performed prior to releasing the application to production. Reasons for executing various test scenarios may depend upon various factors including staff availability and budget, project deliver date, and availability of a suitable database environment for testing.
Some of the performance concerns that can be simulated by a load test include:
- Confirm object definitions. The moment of production implementation is not the time to discover that tables, indexes, or associated structures are not defined correctly. A successfully completed load test gives everyone confidence that all table columns have the correct attributes and that all SQL statements update and retrieve the proper data.
- Hot spots.These are a common occurrence in transactional systems that assign keys to new entities or transactions. Typical examples are an order entry system where new orders are assigned sequentially ascending keys or a financial management system where money transactions are keyed by the date/time of their entry (again, an ascending key). Tables and indexes may be structured to support application functionality and performance for such key assignments. One common method is to store new table rows at the physical end of the table. While this may help performance by clustering the data in key sequence, the table pages at the end of the table will be very frequently referenced. The same is true for indexes that support ascending keys.
In a big data environment use of the appliance avoids these issues, assuming that SQL queries execute within the appliance. Regrettably, some issues may prevent this, such as ineligible SQL statements (see the IBM manuals for these exceptions). Another issue is timing. After loading rows into your base DB2 tables you must then execute a procedure that loads the data into the appliance. During this period, the query subsystem should be configured to access the base DB2 tables, or incorrect results may be produced.
- Deadlocks and timeouts. These symptoms may be caused by one or more separate issues. The most common in load testing is the “initialization problem”. Since the application is processing for the first time, many transactional tables may be empty (e.g., customers, current orders, account activity). Since table and index I/O is done at the block level (usually by the 4K page), many active table rows and index entries will be stored on the same page. Simultaneous insert and update activity for different rows on the same page may cause deadlock or timeout issues. As the tables and indexes grow this issue tends to disappear. Later, the occurrence of these symptoms is most likely due to poor application design, transactions that are too long(i.e., do far too many database updates per unit of work), or poor database design.
Again, while these issues don't occur when data is accessed within the appliance, not all queries will be directed there.
- SQL capture for access path analysis. Load testing may be the first time that users have a chance to execute business intelligence (BI) queries against production-like data. This is the perfect time for the DBA to capture these SQL statements for future analysis.
- Monitor growth patterns. As the application matures, performance may suffer due to table and index size. DBAs consider a pre-defined purge process to be a critical factor to prevent this. However, many times the application development team does not define or create a purge process. The usual reason: “We won’t be purging for a year, so there’s plenty of time to get it defined.” Regrettably, in many cases, the purge process is not defined, and the DBA is left to deal with a combination of huge tables and application performance problems. Far better to deal with this before production implementation, and the load test provides a good place for the DBA to demonstrate how data volume can affect performance.
During the Load Test
In preparation for the load test, the DBA needs to have several processes ready. In the big data environment these processes become even more critical due to the sheer size of data and volume of transactions. Here are the most common.
- Reorgs and RunStats. DB2 chooses access paths for SQL statements based on object statistics. Some of these statistics may determine whether or not a query is executed within the appliance. For the initial execution of the load test, many tables will be empty and their statistics will reflect the current sizes of the tables. However, as the tables and indexes grow the statistics become outdated and possibly detrimental to performance. A typical example: For a simple table access, DB2 may bypass the appliance and choose a tablespace scan access path within DB2 (after all, the statistics show that the table is empty!). To avoid this issue the DBA should be prepared to run index and tablespace reorgs and runstats during the beginning stages of the load test.
- Full Backups after Initial Testing. Since the initial state of the application (many empty tables) will not be a normal state, performance measures during the load test may not be valid. The usual fix for this is to begin load testing and exercise the application for a predetermined period. At this point, the DBA would run reorgs and runstats as described previously, and then execute full image copies of all tables involved in the test. This provides a recovery point that can be used in subsequent rounds of testing, thus bypassing the need for any priming transactions and repeated reorgs and runstats.
- Metrics for Resource Constraints. The most common resource constraints encountered in load testing by mainframe DB2 applications are I/Os, application throughput, and waits due to object locking. Decide ahead of time how these will be measured and what criteria will be used for database service levels.
Tips for a Successful Load Test
- Achieve Agreement on Purge Processes. Initialization issues provide a good lesson for developers and management. Application staff now have a direct experience where data volume and distribution patterns lead to performance problems. Future growth may cause problems too; so, they need to make sure that all purge processes are defined and implemented prior to the move to production.
- Capture and Save SQL. In addition to executing EXPLAINs on captured SQL, the DBA can store the captured statements for future documentation or analysis purposes. This can be of great assistance when introducing a new DBA to the application, or when analyzing a production performance issue.
- Full Backups after Seeding. After initialization is complete, execute image copies for all tables. This provides a recovery point for later testing, speeding up the process.
- Address Production Issues before Production. DBAs prefer to be proactive and deal with potential issues before they become production nightmares. In reacting to production performance problems, the DBAs commonly use tools in the following list. Read it, and imagine being able to apply the correct fixes during load testing rather than during a production crisis.
- Implement a purge process. Load testing can uncover unexpected table and index growth that you may have missed during capacity planning. Before you run out of disk storage or submit a request to buy more, implement a process that purges old or stale data.
- Re-partition tables. Partitioning has a significant effect on performance. During design an account table may have been partitioned by account number range. This allows multiple applications to access account data simultaneously but in different partitions, thus avoiding hot spots. A load test may determine that this is not needed, so you now have the choice of partitioning by date, making it easier and faster to purge old data.
- Change locking granularity. Locking at a high level (pages or tablespaces) may reduce CPU usage; however, it may also cause timeouts and deadlocks. A load test will make these issues more visible, and the DBA may then decide to implement row-level locking.
- Tune SQL statements. The best practice is to have SQL written as part of the application design, allowing the DBA to review it with plenty of time to change the database design if necessary. Regrettably, many times new applications do not define the SQL until later, sometimes as late as implementation. Imagine the time saved if a load test can be used to capture SQL statements and access paths in a production-like environment.
As project managers plan, design and implement big data applications they should plan for a formal load test. Not only does this provide valuable information for the DBA, it gives developers an opportunity to execute SQL statements in a production-like environment. This can result in confirmation of performance expectations, especially in cases where the application has a service level agreement (SLA) with its users. It may also help to point out potential performance issues prior to the production implementation.