Arshad Ali explores Proactive Caching, available in SQL Server Analysis Services 2005 and later versions. Proactive Caching helps DBAs to better control the cube data automatic refresh frequency. Proactive Caching offers near real time data for reports, while at same time providing a query performance similar to MOLAP storage mode.
SQL Server Analysis Services offers three basic storage modes (MOLAP, ROLAP and HOLAP) to store cube data and aggregations. These storage modes determine storage location, storage space requirements, cube processing and cube query performance. For example. the default MOLAP storage mode stores both the cube data and its aggregation in the OLAP server itself and hence it requires more space on the OLAP server and provides the fastest query performance. In ROLAP mode, both data and aggregation are stored in the relational data store and no additional space is required at the OLAP server (other than to store meta data, which is stored on the OLAP server anyway, irrespective of any storage modes) but query performance is slowest in this case. HOLAP storage mode is a hybrid between MOLAP and ROLAP; it does mean in this storage mode data is stored in the relational data store whereas the aggregations are stored in the OLAP server; in this mode there is a trade-off between required storage space and query performance.
SQL Server Analysis Services 2005 and later versions have a feature called Proactive Caching, which allows the database administrator to better control the cube data automatic refresh frequency so that reports based on these can get near real time data while at the same time providing a query performance similar to MOLAP storage mode.
Mostly data warehousing and Business Intelligence (BI) applications work on historical data, which are normally refreshed once in a daily, weekly or monthly basis. Although there are some near real time data warehousing and BI applications, their query performance is painfully slow. To help in this scenario, Analysis Services (Microsoft SQL Server 2005 Analysis Services and later versions) provides an amazing feature called "Proactive Caching". This new feature gives a query performance, similar to what you had with historical data, with near real time data and minimal impact on the overall system. As a BI developer, you just need to configure proactive caching, and then SQL Server Analysis Services (SSAS) will ensure that you get the performance of a fully processed data warehouse on near real time data.
How it works
When you configure proactive caching for a cube, a measure group partition or a dimension, SSAS starts listening to data change notifications from the underlying relational database. On the relational database side, if there is any change in the data of the related objects, an event is generated and notification is sent. SSAS intercepts this notification and start refreshing the MOLAP cache. Depending on your configuration, during that time frame SSAS either drops the Multidimensional OLAP structure (MOLAP cache) and serves the query request from the underlying relational database (behaves as ROLAP storage mode) or creates another MOLAP cache in parallel to the outdated one. As long as the new MOLAP cache is being built, the queries are served from the outdated MOLAP cache (of course query result will be based on old data while the new MOLAP cache is being built). The moment the new refreshed MOLAP cache is ready, all queries are switched back to this new MOLAP cache and the outdated MOLAP cache is dropped. This process of synchronization happens in the background so there is minimal impact on the overall system.
While configuring the proactive caching you can specify the following:
- How frequently the MOLAP cache is to be rebuilt
- From where SSAS should get data while the data refresh is in progress; it should get it from underlying relational database or from the outdated MOLAP cache.
- The MOLAP cache should be refreshed automatically when a notification arrives or on specified interval/scheduled basis.
After enabling proactive caching, you get the best query performance of MOLAP storage mode on the most recent data. You should consider if your clients want to see recent data without compromising with the query performance.
In MOLAP storage mode (default mode), detail data and aggregations are stored in multidimensional format at the SSAS server and no notifications are received by the SSAS server because proactive caching does not work in this mode. MOLAP storage mode is applicable if you want the best query performance, freshness of data is not that important and you have enough storage space on the SSAS server. You can either schedule processing of MOLAP cache or process it manually whenever required. Apart from this mode, all other storage modes discussed below support enabling proactive caching.
Like MOLAP storage mode, in Scheduled MOLAP mode the detail data and aggregations will be stored in multidimensional format at the SSAS server. The only difference is, MOLAP cache is automatically rebuilt on a 24 hour schedule by SSAS itself (SSAS does not listen for change notification from relational data source; it just does the processing every 24 hours). As a Data Warehouse/BI developer, to do the processing you aren't required to create an SSIS Package or SQL Server agent job with this mode.
Scheduled MOLAP is suitable for applications that have daily data refresh requirement.
Again, like MOLAP storage mode, in Automatic MOLAP mode the detail data and aggregations will be stored in multidimensional format at the SSAS server but in this mode, SSAS listens for change notifications from the underlying relational database source. If any update happens in the relational database, an event is triggered and SSAS is notified about the data changes. On receiving this change notification, SSAS automatically starts refreshing the cube or partition data. It actually a creates a new MOLAP cache and also maintains the old one in parallel so that the existing request can be served from the outdated/old MOLAP cache. The moment creation and processing of new MOLAP cache is complete, all query requests are switched back to the new MOLAP cache and the old one is dropped.
If you want processing to occur, to refresh the data in MOLAP, with a latency interval of 2 hours and if you also want high query performance, this mode would be your choice. During rebuild and processing of the new MOLAP cache, the query gets stale data as it is accessing the outdated MOLAP cache.
With this mode there are two important settings (Silence Interval and Silence Override Interval) that you need to be aware of. Silence Interval has a default value of 10 seconds, which means after receiving a change notification SSAS waits for another 10 seconds to see if there is any more change notification coming (this ensures more updates can be considered in a single processing). If another change notification does not arrive in 10 seconds, SSAS will start processing the cube or partition. If any change notification arrive within 10 seconds then it waits for another 10 seconds until the Silence Override Interval is reached (default is 10 minutes). Silence Override Interval is required because without Silence Override Interval, your cube will never be processed if continuous updates are happening at relational data source.
Medium Latency MOLAP
Like the MOLAP storage mode, in this mode the detail data and aggregations will be stored in multidimensional format at the SSAS server. Also, as in case of Automatic MOLAP, SSAS listens for change notifications from the underlying relational database. On receiving this change notification, SSAS automatically invalidates the existing MOLAP caches and starts reprocessing it (notice it does not create another MOLAP cache in parallel to the old one, as in the case of Automatic MOLAP). While reprocessing, the queries are switched to real time ROLAP (discussed in the next article) and once the processing of the MOLAP cache is complete; queries are again switched back to the MOLAP cache. Since during reprocessing real time ROLAP is used to serve the queries, performance would obviously be slower during this time frame.
This mode has a target latency of 4 hours; it means SSAS will drop out the MOLAP cache after 4 hours of receiving any change notification from the underlying relational database. Similar to automatic MOLAP, in this case Silence Interval has a default value of 10 seconds and Silence Override Interval has a default value of 10 minutes.
You would use this storage mode when the relational data source is updated infrequently and query performance is important for your users.
Low Latency MOLAP
As you might have guessed, Low Latency MOLAP is, to some extent, similar to Medium Latency MOLAP mode. The only basic difference is the latency; in this mode, the processing occurs automatically as data changes with a target latency of 30 minutes after the first change notification is received.
A point to note here is that since this mode does switching to real time ROLAP frequently (30 minutes after a change notification), the performance of a query would be poorer.
You would use this storage mode when the relational data source is updated frequently and refreshed data is more important than query performance.
In this article, I've discussed a new feature, called Proactive Caching, of SSAS 2005 and later versions. This new feature gives similar query performance (as you had with historical data) with near real time data, with minimal impact on the overall system. As a BI developer, you just need to configure proactive caching, and then SQL SSAS will ensure you get performance of a fully processed data warehouse on near real time data without creating any SSIS packages or SQL Server Agent jobs. In this article, I have also covered different variants of settings if the storage mode is MOLAP. In my next article, I will be covering settings for HOLAP and ROLAP, and how to configure proactive caching.