by Paul C. Zikopoulos
I know it's an old saying, but it certainly is an accurate one in this case. Visualizing data makes it easier to understand and act on. In fact, quite often, patterns or correlations that you never thought existed suddenly become apparent with visualization. The IBM DB2 Universal Database Version 8.2 product (DB2 UDB) comes with free spatial analysis capabilities in the form of the DB2 UDB Spatial Extender. In previous releases, the DB2 UDB Spatial Extender was a chargeable add-on product, but now, with every server edition of DB2 UDB, you can leverage this capability for free. Spatially analyzing your data is not mainstream...yet. In this article, I give you some details about spatial analysis, its history, and its role in DB2 UDB.
The Next Frontier of Business Intelligence - A Little More About Spatial Data
When most people think about spatial data, they think of maps and latitude and longitude (LAT/LONG) coordinates. Spatial components encompass these but so much more. Think about it for a moment. We all live somewhere and that somewhere typically has an address. Addresses can be graphically represented and spatial operations can be performed on them. A typical large distribution warehouse has many aisles of various products that are part of a just-in-time (JIT) distribution channel. These products are quickly located by some sort of placement code. For example, A23D may indicate that the required part is housed in Building A, in aisle 23, on shelf D. This too can be spatially viewed and provide a heartbeat of inventory levels to avoid stock-outs.
Here are just a few examples of industries where spatial analysis can provide strategic or competitive advantage:
- Branch locations
- Merchant locations
- Fire station locations
- Police station locations
- Locations of lines/pipes
- Locations of offices/personnel
- Hospital and doctors' office locations
- Specialized care facility locations, such as sports injury or physiotherapy clinics
- Store locations
- Supplier locations
- Customer addresses
- Billboard locations
- Plant location
- Nearest railhead/transportation
- Address of ticket agencies
There are many other less traditional ways to leverage spatial data. For example, the following figure might depict the progress of a customer's steps through a retail store. The customer's shopping cart is equipped with a Wi-Fi device that receives informational ads based on their location in the store, and broadcasts the path-to-purchase for the store's top selling products. As a customer makes different turns through the store, cross-selling promotional advertisements are displayed on the screen to try and bring the customer back to the high-margin items.
The following figure might represent the signal strength of a cellular phone provider in a specific coverage area based on continual health-probing information returned to broadcast towers from the device itself:
It would be a lot easier to spot the problem areas in this diagram than it would to inspect thousands of credits that were paid back to customers because of weak signal strength and poor service.
And a business initiative that caters to higher income families might leverage census and income distribution data to help them figure out where to open their new location:
So, almost all data has the potential to be spatially analyzed. Here is a list of some of the most common types of spatially analyzed objects:
- Information about anything that can be located on the earth's surface:
- Natural objects: rivers, lakes, mountains, etc.
- Man-made objects: buildings, utility facilities, cell phones, etc.
- Cadastral: property boundary, voting districts, etc.
- Anything represented by a geometry:
- Points: the location of a specific entity such as an ATM
- Lines: the distance between two cities
- Polygons: the areas subject to flooding
- Location and geometry defined by:
- Coordinates: LAT/LONG or X/Y coordinates
- Specific addresses: Use a geocoder to generate the map location
- Landmarks: Use a gazetteer to get coordinates of the White House, for example
Types of Spatial Queries
Being able to spatially analyze data means making it available to end users for query. Clearly, the possibilities are endless, but the following examples give you an idea of the kind of business-related questions you can ask when you are able to think spatially. (I will give you some examples specific to DB2 UDB later in this article.)
- Simple "where" questions (simple predicates)
- Who lives close enough to this group of employees that have set up a carpool?
- Which of my insured properties are subject to the flood zone warning recently issued by the municipality?
- Location with other attributes (complex predicates)
- Show me the videos of the areas around transmission towers near which new tall buildings are proposed.
- Show me all the cities that this waterway touches.
- Spatial coincidence (spatial joins)
- Which sets of employees could be eligible for a carpool?
- Which of my insured properties are at risk because an overflowing river intersects them?
As disk storage becomes cheaper and cheaper, and companies collect more data, competitive advantage can be leveraged in new approaches to data analysis. Since almost every type of data that you can imagine has some sort of spatial component to it, chances are that your data could reveal more to you if it were spatially extended.
The Epochs of Spatial Analysis
The most prevalent relational databases in today's marketplace have some sort of spatial capabilities, but it is not the focus of this article to differentiate among their implementations (though the differences are vast). However, to get a better feeling for the richness of the DB2 UDB implementation, it is important to understand the generations of spatial analysis.
The first generation of spatial analysis uses a geospatial information system (GIS) application that interacts with spatially-aware middleware to perform the analysis that, in turn, interacts with the local file systems for storage requirements. The spatial application's API calls are translated by a middle tier and tools to the proprietary file format on the operating system. Since the storage mechanism is a file system, it is not surprising that none of today's database vendors subscribe to this method of spatial analysis.
While this first generation of spatial analysis provides for rich spatial analysis features, the data is not integrated with the rest of the corporate data. There are many consequences to this type of implementation. First, it impedes corporate-wide decision making since it, by nature, fragments the single version of the truth. The data is transposed and stored away from the watchful eye of IT, which could lead to unique departmental interpretations of the data. Another problem is that issues such as access, performance, backup, security, privileges, concurrency, and more may arise - these are issues that databases are really good at addressing.
To help address the previously mentioned issues, some database vendors chose to support spatial analysis by substituting their relational database for the file system, as shown below:
This second generation addresses the data management issues of the first generation but does not solve another major issue: a 'translator' is required for the spatial application to talk to the database. In other words, the database does not know about spatial components. It does not understand about distances between two points, what a polygon is, what an intersect method is, and more. Loading up the translator with the hefty task of converting spatial dialect to a relational one is not going to help. For example, how do you store a polygon in a database that is not spatially aware? It has to be stored as a large object (LOB), where all input coordinates are detailed. This leads to performance issues, the requirement of specialized tools to work with the data (it isn't just SQL), and more.
All of these issues are solved in the third generation of spatial analysis, which is where DB2 UDB fits, and which is shown below:
The preceding figure shows that capabilities are added to DB2 UDB when the free DB2 UDB Spatial Extender is installed on the server. In this scenario, all tools and applications speak spatially to the database through the SQL API..
Using the DB2 UDB Spatial Extender and DB2 UDB
It is easy to work with spatial data in your DB2 UDB database, but it does require a couple of simple steps to get you going.
After installing the DB2 UDB Spatial Extender, you enable your database for spatial analysis (shown in the figure below). Enabling a DB2 UDB database for spatial analysis adds specific spatial objects to the schema. These objects include spatial abstract data types (ADTs), user-defined types (UDTs), user-defined functions (UDFs), methods, and a special 'grid' index extension for fast spatial searching.
Once the database is enabled for spatial analysis, you simply enable a table that you want to work spatially with. This action can also be performed from the Control Center or the Command Line Processor.
When you enable a table for spatial analysis, an extra column is added to the table to hold the data types and indicate how the spatial values are computed (for example, using a projection with a built-in geocoder). Finally, you load the data and query it.
Now, there is more to it than this. For example, leveraging your existing address information for spatial analysis involves geocoding LAT/LONG coordinates for those addresses (which would get stored as points in the added spatial column). The DB2 UDB Spatial Extender comes with a built-in geocoder, but you can also plug-in your own. However, it is outside the scope of this article to provide detailed setup instructions.
DB2 UDB Spatial Extender Queries
When the database and table are enabled for spatial analysis, and you have loaded data into your tables, you are ready to perform some queries. Since DB2 UDB is a spatially aware database, you can interact with it using your spatial tools or plain old SQL - which is the whole point. DB2 UDB understands the spatial "dialect" and the operations that you want to perform with it.
For example, a telematics application on a PDA may provide its user with a list of nearby Chinese restaurants that serve Peking duck based on the dynamic request of this user. In this case, after the hardware creates a location box using Global Positioning System (GPS) coordinates, it could generate SQL statements similar to these:
SELECT name, description, address FROM restaurants
WHERE Overlaps(location, box(getGPS(),2000,2000))
AND category = 'chinese' AND docContains(menu,
Note the interaction of spatial (box) and relational (category) data.
A geological application may ask DB2 UDB for a list of volcanic eruptions in a specific region. In this case, the SQL could look like this:
SELECT name, year, mag, location from volcano
WHERE ST_Within(location,'0 polygon((-125 43, -125 46,
-120 46, -120 43, -125 43))')
ORDER BY name, year;
It is not so important that you fully understand the SQL in the previous two examples. However, note the highlighted keywords after the WHERE clause for each example I provided. These are spatial functions that are provided by the DB2 UDB Spatial Extender - this is what I mean when I say that DB2 UDB is spatially aware. In this example, DB2 UDB knows that a data type of 'polygon' serves to envelope areas of interest, and the method ST_Within looks for objects within this area.
The DB2 UDB spatial capability becomes even more powerful when you mix it with the built-in federation capabilities found in any DB2 UDB server. For example, the following figure shows a distributed join performed by DB2 UDB:
This type of distributed join brings together data stores that may have been bound to separate line-of-business applications in the past, and presents the data graphically for corporate-wide decision making.
The Geodetic Spatial Extender
DB2 UDB Version 8.2 also introduced a new spatial add-on product (this one is chargeable) called the DB2 UDB Geodetic Extender. While the free DB2 UDB Spatial Extender is well suited for many types of geospatial applications, the curvature of the earth can present problems for certain types of applications.
With the DB2 UDB Spatial Extender, you 'flatten' the earth using different projections (for example, the Mercator projection) and then account for the margin of error they introduce. This may work fine when trying to locate a caller's house for a 911 response call (being off by 10 feet isn't likely to hurt). However, national defense applications such as a missile defense system need more precise measurements and must remove distortion caused by 'flattening' the earth's surface. This is where the DB2 UDB Geodetic Extender comes in. It is outside the scope of this article to discuss this extender in detail, but cartographers and geography students would certainly be interested in it. For example, I was surprised to learn that Greenland is 1/14 the size of Africa - which did not seem to be the case when I think back to my elementary school atlas! Achieving greater spatial precision is really the point of the Geodetic Extender.
Wrapping it All Up
The move to make the DB2 UDB Spatial Extender a free add-on product for all IBM, DB2 Universal Database for Linux, UNIX, and Windows servers opens new opportunities for companies of all sizes. From large to small, we all collect data, and it undoubtedly has some sort of spatial component to it. The question is, are you using it? Or are your competitors?
About the Author
Paul C. Zikopoulos, BA, MBA, is an award-winning writer and speaker with the IBM Database Competitive Technology team. He has more than nine years of experience with DB2 UDB and has written numerous magazine articles and books about it. Paul has co-authored the books: DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). You can reach him at: email@example.com.