Michael M. David describes how relational and hierarchical data processing can be seamlessly combined in a way that supports and preserves both hierarchical and relational processing advantages while avoiding their disadvantages.
Relational and hierarchical data processing are both controlled by very principled structured data processing operations. However, they are very different in operation and for this reason, they have different advantages and disadvantages in data processing characteristics. These characteristics are listed and described below. This article will describe how these two different principled data processing disciplines can be seamlessly combined in a way that supports and preserves both hierarchical and relational processing advantages while avoiding their disadvantages.
Relational and Hierarchical Processing Advantages and Disadvantages
Relational and hierarchical processing is very different so there is no surprise that what is good for one is often bad for the other. This will be reflected directly below where these characteristics are listed and explained. These good and bad characteristics reflect what is still assumed today such as relational processing cannot support hierarchical processing, which is not an accurate statement today and will be covered later in this article.
- Data Independence
- Can handle M-to-M data relationships
- No duplication of stored data with normalization
Data Independence means that data can be dynamically related through data relationships so that the structures are not fixed and can be created dynamically on the fly. This was the reason that relational databases supplanted hierarchical databases. This data independence also allows Many-to-Many Data Relationships to be performed easily today. This is a capability not supported in physical hierarchical databases. Flat relational data with its Normalized Tables in relational databases and its data independence capability does prevent data duplication in stored data.
- Explosive data replications during processing
- Joins are expensive
- Only flat structures are available
While relational database systems can store data without duplication as described in the paragraph above, relational join processing with its explosive data replication can occur during its Cartesian product processing. This is also a significant reason relational joins can be very expensive and time consuming to process compared to other non relational methods. Flat structures can also be very limiting on their own because of their lack of structure and semantics.
- Useful hierarchical structure with inherent semantics
- Easy and powerful semantic queries
- Data value naturally grows nonlinearly
Hierarchical structures get their inherent powerful processing capability from the semantics contained in their data structure enabling more powerful semantic queries. This self-contained structure semantics allows for powerful semantic queries enabling terse queries that are quick and easy to specify. As hierarchical structures naturally grow larger over time, so does their power with their ever-increasing inherent semantics. This allows the value of their data and querying power to keep increasing naturally and nonlinearly as described further below.
- Fixed inflexible structure
- Inflexibility resulting in duplicated data
- No M-to-M data structure support
While hierarchical structures offer the ability to model structures, they are fixed making them inflexible. This inflexibility also causes the problem of duplicate data because of the need to have different physical structures comprised of the same data. This lack of data independence means that M-to-M structures cannot be fully supported.
How Relational and Hierarchical Processing are Combined
The advent of the Left Outer Join in ANSI SQL-92 with its hierarchical data preservation operation and its new ON clause replaces the WHERE clause enabling specifying join criteria at each join point. This capability enables full hierarchical processing naturally and inherently without ambiguity problems. This also means that SQL's navigationless processing also continues to work normally with hierarchical structured data. The LEFT Outer Join syntax and the new ON clause allow the precise data modeling of full multipath hierarchical structures while its hierarchical semantics perform hierarchical processing. This complete SQL hierarchical processing is handled directly by the ANSI SQL processor.
In addition and unexpectedly, this natural hierarchical processing from the LEFT Outer Join also inherently supports the Lowest Common Ancestor (LCA) processing required to support powerful multipath queries. This allows the unrestricted query to reference any number of multiple pathways without concern of the data structure. This overlapping of relational and hierarchical processing allows for the creation and seamless processing of logical hierarchical structures, which possess the capabilities of both relational and hierarchical capabilities. This means that ANSI SQL can now naturally and dynamically define logical hierarchical structures, which are not fixed. Logical hierarchical structures defined by ANSI SQL keep all of their hierarchical principles while taking on relational principles too. This produces results that are both relationally and hierarchically accurate.
How Relational Structures Support Hierarchical Processing
Relational logical hierarchical processing can support the best of both worlds, but how do flat tabular structures contained in and processed in relational rowsets maintain and support hierarchical processing? They also have to naturally support multiple pathways, which can also be variable in length. This is performed by the hierarchical semantics modeling the multipath hierarchical structure in flat relational structures. The newer LEFT Outer Join with its ON clause operates hierarchically on the working rowset allowing for the specific hierarchical processing of fragment portions of the structure by using the multiple separate ON clauses as it is being built. The null data filler naturally produced by the LEFT Outer Join allows for variable length pathways in relational rowsets.
Combined Relational and Hierarchical Advantages
The following list of combined relational and hierarchical processing advantages below allows the user to pick and choose from a list of characteristics to match their specific need. Each characteristic usually has a good and bad characteristic depending on its use. Relational and hierarchical capabilities can be intermixed because hierarchical processing is actually a subset of relational processing. Intermixed relational and hierarchical operations include the seamless integration of relational and hierarchical data; combining unrelated fixed hierarchical structures using a relational association table; and the direct SQL hierarchical joining of hierarchical structures. Logical hierarchical structures can also take on most relational and hierarchical advantages, which can now support previously mutually exclusive capabilities. An example is relational data independence while utilizing inherent hierarchical semantics.
|Relational:|| * Data Independence|
* Can handle M-to-M data relationships
* No replication of stored data with normalization
|Hierarchical:|| * Useful structure with inherent semantics|
* Easy powerful semantic queries
* Data value naturally increases as structure grows
Combined Relational and Hierarchical Disadvantages
The above list of advantages allows avoiding items from the list of disadvantages below. The relational disadvantages below are avoided by selecting from the hierarchical advantages above. And the hierarchical disadvantages below are avoided by using the relational advantages above.
|Relational:|| * Cartesian product explosive data replications|
* No inherent structure semantics
* Joins are expensive operations
|Hierarchical:|| * Can't model other structures|
* No support for M-to-M data relationships
* Hierarchical duplicate data
Synergy Derived New Capabilities
The synergy from the combined relational and hierarchical processing above also produces powerful new capabilities and advancements in hierarchical data processing. Utilizing relational processing's flexible dynamic querying along with its basic hierarchical processing capability, hierarchical structures can be dynamically and hierarchically mashedup. This processing still results in a valid hierarchical structure to be query processed. By utilizing hierarchical processing occurring naturally in relational rowsets, powerful new data structure transformations and data structure virtualizations can also be performed easily in SQL using the synergy of relational and hierarchical processing.
ANSI SQL's inherent hierarchical processing offers the capability to freely select from relational and hierarchical advantages and capabilities. This paper is meant as an overview for this SQL hierarchical processing topic. For more information on this topic, see a list and description of SQL hierarchical and relational capabilities performed in ANSI SQL described in my previous article, The Top 10 SQL Hierarchical Data Processing Capabilities.