next up previous
Next: 5 Indexing and searching Up: Schema and Tuple Trees: Previous: 3 Schema trees

4 Tuple Trees

A tuple tree contains all data for a particular tuple in the database. It represents data from all the joined relational tables described by the associated schema tree. Structurally, it is an exact duplicate of the schema tree with a level of instance nodes placed after each level of attribute nodes (including the leaves).

The tuple tree allows programs to manipulate entire sets of related rows with simple operations. We can view, add, delete, or modify branches. We can produce rows relating to a branch by traversing only the part of the tree that is of interest.

For example, an application might create a new branch of a tuple tree. Each attribute is initialized to have a single instance. The application might then build a new instance for an expandable subtree. Then it might assign values to leaf attributes.

We might also want to delete an entire branch of the tuple tree. Suppose we have the following schema and corresponding tuple:

        # Schema
        Name
        Address ( Street City State Zip Phones* )*
        Phones ( Desc Number )*

        # Tuple
        Name = "Joe Rohn"
        Address (
            Street = "123 Anyway Dr."
            City = "Georgetown
            State = "Virginia"
            Zip = "41908"
            Phones = "(654) 111-1121"
            Phones = "(654) 111-1122"
        )
        Address (
            Street = "312 Anyway Dr."
            City = "Georgetown
            State = "Virginia"
            Zip = "41908"
            Phones = "(654) 111-1123"
            Phones = "(654) 111-1124"
        )
        Phones (
            Desc = "Home"
            Number = "(654) 234-1232"
        )

If we wish to view a table consisting of the Address.Street and Address.Phones fields on this tuple, we traverse the tuple tree and produce the following table:

Address.Street Address.Phones
123 Anyway Dr. (654) 111-1121
123 Anyway Dr. (654) 111-1122
312 Anyway Dr. (654) 111-1123
312 Anyway Dr. (654) 111-1124

If we delete the branch of the tuple tree corresponding to the Address field containing ``312 Anyway Dr.'' we also implicitly delete the two phone numbers ``(654) 111-1123'' and ``(654) 111-1124'' associated with that address.

4.1 Identifying leaves

We must be able to uniquely identify individual leaves in the tuple tree. Without this ability, we cannot construct the tree from the values nor can we tell which attribute values are related.

   figure113
Figure 2: Enumerating the leaves of a tuple tree

A tuple tree has the following convenient properties: (1) each attribute node has a unique local name (that distinguishes it from its siblings), and (2) each instance node has a unique local number (that distinguishes it from its siblings). In Figures 1 and 2, attribute-node siblings are shown surrounded by a gray box. We number all the attribute-node siblings (within a single gray box) 1, 2, 3, and so forth. Then we assign each leaf a label formed by the concatenation of the local numbers on the path from the root to that leaf, using a dot as a separator, as shown in Figure 2. The superscripts in the figure refer to the attribute number. Therefore, the first instance of Invoice.Item.Id (first item in the seventh invoice) is labelled ``2.7.2.1.1.1''. We call these leaf labels leaf identifiers. Prefixes are called path labels. Integers at odd positions in a path label (that is, 2, 2, 1) are due to attribute numbers; integers at even positions (7, 1, 1) are due to instance numbers.

4.2 Expandable attributes and their relation to tables

We can think of each expandable Qddb attribute A as a separate relational table. The subattributes of A form the columns of the table. (If A is not structured, the table has one column.) In addition, the equivalent relational tables must contain a link column with a unique identifier for joining with other tables. For example, an attribute A whose schema is A ( B C D )* is a table with columns B, C, and D. A single tuple gives rise to several rows in that table, one for each instance of attribute A.

Since both structured and non-structured attributes can be expandable, we can easily describe very complicated relationships between the tables. For example, we might have the following schema:

        Location (
            Name*
            Address ( Street City State Zip Contact* )*
            Phone ( Desc AreaCode Number )*
        )

This Qddb example is equivalent to a relational Location table containing columns Name, Address, and Phone. The Address column contains unique identifiers linking it to a separate Address table, which contains, besides the link column, columns Street, City, State, Zip, and Contact. The Contact column contains unique identifiers linking it to a separate Contact table, which contains, besides the link column, a single data column. Similarly, the Name and Phone columns of the Location table are links to other tables. In all, there are five tables, interlinked with a web of unique identifiers.

The tuple tree is a convenient abstraction for manipulating related rows in possibly many relational tables. Relational rules may be enforced at the application level to ensure full compliance with the relational model (for example, to enforce non-empty fields or indivisible search values.) Any set of tuple trees can be decomposed into proper relational tables, although unique link fields must be invented when performing this decomposition. Any set of relational tables may also be composed into a set of tuple trees; existing link fields are extraneous and may be removed.

4.3 Generating all rows

 

The tuple tree lets us construct all rows pertaining to the tuple without expensive join operations on large tables. The following algorithm produces all rows from a given node in a tuple tree:

    procedure ProduceAttributes (AttributeNode) : set of rows
        answer := NULL
        for each Inst := instance of AttributeNode do
            answer := union(answer, ProduceInstances(Inst))
        return answer

    procedure ProduceInstances (InstanceNode) : set of rows
        if leaf(InstanceNode) then
            return InstanceNode.Value
        else
            answer := NULL
            for each Attr := attribute of InstanceNode do
                answer := cartesian product(answer, 
                    ProduceAttributes(Attr))
            return answer

The union operator returns a table that combines all the rows of its two operands, which agree on number and type of columns. The cartesian product operator returns a table that combines all the columns of its two operands, with as many rows as the product of the rows of the two operands.

We can build all the complete rows of the tuple tree for a given tuple:

ProduceInstances(RootNode)

We can also build partial rows pertaining to subtrees. For example, we could find all information about the 7th invoice for a particular tuple as follows:

ProduceInstances(2.7)

The result is a table with columns:
    Invoice.Id Invoice.Item.Id Invoice.Item.Price Invoice.Total
The values for the Invoice.Id and Invoice.Total columns are the same for all rows. We can narrow our focus and consider only items in the 7th invoice:

ProduceAttributes(2.7.2)

The result is a table with columns Invoice.Item.Id and Invoice.Item.Price.

Figure 3 depicts a tuple tree for the schema A ( B C* D )*. An invocation of

ProduceAttributes(RootNode)

on this tuple tree returns the following table of complete rows:

A.B A.C A.D
once upon dreary
once midnight dreary
while pondered weak
over many curious
over quaint curious

   figure145
Figure 3: Traversing a tuple tree

Now suppose we aren't interested in the A.C attribute. Excluding the undesired attributes from the traversal builds the following table, which accurately restricts the previous one:

A.B A.D
once dreary
while weak
over curious

4.4 Sparse data

Many database applications manipulate sparse data; that is, many fields are empty in any given tuple. One advantage of the tuple tree is that we can reconstruct the entire tuple tree from only those leaves containing data. Qddb's external representation only stores attributes with non-empty data. For example, consider the following modification of the tuple tree in Figure 3 to include fewer populated leaves:

1.1.2.1 upon
1.1.2.2 midnight
1.2.1.1 while
1.3.2.1 many
1.3.2.2 quaint
1.3.3.1 curious

From these leaves, we know there are three instances of attribute A, the first and third of which contain two instances of A.C. When Qddb reads in these leaves, it can build an entire internal tuple tree. The algorithm for building tuple trees from a given set of leaf values (each with a leaf identifier) is as follows:

  1. Construct a tuple tree containing one instance for every attribute in the schema. All leaves start with empty values.
  2. For each given leaf value, follow its path label, diving down the tuple tree according to the label's components.
    1. If we we reach an instance that doesn't exist, create the instance and its entire subtree, giving all leaves empty values.
    2. When we reach a leaf, set its value to the given value.

The implementation of this algorithm is most efficient when the leaves are presented in left-to-right order by leaf identifier. The external representation of Qddb data always maintains order because it is generated from a left-to-right depth-first traversal of an internal tuple tree.

4.5 Locking

Locking rows in a traditional relational database can be a non-trivial task. A set of related rows may be spread across many tables, requiring many locks. Locating the rows to lock may require queries on many tables.

A tuple tree is stored externally as a contiguous list of leaves. The entire contiguous region of the database file can be locked to achieve a lock on a given tuple. Qddb does not lock at a finer grain (such as a subtree of the tuple tree), although subtrees are also stored contiguously, and Qddb therefore could easily accomplish finer-grained locking. It is straightforward to lock multiple tuple trees if the need arises.


next up previous
Next: 5 Indexing and searching Up: Schema and Tuple Trees: Previous: 3 Schema trees

Herrin Software Development, Inc.