In this section we discuss ideas that might make qddb more suitable to a wider range of applications. We do not contemplate implementing them at this point.
A tunable definition of a key would be a useful way to trim Index to a more reasonable length. A key-definition language could be supplemented by a list of specific words to exclude from Index. The query processor could warn users when queries contain invalid keys.
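Such a key-definition predicate might be sketched as follows; the stop list, minimum length, and digit rule here are hypothetical tuning parameters, not part of qddb.

```python
STOP_WORDS = {"the", "and", "of", "for"}   # hypothetical exclusion list
MIN_LENGTH = 3                             # hypothetical tunable minimum

def is_key(word):
    """Decide whether a delimited word deserves an entry in Index."""
    w = word.lower()
    return len(w) >= MIN_LENGTH and w not in STOP_WORDS and not w.isdigit()

def check_query(words):
    """Return the words in a query that are not valid keys, for warning."""
    return [w for w in words if not is_key(w)]
```

A query processor could call check_query before searching and warn the user about any words it returns.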
We have chosen to use delimited words as keys; this choice colors which queries are simple and which are complex. We could use suffix arrays [MM90] instead of hashing in order to allow a quick query for any substring. This data structure requires about four times as much storage as Stable, but provides fast search times for arbitrary substrings.
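The suffix-array idea can be sketched as follows. This is a simplification: a real implementation would use the efficient construction algorithm of [MM90] rather than naive sorting, and would index Stable rather than a string literal.

```python
def build_suffix_array(text):
    """Sort all suffix start offsets lexicographically (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_substring(text, sa, pattern):
    """Binary-search the suffix array for every occurrence of pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                                   # lower bound
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                                   # upper bound
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])                      # offsets of all matches

stable = "smith john smithers jane"
sa = build_suffix_array(stable)
print(find_substring(stable, sa, "smith"))   # finds "smith" and "smithers"
```

Because every suffix is indexed, any substring query reduces to two binary searches, which is what makes arbitrary-substring queries cheap at the price of the larger structure.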
Our approach is not appropriate for large relations that are heavily updated. The penalties incurred by search increase linearly with each update. However, stabilization is too expensive to undertake frequently in a large relation.
We present two approaches to this dilemma. The first partially stabilizes the relation after every few changes; the second incrementally stabilizes it after every change.
Each partial stabilization produces a group of Stable and structure files; each group contains a subset of the total relation. Any query needs to search all the groups, so we would prefer to limit how many there are. However, such a limit implies that the groups have many members, making them expensive to create.
One policy for maintaining groups is to merge groups when there are too many. Let n be a threshold value, say 100. We will allow at most one group of size 2^i n for any i >= 0. As soon as there are n unstable tuples, they are stabilized into a group by themselves. Such a stabilization may lead to two groups of size n. In general, if there are already groups of size 2^i n for all 0 <= i < j, but not for i = j, all are merged with the new group and become one new stable group of size 2^j n.
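The policy behaves like binary counting over group sizes. A minimal sketch, with Python lists standing in for groups of files (a hypothetical in-memory representation):

```python
THRESHOLD = 4   # n: the text suggests 100; 4 keeps the example small

def stabilize(groups, unstable):
    """Stabilize n unstable tuples into a new group, then merge groups of
    equal size so at most one group of size 2^i * n remains for each i."""
    assert len(unstable) == THRESHOLD
    by_size = {len(g): g for g in groups}   # invariant: one group per size
    new = list(unstable)
    while len(new) in by_size:
        new = by_size.pop(len(new)) + new   # each merge doubles the size
    by_size[len(new)] = new
    return list(by_size.values())

groups = []
for batch in range(3):                      # stabilize three batches of n tuples
    tuples = list(range(batch * THRESHOLD, (batch + 1) * THRESHOLD))
    groups = stabilize(groups, tuples)
print(sorted(len(g) for g in groups))       # group sizes after 3n tuples
```

After 3n tuples the relation consists of one group of size 2n and one of size n, so a query searches only a logarithmic number of groups in the total number of stabilized tuples.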
Incremental stabilization is an alternative approach that aims to reduce the costs involved in searching an updated relation. These costs are (1) scanning the Changes and Additions directories to enumerate candidate files, (2) reading and parsing each candidate tuple, and (3) matching each candidate against the query.
We can greatly reduce these costs by performing intermediate calculations during each update. We maintain secondary structure files that pertain to changes and additions: Index.Change, HashTable.Change, Index.Addition, and HashTable.Addition, each identical in function to its stable counterpart. The optional structure files could also be included. Locators in the new Index files contain only a file number (the same as the file name in the Changes or Additions directory), that is, the serial number of the updated tuple. A file number appears in the locator instead of an offset because it is the file name, not an offset in Stable, that must be used to retrieve the data.
Each search performs the following operations: (1) search via the primary structure files, (2) search via the secondary files, (3) ignore stable TupleList nodes that have the same identifier as a changed node (we use the offset in Stable as a unique identifier and use that identifier to name the files that contain changed entries).
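These three steps might be sketched as follows, with in-memory dictionaries standing in for the structure files (a hypothetical simplification of the on-disk layout):

```python
def search(word, stable_index, change_index, addition_index, changed_ids):
    """(1) consult the primary Index, (2) consult the secondary Indexes,
    (3) drop stable locators superseded by a changed tuple."""
    stable_hits = [off for off in stable_index.get(word, [])
                   if off not in changed_ids]        # step 3: skip superseded
    change_hits = change_index.get(word, [])         # file numbers == old offsets
    addition_hits = addition_index.get(word, [])
    return stable_hits, change_hits, addition_hits

stable_index   = {"smith": [0, 1740]}    # word -> offsets into Stable
change_index   = {"smith": [1740]}       # word -> file numbers (= old offsets)
addition_index = {"smith": [3]}          # word -> file numbers of new tuples
changed_ids    = {1740}                  # offsets whose tuples were changed
print(search("smith", stable_index, change_index, addition_index, changed_ids))
```

Here the tuple at offset 1740 is reported only through the change index, since its stable copy is superseded; the addition at file number 3 is reported alongside the surviving stable hit at offset 0.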
Each search suffers two additional open/read system-call pairs (one for additions and one for changes), plus one more pair for each entry that matches the query. This method totally eliminates cost 1 and greatly reduces the computation necessary for costs 2 and 3.
An update to the relation is performed by deleting all previous references to the updated entry in all secondary Index files (to avoid inaccurate locators) and appending a new locator to all appropriate entries in the correct secondary Index. An updated entry is marked invalid in Stable. The cost of this approach is quite reasonable when the number of change and addition entries is relatively small. As the number of updates increases, the cost of performing an update increases due to the increasing size of the corresponding Index file.
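An update under this scheme might look like the following sketch, again with a dictionary standing in for a secondary Index file:

```python
def update(index, tuple_id, new_words):
    """Purge stale locators for tuple_id from the secondary Index, then
    append a locator for each word of the new value."""
    for word in list(index):
        index[word] = [loc for loc in index[word] if loc != tuple_id]
        if not index[word]:
            del index[word]                 # drop now-empty entries
    for word in new_words:
        index.setdefault(word, []).append(tuple_id)

index = {"smith": [7], "john": [7, 9]}
update(index, 7, ["smythe", "john"])        # tuple 7 renamed Smith -> Smythe
print(index)
```

The purge pass is what grows linearly with the size of the Index file, which is why the cost of an update rises as changes accumulate between stabilizations.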
We do not currently support transactions in qddb. Failure atomicity could be implemented by giving each transaction private supplemental Changes, Additions, and Deletions directories. These directories would contain only those tuples that are updated from the standard directories. The Stable file would not be updated during a transaction, but entries in the supplemental Deletions or Changes directories would override entries in Stable. The ordinary Changes and Additions directories would maintain their usual meaning.
Aborting a transaction only requires discarding the private directories. Committing a transaction requires merging private changes, additions, and deletions into the ordinary files. New changes replace conflicting old ones. Where there is no conflict, the new change invalidates the corresponding entry in Stable. New additions are renamed if necessary to give them unique numbers. Deletions invalidate corresponding entries in Stable.
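Commit and abort might be sketched as follows; the dictionary-of-directories representation and the valid_stable set are hypothetical stand-ins for the on-disk state.

```python
def commit(public, private, valid_stable):
    """Merge a transaction's private directories into the ordinary ones.
    public and private map directory names to {number: tuple} dicts;
    valid_stable is the set of Stable entries not yet invalidated."""
    for tid, tup in private["Changes"].items():
        if tid not in public["Changes"]:
            valid_stable.discard(tid)      # no conflict: invalidate Stable entry
        public["Changes"][tid] = tup       # new change replaces any old one
    for tid, tup in private["Additions"].items():
        while tid in public["Additions"]:  # rename to keep numbers unique
            tid += 1
        public["Additions"][tid] = tup
    for tid in private["Deletions"]:
        valid_stable.discard(tid)          # deletion invalidates Stable entry
        public["Changes"].pop(tid, None)

def abort(private):
    """Aborting merely discards the private directories."""
    for d in ("Changes", "Additions", "Deletions"):
        private[d].clear()

public  = {"Changes": {5: "old"}, "Additions": {100: "a"}}
private = {"Changes": {5: "newer", 6: "chg"},
           "Additions": {100: "b"}, "Deletions": {7}}
valid_stable = {6, 7, 8}
commit(public, private, valid_stable)
```

Note the asymmetry the text describes: abort is nearly free, while commit does work proportional to the size of the private directories.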
Qddb currently uses record locking to ensure that individual tuples remain consistent. Implementing appropriate locking strategies would be necessary to guarantee serializable transactions.
Qddb can be easily enhanced to distribute replicated data. Given the structures outlined above for failure atomicity, we can transfer updates performed on one site to the other sites by sending the Changes, Additions, and Deletions directories. Restabilization can occur on each site independently and concurrently, so there is no need for a central site to transfer the entire relation at any time.
We could allow the specification of default values in the Schema file for applicable attributes. Default values are useful when a particular attribute has a value that occurs frequently. For example, the Country attribute in the student database of Figure 1 might usually have the value USA, which would be a reasonable default.
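Applying such defaults at tuple-creation time is straightforward; in this sketch the Schema's default table is a hypothetical dictionary.

```python
SCHEMA_DEFAULTS = {"Country": "USA"}   # hypothetical defaults from Schema

def new_tuple(attributes):
    """Fill in any attribute left unspecified with its Schema default."""
    tuple_ = dict(SCHEMA_DEFAULTS)     # start from the defaults
    tuple_.update(attributes)          # explicit values override them
    return tuple_

print(new_tuple({"Name": "John Smith"}))
```

An explicitly supplied value always wins over the default, so the common case costs the user nothing while the uncommon case remains expressible.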
If the relation designer wishes each tuple to satisfy some consistency constraint, we could easily provide a hook for checking after each update. However, most such consistency checking can be layered on our library routines. We do not currently plan to build any consistency checking utilities or routines.
We could allow some attributes to hold values not stored in Stable but computed on demand from the values of other attributes. For example, in our student database of Figure 1, the value of a MailingLabel attribute could be computed from the Address attribute, and GradePointAverage could be computed from the Course attribute. Again, hooks could be provided for the programmer to provide these routines.
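A hook table for such computed attributes might be sketched as follows. The attribute names come from the text; the tuple representation and hook functions are hypothetical.

```python
# Hypothetical hook table: computed attribute -> function of stored attributes.
COMPUTED = {
    "MailingLabel": lambda t: t["Name"] + "\n" + t["Address"],
    "GradePointAverage":
        lambda t: sum(grade for _, grade in t["Course"]) / len(t["Course"]),
}

def get_attribute(tuple_, name):
    """Return a stored value, or compute one on demand via its hook."""
    if name in COMPUTED:
        return COMPUTED[name](tuple_)
    return tuple_[name]

student = {"Name": "John Smith", "Address": "123 Main St",
           "Course": [("CS 101", 4.0), ("CS 270", 3.0)]}
print(get_attribute(student, "GradePointAverage"))   # -> 3.5
```

Because computed values never reach Stable, they stay consistent with the stored attributes by construction and cost nothing until queried.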