next up previous
Next: 11 Packaging Up: An ASCII Database Previous: 9 Comparisons to commonly

10 Discussion

We have shown that a large performance increase over general databases can be achieved when managing relatively stable data. Our approach allows a very flexible database format yet provides performance that is as good or better than other databases that have inflexible formats. Our Ascii format allows portability between architectures, programmer readability, and a degree of error tolerance. The error tolerance can be important if a particular disk error cannot be corrected by the underlying operating system. A bad sector in our relation files does not affect the entire relation, and the relation can be manually corrected by an experienced user. TupleLists allow the entire solution set of a query to be found without reading any tuples from the relation.

We have detailed enhancements to the structure files that allow fairly efficient regular-expression and numeric-range searches and limited update without severely affecting search performance. Database administrators may choose to build optional structure files at stabilization time to provide only those features that enhance performance for a particular application. These options are currently specified in Scheme.

We have compared our approach to the approaches commonly used in generic database design. We purposely disregarded transaction management details such as file and record locking during update, since we feel that such necessities are obvious. The fact that the instable parts of a relation are segregated may make rollback particularly easy in qddb. Increasing version numbers may be appended to the file names of old updates to facilitate rollback of multiple updates.

We suggest that a specific implementation of our approach should stabilize the relation on a regular basis; but of course, the interval should depend upon the specific application. We foresee that some large relations may need stabilization only a few times per year, whereas others may need stabilization on a nightly basis. A consequence of our current implementation is that the relation must be inactive during a stabilization. We might be able to reduce the time a relation is unavailable by allowing new Additions and Changes directories to be built during stabilization.

The major disadvantages of our approach are threefold: performance degradation on updated relations, the computation needed for stabilization and the attendant unavailability of the data, and the storage required to hold the structure files. We accept the performance penalties for updated relations, since our initial assumption was that we are dealing with relations that are rarely updated. The computation required to stabilize the relation cannot be avoided; the temporary unavailability of the data, however, may make our approach unacceptable to applications that require high availability. We have noticed the storage required for our Index files is usually at least as large as the relation, and in many cases, larger. Much of this size is attributable to our definition of a key; some words such as ``Road'' (for the student relation) or ``the'' (for a bibliography/abstract relation) appear many times and produce large Index files. A more restrictive definition of a key would significantly decrease the size of the file. Even with large relations, the storage cost should not be a prohibitive factor, since the cost of large disk drives is rapidly decreasing. The stable part of the relation could even be stored on write-once optical disks, with records indicating which stable tuples are invalid kept on ordinary writable store.

We have observed that our methods perform equally well regardless of the size of the relation. A relation with 1,000 entries requires roughly the same amount of time to satisfy a query as a relation with 1,000,000 entries. The major difference in speed is the startup time for the query program to read HashTable. RandomHashTable seems to virtually eliminate this difference. Query times do not increase noticeably when multiple-attribute searches are performed. We suggest that our methods are quite suitable for PC class machines or slow optical disks.


next up previous
Next: 11 Packaging Up: An ASCII Database Previous: 9 Comparisons to commonly

Herrin Software Development, Inc.