Usenet.com

www.Usenet.com

Group Index

Comp Thread Archive from Usenet.com

<-- __Chronological__ --> <-- __Thread__ -->

Direct Access to Duplicated Data Items



Hi,

I am developing an application with BerkeleyDB. It is expected to be
built on top of a database where for each object, history of its
changes is stored along with its current state. Once written, versions
are immutable. I found that using a BTREE database with duplicate data
items does good job here. Supplying a duplicates comparison function
that returns -1 regardless of the data items stored, DB->put creates
sort of LIFO, so when retrieving with DB->get, application always gets
the newest (current) version, and using a cursor, previous versions
may be retrieved. So I can store as many versions as storage allows,
not wasting keys.

This was a trivial part, but here is more intriguing stuff.

Changes need to be stored incrementally, using some "instruction"
codes that the application interprets. For example, if version N-1 of
an object contains some list, say, ["a","b","c"] (this is some
imaginary syntax), and version N of the same object must be
["a","b","c","d"] then instead of storing the whole list, it may be
represented as CONCAT <version N-1> ["d"] where CONCAT is some
instruction that the aplication recognizes, and when version N is
needed finally, it is evaluated by the application yielding the
complete list. <version N-1> is some stable reference to the
duplicated data item representing version N-1. Some kind of "lazy
evaluation" known from functional languages like Haskell.

To implement this technique, it is necessary to be able to access each
duplicated data item directly, without iterating with a cursor. Can
this be done? References need to be stable and long-lived (same
lifetime as keys have). I thought about record numbers, but they are
not guaranteed to be stable in BTREE, and are not allowed with
DB_DUP(SORT).
So is this totally impossible? Or there exists some way to do this? I
didn't dig into the BDB sources, and I would be reluctant to use
anything not documented in the Manual, but I couldn't find anything to
achieve my goal in the Manual either.
Of course I may use a key for each version, but this seems to be
wasteful of keys considering that number of objects will be large, and
number of versions may also be large for each object.

Any ideas?

PS Perhaps, relative number of cursor iterations "from version to
version" might be used, but this I am afraid may degrade performance
and will require to keep an open cursor for every object being
accessed.



<-- __Chronological__ --> <-- __Thread__ -->


Usenet.com



Please check out one of the premium Usenet Newsgroup Service Providers below for access to Usenet.