
www.Usenet.com
| <-- __Chronological__ --> | <-- __Thread__ --> |
Hi, I am developing an application with BerkeleyDB. It is expected to be built on top of a database where for each object, history of its changes is stored along with its current state. Once written, versions are immutable. I found that using a BTREE database with duplicate data items does good job here. Supplying a duplicates comparison function that returns -1 regardless of the data items stored, DB->put creates sort of LIFO, so when retrieving with DB->get, application always gets the newest (current) version, and using a cursor, previous versions may be retrieved. So I can store as many versions as storage allows, not wasting keys. This was a trivial part, but here is more intriguing stuff. Changes need to be stored incrementally, using some "instruction" codes that the application interprets. For example, if version N-1 of an object contains some list, say, ["a","b","c"] (this is some imaginary syntax), and version N of the same object must be ["a","b","c","d"] then instead of storing the whole list, it may be represented as CONCAT <version N-1> ["d"] where CONCAT is some instruction that the aplication recognizes, and when version N is needed finally, it is evaluated by the application yielding the complete list. <version N-1> is some stable reference to the duplicated data item representing version N-1. Some kind of "lazy evaluation" known from functional languages like Haskell. To implement this technique, it is necessary to be able to access each duplicated data item directly, without iterating with a cursor. Can this be done? References need to be stable and long-lived (same lifetime as keys have). I thought about record numbers, but they are not guaranteed to be stable in BTREE, and are not allowed with DB_DUP(SORT). So is this totally impossible? Or there exists some way to do this? I didn't dig into the BDB sources, and I would be reluctant to use anything not documented in the Manual, but I couldn't find anything to achieve my goal in the Manual either. Of course I may use a key for each version, but this seems to be wasteful of keys considering that number of objects will be large, and number of versions may also be large for each object. Any ideas? PS Perhaps, relative number of cursor iterations "from version to version" might be used, but this I am afraid may degrade performance and will require to keep an open cursor for every object being accessed.
| <-- __Chronological__ --> | <-- __Thread__ --> |