OODB - OOBD's own database format
Why a custom database format?
There are several reasons for, and advantages of, setting up a custom format for the OOBD database:
- the main one: there is no real need for a full-blown, super-duper query engine handling statements like "SELECT this, that FROM here, there WHERE all=nothing..."; a simple key → value(s) lookup is mostly all we need
- low memory usage: the whole search is file based, so memory is only used temporarily for the data found
- generic input streams: the database only needs an input stream as its source, and this stream is read strictly read-only and in forward direction. This makes it possible to use, e.g., encrypted data files.
- high speed: the whole search is just a lookup in a balanced binary tree, which keeps it fast even on slow devices
How are OODB data files generated?
OODB data files are generated from a CSV (comma separated values) input file in which, despite the name, the values are separated by tabs instead of commas.
Such an input file, which must meet the requirements described below, is translated with the oodbCreateCli PHP script:
oodbCreateCli inputfile.csv > outputfile.oobd
The resulting outputfile.oobd then belongs in the same directory as the Lua script that uses the database.
The Input File Format
The input file must have the following format:
HeaderLine \n
Line 1 \n
…
Line n \n
where
HeaderLine =
(column_name 0)
[\t (column_name 1)]
…
[\t (column_name n)]
Line =
Key \t Values
Values =
(Value_of_Column 0)
[\t (Value_of_Column 1)]
…
[\t (Value_of_Column n)]
The input file must be sorted in ascending order by its keys.
If there is more than one value per key, the values must be sorted in the order in which they should appear in later query results.
Lines beginning with # are treated as comment lines and are ignored.
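The following Python sketch shows how the input rules above could be checked before conversion. It is not part of oodbCreateCli: the function name parse_oodb_csv is hypothetical, keys are compared as plain strings (assumed to match the order the converter expects), several lines with the same key are assumed to be the "more than one value per key" case, and blank lines are skipped, which the format does not specify.

```python
from typing import Dict, List, Tuple


def parse_oodb_csv(text: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Parse tab-separated OODB input text (hypothetical helper).

    Returns the header column names and, per key, the list of value rows
    in file order. Each value row is kept as its tab-separated string.
    """
    header = None
    entries: Dict[str, List[str]] = {}  # dicts keep insertion order
    last_key = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("#"):
            continue  # comment line, ignored
        if header is None:
            header = line.split("\t")  # first non-comment line is the header
            continue
        if not line:
            continue  # assumption: empty lines are skipped
        key, _, values = line.partition("\t")
        if last_key is not None and key < last_key:
            raise ValueError(f"line {lineno}: key {key!r} breaks ascending order")
        # assumption: repeated keys keep their value rows in file order
        entries.setdefault(key, []).append(values)
        last_key = key
    if header is None:
        raise ValueError("missing header line")
    return header, entries
```

Rejecting out-of-order keys early matters because the lookup relies on the sort order to decide which branch to follow.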
The OODB (Output) File Format
The generated output file has the following format:
HeaderLine 0x0
Entry 1
..
Entry n
Entry =
Key 0x0
Offset (for key > Searchstring)
Offset (for key < Searchstring)
Value 1 0x0
[..Values n 0x0]
0x0
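To illustrate the Entry layout, this hypothetical Python helper pack_entry serializes a single entry. It assumes UTF-8 for all strings (the format itself only defines the 0x0 terminators) and takes the two offset fields as the raw values to be stored in the file, i.e. already following the offset rule described below.

```python
import struct
from typing import List


def pack_entry(key: str, off_key_greater: int, off_key_smaller: int,
               values: List[str]) -> bytes:
    """Serialize one OODB Entry (hypothetical helper).

    off_key_greater / off_key_smaller are the raw 32-bit values as stored in
    the file (0 = no further entry). Strings are assumed to be UTF-8.
    """
    out = bytearray()
    out += key.encode("utf-8") + b"\x00"                          # Key 0x0
    out += struct.pack(">II", off_key_greater, off_key_smaller)   # 2 x uint32, big-endian
    for value in values:
        out += value.encode("utf-8") + b"\x00"                     # Value i 0x0
    out += b"\x00"                                                  # final 0x0 ends the entry
    return bytes(out)
```

The closing 0x0 reads as an empty string, which is how a reader can tell that the value list has ended.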
Offset =
binary unsigned 32-bit big-endian value, given as a skip() distance measured from the file position directly after the second 4-byte value up to the start of the next key string to be evaluated. To distinguish between a real distance of 0 to the next key string and a 0 marking the end of the search tree, the skip() offset stored in the file is always 1 higher than the real one, so 1 needs to be subtracted to get the correct jump width (e.g. an offset of 9 in the file means a real jump width of 9 - 1 = 8, i.e. skip(8)).
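The offset rule can be combined with a balanced binary tree built from the sorted input. The Python sketch below is an assumption about how a converter could lay out the tree, not a description of what oodbCreateCli actually does: each entry is followed by its value block, then the subtree of smaller keys, then the subtree of greater keys, so every jump is a forward skip. The first offset ("for key > Searchstring") therefore points at the smaller-key subtree. All function names are hypothetical; strings are assumed to be UTF-8 and the header columns tab-separated.

```python
import struct
from typing import List, Optional, Tuple

Record = Tuple[str, List[str]]  # (key, value rows), sorted by key


def _cstr(s: str) -> bytes:
    return s.encode("utf-8") + b"\x00"  # 0x0-terminated string (UTF-8 assumed)


def encode_offset(real_skip: Optional[int]) -> int:
    """Real forward skip -> stored value; None (no further entry) -> 0."""
    return 0 if real_skip is None else real_skip + 1


def decode_offset(stored: int) -> Optional[int]:
    """Stored value -> real skip width, or None at the end of the tree."""
    return None if stored == 0 else stored - 1


def build_tree(records: List[Record]) -> bytes:
    """Serialize sorted records as a balanced binary search tree."""
    if not records:
        return b""
    mid = len(records) // 2
    key, values = records[mid]
    smaller = build_tree(records[:mid])       # keys below `key`
    greater = build_tree(records[mid + 1:])   # keys above `key`
    value_block = b"".join(_cstr(v) for v in values) + b"\x00"
    # Skips count from the byte right after the second offset field.
    # Used when key > search string: continue in the smaller keys.
    off_key_greater = encode_offset(len(value_block) if smaller else None)
    # Used when key < search string: continue in the greater keys.
    off_key_smaller = encode_offset(
        len(value_block) + len(smaller) if greater else None)
    return (_cstr(key) + struct.pack(">II", off_key_greater, off_key_smaller)
            + value_block + smaller + greater)


def write_oodb(header: List[str], records: List[Record]) -> bytes:
    """HeaderLine 0x0 followed by the entries; records must be sorted."""
    return _cstr("\t".join(header)) + build_tree(records)
```

Picking the middle record at every level keeps the tree balanced, so a lookup visits at most about log2(n) entries.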
How to read this file:
1 - Read the header line (from the beginning of the file up to the first 0x0). Store this data for later naming of the found columns.
2 - Read the key (a 0x0-terminated string) and the two following 4-byte skip() offsets (= relative file positions), first the one for key > Searchstring, then the one for key < Searchstring. If an offset is 0 (zero), there is no further entry in that direction.
3 - Compare the key with the search string:
- if equal: read the attached values into an array. This array then contains the search result(s). Return it together with the header line as a positive search result.
- if the key is smaller than the search string: if the offset for key < Searchstring is 0 (zero), return from the search with an empty result array. Otherwise, jump via skip(value - 1) to the file position of the next key string and continue with step 2.
- if the key is bigger than the search string: if the offset for key > Searchstring is 0 (zero), return from the search with an empty result array. Otherwise, jump via skip(value - 1) to the file position of the next key string and continue with step 2.
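The reading steps above can be sketched in Python as follows. The names are hypothetical, and the sketch assumes UTF-8 strings, a tab-separated header line, and byte-wise key comparison. Skipping is done by reading and discarding bytes, so the function only needs a forward-readable stream, in line with the "generic input streams" idea.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def _read_cstr(f: BinaryIO) -> bytes:
    """Read one 0x0-terminated string, byte by byte."""
    buf = bytearray()
    while True:
        b = f.read(1)
        if not b:
            raise EOFError("unexpected end of OODB data")
        if b == b"\x00":
            return bytes(buf)
        buf += b


def _skip(f: BinaryIO, n: int) -> None:
    """Forward-only skip: read and discard n bytes (no seek() needed)."""
    while n > 0:
        chunk = f.read(min(n, 4096))
        if not chunk:
            raise EOFError("skip beyond end of OODB data")
        n -= len(chunk)


def oodb_lookup(f: BinaryIO, search: str) -> Tuple[List[str], List[str]]:
    """Return (header columns, found values); values are [] if not found."""
    header = _read_cstr(f).decode("utf-8").split("\t")          # step 1
    target = search.encode("utf-8")
    while True:
        key = _read_cstr(f)                                     # step 2
        off_key_greater, off_key_smaller = struct.unpack(">II", f.read(8))
        if key == target:                                       # step 3: equal
            values: List[str] = []
            while True:
                value = _read_cstr(f)
                if not value:          # empty string = closing 0x0 of the entry
                    return header, values
                values.append(value.decode("utf-8"))
        # byte-wise comparison (assumed to match the converter's sort order)
        stored = off_key_greater if key > target else off_key_smaller
        if stored == 0:
            return header, []          # end of the search tree: not found
        _skip(f, stored - 1)           # stored offset is 1 higher than the real skip


def oodb_lookup_bytes(data: bytes, search: str) -> Tuple[List[str], List[str]]:
    """Convenience wrapper for in-memory data."""
    return oodb_lookup(io.BytesIO(data), search)
```

Because only read() is used, the same function should also work on a stream wrapper that, for example, decrypts the data on the fly.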