OODB - OOBD's own database format

Why a separate database format?

There are several reasons for giving the OOBD database its own format:

  • the main one: there is no real need for a full-blown query engine along the lines of “SELECT this, that FROM here, there WHERE all=nothing..”; a simple key → value(s) lookup is almost all we need
  • low memory usage: the whole search is file based; memory is only used temporarily for the data that was found
  • generic input streams: the database only needs an input stream as its source, and that stream is read strictly in the forward direction. This means that, for example, encrypted data files can be used.
  • high speed: the whole search is just a lookup in a balanced binary tree, which keeps it fast even on slow devices

How are OODB data files generated?

OODB data files are generated from a CSV (comma-separated values) input file in which the values are actually separated by tabs rather than commas.

Such an input file, which must meet the requirements described below, is translated with the oodbCreateCli PHP script:

 oodbCreateCli inputfile.csv > outputfile.oobd

The resulting outputfile.oobd then belongs in the same directory as the Lua script that uses the database.

The Input file Format

The input file must have the following format:

HeaderLine \n
Line 1 \n
..
Line n \n

where

HeaderLine =
(column_name 0)
[\t (column_name 1)]
..
[\t (column_name n)]

Line =
Key \t Values

Values =
(Value_of_Column 0)
[\t (Value_of_Column 1)]
..
[\t (Value_of_Column n)]

The input file must be sorted in ascending order by key.

If there is more than one value per key, the values must be given in the order in which they should appear in the later query result.

Lines that begin with a # are treated as comment lines and are ignored.
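
As a rough illustration of these rules, the following Python sketch reads such an input text, drops comment lines, and checks the sort order. It is not the oodbCreateCli script. The function name parse_input is made up for this example, and three points are assumptions rather than facts stated above: that several values for one key are given as consecutive lines repeating that key, that empty lines may be ignored, and that keys are compared as plain strings.

```python
def parse_input(text):
    """Return (column_names, records); records is a list of (key, [values, ...]).

    Hypothetical helper, not part of OOBD.
    """
    # Comment lines start with '#'. Dropping empty lines is an assumption.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    if not lines:
        raise ValueError("missing header line")

    column_names = lines[0].split("\t")
    records = []
    for line in lines[1:]:
        # Everything before the first tab is the key, the rest are the values.
        key, _, values = line.partition("\t")
        if records and key == records[-1][0]:
            # ASSUMPTION: a repeated key adds another value, in file order.
            records[-1][1].append(values)
        elif records and key < records[-1][0]:
            raise ValueError("input is not sorted ascending at key %r" % key)
        else:
            records.append((key, [values]))
    return column_names, records
```

A file whose keys are out of order is rejected with a ValueError here; how the real script reacts to unsorted input is not described on this page.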

The OODB (Output) file Format

The generated output format will be as follows:

HeaderLine 0x0
Entry 1
..
Entry n

Entry =
Key 0x0
Offset (for key > Searchstring)
Offset (for key < Searchstring)
Value 1 0x0
[.. Value n 0x0]
0x0

Offset =
a binary unsigned 32-bit big-endian value. It is the skip() width from the file position directly after the second 4-byte value to the start of the next key string to be evaluated. To distinguish an offset of 0 to the next key string from a 0 that marks the end of the search tree, the offset stored in the file is always 1 higher than the real one, so 1 must be subtracted to get the correct jump width (e.g. an offset of 9 in the file means a real jump width of 9 - 1 = 8, i.e. skip(8)).
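
The offset rule can be sketched in Python as follows. The sketch serializes already sorted (key, values) records as a balanced tree, with each entry followed first by the subtree of smaller keys and then by the subtree of greater keys. That file layout, the helper names, and the reading of the first offset field as the one taken when the entry key is greater than the search string are assumptions made for illustration; the layout actually produced by oodbCreateCli may differ.

```python
import struct


def values_block(values):
    """Each value is 0x0-terminated; one extra 0x0 closes the entry."""
    return b"".join(v + b"\x00" for v in values) + b"\x00"


def stored_offset(jump_width, present):
    """Stored value is the real jump width + 1; 0 means 'no further entry'."""
    return jump_width + 1 if present else 0


def build_subtree(records):
    """Serialize sorted (key, values) records, root entry first (hypothetical layout)."""
    if not records:
        return b""
    mid = len(records) // 2                    # middle record keeps the tree balanced
    key, values = records[mid]
    smaller = build_subtree(records[:mid])     # entries with smaller keys
    greater = build_subtree(records[mid + 1:])  # entries with greater keys
    block = values_block(values)

    # Jump widths are counted from the position right after the two offsets.
    to_smaller = len(block)
    to_greater = len(block) + len(smaller)

    # ASSUMPTION: first field = taken when entry key > search string.
    first = stored_offset(to_smaller, bool(smaller))
    second = stored_offset(to_greater, bool(greater))
    return (key + b"\x00" + struct.pack(">II", first, second)
            + block + smaller + greater)


def build_file(header, records):
    """Header line, 0x0, then the serialized tree."""
    return header + b"\x00" + build_subtree(records)
```

Because every child entry is written after its parent, all jumps point forward, which matches the requirement that the file is only ever read in the forward direction.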

How to read this file:

1 - Read the header line (from the beginning of the file up to the first 0x0). Store this data for naming the columns of the result later.
2 - Read the key value (a 0x0-terminated string) and the two 4-byte skip() offsets that follow it (= relative file positions) for the greater and the smaller key value. If an offset is 0 (zero), no further greater or smaller key is available.
3 - Compare the key with the search string:
    - if equal, read the attached values into an array. This array then contains the search result(s). Return it together with the header line as a positive search result.
    - if smaller:
          if the smaller file position is 0 (zero), return from the search with an empty result array.
          if the smaller file position is not 0, jump via skip(value - 1) to the file position of the next key string and continue with step 2.
    - if bigger:
          if the bigger file position is 0 (zero), return from the search with an empty result array.
          if the bigger file position is not 0, jump via skip(value - 1) to the file position of the next key string and continue with step 2.
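
The steps above can be sketched in Python as follows. This is only an illustration, not the reader used by OOBD: the names read_cstring, skip, lookup and DEMO are invented here, keys are compared as raw bytes, and the first offset field is assumed to be the one followed when the entry key is greater than the search string. The stream is only ever read forward, so any file-like object with a read() method would do.

```python
import io
import struct


def read_cstring(stream):
    """Read bytes up to (not including) the next 0x0 terminator."""
    chunk = bytearray()
    while True:
        byte = stream.read(1)
        if byte in (b"", b"\x00"):      # end of stream or terminator
            return bytes(chunk)
        chunk += byte


def skip(stream, count):
    """Forward-only skip: read and discard `count` bytes."""
    stream.read(count)


def lookup(stream, search):
    """Return (header, values); values is empty if `search` is not found."""
    header = read_cstring(stream)                       # step 1
    while True:
        key = read_cstring(stream)                      # step 2
        raw = stream.read(8)
        if len(raw) < 8:                                # truncated file
            return header, []
        when_key_greater, when_key_smaller = struct.unpack(">II", raw)
        if key == search:                               # step 3, equal
            values = []
            while True:
                value = read_cstring(stream)
                if not value:                           # empty string ends the entry
                    return header, values
                values.append(value)
        # ASSUMPTION: first field applies when entry key > search string.
        stored = when_key_greater if key > search else when_key_smaller
        if stored == 0:                                 # end of the search tree
            return header, []
        skip(stream, stored - 1)                        # stored value is 1 too high


# Hand-built three-entry example file (hypothetical data).
DEMO = b"".join([
    b"name\tinfo\x00",                                    # header line
    b"m\x00", struct.pack(">II", 5, 22), b"M1\x00\x00",   # real jumps: 4 and 21
    b"c\x00", struct.pack(">II", 0, 0), b"C1\x00C2\x00\x00",
    b"t\x00", struct.pack(">II", 0, 0), b"T1\x00\x00",
])
```

Calling lookup(io.BytesIO(DEMO), b"c") first reads the entry "m", follows its first offset past the value block, and then finds "c" with its two values.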
doc/tools_oodbcreate.txt · Last modified: 2014/03/02 08:08 by admin