doc:oodbcreate: created 2012/07/25 19:41 by admin, revised 2012/07/26 08:15 by admin
====== OODB - OOBD's own database format ======
  
===== Why a custom database format? =====
  
There are several reasons for, and advantages of, a custom format for the OOBD database:
  * the main one: there is no real need for a full-featured query engine like "''SELECT this, that FROM here, there WHERE all=nothing..''"; a simple key -> value(s) lookup is mostly all we need
  * low memory usage: the whole search is file based; memory is only used temporarily for the data found
  * generic input streams: the database only needs an input stream as its source, and that stream is read strictly forward. This allows, for example, encrypted data files to be used.
  * high speed: the whole search is just a balanced binary tree lookup, which keeps it fast even on slow devices
  
  
  
===== How to generate OODB data files? =====
  
OODB data files are generated from a CSV (comma-separated values) input file in which, despite the name, the values are separated by tabs rather than commas.
  
    oodbCreateCli inputfile.csv > outputfile.oobd

The resulting outputfile.oobd then belongs in the same directory as the Lua script that uses the database.
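If the input file is produced by a program rather than by hand, a minimal Python sketch like the following can write its text according to the input format described in the next section. ''format_input'' is a hypothetical helper, not part of oodbCreateCli. It assumes that keys and values never contain tabs or newlines and that ordinary Python string order is the ascending order oodbCreateCli expects.

```python
from __future__ import annotations


def format_input(header: list[str], rows: list[tuple[str, list[str]]]) -> str:
    """Return the text of an input file: header line, then one line per row.

    Hypothetical helper. Assumptions: no tabs/newlines inside fields, and
    Python string order matches the key order oodbCreateCli expects.
    """
    for key, values in rows:
        for field in (key, *values):
            if "\t" in field or "\n" in field:
                raise ValueError(f"field {field!r} contains a tab or newline")
    # sorted() is stable: rows sharing a key keep their original relative order,
    # i.e. the order in which their values should appear in query results.
    ordered = sorted(rows, key=lambda row: row[0])
    lines = ["\t".join(header)]
    lines += ["\t".join((key, *values)) for key, values in ordered]
    return "\n".join(lines) + "\n"
```

Writing the returned text to inputfile.csv and running the command above would then produce the OODB file. Relying on a stable sort is a deliberate choice: it orders the keys without disturbing the order of several lines that share a key.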
  
===== The Input File Format =====
  
  
The input file must have the following format:
  
HeaderLine \n \\
Line 1 \n \\
... \\
Line n \n
  
where

HeaderLine = \\
(column_name 0) \\
[\t (column_name 1)] \\
... \\
[\t (column_name n)]

Line = \\
Key \t Values

Values = \\
(Value_of_column 0) \\
[\t (Value_of_column 1)] \\
... \\
[\t (Value_of_column n)]
  
  
The input file must be sorted in ascending order by key.
  
If there is more than one value per key, the values must be sorted in the order in which they should appear in later query results.
  
  
Lines beginning with # are treated as comment lines and are ignored.
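The input rules above (tab-separated fields, header line first, ascending keys, several values per key, # comments) can be checked before running oodbCreateCli. The following is a minimal Python sketch of such a check, not part of OOBD. ''parse_input'' is a hypothetical name, and it assumes "\n" line endings, that the header is the first non-comment line, that empty lines may be skipped, and that "more than one value per key" means several lines sharing the same key.

```python
from __future__ import annotations


def parse_input(text: str) -> tuple[list[str], dict[str, list[list[str]]]]:
    """Split input-file text into header columns and a key -> list-of-values map.

    Hypothetical checker. Assumptions: "\n" line endings, header = first
    non-comment line, repeated keys mean "more than one value per key".
    """
    header: list[str] | None = None
    entries: dict[str, list[list[str]]] = {}
    previous_key: str | None = None
    for number, line in enumerate(text.split("\n"), start=1):
        if not line or line.startswith("#"):
            continue  # comment lines are suppressed; empty lines skipped (assumption)
        fields = line.split("\t")
        if header is None:
            header = fields  # first real line: the column names
            continue
        key, values = fields[0], fields[1:]
        if previous_key is not None and key < previous_key:
            raise ValueError(f"line {number}: key {key!r} breaks ascending order")
        # Several lines with the same key keep their values in file order.
        entries.setdefault(key, []).append(values)
        previous_key = key
    if header is None:
        raise ValueError("no header line found")
    return header, entries
```

Equal neighbouring keys are accepted, so a key with several values simply occupies several consecutive lines.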
  
===== The OODB (Output) File Format =====
  
The generated output file has the following format:
  
HeaderLine 0x0 \\
Entry 1 \\
... \\
Entry n
  
Entry = \\
Key 0x0 \\
Offset (used if key > search string) \\
Offset (used if key < search string) \\
Values 1 0x0 \\
[... Values n 0x0] \\
0x0
  
Offset = \\
binary unsigned 32-bit big-endian integer, calculated as the skip() value from the file position right after the second 4-byte offset up to the start of the next key string to be evaluated. To distinguish a real offset of 0 to the next key string from the 0 that marks the end of the search tree, the skip() offset stored in the file is always 1 higher than the real one, so 1 must be subtracted to get the correct jump width (e.g. an offset of 9 in the file means a real jump width of 9 - 1 = 8, i.e. skip(8)).
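To make the offset rule and the overall layout concrete, here is a minimal Python sketch of a writer for the output format. It is not oodbCreateCli, the names are hypothetical, and several details are assumptions: strings are UTF-8, the header line is the tab-joined column names, each value is one non-empty string (for example the tab-joined columns of one input line), and every subtree root is the middle item of its key range, followed by its values, then its smaller subtree, then its greater subtree. ''encode_offset(8)'' yields the stored value 9, matching the example above.

```python
from __future__ import annotations

import struct


def encode_offset(real_skip: int | None) -> bytes:
    """Store a skip width as unsigned 32-bit big-endian, plus 1; None (no subtree) -> 0."""
    stored = 0 if real_skip is None else real_skip + 1
    return struct.pack(">I", stored)


def decode_offset(raw: bytes) -> int | None:
    """Inverse of encode_offset: 0 -> None (end of tree), otherwise stored - 1."""
    (stored,) = struct.unpack(">I", raw)
    return None if stored == 0 else stored - 1


def build_tree(items: list[tuple[str, list[str]]]) -> bytes:
    """Serialize unique, key-sorted (key, values) pairs; the middle item is the root."""
    if not items:
        return b""
    mid = len(items) // 2
    key, values = items[mid]
    smaller = build_tree(items[:mid])
    greater = build_tree(items[mid + 1:])
    # Each value is a 0x0-terminated string; an extra 0x0 closes the list
    # (so an empty value would look like the terminator: assumed not to occur).
    value_bytes = b"".join(v.encode("utf-8") + b"\x00" for v in values) + b"\x00"
    # Skips count from just after the second offset, i.e. from the start of value_bytes.
    skip_smaller = len(value_bytes) if smaller else None
    skip_greater = len(value_bytes) + len(smaller) if greater else None
    return (
        key.encode("utf-8") + b"\x00"
        + encode_offset(skip_smaller)  # followed when key > search string
        + encode_offset(skip_greater)  # followed when key < search string
        + value_bytes + smaller + greater
    )


def build_oodb(header: list[str], items: list[tuple[str, list[str]]]) -> bytes:
    """Header line (tab-joined column names, assumption) + 0x0, then the tree."""
    return "\t".join(header).encode("utf-8") + b"\x00" + build_tree(items)
```

Because both subtrees are written after their parent entry, every jump goes forward, which is what allows the search to run on a strictly forward-only input stream.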
  
  
  
How to read this file:
<code>
1 - Read the header line (from the beginning of the file up to the first 0x0). Store it for later naming of the found columns.
2 - Read the key (a 0x0-terminated string) and the two following 4-byte skip() offsets (= relative file positions), one towards smaller and one towards greater keys, in the order given in the Entry definition above. An offset of 0 (zero) means there is no further smaller or greater key.
3 - Compare the key with the search string:
    - if equal: read the attached values into an array. This array contains the search result(s). Return it together with the header line as a positive search result.
    - if the search string is smaller than the key:
          if the offset towards smaller keys is 0 (zero), return from the search with an empty result array.
          otherwise, jump per skip(offset - 1) to the file position of the next key string and continue with step 2.
    - if the search string is greater than the key:
          if the offset towards greater keys is 0 (zero), return from the search with an empty result array.
          otherwise, jump per skip(offset - 1) to the file position of the next key string and continue with step 2.
</code>
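The reading steps above can be sketched in Python as follows. This is not the OOBD implementation; ''oodb_lookup'' and its helpers are hypothetical names. It assumes UTF-8 strings compared byte by byte, a tab-separated header line, and that the first offset is the one followed when the key is greater than the search string, as in the Entry definition. skip() is emulated by reading and discarding bytes, so any forward-only stream works.

```python
from __future__ import annotations

import io
import struct
from typing import BinaryIO


def _read_cstring(stream: BinaryIO) -> bytes:
    """Read up to and including the next 0x0; return the bytes before it."""
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of OODB data")
        if byte == b"\x00":
            return bytes(buf)
        buf += byte


def _skip(stream: BinaryIO, count: int) -> None:
    """Forward-only skip: read and discard bytes instead of seeking."""
    while count > 0:
        chunk = stream.read(min(count, 65536))
        if not chunk:
            raise EOFError("skip() past end of OODB data")
        count -= len(chunk)


def oodb_lookup(stream: BinaryIO, search: str) -> tuple[list[str], list[str]]:
    """Return (column names, matching values); values is empty if the key is absent."""
    columns = _read_cstring(stream).decode("utf-8").split("\t")  # step 1
    target = search.encode("utf-8")
    while True:
        key = _read_cstring(stream)  # step 2: key ...
        raw = stream.read(8)         # ... and its two offsets
        if len(raw) != 8:
            raise EOFError("truncated offsets")
        off_key_gt, off_key_lt = struct.unpack(">II", raw)
        if key == target:  # step 3, equal: collect values up to the empty string
            results = []
            while value := _read_cstring(stream):
                results.append(value.decode("utf-8"))
            return columns, results
        offset = off_key_gt if key > target else off_key_lt
        if offset == 0:
            return columns, []  # no further key in that direction
        _skip(stream, offset - 1)  # stored offsets are 1 higher than the real skip
```

Usage, for example: ''with open("outputfile.oobd", "rb") as f: columns, values = oodb_lookup(f, "somekey")''. Since the stream is consumed during the search, each lookup starts again from the beginning of the file.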