
Qvikly stores data in a binary format, largely as field tags, types, and associated values. This article outlines the basic data structures, along with the assumptions and principles behind the architectural approach.

- Calculated datasets, which can get very large (millions of records), are stored in data files.
- Human-editable data is stored in a database, in the same format.
- Data definitions are stored separately in a database.
- The data type of a field value is allowed to differ from the field's defined type. This often happens when a field is expected to be numeric but the data isn't clean. Qvikly treats these cases not as failures, but as a core aspect of the problem space.
- Encoding is always big endian (network byte order).
- Text is stored in UTF-8 encoding.
- Checksums are not currently implemented, since the embedded database handles most storage errors.

### Calculated Datasets

Calculated datasets are stored in files, and their metadata is stored in the embedded Apache Derby database. Files allow faster processing than retrieving records from a database.

A file stores both the type definition and the records for a calculated dataset. Data is streamed for both reads and writes. When paginating reads, a transient reference to the next set of records is kept as a file offset.

A refresh writes a new file, so the existing data is retained and remains accessible until the refresh succeeds.

#### File Format

This format stores the records of a single dataset; different datasets are stored in different files. However, a single file can hold many data records and several different block types.

```java
{file block header}
{type definition}
{record data block}*N
```

##### File Header

The file header is designed to support evolution of the format, so it stores format information as well as option flags.

* (0) 1 byte - block type (1 = file header)
* (1) 1 byte - bit map - 0,0,0,0,0,0,reverse support,delete flag
* (2-5) 4 bytes - length of the entire control block
* (6-9) 4 bytes - block checksum (not used)
* (10-25) 16 bytes - GUID id for the format - currently 'c0a83a99-0505-4d2a-9f2b-7b9e2e3859f7'
* (26-27) 2 bytes - version number for the buffer format - currently 1
* (28-29) 2 bytes - newline characters to determine how CRLF is encoded
* (30) 1 byte - Ctrl-Z (0x1A) to mark an EOF for a text file
* (31-35) 4 bytes - size of the buffer (not used)
* (36-39) 4 bytes - available bytes at end (not used)
* (40-43) 4 bytes - number of cleanup bytes (not used)
* (44-63) 20 bytes - reserved and currently undefined

##### Type Definition

* (0) 1 byte - block type (7 = type definition)
* (1) 1 byte - not used
* (2-5) 4 bytes - length of the type definition block
* (6-9) 4 bytes - block checksum (not used)
* (10-25) 16 bytes - object id (GUID) for the type definition
* (26-X) variable - string containing the definition in XML format

````xml
````

###### Reserved Field Numbers

Qvikly reserves the last 500 field numbers (max short value - 500) for commonly used fields. The following special field ids are defined:

* 32765 - object id for the parent object
* 32764 - date record created
* 32763 - date record last modified
* 32762 - date record last published
* 32761 - object id of the datastore (for uniqueness of record numbers)
* 32760 - list of object ids of contained objects
* 32759 - object id of the type definition
* 32758 - object id of this record
* 32757 - order of the object as a number
* 32756 - boolean flag if the record is deleted (different from a block being deleted)
* 32755 - boolean flag if the record is archived
* 32754 - object id of the previous object, in case the object needs to be merged back into its ancestor
* 32753 - drilldown level of an object, in case there is a hierarchy - 0 is the lowest level, drilling up to 1, etc.
* 32752 - XML/JSON string containing user-defined metadata about the field value (e.g. highlighting or cell-level formatting, as in a spreadsheet)

#### Record Data Format

A file is composed of multiple records. Each record consists of a block header followed by its data fields.

```java
{universal block header}
{field value}*N
```

##### Universal Block Header

* (0) 1 byte - record block type (11)
    - 0 - buffer end of record
    - 1 - file header record type
    - 2 - object record type
    - 3 - object list
    - 4 - string
    - 5 - blob
    - 6 - collaboration record block
    - 7 - type definition
    - 8 - field definition
    - 9 - buffer header
    - 10 - list of primitives
    - 11 - streamed record (most commonly used), basically a queue of field values
* (1) 1 byte - record block bit variables, 0,0,0,0,0,0,{varints used},{delete status - 1 means deleted}
* (2-5) 4 bytes - block length as an integer (not including the block header itself)

##### Field Value

* (0-1) 2 bytes - field id - each field has a number
* (2) 1 byte - data type (may be different from the type of the field)
    - 0 - boolean
    - 1 - short (2 byte number)
    - 2 - int (4 byte number)
    - 3 - long (8 byte number)
    - 4 - double (8 byte decimal - 4 bytes significant, 4 bytes for decimal portion)
    - 5 - date (8 byte number)
    - 6 - string
    - 7 - id (16 byte GUID/UUID)
    - 8 - bytes
    - 11 - embedded block
    - 12 - reference to another block (unused)
* (3-N) variable length, based on the data type stored; for strings and bytes, the first 4 bytes give the size in bytes of the stored field value
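
The block header and field value layouts above can be exercised with a small round trip. The following is a minimal sketch, not Qvikly's actual code: the class `StreamedRecordSketch` and its `encode`/`decode` methods are hypothetical names, only the int, long, and string data types are covered, varints are assumed to be switched off, and the flag bits are assumed to be listed from most to least significant (so the delete flag is the lowest bit).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a streamed record (block type 11).
class StreamedRecordSketch {
    static final byte BLOCK_STREAMED_RECORD = 11;
    static final byte TYPE_INT = 2;
    static final byte TYPE_LONG = 3;
    static final byte TYPE_STRING = 6;
    static final int HEADER_SIZE = 6;      // type (1) + flags (1) + length (4)
    static final int FLAG_VARINTS = 0x02;  // assumed bit position

    /** Encodes fields as a 6-byte block header followed by the field values. */
    static byte[] encode(Map<Short, Object> fields) {
        // ByteBuffer is big endian by default; a fixed size is enough for a sketch.
        ByteBuffer body = ByteBuffer.allocate(1024);
        for (Map.Entry<Short, Object> e : fields.entrySet()) {
            body.putShort(e.getKey());     // (0-1) field id
            Object v = e.getValue();
            if (v instanceof Integer) {
                body.put(TYPE_INT).putInt((Integer) v);
            } else if (v instanceof Long) {
                body.put(TYPE_LONG).putLong((Long) v);
            } else if (v instanceof String) {
                byte[] utf8 = ((String) v).getBytes(StandardCharsets.UTF_8);
                // Strings carry a 4-byte size prefix before the UTF-8 bytes.
                body.put(TYPE_STRING).putInt(utf8.length).put(utf8);
            } else {
                throw new IllegalArgumentException("type not covered by this sketch");
            }
        }
        body.flip();
        ByteBuffer out = ByteBuffer.allocate(HEADER_SIZE + body.remaining());
        out.put(BLOCK_STREAMED_RECORD);    // (0) block type
        out.put((byte) 0);                 // (1) flags: no varints, not deleted
        out.putInt(body.remaining());      // (2-5) length, excluding the header
        out.put(body);
        return out.array();
    }

    /** Decodes a block produced by encode() back into field id -> value. */
    static Map<Short, Object> decode(byte[] block) {
        ByteBuffer in = ByteBuffer.wrap(block);
        byte blockType = in.get();
        byte flags = in.get();
        int length = in.getInt();
        if (blockType != BLOCK_STREAMED_RECORD) {
            throw new IllegalArgumentException("not a streamed record: " + blockType);
        }
        if ((flags & FLAG_VARINTS) != 0) {
            throw new IllegalArgumentException("varints are not handled in this sketch");
        }
        int end = HEADER_SIZE + length;
        Map<Short, Object> fields = new LinkedHashMap<>();
        while (in.position() < end) {
            short fieldId = in.getShort();
            byte type = in.get();          // type travels with each value
            switch (type) {
                case TYPE_INT:
                    fields.put(fieldId, in.getInt());
                    break;
                case TYPE_LONG:
                    fields.put(fieldId, in.getLong());
                    break;
                case TYPE_STRING:
                    byte[] utf8 = new byte[in.getInt()];
                    in.get(utf8);
                    fields.put(fieldId, new String(utf8, StandardCharsets.UTF_8));
                    break;
                default:
                    throw new IllegalArgumentException("type not covered: " + type);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<Short, Object> fields = new LinkedHashMap<>();
        fields.put((short) 1, 42);
        fields.put((short) 2, "abc");
        byte[] block = encode(fields);
        System.out.println(block.length + " bytes, decoded: " + decode(block));
    }
}
```

Because the data type is written alongside every value, a reader like `decode` never needs the type definition to walk a record. That is what allows a field defined as numeric to carry a string when the source data is not clean.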