Fixed Encoding
Each MapD datatype takes up a certain amount of space in memory and on disk. The default sizes of datatypes are listed in the following table.
Datatype | Size (bytes) |
---|---|
TEXT ENCODED DICT | 4 |
TEXT ENCODED NONE | Variable (size of the string + 6 bytes) |
TIMESTAMP | 8 |
TIME | 8 |
DATE | 8 |
FLOAT | 4 |
DOUBLE | 8 |
INTEGER | 4 |
SMALLINT | 2 |
BIGINT | 8 |
BOOLEAN | 1 |
DECIMAL/NUMERIC | 8 |
For certain datatypes, you can store values in a more compact representation. The available encodings are listed in the following table.
Encoding | Size (bytes) | Notes |
---|---|---|
TIMESTAMP ENCODING FIXED(32) | 4 | Range: 1901-12-13 20:45:53 - 2038-01-19 03:14:07 |
TIME ENCODING FIXED(32) | 4 | Range: 00:00:00 - 23:59:59 |
DATE ENCODING FIXED(32) | 4 | Range: 1901-12-13 - 2038-01-19 |
TEXT ENCODED DICT(16) | 2 | Max cardinality 64K |
TEXT ENCODED DICT(8) | 1 | Max cardinality 255 |
INTEGER ENCODING FIXED(16) | 2 | Same as SMALLINT |
INTEGER ENCODING FIXED(8) | 1 | Max range -127 to 127 |
SMALLINT ENCODING FIXED(8) | 1 | Max range -127 to 127 |
BIGINT ENCODING FIXED(32) | 4 | Same as INTEGER |
BIGINT ENCODING FIXED(16) | 2 | Same as SMALLINT |
BIGINT ENCODING FIXED(8) | 1 | Max range -127 to 127 |
To use these fixed-length encodings effectively, the range or cardinality of the data must fit within the constraints listed in the table above.
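As a rough illustration of checking a range against these constraints, the following Python sketch picks the narrowest width that can hold an integer column's observed minimum and maximum. The function names are hypothetical and not part of MapD. The table gives the 8-bit range as -127 to 127; the sketch assumes the 16-, 32-, and 64-bit widths follow the same symmetric pattern, which the table does not state explicitly.

```python
# Bytes used by each width; 64 bits stands for an unencoded BIGINT.
WIDTH_BYTES = {8: 1, 16: 2, 32: 4, 64: 8}


def max_magnitude(bits: int) -> int:
    """Largest absolute value assumed to fit in a width of `bits` bits.

    Assumption: every width uses a symmetric range, mirroring the
    -127 to 127 range that the table gives for 8 bits.
    """
    return 2 ** (bits - 1) - 1


def smallest_width(lo: int, hi: int) -> int:
    """Return the narrowest width in bits whose range contains [lo, hi]."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    for bits in (8, 16, 32, 64):
        limit = max_magnitude(bits)
        if -limit <= lo and hi <= limit:
            return bits
    raise ValueError("range does not fit in 64 bits")


print(smallest_width(0, 100))      # 8
print(smallest_width(-128, 127))   # 16
print(smallest_width(0, 70000))    # 32
```

Running a check like this on the real minimum and maximum of a column before choosing an encoding helps avoid picking a width that the data later outgrows.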
The best use for these encodings is on low cardinality TEXT fields where you can achieve large savings, and on TIMESTAMP fields where the timestamps range between 1901-12-13 20:45:53 and 2038-01-19 03:14:07.
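The timestamp bounds quoted above match a signed 32-bit count of seconds from 1970-01-01 00:00:00 UTC, with the most negative value left unused. The Python sketch below works under that assumption, which this page does not state, and checks whether a timezone-aware timestamp falls inside the range. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
LIMIT = 2 ** 31 - 1  # largest signed 32-bit value


def epoch_seconds(ts: datetime) -> int:
    """Whole seconds between the Unix epoch and a timezone-aware timestamp."""
    return (ts - EPOCH) // timedelta(seconds=1)


def fits_fixed32(ts: datetime) -> bool:
    """True if ts lies within the documented 32-bit timestamp range."""
    return -LIMIT <= epoch_seconds(ts) <= LIMIT


# The computed bounds reproduce the range given in the table.
print(EPOCH + timedelta(seconds=-LIMIT))  # 1901-12-13 20:45:53+00:00
print(EPOCH + timedelta(seconds=LIMIT))   # 2038-01-19 03:14:07+00:00

print(fits_fixed32(datetime(2020, 6, 1, tzinfo=timezone.utc)))  # True
print(fits_fixed32(datetime(2040, 1, 1, tzinfo=timezone.utc)))  # False
```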
All options are shown, but many of the INTEGER options overlap.
If a text-encoded field does not match the defined cardinality, MapD substitutes a NULL value and records the change as a log entry.
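Because values beyond the dictionary's capacity end up as NULL, it can be worth counting distinct values before choosing a dictionary size. The Python sketch below is illustrative only: the function name is hypothetical, and it reads the table's "64K" conservatively as 65,535, which is an assumption you can override through the `max_16bit` parameter.

```python
from typing import Iterable


def dict_encoding_bytes(values: Iterable[str], max_16bit: int = 65535) -> int:
    """Return the smallest dictionary code size (in bytes) for a text column.

    Thresholds follow the table: 255 distinct values for 1 byte and
    "64K" for 2 bytes. The exact 64K limit is assumed, not documented.
    """
    distinct = len(set(values))
    if distinct <= 255:
        return 1
    if distinct <= max_16bit:
        return 2
    return 4  # default dictionary encoding


states = ["CA", "NY", "TX", "CA", "NY"]
print(dict_encoding_bytes(states))  # 1

user_ids = [f"user-{i}" for i in range(1000)]
print(dict_encoding_bytes(user_ids))  # 2
```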
Once your schema is well understood, you can achieve significant savings through careful application of these fixed encoding types.
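To get a feel for the scale of these savings, the short Python sketch below multiplies the per-row byte reduction by the row count. The table layout and row count are made up for illustration; the byte sizes come from the two tables above. The estimate covers column data only and ignores other overhead, such as the storage used by the dictionaries themselves.

```python
def bytes_saved(rows: int, columns: list[tuple[int, int]]) -> int:
    """Total bytes saved, given (default_size, encoded_size) per column."""
    return rows * sum(default - encoded for default, encoded in columns)


# Hypothetical table with 100 million rows and three encoded columns.
columns = [
    (4, 1),  # TEXT ENCODED DICT    -> TEXT ENCODED DICT(8)
    (8, 4),  # TIMESTAMP            -> TIMESTAMP ENCODING FIXED(32)
    (8, 4),  # BIGINT               -> BIGINT ENCODING FIXED(32)
]

print(bytes_saved(100_000_000, columns))  # 1100000000
```

In this example, 11 bytes saved per row add up to about 1.1 GB across the table.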