Datatypes and Fixed Encoding
Each MapD datatype uses space in memory and on disk. The default datatype sizes are listed in the following table:
Datatype | Size (bytes) |
---|---|
TEXT ENCODING DICT | 4 |
TEXT ENCODING NONE | Variable (size of the string + 6 bytes) |
TIMESTAMP | 8 |
TIME | 8 |
DATE | 8 |
FLOAT | 4 |
DOUBLE | 8 |
INTEGER | 4 |
SMALLINT | 2 |
BIGINT | 8 |
BOOLEAN | 1 |
DECIMAL/NUMERIC | 8 |
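As a rough illustration of how these sizes add up, the following Python sketch estimates the bytes used by a single row. The size values are copied from the table above; the function name `estimate_row_bytes` and the example row are hypothetical, and the estimate ignores any additional per-table overhead MapD may add.

```python
# Default sizes in bytes, copied from the table above.
DEFAULT_SIZES = {
    "TEXT ENCODING DICT": 4,
    "TIMESTAMP": 8,
    "TIME": 8,
    "DATE": 8,
    "FLOAT": 4,
    "DOUBLE": 8,
    "INTEGER": 4,
    "SMALLINT": 2,
    "BIGINT": 8,
    "BOOLEAN": 1,
    "DECIMAL": 8,
    "NUMERIC": 8,
}

# TEXT ENCODING NONE uses the size of the string plus 6 bytes.
NONE_ENCODED_OVERHEAD = 6


def estimate_row_bytes(columns):
    """Estimate the size of one row.

    `columns` is a list of (datatype, value) pairs. The value is only
    inspected for TEXT ENCODING NONE, where the size depends on the string.
    """
    total = 0
    for datatype, value in columns:
        if datatype == "TEXT ENCODING NONE":
            # Assumption: "size of the string" means its UTF-8 byte length.
            total += len(value.encode("utf-8")) + NONE_ENCODED_OVERHEAD
        else:
            total += DEFAULT_SIZES[datatype]
    return total


# Hypothetical row: an id, an event time, a category, and a free-text note.
row = [
    ("BIGINT", 42),
    ("TIMESTAMP", "2017-06-01 12:00:00"),
    ("TEXT ENCODING DICT", "web"),
    ("TEXT ENCODING NONE", "hello"),
]
print(estimate_row_bytes(row))  # 8 + 8 + 4 + (5 + 6) = 31
```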
For certain datatypes, you can use a more compact representation of these values. The options for these datatypes are listed in the following table:
Encoding | Size (bytes) | Notes |
---|---|---|
TIMESTAMP ENCODING FIXED(32) | 4 | Range: 1901-12-13 20:45:53 to 2038-01-19 03:14:07 |
TIME ENCODING FIXED(32) | 4 | Range: 00:00:00 to 23:59:59 |
DATE ENCODING FIXED(32) | 4 | Range: 1901-12-13 to 2038-01-19 |
TEXT ENCODING DICT(16) | 2 | Max cardinality 64K |
TEXT ENCODING DICT(8) | 1 | Max cardinality 255 |
INTEGER ENCODING FIXED(16) | 2 | Same as SMALLINT |
INTEGER ENCODING FIXED(8) | 1 | Max range -127 to 127 |
SMALLINT ENCODING FIXED(8) | 1 | Max range -127 to 127 |
BIGINT ENCODING FIXED(32) | 4 | Same as INTEGER |
BIGINT ENCODING FIXED(16) | 2 | Same as SMALLINT |
BIGINT ENCODING FIXED(8) | 1 | Max range -127 to 127 |
To use these fixed-length encodings effectively, make sure the range or cardinality of your data fits within the constraints listed in the table.
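The range check for integer columns can be sketched in Python as follows. The 8-bit limit of 127 is taken from the table; the 16-bit and 32-bit limits are assumed to follow the same symmetric pattern and should be verified against your MapD version. The function name `smallest_fixed_width` is hypothetical.

```python
# Inclusive limits per storage width in bits. The 8-bit limit comes from the
# table above; the 16- and 32-bit limits are ASSUMED to follow the same
# symmetric pattern, 2**(bits - 1) - 1.
FIXED_LIMITS = {8: 127, 16: 2**15 - 1, 32: 2**31 - 1}


def smallest_fixed_width(min_value, max_value, widths=(8, 16, 32)):
    """Return the narrowest width in bits that holds [min_value, max_value],
    or None if none of the candidate widths is wide enough."""
    for bits in sorted(widths):
        limit = FIXED_LIMITS[bits]
        if -limit <= min_value and max_value <= limit:
            return bits
    return None


print(smallest_fixed_width(0, 100))
print(smallest_fixed_width(-500, 20000))
print(smallest_fixed_width(0, 3_000_000_000))
```

For a BIGINT column, a result of 16 would suggest BIGINT ENCODING FIXED(16); a result of None means the column should keep its default size.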
The best use for these encodings is on low-cardinality TEXT fields, where you can achieve large savings, and on TIMESTAMP fields where the timestamps range between 1901-12-13 20:45:53 and 2038-01-19 03:14:07.
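A minimal Python sketch of that timestamp check is shown below. It assumes the quoted bounds are in UTC and that the timestamps being checked are timezone-aware; `fits_fixed32` is a hypothetical helper name. The two bounds correspond to plus and minus 2^31 - 1 seconds from 1970-01-01 00:00:00.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Bounds quoted in the text, interpreted as UTC (an assumption).
FIXED32_MIN = datetime(1901, 12, 13, 20, 45, 53, tzinfo=timezone.utc)
FIXED32_MAX = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)


def fits_fixed32(timestamps):
    """Return True if every timezone-aware timestamp lies within the range."""
    return all(FIXED32_MIN <= ts <= FIXED32_MAX for ts in timestamps)


# Hypothetical sample values.
in_range = datetime(2017, 6, 1, 12, 0, tzinfo=timezone.utc)
too_late = datetime(2040, 1, 1, tzinfo=timezone.utc)

print(fits_fixed32([in_range]))            # True
print(fits_fixed32([in_range, too_late]))  # False
```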
The table lists every option, although several of the integer options overlap. For example, INTEGER ENCODING FIXED(16) occupies the same space as SMALLINT.
If the values in a dictionary-encoded text field exceed the defined cardinality, MapD substitutes a NULL value and logs the change.
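To avoid unexpected NULL substitutions, you might count the distinct values in a sample of the data before choosing a dictionary width. The following Python sketch assumes that "64K" means 65,536 distinct values; `smallest_dict_encoding` is a hypothetical helper name. A sample only shows the cardinality seen so far, so leave headroom for values that arrive later.

```python
# Maximum number of distinct values per dictionary width, from the table of
# compact encodings. "64K" is ASSUMED here to mean 65,536.
DICT_MAX_CARDINALITY = {8: 255, 16: 64 * 1024}


def smallest_dict_encoding(values):
    """Return the narrowest dictionary encoding that holds every distinct
    value, falling back to the default 4-byte TEXT ENCODING DICT."""
    distinct = len(set(values))
    for bits in (8, 16):
        if distinct <= DICT_MAX_CARDINALITY[bits]:
            return f"TEXT ENCODING DICT({bits})"
    return "TEXT ENCODING DICT"


# Hypothetical sample columns.
states = ["CA", "NY", "CA", "TX"]
user_ids = [str(i) for i in range(1000)]

print(smallest_dict_encoding(states))
print(smallest_dict_encoding(user_ids))
```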
When your schema is well understood, you can achieve significant savings by carefully applying these fixed encoding types.
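To give a sense of scale, the following Python sketch computes the space saved when three columns of a hypothetical one-billion-row table are switched to compact encodings. The byte sizes come from the two tables; the schema, the row count, and the function name `bytes_saved` are illustrative assumptions, and the space used by the dictionaries themselves is not counted.

```python
# Bytes per value, taken from the two tables above.
DEFAULT_BYTES = {"TIMESTAMP": 8, "TEXT ENCODING DICT": 4, "BIGINT": 8}
COMPACT_BYTES = {
    "TIMESTAMP ENCODING FIXED(32)": 4,
    "TEXT ENCODING DICT(8)": 1,
    "BIGINT ENCODING FIXED(16)": 2,
}

# Hypothetical schema changes: (default type, compact encoding).
changes = [
    ("TIMESTAMP", "TIMESTAMP ENCODING FIXED(32)"),
    ("TEXT ENCODING DICT", "TEXT ENCODING DICT(8)"),
    ("BIGINT", "BIGINT ENCODING FIXED(16)"),
]


def bytes_saved(changes, row_count):
    """Return the total bytes saved across all rows for the given changes."""
    per_row = sum(DEFAULT_BYTES[old] - COMPACT_BYTES[new] for old, new in changes)
    return per_row * row_count


# (4 + 3 + 6) bytes saved per row over one billion rows.
print(bytes_saved(changes, 1_000_000_000))  # 13000000000
```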