Datatypes and Fixed Encoding
Each MapD datatype uses space in memory and on disk. The default datatype sizes are listed in the following table:
Datatype | Size (bytes) | Notes |
---|---|---|
TEXT ENCODING DICT | 4 | Max cardinality 1 billion distinct string values |
TEXT ENCODING NONE | Variable | Size of the string + 6 bytes |
TIMESTAMP | 8 | |
TIME | 8 | |
DATE | 8 | |
FLOAT | 4 | |
DOUBLE | 8 | |
INTEGER | 4 | |
SMALLINT | 2 | |
BIGINT | 8 | |
BOOLEAN | 1 | |
DECIMAL/NUMERIC | 8 | |
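As a rough illustration of how these defaults add up, the following Python sketch estimates the bytes a single row occupies. The schema, column names, and the `value_size`/`row_size` helpers are hypothetical. It assumes the TEXT ENCODING NONE "size of the string" means its UTF-8 byte length, and it ignores the space taken by the string dictionary itself and any other storage overhead.

```python
# Default per-value sizes in bytes, copied from the table above.
DEFAULT_SIZES = {
    "TEXT ENCODING DICT": 4,
    "TIMESTAMP": 8,
    "TIME": 8,
    "DATE": 8,
    "FLOAT": 4,
    "DOUBLE": 8,
    "INTEGER": 4,
    "SMALLINT": 2,
    "BIGINT": 8,
    "BOOLEAN": 1,
    "DECIMAL/NUMERIC": 8,
}


def value_size(datatype: str, value=None) -> int:
    """Approximate bytes used by one value of `datatype`."""
    if datatype == "TEXT ENCODING NONE":
        # Variable length: string size plus 6 bytes.
        # Assumption: "size of the string" means its UTF-8 byte length.
        return len(value.encode("utf-8")) + 6
    return DEFAULT_SIZES[datatype]


def row_size(schema: dict, row: dict) -> int:
    """Sum the per-value sizes for one row of a hypothetical schema."""
    return sum(value_size(dtype, row.get(col)) for col, dtype in schema.items())


# Hypothetical schema and row, for illustration only.
schema = {
    "id": "BIGINT",
    "event_ts": "TIMESTAMP",
    "city": "TEXT ENCODING DICT",
    "note": "TEXT ENCODING NONE",
}
row = {"id": 1, "event_ts": None, "city": "Oslo", "note": "hello"}
print(row_size(schema, row))  # 8 + 8 + 4 + (5 + 6) = 31
```

Only the TEXT ENCODING NONE column depends on the actual value; every other column costs its fixed default size regardless of content.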
For certain datatypes, you can use a more compact representation of these values. The options for these datatypes are listed in the following table:
Encoding | Size (bytes) | Notes |
---|---|---|
TIMESTAMP ENCODING FIXED(32) | 4 | Range: 1901-12-13 20:45:53 to 2038-01-19 03:14:07 |
TIME ENCODING FIXED(32) | 4 | Range: 00:00:00 to 23:59:59 |
DATE ENCODING FIXED(32) | 4 | Range: 1901-12-13 to 2038-01-19 |
TEXT ENCODING DICT(16) | 2 | Max cardinality 64K distinct string values |
TEXT ENCODING DICT(8) | 1 | Max cardinality 255 distinct string values |
INTEGER ENCODING FIXED(16) | 2 | Same as SMALLINT |
INTEGER ENCODING FIXED(8) | 1 | Range: -127 to 127 |
SMALLINT ENCODING FIXED(8) | 1 | Range: -127 to 127 |
BIGINT ENCODING FIXED(32) | 4 | Same as INTEGER |
BIGINT ENCODING FIXED(16) | 2 | Same as SMALLINT |
BIGINT ENCODING FIXED(8) | 1 | Range: -127 to 127 |
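The following Python sketch picks the narrowest fixed encoding whose range covers a column's observed minimum and maximum. The table gives an explicit range only for FIXED(8); the sketch assumes FIXED(16) and FIXED(32) follow the same pattern as FIXED(8) (-127 to 127), excluding the most negative value of each width. `fixed_range` and `smallest_encoding` are hypothetical helpers, not part of MapD.

```python
# Storage width in bits of each base integer type (from the first table).
BASE_BITS = {"SMALLINT": 16, "INTEGER": 32, "BIGINT": 64}


def fixed_range(bits: int) -> tuple:
    """Usable (min, max) for ENCODING FIXED(bits).

    Assumption: as with FIXED(8) (-127 to 127), the most negative value
    of each width is excluded, so the range is symmetric.
    """
    limit = 2 ** (bits - 1) - 1
    return -limit, limit


def smallest_encoding(base_type: str, min_value: int, max_value: int) -> str:
    """Return the narrowest fixed encoding that holds [min_value, max_value]."""
    for bits in (8, 16, 32):
        if bits >= BASE_BITS[base_type]:
            break  # only encodings narrower than the base type apply
        lo, hi = fixed_range(bits)
        if lo <= min_value and max_value <= hi:
            return f"{base_type} ENCODING FIXED({bits})"
    return base_type  # no fixed encoding fits; keep the default size


print(smallest_encoding("BIGINT", 0, 100))      # BIGINT ENCODING FIXED(8)
print(smallest_encoding("BIGINT", 0, 70_000))   # BIGINT ENCODING FIXED(32)
print(smallest_encoding("INTEGER", 0, 40_000))  # INTEGER
```

A value of -128 would push a BIGINT column from FIXED(8) to FIXED(16) under this model, since the 8-bit range stops at -127.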
To use these fixed-length encodings, the range or cardinality of the data must fit within the constraints described above.
These encodings are most effective on low-cardinality TEXT fields, where they can yield large storage savings and faster processing, and on TIMESTAMP fields whose values fall between 1901-12-13 20:45:53 and 2038-01-19 03:14:07.
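The documented TIMESTAMP ENCODING FIXED(32) range matches exactly a signed 32-bit count of seconds since the Unix epoch (UTC), excluding the most negative value. Assuming that interpretation, a Python sketch for checking whether a timestamp can use the compact encoding might look like this. `fits_timestamp_fixed32` is a hypothetical helper, and naive datetimes are treated as UTC.

```python
import math
from datetime import datetime, timezone

# Assumption: FIXED(32) holds whole seconds since 1970-01-01 00:00:00 UTC
# in a signed 32-bit integer, excluding the most negative value.
MIN_EPOCH = -(2**31 - 1)  # 1901-12-13 20:45:53 UTC
MAX_EPOCH = 2**31 - 1     # 2038-01-19 03:14:07 UTC


def fits_timestamp_fixed32(ts: datetime) -> bool:
    """True if `ts` falls inside the TIMESTAMP ENCODING FIXED(32) range."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    # floor() so that fractional seconds just before the lower bound fail
    seconds = math.floor(ts.timestamp())
    return MIN_EPOCH <= seconds <= MAX_EPOCH


print(fits_timestamp_fixed32(datetime(2038, 1, 19, 3, 14, 7)))  # True
print(fits_timestamp_fixed32(datetime(2038, 1, 19, 3, 14, 8)))  # False
```

Under the same assumption, the DATE ENCODING FIXED(32) range (1901-12-13 to 2038-01-19) corresponds to the calendar days covered by these bounds.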
All encoding options are shown, and some of the INTEGER options overlap. For example, INTEGER ENCODING FIXED(8) and SMALLINT ENCODING FIXED(8) are essentially the same: both store each value in 1 byte with a range of -127 to 127.
If a TEXT ENCODING DICT field receives more distinct values than its encoding's maximum cardinality allows, MapD substitutes a NULL value and logs the change.
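To illustrate what exceeding a dictionary's cardinality means, the following Python sketch models a capped string dictionary: each new string receives the next integer id until the capacity is reached, after which new strings become NULL (`None`) and are counted. This is a simplified model, not MapD's implementation, and the order in which real values are assigned ids may differ. It assumes "64K" means 65,535 values (2^16 - 1, by analogy with DICT(8)'s 255) and treats the default 4-byte TEXT ENCODING DICT as a 32-bit dictionary. `encode_column` and `smallest_dict_width` are hypothetical helpers.

```python
# Documented capacities (distinct values) by dictionary width in bits.
# Assumption: "64K" is read as 65,535 (2**16 - 1); 32 is the default
# 4-byte TEXT ENCODING DICT.
DICT_CAPACITY = {8: 255, 16: 65_535, 32: 1_000_000_000}


def smallest_dict_width(distinct_count: int) -> int:
    """Narrowest dictionary width whose capacity covers distinct_count."""
    for bits in (8, 16, 32):
        if distinct_count <= DICT_CAPACITY[bits]:
            return bits
    raise ValueError("too many distinct values for any dictionary encoding")


def encode_column(values, bits: int):
    """Simplified model of a capped dictionary encoding.

    Returns (ids, dictionary, nulled): new strings arriving after the
    dictionary is full are replaced with None and counted in `nulled`.
    """
    capacity = DICT_CAPACITY[bits]
    dictionary = {}
    ids = []
    nulled = 0
    for value in values:
        if value not in dictionary and len(dictionary) < capacity:
            dictionary[value] = len(dictionary)  # assign the next id
        if value in dictionary:
            ids.append(dictionary[value])
        else:
            ids.append(None)  # over capacity: substitute NULL
            nulled += 1
    return ids, dictionary, nulled


values = [f"city_{i}" for i in range(300)]
ids, dictionary, nulled = encode_column(values, bits=8)
print(len(dictionary), nulled)                # 255 45
print(smallest_dict_width(len(set(values))))  # 16
```

Checking the distinct count before choosing a width, as `smallest_dict_width` does, avoids the NULL substitution entirely.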
When you understand your schema and the scope of potential values in each field, you can achieve significant savings by carefully applying these fixed encoding types.
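As a worked example of those savings, the following Python sketch compares default and encoded storage for a hypothetical four-column table. The column names, the chosen encodings, and the 100-million-row count are illustrative assumptions; the byte counts come from the two tables above, and the estimate covers only per-value column storage.

```python
# Hypothetical columns: (name, default type, bytes, fixed encoding, bytes).
# Byte sizes are taken from the two tables above.
COLUMNS = [
    ("event_ts", "TIMESTAMP", 8, "TIMESTAMP ENCODING FIXED(32)", 4),
    ("country", "TEXT ENCODING DICT", 4, "TEXT ENCODING DICT(8)", 1),
    ("status_code", "INTEGER", 4, "INTEGER ENCODING FIXED(16)", 2),
    ("user_id", "BIGINT", 8, "BIGINT ENCODING FIXED(32)", 4),
]


def storage_bytes(rows: int) -> tuple:
    """Return (default_bytes, encoded_bytes) of column storage for `rows` rows."""
    default = rows * sum(col[2] for col in COLUMNS)  # 24 bytes per row
    encoded = rows * sum(col[4] for col in COLUMNS)  # 11 bytes per row
    return default, encoded


default, encoded = storage_bytes(100_000_000)
print(default, encoded, default - encoded)  # 2400000000 1100000000 1300000000
```

Here the encoded layout uses 11 bytes per row instead of 24, roughly 1.3 GB less for 100 million rows, provided each column's data actually fits its encoding's range or cardinality.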