Datatypes and Fixed Encoding

Each MapD datatype uses space in memory and on disk. The default datatype sizes are listed in the following table:

Datatype              Size (bytes)
TEXT ENCODING DICT    4
TEXT ENCODING NONE    Variable (size of the string + 6 bytes)
TIMESTAMP             8
TIME                  8
DATE                  8
FLOAT                 4
DOUBLE                8
INTEGER               4
SMALLINT              2
BIGINT                8
BOOLEAN               1
DECIMAL/NUMERIC       8

For certain datatypes, you can use a more compact representation of these values. The options for these datatypes are listed in the following table:

Encoding                        Size (bytes)   Notes
TIMESTAMP ENCODING FIXED(32)    4              Range: 1901-12-13 20:45:53 - 2038-01-19 03:14:07
TIME ENCODING FIXED(32)         4              Range: 00:00:00 - 23:59:59
DATE ENCODING FIXED(32)         4              Range: 1901-12-13 - 2038-01-19
TEXT ENCODING DICT(16)          2              Max cardinality: 64K
TEXT ENCODING DICT(8)           1              Max cardinality: 255
INTEGER ENCODING FIXED(16)      2              Same as SMALLINT
INTEGER ENCODING FIXED(8)       1              Range: -127 to 127
SMALLINT ENCODING FIXED(8)      1              Range: -127 to 127
BIGINT ENCODING FIXED(32)       4              Same as INTEGER
BIGINT ENCODING FIXED(16)       2              Same as SMALLINT
BIGINT ENCODING FIXED(8)        1              Range: -127 to 127

To effectively use these fixed length fields, the range or cardinality of the data must fit into the constraints as described.
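As a rough illustration of checking that constraint for integer columns, the sketch below picks the narrowest fixed width that holds a set of values. The helper names are hypothetical. It assumes that the lowest value of each width is reserved, which matches the -127 to 127 range listed for 8 bits. The 16-bit and 32-bit bounds are extrapolated from that and are not stated in the table.

```python
def fixed_range(bits):
    """Usable (low, high) range for a signed fixed-width encoding.

    Assumption: the lowest representable value is reserved, which matches
    the -127 to 127 range the table lists for FIXED(8).
    """
    high = 2 ** (bits - 1) - 1
    return -high, high


def smallest_fixed_width(values, widths=(8, 16, 32)):
    """Return the narrowest width in `widths` that holds every value, or None."""
    low, high = min(values), max(values)
    for bits in widths:
        range_low, range_high = fixed_range(bits)
        if range_low <= low and high <= range_high:
            return bits
    return None  # no listed fixed width fits; keep the default datatype
```

For example, `smallest_fixed_width([0, 100])` returns 8, while a column containing 40000 needs 32 bits under these assumptions.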

These encodings are most useful on low-cardinality TEXT fields, where the savings can be large, and on TIMESTAMP fields whose values all fall between 1901-12-13 20:45:53 and 2038-01-19 03:14:07.
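The timestamp bounds quoted above are consistent with storing seconds since the Unix epoch in a signed 32-bit integer with the lowest value reserved. That storage model is an assumption made here for illustration, not something this page states. Under it, the bounds can be reproduced and checked as follows (`fits_fixed32` and `fmt` are hypothetical helpers):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2 ** 31 - 1  # largest signed 32-bit value

# Assumption: values are seconds since the epoch, and the lowest 32-bit
# value is reserved, so the usable range is symmetric around the epoch.
FIXED32_MIN = EPOCH - timedelta(seconds=INT32_MAX)
FIXED32_MAX = EPOCH + timedelta(seconds=INT32_MAX)


def fmt(ts):
    """Format a datetime the way the encoding table writes its ranges."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")


def fits_fixed32(ts):
    """True if a timezone-aware datetime falls inside the assumed range."""
    return FIXED32_MIN <= ts <= FIXED32_MAX
```

Here `fmt(FIXED32_MIN)` and `fmt(FIXED32_MAX)` give the two endpoints listed in the table, and `fits_fixed32` can be run over a sample of the data before choosing the encoding.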

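The table lists every option, but several of the integer encodings overlap: for example, INTEGER ENCODING FIXED(16) and BIGINT ENCODING FIXED(16) both take the same 2 bytes as SMALLINT.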

If the number of distinct values in a dictionary-encoded TEXT field exceeds the defined cardinality, MapD substitutes a NULL value and logs the change.
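To avoid those NULL substitutions, it can help to count distinct values in a representative sample before choosing a dictionary width. The sketch below does that with a hypothetical helper. The 255 limit is taken from the table. Reading "64K" as 64 * 1024 is an assumption, and the exact limit may differ slightly.

```python
DICT8_MAX = 255          # from the table: max cardinality for DICT(8)
DICT16_MAX = 64 * 1024   # assumption: "64K" read as 65536


def dict_encoding_for(values):
    """Suggest the narrowest TEXT dictionary encoding for a sample of strings."""
    distinct = len(set(values))
    if distinct <= DICT8_MAX:
        return "TEXT ENCODING DICT(8)"
    if distinct <= DICT16_MAX:
        return "TEXT ENCODING DICT(16)"
    return "TEXT ENCODING DICT"  # default 4-byte dictionary encoding
```

Because a sample can undercount the distinct values in the full dataset, it is safer to leave headroom than to pick a width the sample only just fits.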

When your schema is well understood, you can achieve significant savings by carefully applying these fixed encoding types.
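To get a feel for the size of those savings, the sketch below adds up per-row bytes using the sizes from the two tables above. The three-column schema is hypothetical, and the estimate ignores any storage overhead beyond the listed column sizes.

```python
# Per-value sizes in bytes, copied from the tables above.
SIZES = {
    "TIMESTAMP": 8,
    "TEXT ENCODING DICT": 4,
    "BIGINT": 8,
    "TIMESTAMP ENCODING FIXED(32)": 4,
    "TEXT ENCODING DICT(8)": 1,
    "BIGINT ENCODING FIXED(32)": 4,
}


def row_bytes(column_types):
    """Sum the listed size of every column in one row."""
    return sum(SIZES[column_type] for column_type in column_types)


# Hypothetical schema: an event time, a status label, and a counter.
default_schema = ["TIMESTAMP", "TEXT ENCODING DICT", "BIGINT"]
compact_schema = [
    "TIMESTAMP ENCODING FIXED(32)",
    "TEXT ENCODING DICT(8)",
    "BIGINT ENCODING FIXED(32)",
]

saved_per_row = row_bytes(default_schema) - row_bytes(compact_schema)
saved_total = saved_per_row * 1_000_000_000  # for one billion rows
```

In this example the row shrinks from 20 bytes to 9, so one billion rows take about 11 GB less, provided every value fits the ranges and cardinalities listed for the compact encodings.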