Datatypes and Fixed Encoding

Each MapD datatype uses space in memory and on disk. The default datatype sizes are listed in the following table:

Datatype             Size (bytes)   Notes
TEXT ENCODING DICT   4              Max cardinality 1 billion distinct string values
TEXT ENCODING NONE   Variable       Size of the string + 6 bytes
TIMESTAMP            8
TIME                 8
DATE                 8
FLOAT                4
DOUBLE               8
INTEGER              4
SMALLINT             2
BIGINT               8
BOOLEAN              1
DECIMAL/NUMERIC      8

For certain datatypes, you can use a more compact representation of these values. The options for these datatypes are listed in the following table:

Encoding                       Size (bytes)   Notes
TIMESTAMP ENCODING FIXED(32)   4              Range: 1901-12-13 20:45:53 to 2038-01-19 03:14:07
TIME ENCODING FIXED(32)        4              Range: 00:00:00 to 23:59:59
DATE ENCODING FIXED(32)        4              Range: 1901-12-13 to 2038-01-19
TEXT ENCODING DICT(16)         2              Max cardinality 64K distinct string values
TEXT ENCODING DICT(8)          1              Max cardinality 255 distinct string values
INTEGER ENCODING FIXED(16)     2              Same as SMALLINT
INTEGER ENCODING FIXED(8)      1              Range: -127 to 127
SMALLINT ENCODING FIXED(8)     1              Range: -127 to 127
BIGINT ENCODING FIXED(32)      4              Same as INTEGER
BIGINT ENCODING FIXED(16)      2              Same as SMALLINT
BIGINT ENCODING FIXED(8)       1              Range: -127 to 127

To use these fixed-length encodings, make sure the range or cardinality of your data fits within the constraints listed in the table.

These encodings are most effective on low-cardinality TEXT fields, where you can achieve large savings of storage space and improved processing speed, and on TIMESTAMP fields where the timestamps range between 1901-12-13 20:45:53 and 2038-01-19 03:14:07.

The preceding table shows all encoding options. Some of the INTEGER options overlap. For example, INTEGER ENCODING FIXED(8) and SMALLINT ENCODING FIXED(8) are essentially the same.

If the values in a TEXT ENCODING DICT field exceed the defined cardinality, MapD substitutes a NULL value and logs the change.

When you understand your schema and the scope of potential values in each field, you can achieve significant savings by carefully applying these fixed encoding types.
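
As a rough illustration of the savings, the sketch below declares a hypothetical table that uses several of the compact encodings listed earlier in this section. The table name sensor_readings and its column names are assumptions made up for this example; only the datatype and encoding keywords come from the tables above. The `--` comments are standard SQL annotations for the reader and can be removed if your client does not accept them.

```sql
-- Hypothetical table; names are illustrative, not part of any MapD sample schema.
CREATE TABLE sensor_readings (
  reading_time  TIMESTAMP ENCODING FIXED(32),  -- 4 bytes instead of 8
  station_code  TEXT ENCODING DICT(16),        -- 2 bytes instead of 4; up to 64K distinct values
  status        TEXT ENCODING DICT(8),         -- 1 byte instead of 4; up to 255 distinct values
  temperature_c SMALLINT ENCODING FIXED(8),    -- 1 byte instead of 2; values from -127 to 127
  sample_count  BIGINT ENCODING FIXED(32)      -- 4 bytes instead of 8; same range as INTEGER
);
```

With the default sizes, these five columns take 8 + 4 + 4 + 2 + 8 = 26 bytes per row. With the encodings shown, they take 4 + 2 + 1 + 1 + 4 = 12 bytes per row, which is less than half, provided every value fits the listed range or cardinality.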

Shared Dictionaries

You can improve performance of string operations and optimize storage using shared dictionaries. You can share dictionaries within a table or between different tables in the same database. The table whose dictionary you want to share must already exist when you create the table that references its TEXT ENCODING DICT field.

For example, this DDL is a portion of the schema for the flights database. Because airports are both origin and destination locations, it makes sense to reuse the same dictionaries for name, city, state, and country values.

create table flights (
*
*
*

dest_name TEXT ENCODING DICT,
dest_city TEXT ENCODING DICT,
dest_state TEXT ENCODING DICT,
dest_country TEXT ENCODING DICT,

*
*
*

SHARED DICTIONARY (origin_name) REFERENCES flights(dest_name),
SHARED DICTIONARY (origin_city) REFERENCES flights(dest_city),
SHARED DICTIONARY (origin_state) REFERENCES flights(dest_state),
SHARED DICTIONARY (origin_country) REFERENCES flights(dest_country),

*
*
*
)
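
The fragment above omits most of the columns. A complete, minimal version of the same pattern might look like the sketch below. The table name flights_min is hypothetical, and declaring the origin columns as TEXT ENCODING DICT is an assumption, because the original fragment does not show how they are declared.

```sql
-- Hypothetical reduced table; only the city and state columns are kept for brevity.
CREATE TABLE flights_min (
  dest_city    TEXT ENCODING DICT,
  dest_state   TEXT ENCODING DICT,
  -- Assumption: the sharing columns are declared like the columns they reference.
  origin_city  TEXT ENCODING DICT,
  origin_state TEXT ENCODING DICT,
  -- Each origin column reuses the dictionary of the matching destination column.
  SHARED DICTIONARY (origin_city) REFERENCES flights_min(dest_city),
  SHARED DICTIONARY (origin_state) REFERENCES flights_min(dest_state)
);
```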

To share a dictionary from a different existing table, replace the table name in the REFERENCES clause. For example, if you have an existing table called us_geography, you can share its dictionaries by following the pattern in the DDL fragment below.

create table flights (

*
*
*

SHARED DICTIONARY (origin_city) REFERENCES us_geography(city),
SHARED DICTIONARY (origin_state) REFERENCES us_geography(state),
SHARED DICTIONARY (origin_country) REFERENCES us_geography(country),
SHARED DICTIONARY (dest_city) REFERENCES us_geography(city),
SHARED DICTIONARY (dest_state) REFERENCES us_geography(state),
SHARED DICTIONARY (dest_country) REFERENCES us_geography(country),

*
*
*
)
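
Because the referenced table must already exist, the order of the statements matters. The sketch below creates a hypothetical us_geography table first, then a reduced flights table that shares its dictionaries. The column list of us_geography is an assumption based on the column names used in the REFERENCES clauses above, and the table name flights_geo is also hypothetical.

```sql
-- Step 1: the table that owns the dictionaries must be created first.
CREATE TABLE us_geography (
  city    TEXT ENCODING DICT,
  state   TEXT ENCODING DICT,
  country TEXT ENCODING DICT
);

-- Step 2: the referencing table can now share those dictionaries.
CREATE TABLE flights_geo (
  origin_city TEXT ENCODING DICT,
  dest_city   TEXT ENCODING DICT,
  SHARED DICTIONARY (origin_city) REFERENCES us_geography(city),
  SHARED DICTIONARY (dest_city) REFERENCES us_geography(city)
);
```

Here both origin_city and dest_city draw on the single city dictionary of us_geography, so a city name that appears in either column is stored only once.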