Tables¶

`CREATE TABLE`¶

CREATE TABLE [IF NOT EXISTS] <table>
  (<column> <type> [NOT NULL] [ENCODING <encodingSpec>],
  [SHARD KEY (<column>)],
  [SHARED DICTIONARY (<column>) REFERENCES <table>(<column>)], ...)
  [WITH (<property> = value, ...)];

Create a table named <table> specifying <columns> and table properties.

`<type>`¶

Supported types:

BOOLEAN

SMALLINT

INT [ EGER ]

BIGINT

NUMERIC | DECIMAL (PRECISION[,SCALE]) max precision 19

FLOAT | REAL

DOUBLE [ PRECISION ]

[VAR ] CHAR(length)

TEXT

TIME

TIMESTAMP

DATE

Example¶

Create a table named tweets and specify the columns, including type, in the table.

CREATE TABLE IF NOT EXISTS tweets (
  tweet_id BIGINT NOT NULL,
  tweet_time TIMESTAMP NOT NULL ENCODING FIXED(32),
  lat REAL,
  lon REAL,
  sender_id BIGINT NOT NULL,
  sender_name TEXT NOT NULL ENCODING DICT,
  location TEXT ENCODING DICT,
  source TEXT ENCODING DICT,
  reply_to_user_id BIGINT,
  reply_to_tweet_id BIGINT,
  lang TEXT ENCODING DICT,
  followers INT,
  followees INT,
  tweet_count INT,
  join_time TIMESTAMP ENCODING FIXED(32),
  tweet_text TEXT,
  state TEXT ENCODING DICT,
  county TEXT ENCODING DICT,
  place_name TEXT,
  state_abbr TEXT ENCODING DICT,
  county_state TEXT ENCODING DICT,
  origin TEXT ENCODING DICT);

`<encodingSpec>`¶

Supported encoding:

Encoding	Descriptions
`DICT`	Dictionary encoding on string columns (default for `TEXT` columns).
`NONE`	No encoding. Valid only on `TEXT` columns. No Dictionary is created. Aggregate operations are not possible on this column type.
`FIXED` (bits)	Fixed length encoding of integer or timestamp columns. See Datatypes and Fixed Encoding.

`WITH` Clause `<property>`¶

WITH clause properties:

Property	Description
`fragment_size`	Number of rows per fragment that is a unit of the table for query processing. Default = 32 million rows, which is not expected to be changed.
`max_rows`	Set the maximum number of rows allowed in a table to create a capped collection. When this limit is reached, the oldest fragment is removed. Default = 2^62.
`page_size`	Number of I/O page bytes. Default = 1MB, which does not need to be changed.
`partitions`	Partition strategy option: `SHARDED`: Partition table using sharding. `REPLICATED`: Partition table using replication.
`shard_count`	Number of shards to create, typically equal to the number of GPUs across which the data table is distributed.

Sharding¶

Sharding partitions a database table across multiple servers so each server has a part of the table with the same columns but with different rows. Partitioning is based on a sharding key defined when you create the table.

Without sharding, the dimension tables involved in a join are replicated and sent to each GPU, which is not feasible for dimension tables with many rows. Specifying a shard key makes it possible for the query to execute efficiently on large dimension tables.

Currently, specifying a shard key is useful for joins, only:

If two tables specify a shard key with the same type and the same number of shards, a join on that key only sends a part of the dimension table column data to each GPU.

For multi-node installs, the dimension table does not need to be replicated and the join executes locally on each leaf.

Constraints¶

A shard key must specify a single column to shard on. There is no support for sharding by a combination of keys.
One shard key can be specified for a table.
Data are partitioned according to the shard key and the number of shards (shard_count).
A value in the column specified as a shard key is always sent to the same partition.
The number of shards should be equal to the number of GPUs in the cluster.
Only integers and dictionary-encoded columns can be shard keys.
Tables must share the dictionary for the column to be involved in sharded joins. If the dictionary is not specified as shared, the join does not take advantage of sharding. Dictionaries are reference-counted and only dropped when the last reference drops.

Recommendations¶

Set shard_count to the number of GPUs you eventually want to distribute the data table across.
Referenced tables must also be shard_count -aligned.
Sharding should be minimized because it can introduce load skew accross resources, compared to when sharding is not used.

Examples¶

Basic sharding:

CREATE TABLE customers(
  accountId text,
  name text,
  SHARD KEY (accountId))
  WITH (shard_count = 4);

Sharding with shared dictionary:

CREATE TABLE transactions(
  accountId text,
  action text,
  SHARD KEY (accountId),
  SHARED DICTIONARY (accountId) REFERENCES customers(accountId))
  WITH (shard_count = 4);

`CREATE TABLE AS SELECT`¶

CREATE TABLE <newTableName> AS (<SELECT statement>);

Create a table with the specified columns, copying any data that meet SELECT statement criteria.

Note

MapD Core Database does not support CREATE TABLE AS SELECT in a distributed cluster.

Note

MapD Core Database does not support CREATE TABLE AS SELECT for columns with the following data types:

Non-encoded strings
Columns that contain array values
Fixed-encoding integers (see Datatypes and Fixed Encoding)

Examples

Create a duplicate table named newTable from the existing oldTable.

CREATE TABLE newTable AS (SELECT * FROM oldTable);

Create a table named trousers, copying the data in the columns name, waist, and inseam from the existing table wardrobe.

CREATE TABLE trousers AS (SELECT name, waist, inseam FROM wardrobe);

Create a table named options, copying the data in the columns star and planet from the table cosmos for planets whose class is M.

CREATE TABLE options AS (SELECT star, planet FROM cosmos WHERE class='M');

`ALTER TABLE`¶

ALTER TABLE <table> RENAME TO <table>;
ALTER TABLE <table> RENAME COLUMN <column> TO <column>;

Examples

ALTER TABLE tweets RENAME TO retweets;

ALTER TABLE retweets RENAME COLUMN source TO device;

`DROP TABLE`¶

DROP TABLE [IF EXISTS] <table>;

Example

DROP TABLE IF EXISTS tweets;

`TRUNCATE TABLE`¶

TRUNCATE TABLE <table>;

Use the TRUNCATE TABLE statement to remove all rows from a table without deleting the table structure.

This releases table on-disk and memory storage and removes dictionary content unless it is a shared dictionary.

Removing rows is more efficient than using DROP TABLE. Dropping followed by recreating the table invalidates dependent objects of the table requiring you to regrant object privileges. Truncating has none of these effects.

Example¶

TRUNCATE TABLE tweets;

Tables¶

CREATE TABLE¶

<type>¶

Example¶

<encodingSpec>¶

WITH Clause <property>¶

Sharding¶

Constraints¶

Recommendations¶

Examples¶

CREATE TABLE AS SELECT¶

ALTER TABLE¶

DROP TABLE¶

TRUNCATE TABLE¶

Example¶

`CREATE TABLE`¶

`<type>`¶

`<encodingSpec>`¶

`WITH` Clause `<property>`¶

`CREATE TABLE AS SELECT`¶

`ALTER TABLE`¶

`DROP TABLE`¶

`TRUNCATE TABLE`¶