Configuration

MapD Core Database has minimal configuration requirements with a number of additional configuration options. This topic describes the required and optional configuration changes you can use in your MapD Core Database instance.

Data Directory

Before you start MapD Core Database, you must initialize the persistent data directory. First, decide on a path for the directory, such as /var/lib/mapd, and store it in the environment variable $MAPD_STORAGE:

export MAPD_STORAGE=/var/lib/mapd

Create the directory and change its owner to the user that the server will run as ($MAPD_USER):

sudo mkdir -p $MAPD_STORAGE
sudo chown -R $MAPD_USER $MAPD_STORAGE

Here, $MAPD_USER is the system user account that the server runs as, such as mapd, and $MAPD_STORAGE is the path to the parent of the MapD Core Database data directory.

Finally, run $MAPD_PATH/bin/initdb with the data directory path as the argument:

$MAPD_PATH/bin/initdb $MAPD_STORAGE
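
Because the steps above call for an empty directory, it can help to check that precondition before running initdb. The following is a minimal sketch, not part of MapD: storage_dir_ready is a hypothetical helper name, and treating a non-empty directory as an error is an assumption drawn from the "empty directory" wording above.

```shell
# Hypothetical helper (not a MapD tool): succeeds only if the given
# directory exists and contains no entries.
storage_dir_ready() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # ls -A lists every entry except . and .. ; any output means "not empty".
    if [ -n "$(ls -A "$dir")" ]; then
        echo "directory is not empty: $dir" >&2
        return 1
    fi
    return 0
}

# Possible usage with the variables described above:
#   storage_dir_ready "$MAPD_STORAGE" && "$MAPD_PATH/bin/initdb" "$MAPD_STORAGE"
```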

Configuration File

MapD Core supports storing options in a configuration file. This is useful if, for example, you need to run the MapD Core Server and Web Server on different ports than their defaults.

If you store a copy of mapd.conf in the $MAPD_STORAGE directory, the configuration settings are picked up automatically by the sudo systemctl start mapd_server and sudo systemctl start mapd_web_server commands.

Set the flags in the configuration file using the format <flag> = <value>. Strings must be enclosed in quotes. The following is a sample configuration file. The entry for data path is a string and must be in quotes. The entry for the optional read-only flag is the boolean value true and is not in quotes.

port = 9091
http-port = 9090
data = "/var/lib/mapd/data"
read-only = true


[web]
port = 9092
frontend = "/home/osboxes/installs/mapd-3.0.0-20170502-9e5ba95-Linux-x86_64-render/frontend"

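As a rough illustration of the <flag> = <value> format, the sketch below writes a small mapd.conf from a shell function. The helper name write_mapd_conf and its arguments are hypothetical. The flag names and the [web] section come from the sample above. Note that the string value for data is quoted and the port numbers are not.

```shell
# Hypothetical helper (not a MapD tool).
# Usage: write_mapd_conf DIR PORT HTTP_PORT
# Writes DIR/mapd.conf using the <flag> = <value> format.
write_mapd_conf() {
    conf_dir="$1"
    port="$2"
    http_port="$3"
    cat > "$conf_dir/mapd.conf" <<EOF
port = $port
http-port = $http_port
data = "$conf_dir/data"

[web]
port = 9092
EOF
}

# Possible usage, assuming $MAPD_STORAGE is set as described earlier:
#   write_mapd_conf "$MAPD_STORAGE" 9091 9090
```
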
Configuration Flags for MapD Server

Flag | Description | Default | Why Change It?
cpu | Set this flag on a GPU installation to instruct it to use only CPUs. (You do not have to explicitly set this flag for a CPU-only installation.) | FALSE | One use case for disabling GPUs is during database conversion, which requires moving a large amount of data with minimal processing.
gpu | Run on GPUs and CPUs. | TRUE | Default.
read-only | Enable read-only mode. | FALSE | Prevents inadvertent (or nefarious) changes to the dataset.
port | Port number. | 9091 | Change the port number if it collides with another service on the host. Ideally, your host only runs MapD services.
ldap-uri | LDAP server URI. | N/A | N/A
ldap-ou-dc | LDAP Organizational Unit and Domain Component. | ou=users,dc=mapd,dc=com | N/A
http-port | HTTP port number. | 9090 | Change the port number if it collides with another service on the host. Ideally, your host only runs MapD services.
flush-log | Force aggressive log file flushes. | FALSE | When you set this flag, the system writes messages to disk as they are generated, rather than holding them until a particular threshold is reached.
num-gpus | Number of GPUs to use. | -1 | In a shared environment, you can assign the number of GPUs to a particular application. The default is -1, which means use all available GPUs.
start-gpu | First GPU to use. | 0 | In a shared environment, if you want to reserve a set number of GPUs for a particular process, you can configure another process to use GPUs starting at a higher device ID.
cluster | Indicates that the MapD Core Database instance is an aggregator node, and where to find the rest of its cluster. | $MAPD_STORAGE | This setting is not likely to change in a production environment.
string-servers | Path to string servers list JSON file. | $MAPD_STORAGE | This setting is not likely to change in a production environment.

Configuration Flags for MapD Web Server

Flag | Description | Default | Why Change It?
-b, --backend-url | URL to http-port on mapd_server. | http://localhost:9090 | Change to avoid collisions with other services.
--cert | Certificate file for HTTPS. | cert.pem | Change for testing and debugging.
-c, --config | Path to MapD configuration file. | | Change for testing and debugging.
-d, --data | Path to MapD data directory. | data | Change for testing and debugging.
--docs | Path to documentation directory. | docs |
--enable-https | Enable HTTPS support. | | Change to enable secure HTTP.
-f, --frontend | Path to frontend directory. | frontend |
--key | Key file for HTTPS. | key.pem | Change for testing and debugging.
-p, --port | Frontend server port. | 9092 | Change to avoid collisions with other services.
-r, --read-only | Enable read-only mode. | | Prevent inadvertent (or nefarious) changes to the data.
--servers-json | Path to servers.json. | | Change for testing and debugging.
--timeout | Maximum request duration in #h#m#s format. | 1h0m0s | The --timeout option controls the maximum duration of individual HTTP requests. This is used to manage resource exhaustion caused by improperly closed connections. One side effect of this option is that it limits the execution time of queries made over the Thrift HTTP transport. Increase this timeout if you expect queries to take longer than the default duration of one hour: for example, if you perform a COPY FROM on a large file when using mapdql with the HTTP transport.
--tmpdir | Path for temporary file storage. | /tmp | The temporary directory is used as a staging location for file uploads. It is sometimes desirable to place this directory on the same file system as the MapD Core data directory. If not specified on the command line, mapd_web_server also respects the standard TMPDIR environment variable as well as a specific MAPD_TMPDIR environment variable, the latter of which takes precedence. If you use neither the command line argument nor one of the environment variables, the default, /tmp/, is used.
-v, --verbose | Print all log messages to stdout. | | Change for testing and debugging.
--version | Return version. | |
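
The precedence described for --tmpdir (command line first, then MAPD_TMPDIR, then TMPDIR, then the default) can be sketched as a small shell function. This only illustrates the lookup order and is not how mapd_web_server is implemented. The function name resolve_tmpdir is hypothetical, and treating an empty variable the same as an unset one is an assumption.

```shell
# Hypothetical illustration of the --tmpdir lookup order.
# Usage: resolve_tmpdir [COMMAND_LINE_VALUE]
resolve_tmpdir() {
    cli_value="${1:-}"
    if [ -n "$cli_value" ]; then
        echo "$cli_value"            # 1. value given with --tmpdir
    elif [ -n "${MAPD_TMPDIR:-}" ]; then
        echo "$MAPD_TMPDIR"          # 2. MapD-specific variable
    elif [ -n "${TMPDIR:-}" ]; then
        echo "$TMPDIR"               # 3. standard variable
    else
        echo "/tmp"                  # 4. documented default
    fi
}

# Possible usage:
#   echo "uploads would be staged in: $(resolve_tmpdir)"
```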

Using Configuration Flags on the Command Line

To use options provided in a configuration file, pass the path of the configuration file to mapd_server and mapd_web_server with the --config flag. For example:

$MAPD_PATH/bin/mapd_server --config $MAPD_STORAGE/mapd.conf

You also have the option of specifying configuration settings at the command line. MapD recommends that you use the systemctl command to start and stop the servers, but you can use the mapd_server and mapd_web_server commands with configuration flags for testing and debugging.
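
For a testing or debugging session, a command line that overrides a few settings might be assembled as in the sketch below. The function only prints the command so that it can be inspected before it is run. The name mapd_server_cmd and the fallback /opt/mapd are hypothetical placeholders, and the flags used are the ones listed for mapd_server in this topic.

```shell
# Hypothetical helper: print (do not run) a mapd_server command line.
# Usage: mapd_server_cmd [PORT] [HTTP_PORT]
# /opt/mapd is only a placeholder used when MAPD_PATH is unset.
mapd_server_cmd() {
    echo "${MAPD_PATH:-/opt/mapd}/bin/mapd_server" \
        --data "${MAPD_STORAGE:-/var/lib/mapd}/data" \
        --port "${1:-9091}" \
        --http-port "${2:-9090}" \
        --read-only
}

# Possible usage: review the command, then run it by hand.
#   mapd_server_cmd 9191 9190
```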

Command Line Configuration Flags for mapd_server

Flag | Description | Default | Why Change It?
--config arg | Path to mapd.conf. | none | One use case might be to temporarily set a different configuration file during testing and troubleshooting.
--data arg | Directory path to MapD catalogs. | $PWD/data | You can set the path anywhere you choose.
--cpu | Set this flag on a GPU installation to instruct it to use only CPUs. (You do not have to explicitly set this flag for a CPU-only installation.) | FALSE | One use case for disabling GPUs is during database conversion, which requires moving a large amount of data with minimal processing.
--gpu | Run on GPUs and CPUs. | TRUE | Default.
--read-only | Enable read-only mode. | FALSE | Prevents inadvertent (or nefarious) changes to the dataset.
-p [ --port ] arg | Port number. | 9091 | Change the port number if it collides with another service on the host. Ideally, your host only runs MapD services.
--ldap-uri arg | LDAP server URI. | N/A | N/A
--ldap-ou-dc arg | LDAP Organizational Unit and Domain Component. | ou=users,dc=mapd,dc=com | N/A
--http-port arg | HTTP port number. | 9090 | Change the port number if it collides with another service on the host. Ideally, your host only runs MapD services.
--flush-log | Force aggressive log file flushes. | FALSE | When you set this flag, the system writes messages to disk as they are generated, rather than holding them until a particular threshold is reached.
--num-gpus arg | Number of GPUs to use. | -1 | In a shared environment, you can assign the number of GPUs to a particular application. The default is -1, which means use all available GPUs.
--start-gpu arg | First GPU to use. | 0 | In a shared environment, if you want to reserve a set number of GPUs for a particular process, you can configure another process to use GPUs starting at a higher device ID.
-v [ --version ] | Print release version number. | N/A | N/A
--cluster arg | Indicates that the MapD Core Database instance is an aggregator node, and where to find the rest of its cluster. | $MAPD_STORAGE | This setting is not likely to change in a production environment.
--string-servers arg | Path to string servers list JSON file. | $MAPD_STORAGE | This setting is not likely to change in a production environment.
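
To make the --num-gpus and --start-gpu descriptions concrete, the sketch below computes non-overlapping GPU ranges for several processes that share one host. It only prints flag pairs. The helper name gpu_flags is hypothetical, and each process would presumably also need its own port, HTTP port, and data directory, which this sketch does not cover.

```shell
# Hypothetical helper: split TOTAL GPUs into consecutive blocks of PER GPUs
# and print one "--start-gpu N --num-gpus M" pair per block.
# Usage: gpu_flags TOTAL PER
gpu_flags() {
    total="$1"
    per="$2"
    # Guard against a zero or negative block size, which would never terminate.
    [ "$per" -gt 0 ] || return 1
    start=0
    while [ $((start + per)) -le "$total" ]; do
        echo "--start-gpu $start --num-gpus $per"
        start=$((start + per))
    done
}

gpu_flags 4 2
# prints:
#   --start-gpu 0 --num-gpus 2
#   --start-gpu 2 --num-gpus 2
```

GPUs left over after the last full block (for example, the third GPU in gpu_flags 3 2) are not assigned to any process.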

Command Line Configuration Flags for mapd_web_server

Flag | Description | Default | Why Change It?
-b, --backend-url string | URL to http-port on mapd_server. | http://localhost:9090 | Change to avoid collisions with other services.
--cert string | Certificate file for HTTPS. | cert.pem | Change for testing and debugging.
-c, --config string | Path to MapD configuration file. | | Change for testing and debugging.
-d, --data string | Path to MapD data directory. | data | Change for testing and debugging.
--docs string | Path to documentation directory. | docs |
--enable-https | Enable HTTPS support. | | Change to enable secure HTTP.
-f, --frontend string | Path to frontend directory. | frontend |
--key string | Key file for HTTPS. | key.pem | Change for testing and debugging.
-p, --port int | Frontend server port. | 9092 | Change to avoid collisions with other services.
-r, --read-only | Enable read-only mode. | | Prevent inadvertent (or nefarious) changes to the data.
--servers-json string | Path to servers.json. | | Change for testing and debugging.
--timeout duration | Maximum request duration in #h#m#s format. For example, 0h30m0s represents a duration of 30 minutes. | 1h0m0s | The --timeout option controls the maximum duration of individual HTTP requests. This is used to manage resource exhaustion caused by improperly closed connections. One side effect of this option is that it limits the execution time of queries made over the Thrift HTTP transport. Increase this timeout if you expect queries to take longer than the default duration of one hour: for example, if you perform a COPY FROM on a large file when using mapdql with the HTTP transport.
--tmpdir string | Path for temporary file storage. | /tmp | The temporary directory is used as a staging location for file uploads. It is sometimes desirable to place this directory on the same file system as the MapD Core data directory. If not specified on the command line, mapd_web_server also respects the standard TMPDIR environment variable as well as a specific MAPD_TMPDIR environment variable, the latter of which takes precedence. If you use neither the command line argument nor one of the environment variables, the default, /tmp/, is used.
-v, --verbose | Print all log messages to stdout. | | Change for testing and debugging.
--version | Return version. | |
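
When you raise --timeout for long-running queries, it is easy to get the #h#m#s form wrong by hand. The sketch below converts a number of seconds into that form. The function name seconds_to_timeout is hypothetical, and the output format is inferred from the examples in this topic (0h30m0s, 1h0m0s).

```shell
# Hypothetical helper: convert whole seconds to the #h#m#s form.
# Usage: seconds_to_timeout SECONDS
seconds_to_timeout() {
    total="$1"
    hours=$((total / 3600))
    minutes=$(((total % 3600) / 60))
    seconds=$((total % 60))
    echo "${hours}h${minutes}m${seconds}s"
}

seconds_to_timeout 1800   # prints 0h30m0s
seconds_to_timeout 3600   # prints 1h0m0s (the default)
seconds_to_timeout 9000   # prints 2h30m0s

# Possible usage on the command line:
#   $MAPD_PATH/bin/mapd_web_server --timeout "$(seconds_to_timeout 9000)"
```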