Optimizing Performance

Use the following guidelines to get the best performance from your MapD Core system.

Hardware

  • Even though MapD is an “in-memory” database, it must read data from disk when it first starts, and a large database can take a long time to load from a slow hard disk. Import and query execution performance also depend on fast disks. As reasonable starting hardware, MapD recommends fast SSD drives on a good hardware controller in a RAID 10 configuration. If you run on virtual machines from a cloud provider such as Amazon Web Services, MapD recommends Provisioned IOPS SSD disks in a RAID configuration for storage.
  • Do not run unnecessary daemons. Ideally, only MapD services would run on your MapD server.
  • For a production server, set the power management profile to performance rather than power saving. This profile is typically controlled in the system BIOS and prevents the CPU from throttling back. You must also set the Linux CPU frequency governor to performance.
  • If there is a large amount of swap activity on the machine, you probably have a memory shortage. Review the amount of data the database is attempting to process in memory compared with how much memory is available.
  • CPU speed matters to MapD, because some work is always done on the CPUs. MapD recommends systems that balance a high core count with a high clock speed.
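  • Use the nvidia-smi -pm and nvidia-smi -ac commands to enable persistence mode and set the GPU application clocks to their maximum. Supported memory and graphics clock pairs vary by GPU model; nvidia-smi -q -d SUPPORTED_CLOCKS lists them. On a K80, the commands look like this: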
sudo nvidia-smi -pm 1
sudo nvidia-smi -ac 2505,875
sudo nvidia-smi --ecc-config=0
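
The Linux governor advice in the list above can be verified with a short script. The following is a minimal Python sketch, assuming a Linux host that exposes the standard cpufreq files under /sys/devices/system/cpu. The function name non_performance_cpus is illustrative and is not part of MapD.

```python
from pathlib import Path


def non_performance_cpus(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU whose cpufreq governor
    is something other than 'performance'.

    Assumption: each CPU exposes <cpu_root>/cpuN/cpufreq/scaling_governor.
    CPUs without that file (for example, in some virtual machines) are skipped.
    """
    offenders = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for governor_file in sorted(Path(cpu_root).glob(pattern)):
        governor = governor_file.read_text().strip()
        if governor != "performance":
            # governor_file is .../cpuN/cpufreq/scaling_governor
            offenders[governor_file.parent.parent.name] = governor
    return offenders


if __name__ == "__main__":
    found = non_performance_cpus()
    if found:
        for cpu, governor in found.items():
            print(f"{cpu}: governor is '{governor}'")
    else:
        print("No non-performance governors found (or cpufreq is not exposed).")
```

An empty result means either that every CPU already uses the performance governor or that the host does not expose cpufreq at all, so it is worth confirming that the sysfs files exist before relying on it.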

Database Design

Review a representative sample of the data from which your table is to be created. This helps you determine the datatypes best suited to your columns. Where possible, store data in columns with the smallest representation that fits the range and cardinality of the values involved.

Look for these areas of potential optimization:

  • Can you apply FIXED ENCODING to TIMESTAMP fields?
  • Can you apply a fixed size to dictionary-encoded (ENCODING DICT) fields?
  • What kind of INTEGER is appropriate for the values involved?
  • Is DOUBLE required, or is FLOAT enough to store expected values?
  • Are there high cardinality TEXT fields that should be set to ENCODING NONE?
  • Can the data be converted from its current form to a more denormalized form?

Using the smallest possible encoding speeds up all aspects of MapD from the initial load to query execution.
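
One way to answer the sizing questions above is to profile a data sample before writing the table definition. The following Python sketch is illustrative only: the helper names are hypothetical, the integer ranges are plain two's-complement ranges, and the dictionary widths assume one code per distinct value. A real system may reserve a value for NULL, so treat results at the exact boundary with caution.

```python
import struct


def smallest_int_bits(min_value, max_value):
    """Narrowest signed width (8, 16, 32, or 64 bits) that holds the range."""
    for bits in (8, 16, 32, 64):
        low = -(1 << (bits - 1))
        high = (1 << (bits - 1)) - 1
        if low <= min_value and max_value <= high:
            return bits
    raise ValueError("range does not fit in a 64-bit signed integer")


def smallest_dict_bits(distinct_values):
    """Narrowest dictionary code width (8, 16, or 32 bits) for a cardinality.

    Assumption: every distinct string needs exactly one code.
    """
    for bits in (8, 16, 32):
        if distinct_values <= (1 << bits):
            return bits
    raise ValueError("cardinality too high for a 32-bit dictionary")


def float32_relative_error(value):
    """Relative error introduced by storing value as a 32-bit float."""
    as_float32 = struct.unpack("f", struct.pack("f", value))[0]
    return abs(as_float32 - value) / abs(value) if value else 0.0


if __name__ == "__main__":
    print(smallest_int_bits(0, 100))
    print(smallest_int_bits(-40000, 40000))
    print(smallest_dict_bits(5000))
    print(float32_relative_error(0.1))
```

The same range check applies to TIMESTAMP columns: if every timestamp in the sample, expressed as epoch seconds, fits in 32 bits, a 32-bit fixed encoding is a candidate. Check the exact encoding syntax and limits against the MapD DDL reference before using them.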

Loading Data

  • Loading large flat files of 100M or more is the most efficient way to import data to MapD.
  • Consider increasing the block sizes of StreamInserter or SQLImporter to reduce the overhead per set of records loaded or streamed.
  • If you use a particular column on a regular basis to restrict the queries to a table, load the table sorted on the data in that column. For example, if most queries have a DATE dimension, then load data in date order for the best performance.
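
The last point can be prepared for ahead of the import by sorting the flat file on the column that queries filter on. The following is a minimal Python sketch under stated assumptions: the file is a CSV with a header row, it fits in memory, and the sort column holds ISO 8601 dates (YYYY-MM-DD), which sort correctly as plain strings. The function name sort_csv_by_column and the file and column names in the usage example are hypothetical.

```python
import csv


def sort_csv_by_column(src_path, dst_path, column):
    """Write a copy of src_path to dst_path with rows sorted by `column`.

    Returns the number of data rows written. Values are compared as
    strings, so the column should use a format that sorts
    lexicographically, such as ISO 8601 dates.
    """
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = sorted(reader, key=lambda row: row[column])

    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    import sys

    # Usage: python sort_csv.py events.csv events_sorted.csv event_date
    if len(sys.argv) == 4:
        count = sort_csv_by_column(sys.argv[1], sys.argv[2], sys.argv[3])
        print(f"wrote {count} rows sorted by {sys.argv[3]}")
```

For files too large to hold in memory, an external sort is a better fit than this in-memory approach.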