Optimizing OmniSci

This topic provides information on tuning your OmniSci configuration to improve performance.

Hardware
Database Design
Loading Data
Using Joins
Parallel GPUs
CUDA JIT Cache

Hardware

Even though OmniSci is an “in-memory” database, when the database first starts up, it reads data from disk. A large database can take a long time to read from a hard disk. Import and execution performance rely on disks with high-performance characteristics to match the general nature of the database.
As a starting point, OmniSci recommends fast SSD drives on a good hardware controller in RAID 10 configuration. If you use a virtual machine such as Amazon Web Services, OmniSci recommends you use Provisioned IOPS SSD disks in RAID configuration for storage.
Do not run unnecessary daemons. Ideally, only OmniSci services run on your OmniSci server.
For a production server, set the performance setting to performance instead of power saving. The performance setting is typically controlled by the system BIOS and prevents throttling back of the CPU. You must also change the settings in the Linux power governor setup.
A large amount of swap activity on the machine can indicate a memory shortage. Compare the amount of data the database is attempting to process in memory to the amount of memory available.
Because some work is always done on CPUs, speed is important. OmniSci recommends you use systems that balance a high core count with high CPU speed.
Use the nvidia-smi -pm and nvidia-smi -ac commands to maximize GPU clock speeds. On an NVIDIA Tesla K80, use the following commands:

  sudo nvidia-smi -pm 1
sudo nvidia-smi -ac 3004,875

Enable the NVIDIA persistence daemon nvidia-persistenced. See Persistence Daemon Usage.
Disable Error-correcting Code memory (ECC). This setting is persistent, but requires a reboot. See the NVIDIA-SMI documentation.

sudo nvidia-smi --ecc-config=0

Database Design

Review a representative sample of the data from which your table is to be created. This helps you determine the datatypes best suited to your columns. Where possible, place data into columns with the smallest representation that can fit the cardinality involved.

Look for these areas of potential optimization:

Can you apply fixed encoding to TIMESTAMP fields?
Can you apply fixed sizes to FIXED ENCODING DICT fields?
What kind of INTEGER is appropriate for the values involved?
Is DOUBLE required, or is FLOAT enough to store expected values?
Is ENCODING NONE set for high-cardinality TEXT fields?
Can the data be converted from its current form to a more denormalized form?

Using the smallest possible encoding increases the speed of all aspects of OmniSci, from initial load to query execution.

Loading Data

Loading large flat files of 100 MB or larger is the most efficient way to import data to OmniSci.
Consider increasing the block sizes of StreamInserter or SQLImporter to reduce the overhead of records loaded or streamed.
If you use a particular column on a regular basis to restrict queries to a table, load the table sorted on the data in that column. For example, if most queries have a DATE dimension, then load data in date order for best performance.
When using a large-cardinality column frequently for GROUP BY or as a JOIN column, you can improve performance by creating the table with a sort column; for example:
```
CREATE TABLE x...WITH(sort_column = y)
```
Then, when ingesting using COPY FROM, increase the BUFFER_SIZE parameter, up to 128 MB instead of the default 8 MB, to provide a larger window for sorting the data.

Preloading Data

On startup, you can load data from a standard list of queries. You can customize the queries for each analyst and load into memory the data they commonly use first.

Creating the Query List

Create a file with a code block for each user. The keyword USER must be uppercase. Provide the user name and database name, followed by a series of SQL query statements enclosed in curly braces.

USER user_name db_name {
query_a1;
query_a2;
...
query_aN;
}

Add code blocks for all users who benefit from pre-loaded data.

...
USER user_name_2 db_name_2 {
query_b1;
query_b2;
...
query_bN;
}

Example

This code preloads queries to the flights database for the user Ziegfried in a database named omnisci:

USER Ziegfried omnisci {
select count (*) from flights_2008_10k;
select * from flights_2008_10k limit 10;
select flight_year, flight_month, flight_dayofmonth, flight_dayofweek, deptime, crsdeptime from flights_2008_10k limit 10;
select plane_model, plane_status, plane_aircraft_type, plane_engine_type, plane_year, origin_name, origin_city, origin_state, origin_country,
origin_lat, origin_lon, dest_name from flights_2008_10k limit 10;
}

Loading Data Using the Query List

Follow these steps to load the default queries on startup:

Create your query list file and store it on your OmniSci server.
If necessary, stop your OmniSci server.
Restart your server with the following option:

start_omnisci_server --db-query-list <path-to-db-query-file>

On startup, your queries are loaded automatically, speeding up defined query results.

Note	You must have ACCESS and SELECT permissions to preload data.

Using Joins

OmniSciDB supports relational joins across all scalar column types and with any Boolean binary operator. If you use joins in OmniSci, consider the following to optimize query performance:

Equijoin queries (joins between two columns with an = predicate) are typically executed using an accelerated hash join framework, and therefore perform better than range joins. For example:
```
JOIN table_2 ON table_1.column_1 = table_2.column_b
```
performs better than:
```
JOIN table_2 ON table_1.column_1 > table_2.column_b AND table1_column_1 < table_2.column_b + 1000
```
For an equijoin query, OmniSciDB runs more efficiently if the range of the join key space is small. For example, say you are joining a column defined as BIGINT that contains approximately 1 million reference numbers ranging from 1000000000 to 9000000000. In this instance, OmniSci interprets the range to be 8 billion potential values -- every potential value between 1 billion and 9 billion.
In this case, you want to store these keys as TEXT ENCODING DICT instead. This maps each reference number to a unique identity, and OmniSci recognizes that only 1 million values exist, thereby optimizing the join.
If join performance is slowing your Immerse dashboard, consider using CREATE TABLE AS SELECT to materialize a join expression as a new table. You can use the new table in your Immerse charts.
In distributed mode, either sharding of both tables or replication of the inner table is required to execute joins.
Sharding can significantly improve performance of join operations in multi-GPU single-node setups if the join is executed on the shard key.

Parallel GPUs

Parsing, optimization, and parts of rendering can overlap between queries, but most of the execution occurs single file. In general, you get the most throughput on the GPU by letting a query have all the resources. Contention is not a concern for buffer or cache memory. If queries are done very quickly, you get low latency, even with many simultaneous queries.

For simple queries on relatively small datasets, consider executing queries on subsets of GPUs. Different GPU groups can execute at the same time. This configuration benefits from parallelizing “fixed overheads” on each query between OmniSci servers on the same node.

You can implement this behavior by running multiple OmniSci servers on the same node and mapping each to different sets of GPUs with the --start-gpu and --num-gpus flags (see Configuration file).

CUDA JIT Cache

When your device driver compiles PTX code for an application, it automatically caches a copy of the generated binary code to avoid repeating the compilation in later invocations of the application. The cache — referred to as the compute cache — is automatically invalidated when you upgrade your device driver so that applications can benefit from improvements in the just-in-time compiler built into the device driver.

You can use environment variables to control just-in-time compilation.

Variable	Default	Description
`CUDA_CACHE_DISABLE`	0	Setting to 1 disables caching (no binary code is added to or retrieved from the cache).
`CUDA_CACHE_MAXSIZE`	256 MiB	Specifies the size of the compute cache in bytes; the maximum size is 4 GiB. Binary code that exceeds the cache size is not cached. Older binary code is removed to make room for newer code as needed.
`CUDA_CACHE_PATH`	MacOS: `$HOME/Library/Application Support/NVIDIA/ComputeCache` Linux: `~/.nv/ComputeCach`	Specifies the directory location of computer cache files. When running OmniSci on a Docker instance, OmniSci recommends that you set this environment variable to $OMNISCI_STORAGE/NVIDIA/ComputeCache.
`CUDA_FORCE_PTX_JIT`	0	Setting to 1 forces the device driver to ignore any binary code embedded in an application and to just-in-time compile embedded PTX code instead. If a kernel does not have embedded PTX code, it will fail to load. You can use this environment variable to confirm that an application binary contains PTX code and that just-in-time compilation works as expected to guarantee forward compatibility with future architectures.

Main Content