High Availability

OmniSci provides support for High Availability (HA) to help your business meet service-level agreements for performance and uptime.

OmniSci HA provides redundancy to eliminate single points of failure, and crossover systems that continue operations when a component goes offline.

[Figure: OmniSci High Availability architecture (4_ha.png)]

When a server is running in HA mode, it is assigned to a specific HA group. All servers in the HA group eventually receive all updates applied to any server in the same HA group.

Redundancy

Incoming bulk data is seamlessly distributed to multiple OmniSci databases. As HA group members receive each update, backend synchronization orchestrates and manages replication, then updates the OmniSci servers in the HA group using Kafka topics as a distributed, resilient logging system.
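Conceptually, each Kafka topic serves as an ordered, replayable log that every member of the HA group consumes at its own pace. The following minimal in-memory sketch illustrates that pattern only; the `Topic` class and its method names are hypothetical, not the OmniSci implementation:

```python
class Topic:
    """Toy stand-in for a Kafka topic: an append-only log plus a
    per-consumer offset, so each HA group member catches up independently."""

    def __init__(self):
        self.log = []       # ordered updates (the replication log)
        self.offsets = {}   # consumer name -> next offset to read

    def subscribe(self, consumer):
        self.offsets[consumer] = 0

    def produce(self, message):
        self.log.append(message)

    def poll(self, consumer):
        """Return every message this consumer has not yet applied."""
        start = self.offsets[consumer]
        self.offsets[consumer] = len(self.log)
        return self.log[start:]

# An update produced by one server is eventually replayed on the others.
topic = Topic()
topic.subscribe("server1")
topic.subscribe("server2")
topic.produce("INSERT INTO idtbl VALUES (1)")
print(topic.poll("server2"))   # the pending update for server2
```

Because each consumer tracks its own offset, a server that falls behind (or restarts) simply resumes reading the log where it left off.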

To avoid double handling of larger file-based bulk operations such as COPY FROM, OmniSci HA uses a distributed, redundant file system to store data. All nodes in the HA group must have access to the distributed file system. OmniSci typically uses the GlusterFS distributed file system, but you can use almost any fully featured distributed file system.

Load Balancing and Failure Detection

The OmniSci HA server uses load balancing to maximize performance and reliability. A load balancer distributes users across the available OmniSci servers, allowing improved concurrency and throughput as more servers are added. If an OmniSci server becomes unavailable, the load balancer redirects traffic to a different available OmniSci server, preserving availability with a reduction in capacity.

Load balancing can be done with purpose-built hardware such as F5, or with mid-tier application servers such as HAProxy or Node.js. In some circumstances, the application might take data locality into consideration, routing similar requests to the same server to improve performance or reduce GPU memory usage.

Currently, OmniSci does not natively support in-session recovery; a new session must be established after failure. Typically, mid-tier application servers handle this by retrying a request when detecting a connection failure.
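A mid-tier retry loop of that kind can be sketched as follows. This is a hypothetical client-side example; the server names and the `send` callback are illustrative and not part of any OmniSci API:

```python
SERVERS = ["server-a:6274", "server-b:6274"]   # hypothetical HA group members

def execute_with_failover(query, send, servers=SERVERS, retries=3):
    """Try each server in turn; on a connection failure, open a fresh
    session against the next one (sessions are not resumed after failure)."""
    last_err = None
    for attempt in range(retries):
        server = servers[attempt % len(servers)]
        try:
            return send(server, query)    # open a session, run the query
        except ConnectionError as err:
            last_err = err                # server unavailable: try the next
    raise last_err

# Simulated transport: server-a is offline, server-b answers.
def fake_send(server, query):
    if server.startswith("server-a"):
        raise ConnectionError(f"{server} unreachable")
    return f"{server} ran: {query}"

print(execute_with_failover("SELECT 1", fake_send))
# server-b:6274 ran: SELECT 1
```

The key point is that the retry establishes a new session rather than resuming the failed one, matching the behavior described above.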

HA and OmniSci Distributed

You can use OmniSci HA with the OmniSci distributed configuration, allowing horizontal scale-out. Each OmniSci server can be configured as a cluster of machines. For more information, see Distributed.

OmniSci HA Example

When starting an OmniSciDB instance as part of an HA group, you must specify the following options.

  • ha-group-id: The name of the HA group this server belongs to. This identifies the Kafka topic to be used. The topic must exist in Kafka before this server can start.
  • ha-unique-server-id: A unique ID that identifies this server in the HA group. This should never be changed for the lifetime of the OmniSciDB instance.
  • ha-brokers: A list used to identify Kafka brokers available to manage the system.
  • ha-shared-data: The parent directory of the distributed file system. Used to check that bulk load files are placed in a directory available to all HA group members.

An omnisci.conf file would look similar to the following.

port = 6274
http-port = 6278
data = "/home/ec2-user/prod/omnisci-storage/data"
read-only = false
ha-group-id=hag-group-1
ha-unique-server-id=mymachine
ha-brokers=mykafkabrokerlist
ha-shared-data=/mnt/dfs/omnisci-storage/

[web]
port = 6273
frontend = "/home/ec2-user/prod/omnisci/frontend"

Implementing HA

The following configuration steps enable HA for OmniSci. These instructions assume you have mountable shared storage between the OmniSci instances you configure for HA.

OmniSci uses Apache Kafka as a message broker to facilitate HA capabilities. The following is a simplified architecture diagram.

As data is created on OmniSci Server 1 or OmniSci Server 2, OmniSci sends a message to a Kafka topic defined during setup. When Kafka receives a message on the topic, it transfers the data to the other server, which has joined the topic as a consumer. The Kafka topic must be created and configured before starting the OmniSci server in HA mode. The shared storage directory is only used for COPY FROM operations.

Configuring Kafka

This example demonstrates how to configure a single Kafka broker. To install Kafka, you must have Java installed. You can download Java from https://www.java.com/en/download/.

After installing Java, download the appropriate tarball for the version of Kafka you want to install. You can download Apache Kafka at https://kafka.apache.org/downloads. This example uses Kafka 1.1.0 built for Scala 2.12.

To install and configure Kafka:

  1. Download your preferred version of Kafka.
    curl -O http://mirror.cc.columbia.edu/pub/software/apache/kafka/1.1.0/kafka_2.12-1.1.0.tgz
  2. Expand the tarball.
    tar xvf ./kafka_2.12-1.1.0.tgz
  3. Go to the Kafka directory.
    cd kafka_2.12-1.1.0
  4. Verify that Kafka and Zookeeper start up without error.
    bin/zookeeper-server-start.sh config/zookeeper.properties
    bin/kafka-server-start.sh config/server.properties
  5. Create your OmniSci topic.
    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic omnisci
    Note: These are the default options used in the Kafka quick start guide. You might need to change them, depending on your end-state Kafka configuration.
  6. Ensure that the OmniSci topic is created.
    bin/kafka-topics.sh --list --zookeeper localhost:2181
    This command should list the OmniSci topic you created.
  7. Run Zookeeper and Kafka as a service.
    kafkahome=/path/to/kafka/install
    nohup ${kafkahome}/bin/zookeeper-server-start.sh \
      ${kafkahome}/config/zookeeper.properties > /dev/null 2>&1 &
    sleep 2
    nohup ${kafkahome}/bin/kafka-server-start.sh \
      ${kafkahome}/config/server.properties > /dev/null 2>&1 &
    sleep 2

Configuring Your OmniSci Servers to Use HA

Use omnisci.conf parameters to configure your OmniSci instances to support High Availability. The configuration parameters are:

  • ha-group-id
  • ha-unique-server-id
  • ha-brokers
  • ha-shared-data

The following are example omnisci.conf files for the OmniSci servers.

OmniSci Server 1

port = 6274
http-port = 6278
data = $OMNISCI_STORAGE/data
null-div-by-zero = true
ha-group-id=omnisci                          #topic created in previous step
ha-unique-server-id=server1                  #unique id for the server
ha-brokers=<kafkahostname>
ha-shared-data=</mountedvol1/somesharedfolder> #note this is the same directory as server2
[web]
port = 6273
frontend = "/opt/omnisci/frontend"

OmniSci Server 2

port = 6274
http-port = 6278
data = $OMNISCI_STORAGE/data
null-div-by-zero = true
ha-group-id=omnisci                          #topic created in previous step
ha-unique-server-id=server2                  #unique id for the server
ha-brokers=<kafkahostname>
ha-shared-data=</mountedvol1/somesharedfolder> #note this is the same directory as server1
[web]
port = 6273
frontend = "/opt/omnisci/frontend"

Implementation Notes

  • You can configure how long your Kafka queue persists data. Your Kafka server must hold data long enough for full data recovery. In the official Kafka documentation, see the broker-level log.retention.{ms,minutes,hours} settings and the topic-level retention.ms setting.
  • OmniSci is not recommended as a system of record. $OMNISCI_STORAGE should not be the basis for data recovery.
  • StreamImporter is not supported in HA mode.
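For example, topic-level retention can be raised with kafka-configs.sh, an administrative command run against a live ZooKeeper/Kafka deployment. The topic name omnisci matches the earlier example, and the two-week value shown is only an illustration:

```shell
# Keep messages on the omnisci topic for 14 days (1209600000 ms).
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name omnisci \
  --add-config retention.ms=1209600000

# Verify that the override took effect.
bin/kafka-configs.sh --zookeeper localhost:2181 --describe \
  --entity-type topics --entity-name omnisci
```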

Kafka Message Timeouts

Kafka messages that are not processed within a defined interval are not committed. This avoids a situation in which, during a long HA transfer, records are committed but not processed. After a timeout, you might have some uncommitted Kafka messages that need to be deleted. You can delete them using the kafka-delete-records.sh script, which requires two arguments.

  • --bootstrap-server: Server IP and port of the Kafka server.
  • --offset-json-file: Path to a JSON file that defines the topic, partition, and offset of a specific Kafka message on the server.
./kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file ~/Desktop/delete.json
The delete.json would be similar to the following example.
{
   "partitions": [
       {
           "topic": "omni_topic",
           "partition": 0,
           "offset": 1
       }
   ],
   "version": 1
}

The topic name is set in your omnisci.conf file.

The number of partitions depends on your Kafka deployment: most OmniSci deployments have only a single Kafka partition (0).
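If you are unsure how many partitions your topic has, you can check with kafka-topics.sh, run against a live ZooKeeper instance; the topic name omnisci is taken from the earlier example:

```shell
# Lists the partition numbers, leaders, and replicas for the topic.
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic omnisci
```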

To view the offset of the Kafka message next in line for processing, set the --verbose flag in OmniSciDB. The OmniSci server prints the offset of the next message to be processed in the INFO log.

You can also use this shell command to see the current offset of the Kafka server, where $KAFKA_BROKER is the IP address and port of the Kafka server (for example, localhost:9092). kafka-consumer-groups is a utility that ships with Kafka.

for grp in $(./kafka-consumer-groups.sh --bootstrap-server $KAFKA_BROKER --list); do
  echo "group: $grp"
  ./kafka-consumer-groups.sh --bootstrap-server $KAFKA_BROKER --group $grp --describe
done

Testing Your Configuration

Test your HA system to see if it is working properly.

Test Case 1

Start your Kafka broker, then start your OmniSci servers. Create a table on OmniSci Server 1 named idtbl, which contains a single integer ID column. If HA is configured properly, the table is pushed to OmniSci Server 2.

OmniSci Server 1:

omnisql> \t
omnisql> create table idtbl (id integer);
omnisql> \t
idtbl
omnisql>

OmniSci Server 2:

omnisql> \t
omnisql> \t
idtbl
omnisql>

Test Case 2

Insert some values into idtbl on OmniSci Server 1 using omnisql.

OmniSci Server 1:

omnisql> \t
idtbl
omnisql> insert into idtbl values (1);
omnisql> insert into idtbl values (2);
omnisql> insert into idtbl values (3);
omnisql> \timing
omnisql> select * from idtbl;
id
1
2
3
3 rows returned.
Execution time: 42 ms, Total time: 42 ms

OmniSci Server 2:

omnisql> \timing
omnisql> select * from idtbl;
id
1
2
3
3 rows returned.
Execution time: 40 ms, Total time: 41 ms

Truncate the table on OmniSci Server 1, then query it on OmniSci Server 2:

OmniSci Server 1:

omnisql> truncate table idtbl;

OmniSci Server 2:

omnisql> select * from idtbl;
No rows returned.
Execution time: 142 ms, Total time: 142 ms

Insert a new value on OmniSci Server 1, then query it on OmniSci Server 2:

OmniSci Server 1:

omnisql> insert into idtbl values(12345);
Execution time: 54 ms, Total time: 61 ms

OmniSci Server 2:

omnisql> select * from idtbl;
id
12345
1 rows returned.
Execution time: 33 ms, Total time: 33 ms