OmniSci Data Science Foundation

An Overview of OmniSci Integrated Data Science Foundation

OmniSci provides an integrated data science foundation built on several open-source components of the PyData stack. This set of tools is integrated with OmniSci Immerse and lets you switch from dashboards to a notebook environment connected to OmniSciDB in the background. You can move from visual data exploration in Immerse to a deeper dive on a specific dataset, build predictive models using standard Python-based data science libraries and tools, and push results back into OmniSciDB for use with Immerse.

Several components make up the OmniSci data science foundation.

JupyterLab

OmniSci provides deep integration with JupyterLab, the next-generation version of the most popular notebook environment and workflow used by data scientists for interactive computing. You can access JupyterLab by clicking an icon in Immerse.

Figure: JupyterLab access in Immerse
Figure: JupyterLab access from SQLEditor

In addition to the seamless integration with Immerse, you can also use JupyterLab with OmniSci by creating an explicit connection object, either via the pymapd API:

>>> from pymapd import connect
>>> con = connect(user="admin", password="HyperInteractive", host="localhost",
... dbname="omnisci")
>>> con
Connection(mapd://admin:***@localhost:6274/omnisci?protocol=binary)

or via the Ibis API, which builds on pymapd:

import ibis

# Connect through the Ibis OmniSciDB backend
con = ibis.omniscidb.connect(
    host='omniscidb',
    database='ibis_testing',
    user='admin',
    password='HyperInteractive',
)
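Hard-coding credentials, as in the examples above, is convenient for a demo but awkward in shared notebooks. The following is a minimal sketch, not an OmniSci-provided utility, that gathers the connection parameters from environment variables. The `OMNISCI_*` variable names and both helper functions are hypothetical; the defaults mirror the pymapd example above.

```python
import os


def connection_params(env=None):
    """Collect OmniSciDB connection parameters from environment variables.

    The OMNISCI_* variable names are hypothetical, chosen for this sketch;
    the defaults mirror the pymapd example above.
    """
    env = os.environ if env is None else env
    return {
        "user": env.get("OMNISCI_USER", "admin"),
        "password": env.get("OMNISCI_PASSWORD", "HyperInteractive"),
        "host": env.get("OMNISCI_HOST", "localhost"),
        "port": int(env.get("OMNISCI_PORT", "6274")),
        "dbname": env.get("OMNISCI_DBNAME", "omnisci"),
    }


def masked_uri(params):
    """Render the parameters as a URI with the password hidden, for logging."""
    return "mapd://{user}:***@{host}:{port}/{dbname}".format(**params)


# Pass an explicit mapping here so the sketch does not depend on the shell.
params = connection_params(env={"OMNISCI_HOST": "omniscidb"})
print(masked_uri(params))  # mapd://admin:***@omniscidb:6274/omnisci

# With pymapd installed and a server running, the same dictionary can be
# unpacked into the connect call shown earlier:
#     from pymapd import connect
#     con = connect(**params)
```

Keeping the parameters in a plain dictionary means the same values can be reused for either connection style without repeating them in each notebook cell.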

For more information, see the JupyterLab documentation.

Ibis

Ibis is a productivity API for working in Python and analyzing data in remote SQL-based data stores such as OmniSciDB. Inspired by the pandas toolkit for data analysis, Ibis provides a Pythonic API that compiles to SQL. Combined with OmniSciDB scale and speed, Ibis offers a familiar but more powerful method for analyzing very large datasets "in-place."

Ibis supports multiple SQL database backends, and also supports pandas as a native backend. Combined with Altair, this integration allows you to explore multiple datasets across different data sources.
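To make the "compiles to SQL" idea concrete, here is a toy sketch of a deferred expression that is turned into a SQL string and executed inside the database. It is illustrative only and is not the Ibis API or its implementation: the `TableExpr` class, the `flights` table, and its columns are all hypothetical, and the standard-library `sqlite3` module stands in for OmniSciDB so that the example runs anywhere.

```python
import sqlite3


class TableExpr:
    """Toy deferred table expression (illustrative only; not the Ibis API)."""

    def __init__(self, name, predicates=()):
        self.name = name
        self.predicates = tuple(predicates)

    def filter(self, predicate):
        # Return a new expression; nothing is sent to the database yet.
        return TableExpr(self.name, self.predicates + (predicate,))

    def mean_by(self, key, column):
        """Compile a grouped-average query to a SQL string."""
        where = ""
        if self.predicates:
            where = " WHERE " + " AND ".join(self.predicates)
        return (
            f"SELECT {key}, AVG({column}) AS avg_{column} "
            f"FROM {self.name}{where} GROUP BY {key} ORDER BY {key}"
        )


# sqlite3 stands in for OmniSciDB here; the data is made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (carrier TEXT, depdelay REAL)")
con.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("AA", 10.0), ("AA", 20.0), ("UA", 5.0), ("UA", -3.0)],
)

# Build the expression in Python, then compile it to SQL.
expr = TableExpr("flights").filter("depdelay > 0")
sql = expr.mean_by("carrier", "depdelay")
print(sql)

# The database does the filtering and grouping; only the small
# aggregated result is returned to Python.
rows = con.execute(sql).fetchall()
print(rows)  # [('AA', 15.0), ('UA', 5.0)]
```

The point of the pattern is that the Python side only describes the computation. A real expression library would also validate column names and types and would never splice raw strings into SQL as this toy does.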

Altair

Altair is another key component of the OmniSci data science foundation. Building on the same Vega data visualization engine used by Immerse for geospatial charts, Altair provides a Pythonic API over Vega-Lite, a subset of the full Vega specification for declarative charting based on the "Grammar of Graphics" paradigm. The OmniSci data science foundation goes further and includes interface code that lets Altair transparently use Ibis expressions instead of pandas data frames. This allows data visualization over much larger datasets in OmniSci without writing SQL code.
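As a rough illustration of what "declarative charting" means, the sketch below writes a small Vega-Lite bar chart specification by hand as a Python dictionary, using only the standard library. Altair generates specifications of this general shape from its Python API. The field names and values are hypothetical sample data, and the schema version in the `$schema` URL is an assumption.

```python
import json

# Hypothetical aggregated rows, e.g. the result of a grouped query.
rows = [
    {"carrier": "AA", "avg_delay": 15.0},
    {"carrier": "UA", "avg_delay": 5.0},
]

# A Vega-Lite bar chart: the data, a mark type, and encodings that map
# data fields to visual channels. Nothing here says *how* to draw.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "carrier", "type": "nominal"},
        "y": {"field": "avg_delay", "type": "quantitative"},
    },
}

# Serialize to JSON, the form a Vega-Lite renderer consumes.
print(json.dumps(spec, indent=2))
```

In Altair, a comparable chart over a pandas data frame `df` would typically be written as `alt.Chart(df).mark_bar().encode(x="carrier:N", y="avg_delay:Q")`.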

NVIDIA RAPIDS

The NVIDIA RAPIDS toolkit is a collection of foundational libraries for GPU-accelerated data science and machine learning. It includes popular algorithms for clustering, classification, and linear models, as well as a GPU-based dataframe (cuDF). OmniSci allows configurable output to cuDF from any query (including via Ibis or pymapd), so you can quickly run machine-learning algorithms on top of query results from OmniSci.

Other Tools and Utilities

In addition, the data science foundation Docker container includes Facebook's Prophet library for forecasting, and Prefect, a lightweight but powerful workflow engine that enables you to build and manage workflows in Python.