Python JayDeBeApi

MapD Core supports Python using JayDeBeApi.

The $MAPD_PATH/SampleCode/mapd_jdbc.py script wraps jaydebeapi and returns a standard Python Connection object. The mapd_jdbc.py script depends on the MapD JDBC driver, mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar, residing in the same directory. You can create a cursor object using the returned connection object. Make sure you close the connection at the end of your script.

Installation

Ensure that jaydebeapi and dependencies are installed by running:

pip install jaydebeapi
pip install pandas
pip install matplotlib

The jar file is $MAPD_PATH/bin/mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar.

The host is <machine>:<port>, using standard port 9091.

Example

The example code can be found in the $MAPD_PATH/SampleCode/mapd_jdbc_example.py file.

This example uses the mapd_jdbc wrapper to query MapD Core and plot the results using pyplot:

Sequence

Key steps are:

  1. Connect to the database:
mapd_con = mapd_jdbc.connect(dbname=dbname, user=user, host=host, password=password)
  1. Get a database cursor:
mapd_cursor = mapd_con.cursor()
  1. Query the database:
query = "select carrier_name, avg(depdelay) as x, avg(arrdelay) as y from flights_2008 group by carrier_name"

mapd_cursor.execute(query)
  1. Get the result set:
results = mapd_cursor.fetchall()
  1. Make the results a pandas DataFrame:
df = pandas.DataFrame(results)
  1. Generate a scatterplot from the results:
plt.scatter(df[1],df[2])

plt.show()

Source Code

# !/usr/bin/env python
# Note: The following example should be run in the same directory as map_jdbc.py
# and mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar

import mapd_jdbc
import pandas
import matplotlib.pyplot as plt

dbname = 'mapd'
user = 'mapd'
host = 'localhost:9091'
password = 'HyperInteractive'

mapd_con = mapd_jdbc.connect(dbname=dbname, user=user, host=host, password=password)

mapd_cursor = mapd_con.cursor()

query = "select carrier_name, avg(depdelay) as x, avg(arrdelay) as y from flights_2008 group by carrier_name"

mapd_cursor.execute(query)

results = mapd_cursor.fetchall()

df = pandas.DataFrame(results)

plt.scatter(df[1],df[2])

plt.show()