Python JayDeBeApi¶

MapD Core supports Python using JayDeBeApi.

The $MAPD_PATH/SampleCode/mapd_jdbc.py script wraps jaydebeapi and returns a standard Python Connection object. The script depends on the MapD JDBC driver, mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar, which must reside in the same directory as the script. Use the returned connection object to create a cursor, and close the connection at the end of your script.

Installation¶

Install jaydebeapi, along with pandas and matplotlib, which the example on this page uses, by running:

pip install jaydebeapi
pip install pandas
pip install matplotlib

The jar file is $MAPD_PATH/bin/mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar.

Specify the host as <machine>:<port>. The standard port is 9091.
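As a rough illustration of how the jar file and host string fit together, the sketch below shows one way a wrapper such as mapd_jdbc.py could hand them to jaydebeapi. It is not the shipped script: the driver class name com.mapd.jdbc.MapDDriver, the jdbc:mapd:<machine>:<port>:<dbname> URL layout, and the helper names normalize_host, build_url, and jar_dir are assumptions made for illustration, and the call assumes the jaydebeapi 1.x signature connect(jclassname, url, driver_args, jars). Check mapd_jdbc.py for the values your installation actually uses.

```python
import os

# Assumptions (not confirmed by this page): the driver class name and the
# JDBC URL layout are illustrative; mapd_jdbc.py is the authority.
DRIVER_CLASS = "com.mapd.jdbc.MapDDriver"
JAR_NAME = "mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar"
DEFAULT_PORT = 9091  # standard port named in this section


def normalize_host(host):
    """Return '<machine>:<port>', appending the standard port if none is given."""
    return host if ":" in host else "{}:{}".format(host, DEFAULT_PORT)


def build_url(host, dbname):
    """Build a JDBC URL of the assumed form jdbc:mapd:<machine>:<port>:<dbname>."""
    return "jdbc:mapd:{}:{}".format(normalize_host(host), dbname)


def connect(dbname, user, host, password, jar_dir="."):
    """Open a connection through jaydebeapi (jar_dir is a hypothetical parameter)."""
    import jaydebeapi  # imported lazily so the helpers above work without it

    return jaydebeapi.connect(
        DRIVER_CLASS,
        build_url(host, dbname),
        {"user": user, "password": password},
        os.path.join(jar_dir, JAR_NAME),
    )
```

Keeping the URL construction in a separate function makes it possible to check the host and port handling without a running server or an installed JVM.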

Example¶

The example code can be found in the $MAPD_PATH/SampleCode/mapd_jdbc_example.py file.

This example uses the mapd_jdbc wrapper to query MapD Core and plots the results using matplotlib's pyplot.

Sequence¶

Key steps are:

Connect to the database:

mapd_con = mapd_jdbc.connect(dbname=dbname, user=user, host=host, password=password)

Get a database cursor:

mapd_cursor = mapd_con.cursor()

Query the database:

query = "select carrier_name, avg(depdelay) as x, avg(arrdelay) as y from flights_2008 group by carrier_name"

mapd_cursor.execute(query)

Get the result set:

results = mapd_cursor.fetchall()

Make the results a pandas DataFrame:

df = pandas.DataFrame(results)

Generate a scatterplot from the results:

plt.scatter(df[1], df[2])

plt.show()
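The steps above address the DataFrame columns by position (df[1] and df[2]) because pandas.DataFrame(results) assigns integer column labels. A possible refinement, sketched here, reads the column names from the cursor's description attribute, which PEP 249 defines as a sequence of tuples whose first item is the column name. The helper column_names and the small flights table are hypothetical, and sqlite3 stands in for MapD Core so that the snippet runs without a server. The exact names and letter case a JDBC driver reports may differ.

```python
import sqlite3


def column_names(cursor):
    """Return the column names of the most recently executed query."""
    return [col[0] for col in cursor.description]


# Stand-in demonstration with sqlite3 (not MapD Core) and made-up rows.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table flights (carrier_name text, depdelay real, arrdelay real)")
cur.executemany(
    "insert into flights values (?, ?, ?)",
    [("A", 10.0, 12.0), ("A", 20.0, 18.0), ("B", 5.0, 7.0)],
)
cur.execute(
    "select carrier_name, avg(depdelay) as x, avg(arrdelay) as y "
    "from flights group by carrier_name order by carrier_name"
)
names = column_names(cur)  # taken from the select list aliases
rows = cur.fetchall()
con.close()
```

With the names in hand, pandas.DataFrame(results, columns=names) yields labeled columns, so the plot call can be written as plt.scatter(df["x"], df["y"]).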

Source Code¶

#!/usr/bin/env python
# Note: Run the following example in the same directory as mapd_jdbc.py
# and mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar

import mapd_jdbc
import pandas
import matplotlib.pyplot as plt

dbname = 'mapd'
user = 'mapd'
host = 'localhost:9091'
password = 'HyperInteractive'

mapd_con = mapd_jdbc.connect(dbname=dbname, user=user, host=host, password=password)

mapd_cursor = mapd_con.cursor()

query = "select carrier_name, avg(depdelay) as x, avg(arrdelay) as y from flights_2008 group by carrier_name"

mapd_cursor.execute(query)

results = mapd_cursor.fetchall()

df = pandas.DataFrame(results)

plt.scatter(df[1], df[2])

plt.show()

# Close the connection at the end of the script
mapd_con.close()
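Because the connection should be closed at the end of the script, it can help to make the cleanup automatic so that it also happens when a query fails. The following is a minimal sketch using contextlib.closing from the standard library, which calls close() on exit. The helper name fetch_all is hypothetical, and sqlite3 again stands in for the connection that mapd_jdbc.connect would return.

```python
import sqlite3
from contextlib import closing


def fetch_all(connect, query):
    """Run query on a fresh connection; always close the cursor and connection.

    `connect` is any zero-argument callable that returns a DB-API connection.
    """
    with closing(connect()) as con:
        with closing(con.cursor()) as cur:
            cur.execute(query)
            return cur.fetchall()


# Stand-in usage with sqlite3 so the snippet runs without a MapD server.
rows = fetch_all(lambda: sqlite3.connect(":memory:"), "select 1 + 1")
```

With the wrapper from this page, the callable could be lambda: mapd_jdbc.connect(dbname=dbname, user=user, host=host, password=password), assuming the returned connection and its cursors expose close() as DB-API objects normally do.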