Python JayDeBeApi
MapD Core supports Python through JayDeBeApi. The $MAPD_PATH/SampleCode/mapd_jdbc.py script wraps jaydebeapi and returns a standard Python Connection object. The mapd_jdbc.py script depends on the MapD JDBC driver, mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar, residing in the same directory. You can create a cursor object from the returned connection object. Make sure you close the connection at the end of your script.
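The advice about closing the connection can be sketched with contextlib.closing, which calls close() even if a query raises. This is a minimal sketch, not part of the MapD sample code: run_query is a hypothetical helper, and sqlite3 is used only as a stand-in DB-API connection so the snippet runs without a MapD server. With the wrapper you would pass the object returned by mapd_jdbc.connect(...) instead.

```python
import sqlite3
from contextlib import closing


def run_query(connection, query):
    """Run a query on an open DB-API connection and return all rows.

    Hypothetical helper: the cursor is closed when the block exits,
    whether or not execute() raises.
    """
    with closing(connection.cursor()) as cursor:
        cursor.execute(query)
        return cursor.fetchall()


# Stand-in connection (assumption for illustration only); replace with
# mapd_jdbc.connect(dbname=..., user=..., host=..., password=...).
with closing(sqlite3.connect(":memory:")) as con:
    rows = run_query(con, "select 1 + 1")
# The connection is closed here, at the end of the with block.
```

The same pattern should apply to any connection object that exposes cursor() and close(), which is what the DB-API 2.0 interface requires.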
Installation
Ensure that jaydebeapi and the other packages the example depends on are installed by running:
pip install jaydebeapi
pip install pandas
pip install matplotlib
The jar file is $MAPD_PATH/bin/mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar.
The host takes the form <machine>:<port>; the standard port is 9091.
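To show how the jar file and the host string might fit together, here is a rough sketch of a wrapper around jaydebeapi.connect. It is not the contents of mapd_jdbc.py. The driver class name com.mapd.jdbc.MapDDriver and the URL layout jdbc:mapd:<machine>:<port>:<dbname> are assumptions that this page does not confirm, and build_jdbc_url and the jar_dir parameter are hypothetical names.

```python
import os

# Assumptions, not confirmed by this page: driver class and URL layout.
DRIVER_CLASS = "com.mapd.jdbc.MapDDriver"
JAR_NAME = "mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar"


def build_jdbc_url(host, dbname):
    """Build a JDBC URL from a '<machine>:<port>' host string (assumed layout)."""
    return "jdbc:mapd:{}:{}".format(host, dbname)


def connect(dbname, user, host, password, jar_dir="."):
    """Open a connection through jaydebeapi using the MapD JDBC jar.

    jar_dir is a hypothetical parameter naming the directory that holds the jar.
    """
    import jaydebeapi  # imported lazily so build_jdbc_url works without it

    jar_path = os.path.join(jar_dir, JAR_NAME)
    return jaydebeapi.connect(
        DRIVER_CLASS,
        build_jdbc_url(host, dbname),
        {"user": user, "password": password},  # driver arguments
        jar_path,  # jar file containing the driver
    )
```

Keeping the URL construction in its own function makes the host format easy to check without a running server or a JVM.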
Example
The example code can be found in the $MAPD_PATH/SampleCode/mapd_jdbc_example.py file.
This example uses the mapd_jdbc wrapper to query MapD Core and plot the results using pyplot.
Sequence
The key steps are:
- Connect to the database:
mapd_con = mapd_jdbc.connect(dbname=dbname, user=user, host=host, password=password)
- Get a database cursor:
mapd_cursor = mapd_con.cursor()
- Query the database:
query = "select carrier_name, avg(depdelay) as x, avg(arrdelay) as y from flights_2008 group by carrier_name"
mapd_cursor.execute(query)
- Get the result set:
results = mapd_cursor.fetchall()
- Make the results a pandas DataFrame:
df = pandas.DataFrame(results)
- Generate a scatterplot from the results:
plt.scatter(df[1],df[2])
plt.show()
- Close the connection:
mapd_con.close()
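The steps above build a DataFrame whose columns are numbered 0, 1, 2, which is why the plot refers to df[1] and df[2]. As a possible refinement, the column names can be read from the cursor's description attribute, which DB-API 2.0 defines as a sequence of 7-item entries whose first item is the column name. The helpers column_names and fetch_dataframe below are hypothetical and not part of mapd_jdbc; whether the driver reports the aliases x and y exactly as written in the query is an assumption.

```python
def column_names(cursor):
    """Return the column names reported by a DB-API 2.0 cursor.

    Must be called after execute(), when cursor.description is populated.
    """
    return [col[0] for col in cursor.description]


def fetch_dataframe(cursor, query):
    """Execute a query and return the rows as a DataFrame with named columns."""
    import pandas  # imported lazily so column_names works without pandas

    cursor.execute(query)
    rows = cursor.fetchall()
    return pandas.DataFrame(rows, columns=column_names(cursor))
```

With named columns, the plotting step could read plt.scatter(df["x"], df["y"]), provided the driver returns the aliases under those names.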
Source Code
#!/usr/bin/env python
# Note: The following example should be run in the same directory as mapd_jdbc.py
# and mapdjdbc-1.0-SNAPSHOT-jar-with-dependencies.jar
import mapd_jdbc
import pandas
import matplotlib.pyplot as plt
dbname = 'mapd'
user = 'mapd'
host = 'localhost:9091'
password = 'HyperInteractive'
mapd_con = mapd_jdbc.connect(dbname=dbname, user=user, host=host, password=password)
mapd_cursor = mapd_con.cursor()
query = "select carrier_name, avg(depdelay) as x, avg(arrdelay) as y from flights_2008 group by carrier_name"
mapd_cursor.execute(query)
results = mapd_cursor.fetchall()
df = pandas.DataFrame(results)
plt.scatter(df[1],df[2])
plt.show()
# Close the connection at the end of the script
mapd_con.close()