How to Set Up IPython Notebook with Spark 1.5 in a Minute

by Shahid Ashraf

IPython Notebook provides a browser-based notebook with support for code, text, mathematical expressions, inline plots and other media, as well as support for interactive data visualization. This tool gives users the ability to create rich documents with embedded source code with very little effort. In 2014, Fernando Perez announced a spin-off project from IPython called Project Jupyter. IPython will continue to exist as a Python shell and a kernel for Jupyter, while the notebook and other language-agnostic parts of IPython move under Jupyter. Jupyter added support for Julia, R, Haskell and Ruby.

In this post, we will see how to install IPython Notebook and quickly start using IPython notebooks with PySpark. (You can read more about achieving this, here and here.) However, those approaches did not work on Spark versions greater than 1.4.0 with my required configuration.

Installing IPython

pip install ipython

If you are unable to install it with this command, see the official IPython installation guide.

Set Up Spark

After downloading Spark, set SPARK_HOME to the Spark installation path by adding a line like the following to your ~/.zshrc or ~/.bash_profile (note that `export` must be lowercase):

export SPARK_HOME=/Users/shahid/projects/spark-1.5.1-bin-hadoop2.6

Next, install findspark, which locates the Spark installation at runtime and adds PySpark to the Python path:

pip install findspark
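If editing a shell profile is inconvenient, SPARK_HOME can also be set from inside Python before findspark is initialized. A minimal sketch, assuming the example Spark path used in this post (adjust it for your machine):

```python
import os

# Point SPARK_HOME at the Spark install; the path below is the example
# location from this post and must match your actual installation.
os.environ.setdefault("SPARK_HOME", "/Users/shahid/projects/spark-1.5.1-bin-hadoop2.6")

import findspark
findspark.init()  # adds pyspark to sys.path based on SPARK_HOME
```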

Start the IPython Notebook server with the following command:

ipython notebook

The main page will open in the browser:


Create a new notebook and add the following in the first cell (note the call to findspark.init(), which must run before pyspark is imported):

import findspark
findspark.init()

import pyspark
sc = pyspark.SparkContext(appName="first spark based notebook")

print(sc)

If everything is successful, it will print the SparkContext object.

This is how we can quickly start using Apache Spark in IPython notebooks without messy configuration.
