by Tirthajyoti Sarkar

How to set up PySpark for your Jupyter notebook

Apache Spark is one of the hottest frameworks in data science. It brings Big Data processing and machine learning together in a single framework. This is because:

  • Spark is fast (up to 100x faster than traditional Hadoop MapReduce on some workloads) because it keeps data in memory.
  • It offers robust, distributed, fault-tolerant data objects (called RDDs, short for Resilient Distributed Datasets).