IBM Cloud Databases - Structured Ideas

This Ideas portal is being closed. Please enter new ideas at http://ibm.biz/IBMAnalyticsIdeasPortal

Add support for PyArrow in the Spark environment in Watson Studio

  • charles
  • Apr 5 2019
  • Needs review
Why is it useful?
Who would benefit from this IDEA?
How should it work?

"Apache Arrow is an in-memory columnar data format that is used in Spark to efficiently transfer data between JVM and Python processes. This currently is most beneficial to Python users that work with Pandas/NumPy data. Its usage is not automatic and might require some minor changes to configuration or code to take full advantage and ensure compatibility. This guide will give a high-level description of how to use Arrow in Spark and highlight any differences when working with Arrow-enabled data."

PyArrow is documented in the Spark 2.3.0 SQL programming guide:

https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#ensure-pyarrow-installed
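To illustrate the "minor changes to configuration" the quoted guide mentions, here is a minimal sketch of how a notebook could turn Arrow on once PyArrow is installed in the environment. The config key `spark.sql.execution.arrow.enabled` is the one used by Spark 2.3 (Spark 3.0 and later renamed it to `spark.sql.execution.arrow.pyspark.enabled`). The helper names `pyarrow_available`, `arrow_conf`, and `enable_arrow` are hypothetical, and the sketch assumes an existing SparkSession is passed in.

```python
import importlib.util

# Config key from the Spark 2.3 SQL programming guide; Arrow is off by default.
ARROW_CONF_KEY = "spark.sql.execution.arrow.enabled"


def pyarrow_available():
    """Return True if the pyarrow package can be found in this environment."""
    return importlib.util.find_spec("pyarrow") is not None


def arrow_conf():
    """Build the Spark setting, enabling Arrow only when pyarrow is installed."""
    return {ARROW_CONF_KEY: "true" if pyarrow_available() else "false"}


def enable_arrow(spark):
    """Apply the setting to an existing SparkSession (hypothetical helper)."""
    for key, value in arrow_conf().items():
        spark.conf.set(key, value)
    return spark
```

With a session named `spark`, calling `enable_arrow(spark)` before `df.toPandas()` or `spark.createDataFrame(pandas_df)` would let those conversions use Arrow when PyArrow is present, and leave the default path in place when it is not.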

Idea Priority: Medium