IBM Watson Data & AI - Structured Ideas

This Ideas portal is being closed. Please enter new ideas at http://ibm.biz/IBMAnalyticsIdeasPortal

Support AVRO and ORC file formats

We currently support Parquet, but AVRO and ORC are also popular formats, and they offer properties for big data solutions that Parquet does not provide. For example:

  • AVRO is row-oriented, so it suits data that is read a whole row at a time. It also supports schema evolution. This makes it a good format for raw data, such as data in the landing zone, from which datasets in other formats are derived (for example, the datasets in the query zone that users query frequently).
  • ORC is similar in concept to Parquet, but it additionally supports transactional reads and writes, making it suitable for Hive Streaming use cases.
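
To make the two properties above concrete, here is a minimal sketch in plain Python. It does not use any Avro, ORC, or Parquet library, and it is not how those formats are implemented. The sample records, the `READER_FIELDS` mapping, and the function names are all hypothetical. It only illustrates (a) the difference between a row-oriented layout (AVRO-like) and a column-oriented layout (Parquet/ORC-like), and (b) the general idea behind schema evolution, where a reader with a newer schema fills in a default for a field that older data does not contain and ignores fields it no longer asks for.

```python
# Illustrative only: plain Python, no Avro/ORC/Parquet library involved.

# Hypothetical sample data, as it might have been written with an older schema.
records = [
    {"id": 1, "name": "a", "amount": 10.0},
    {"id": 2, "name": "b", "amount": 20.0},
]


def to_row_layout(rows):
    """Row-oriented (AVRO-like): all fields of one record are kept together."""
    return [tuple(r.values()) for r in rows]


def to_column_layout(rows):
    """Column-oriented (Parquet/ORC-like): all values of one field are kept together."""
    return {key: [r[key] for r in rows] for key in rows[0]}


# Sentinel meaning "this reader field has no default" (an assumption of this sketch).
_NO_DEFAULT = object()

# Hypothetical newer reader schema: it drops "amount" and adds "currency"
# with a default value.
READER_FIELDS = {"id": _NO_DEFAULT, "name": _NO_DEFAULT, "currency": "USD"}


def resolve(written, reader_fields):
    """Read one old record through a newer reader schema.

    Fields the reader does not list are ignored; fields missing from the
    written record take the reader's default, or raise if there is none.
    """
    out = {}
    for name, default in reader_fields.items():
        if name in written:
            out[name] = written[name]
        elif default is not _NO_DEFAULT:
            out[name] = default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return out


if __name__ == "__main__":
    print(to_row_layout(records))
    print(to_column_layout(records))
    print([resolve(r, READER_FIELDS) for r in records])
```

Reading a whole record is a single lookup in the row layout, while scanning one field across all records is a single lookup in the column layout, which is the trade-off the bullets above describe.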

For more information, see: https://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-stampedecon-2015

All Watson Data and related services should support these formats, e.g.

  • Watson Knowledge Catalog
  • Data Refinery
  • Cloud SQL Query
  • Streams Designer
  • Watson Studio

Chris Snow
Dec 14 2018
Needs review