IBM Watson Data & AI - Structured Ideas


Support AVRO and ORC file formats

We currently support Parquet, but AVRO and ORC are also popular formats, and each brings properties to big data solutions that Parquet does not provide. For example:

  • AVRO is row-oriented, so it suits data that is read a whole row at a time. It also supports schema evolution. This makes it a good format for raw data, such as data in the landing zone, from which other datasets are derived in other formats (for example, the datasets in the query zone that users query frequently).
  • ORC is similar in concept to Parquet, but it additionally supports transactional reads and writes, making it suitable for Hive Streaming use cases.
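
To make the schema evolution point concrete, here is a minimal sketch in plain Python. It does not use the Avro library; it only imitates the resolution rule where a reader schema fills in a default for a field the writer did not know about and ignores fields it no longer declares. The names `READER_SCHEMA`, `resolve`, and the example fields are hypothetical and chosen only for illustration.

```python
# Simplified illustration of Avro-style schema resolution.
# This is NOT the Avro library API; READER_SCHEMA, resolve(), and the
# field names below are hypothetical.

# A newer reader schema that added a "source" field with a default value.
READER_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": "string", "default": "unknown"},
    ],
}


def resolve(record, reader_schema):
    """Project a record written with an older schema onto the reader schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            # The writer supplied this field: keep its value.
            resolved[name] = record[name]
        elif "default" in field:
            # The writer predates this field: fall back to the default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Fields present in the record but absent from the reader schema are dropped.
    return resolved


# A record from the landing zone, written before "source" existed.
old_record = {"id": 1, "legacy_flag": True}
new_record = resolve(old_record, READER_SCHEMA)
```

With this rule, older files in the landing zone stay readable after the schema gains a field, which is the property that makes a format like this attractive for raw data.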

For more information, see: https://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-stampedecon-2015

All Watson Data and related services should support these formats, for example:

  • Watson Knowledge Catalog
  • Data Refinery
  • Cloud SQL Query
  • Streams Designer
  • Watson Studio

Submitted by Chris Snow on Jun 10 2018
Status: Needs review