We currently support Parquet, but Avro and ORC are also popular formats, and each brings properties to big data solutions that Parquet does not provide. For example:
- Avro is row-oriented, so it suits data that is read a whole row at a time. It also supports schema evolution. This makes it a good format for raw data, such as data in the landing zone, from which datasets in other formats are derived (for example, those in the query zone that users query frequently).
- ORC is similar in concept to Parquet, but it additionally supports transactional reads and writes, which makes it suitable for Hive Streaming use cases.
For more information, see: https://www.slideshare.net/StampedeCon/choosing-an-hdfs-data-storage-format-avro-vs-parquet-and-more-stampedecon-2015
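To make the distinction concrete, here is a minimal sketch in plain Python of a row-oriented layout versus a column-oriented layout, and of reading old records under a newer schema. It is a toy illustration only: it does not use the real Avro, ORC, or Parquet libraries, and the sample records, the `resolve` helper, and the `country` field are assumptions made up for this example.

```python
# Toy illustration only: plain Python structures, not the real
# Avro, ORC, or Parquet libraries. All names here are hypothetical.

records = [
    {"id": 1, "name": "ada"},
    {"id": 2, "name": "grace"},
]

# Row-oriented layout (the style Avro uses): each record is kept together,
# so fetching one whole row touches a single entry.
row_store = [tuple(r.values()) for r in records]

# Column-oriented layout (the style Parquet and ORC use): each field is kept
# together, so scanning one column does not read the others.
column_store = {field: [r[field] for r in records] for field in records[0]}


def resolve(record, reader_fields):
    """Read a record under a newer schema, filling absent fields with defaults.

    reader_fields maps field name -> default value. This is a hypothetical
    helper, loosely modelled on the idea of schema evolution with defaults.
    """
    return {name: record.get(name, default) for name, default in reader_fields.items()}


# The newer reader schema adds a "country" field with a default value,
# so records written before the change can still be read.
reader_fields = {"id": None, "name": None, "country": "unknown"}
evolved = [resolve(r, reader_fields) for r in records]
```

In this sketch, `row_store[0]` returns a full record in one lookup, while `column_store["name"]` returns every name without touching the `id` values. That trade-off is why a row-oriented format fits the landing zone and a columnar format fits the query zone.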
All Watson Data and related services should support these formats, for example:
- Watson Knowledge Catalog
- Data Refinery
- Cloud SQL Query
- Streams Designer
- Watson Studio