IBM Watson Data & AI - Structured Ideas

This Ideas portal is being closed. Please enter new ideas at http://ibm.biz/IBMAnalyticsIdeasPortal

Support discovering Hadoop-generated nested and partitioned data files as a single logical file

It is common for data generated in the WDP to be partitioned to support scalable reads. At the moment, each file in the folder is catalogued as a separate asset; the desired behaviour is for the folder to be recognised as a single logical file.

E.g.

/data/customer/
├── _SUCCESS
├── id=abc
│ ├── key=1
│ │ └── part-00003-12345-44cf-4091-9968-11111.c000.snappy.parquet
│ ├── key=2
│ │ └── part-00004-12345-44cf-4091-9968-11111.c000.snappy.parquet
│ └── key=3
│   └── part-00005-12345-44cf-4091-9968-11111.c000.snappy.parquet
└── id=def
  ├── key=1
  │ └── part-00000-12345-44cf-4091-9968-22222.c000.snappy.parquet

I would just like to see this catalogued as a single file:

/customer
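
To make the request concrete, here is a minimal sketch of how a discovery step might collapse such a listing into one logical asset. It is not the catalogue's actual implementation: the function name `group_into_logical_files` is hypothetical, and it assumes Hive-style `name=value` partition directories and that marker files starting with `_` or `.` (such as `_SUCCESS`) can be ignored. It keeps the full root path as the asset key; how the catalogue names the asset is left open.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_into_logical_files(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths by the directory above their first partition folder.

    Hypothetical helper: assumes Hive-style `name=value` partition directories.
    Files that sit under no partition directory stay as individual assets.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        p = PurePosixPath(path)
        # Assumption: skip marker/hidden files such as _SUCCESS.
        if p.name.startswith(("_", ".")):
            continue
        dirs = p.parts[:-1]  # directory components, without the file name
        # Index of the first directory that looks like a partition (name=value).
        first = next((i for i, d in enumerate(dirs) if "=" in d), None)
        if first is None:
            root = str(p)  # not partitioned: the file is its own asset
        else:
            root = str(PurePosixPath(*dirs[:first]))
        groups[root].append(path)
    return dict(groups)


example_paths = [
    "/data/customer/_SUCCESS",
    "/data/customer/id=abc/key=1/part-00003-12345-44cf-4091-9968-11111.c000.snappy.parquet",
    "/data/customer/id=abc/key=2/part-00004-12345-44cf-4091-9968-11111.c000.snappy.parquet",
    "/data/customer/id=abc/key=3/part-00005-12345-44cf-4091-9968-11111.c000.snappy.parquet",
    "/data/customer/id=def/key=1/part-00000-12345-44cf-4091-9968-22222.c000.snappy.parquet",
]

for root, files in group_into_logical_files(example_paths).items():
    print(root, len(files))
```

With the listing above, all four Parquet part files fall under the single root `/data/customer`, which is the one asset this idea asks to see in the catalogue.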
  • Chris Snow
  • Dec 14 2018
  • Accepted
Who would benefit from this IDEA? Data engineer
Idea Priority: High
Submitting Organization: F2F Sales