IBM Watson Data & AI - Structured Ideas

Welcome to the idea forum for IBM Watson Data & AI — our team welcomes any feedback, requests, and suggestions you have for improving our products! 

This forum allows us to connect your product improvement ideas with IBM product and engineering teams. 

Provide Metadata and Meta-Analysis Catalog Management

Provide a metadata and meta-analysis catalog, with associated management and collaboration capabilities, to enable the description of data elements with respect to both inherent structure (e.g. domain, range of numeric fields) and relative structure (e.g. text fields representing enumerated set members: dog, cat, other), as well as consumption, modification, and version control across semantic release levels (n.b. see

A simple use case is a column in a table containing text in which the contents represent an enumerated set (e.g. red, green, blue). The UX would include the following major steps:

PART A: Building the Catalog

1) Select a column displayed in the user interface (e.g. "Color (Type: String)") and inspect the range of "String" (e.g. count, unique, most common, least common, unspecified); see csvkit's csvstat(1) command for example output

2) Find in a faceted catalog a defined entity (e.g. "Color.Pantone.XXX") or create a new entity (e.g. "color") that represents the enumerated type; n.b. matching range (e.g. { red, green, blue, ... })

3) Assign the selected enumerated type to the column (n.b. enables additional type-specific functions in future)

3a) Identify aberrant data and cleanse

3b) Optimize encoding of enumerated set, e.g. bit-wise encoded (n.b. lazy evaluation)

4) Track provenance, control versions appropriately, and repeat (w/ community, including open/shared entries in the catalog for common entities and third-party entries for industry-specific ones)
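
Steps 1 through 3b can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CATALOG dictionary, the function names, and the sample column are all hypothetical and do not correspond to any existing IBM API, and the encoding uses plain integer codes (with the minimum bit width reported separately) rather than a real bit-packed layout.

```python
from collections import Counter

# Hypothetical catalog: entity name -> ordered members of the enumerated set.
CATALOG = {
    "color": ["red", "green", "blue"],
    "pet": ["dog", "cat", "other"],
}

def profile(values):
    """Step 1: summarize a string column, roughly in the spirit of csvstat."""
    present = [v for v in values if v not in (None, "")]
    ranked = Counter(present).most_common()
    return {
        "count": len(values),
        "unique": len(ranked),
        "most_common": ranked[0][0] if ranked else None,
        "unspecified": len(values) - len(present),
    }

def best_entity(values, catalog=CATALOG):
    """Step 2: pick the catalog entity whose members cover the most values."""
    present = [v for v in values if v]
    def coverage(name):
        members = set(catalog[name])
        return sum(v in members for v in present)
    return max(catalog, key=coverage)

def aberrant(values, members):
    """Step 3a: present values that are not members of the enumerated set."""
    allowed = set(members)
    return sorted({v for v in values if v and v not in allowed})

def encode(values, members):
    """Step 3b: map each member to a small integer code; None for anything else."""
    code = {m: i for i, m in enumerate(members)}
    return [code.get(v) for v in values]

def bits_needed(members):
    """Minimum number of bits that can distinguish all members."""
    return max(1, (len(members) - 1).bit_length())

column = ["red", "green", "red", "", "blue", "rde", "red"]
stats = profile(column)
entity = best_entity(column)
bad = aberrant(column, CATALOG[entity])
codes = encode(column, CATALOG[entity])
width = bits_needed(CATALOG[entity])
```

In this sample the misspelled "rde" and the empty cell are the entries a cleansing step would need to resolve before the column can be fully encoded.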

PART B: Using the Catalog

5) Quickly identify information of interest via full-text as well as faceted search (n.b. as in Amazon shopping)

6) Understand provenance, semantics, domain, range, etc. and availability of information identified

7) Find and utilize information-associated methods and apparatus to access, transform, analyze/train, visualize, and inspect (e.g. Jupyter notebook requiring Parquet in COS of inputs { X, Y, Z } with X, Y, Z being defined in the Catalog or SystemML script).

8) Automatically generate a plan to consume from available information and present it as either a "cheap" or a "quick" option (maybe some intervals in between, depending on the plans available, ...)
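
One way to read step 8 is as a choice among candidate plans that trade cost against time. The sketch below is an assumption about how that could work, not a description of any existing planner: the Plan fields, the plan names, and the numbers are invented for illustration. It drops any plan that another plan beats on both cost and time, then presents what remains from "cheap" to "quick".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    cost: float     # hypothetical cost units
    minutes: float  # hypothetical estimated run time

def options(plans):
    """Keep plans not dominated on both cost and time, sorted cheapest first."""
    keep = [
        p for p in plans
        if not any(
            q.cost <= p.cost and q.minutes <= p.minutes
            and (q.cost < p.cost or q.minutes < p.minutes)
            for q in plans
        )
    ]
    return sorted(keep, key=lambda p: p.cost)

plans = [
    Plan("scan CSV in place", cost=1.0, minutes=60.0),
    Plan("convert to Parquet first", cost=3.0, minutes=20.0),
    Plan("large cluster", cost=10.0, minutes=5.0),
    Plan("costly and slow", cost=12.0, minutes=30.0),
]

ranked = options(plans)
cheap, quick = ranked[0], ranked[-1]  # the two ends of the trade-off
in_between = ranked[1:-1]             # the intermediate options
```

Here "costly and slow" is discarded because "convert to Parquet first" is both cheaper and faster, which leaves one intermediate option between the two extremes.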

  • Guest
  • Dec 14 2018
  • Needs review
  • Guest commented
    December 14, 2018 17:58

    There are a lot of parts in there, but the basics start with breaking down my Strings into enumerated sets; everything else just cascades from there.