IBM Watson Data & AI - Structured Ideas

This Ideas portal is being closed. Please enter new ideas at http://ibm.biz/IBMAnalyticsIdeasPortal

Provide common Data Cleansing tools as part of the suite of data transformation services available

Content and data sets are acquired from many sources:

 - Public Domain data

 - Licensed content

 - Digital Exhaust

 - Client data

 - Companies' own internal data assets

Content engineers and Data Scientists run many one-off cleaning processes to prepare these data sets for their analytics and model training experiments. This request has two parts:

 

Common Tooling and Scripts provided as part of a Data Refinery toolkit

Content engineers and Data Scientists already rely on many common tools and processes in their own environments during this preparation. This request is to make tools such as virus and malware scanners, copyright scanners, PII (personally identifiable information) scanners, and PII pattern scanners available as part of the data transformation suite, so that they can be used in projects and workflows to clean data sets. With the vast array of data sources available today, it is critical that all content follows security and compliance best practices before it is added to a company's repositories, data lakes, and catalogs.
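To make the "PII pattern scanner" idea concrete, here is a minimal sketch in Python. It is only an illustration of the kind of tool being requested, not an existing Data Refinery feature. The names PII_PATTERNS, scan_text and redact are hypothetical, and the two regular expressions (email address and US SSN format) are deliberately simple assumptions that would miss many real-world cases.

```python
import re
from typing import List, NamedTuple


class Finding(NamedTuple):
    kind: str      # which pattern matched, e.g. "email"
    line_no: int   # 1-based line number in the scanned text
    match: str     # the matched text itself


# Illustrative patterns only (assumption): a production scanner would need
# many more rules plus validation to limit false positives and negatives.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> List[Finding]:
    """Report every pattern match, line by line."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PII_PATTERNS.items():
            for m in pattern.finditer(line):
                findings.append(Finding(kind, line_no, m.group(0)))
    return findings


def redact(text: str) -> str:
    """Replace every match with a placeholder such as [EMAIL]."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{kind.upper()}]", text)
    return text
```

A workflow step could call scan_text to flag a data set for review, or redact to mask matches before the data set is added to a catalog.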

 

User Specific Cleaning and Data Preparation tools 

In addition to "Common" tools and processes, every team has unique scripts and procedures specific to its own data formatting and data preparation requirements. This second part of the request is to let users bring the custom scripts or Docker containers that they have developed and hardened over years, and add them to projects and workflows. Making it easy to bring along the tools and processes that engineers and data scientists already use in their environments allows for faster, easier adoption.
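One way to picture "bring your own script" is a small registry in which user-written cleaning functions are registered by name and then chained in a workflow. The sketch below is a hypothetical illustration in Python, not a description of how Data Refinery works: register_step, run_workflow and the two sample steps are invented names, and it covers only in-process scripts, not Docker containers.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
CleaningStep = Callable[[Record], Record]

# Hypothetical registry of user-supplied cleaning steps, keyed by name.
_REGISTRY: Dict[str, CleaningStep] = {}


def register_step(name: str) -> Callable[[CleaningStep], CleaningStep]:
    """Decorator that makes a user function available to workflows."""
    def decorator(func: CleaningStep) -> CleaningStep:
        if name in _REGISTRY:
            raise ValueError(f"step already registered: {name}")
        _REGISTRY[name] = func
        return func
    return decorator


@register_step("strip_whitespace")
def strip_whitespace(record: Record) -> Record:
    # Trim leading and trailing whitespace from every field value.
    return {key: value.strip() for key, value in record.items()}


@register_step("lowercase_email")
def lowercase_email(record: Record) -> Record:
    # Normalize the "email" field, if present, without mutating the input.
    cleaned = dict(record)
    if "email" in cleaned:
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def run_workflow(records: Iterable[Record], step_names: List[str]) -> List[Record]:
    """Apply the named steps, in order, to every record."""
    steps = [_REGISTRY[name] for name in step_names]  # KeyError if unknown
    results = []
    for record in records:
        for step in steps:
            record = step(record)
        results.append(record)
    return results
```

With this shape, a team's existing script only needs a thin wrapper to become a named step that any project can reference.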

  • Gary Diamanti
  • Dec 14 2018
  • Needs review
Why is it useful?
Who would benefit from this IDEA? Content Administrator, Content Engineer, Data Scientist
How should it work?
Idea Priority: High
Priority Justification
Customer Name
Submitting Organization: Development
Submitter Tags