My clients have requested a data classification field for data assets whose domain of values is configurable. For example, a customer's administrator could define this field with four classifications (Public, Confidential, Highly Confidential and Restricted) that correspond to their corporate standards. Other customers may have five or six different classifications.

It would be valuable if the customer could also specify the business rules or machine learning algorithms that correspond to each classification. For instance, they could state that the Restricted classification contains PII data, so that automatic rules are applied to look for PII. This automatic classification would be applied when the data asset is first created, and a user with the appropriate privileges could then change it.

Other predefined rules could include searches for dollar values (the asset might be a bill of materials and therefore Highly Confidential). Future predefined rules could look for location information such as IP addresses. It would also be desirable to support custom rules written in R or Python: the administrator would define a rule, attach their own code to it, and then select that rule as one of the auto-classification options for a specific data classification.

In addition to auto-classifying the data classification field, the rules could automatically add metadata to the description field, which would help with searching. Consider it auto-tagging or metadata generation.
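To make the request more concrete, here is a minimal Python sketch of how configurable classifications and pluggable rules might fit together. Everything in it is an assumption for illustration only: the names `ClassificationScheme`, `regex_rule`, `add_rule` and `classify` are hypothetical and do not refer to any existing product API, the regular expressions are deliberately simplistic stand-ins for real PII, dollar-value and IP detection, and the choice to fall back to the least restrictive level when nothing matches is just one possible default.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A rule inspects the asset's text and returns a metadata tag on a match, else None.
# Custom administrator-written rules would simply be functions with this signature.
Rule = Callable[[str], Optional[str]]


def regex_rule(tag: str, pattern: str) -> Rule:
    """Build a rule that emits `tag` when `pattern` is found (hypothetical helper)."""
    compiled = re.compile(pattern)

    def rule(text: str) -> Optional[str]:
        return tag if compiled.search(text) else None

    return rule


# Illustrative patterns only; real detection would need to be far more robust.
PII_SSN = regex_rule("pii", r"\b\d{3}-\d{2}-\d{4}\b")
DOLLAR_VALUE = regex_rule("dollar-value", r"\$\s?\d[\d,]*(?:\.\d{2})?")
IP_ADDRESS = regex_rule("ip-address", r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


@dataclass
class ClassificationScheme:
    # Levels as configured by the administrator, least to most restrictive.
    levels: List[str]
    rules: Dict[str, List[Rule]] = field(default_factory=dict)

    def add_rule(self, level: str, rule: Rule) -> None:
        if level not in self.levels:
            raise ValueError(f"unknown classification: {level}")
        self.rules.setdefault(level, []).append(rule)

    def classify(self, text: str) -> Tuple[str, List[str]]:
        """Return the most restrictive level with a matching rule, plus all tags."""
        chosen = self.levels[0]  # assumed default when no rule matches
        tags: List[str] = []
        for level in self.levels:  # ascending order, so later matches win
            for rule in self.rules.get(level, []):
                tag = rule(text)
                if tag is not None:
                    chosen = level
                    if tag not in tags:
                        tags.append(tag)
        return chosen, tags


# One customer's four-level scheme, with a rule attached to each non-public level.
scheme = ClassificationScheme(
    ["Public", "Confidential", "Highly Confidential", "Restricted"]
)
scheme.add_rule("Restricted", PII_SSN)
scheme.add_rule("Highly Confidential", DOLLAR_VALUE)
scheme.add_rule("Confidential", IP_ADDRESS)

level, tags = scheme.classify("Unit cost: $1,250.00 per assembly")
print(level, tags)
```

In this sketch the returned tags are what could be appended to the description field for search, and the returned level would only be the initial value, which a privileged user could later override.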
Why is it useful?
|Who would benefit from this IDEA?||Data Stewards, Business Analysts and Data Scientists|
How should it work?
|Submitting Organization||F2F Sales|