Published on 2018-06-27 17:36 (LinkedIn: sethsanu)
Depending on who you ask, Big Data has three or more “V” attributes: Volume, Velocity and Variety (plus Veracity, Value, Vicinity and any other pertinent V noun). Enterprise data governance programs are set up to ensure the quality, safekeeping and optimal utilization of data. The principles of good data governance can be mapped to each “V” of big data:
· Volume: With larger amounts of structured and unstructured data in use, a data governance program should maintain a good inventory of both data and metadata. Dynamic data catalogs, data dictionaries and data profiles are becoming as commonplace in large-scale data lakes as they are in relational and analytical platforms.
· Velocity: As organizations ingest fast-moving streams of data from IoT devices, social media and globally distributed applications, it’s important to manage data in motion pragmatically. Some consuming systems may need real-time streams of data, while others may only require daily aggregations. Understanding how each application consumes data has value; do you know where you need a lambda architecture?
· Variety: Today’s schema may not be tomorrow’s. The format of unstructured data may change, and structured data platforms may migrate. Tools are available that can swiftly scan data for changes in structure and content and then alert or adapt downstream data consumers. Well-designed consuming systems are responsive and resilient to changing data, either through late-binding schema designs or because they are written for adaptability. For example, a dashboard that consumes relational SQL data can be updated to run PolyBase queries against a data lake with little code churn.
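To make the Variety point concrete, here is a minimal sketch of the kind of structural scan described above. It is not any particular tool's method. The function names (infer_schema, diff_schemas) and the sample IoT records are hypothetical, and a real scanner would also handle nested fields, nulls and sampling.

```python
from typing import Any, Dict, List, Set


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each field name to the sorted, '|'-joined type names seen for it."""
    seen: Dict[str, Set[str]] = {}
    for record in records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: "|".join(sorted(types)) for field, types in seen.items()}


def diff_schemas(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report fields that were added, removed, or changed type between two schemas."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(f for f in set(old) & set(new) if old[f] != new[f]),
    }


# Hypothetical sample batches: the producer started sending temp as a string
# and added a unit field.
yesterday = [{"device_id": "a1", "temp": 21.5}, {"device_id": "b2", "temp": 19.0}]
today = [{"device_id": "a1", "temp": "21.5", "unit": "C"}]

drift = diff_schemas(infer_schema(yesterday), infer_schema(today))
print(drift)  # {'added': ['unit'], 'removed': [], 'changed': ['temp']}
```

A governance process could run a check like this on each ingested batch and notify downstream consumers whenever any of the three lists is non-empty.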
There are additional governance considerations per industry, region and technology set. For a more comprehensive perspective, I highly recommend Sunil Soares's book "Big Data Governance: An Emerging Imperative" available from Amazon: