Data lineage includes the data origin, what happens to it, and where it moves over time. Data lineage gives visibility into a data analytics process and simplifies tracing errors back to their root cause. It also enables replaying specific portions or inputs of the data flow for step-wise debugging or for regenerating lost output. Database systems use such information, called data provenance, to address similar validation and debugging challenges. Data provenance refers to records of the inputs, entities, systems, and processes that influence data of interest, providing a historical record of the data and its origins. The generated evidence supports forensic activities such as data-dependency analysis, error/compromise detection and recovery, auditing, and compliance analysis. "Lineage is a simple type of why provenance."

Data lineage can be represented visually to discover the data flow or movement from its source to its destination via various changes and hops on its way through the enterprise environment: how the data gets transformed along the way, how the representation and parameters change, and how the data splits or converges after each hop. A simple representation of data lineage can be shown with dots and lines, where each dot represents a data container for data points and each line connecting the dots represents the transformations a data point undergoes between the data containers.

Representation broadly depends on the scope of the metadata management and the reference point of interest. From the reference point, backward data lineage provides the sources of the data and the intermediate data flow hops, while forward data lineage leads to the final destination's data points and its intermediate data flows. These views can be combined into end-to-end lineage for a reference point, which provides a complete audit trail of that data point of interest from its sources to its final destinations. As the data points or hops increase, the representation becomes too complex to comprehend. A useful feature of a data lineage view is therefore the ability to simplify the view by temporarily masking unwanted peripheral data points. Tools that have a masking feature let the view scale and make analysis easier for both technical and business users. Data lineage also enables companies to trace the sources of specific business data for the purposes of tracking errors, implementing changes in processes, and implementing system migrations, which can save significant time and resources and improve BI efficiency.

The scope of the data lineage determines the volume of metadata required to represent it. Usually, data governance and data management determine the scope of the data lineage based on regulations, the enterprise data management strategy, data impact, reporting attributes, and the critical data elements of the organization.

Data lineage provides the audit trail of the data points at the highest granular level, but the lineage may be presented at various zoom levels to simplify the vast information, similar to analytic web maps. Data lineage can be visualized at various levels based on the granularity of the view. At a very high level, data lineage shows which systems the data interacts with before it reaches its destination. As the granularity increases, it goes up to the data point level, where it can provide the details of the data point and its historical behavior, attribute properties, and trends, as well as the data quality of the data passed through that specific data point in the data lineage.

Data governance plays a key role in metadata management for guidelines, strategies, policies, and implementation. Data quality and master data management help enrich the data lineage with more business value. Although the final representation of data lineage is provided in one interface, the way the metadata is harvested and exposed to the data lineage graphical user interface could be entirely different. Thus, data lineage can be broadly divided into three categories based on the way metadata is harvested: data lineage involving software packages for structured data, programming languages, and big data.

Data lineage information includes technical metadata involving data transformations.
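The dots-and-lines representation, the backward and forward lineage from a reference point, and the masking of peripheral data points described earlier can be illustrated with a small graph. The following is a minimal sketch, not the approach of any particular lineage tool: the container names (`crm_db`, `staging`, `warehouse`, and so on), the transformation labels, and the functions `lineage` and `masked_view` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source container, target container, transformation).
# Each container is a "dot"; each edge is a "line" between two dots.
EDGES = [
    ("crm_db", "staging", "extract"),
    ("erp_db", "staging", "extract"),
    ("staging", "warehouse", "clean_and_join"),
    ("warehouse", "sales_report", "aggregate"),
    ("warehouse", "ml_features", "derive"),
    ("web_logs", "clickstream", "parse"),  # unrelated to the warehouse flow
]


def build_adjacency(edges):
    """Index the edges in both directions for forward and backward traversal."""
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst, _transformation in edges:
        forward[src].add(dst)
        backward[dst].add(src)
    return forward, backward


def reachable(start, adjacency):
    """Breadth-first search: every container reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lineage(reference, edges):
    """Backward lineage (sources) and forward lineage (destinations)."""
    forward, backward = build_adjacency(edges)
    return {
        "backward": reachable(reference, backward),
        "forward": reachable(reference, forward),
    }


def masked_view(reference, edges):
    """Hide every edge that is not on the reference point's end-to-end lineage."""
    result = lineage(reference, edges)
    keep = result["backward"] | result["forward"] | {reference}
    return [edge for edge in edges if edge[0] in keep and edge[1] in keep]


if __name__ == "__main__":
    result = lineage("warehouse", EDGES)
    print("backward:", sorted(result["backward"]))
    print("forward:", sorted(result["forward"]))
    for src, dst, transformation in masked_view("warehouse", EDGES):
        print(f"{src} --{transformation}--> {dst}")
```

With `warehouse` as the reference point, the masked view drops the `web_logs` to `clickstream` edge because neither container lies on the warehouse's end-to-end lineage. A real tool would more likely let the user choose which peripheral points to hide; filtering by reachability is just one simple way to approximate that.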