Why Data Lineage Only Tells Part of the Story
Data lineage is having its moment in the sun, but provenance tells the whole story.
Why Provenance Offers Real Depth Beyond Data Lineage in Information & Data Management
A new term has captured the corporate imagination: data lineage. Heralded as the solution for data trust and transparency, it promises to track data from its source to its destination, illuminating every transformation along the way. Yet as data lineage enjoys its moment in the sun, it overshadows provenance, which captures the real depth and the full story of data and information management.
In our rush to adopt this technical solution, we risk overlooking a much older, deeper, and more powerful concept from which it partially derives: provenance. As an information scientist with one foot in the world of archives and the other in modern data strategy, I argue that while data lineage provides a pathway, provenance tells the full story of the journey. In the complex world of information, the story is what truly matters for establishing trust, meaning, and long-term value. To understand why, we must first delineate the two concepts.
Defining the Terms: The Path vs. The Story
Data Lineage is fundamentally a technical concept. It is concerned with the lifecycle of a data point within a system or series of systems. It answers questions like:
Where did this specific data element originate? (e.g., a specific database, a sensor feed, a user input form).
What transformations, calculations, or aggregations were applied to it? (e.g., SUM(), JOIN, an ETL script).
Which systems has it passed through on its way to this report or dashboard?
Think of data lineage as a parcel's tracking history. You can see it left the warehouse, was processed in a sorting facility, and was loaded onto a delivery truck. It’s an operational record of movement and processing.
Provenance, on the other hand, is a foundational principle of archival and information science. It refers to the origin, custody, and history of a record or collection of records. It is concerned with the entire context of information, not just the data points. Provenance answers much deeper, more consequential questions:
Who created this information, and what was their authority to do so?
Why was it created? What business function, process, or purpose did it serve?
When and where was it created and used?
How has it been managed, maintained, and preserved over time? Who has had custody of it, and was that chain of custody secure?
What is its relationship to other records created by the same entity?
If data lineage is the parcel's tracking history, provenance is the full story: who sent the parcel and why, what the contents mean, why they were chosen, and the guarantee that the parcel wasn't tampered with from the moment it was sealed.
Key Differences
1. Provenance is Wider: Beyond Structured Data
Data lineage is most at home in the world of structured databases and data warehouses, where it excels at tracking a customer ID through various tables and ETL jobs. Provenance, however, applies to all information assets as an integrated whole. The provenance of a signed contract, an engineering blueprint, a board meeting's minutes, or a critical email chain is essential to its legal and business value. These unstructured and semi-structured records, which constitute the majority of an organisation's knowledge base, fall outside the typical scope of data lineage tools but are the primary concern of a provenance-based framework. Structured data, moreover, derives much of its meaning, authority, and context from these related records. We must therefore focus on the value of the records as a whole, rather than on the individual formats or systems in which they happen to have been captured.
2. Provenance is Richer: The Power of Context
Data lineage can tell you that a sales figure was aggregated from five regional databases. This is useful. Provenance, however, can tell you why. It connects that sales figure to the quarterly reporting mandate issued by the CFO, clarifies that it was compiled by the official sales operations team following established procedures, and authenticates it as the official record of performance for that period.
This context is the difference between data and information. Without it, we risk misinterpretation. Was a sudden spike in a dataset due to a genuine business event or to a data entry error during a system migration? Lineage might show the migration, but provenance, by documenting the purpose and authority of the process, provides the evidence needed to make an informed judgement in the full business context.
3. Provenance is Deeper: The Bedrock of Trust
Trust in information is not just about correct calculations. It’s about authenticity and reliability. The archival concept of the chain of custody, a core component of provenance, is critical here. It provides an audit trail of who has held and managed the information over time, ensuring it has not been inappropriately altered or tampered with. This is the standard required for legal evidence, regulatory compliance (like FDA 21 CFR Part 11), and long-term historical records. Data lineage can verify a transformation rule was applied correctly, but provenance can verify that the rule was applied by an authorised person and that the record has been maintained with integrity ever since.
4. Provenance is Far-Reaching: Ensuring Future Value
Data lineage is often implemented to solve a present-day operational problem, such as debugging a faulty report. Provenance is built for the long term. It ensures that information remains intelligible, usable, and trustworthy long after the original systems, software, and personnel are gone. By preserving the context of creation and use, we enable future generations (or future AI algorithms) to understand not just what a number is, but what it meant.
Conclusion: We Need Both, But Led by Provenance
This is not an argument to abandon data lineage. On the contrary, automated data lineage tools are a solid mechanism for capturing one specific, technical facet of an information asset’s history. They are a component of a modern provenance strategy.
The danger lies in believing that data lineage alone is sufficient. To treat it as the whole story is to manage a library by only tracking the movement of books between shelves, without knowing who wrote them, what they are about, or why they are arranged in a particular order.
For true, resilient, and trust-based information management, we must elevate our thinking. We must build frameworks where technical data lineage feeds into a broader, richer record of provenance. We must move beyond the operational trail and embrace the full story of our information. Only then can we truly call ourselves data-driven and knowledge-powered organisations.
A Deeper Dive: Why Data Lineage is Only the Tip of the Iceberg of Provenance
Introduction
In the modern, data-driven landscape, "data lineage" has emerged as the dominant paradigm for tracking the complex journey of data through vast corporate ecosystems.1 It is an indispensable tool for navigating the intricate data pipelines of the 21st century, offering visibility into the mechanics of information flow. Yet this focus on movement often obscures a more fundamental question: is the data itself, and the context it comes from, trustworthy? While lineage can map a path, it struggles to validate the pedigree, authenticity, and contextual meaning of the information it tracks. This exposes a critical gap between traceability and trustworthiness.
I would argue that data lineage, while a useful technological practice, represents a narrow subset of the much richer, more profound, and legally critical concept of archival provenance.3 For centuries, the discipline of archival science has developed a robust framework for ensuring the integrity and evidential value of records. A short-sighted, neologistic focus on data lineage risks creating information systems that are traceable but not trustworthy, auditable in flow but not in substance. To build truly robust and accountable information ecosystems, data governance professionals must look to the time-tested foundation of archival science. This analysis will deconstruct both concepts, perform a direct comparison, explore the multifaceted richness of provenance through its core tenets of context, authenticity, and integrity, and ground these principles in real-world scenarios. My objective is to demonstrate that provenance offers a deeper, more holistic framework for establishing the very quality that modern organisations seek: confidence in their data, and the context that turns it into actionable information.
Deconstructing Data Lineage: Mapping the Technical Journey
Data lineage is fundamentally the process of tracking how data moves and changes as it traverses an organisation's various systems, processes, and transformations.1 It provides a lifecycle view, explaining how data is obtained, modified, and used from its source to its final destination.2 Its primary function is to create a dynamic, operational map of the data supply chain.
The Role of Metadata
The mechanism behind data lineage is the collection and management of metadata, often (over-)simplified as "data about data", at each stage of the information journey.2 This metadata documents the inputs, outputs, transformations, and system transitions that affect data throughout its lifecycle.6 Data lineage tools aggregate this metadata into a repository, allowing users to visualise and analyse the complete flow, often represented through diagrams (data workflow maps) that show dependencies and connections across the data ecosystem.5
Primary Functions and Use Cases
The value of data lineage is primarily operational and diagnostic. Its most common applications include:
Root Cause Analysis: When data errors or quality issues arise, lineage provides a clear audit trail that allows data engineers to trace the problem back to its source. This capability is invaluable for debugging and expedites resolution.2
Impact Analysis: Before making changes to a data source, system, or transformation process, organisations can use lineage to identify all downstream systems, reports, and applications that will be affected. This foresight helps prevent unintended consequences and system failures.2
System Migration and Modernization: Lineage offers a detailed map of data dependencies, which is critical when planning complex initiatives like migrating data to a cloud warehouse or modernising legacy systems. It clarifies relationships between data objects and expedites the transition.2
Regulatory Compliance: Data lineage is frequently used to support compliance with regulations such as the General Data Protection Regulation (GDPR), by providing visibility into how sensitive data is handled, processed, and stored across the organisation.6
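Root cause and impact analysis, as listed above, both come down to walking a dependency graph built from captured lineage metadata: upstream to find where an error could have entered, downstream to find what a change would touch. The sketch below is a minimal illustration, assuming lineage has already been harvested into simple (source, target, transformation) edges; the `LineageGraph` class and all dataset names are hypothetical and do not reflect any particular lineage tool's API.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage store: directed edges from input dataset to output dataset."""

    def __init__(self):
        self.downstream = defaultdict(set)  # node -> datasets built from it
        self.upstream = defaultdict(set)    # node -> datasets it was built from
        self.transforms = {}                # (source, target) -> transformation metadata

    def add_edge(self, source, target, transformation):
        self.downstream[source].add(target)
        self.upstream[target].add(source)
        self.transforms[(source, target)] = transformation

    def _walk(self, start, neighbours):
        # Breadth-first traversal returning every node reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def root_cause_candidates(self, node):
        """Everything upstream of a faulty node: where to look for the error."""
        return self._walk(node, self.upstream)

    def impact_of_change(self, node):
        """Everything downstream of a node: what a change would affect."""
        return self._walk(node, self.downstream)


g = LineageGraph()
g.add_edge("crm.customers", "staging.customers", "ETL: dedupe")
g.add_edge("erp.orders", "staging.orders", "ETL: currency convert")
g.add_edge("staging.customers", "mart.sales_by_region", "JOIN")
g.add_edge("staging.orders", "mart.sales_by_region", "JOIN + SUM()")
g.add_edge("mart.sales_by_region", "dashboard.q3_sales", "report")

print(sorted(g.root_cause_candidates("dashboard.q3_sales")))
print(sorted(g.impact_of_change("erp.orders")))
```

The graph answers "where did this come from?" and "what would this affect?" well, but note what it never stores: who ran each job, under what authority, or why.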
This focus on debugging, impact analysis, and troubleshooting reveals the fundamentally reactive nature of data lineage. It excels at explaining what went wrong after an error has occurred or what might happen if a change is made. This operational utility contrasts sharply with the proactive, foundational role of provenance, which aims to establish the trustworthiness of a record from the moment of its creation, long before it is ever used for analysis or reporting.
Furthermore, data lineage tools consistently define the "origin" of data in purely technical terms, such as the source system, e.g. a CRM, a database, an API, or a data lake.2 This technical definition stops at the machine. It answers the question, "What system created this data?" This is a profoundly different and shallower inquiry than that of archival provenance, which defines origin as the creator: the individual, group, or organisation whose activities generated the record.10 Provenance pushes past the technical container to ask, "What entity, in the course of what activity, created this record and for what purpose?" This distinction highlights a fundamental difference in the depth of inquiry: the "origin" in lineage is a technical starting point, whereas the "origin" in provenance is a contextual anchor that gives the record its meaning and evidential force.
In essence, data lineage provides a crucial map of the data pipeline. It answers the questions "Where did this data come from?" and "How did it get here?" from a logistical and technical standpoint. However, it is fundamentally concerned with the mechanics of flow, not the intrinsic trustworthiness or meaning of the data itself.
Unveiling Provenance: The DNA of the Record
In stark contrast to the technical focus of data lineage, archival provenance is the complete history of a record's origin, custody, and ownership, which serves to establish its pedigree and authenticity.3 Originating from the French provenir ('to come from/forth'), the term refers to the chronology of an object's life, providing the contextual and circumstantial evidence for its creation and subsequent history.3 It is not merely a tracking mechanism but a comprehensive framework of principles designed to preserve the identity, context, and evidential integrity of a record.
The Foundational Principles
Two core principles, derived from the overarching concept of provenance, govern archival practice and distinguish it from other forms of information management:
Respect des Fonds (Principle of Provenance): This foundational principle, first codified in post-Revolutionary France (in an 1841 ministerial circular), dictates that records originating from a single creating entity (an individual, family, or organisation), which together form a fonds, must be kept together as an organic whole to maintain their integrity, context, and original intent.12 They must not be intermingled with records from other creators, even if they pertain to the same subject.14 This practice preserves the unity of the records, ensuring that they continue to reflect the functions, activities, and structure of the entity that produced them.12 This stands in direct opposition to modern data management practices that often pool disparate data into a single data lake, thereby severing these crucial contextual ties. Provenance can still be achieved in these modern environments, but it requires much greater care to preserve the richness of the original order and to capture it in rich metadata schemas or, preferably, in ontologies maintained independently of, but persistently linked to, the content.
Original Order: A corollary to respect des fonds, the principle of original order stipulates that records should be maintained in the sequence and arrangement established by their creator.11 This order is not arbitrary; it is itself a form of evidence, revealing the creator's workflows, priorities, and the implicit relationships between individual documents.14 This "archival bond", the network of interrelationships between records created during the same activity, is often as meaningful as the content of the records themselves, and it is imperative that this context be maintained over time, regardless of the new contexts in which the information is subsequently reused.11
The Purpose of Provenance
These principles are not academic formalities; they serve a profound purpose in safeguarding the value of information.
Protecting Integrity and Evidential Value: The primary goal of provenance is to protect the integrity of records as authentic evidence of the actions, transactions, and decisions that created them.12 To arbitrarily rearrange or decontextualise records is to obscure or even destroy their significance as evidence.12
Revealing Context and Meaning: Provenance is the key to unlocking a record's full meaning. An individual record can only be fully understood when viewed in its original context, alongside the other records from the same source and in the same sequence.12 Provenance provides and preserves this essential context, transforming isolated data points into meaningful knowledge.15
This distinction between "information" and "knowledge" is critical. Archival sources emphasize that a provenance-based approach enables the pursuit of "knowledge", a deep understanding of context, relationships, and evolution, rather than the mere retrieval of "information," which consists of isolated names, dates, and facts devoid of context.15 Data lineage, with its focus on automated mapping and retrieval, aligns perfectly with the goal of information management. Provenance, however, operates on a higher epistemological level; it helps one understand what a record means.
This philosophical difference is also reflected in the language used by each discipline. Archival theory repeatedly describes records as having an "organic nature," growing naturally out of the activities of their creator.12 In contrast, data management literature describes data moving through a "pipeline" where it is "processed" and "transformed".2 This reveals a fundamental divergence: records are viewed as a natural byproduct of an entity's existence, while data is treated as a raw material to be manufactured into a product. An "organic" record's primary value is its authentic connection to the activity that created it. A "manufactured" dataset's primary value is its utility for a specific downstream purpose. This explains why provenance is obsessed with preserving the original context, while lineage is focused on documenting the transformations. Provenance seeks to preserve the evidence of the past; lineage seeks to validate the process of creating a new asset for the future.
The Core Distinction: A Path vs. a Pedigree
The fundamental difference between the two concepts can be stated succinctly: data lineage tracks the path of data, while archival provenance establishes its pedigree.4 Lineage is concerned with the logistical flow of data through a technical pipeline, whereas provenance is concerned with the historical record of its authenticity and integrity.8 While data lineage provides a high-level, dynamic map of data's journey, provenance offers a deeper, more granular, and historically grounded record of its origins, modifications, and custody.8 Provenance answers not just "how" data changed, but "who" changed it and "why."
This distinction is best understood through analogy. Data lineage is like a package's shipping tracker: it shows every logistical hop from the warehouse to the final destination. Provenance is closer to a legal chain of custody: a formal record of every individual who handled a piece of evidence, for what purpose, and at what time, ensuring its integrity is uncompromised for admission in court.3 Similarly, lineage is a road map of the data's journey, while provenance is its full biography, detailing its birth (creation context), its life experiences (use and modification), and its family tree (relationship to other records).
Some data management literature suggests an interdependence, stating that accurate lineage cannot be maintained without knowing the data's provenance.8 From an archival standpoint, this reveals a misunderstanding of the concepts' relationship. Lineage is not a co-equal concept but an outcome of proper provenance documentation. A complete provenance record, by definition, includes the "custodial history" and "chain of custody," which is the very path that lineage seeks to describe.3 Therefore, a complete provenance record inherently contains the data lineage. The reverse, however, is not true. A lineage map contains none of the other critical components of provenance, such as the creator's context, evidence of authenticity, or proof of integrity. The data management field views them as two related things to track, while the archival field understands that one (provenance) is the holistic concept that generates the other (lineage) as a natural byproduct of its rigorous documentation.
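The one-way relationship can be made concrete. In this minimal sketch, the classes, field names, and sample values are hypothetical, loosely following the sales example earlier in the article: a provenance record holds the creator context and a custodial history, and the lineage path is simply derived from it.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class CustodyEvent:
    # One documented hand-off: who acted, why, and between which systems.
    actor: str
    purpose: str
    from_system: str
    to_system: str
    timestamp: datetime


@dataclass
class ProvenanceRecord:
    # Hypothetical provenance record: creation context plus custodial history.
    record_id: str
    creator: str            # the entity whose activity generated the record
    creating_activity: str  # the function or mandate the record arose from
    custody: List[CustodyEvent] = field(default_factory=list)

    def lineage(self) -> List[str]:
        """Project the custodial history down to a bare system-to-system path."""
        events = sorted(self.custody, key=lambda e: e.timestamp)
        if not events:
            return []
        return [events[0].from_system] + [e.to_system for e in events]


rec = ProvenanceRecord(
    record_id="Q3-SALES-001",
    creator="Sales Operations team",
    creating_activity="Quarterly reporting mandate issued by the CFO",
)
rec.custody.append(CustodyEvent("j.smith", "monthly extract", "crm", "staging",
                                datetime(2024, 7, 1, 9, 0)))
rec.custody.append(CustodyEvent("a.jones", "regional aggregation", "staging", "warehouse",
                                datetime(2024, 7, 1, 10, 30)))

print(rec.lineage())  # ['crm', 'staging', 'warehouse']
```

The `lineage()` projection keeps only system names; nothing in its output allows the creator, the creating activity, the actor, or the purpose to be recovered, which is the asymmetry described above.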
Comparing Data Lineage and Archival Provenance
The distinctions between data lineage and archival provenance are stark when compared directly. Data lineage centers on the technical movement and transformation of data as it travels across systems, while archival provenance is fundamentally concerned with the origin, context, and custodial history of a record. This leads to divergent primary goals: lineage aims for operational efficiency, debugging, and impact analysis, whereas provenance seeks to establish the record's authenticity, reliability, and evidential value.
Consequently, the scope of data lineage is a technical map of the data pipeline (the "how"), answering logistical questions like "Where did the data come from?" and "What transformations did it undergo?" In contrast, provenance encompasses a complete historical and contextual record of the data's entire lifecycle (the "who," "what," "when," and "why"), addressing deeper inquiries such as "Who created this record, and why?", "Is it authentic?", and "Has its integrity been maintained?" These differences are rooted in their respective origins, with lineage stemming from Computer Science and Data Engineering, and provenance from Archival Science, Law, and History. The analogy holds: data lineage is a package's shipping tracker, while provenance is a legally admissible chain of custody for evidence.
The Richness of Provenance: Context, Authenticity, and Integrity
The superiority of provenance as a framework for trust lies in its multi-dimensional nature. It is not a single practice but a synthesis of principles that together ensure an information object is meaningful, genuine, and unaltered. Data lineage, by itself, addresses none of these foundational pillars directly. In the digital era, archival provenance is often characterised by complex interrelationships and deep contextual metadata rooted in continuum thinking, rather than in the simpler, linear, unidirectional lifecycle model of the 20th century. That is, provenance must account for records moving backwards and forwards through the ‘lifecycle’, jumping between phases, or existing in several different contexts at once.
The Primacy of Context
Context refers to the surrounding circumstances of a record's creation, use, and maintenance, which are indispensable for its proper interpretation.17 Provenance is the primary archival tool for preserving this context.18 By adhering to the principles of respect des fonds and original order, the archival method preserves the intricate web of relationships, between the record and its creator, and between the record and other related records that give it meaning.12 Data lineage, in contrast, often strips data of this context by isolating it from its original collection and focusing solely on its subsequent technical journey.
Establishing Authenticity
Authenticity is the quality of a record being genuine, what it purports to be, and free from tampering or forgery.23 In the archival world, provenance provides the fundamental evidence for verifying authenticity.8 An unbroken and well-documented custodial history is the primary means of demonstrating that a record is the same one created by a specific entity at a specific time and has not been substituted or altered.15
To formalize this assessment, archivists employ the science of diplomatics, a discipline centered on the critical analysis of a document's genesis, form, and transmission to determine its authenticity.27 Diplomatics provides a rigorous, structured methodology for examining a record's internal and external characteristics, its language, formulae, script, and seals, to verify that it conforms to the practices of its purported time and creator.29 This deep, scientific analysis of a record's intrinsic properties goes far beyond the simple metadata tracking offered by data lineage.
Ensuring Integrity (The Concept of Fixity)
Integrity is the state of a record being whole, complete, and uncorrupted.26 In the digital realm, where information is intangible and easily altered, ensuring integrity requires a specific technical approach known as fixity. Fixity is the property of a digital file remaining unchanged over time.32 It is verified using cryptographic hash functions (such as MD5 or SHA-256) to generate an effectively unique digital fingerprint, or "checksum," for a file.32 If even a single bit in the file is altered, the checksum will change dramatically, providing definitive proof of the change.32
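A minimal fixity sketch using SHA-256 from Python's standard `hashlib` module: it records a checksum at ingest, then re-verifies the content after a single bit has been flipped. The sample record text is illustrative only, and `sha256_checksum`, `file_checksum`, and `verify_fixity` are hypothetical helper names, not part of any archival standard.

```python
import hashlib


def sha256_checksum(data: bytes) -> str:
    """Return the SHA-256 fixity value (hex digest) for a byte string."""
    return hashlib.sha256(data).hexdigest()


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Same fixity value for a file on disk, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_fixity(data: bytes, expected: str) -> bool:
    """True if the data still matches the checksum recorded earlier."""
    return sha256_checksum(data) == expected


original = b"Board minutes, 12 March: motion carried 5-2."
recorded = sha256_checksum(original)  # captured at ingest and stored with the record

# Flip the lowest bit of the first byte to simulate silent corruption or tampering.
tampered = bytes([original[0] ^ 0x01]) + original[1:]

print(verify_fixity(original, recorded))  # True
print(verify_fixity(tampered, recorded))  # False
```

In practice the recorded value would be kept separately from the object it describes, so that re-checking it at each transfer is meaningful.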
This concept connects directly to provenance. Performing fixity checks at each point of transfer or custody in a digital workflow is the modern, technical implementation of maintaining an unbroken chain of custody. It is the mechanism by which the principle of provenance is enforced to protect the integrity and evidential value of digital records (and is the basis for the establishment of digital archives).34
Together, these three pillars of context, authenticity, and integrity, form a complete and robust framework for establishing trust. Provenance is the master concept that binds them together. The archival principles of respect des fonds and original order are designed to preserve context. The documentation of custodial history provides evidence for authenticity. The practice of fixity checking is the modern tool for ensuring integrity within that custodial chain. Data lineage may track alterations (a component of integrity), but it cannot verify if the original was genuine (authenticity) or what the data truly means (context).
The Richness of Provenance: Evidence, Custody, and Accountability
Beyond establishing the intrinsic qualities of a record, provenance provides the framework for its role in the world, particularly in contexts where accountability and proof are paramount. This involves a fundamental shift in perspective from viewing data as a mere asset to understanding records as evidence.
The Centrality of Evidence
From an archival perspective, records are not simply "data"; they are "evidence" of actions, transactions, and decisions.15 The primary purpose of the archival endeavor is to protect this "evidential value", the passive ability of a record to provide insight into the events and processes that led to its creation.15 This view contrasts sharply with the common perception in data management, where data is often treated as a fungible commodity to be cleansed, transformed, and aggregated for business intelligence, with little regard for its original evidential nature.
The Chain of Custody
The practical, procedural implementation of provenance, especially in high-risk environments, is the Chain of Custody. This documented custodial model is the chronological, legally defensible documentation that records the sequence of custody, control, transfer, analysis, and disposition of an item, whether physical or electronic.36 While conceptually similar to data lineage in that it tracks movement, a Chain of Custody is fundamentally different in purpose and execution.
A proper Chain of Custody requires meticulous documentation of every hand-off, including unique identifiers, the names and signatures of all individuals involved, precise dates and times, the purpose of the transfer, and secure storage protocols.22 This process is not about simply logging a system-to-system transfer, as a lineage tool might do. It is about establishing a formal, unbroken chain of human and systemic accountability. It answers the question, "Who is responsible for this evidence at every moment of its existence?"
This reveals a critical distinction: data lineage is descriptive, while a Chain of Custody is performative. A lineage tool typically generates reports and documentation automatically, describing what happened to the data, often after the fact.7 A Chain of Custody, however, requires active participation; the act of signing a custodial form is an integral part of the evidence-handling process itself.22 The consequences of a failure in each system are also vastly different. A gap in a data lineage log is a technical problem to be debugged.2 A break in a Chain of Custody, such as a missing signature or an undocumented transfer, can render evidence inadmissible in court and cause an entire legal case to collapse.38 This highlights the immense gap between a system designed for technical assurance and a framework designed for legal and procedural proof.
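The difference between logging a transfer and documenting custody can be sketched as a completeness check. The example below assumes a hypothetical `HandOff` record holding the kinds of fields listed above plus a SHA-256 fixity value, and flags the breaks the text describes: a missing field (such as an unsigned receipt), an undocumented transfer between holders, an out-of-sequence timestamp, or a checksum that changed while in custody. It is an illustration, not a forensic or legal standard, and all holders and identifiers are invented.

```python
import hashlib
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class HandOff:
    # One custody transfer; field names are illustrative, not a standard schema.
    item_id: str
    released_by: str
    received_by: str
    timestamp: datetime
    purpose: str
    storage: str
    sha256: str  # fixity value recorded at the moment of transfer


def find_breaks(log):
    """Return a list of problems that would break the chain of custody."""
    problems = []
    # 1. Every hand-off must be fully documented (empty strings count as missing).
    for i, entry in enumerate(log):
        for f in fields(entry):
            if not getattr(entry, f.name):
                problems.append(f"entry {i}: missing {f.name}")
    # 2. Consecutive hand-offs must join up: same item, same holder, later time, same bits.
    for i, (prev, curr) in enumerate(zip(log, log[1:]), start=1):
        if curr.item_id != prev.item_id:
            problems.append(f"entry {i}: item_id changed")
        if curr.released_by != prev.received_by:
            problems.append(
                f"entry {i}: undocumented transfer ({prev.received_by} -> {curr.released_by})"
            )
        if curr.timestamp <= prev.timestamp:
            problems.append(f"entry {i}: timestamp not after previous hand-off")
        if curr.sha256 != prev.sha256:
            problems.append(f"entry {i}: checksum changed while in custody")
    return problems


# Hypothetical evidence item and holders.
digest = hashlib.sha256(b"exported mailbox image").hexdigest()
log = [
    HandOff("EX-17", "custodian A", "examiner B", datetime(2024, 5, 2, 9, 0),
            "forensic imaging", "evidence locker 3", digest),
    HandOff("EX-17", "examiner C", "counsel D", datetime(2024, 5, 3, 14, 0),
            "legal review", "secure review platform", digest),
]
print(find_breaks(log))
# ['entry 1: undocumented transfer (examiner B -> examiner C)']
```

A check like this can only detect gaps after the fact; the performative part, a named person signing at each hand-off, still has to happen within the process itself.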
Provenance in Practice: High-Risk Scenarios
The theoretical differences between lineage and provenance become starkly apparent when applied to real-world scenarios where the trustworthiness of information is not an academic concern but a legal, regulatory, or public health necessity. These cases demonstrate a critical "trustworthiness gap" between what data lineage provides and what professional practice demands.
Case Study: The Courtroom and E-Discovery
In a legal proceeding, the standard of proof is not mere traceability but legally defensible authenticity and integrity. Consider a civil litigation case where one party produces a set of emails as evidence.39 A data lineage tool could trace the path of these emails from the company's mail server, through an e-discovery collection tool, to their final production format.8 It would show the flow.
However, this is legally insufficient. The opposing counsel would immediately challenge the evidence, questioning whether the emails were altered before collection, who had access to them, and whether the collection process itself was forensically sound.38 The lineage map cannot answer these questions. Admissibility in court requires a full provenance record, including a meticulous Chain of Custody for the digital files, forensic imaging to preserve the original state, and metadata analysis to detect tampering.39 The legal system requires proof of provenance, not just a map of lineage.
Case Study: The Pharmaceutical Lab and Data Integrity
The U.S. Food and Drug Administration (FDA) frequently issues warning letters to pharmaceutical companies for failures in data integrity.43 A review of these letters reveals that the violations are catastrophic breakdowns of provenance. Common findings include the use of shared user accounts, the deletion of raw analytical data, a lack of audit trails, and the outright falsification of records and signatures.43
These are not simple "data quality issues." They represent a complete collapse of the principles of provenance:
Shared accounts destroy attribution (the "who").
Deleted raw data destroys originality and integrity.
A lack of audit trails destroys the custodial history.
Falsified signatures destroy authenticity.
A data lineage tool might successfully track the flow of this falsified data through the company's systems, but it would be utterly useless in preventing or even detecting the underlying fraud. Only a robust system and culture of provenance, with strict controls over data creation, modification, and custody, can ensure the data is trustworthy enough to protect public health.47
Case Study: The Financial Audit and Regulatory Compliance
In the financial sector, regulations like the Sarbanes-Oxley Act (SOX) mandate the existence of a verifiable audit trail, a detailed record that traces financial data back to its source transaction for verification.48 The requirements for such a trail are a direct implementation of provenance principles. It must document the origin of every transaction, record all subsequent modifications, ensure the immutability of the log, and control access to prevent tampering.21
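One common technique for making an audit trail tamper-evident is hash chaining, where each entry's hash also covers the previous entry's hash. SOX does not prescribe this particular mechanism, so treat the sketch below as a minimal illustration under that assumption; the transaction fields, user names, and helper functions are hypothetical, and only Python's standard library is used.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with this event's canonical JSON."""
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Add an event, chaining its hash to the last entry in the trail."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "hash": entry_hash(prev, event)})


def verify_trail(log: list):
    """Return the index of the first entry that fails verification, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["hash"] != entry_hash(prev, entry["event"]):
            return i
        prev = entry["hash"]
    return None


trail = []
append_entry(trail, {"txn": "INV-1001", "action": "create", "amount": 2500, "user": "ap.clerk1"})
append_entry(trail, {"txn": "INV-1001", "action": "approve", "user": "finance.manager"})
append_entry(trail, {"txn": "INV-1001", "action": "post", "ledger": "GL-4000", "user": "system"})

print(verify_trail(trail))  # None: the trail is intact

trail[0]["event"]["amount"] = 25000  # a retroactive edit to the source transaction
print(verify_trail(trail))  # 0: the first entry no longer matches its recorded hash
```

Editing any past entry invalidates either its own hash or the next one, so a retroactive change cannot pass verification unless every later hash is recomputed as well. That is why a real system would also anchor the latest hash somewhere the log's writers cannot alter: hash chaining makes a log tamper-evident, not immutable.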
A data lineage map can show that a number in a final financial report originated from a specific general ledger system. However, it cannot prove the legitimacy of the transactions within that system. A full, provenance-based audit trail is required to demonstrate who approved each transaction, to verify that the underlying source documents are authentic, and to prove that the records have not been altered. This is the crucial difference between tracing a number and verifying its integrity.
In all these domains, the core requirement is not just to see the data's flow but to be able to prove its state and history to a skeptical third party, a judge, a regulator, or an auditor. Data lineage provides internal visibility; provenance provides external defensibility.
Conclusion: Reclaiming Provenance in the Age of Big Data
Data lineage is a valuable but ultimately limited tool. It captures a single, logistical dimension of a record's existence, i.e. its flow. Archival provenance, by contrast, is a holistic framework for establishing trust, encompassing the critical dimensions of context, authenticity, integrity, evidential value, and accountability. The modern obsession with tracking data movement has led many organisations to conflate the map with the territory, believing that a clear data pipeline is synonymous with trustworthy data.
This "lineage-only" approach creates a significant risk. Organisations that invest heavily in lineage tools without embracing the deeper principles of provenance may be building "glass houses." Their data pipelines may be transparent, but the data flowing within them may lack the foundational trustworthiness required for critical decision-making, legal defense, and regulatory compliance. When challenged, these systems may prove to be traceable but not defensible.
The path forward requires data governance professionals, chief data officers, and system architects to look beyond their immediate technical disciplines to the centuries of knowledge codified in archival science. The challenge is not merely to manage data flow but to architect systems that create and preserve authentic, reliable, and contextually rich records from their inception. This means integrating archival principles into the core of data governance frameworks: keeping records grouped by the source that created them (respect des fonds), implementing formal chain of custody protocols for critical data, and embedding rigorous authenticity and integrity checks (like fixity) throughout the data lifecycle.
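Fixity, one of the integrity checks named above, is usually implemented with checksums. A digest is recorded when a file enters custody and then recomputed on a schedule. Below is a minimal Python sketch that uses SHA-256 from the standard library. The function names and the manifest format (a plain path-to-digest dictionary) are assumptions made for illustration and do not reflect any particular preservation system's API.

```python
import hashlib
from pathlib import Path


def checksum(path, chunk_size=1 << 20):
    """SHA-256 digest of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record_fixity(paths):
    """Capture a baseline manifest at ingest: path -> checksum."""
    return {str(p): checksum(p) for p in paths}


def audit_fixity(manifest):
    """Recompute checksums and report files that changed or went missing."""
    failures = {}
    for path, expected in manifest.items():
        if not Path(path).exists():
            failures[path] = "missing"
        elif checksum(path) != expected:
            failures[path] = "checksum mismatch"
    return failures
```

A mismatch does not say what changed or who changed it. It only proves that the file is no longer the one originally received. This is why fixity is paired with chain-of-custody records rather than used on its own.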
The future of trusted data, responsible AI, and accountable enterprise does not lie in simply building more elaborate maps of our data swamps. It lies in applying the timeless archival wisdom of provenance to ensure the water is pure in the first place.
Sources
www.hpe.com, accessed September 1, 2025, https://www.hpe.com/us/en/what-is/data-lineage.html
What Is Data Lineage? | IBM, accessed September 1, 2025, https://www.ibm.com/think/topics/data-lineage
Provenance - Wikipedia, accessed September 1, 2025, https://en.wikipedia.org/wiki/Provenance
What is the significance of provenance in data management? - Secoda, accessed September 1, 2025, https://www.secoda.co/blog/significance-of-provenance-in-data-management
Data lineage - Wikipedia, accessed September 1, 2025, https://en.wikipedia.org/wiki/Data_lineage
What is Data Lineage? | Cloudera, accessed September 1, 2025, https://www.cloudera.com/resources/faqs/data-lineage.html
What is Data Lineage? Why You Need It & Best Practices - Qlik, accessed September 1, 2025, https://www.qlik.com/us/data-management/data-lineage
Data Lineage vs Data Provenance: Nah, They Aren't Same! - Atlan, accessed September 1, 2025, https://atlan.com/data-lineage-vs-data-provenance/
What is Data Lineage? - Informatica, accessed September 1, 2025, https://www.informatica.com/resources/articles/what-is-data-lineage.html
Provenance - Wikipedia, accessed September 1, 2025, https://en.wikipedia.org/wiki/Provenance
Provenance and Original Order — Backlog • Archivists & Historians, accessed September 1, 2025, https://www.backlog-archivists.com/blog/provenance-and-original-order
Archives and Records Management Resources | National Archives, accessed September 1, 2025, https://www.archives.gov/research/alic/reference/archives-resources/principles-of-arrangement.html
Respect des fonds - Wikipedia, accessed September 1, 2025, https://en.wikipedia.org/wiki/Respect_des_fonds
Original Order and Provenance in Archival Arrangement - Lucidea, accessed September 1, 2025, https://lucidea.com/blog/original-order-and-provenance/
The Archival Paradigm: The Genesis and Rationales of Archival Principles and Practices, accessed September 1, 2025, https://www.clir.org/pubs/reports/pub89/archival/
Original order - Wikipedia, accessed September 1, 2025, https://en.wikipedia.org/wiki/Original_order
Kindred contexts: archives, archaeology, and the concept of provenance, accessed September 1, 2025, https://d-nb.info/1354093976/34
Archival context, provenance, and a tool to capture archival context ..., accessed September 1, 2025, https://www.researchgate.net/publication/384536727_Archival_context_provenance_and_a_tool_to_capture_archival_context
Kindred Contexts: Archives, Archaeology, and the Concept of Provenance - Illinois Experts, accessed September 1, 2025, https://experts.illinois.edu/en/publications/kindred-contexts-archives-archaeology-and-the-concept-of-provenan
What is Data Provenance? | IBM, accessed September 1, 2025, https://www.ibm.com/think/topics/data-provenance
Data Provenance 101: The History of Data and Why It's Different From Data Lineage, accessed September 1, 2025, https://www.zendata.dev/post/data-provenance-101-the-history-of-data-and-why-its-different-from-data-lineage
Chain of Custody - StatPearls - NCBI Bookshelf, accessed September 1, 2025, https://www.ncbi.nlm.nih.gov/books/NBK551677/
authenticity - SAA Dictionary, accessed September 1, 2025, https://dictionary.archivists.org/entry/authenticity.html
Archival science - Wikipedia, accessed September 1, 2025, https://en.wikipedia.org/wiki/Archival_science
Digital Preservation Authenticity and Provenance, accessed September 1, 2025, http://www.ifs.tuwien.ac.at/~andi/teaching/dp/DP13_Authenticity_Provenance.pdf
Authenticity and Provenance in Long-Term Digital Preservation: Analysis of the Scope of Content - ResearchGate, accessed September 1, 2025, https://www.researchgate.net/publication/330287457_Authenticity_and_Provenance_in_Long-Term_Digital_Preservation_Analysis_of_the_Scope_of_Content
Diplomatics: New Uses for an Old Science - Archivaria, accessed September 1, 2025, https://archivaria.ca/index.php/archivaria/article/download/11567/12513/0
Diplomatics - Wikipedia, accessed September 1, 2025, https://en.wikipedia.org/wiki/Diplomatics
Diplomatics: New Uses for an Old Science - Archivaria, accessed September 1, 2025, https://archivaria.ca/index.php/archivaria/article/download/11567/12513/0
Diplomatics | Definition, History, Characteristics, & Facts - Britannica, accessed September 1, 2025, https://www.britannica.com/topic/diplomatics
authenticity of digital records from theory to practice - UBC Library Open Collections, accessed September 1, 2025, https://open.library.ubc.ca/soa/cIRcle/collections/ubctheses/24/items/1.0166169
Protect Your Data: File Fixity and Data Integrity | The Signal, accessed September 1, 2025, https://blogs.loc.gov/thesignal/2014/04/protect-your-data-file-fixity-and-data-integrity/
SAA Dictionary: fixity - Society of American Archivists, accessed September 1, 2025, https://dictionary.archivists.org/entry/fixity.html
Fixity and checksums - Digital Preservation Handbook, accessed September 1, 2025, https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums
What is Fixity, and When Should I be Checking It? - Digital Preservation, accessed September 1, 2025, https://www.digitalpreservation.gov/documents/NDSA-Fixity-Guidance-Report-final100214.pdf
Chain of Custody: Avoiding Data Disasters - SecureScan, accessed September 1, 2025, https://www.securescan.com/articles/records-management/securing-the-chain-of-custody/
Chain of custody - Wikipedia, accessed September 1, 2025, https://en.wikipedia.org/wiki/Chain_of_custody
Challenging Chain of Custody in Digital Document Investigations - Leppard Law: Federal Criminal Lawyers, accessed September 1, 2025, https://federal-criminal.com/obstruction/challenging-chain-of-custody-in-digital-document-investigations/
Challenging Chain of Custody in Altered Record Cases - Leppard Law, accessed September 1, 2025, https://leppardlaw.com/federal/obstruction/challenging-chain-of-custody-in-altered-record-cases/
Maintaining the Digital Chain of Custody - Challenges to Address - Page Vault Resources, accessed September 1, 2025, https://blog.page-vault.com/digital-chain-of-custody
Best Practices for Maintaining Chain of Custody for Digital Evidence - Vidizmo, accessed September 1, 2025, https://vidizmo.ai/blog/chain-of-custody-for-digital-evidence
Broken Chain of Custody: Factors, Legal Consequences and Prevention, accessed September 1, 2025, https://digitalevidence.ai/blog/broken-chain-of-custody
Data Integrity: A hot topic in FDA Warning Letters - PharmOut, accessed September 1, 2025, https://www.pharmout.net/data-integrity-fda/
FDA finds data integrity problems in recent warning letters - RAPS, accessed September 1, 2025, https://www.raps.org/news-and-articles/news-articles/2025/3/fda-finds-data-integrity-problems-in-recent-warnin
Intas Pharmaceuticals Limited - 652067 - 07/28/2023 - FDA, accessed September 1, 2025, https://www.fda.gov/inspections-compliance-enforcement-and-criminal-investigations/warning-letters/intas-pharmaceuticals-limited-652067-07282023
Data Integrity Case Studies - Parenteral Drug Association, accessed September 1, 2025, https://www.pda.org/docs/default-source/website-document-library/chapters/presentations/brazil/pharma-trends-day-1/data-integrity-case-studies.pdf?sfvrsn=57f6838e_1
FDA to pharmaceutical companies: Certain studies conducted by Raptim Research Pvt. Ltd. are unacceptable, accessed September 1, 2025, https://www.fda.gov/drugs/drug-safety-and-availability/fda-pharmaceutical-companies-certain-studies-conducted-raptim-research-pvt-ltd-are-unacceptable
Comprehensive Guide to Audit Trails: Tracking, Types, and Real-World Examples, accessed September 1, 2025, https://www.investopedia.com/terms/a/audittrail.asp
Creating Verifiable Audit Trails for Legal Compliance - Aaron Hall, Attorney, accessed September 1, 2025, https://aaronhall.com/creating-verifiable-audit-trails-for-legal-compliance/
Exploring Data Provenance: Ensuring Data Integrity and Authenticity - Astera Software, accessed September 1, 2025, https://www.astera.com/type/blog/data-provenance/
Tracking Data Provenance to Ensure Data Integrity and Compliance - Acceldata, accessed September 1, 2025, https://www.acceldata.io/blog/data-provenance