Sustainability of Digital Formats: Planning for Library of Congress Collections


Digital Forensics XML

Format Description Properties

Identification and description

Full name Digital Forensics XML
Description

Digital Forensics XML (DFXML) is an Extensible Markup Language (XML) language designed to represent a wide range of forensic information and processing results. Since its inception in 2007, DFXML has served the purpose of archiving forensic processing steps, reducing the need for re-processing digital evidence. It acts as an interchange format, facilitating the sharing of structured information between independent tools and organizations. The format was created to establish a shared way for forensic software tools to exchange information.

DFXML captures metadata and provenance information about the operation of software tools. It was initially designed to represent the output of digital forensics tools, particularly SleuthKit tools, and has since expanded to work with the bulk_extractor digital forensics tools (see DFXML Python and DFXML C++).

DFXML can represent many kinds of forensic information, such as disk images; files; file system metadata; modification, access, and change (MAC) times; file hashes; sector hashes; Transmission Control Protocol (TCP) flows; and hash sets. It records provenance details, including the origin of data, classification, use restrictions, and the tools employed in the forensic process. It can also document the computer on which the application program was compiled, the linked libraries, and the runtime environment, which is useful in research and courtroom testimony.

DFXML improves the composability of forensic tools by providing a common language for describing forensic processes (e.g., cryptographic hashing), forensic work products (e.g., the location of files on a hard drive), and metadata (e.g., file names and timestamps). It also serves as the basis for a Python module (dfxml.py) that simplifies the creation of sophisticated forensic processing programs.

Production phase Middle-state and archival.
Relationship to other formats
    Subtype of XML, Extensible Markup Language

Local use

LC experience or existing holdings None
LC preference The Library of Congress has not yet expressed any format preference for digital forensic data.

Sustainability factors

Disclosure Fully disclosed. The original DFXML source code repository is now considered legacy and directs users to the official schema repository, version 1.2.0. The legacy repository is retained for historical reasons, housing legacy GitHub Issues and maintaining historical version control that wasn't transferred to the new repository. Additionally, DFXML has official Python and C++ codebases.
    Documentation
Adoption

The assumed maintainer of the specification is the National Institute of Standards and Technology. Comments welcome.

Widely adopted. A non-exhaustive list includes:

  • File Extractors: DFXML is integral to file extractors, documenting file locations in tools such as fiwalk (part of SleuthKit), EnCase EnScript for NTFS, and Xbox 360 storage parsers.
  • File Carvers: DFXML is used in file carvers like frag_find, PhotoRec, and StegCarver from the Department of Defense Cyber Crime Center DC3Tool.
  • Hash Calculators: Various hash calculators, including md5deep, sha1deep, and hashdeep, utilize DFXML.
  • Other Forensic Tools: DFXML is incorporated into forensic processing tools like regxml_extractor, tcpflow, and bulk_extractor.
  • NIST Usage of DFXML: NIST uses DFXML to distribute information and internally for research projects.
  • Digital Preservation: Digital preservation software Archivematica and data recovery tool IsoBuster use DFXML.
  • BitCurator, fiwalk & SleuthKit Integration: The BitCurator project uses DFXML to generate technical metadata. BitCurator employs fiwalk, developed by Garfinkel, which produces DFXML. Fiwalk, initially independent, was integrated into SleuthKit around 2012.
  • PhotoRec and md5deep: DFXML is used in the "PhotoRec carver" and md5deep, allowing their matching XML output to be used together and compared.
  • idifference.py: A Python program comparing two DFXML files and reporting differences on fileobjects.
  • imicrosoft_redact.py: A tool in the DFXML distribution that modifies executables in a disk image so that the image no longer works in virtual machines. It records cryptographic hashes before and after modification.
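
The comparison of two DFXML files described for idifference.py can be illustrated with a minimal sketch. This is not the idifference.py implementation: it assumes file objects are matched by filename and compared by their first hashdigest value, and the helper names (make_doc, index_by_filename, diff_fileobjects) and the digest strings are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the DFXML root element declaration.
NS_URI = "http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML"


def make_doc(files):
    """Build a tiny DFXML-like document from {filename: digest} (illustrative helper)."""
    root = ET.Element("{%s}dfxml" % NS_URI, {"version": "1.2.0"})
    for name, digest in files.items():
        fo = ET.SubElement(root, "{%s}fileobject" % NS_URI)
        ET.SubElement(fo, "{%s}filename" % NS_URI).text = name
        hd = ET.SubElement(fo, "{%s}hashdigest" % NS_URI, {"type": "md5"})
        hd.text = digest
    return ET.tostring(root, encoding="unicode")


def index_by_filename(xml_text):
    """Map each fileobject's filename to its first hashdigest value."""
    root = ET.fromstring(xml_text)
    index = {}
    for fo in root.iter("{%s}fileobject" % NS_URI):
        name = fo.findtext("{%s}filename" % NS_URI)
        index[name] = fo.findtext("{%s}hashdigest" % NS_URI)
    return index


def diff_fileobjects(before_xml, after_xml):
    """Report filenames that were added, removed, or whose digest changed."""
    before = index_by_filename(before_xml)
    after = index_by_filename(after_xml)
    common = set(before) & set(after)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(n for n in common if before[n] != after[n]),
    }


# Placeholder digests, not real hashes of any file.
before_doc = make_doc({"a.txt": "a" * 32, "b.txt": "b" * 32})
after_doc = make_doc({"a.txt": "a" * 32, "b.txt": "c" * 32, "c.txt": "d" * 32})
print(diff_fileobjects(before_doc, after_doc))
```

Matching on filename alone is a simplification; a real comparison may also need to account for renames, partitions, and timestamps.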
    Licensing and patents According to the specification, DFXML was “developed by the National Institute of Standards and Technology by employees of the Federal Government in the course of their official duties”. DFXML is hosted by the National Institute of Standards and Technology (NIST) at the National Software Reference Library (NSRL).
Transparency DFXML is open and text-based, and thus can be read using basic text editors. However, deployment of DFXML requires the use of complex tools.
Self-documentation

DFXML is identified through its root element, <dfxml>, and the XML namespace declared on it. For example:

<dfxml xmlns="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dfxmlext="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML#extensions" version="1.2.0">
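
As a minimal sketch of how software might use this declaration to recognize a DFXML document, the following Python checks the root element's namespace and reads its version attribute using only the standard library. The function name identify_dfxml and the shortened sample string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the declaration shown above.
DFXML_NS = "http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML"


def identify_dfxml(xml_text):
    """Return (is_dfxml, version) for an XML document given as a string."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced tag as "{namespace-uri}localname".
    if root.tag != "{%s}dfxml" % DFXML_NS:
        return (False, None)
    # The version attribute carries no namespace prefix.
    return (True, root.get("version"))


sample = (
    '<dfxml xmlns="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.2.0"></dfxml>'
)
print(identify_dfxml(sample))
```

A document whose root element is in a different namespace, or in none, is reported as not DFXML even if the element happens to be named dfxml.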

External dependencies For use with XML Schema Definition (XSD), the specification states: If you intend to use this file as a DFXML document validator, note that you will also need to download two accompanying .xsd files under the "ref" directory. The easiest way to do this is by downloading the repository as a Git clone, or by downloading the zip archive from the GitHub page.
Technical protection considerations None.

Quality and functionality factors


File type signifiers and format identifiers

Tag Value Note
Filename extension dfxml, xml DFXML can be embedded in other XML-based formats or used as a standalone document. Documents may informally have the file extension .xml or .dfxml, such as this sample file system report.
Internet Media Type See related format.  See XML.
Magic numbers See related format.  See XML.
Indicator for profile, level, version, etc. See note.  Version for the schema that generated the XML is required. See line 68 of the schema.
XML DOCTYPE declaration See note. 

"dfxml" is the root element; the XML namespace is the URI it declares.

At minimum the required declaration is:

<dfxml xmlns="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dfxmlext="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML#extensions" version="1.2.0">

The version attribute is required. The namespace is declared in this part:

xmlns:dfxml="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML"

Full schema from the XSD/standard:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dfxml="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML" targetNamespace="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML" elementFormDefault="qualified">

Pronom PUID See note.  PRONOM has no corresponding entry as of January 2023.
Wikidata Title ID Q105855984
Digital Forensics XML file format. See https://www.wikidata.org/wiki/Q105855984.
Wikidata Title ID Q16956577
In addition to the Wikidata entry for the DFXML file format, there is also a Wikidata entry for the Digital Forensics XML language. See https://www.wikidata.org/wiki/Q16956577

Notes

General

DFXML Structure

The DFXML structure consists of metadata, creator information, runtime configuration, volume details, and file objects. The XML format serves to encapsulate Dublin Core metadata, runtime CPU usage information, and specifics about volumes and individual files. Example, from Garfinkel’s “DFXML and Other Standards”:

<dfxml>
  <metadata> Dublin Core Metadata </metadata>
  <creator> The program that made this DFXML </creator>
  <configuration> Runtime Configuration </configuration>
  <volume> Information about Volumes </volume>
  <fileobjects>
    <fileobject> Information about a file </fileobject>
  </fileobjects>
  <rusage> Runtime CPU usage information </rusage>
</dfxml>
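
To illustrate how a program might read this structure, the sketch below walks the file objects in a small sample document with Python's standard library. The sample content is invented for illustration and follows the skeleton above; the child elements filename, filesize, and hashdigest are assumed to appear as in typical fiwalk output, and the function name list_fileobjects is hypothetical.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML"
NS = {"d": NS_URI}

# Invented sample; the MD5 value shown is the digest of empty input.
SAMPLE = """<dfxml xmlns="http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML" version="1.2.0">
  <creator><program>example-tool</program></creator>
  <fileobjects>
    <fileobject>
      <filename>docs/empty.txt</filename>
      <filesize>0</filesize>
      <hashdigest type="md5">d41d8cd98f00b204e9800998ecf8427e</hashdigest>
    </fileobject>
    <fileobject>
      <filename>img/photo.jpg</filename>
      <filesize>20480</filesize>
    </fileobject>
  </fileobjects>
</dfxml>"""


def list_fileobjects(xml_text):
    """Return one dict per fileobject with its name, size, and hash digests."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() finds fileobject elements at any depth, so the code does not
    # depend on whether they sit under a wrapper or a volume element.
    for fo in root.iter("{%s}fileobject" % NS_URI):
        size = fo.findtext("d:filesize", namespaces=NS)
        records.append({
            "filename": fo.findtext("d:filename", namespaces=NS),
            "filesize": int(size) if size is not None else None,
            "hashes": {h.get("type"): h.text for h in fo.findall("d:hashdigest", NS)},
        })
    return records


for record in list_fileobjects(SAMPLE):
    print(record["filename"], record["filesize"], record["hashes"])
```

The official dfxml.py module offers a richer object model; this sketch only shows that the format can be read with generic XML tooling.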

Tools for DFXML Validation

The command-line tool xmllint, commonly used for parsing XML files, can validate DFXML against its XML Schema. It can also validate a related schema called RegXML. According to Forensics Wiki, RegXML is an XML syntax analogous to DFXML. It uses parts of DFXML in its schema, but its official documentation and schema are independently available on GitHub.
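
A minimal sketch of driving this validation from Python follows. It assumes xmllint is installed and on the PATH, and that the schema has been downloaded locally together with its accompanying files; the file names dfxml.xsd and report.dfxml are placeholders, and the function names are hypothetical.

```python
import subprocess


def xmllint_command(schema_path, document_path):
    """Build the xmllint argument list for XML Schema validation."""
    # --noout suppresses printing of the parsed document;
    # --schema selects validation against a W3C XML Schema file.
    return ["xmllint", "--noout", "--schema", schema_path, document_path]


def validate(schema_path, document_path):
    """Run xmllint and return (is_valid, diagnostics)."""
    result = subprocess.run(
        xmllint_command(schema_path, document_path),
        capture_output=True,
        text=True,
    )
    # xmllint exits with status 0 when the document validates.
    return result.returncode == 0, result.stderr


# Placeholder paths; substitute the local schema and document locations.
print(" ".join(xmllint_command("dfxml.xsd", "report.dfxml")))
```

Calling validate() requires the schema's referenced .xsd files to be reachable from the schema's location, as noted under External dependencies.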

History

Developed by Simson L. Garfinkel, DFXML has been employed in forensic data description since 2006. The original DFXML paper by Garfinkel in 2009, “Navigating Unmountable Media with the Digital Forensics XML File System”, introduced Fiwalk as an extension to The SleuthKit. Fiwalk utilizes The SleuthKit's internal bindings for direct storage parsing and reports the results in XML format.


Format specifications


Useful references

URLs


Last Updated: 01/19/2024