Sustainability of Digital Formats: Planning for Library of Congress Collections


Advanced Forensic Framework Disk Image, AFF Version 4 (AFF4)

Format Description Properties

Identification and description

Full name Advanced Forensic Framework 4, AFF 4
Description

Termed an object-oriented "framework" by its creators, AFF_4 is an abstract information model that permits disk-image data to be stored in one or more places while the information about the data is stored elsewhere. Users of the framework typically gather data as evidence in legal or investigative activities, hence the use of the term forensic. Unlike predecessor disk image formats, AFF_4 is not a "file format" and the specification does not establish a single wrapper that encapsulates the data.

Image data is segmented into chunks, which are collected into what is called a bevy (see Notes). The chunks may be compressed with zlib. Data can be stored over regular HTTP, and an image can also be written directly to a central HTTP server using Web Distributed Authoring and Versioning (WebDAV), an extension of HTTP. The overall data set is stored in evidence volumes that may be implemented as directory-based volumes (more or less, directories in a filesystem) or ZIP64-based volumes (based on the ZIP64 extension of the ZIP_PK format). The corpus of data to be managed may be distributed and of great extent: the Forensics Wiki states that "AFF4 is designed to be scalable to huge evidence corpuses [and thus] the AFF4 universe is infinite."
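To make the idea of a ZIP-based evidence volume more concrete, here is a minimal Python sketch that packs named segments into an in-memory ZIP archive with ZIP64 permitted and reads one back. It uses only the standard zipfile module. The helper names write_volume and read_segment, the segment name data/00000000, and the placeholder payloads are hypothetical and are not taken from the AFF4 specification; information.turtle is the metadata file name mentioned elsewhere on this page.

```python
import io
import zipfile


def write_volume(segments: dict) -> bytes:
    """Pack named segments into an in-memory ZIP archive, allowing ZIP64."""
    buf = io.BytesIO()
    # allowZip64=True lets zipfile emit ZIP64 records when sizes require them.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED,
                         allowZip64=True) as zf:
        for name, payload in segments.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def read_segment(volume: bytes, name: str) -> bytes:
    """Return the bytes of one named segment from the archive."""
    with zipfile.ZipFile(io.BytesIO(volume)) as zf:
        return zf.read(name)


if __name__ == "__main__":
    # Placeholder payloads only; real volumes hold RDF metadata and bevies.
    volume = write_volume({
        "information.turtle": b"# placeholder metadata",
        "data/00000000": b"placeholder bevy bytes",
    })
    print(read_segment(volume, "data/00000000"))
```

A directory-based volume could be sketched the same way by writing each segment to a file under a directory instead of to an archive member.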

Objects in the AFF_4 framework are addressable using RDF instances that minimally carry a unique URN for the object, defined in the aff4 namespace, and that may also carry information about object attributes.
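As a rough illustration of this addressing model, the Python sketch below gives an object a unique URN and expresses its attributes as (subject, predicate, value) statements, the tuple form that this page attributes to AFF4's use of RDF. The aff4:// URN form, the namespace URI, the attribute names size and chunkSize, and the helper names are all assumptions made for the example rather than details quoted from the specification.

```python
import uuid

# Assumed namespace URI; the page only says the URN is
# "defined in the aff4 namespace".
AFF4_NS = "http://aff4.org/Schema#"


def new_object_urn() -> str:
    """Return a hypothetical object URN built on a random GUID."""
    return f"aff4://{uuid.uuid4()}"


def describe(urn: str, attributes: dict) -> list:
    """Return one (subject, predicate, value) statement per attribute."""
    return [(urn, AFF4_NS + name, value) for name, value in attributes.items()]


if __name__ == "__main__":
    image = new_object_urn()
    # "size" and "chunkSize" are illustrative attribute names only.
    for statement in describe(image, {"size": 10485760, "chunkSize": 32768}):
        print(statement)
```

Every statement about an object shares the same subject URN, which is what lets the information about the data live apart from the data itself.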

The developers state that AFF_4 surpasses its predecessors in terms of how the data it contains can be examined for forensic purposes. For example, the element referred to as a map stream connects the values for byte offsets in the image to offsets in a file that has been imaged and, in turn, to the target object. This mapping supports such actions as "carving" (searching for files or other kinds of objects based on content rather than metadata) without the need to store the multiple (and often extensive) copies of data chunks as required when carving other disk-image formats.
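The offset translation that a map stream performs can be sketched in Python as a sorted list of ranges plus a lookup. This is a simplified model under stated assumptions, not the AFF4 map format: the MapEntry fields, the resolve function, and the example URNs are hypothetical. The point it illustrates is that a carved object can be described by references into existing data instead of by a second copy of the bytes.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Tuple


class MapEntry(NamedTuple):
    image_offset: int   # where this range starts in the mapped stream
    length: int         # number of bytes the range covers
    target_offset: int  # where those bytes start inside the target
    target: str         # URN of the object that actually holds the bytes


def resolve(entries: list, offset: int) -> Optional[Tuple[str, int]]:
    """Translate a stream offset to (target, target_offset).

    `entries` must be sorted by image_offset. Returns None when the
    offset falls in a gap that no entry covers.
    """
    starts = [e.image_offset for e in entries]
    i = bisect_right(starts, offset) - 1
    if i < 0:
        return None
    entry = entries[i]
    if offset >= entry.image_offset + entry.length:
        return None
    return entry.target, entry.target_offset + (offset - entry.image_offset)


if __name__ == "__main__":
    # Hypothetical map: two ranges that point into two different targets.
    carved = [
        MapEntry(0, 4096, 1048576, "aff4://example-disk"),
        MapEntry(4096, 8192, 0, "aff4://example-other"),
    ]
    print(resolve(carved, 5000))
```

Because the map stores only offsets and target names, recording a carved file costs a few entries regardless of the file's size.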

Signifiers like filename extensions, media types, or magic numbers may exist for AFF_4's component elements but not at the level of this information model.

Production phase Typically used for data analysis and not part of a process to create new content. May be used to archive data.
Relationship to other formats
    May contain AFF4 Volume ZipFile, not described at this Web site.
    May contain Compression via the zlib implementation of the DEFLATE algorithm, not described at this Web site.
    May contain ZIP_PK, ZIP File Format (PKWARE). Use of ZIP_PK includes the ZIP64 extension.
    Other AFF_1_0, Advanced Forensic Format Disk Image, AFF Version 1.0. AFF_4 was preceded by this format, which has a significantly different structure.

Local use

LC experience or existing holdings  
LC preference  

Sustainability factors

Disclosure Open source format developed by Michael Cohen, Simson Garfinkel, and Bradley Schatz.
    Documentation The most thorough description of the format can be found at the aff4.org Web site.
Adoption Two software libraries for zip-based implementations have been identified: Python Advanced Forensic Format Version 4 library and Rekall PMEM. Adoption is expected among law enforcement and legal investigators rather than memory-institution archives. Comments welcome.
    Licensing and patents None identified.
Transparency Transparent wrapper; content within wrapper may require algorithms and tools to read, and/or require sophistication to build tools.
Self-documentation AFF4 self-documentation pertains to the structure of the data, e.g., compression type, size, and storage location. Segments of disk-imaged content are called AFF4 Objects, and these have unique names in the form of URNs, often based on a GUID. AFF4 uses RDF to model statements about objects as tuples of subject, predicate, and value. In the ZipFile implementation, the RDF is stored in the ZIP archive in a file called information.turtle, which serializes the RDF in the Turtle syntax.
External dependencies None identified.
Technical protection considerations

The Forensics Wiki states that AFF4 supports cryptography and image signing. The implications are not clear to the compiler of this description. Comments welcome.


Quality and functionality factors


File type signifiers and format identifiers

Tag | Value | Note
PRONOM PUID | See note. | PRONOM's PUID fmt/844 covers AFF but does not include version information.
Wikidata Title ID | Q27473543 | See https://www.wikidata.org/wiki/Q27473543.

Notes

General

The number "4" assigned to the format's name is not a version indicator in the conventional sense. AFF_1_0 (Advanced Forensic Format; the second F stands for format), AFF_4's predecessor, was associated with three versions of a software tool, the most recent of which is AFFLIBv3. The software version numbers seem to have been used to refer to instances of the format by those who implemented it. Thus it appears that the "4" in Advanced Forensic Framework (the second F stands for framework) was intended to eliminate any chance of confusion with the predecessor.

Information from the AFF4 Web site explains chunks, bevies, and bevy indexes: "The image data is divided in chunks. Each chunk is compressed separately. A number of chunks are collected into a single segment termed a bevy. Each bevy also has another segment called the bevy index which contains the start offset of each compressed chunk inside the bevy. The bevy URN is derived from the AFF4 Image URN by appending an 8 digit decimal, zero padded bevy id (an incrementing integer started from 0). Each bevy’s index has a URN which is created by appending “/index” to the bevy’s URN."
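The quoted scheme can be sketched in a few lines of Python: each chunk is compressed separately with zlib, the compressed chunks are concatenated into a bevy, and an index records where each one starts. This is a simplified illustration, not the reference implementation. The function names are hypothetical, the "/" placed between the image URN and the bevy id is an assumption (the quoted text says only that the id is appended), and the index is kept as an in-memory list because its on-disk encoding is not described here.

```python
import zlib


def bevy_urn(image_urn: str, bevy_id: int) -> str:
    """Append an 8-digit, zero-padded decimal bevy id to the image URN."""
    # The "/" separator is an assumption made for this sketch.
    return f"{image_urn}/{bevy_id:08d}"


def bevy_index_urn(image_urn: str, bevy_id: int) -> str:
    """Append "/index" to the bevy URN, as the quoted text describes."""
    return bevy_urn(image_urn, bevy_id) + "/index"


def build_bevy(data: bytes, chunk_size: int):
    """Compress each chunk separately; return (bevy_bytes, start_offsets)."""
    bevy = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(bevy))  # start offset of this compressed chunk
        bevy += zlib.compress(data[start:start + chunk_size])
    return bytes(bevy), offsets


def read_chunk(bevy: bytes, offsets: list, n: int) -> bytes:
    """Decompress chunk n using the index to find its boundaries."""
    end = offsets[n + 1] if n + 1 < len(offsets) else len(bevy)
    return zlib.decompress(bevy[offsets[n]:end])


if __name__ == "__main__":
    data = bytes(range(256)) * 40  # 10,240 bytes of sample data
    bevy, index = build_bevy(data, chunk_size=4096)
    print(bevy_urn("aff4://example-image", 0), index)
    print(read_chunk(bevy, index, 1) == data[4096:8192])
```

The index is what makes random access possible: a reader can decompress one chunk without touching the chunks before it.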

The creators of the format use the term resolver to name the component that stores the RDF metadata and manages the content in the AFF4 information model. In this context, the term is implementation-dependent and does not refer to a global resolution system; see for example the article "A scalable file based data store for forensic analysis."

A 2010 article in Digital Investigation 7 highlights the value of reducing data storage needs during forensic analysis and describes the hash-based compression scheme that its authors built as an extension to the AFF4 format: "As disk capacity growth continues to outpace storage IO bandwidth, the demands placed on storage and time are ever increasing. Data reduction and de-duplication technologies are now commonplace in the Enterprise space, and are potentially applicable to forensic acquisition. Using the new AFF4 forensic file format we employ a hash based compression scheme to leverage an existing corpus of images, reducing both acquisition time and storage requirements."
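The general idea behind hash-based de-duplication can be sketched in Python as follows. This is a generic illustration under stated assumptions and not the scheme from the article: the choice of SHA-256, the fixed chunk size, the dictionary used as the corpus, and the function names acquire and rebuild are all hypothetical.

```python
import hashlib


def acquire(data: bytes, chunk_size: int, corpus: dict) -> list:
    """Return the chunk hashes for `data`, storing only unseen chunks."""
    refs = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        corpus.setdefault(digest, chunk)  # skip chunks already in the corpus
        refs.append(digest)
    return refs


def rebuild(refs: list, corpus: dict) -> bytes:
    """Reassemble an image from its list of chunk hashes."""
    return b"".join(corpus[digest] for digest in refs)


if __name__ == "__main__":
    corpus = {}
    first = b"A" * 4096 + b"B" * 4096
    second = b"A" * 4096 + b"C" * 4096  # shares its first chunk with `first`
    acquire(first, 4096, corpus)
    refs = acquire(second, 4096, corpus)
    print(len(corpus), rebuild(refs, corpus) == second)
```

In this sketch a second image that shares chunks with the corpus adds only its new chunks, which is the storage saving the quoted passage refers to.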

History

AFF_4, completed in 2009, was developed by Michael Cohen, Simson Garfinkel, and Bradley Schatz. It was originally described in “Extending the advanced forensic format to accommodate multiple data sources, logical evidence, arbitrary information and forensic workflow,” Digital Investigation 6 (2009) S57–S68. The paper was released with an early implementation written in Python; the later version available at aff4.org is an open source reimplementation intended as a general-purpose AFF4 library.

The development context is outlined in a 2010 article in Digital Investigation 7 titled "Hash based disk imaging using AFF4": "Traditional imaging technologies consist of making bit for bit copies of all data stored on the acquired media (so called raw or 'dd' images). Second generation imaging techniques improved space efficiency by introducing block based compression to the data stream . . . . Although space requirements for image storage was reduced, this came at the cost of increased acquisition time. The advanced forensics file format (AFF4) is a third generation forensic file format integrating multiple image streams, the expression of arbitrary information and storage virtualisation into the forensic file format itself."


Format specifications


Useful references

URLs


Last Updated: 07/27/2017