Sustainability of Digital Formats: Planning for Library of Congress Collections


MAT-File Level 5 File Format


Identification and description

Full name MAT-File Level 5 File Format (v5, v6, v7)
Description

A Level 5 MAT-File is an openly documented, but proprietary, binary data container format used by MATLAB software from MathWorks. MATLAB is a fourth-generation programming language that allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages. MATLAB was first released in 1984; a version of the MAT-File format has been documented since at least 1999.

The naming of the MAT-File format versions is somewhat confusing. One distinction is by "Level." Level 4 and Level 5 files have completely different technical structures. Within Level 5, new format versions have been introduced to support new capabilities while continuing to use the same general structure. For example, data compression and Unicode support were added in v7, using the same Level 5 format structure. When using MATLAB software, the user's choice is by format version/type, not by level.

As of December 2015, the latest version is v7.3. See MAT-File Versions in the MATLAB documentation for an explanation of the different MAT-file format versions. That page says, "Version 7.3 MAT-files use an HDF5 based format." Experimentation with MATLAB 2015a (see Notes below) and a Q&A in a user support forum indicate that headers for versions 5-7 as produced by MATLAB display "MATLAB 5.0 MAT-file" whereas version 7.3 files show "MATLAB 7.3 MAT-file". No public specification for version 7.3 MAT-files has been found by the compilers of this resource; the MAT-File format specification for Levels 4 and 5 makes no mention of HDF5. Comments welcome. See Useful references below for more information about compatibility of 7.3 MAT-files with HDF5 and comparison with earlier versions.

Level 5 MAT-Files include support for multidimensional numeric arrays, character arrays, cell arrays, sparse arrays, objects, and structures. In MATLAB, MAT-files can be created using the save function, which writes the arrays currently in memory to a file as a continuous byte stream. MATLAB is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages. MATLAB code is stored in separate files, typically with a '.m' extension. See "What is the difference between .m and .mat files in MATLAB?"

Level 5 MAT-files are made up of a 128-byte header followed by one or more data elements. This use of the term 'element' should not be confused with the way the term is used in XML-based format specifications. Each array is a data element, as are some other MATLAB constructs. Each data element is composed of an 8-byte tag followed by the data in the element. The tag specifies the number of bytes in the data element and how these bytes should be interpreted; that is, should the bytes be read as 16-bit values, 32-bit values, floating-point values or some other data type.
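The header-plus-elements layout described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a full reader: it assumes a little-endian file, works on a small synthetic byte string rather than real MATLAB output, and does not interpret the payload of any element (including the sub-elements nested inside arrays). The split of the 128-byte header into 116 bytes of text, an 8-byte subsystem offset, a 2-byte version, and a 2-byte endian indicator, and the numeric type codes, are taken from the MathWorks Level 5 specification and should be checked against it. The names walk_elements and make_demo_file are hypothetical.

```python
import struct

HEADER_SIZE = 128  # fixed-length Level 5 header
TAG_SIZE = 8       # 4-byte data type code followed by a 4-byte byte count

# Subset of data type codes (assumed from the Level 5 specification).
MI_TYPES = {
    1: "miINT8", 2: "miUINT8", 3: "miINT16", 4: "miUINT16",
    5: "miINT32", 6: "miUINT32", 7: "miSINGLE", 9: "miDOUBLE",
    12: "miINT64", 13: "miUINT64", 14: "miMATRIX", 15: "miCOMPRESSED",
}


def walk_elements(blob, byte_order="<"):
    """Yield (type_name, n_bytes) for each top-level data element."""
    offset = HEADER_SIZE
    while offset + TAG_SIZE <= len(blob):
        type_code, n_bytes = struct.unpack_from(byte_order + "II", blob, offset)
        yield MI_TYPES.get(type_code, f"unknown({type_code})"), n_bytes
        offset += TAG_SIZE + n_bytes  # skip the payload without interpreting it


def make_demo_file():
    """Build a synthetic little-endian byte string shaped like a Level 5 file."""
    text = b"MATLAB 5.0 MAT-file, Platform: DEMO".ljust(116, b" ")
    header = text + bytes(8) + struct.pack("<H", 0x0100) + b"IM"
    matrix = struct.pack("<II", 14, 16) + bytes(16)       # opaque miMATRIX payload
    compressed = struct.pack("<II", 15, 5) + b"\x01" * 5  # opaque miCOMPRESSED payload
    return header + matrix + compressed


demo = make_demo_file()
print(demo[:116].decode("ascii").rstrip())
for name, size in walk_elements(demo):
    print(name, size)
```

The walker only reads tags and skips payloads, which is enough to list what a file contains. Decoding an array would additionally require parsing the sub-elements inside each miMATRIX element and decompressing miCOMPRESSED elements.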

Relationship to other formats
    Has earlier version

Level 4 MAT-File, not described separately in this resource. Supports only two-dimensional matrices and character strings. No longer offered as a Save option in recent MATLAB software releases.

    Has later version MAT-File 7.3, not described separately in this resource. Based on HDF5, but with undocumented special conventions. See Useful references below.

Local use

LC experience or existing holdings Library of Congress staff performing image analysis associated with quality control for scanning make use of MATLAB software.
LC preference  

Sustainability factors

Disclosure Developed by MathWorks as a proprietary format for use with MATLAB software. Level 4 and later Level 5 variants have been openly documented since 1999.
    Documentation MATLAB® MAT-File format, from MathWorks. Has specification for Level 4 and Level 5 variants.
Adoption

MATLAB API for Other Languages, from MathWorks, lists APIs that support interaction with the MATLAB application, MAT-Files, and MATLAB data types. The MAT-File API (also called the MAT-File Interface library) is available for Fortran and for C/C++.

The following software libraries or applications for numerical computation can read or write Level 5 MAT-files: Mathematica, see Wolfram Language; Maple; Lumerical, see mat files; MATLAB MAT-file Viewer; GNU Octave, an open source package, see Simple File I/O; Math.NET Numerics, part of an open-source initiative.

Application areas in which MATLAB is widely used include: chemical engineering, bio-engineering, signal processing (including for images), medical image analysis, quantitative finance, pattern recognition. However, sharing and interchange of data in these fields is much less common than the sharing of MATLAB program code.

Generic data archives do not tend to list MAT-File as a supported format. The ETH-Bibliothek Digital Curation unit (at a university in Switzerland) mentions *.mat files in its File Formats for Archiving recommendations, encouraging conversion to HDF5 or storing as MAT-File 7.3 for HDF5 compatibility before submission to the ETH Data Archive. However, see Useful references below for references that question the compatibility.

A few research communities use MAT-Files to store and share data. For example, the Max Planck Institute for Psycholinguistics hosts an archive of language-related data that accepts MATLAB files for Neurobiology of Language data; see Accepted file types and formats.

    Licensing and patents

There are patents associated with the MATLAB software, but MathWorks encourages use of files in the MAT-File format through the MAT-File interface library it supplies, and states, "However, if you need to read or write MAT-files on a system that does not support the MAT-file interface, you must write your own read and write routines."

Transparency The MAT-File format is not transparent. In addition to storing data in binary form, the array data elements that hold the actual data are often compressed to save space. See Notes on Performance vs. Transparency for discussion of the disadvantages of XML for data used with MATLAB for image analysis.
Self-documentation A MAT-File is self-describing from a technical perspective, allowing an application that understands the format to interpret the structure and contents without supplementary information. The array data elements that hold the actual data values (numeric or character) are named. MAT-Files offer no support for embedding descriptive metadata or describing semantics and provenance for the variables and arrays in the file.
External dependencies No dependencies beyond software that can read MAT-Files.
Technical protection considerations None

Quality and functionality factors

Dataset
Normal functionality

The MAT-file format supports many data types, including signed and unsigned 8-bit, 16-bit, 32-bit, and 64-bit integer types, a special data type that represents MATLAB arrays, Unicode-encoded character data, and data stored in compressed format. Floating point numbers in IEEE 754 single- and double-precision are also supported.

Support for software interfaces (APIs, etc.)

MathWorks supplies a number of APIs for various programming languages. The MAT-File interface library (for C/C++ and Fortran) provides routines for reading and writing MAT-files. In addition, MATLAB API for Other Languages lists a number of APIs that support interaction with the MATLAB application and MATLAB data types.

Data documentation (quality, provenance, etc.) No support for embedding descriptive or administrative metadata.
Beyond normal functionality Support for complex numbers, sparse matrices, and nested data structures.

File type signifiers and format identifiers

Filename extension: mat
    Note: From the Level 5 MAT-File specification.
Internet Media Type: application/x-matlab-data, application/matlab-mat
    Note: From Apache Tika.
Magic numbers:
    Hex: 4D 41 54 4C 41 42 20 35 2E 30 20 4D 41 54 2D 66 69 6C 65 2C 20 50 6C 61 74 66 6F 72 6D 3A 20
    ASCII: "MATLAB 5.0 MAT-file, Platform: " (ending with a single space, hex 20)
    Note: From the Level 5 MAT-File specification.
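
The magic number listed above can be checked mechanically. The following Python sketch decodes the hex form to confirm that it matches the ASCII form, then tests whether a byte string begins with it. The function name has_level5_signature is hypothetical, and a match only indicates that the header text looks like a Level 5 file, not that the rest of the file is valid.

```python
# Hex form of the magic number, copied from the entry above.
MAGIC_HEX = (
    "4D 41 54 4C 41 42 20 35 2E 30 20 4D 41 54 2D 66 69 6C 65 2C "
    "20 50 6C 61 74 66 6F 72 6D 3A 20"
)
MAGIC = bytes.fromhex(MAGIC_HEX)  # fromhex skips the spaces between byte pairs


def has_level5_signature(first_bytes):
    """Return True if the data starts with the Level 5 magic number."""
    return first_bytes.startswith(MAGIC)


print(repr(MAGIC.decode("ascii")))
print(has_level5_signature(b"MATLAB 5.0 MAT-file, Platform: PCWIN64"))
print(has_level5_signature(b"MATLAB 7.3 MAT-file, Platform: PCWIN64"))
```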

Notes

General

Headers for MAT-files: Although the bulk of the file is in binary form, the header of a Level 5 MAT-File can be read with a text editor. This is also true of a version 7.3 file. Headers from a file saved as v6, v7, and v7.3 in MATLAB (2015a) are shown below preceded by the version in []. Note that the headers for v6 and v7 files have the same "5.0" identification. Both are Level 5 MAT-Files.

  • [v6] MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Thu Dec 17 15:48:57 2015 IM
  • [v7] MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Thu Dec 17 15:52:10 2015 IM
  • [v7.3] MATLAB 7.3 MAT-file, Platform: PCWIN64, Created on: Thu Dec 17 15:48:12 2015 HDF5 schema 1.00 . IM
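
Because the header text is human-readable, its fields can be pulled out with a regular expression. The Python sketch below is based only on the three sample headers shown above; the pattern, the function name parse_header_text, and the assumption that other MATLAB releases and platforms follow the same wording are all unverified.

```python
import re

# Pattern inferred from the sample headers above (an assumption, not a specification).
HEADER_PATTERN = re.compile(
    r"MATLAB (?P<format>\d+\.\d+) MAT-file, "
    r"Platform: (?P<platform>[^,]+), "
    r"Created on: (?P<created>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
)


def parse_header_text(text):
    """Return the header fields as a dict, or None if the text does not match."""
    match = HEADER_PATTERN.match(text)
    if match is None:
        return None
    info = match.groupdict()
    # Files saved as v6 and v7 both report "5.0"; only v7.3 reports "7.3".
    info["level5_structure"] = info["format"] == "5.0"
    return info


samples = [
    "MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Thu Dec 17 15:52:10 2015",
    "MATLAB 7.3 MAT-file, Platform: PCWIN64, Created on: Thu Dec 17 15:48:12 2015 HDF5 schema 1.00 .",
]
for sample in samples:
    print(parse_header_text(sample))
```

Note that the header text alone cannot distinguish a v6 file from a v7 file, since both carry the "5.0" identification.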

Level 4 MAT-files have no magic number. They have a 20-byte fixed-length header, beginning with a 4-digit integer that indicates how the data is stored.
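
As an illustration of how the leading 4-digit integer can be interpreted, the Python sketch below splits it into its four decimal digits and unpacks the 20-byte header as five 32-bit integers. The digit meanings (often written as M, O, P, T) and the field order (type, rows, columns, imaginary flag, name length) are recalled from the MathWorks Level 4 specification and should be verified against it before use. The function names are hypothetical, and the example assumes a little-endian header.

```python
import struct

# Digit meanings assumed from the Level 4 specification.
NUMERIC_FORMATS = {0: "IEEE little-endian", 1: "IEEE big-endian",
                   2: "VAX D-float", 3: "VAX G-float", 4: "Cray"}
DATA_FORMATS = {0: "64-bit float", 1: "32-bit float", 2: "32-bit signed integer",
                3: "16-bit signed integer", 4: "16-bit unsigned integer",
                5: "8-bit unsigned integer"}
MATRIX_TYPES = {0: "numeric (full) matrix", 1: "text matrix", 2: "sparse matrix"}


def decode_mopt(mopt):
    """Split the 4-digit type integer into its M, O, P, and T digits."""
    m, rest = divmod(mopt, 1000)
    o, rest = divmod(rest, 100)
    p, t = divmod(rest, 10)
    return {
        "numeric_format": NUMERIC_FORMATS.get(m, "unknown"),
        "reserved": o,
        "data_format": DATA_FORMATS.get(p, "unknown"),
        "matrix_type": MATRIX_TYPES.get(t, "unknown"),
    }


def parse_level4_header(header, byte_order="<"):
    """Unpack the 20-byte fixed header as five 32-bit integers."""
    mopt, rows, cols, imagf, namlen = struct.unpack(byte_order + "5i", header[:20])
    return {"type": decode_mopt(mopt), "rows": rows, "cols": cols,
            "has_imaginary": bool(imagf), "name_length": namlen}


# Synthetic header: a 2x3 real matrix of doubles with a 2-byte name field.
demo_header = struct.pack("<5i", 0, 2, 3, 0, 2)
print(parse_level4_header(demo_header))
```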

Byte order in MAT-Files: The Level 5 specification describes MAT-files from a big-endian perspective. The associated figures also reflect MAT-files written by a big-endian system. In MAT-files written by a little-endian system, the order of bytes within each instance of a MAT-file data type is reversed. The file header has an "endian indicator" as the last two bytes of the 128-byte header. The indicator comprises the two characters, M and I, written to the MAT-file in this order, as a 16-bit value. If, when read from the MAT-file as a 16-bit value, the characters appear in reversed order (IM rather than MI), the program reading the MAT-file must perform byte-swapping to interpret the data in the MAT-file correctly.
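
The endian check can be sketched in Python by looking at the raw bytes at offsets 126 and 127 rather than at a 16-bit value, which makes the test independent of the machine doing the reading. Raw bytes "IM" on disk mean the file was written little-endian (as in the PCWIN64 headers shown earlier), and "MI" means big-endian. The function names are hypothetical, and the location and value of the 2-byte version field (offset 124, 0x0100) are assumptions taken from the Level 5 specification.

```python
import struct

HEADER_SIZE = 128


def detect_byte_order(header):
    """Return the struct byte-order prefix for a Level 5 header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("a Level 5 header is 128 bytes long")
    indicator = header[126:128]
    if indicator == b"IM":
        return "<"  # written by a little-endian system
    if indicator == b"MI":
        return ">"  # written by a big-endian system
    raise ValueError(f"unexpected endian indicator: {indicator!r}")


def read_version(header):
    """Read the 2-byte version field using the detected byte order."""
    order = detect_byte_order(header)
    (version,) = struct.unpack_from(order + "H", header, 124)
    return version


# Synthetic headers: 124 filler bytes, then the version and the indicator.
little = bytes(124) + struct.pack("<H", 0x0100) + b"IM"
big = bytes(124) + struct.pack(">H", 0x0100) + b"MI"
print(detect_byte_order(little), hex(read_version(little)))
print(detect_byte_order(big), hex(read_version(big)))
```

Passing the detected prefix to every later struct call lets the same reading code handle files from either kind of system without explicit byte-swapping.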

Performance vs. transparency: Although widely recommended for reasons of transparency, XML-based formats are not used in practice for sharing large-scale scientific data in active communities. Scientists and researchers need a reliable format for exchanging large datasets for use in computational environments for scientific data analysis. Important factors for choosing tools are ease of use, platform independence, device-independent plotting, and a graphical user interface. Important factors for choosing formats include performance for loading and analysis. In their 2011 paper, The Impact of the Data Archiving File Format on Scientific Computing and Performance of Image Processing Algorithms in MATLAB Using Large HDF5 and XML Multimodal and Hyperspectral Data Sets, Kelly Bennett and James Robertson found that the HDF5 format (a binary format with many characteristics in common with a Level 5 MAT-File) provided faster load and process times than XML formats. In most cases, the XML file is between 2.5 and 3 times as large as the comparable HDF5 file. Using a benchmark set of analyses, they found the execution times for the HDF5 files were significantly less than for the XML files on both Linux and Windows. For their data, the mean time taken for the XML files was around 60% more than for HDF5, with the main contribution to this difference being the large load time and the preprocessing step required to convert an ASCII XML character string to a numeric array in MATLAB.

History  

Format specifications


Useful references

URLs


Last Updated: Monday, 27-Feb-2017 09:55:20 EST