Towards Best Practices in Disk Imaging: A Cross-Institutional Approach

Eddy Colloton; Jonathan Farbowitz; Flaminia Fortunato; Caroline Gil
Electronic Media Review, Volume Six: 2019-2020

ABSTRACT

The growing prevalence of computer and software-based art in contemporary museum collections has been met with serious discussion and research. Through various forums, symposia, and peer networks, museum professionals are collaborating to address the unique challenges in caring for these types of artworks.

Within this context, media conservators have sought tools and techniques to deal with the urgent need to back up data from aging computers and hard drives in museum collections. One practice that is emerging among conservators, drawing from digital forensics and widely adopted by libraries and archives, is disk imaging. A disk image, a bit-for-bit copy of a digital storage device, is a powerful tool for encapsulating both the artwork and its software environment for preservation or documentation. However, the vast array of formats, tools, and procedures used in disk imaging, practiced in various disciplines for different purposes, often complicates finding appropriate procedures and workflows that suit museum collections.

This article presents the findings of a yearlong cross-institutional collaborative examination of disk imaging between the Solomon R. Guggenheim Museum in New York, The Museum of Modern Art (MoMA) in New York, and the Hirshhorn Museum and Sculpture Garden in Washington, DC. The authors will jointly examine questions related to creating, condition checking, accessing, and storing disk images while addressing key issues, including:

1. Differences between disk image formats and tools used for creating such disk images, their respective advantages and disadvantages, and their suitability for long-term preservation;

2. The development of practices and guidelines for condition checking, quality control, and troubleshooting of disk images after their creation, and;

3. The difficulties of using a disk image to run a software-based artwork independent of the original hardware while ensuring a faithful representation of the work and its work-defining properties.

Recognizing that the creation of a disk image is just one step at the beginning of an artwork’s preservation life cycle, the authors engage in a frank and open discussion about their successes and failures with creating and managing disk images. By sharing their findings, the authors seek to demystify disk imaging for the purposes of long-term preservation and display within an art museum, focusing on the tools used for creating disk images and accessing them in the future.

INTRODUCTION

This article presents the outcome of a collective research endeavor undertaken by Eddy Colloton at the Hirshhorn Museum and Sculpture Garden in Washington, DC, Jonathan Farbowitz at the Solomon R. Guggenheim Museum in New York, and Flaminia Fortunato and Caroline Gil at the Museum of Modern Art in New York. The four authors jointly examine questions related to creating, condition checking, accessing, and storing disk images within the context of an art museum.

Museums are now, more often than ever, acquiring software-based artworks. As a result, media conservators have become custodians of this vulnerable medium. One strategy to address aging and vulnerable computer hardware and storage media is the practice of disk imaging, which produces a digital file that is a bit-for-bit copy of the data on a physical storage device. This disk image file can contain the content and structure of a data storage device, such as a computer hard drive, external hard disk drive, solid-state drive, optical disc, floppy disk, memory card, or data tape.

Generally, the terms software-based artwork or computer-based artwork denote an artwork in which the media is “formal instruction code” (Cramer 2002), which may come to a museum in the form of an executable file, a compiled application, source code files or scripts, or as a set of instructions intended to perform a specific task.  The terms software-based artwork and computer-based artwork are often used interchangeably. This article will favor the term software-based art to denote “works of art created by artists who write computer programs to render their work” (Engel and Wharton 2014, 404). Software-based works are often the result of a collaboration between software developers, artists, and other technicians. To complicate matters, software-based artworks may rely on proprietary technologies such as virtual reality, live simulation, custom-made electronics, and/or an Internet connection that enable their work-defining properties. These technologies are often delivered on a computer, frequently built to customized specifications so that the artwork functions as the artist intends.

Regardless of whether the laptop, desktop, or single-board computers used to run an artwork were artist provided or sourced by museum staff to exhibit software-based works, creating disk images of them is an important preservation step. Collecting, documenting, and creating a disk image of a software-based artwork’s original technical environment (sometimes referred to as the “native software environment”) as soon as possible is essential for anticipating and planning for its preservation. Creating a disk image of a computer, for example, serves as a way to encapsulate what was acquired by the museum at a given time, with the added benefit that a user cannot inadvertently alter the data or metadata contained within the image. Keeping the disk image as an encapsulated backup of a computer allows for maximum flexibility in any future preservation or conservation activities. Disk images also allow for the possibility of emulation or virtualization. Emulators and virtual machines (VMs) are software that allow a user to run operating systems and software independent of the original hardware. In practice, this enables conservators to run software-based artworks, whose underlying software may have become obsolete, separate from their original hardware as a strategy for study or exhibition.

It is worth emphasizing that disk imaging is only one of many activities in the software-based artwork acquisition process. Other widely adopted practices, such as collecting and documenting source code, fall outside the scope of this article. 

This article grew out of concerns and conversations that arose during a Peer Forum on Disk Imaging hosted by MoMA in December 2017. During that forum, two documents—a Collaborative Resources Document and Policy and Procedures—were written by the attendees. These documents list helpful links and general procedural guidelines for conservation staff getting started in disk imaging. The desire for agreed-upon general practices and guidelines on preimaging, disk image acquisition, postimaging, and exhibition of software-based artworks has directed this research. The authors of this article have employed diverse research methods, including literature research, documenting their troubleshooting processes in a collaboratively written journal titled the “Captain’s Log,” interviewing experts in the field, and holding biweekly calls over the course of a year to compare disk imaging workflows currently in use at each institution and, through discussion, to move closer to a set of best practices that could be established at their places of work.

PREIMAGING PHASE

The preimaging phase consists of the following steps. The information generated during these steps can be collected in a Disk Imaging Report as conservation documentation:

  • Documenting physical components
  • Obtaining computer configuration information
  • Considerations before powering on hardware
  • Removing the hard drive (if applicable)
  • Protecting media from electrostatic discharge
  • Documenting the disk

Documenting Physical Components

Documenting artist-provided computers can guide conservators in clarifying the work-defining properties of each artwork and capturing its anatomy, function, characteristics, and the artist-intended behaviors (Phillips 2015). As part of all three museums’ workflows prior to disk imaging, photo documentation is taken of the computer and significant internal components, including artist signatures, inscriptions, or edition numbers on the equipment. Special attention is given to the hard drive; motherboard (also sometimes called the logic board); graphics processing unit (GPU); processor (CPU); and any cables, peripherals, and connections that are custom built or appear to be modified by the artist. Photographs of labels, serial numbers, model numbers, or artist or technician signatures or inscriptions should be coupled with written documents that capture all of this information. Moreover, documentation of all input/output ports on the computer (such as USB, ethernet, serial, parallel, and SCSI) is carried out. This information becomes important later if a disk image of the computer is used within a VM or emulator. Within a VM or emulator, ports may need to be virtualized or emulated to get the artwork to run properly.

At this stage, the source media (e.g., a computer, floppy disk, or optical disc) should be catalogued in the museum’s collection management system. This is important for tracking so that the future disk image has an explicit relationship to the original physical media. A catalogue record for the disk image file can also be created in advance of imaging. The name of the disk image file may include “Disk image of component X” to link it to the original source media. More information about cataloguing will be provided later in the Cataloguing section of this article.

In the case of all three institutions, after photographic documentation of a computer is taken, user manuals and spec sheets created by the manufacturer or by computer enthusiast communities are collected and stored in the conservation artwork file along with similar documentation about other technology required to realize the piece.

Obtaining Computer Configuration Information

While disk imaging a computer’s hard drive is a key step in preserving the data stored on the drive, the data can hold little value without understanding the original hardware and software environment housing that drive. Data stored on a particular hard drive can behave differently when accessed by different types of computers. This is of particular concern when migrating data from a relatively old computer to a contemporary machine. Thankfully, computers are often able to describe themselves. Most operating systems have the ability to generate a detailed list of hardware components and connected peripherals (Farbowitz 2018). This information is packaged into a file referred to as a configuration file. Configuration files are created by different tools, typically unique to a particular operating system. Three common tools for creating configuration files are MSINFO32.exe for Microsoft Windows, System Report for macOS, and hardinfo for Ubuntu Linux. These tools and their resulting files are summarized in Table 1.

Operating System          | Program or Command                       | Resulting File
Windows XP – Windows 10   | msinfo32.exe (run from Command Prompt)   | .nfo file (XML-based)
macOS 10.0 or higher      | System Report                            | .spx file (XML-based)
Ubuntu Linux              | hardinfo (run from Terminal)             | HTML file

Table 1: Overview of Configuration Information Reports

The Microsoft Windows operating system creates such a list through an executable, MSINFO32.exe, which “displays a comprehensive view of hardware, system components, and software environment” (Kumar et al. 2011, 1676). Similarly, macOS has a built-in application stored in the Utilities directory, called System Information, that can display and export a report listing the hardware of the machine, down to the serial number of a particular component. The System Information report also includes a list of all of the software installed on a machine, which can be a helpful tool in identifying dependencies for a software-based artwork acquired on a specific device.

The System Information report is most easily read on a Mac computer, within the System Information application, with linked headings in a column on the left and details in a window on the right, but these reports can also be viewed on any machine as an XML file, as shown in figure 1.

Fig. 1. Mac OS system information report in the system information application (left) and as an XML file viewed in a text editor (right). Screenshot by Eddy Colloton.

For Ubuntu Linux systems, the command “hardinfo” can be run from the terminal. Running this command opens a graphical user interface (GUI) window that can be used to export system information as an HTML file.
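
For reference, each of these reports can also be generated from a command line. The following commands are a minimal sketch with hypothetical output paths; the macOS command uses system_profiler, the command-line counterpart to the System Information application:

# Windows (run in Command Prompt): save an .nfo configuration report
msinfo32 /nfo C:\Reports\config_report.nfo

# macOS (run in Terminal): export the System Information report as XML
system_profiler -xml > config_report.spx

# Ubuntu Linux (run in Terminal): generate an HTML hardware report
hardinfo -r -f html > config_report.html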

Note that these tools generate this information on demand; it is not stored on the drive in this format prior to running the application. Therefore, reports such as the ones created by System Information or MSINFO32 must be generated by running the tool, whether before or after disk imaging. The creation of such a report will change the state of the hard drive at the bit level. This change may be deemed insignificant, depending on the artwork’s acceptable level of variability. If it has been deemed safe to power on the machine, then this minimal change of creating an automated report about the physical hardware in the computer is unlikely to be significant, and knowledge of the computer’s hardware can ease the process of removing the drive. The report can even alert the museum to significant aspects of the machine’s functionality, which could impact the treatment of the work. If it is undesirable to temporarily create a new file on an artist-provided computer, the report can be saved to an external hard drive or flash drive connected to the computer.

Because these system reports document every attached peripheral, they can provide valuable information about what must be connected to the computer to properly run the artwork. For this reason, if feasible, it is recommended to export the system report when an artwork is installed and running properly during an exhibition.

Understanding the processing power of a computer that the artwork runs on can help to establish requirements for future exhibition devices. “The specifications of a computer can have a direct impact on the aesthetic properties and behavior of an artwork. For example, the clock speed of the Central Processing Unit (CPU) can determine the speed at which an artwork runs” (Farbowitz 2018). In addition, to run artworks properly in a VM or emulator, it may be necessary to adjust the emulator settings to match the specifications of the original computer.

Considerations Before Powering on Hardware

Obtaining a configuration information file may be completed before or after imaging. However, there are several factors to consider when deciding whether to power on the computer or obtain configuration information before acquiring a disk image from the hard drive. If the condition of the computer has not been assessed recently, the conservator may want to document its condition before imaging, which could include starting it up and exporting a configuration information file. Among the authors, there were differing opinions about whether to power on a computer before imaging. The decision may depend on the condition, status, and configuration of the computer in question. The authors recommend that conservators proceed case by case, keeping the following concerns in mind:

  • If a conservator wants to export a configuration information file from a computer, the computer needs to be powered on. As mentioned earlier, both powering on the computer and exporting configuration information may alter its state and change the data or metadata of certain system files (typically unrelated to the artwork). If the exact state of the computer is important to maintain (e.g., if the artist’s software records how long the work has been running), a disk image should be created before starting up the computer.
  • Acquiring a disk image can be an invasive procedure, especially if the hard drive is removed from the case; powering on the computer can provide valuable baseline information about the condition of the machine before the case is opened—most importantly, whether the computer and/or the hard drive are currently functioning properly. The conservator can then document this initial condition through images, video recording, or written reports before any further action is taken.
  • For an older hard drive, especially one whose condition is questionable, either powering on the computer or creating a disk image requires the drive to spin up properly. However, a conservator may prefer to attempt making a backup copy of the drive by disk imaging first before taxing the hard drive further by starting up the computer.

Removing the Hard Drive

The following section focuses on creating disk images of the internal hard drives of computers, which typically must be removed from a computer’s case for disk imaging. It is often straightforward to open the computer’s case and physically extract the hard drive. However, certain computer models make this task more challenging than others. Particularly with Apple Macintosh computers, disassembling the computer’s case and removing the hard drive may be extremely difficult and time-consuming. In addition, a museum may not have the proper write-blocking equipment on hand to connect with some of Apple’s proprietary interfaces for solid-state drives (SSDs). However, Macs have a feature called Target Disk Mode that allows for forensically sound disk imaging without physically removing the hard drive from the computer’s case (Henry 2011).

Physical removal of the hard drive from a computer’s case incurs some risks—pin connections on the drive could be bent or broken, or delicate cable connectors could be damaged. While outside the computer’s casing, the hard drive could sustain physical damage through bumps, shocks, or electrostatic discharge (ESD). The drive could also be reinstalled in the wrong configuration. Therefore, documentation of the drive’s original position and cable connections and correct handling are important.

Protecting Media from Electrostatic Discharge

When removing the hard drive from a computer’s case to create a disk image, a conservator may need to unplug or remove sensitive electronic components to get to the drive and successfully remove it. Electronic components within a computer (including the hard drive) are vulnerable to ESD. ESD occurs when static electricity suddenly flows from one charged object to another. This can occur, for example, when a person charged with static electricity touches exposed components, such as a hard drive or a RAM chip. ESD exposure may permanently damage the component.

Several measures can be taken to minimize the possibility of ESD damage. An ESD wrist strap is a common tool used for mitigating the risk of ESD damage. The wearer should attach the cable connected to the strap to something grounded, such as a large piece of metal touching the floor, an ESD mat, or the metal frame of the computer’s case (Lowe 2017). Some ESD wrist straps are designed to plug into wall outlets, but these are not recommended for safety reasons. ESD straps are electrically conductive, and if there is a ground fault in the electrical wiring of the building, the wearer could be electrocuted (Lowe 2017).  The authors recommend wearing an ESD wrist strap when opening computers or handling computer components and that any equipment used to open or work on computers—such as screwdrivers, tweezers, and spudgers—be identified by the manufacturer as ESD safe.

When stored outside of the computer’s case, hard drives can be placed on ESD-safe mats and stored in ESD-safe storage bags. These mats and bags contain conductive material that prevents static electricity from building up around the electronic component. However, according to ESD bag manufacturer SCS, “bags do not have a set lifespan” but rather “should be continually replenished.” ESD bags should not be exposed to excessive sunlight, moisture, or heat, and any bags that are scratched or torn should be replaced (Digi-Key Electronics 2018). Given the uncertainty about their long-term protection against ESD, more research and study are warranted to determine whether these bags are suitable for long-term storage of hard drives, disks, and other computer components.

Documenting the Hard Drive

In addition to documenting the physical components as described earlier, the disk itself can be described prior to imaging. Information about partitions and file systems on the drive or disk can be obtained by using the “disktype” command (installed by default in BitCurator and available for free download for Linux or macOS) before imaging. The terminal output of the disktype command can be saved, along with other disk imaging metadata. The name of each partition can be recorded in conservation documentation. The same information can be obtained for Windows with FTK Imager, which visually displays the partitions and file systems.
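
As a minimal sketch (the device path and output filename here are hypothetical), the disktype output can be captured to a text file from the terminal:

# Describe the partitions and file systems on a write-blocked device
# and save the output alongside other disk imaging metadata
sudo disktype /dev/sdb > artwork_drive_disktype.txt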

Some computers used in software-based artworks have multiple hard drives within the same computer. In order to create an accurate representation of the computer system, all of these drives must be imaged. In addition, documenting both where the hard drives were connected to the computer’s motherboard and the drive’s jumper pins creates an accurate representation of how the computer was configured. Ports are typically labelled on the motherboard with designations such as IDE01 and IDE02 or SATA01 and SATA02. Information about the number and position of drives can be recorded in conservation documentation.

For PATA/IDE drives, the positioning of the jumper pins on the drive or where the drive was plugged in on the cable run will affect whether it is considered the primary or secondary drive. Typically the primary drive is the one that the computer boots from first and the primary drive controls the secondary drive.1 Within the BIOS settings, which determine how the computer boots, some computers allow the “cable select” option. With “cable select,” the position of a drive in the cable run affects whether it will be considered primary or secondary (Hasting n.d.) when the machine boots.

SATA drives do not use jumper pins or a primary/secondary setup, but the port where the drive is connected will affect the boot order of the drives. Therefore, in a computer with multiple drives, taking a screen capture of the BIOS settings (along with images of where the drive was physically connected) will be helpful in reconstructing how the machine was meant to boot.

DISK IMAGING

Disk imaging, or disk image acquisition, is the action of creating a disk image. While technology is always evolving, at the time of this writing similar workflows for the creation of disk images at cultural heritage institutions are beginning to emerge. Certain tools, both hardware and software, often borrowed from the digital forensics community, are widely used by stewards of born-digital collections, and communities of users are slowly coalescing.

In the next sections, we will go over the following:

  • Write blockers
  • Disk image file formats
  • Disk imaging software
  • Ethical considerations for disk imaging tools

Write Blockers

Write blockers are important tools when creating disk images from artwork-related hard drives and flash drives. A write blocker is essentially what it sounds like: a tool that prevents writing to a drive. Mounting a drive without one can inadvertently change the data stored on that drive. Especially in cases in which bit-level file integrity or metadata such as “date modified” fields are determinative of a drive’s value, the effects of such a change can be catastrophic. Regardless, ensuring that original information on a storage device is not changed, either through automated manipulation or human error, should be considered best practice in media conservation. The most common and recommended methodology for doing so is employing a hardware write blocker, also called a forensic bridge, when creating a disk image. “A hardware write blocker is a physical device that connects via USB or other standard interface and blocks all write activity from the imaging workstation to the digital media.” Hardware write blockers “are generally seen as more reliable” than software write blockers (McKinley 2014), and in the field of archival science, “best practice suggests using physical write-blockers as standard practice for transferring material from one storage media to another, in order to prevent changing the original media” (Dietrich and Adelstein 2015, 139).

The museums participating in this research project all use hardware write blockers when acquiring disk images. Write blockers are often connection-type specific, requiring either multiple blockers for different connection types (such as PATA/IDE, SATA, USB, and FireWire), or “all in one” units such as the FRED (Forensic Recovery of Evidence Device), which are often appealing to archives with a variety of born-digital volumes (Prael and Wickner 2015). As manufacturers introduce SSDs as standard equipment in new computers, conservators will also need the ability to write block when they create disk images from these drives. There are currently several different interfaces for SSDs and, at this writing, manufacturers such as Tableau require purchasing a SATA or PCIe write blocker along with a specific adapter for each SSD type, which unfortunately requires an additional investment.

Disk Image File Formats

Disk imaging is a widely adopted practice that supports long-term preservation; conservators may encounter an already created disk image within their collections but may be confused as to how to work with the vast array of file formats and options for generating disk images. Disk imaging formats employed for the purposes of preservation can be separated into two basic types: raw images and forensic images. In general, forensic (mainly EWF, or Expert Witness Format) and raw are the two formats that are commonly used for long-term storage in cultural heritage institutions today (Farbowitz 2018). Raw images are simply sector-by-sector copies of data with no metadata or compression and do not require specialized software to interpret the format. Conversely, EWF is a proprietary, forensic file format that allows for embedded metadata, the option to employ lossless compression, and encryption. The authors of this article strongly discourage the use of EWF’s encryption, as it introduces a dependency on hardware and software, the application of guidelines that will become outdated, and other unnecessary risks that may render works inaccessible in the future. The EWF format’s ability to store embedded metadata enables preservation-friendly features such as cyclic redundancy checks (CRCs) every 32 KB, as well as higher-level fixity information in the form of checksums in every disk image or disk image segment. These automated data integrity checks act as safeguards against undetected data corruption, including potential loss during long-term storage.

Some institutions, such as the Guggenheim and the Hirshhorn, create and store both EWF and raw disk images for each artwork, while MoMA is working toward establishing a disk image file format policy. Though the MoMA media conservation department has habitually created EWF disk images, the museum also stores and preserves some raw disk images within its collection. Because the future of the EWF format is unclear, storing the raw image (along with the EWF) allows a museum to have a disk image that is not locked into a proprietary format. Transcoding an EWF to a raw image results in no loss of data. Both the Guggenheim and the Hirshhorn Museum create an EWF disk image and export a raw disk image from the EWF postimaging. Extensive use of the EWF format, particularly in the library and archives fields, and the development by Joachim Metz of libewf, an open-source software library for working with EWF files, signal that those who choose to work with EWF files will have options in the future to continue to work with the format or migrate out of it if need be.

The relative ubiquity of EWF disk images provides some protection for the future, but it is worth emphasizing that these tools were not developed for media conservation purposes, nor can the conservation community control their development. A digital forensics investigator may not be required to keep evidence files in perpetuity, as would a museum. Thankfully, disk image formats can be converted from one to another with no loss of data.  This is particularly helpful when running a disk image in a VM or an emulator.
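
As an illustration, libewf’s ewfexport utility can derive a raw image from an EWF image; the image and target names below are hypothetical, and ewfexport prompts for any values not supplied on the command line:

# Export a raw disk image from an existing EWF image (no loss of data)
ewfexport -f raw -t artwork_drive artwork_drive.E01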

There are a range of raw and forensic acquisition tools to choose from. The imaging tool that media conservators choose to use may be dependent on their target disk image format, while their target format may be contingent on the media type to be imaged. In this way, the choice of a disk imaging tool will be context dependent. Similarly, a chosen target format may depend on institutional policies (available operating systems, hardware, or software), but the variety of tools available ought to leave media conservators with choices when creating policy.

A consideration for stewards of digital collections to take into account is the option of splitting images into smaller segment files. This is generally done because image files stored in a FAT32 file system have a maximum size limit of 2 GB per file (Goethals and AVPreserve 2016). Tableau Forensic Imager also has a maximum size of 2 GB for EWF disk image files, effectively forcing the user to split images. Though one disk image per disk is often preferable, segmented disk images should not pose much of a problem, because tools such as Sleuth Kit, libewf, and FTK Imager are compatible with these types of images. The risk incurred by storing multiple segmented files can be reduced by bundling them into a .tar file or even zipping the sequence of segmented files. Segmented or not, disk images often contain a large amount of data. To combat unwieldy file sizes, some conservators may choose to apply lossless compression to their images. In this section, popular disk image file formats will be briefly summarized to aid in the selection of a target format, and the pros and cons of each format are outlined in Table 2.

EWF (.e01, with incrementing file extensions, i.e., .e01, .e02, .e03, and so on)
  • Type: Forensic, proprietary format
  • Tools and utilities: EnCase, FTK Imager (GUI and CLI), Guymager, libewf (ewfacquire); AFF, EnCase, FTK, SMART, Sleuth Kit, and X-Ways can read this format
  • Media: Floppy disks, optical media, external hard drives, computers
  • Pros: Compressible and searchable; appends the MD5 hash of the image as a footer in the file; strong community of users and support; a raw image can be exported from an EWF image; uses a cyclic redundancy check (CRC) for each block of data; supports splitting files into segments of up to 2 GB
  • Cons: Uses compression, which could prevent access if decompression software isn’t available; proprietary, somewhat closed file format, though documentation is widely available

Raw image format (file extensions can include .dd, .raw, .01)
  • Type: Raw
  • Tools and utilities: dd (Unix/Linux utility), dc3dd, dcfldd, ddrescue, FTK Imager, ProDiscover
  • Media: Floppy disks, optical media, external hard drives, computers
  • Pros: No additional wrapping or encoding, which may make the format more sustainable for long-term preservation; uncompressed, so no decompression is needed for a computer to read the data
  • Cons: Contains no additional metadata and relies on other programs to identify any file system(s) within; lack of compression takes up more storage space; no cyclic redundancy checks (CRCs) of data blocks during imaging; no support for embedding metadata in the image

AFF
  • Type: Raw, open format
  • Tools and utilities: Guymager, KryoFlux, AFFLIB
  • Media: Floppy disks, external hard drives, computers
  • Pros: Notable for being the only open-source forensic format
  • Cons: AFF versions 1 to 3 have been deprecated; AFF 4 is in the works but is not production-ready

.iso
  • Type: Raw; there is no comprehensive single specification for all of the variant formats called ISO image
  • Tools and utilities: FTK Imager, DVDisaster, Carbon Copy Cloner, IsoBuster, ImgBurn, and various optical media ripping/burning software
  • Media: Optical media
  • Pros: Preserves the file structure of the internal optical disc file system (CD, UDF, and DVD); broad compatibility with operating systems and software
  • Cons: No single specification for all variants of ISO images

Table 2: Disk Image Format Comparison Chart

Expert Witness Disk Image Format Family (EWF/E01)

The Expert Witness Disk Image format—often referred to as E01, EWF, or the EnCase format—is a closed format defined by Guidance Software for use in their EnCase tool to store hard drive images and individual files. The EWF format is often called E01 because media data can be stored across multiple segment files; these files are designated with sequentially numbered file extensions, for example, .e01, .e02, .e03, etc. These disk images contain the content and structure of a data storage device, disk volume, or computer’s physical memory (Library of Congress 2015). EWF files may be structured as a bitstream (a sequence of bits) or forensic-type image (a sector-by-sector copy of the source, including inactive data and fragments that reside in the unallocated sectors of a disk and may include deleted files that have not yet been overwritten). In addition to the physical bitstream of the acquired disk, EWF disk images contain Adler-32 checksums for every block of 64 sectors, an MD5 checksum for the entire bitstream, and can be accompanied by descriptive metadata (typically entered as “Case Info”) in the header. The Case Info section includes date and time of acquisition, examiner’s name, notes, and the option of password protection, which the authors of this article recommend against. Password protection and the standardized names of metadata fields such as Case Info are remnants of the file format’s origins in the application of forensic science in the criminal investigation realm. However, the descriptive metadata fields can be adapted to the needs of conservators and, because the metadata is embedded in the file itself, it will always stay with the disk image. The Guggenheim imaging workflow specifies the information added to each of these metadata fields. The Hirshhorn and MoMA are still developing their practices for what information to include in each metadata field.
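
The embedded metadata of an existing EWF image can be reviewed with libewf’s ewfinfo utility (the image name below is hypothetical):

# Print the acquisition metadata stored in an EWF image's header
ewfinfo artwork_drive.E01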

Raw Image Format

Raw format is simply a file that contains the exact data as it existed on the source media without any additions or deletions. A raw disk image is an uncompressed sector-by-sector sequence captured from a physical or logical volume. The raw image format is an open format, free of any license restrictions. Raw disk images can have any arbitrary file extension. Common extensions for raw disk images are .raw, .dd, or a numerical sequence such as .01. The .dd file extension takes its name from a Unix command line application of the same name used to copy and convert data (files or raw device contents) with a specified input and output. Raw disk images produced by dd contain no additional metadata and rely on other programs to identify any file system(s) contained within. The Guggenheim prefers the extension .dd for raw disk images so that they are not confused with raw image files from digital cameras.
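
As an illustration of the dd utility from which the extension takes its name, the following sketch acquires a raw image from a write-blocked source device; the device path and filename are hypothetical:

# Read the device sector by sector into a raw image file;
# conv=noerror,sync continues past read errors, padding unreadable blocks
sudo dd if=/dev/sdb of=artwork_drive.dd bs=4M conv=noerror,sync status=progress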

AFF (Advanced Forensics Format)

The Advanced Forensics Format is an extensible open format for the storage of disk images and related forensic metadata developed by Simson Garfinkel and Basis Technology. Extensible here means that any amount of metadata can be encoded in AFF files in the format of field name/field value pairs. AFF was designed as an alternative to proprietary disk image formats. The format allows for extensive metadata to be stored with the image in a single file and it consumes less disk space than other image formats, such as EWF (Garfinkel et al. 2006). Adoption of AFF and the corresponding library of software tools AFFLIB diminished following the creation of libewf and the growing adoption of Guidance Software’s EWF format (Library of Congress 2015). It should be noted that AFF Versions 1 to 3 are deprecated. A version 4 specification, with a significantly different structure, is in progress.

ISO Disk Image File Format

ISO disk images are a nonstandardized format used to package a variety of media. While the file extension takes its name from ISO 9660, a standardized optical disc file system, since the 1990s the term ISO image has also been applied to data structured following the UDF (Universal Disk Format) specification, which is commonly used for computer data storage, DVDs, and Blu-rays (Library of Congress 2012). Moreover, file extensions in and of themselves should not be taken as a valid form of file identification. In fact, a helpful workaround for operating systems that do not recognize the .001 file extension of a raw disk image (commonly used with forensic disk imaging software) is to simply change the extension to .iso. In this way, while associated with optical media, a disk image with an .iso file extension could be from any source.

Comparison of Disk Imaging Software

Key among the tools necessary to create a disk image is, of course, disk imaging software. There are a variety of applications available to perform this task, from data recovery to disk duplication tools, but when disk imaging a hard drive for the purposes of preservation, tools designed with digital forensics in mind are preferred.

Software tools for the acquisition and analysis of disk images are broadly adopted by the cultural heritage community. BitCurator, a suite of free and open-source Linux tools packaged and distributed as a VM (which can also be installed as a host operating system), was designed with museum, archives, and library professionals in mind. BitCurator includes the open-source program Guymager for creating disk images (BitCurator n.d.). FTK (Forensic ToolKit) Imager, a free disk imaging software program, is also broadly adopted in the libraries and archives field (Arroyo-Ramirez et al. 2018). There is a command line version of FTK Imager available for macOS; however, the Windows version, which includes a GUI, is much more popular. Despite being free, FTK is proprietary software. In addition, the command line tools “ddrescue” and “dd” (which can be used in Linux or macOS) are sometimes used within the cultural heritage community. While each of these applications has different features and interfaces, they can all perform the same key function of creating a bit-for-bit copy of a volume in the form of a raw disk image.

As a part of developing a disk imaging workflow at the Hirshhorn Museum in 2019, five disk imaging applications were reviewed: ddrescue (v.1.24), Guymager (v.0.8.8), FTK Imager for Mac (v.3.1.1 CLI), FTK Imager for Windows (v.3.2.4.6), and Tableau Forensic Imager (v.1.2.1). Broadly speaking, the Hirshhorn’s tests yielded similar results across the five applications. However, these tools are somewhat different from one another, thereby empowering media conservators with options when selecting their preferred disk imaging software. Among the choices are compression options and the creation of sidecar files.

Compression Options

Each of the digital forensic tools reviewed—FTK Imager, Tableau Forensic Imager, and Guymager—allow the user to create disk images in the EWF format. Each also offers levels of compression when creating EWF disk images. The compression options of the different software programs offer the user a choice between imaging speed and level of compression. This is a logical trade-off—the more compressed the data, the longer the compression takes. The FTK user manual offers a practical explanation of the compression levels, stating that level “1” compression is the “fastest, least compressed” option, and that “9” is the “smallest file, slowest to create.” The Tableau Forensic Imager software provides a relative definition, explaining that the software’s  “Maximum Speed” compression is the equivalent to FTK Imager’s “1” compression level, and that the “Minimum Size” compression is the equivalent to FTK’s “9.” Guymager offers the most information about its compression options, “Fast,” “Best,” or “Empty.” In the software’s configuration file, stored by default at “/etc/guymager/guymager.cfg,” “Empty” compression is said to do “no compression, except if a block contains zero bytes only. Such blocks are replaced by their compressed equivalent.” The other two options “Best” and “Fast,” are defined as using “Fast Z” and “Best Z” compression. The “Z” compression refers to the “zlib” abstraction of the DEFLATE compression algorithm (Wikipedia 2018).
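
libewf’s ewfacquire utility exposes a comparable choice on the command line. The following is a sketch with hypothetical names; ewfacquire prompts interactively for any parameters not supplied:

# Acquire an EWF image using "best" (smallest, slowest) compression
ewfacquire -c best -t artwork_drive /dev/sdb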

Sidecar Files

Each of the five disk imaging applications reviewed as a part of the Hirshhorn’s tests produces a sidecar “info” file that describes the disk imaging process. All of these files, typically with file extensions such as .info or .txt, are automatically produced by the software and accompany the disk image output. All of the applications include the start and end time of the acquisition process, but that is where their similarities end. The ddrescue output is likely the most different from the other four reviewed, which makes sense given that ddrescue is not designed for digital forensics specifically. While not including the checksum hashes for the image, ddrescue does produce certain technical metadata that some of the other tools do not, such as the block size of the file system or specific errors encountered. The Guymager output is the most thorough, including the most fields, the most specific description of both the host machine and the target device, and listing in the info file the commands that were run to collect this information. Perhaps most significantly, the Guymager application includes the option to automate source verification by rehashing the source volume after imaging and includes both the checksum of the volume and the image in the output. The developer of the software included the feature after observing that “hard drives containing bad sectors may deliver different data each time a bad sector is read” (Voncken 2015).

The choice of which disk imaging tools to adopt and incorporate into a workflow involves weighing an array of variables, many of which will be specific to an institution. The ease of using a particular operating system can often be dictated by other departments; existing hardware is commonly inherited from predecessors; and, of course, budgetary considerations are always pertinent. The digital preservation actions of an institution can also be dictated by the museum’s digital asset management system, or storage costs.

Ethical Considerations for Disk Imaging Tools

Computer forensics (also known as digital forensics), with its emphasis on preserving and reconstructing digital evidence (Kirschenbaum, Ovenden, and Redwine 2010), has provided the field of media conservation with tools for dealing with the investigation and transfer of data while preserving its integrity. Forensic bridges, also known as write blockers, have become an essential part of the media conservation toolkit for digital media acquisition. Currently, MoMA, the Guggenheim, and the Hirshhorn are equipped with write blockers, which are employed for a variety of tasks, including disk imaging. Disk imaging software preferred by digital preservation practitioners is often created for, and primarily supported by, the field of digital forensics.

As the prevalence of these tools continues to increase in libraries, archives, and museums, questions and concerns arise about whether cultural institutions ought to do business with companies that primarily exist to develop and sell equipment for law enforcement agencies (Arroyo-Ramírez et al. 2018).

While all of the authors of this article use products created for the digital forensics community, the group shares the feelings of colleagues in the library and archiving fields who have expressed concern over this investment. “The sobering fact that our purchase would indirectly help support the tools of the criminal justice system, and by extension, the prison industrial complex” should not be taken lightly (Arroyo-Ramírez et al. 2018, 9). While these expenditures do not amount to an explicit endorsement, there is no denying that our institutions are financially supporting companies that are dependent on, and invested in, a grossly unethical criminal justice system.

Another ethical consideration related to disk imaging concerns the contents of a disk image itself. Because a disk image is a full representation of the physical storage media, the image contains all files in the operating system as well as files that artists may have believed they had deleted. In the course of creating disk images, all authors of this article have noticed files within disk images not meant to be part of the artwork that they were investigating. Disk images may also contain Personally Identifiable Information (PII) from the artist or their technicians, such as addresses, social security numbers, phone numbers, or credit card numbers. Additional artworks, other than the accessioned artwork, may exist on the disk image as well.

This issue is discussed extensively in the literature on digital preservation (Kirschenbaum, Ovenden, and Redwine 2010; Leighton John 2012; Association of Research Libraries 2012; Farrell 2012). Archives have responded to this issue by modifying or creating new Deeds of Gift for born-digital materials. Museums may consider modifying their Acquisition Agreements in a similar manner, practicing informed consent. During acquisition, the artist or gallery would be made aware that disk images will be created for preservation purposes and that these disk images are bit-for-bit copies of the entire hard drive, which may contain deleted files.

POSTIMAGING PHASE

As previously outlined in this article, the creation of a disk image is only one step among many others that can be applied for the preservation of software-based art. Based on the authors’ own experiences and on an interview with Alex Chassanoff and Colin Post, coauthors of the paper “Digital Curation at Work: Modeling Workflows for Digital Archival Materials” (2019), the common denominator among cultural heritage institutions engaged in disk imaging is that, once a disk image is created, the pathway that follows is largely uncharted territory.

In the section that follows, a post–disk imaging workflow is proposed that may help collecting institutions in the following activities:

  • Cataloguing
  • QC (Quality Control)
    - Bad Sectors
    - Verification
    - Mounting disk images
  • Analysis of Disk Image using fiwalk and Digital Forensics XML
  • Virtualization and Emulation strategies using disk images
  • Exhibition

Cataloguing

All three institutions involved in this research issue a unique number for each disk image file in their respective museum’s collection management system (fig. 2). This catalogue record indicates the status of the image (whether it was created by museum staff or artist provided) and what it was derived from (often another component, such as a computer, which is specifically named). Additional information includes the date the image was created, who created it, and the software used.

Fig. 2. A component list for an artwork in the Guggenheim’s collection management system, TMS. The artist-provided computer is component 2007.31.1 and disk images created from that computer are 2007.31.8 and 2007.31.9. Screenshot by Jonathan Farbowitz.

In addition, conservation documentation, such as a disk imaging report, can contain much more granular information about the conditions under which the image was created and the results of various QC tests, which will be discussed later in this article. The various info files and reports from imaging software are saved within the museum’s records about the artwork. Pertinent metadata files, such as info files created by the imaging software, may also be included with the disk images in a “bag,” derived from the Library of Congress’s BagIt specification and stored in the museum’s digital repository. Some institutions package the bag further, such as wrapping it in a tar archive, which can be stored in a server location as a single file (Kunze et al. 2018).
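
A minimal sketch of such packaging, using the bagit-python command-line tool and tar (the directory name is hypothetical):

# Turn a directory of disk image and sidecar files into a bag with checksum manifests
bagit.py --sha256 disk_image_bag/

# Optionally wrap the bag in a single tar archive for storage
tar -cf disk_image_bag.tar disk_image_bag/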

Metadata describing the disk image is recorded in the collection management system, such as disk image format, creation date, relationship to other artwork components, and its status (e.g., whether it is the artist’s master or exhibition copy). When working with computers or devices entering the collection, the collection management system can be used to store information about the hardware and software, passwords/passcodes, volume names, byte offset, and file system, along with any other contextual information gathered during the acquisition process.

Quality Control of Disk Images

Conservators typically do QC testing for video or audio files accessioned into the collection to make sure that the files they created or received appear or sound as expected and are useful for re-exhibiting the artwork. The authors considered how to do the same for disk images with the purpose of both verifying that the disk image is a true backup of the source media and ensuring that the image is usable. When determining a QC workflow for disk images, there are two areas worth considering: bad sectors and disk image verification.

Bad Sectors

If the imaging software uncovered bad sectors on the original source media, this is important to understand. A bad sector is a sector (the smallest discrete unit on a disk) that cannot be read, whether owing to a write error by the hard drive or to physical causes such as deterioration or a failing drive. Bad sectors presenting consistently in the same location on the disk would indicate damage to the drive, whereas an inconsistent location of bad sectors could indicate a read error as opposed to damaged media. All imaging programs include information about the number of bad sectors found during imaging in their “info” report sidecar files.

However, the question remains when imaging—how many bad sectors is too many? The authors came to the conclusion that decisions should be made case by case. Older media will likely contain some bad sectors depending on their age and storage conditions. Newer media with bad sectors may justify a different approach. For example, a conservator could request from the artist a new copy of a hard drive or disc that has many bad sectors. Alternative methods to address bad sectors could involve attempting to image the drive again with data recovery software, such as ddrescue, as sketched below. A conservator could also send the media with bad sectors to a data recovery service in an attempt to obtain a better disk image.
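
For instance, GNU ddrescue can retry problem areas over multiple passes while recording its progress in a map file; the paths below are hypothetical:

# Initial pass plus up to three retry passes over bad sectors;
# the map file records which sectors were successfully recovered
sudo ddrescue -d -r3 /dev/sdb artwork_drive.dd artwork_drive.map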

Verification of the Image

Given that the most common purpose for creating a disk image in media conservation is to create a bit-for-bit copy of a piece of physical media, one must ensure the bit-level accuracy of that copy. This process is referred to as verification and typically involves comparing the data in the disk image to the data on the original source media via checksums. While a manifest of checksums for all of the individual files on a drive can be created, this only ensures an accurate copy of the logical file system of the device.

A more comprehensive approach is to create a checksum of the entire volume postimaging and compare that value to the checksum of the disk image. The authors recommend rereading the original source after imaging, calculating the checksum of the source drive, and verifying that the disk image is an exact match by ensuring that the checksum of the source volume and the disk image are identical. A potential cause of a checksum mismatch is a bad read of the source drive by the imaging software. If the imaging software does not accurately record the data off of the drive for whatever reason, then the checksum of the source volume will not match the checksum of the image.
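
A sketch of this comparison with hypothetical names follows; for raw images the two hashes can be compared directly, while libewf’s ewfverify recomputes the hash of an EWF image’s data and checks it against the MD5 stored in the file:

# Re-read the write-blocked source drive and compute its checksum
sudo md5sum /dev/sdb

# Compute the checksum of a raw image for comparison
md5sum artwork_drive.dd

# For EWF images, recompute and verify against the stored hash
ewfverify artwork_drive.E01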

It should be noted that the checksum of an entire drive or device is not a particularly granular form of metadata. When presented with a checksum mismatch, in some cases a more thorough understanding of the differences between the original media and the disk image may be important. Tools for conducting a deeper dive into such disk images are discussed in the fiwalk and Digital Forensics XML section to follow.

Mounting

Disk images do not “open” like a traditional document file but rather “mount” as an external volume in the same way as an external hard drive or optical disc. Once the image file is mounted, it can be accessed as if it were any other drive; the user can browse file directories and copy files from the disk image to the local computer’s file system. Unfortunately, not all disk images can be mounted on all computer systems. This depends on both the format of the disk image and the file system inside the image. For example, a disk image with the HFS file system (used in older Macs) cannot be mounted and explored on a Windows or Linux machine without additional software.
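
On a Linux system, one way to mount an EWF image read-only is with libewf’s ewfmount; a minimal sketch with hypothetical paths (images containing multiple partitions may additionally require a byte offset passed to mount):

# Expose the raw volume inside the EWF image via FUSE as ewf_mount/ewf1
mkdir -p ~/ewf_mount ~/drive
ewfmount artwork_drive.E01 ~/ewf_mount

# Mount that raw volume read-only through a loop device
sudo mount -o ro,loop ~/ewf_mount/ewf1 ~/drive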

Mounting can be a method of verifying whether the disk image is both usable and an accurate representation of the original source media. Mounting the image and exporting files are the second test in the Guggenheim’s QC workflow. When mounted, conservators also check the partition layout (the way the drive is separated into different sections) of the image to ensure that it is the same as the original media (Farbowitz 2018).
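
The partition layout of an image can be listed with, for example, Sleuth Kit’s mmls and compared against documentation of the original media (the image name is hypothetical):

# Print the partition table of a raw disk image
mmls artwork_drive.dd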

Following a workstation overhaul at MoMA’s media conservation lab, where computers were upgraded to macOS High Sierra (10.13.6) in unison, conservation staff diligently reinstalled the FOSS (Free and Open Source Software) tools they had been using. One such tool is libewf, a library to access the Expert Witness Compression Format (EWF), developed and maintained by Joachim Metz. Libewf facilitates acquiring disk images, exporting images from EWF to raw format, and examining metadata within EWF files, among other functions.

During that process, MoMA media conservation ran into issues both installing libewf and mounting images with the ewfmount utility. For example, disk images would mount as text files; the only solution at that moment, suggested by Eddy Colloton in the Captain’s Log, was to change the file extension so that the images would mount properly. Although not a sophisticated or elegant solution, it did work on some of the disk images that MoMA was initially testing. In another case, an attempt to mount a disk image would result in the message “following disk images couldn’t be opened” or a “No subsystem to mount FUSE” error. This led the team to reach out to colleagues, including Joachim Metz, the developer of the tool, on GitHub, and Nick Krabbenhoeft from the New York Public Library for input on these problems.

Unable to find a solution, and with limited time remaining for the research project, the team decided to use the BitCurator suite, which can run in a virtualized Linux environment. This suite of tools is thoroughly documented and very user friendly. When initially mounting the disk image in the BitCurator environment using the “mount disk image” script, only the FAT32 partition of the disk could be browsed (even with all hidden files set to visible in the file manager). However, the Disk Image Access Interface, which can export specific files and/or directories from a disk image, did display the other partitions of the macOS disk image that the team was working on (volumes 2 and 3 of the HFS+ partitions).

On one hand, this experience raises concerns about our reliance on tools that fall outside of our field of practice. Working with proprietary disk image formats can pose interesting challenges that require time, outside expertise, and dedication to work through. At the same time, this experience demonstrates that there is a large community of users who are willing to share their knowledge and assist in troubleshooting and that open-source tools (such as BitCurator) can provide useful solutions to tricky problems.

Analysis of Disk Images using fiwalk and Digital Forensics XML

Introduced in 2009, the fiwalk tool “is designed to automate the initial forensic analysis of a disk image” (Garfinkel 2009, 2). Short for “file inode walk,” the tool “walks” the disk image, describing every file and partition, and even recording the serial number of the imaged disk. The tool allows for several different output types. At the Hirshhorn, the XML output has proved most useful, as it follows the standardized DFXML (Digital Forensics XML) schema and allows for automated differencing using Python.

The Digital Forensics XML Python Toolkit, which works with fiwalk output, includes a tool for comparing two forensic disk images (“Forensic Disk Differencing” 2013). A DFXML report is easily created by running fiwalk against the disk image with the “-X” flag and providing a file path for the XML output. In the following example, “~/Desktop/fiwalk_output1.xml” is the destination of the XML output and “/home/bcadmin/mountpoint/ewf1” is the disk image:

bcadmin@ubuntu:~$ fiwalk -X ~/Desktop/fiwalk_output1.xml '/home/bcadmin/mountpoint/ewf1'

Initial tests with the “idifference.py” script were unsuccessful, but the “idifference2.py” script, at the time of writing an alpha version in the DFXML GitHub repository, worked well in the BitCurator environment when used with Python 3.6 (Colloton 2018; Garfinkel 2012). The tool compares two fiwalk reports using the following syntax:

bcadmin@ubuntu:~$ python3.6 '/home/bcadmin/dfxml-master/python/idifference2.py' '/home/bcadmin/Desktop/fiwalk_output1.xml' '/home/bcadmin/Desktop/fiwalk_output2.xml' > differences.txt

The output of the idifference2.py command is broken down into four sections: “New Files,” “Deleted Files,” “Files with modified content,” and “Files with changed file properties.” Assessing the significance of individual changed files is especially challenging without a granular understanding of the software’s functionality.

A change in the checksum of the data on a computer hard drive can be caused by something as simple as turning the machine off and on again. Conversely, a change to a single file may significantly affect the functionality of an artwork; in that worst-case scenario, however, the file in question would likely be conspicuous in the differencing output.

Nevertheless, accurately evaluating the significance of a minor change to the data on a carrier of a software-based artwork requires a particularly thorough knowledge of how the software operates. Software-based artworks locate significant data in different files and directories and, of course, have their own significant properties defining appropriate functionality. Ensuring that a change in the data would not impact the authenticity of an artwork requires herculean research into the work’s functionality and an in-depth understanding of the work’s underlying dependencies. Such a granular understanding of a software-based artwork is essentially a luxury, the result of a thorough research project distilling shared knowledge between computer science and media conservation, and cannot be expected to be the standard. This, in part, is why disk imaging has become the tool du jour for creating a preservation master of software-based works; it allows for future research on all the data on a known working copy: the computer acquired with the artwork.

Virtualization and Emulation Strategies using Disk Images

While disk imaging mitigates the risk of data loss caused by hardware failure, it hardly guarantees one’s ability to accurately render that data in the future. For this reason, disk imaging should be seen as only the first of many steps in staving off obsolescence. To continue running the software beyond the life of its original hardware model, the data must be rendered in other environments.

Disk images allow conservators the possibility of running a software-based artwork independent of the original hardware via software called an emulator or virtual machine. Virtualization and emulation are two popular methods for running older software on contemporary computers, even if that software is now obsolete. Virtualization and emulation are employed by many communities for replicating the functionality of software that is not compatible with, or simply not installed on, the host hardware (Johnston 2014). Emulation as a term has previously been used more broadly in conservation literature. For example, the Variable Media Glossary defines emulation as “imitating the original look of the piece by completely different means” (Depocas et al. 2003, 125). This definition of emulation could encompass physical objects as well as computer software. In this article, the term emulation is used more narrowly to describe the process of running an emulated version of a computer operating system that relies on hardware that is either obsolete or not present on the host machine. Virtualization, by contrast, allows for more direct communication between the hardware on the host machine and the virtualized system. In general, “the purpose of a virtual machine is to create an isolated environment” where a different operating system or older software can be run, while “the purpose of an emulator is to accurately reproduce the behavior of some hardware,” usually vintage hardware (Voigt 2014).

As an example of the importance and specificity of host hardware, at the Hirshhorn the data that the conservation department copied from an artist-provided mid-2011 Mac mini was found to be incompatible with a late-2012 Mac mini. The hard drive was removed from the late-2012 model, and the disk image was written out to that drive and verified. Despite verification that the data on the drive was identical to the original drive, the machine refused to boot, producing an error message stating that the version of macOS (10.7.5) installed on the mid-2011 model was not supported by the late-2012 machine. In less than two years, then, hardware produced by the same company lacked sufficient interoperability to allow for hardware replacement without changes to the artist’s software. While stockpiling mid-2011 Mac mini computers would temporarily alleviate impending obsolescence, the lifespan of these machines is ultimately limited, and alternative preservation strategies, such as emulation and virtualization, must be developed.
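
For reference, writing a raw-format image back to a replacement drive and then verifying the result can be done with standard Unix tools. This is a minimal sketch assuming a Linux system; image.raw and /dev/sdb are hypothetical, and the target device must be identified with great care, as dd will overwrite it. The second pair of commands compares checksums, reading from the target drive only as many bytes as the image contains:

bcadmin@ubuntu:~$ sudo dd if=image.raw of=/dev/sdb bs=4M status=progress
bcadmin@ubuntu:~$ md5sum image.raw
bcadmin@ubuntu:~$ sudo head -c $(stat -c %s image.raw) /dev/sdb | md5sum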

Emulation and virtualization have been employed by Rhizome as well as other institutions to make now-obsolete software accessible to the public over the Internet. One popular system for cloud-based emulation is Emulation as a Service (EaaS), which makes preconfigured emulation environments available via a remote server.2 Instantiations of EaaS use disk images to run software, such as images of CD-ROMs for video games created by Theresa Duncan (1966–2007) and the disk image of a vintage Macintosh computer for Bomb Iraq (2005-2014), by Cory Arcangel (b. 1978) (Espenschied 2018). In the case of Bomb Iraq, the entire EaaS platform was written to a bootable USB flash drive, which was connected to a small computer (an Intel NUC model) in the gallery. As described by Espenschied, one often starts with a base disk image: a standard install of an operating system such as Windows 95. Files are added and settings are changed based on what is needed to get an artwork running. Each progressive change is saved as a series of QEMU qcow files (Falcao et al. 2017). Espenschied refers to the base image with added artwork files, necessary dependencies, and settings changes as a “synthetic disk image” (Espenschied 2017).
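
A sketch of how such layered qcow images work: in QEMU, the qemu-img utility can create a copy-on-write overlay that records only the changes made on top of a read-only base image, leaving the base untouched. The file names below are hypothetical:

bcadmin@ubuntu:~$ qemu-img create -f qcow2 -b base_win95.qcow2 -F qcow2 artwork_changes.qcow2

Booting a VM from artwork_changes.qcow2 then writes all modifications to the overlay, so a single base environment can serve multiple artworks.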

Creating a base disk image of a VM with a particular operating system and versioning the base image to meet an artwork’s needs circumvents the challenges of translating data from a physical machine to a virtual one. By beginning with a VM and populating it with data from a disk image of a physical machine, conservators can shift their focus from successfully booting up the machine to identifying the data essential to running the artwork. This is the approach that Rhizome has taken to facilitate running many works with obsolete software dependencies (Espenschied 2017).

As part of its QC testing for disk images discussed earlier, the Guggenheim has attempted to run disk images from various computer-based artworks in emulators and VMs. Emulation and VM software has included Basilisk II, SheepShaver, and DOSBox, but the most commonly used software was VirtualBox. Attempting to run disk images of artwork-related computers produced both successes and failures to boot. Therefore, at this time, failure to boot is not considered a failure in QC for the disk image, as there are many technical reasons that an image could fail to boot in a VM. For example, some disk images of older versions of Windows (95 and XP) may fail in emulation/virtualization owing to a problem with the hard disk controller. As Falcao et al. (2017) demonstrated, choosing a generic hard drive controller may alleviate the problem and allow the emulator or VM to run the disk image. In this way, eccentricities of an operating system may present unique challenges to virtualization, with each eccentricity requiring its own unique solution.
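
As a sketch of this kind of test in VirtualBox (the VM name and file names are hypothetical, and the IDE controller stands in for the generic controller choice described above), a raw disk image can be converted to a virtual disk, attached to a VM, and booted:

bcadmin@ubuntu:~$ VBoxManage convertfromraw image.raw image.vdi --format VDI
bcadmin@ubuntu:~$ VBoxManage storagectl "Artwork VM" --name "IDE" --add ide
bcadmin@ubuntu:~$ VBoxManage storageattach "Artwork VM" --storagectl "IDE" --port 0 --device 0 --type hdd --medium image.vdi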

Software-based artworks can pose even greater challenges than other digital collection objects in that they often rely on specific hardware as well as software. Once virtualization or emulation is successfully achieved, the software-based art installation must still be able to communicate with its peripheral devices. A direct side-by-side comparison between the artwork running on the original hardware and the work running within an emulator or VM is ideal, but such a comparison is not always feasible owing to constraints on space, resources, and staff time. In some cases, a VM running successfully on a host machine may still not communicate properly with peripheral devices, and the thorough testing and troubleshooting required to solve these communication issues can be impractical under the same constraints. Despite the promise of emulation and virtualization to render software independent of the original hardware, many challenges remain to rendering artworks faithfully.

Exhibition

Practices for exhibiting a software-based artwork using a disk image are still in development. However, exhibiting a work represents a key opportunity to conduct research, develop more robust documentation, and create a deeper understanding of the work’s needs. Disk imaging can be a powerful tool when conducting such research.

Disk images can also be used to create a new exhibition copy of the original device on a new machine. This process, referred to as creating a replica computer, was adopted by MoMA while exhibiting Long March: Restart (2008) by Feng Mengbo (b. 1966) (Lewis and Fino-Radin 2015). Creating a replica computer can be an effective strategy for mitigating the risk of hardware failure, but it also offers an opportunity to evaluate the significance of certain properties of the original machine. If the replica does not contain exactly the same components or data as the original computer but still faithfully reproduces the work, it is an effective replica. An ineffective replica—one that lacks certain data or components and does not faithfully reproduce the work—may still be instructive, helping conservators understand which components are essential to the work.

Over the course of this research project, the authors were not working to prepare any of the test disk images for use in an exhibition; thus, this exploration could be a future phase of this research.    

CONCLUSION

This yearlong cross-institutional collaboration informed the disk imaging policy of each of the museums involved, and helped the authors solidify procedures at their respective institutions. Through weekly meetings, interviews with experts, and engagement with other communities that use disk imaging tools, it became more evident that collaboration is a fundamental ingredient in building confidence in the selection of disk imaging tools, disk image formats, and troubleshooting procedures.

Techniques such as disk imaging can assist conservators in backing up the data of software-based artworks, which may be stored on aging or obsolete computers and physical storage media. Despite its technical complexities, the authors believe that disk imaging is the most prudent method for the long-term preservation of data on storage media. For example, traditional backup procedures, such as Apple’s Time Machine software, can miss data that is integral to an artwork, which a more methodical process such as disk imaging will capture. Because the image is verified against its source, a disk image guarantees a bit-for-bit copy of the original volume, ensuring a precise backup. The disk image can also act as a secure container format, preventing the future alteration of files or their metadata and ensuring their validity if the disk image is transferred from one digital storage location to another.

Best practices for disk imaging software-based artworks are coalescing. There is consensus in the field of media conservation around the value of storing a copy of the disk image in the raw format and/or the EWF format, and there has been rigorous review of each format’s pros and cons. Similarly, a common suite of tools for creating and working with disk images is shared by many members of the cultural heritage field. Creating an ESD-safe work area when removing a hard drive from a computer and using a write blocker when accessing any artwork-related writable media (such as a hard drive or USB flash drive), for instance, are practices shared across much of the community and recommended by the authors. There are several good options when selecting software for creating disk images, allowing for the use of different operating systems and user interfaces while still achieving equivalent functionality. While this article outlines some challenges that the authors faced when working with libewf in a Mac environment, the libewf library is enormously helpful. The complexity of managing the dependencies of sophisticated tools simply underscores the value of collaborative efforts such as the BitCurator project and demonstrates the ability to grow best practices through cooperation with other fields.

The research that informed this article revealed that further investigation is needed in the postimaging phase, both in the media conservation community and in libraries and archives. Verification of a disk image and a thorough appraisal of a disk imaging software’s sidecar file are commonly practiced. However, there is less unanimity of practice in further assessing a disk image. Given the wealth of data potentially contained within a disk image and the tools available for documenting this data with a high level of granularity, leveraging all of that information to guide preservation actions can be a daunting task. As disk imaging practices grow more ubiquitous, as it seems they will, there is a need for continued collaboration and practice sharing, particularly in the postimaging phase.

The authors of this article strongly recommend verifying a disk image using a checksum but also suggest that conservators perform a more thorough assessment of the disk image by creating fiwalk reports and exploring emulation, virtualization, and replication. Disk images provide a great opportunity to better understand an artwork and its functionality. Thorough qualitative evaluation of a disk image, while sometimes complicated, takes advantage of this potential. The authors encourage practitioners to share their workflows and results to help better define best practices of these processes in the future.

New practices in media conservation, such as disk imaging, should not be seen as usurping traditional practices, which retain their enduring value. Creating complementary documentation of the originally acquired artwork, its hardware and software environment, and any conservation actions taken is essential both to contextualizing a disk image and to supporting its future use in exhibiting a software-based artwork while maintaining its artist-intended behaviors. No matter how technical the treatment, the unique characteristics of the artwork and an understanding of its conceptual underpinnings should remain at the forefront of the conservator’s mind when making decisions or evaluating the success of an intervention. Disk imaging is a valuable technique in the growing practice of conserving contemporary artworks.

ACKNOWLEDGMENTS

The authors of this article would like to acknowledge the following institutions and foundations for their support:

The Museum of Modern Art (MoMA), New York, and the Andrew W. Mellon Foundation
Hirshhorn Museum and Sculpture Garden—Smithsonian Collections Care Initiative
Solomon R. Guggenheim Museum—Conserving Computer-Based Art (CCBA) is supported by the Carl & Marilynn Thoma Art Foundation, the New York State Council on the Arts with the support of Governor Andrew Cuomo and the New York State Legislature, and Josh Elkes.

The authors would also like to acknowledge the following individuals for their research assistance and guidance:

Amy Brost, Assistant Media Conservator, MoMA
Alex Chassanoff, Research Program Officer, Educopia
Dragan Espenschied, Preservation Director, Rhizome
Briana Feston-Brunet, Conservator of Variable and Time-based Media, Hirshhorn Museum and Sculpture Garden
Agathe Jarczyk, Dipl. Rest. Conservator, Atelier für Videokonservierung
Nick Krabbenhoeft, Digital Preservation Manager, NYPL
Kate Lewis, Agnes Gund Chief Conservator, MoMA Conservation Center and Department
Joachim Metz, Digital Researcher and IT/IS specialist
Lyndsey Jane Moulds, Software Curator, Rhizome
Peter Oleksik, Associate Media Conservator, MoMA
Colin Post, Doctoral Candidate, University of North Carolina, Chapel Hill
Lena Stringari, Deputy Director and Chief Conservator, Solomon R. Guggenheim Museum
Guy Voncken, Lead Developer, Guymager

NOTES

  1. Drives that are primary and secondary are often referred to in technology literature and in a computer’s BIOS as “MASTER” and “SLAVE.” The authors have avoided using these terms in this article because of their perpetuation of oppressive historical and ideological concepts.
  2. The EaaSI (Emulation as a Service Infrastructure) project, led by the Digital Preservation Services team at Yale University Library, and with support from OpenSLX, DataCurrent, PortalMedia, Educopia, and the Software Preservation Network, seeks to make Emulation as a Service more scalable as an emulation platform across a consortium of cultural heritage institutions. The results of this project could hold promise for the development of emulation as a viable strategy for exhibiting artworks. For more information, see https://www.softwarepreservationnetwork.org/eaasi/.

REFERENCES

Arroyo-Ramírez, E., K. Bolding, F. Charlton, and A. Hughes. 2018. “Tell us about your digital archives workstation”: A survey and case study. Journal of Contemporary Archival Studies 5 (1). https://elischolar.library.yale.edu/jcas/vol5/iss1/16 (accessed 08/14/19).

Association of Research Libraries. 2012. Special issue on special collections and archives in the digital age. Research Library Issues 279. https://publications.arl.org/2ds247.pdf (accessed 07/17/19).

“BitCurator,” n.d. http://bitcurator.net/bitcurator/ (accessed 04/10/19).

Colloton, E. 2018. “Any advice on comparing the contents of 2 disk images? Getting some checksum mismatches, but I believe the changes were inconsequential (optimistically).” Tweet. @EddyColloton, October 5, 2018. https://twitter.com/EddyColloton/status/1048195437801422848. (accessed 03/05/2019).

Cramer, F. 2002. “Concepts, notations, software art.” Last modified March 23, 2002. http://cramer.pleintekst.nl/all/concept_notations_software_art/concepts_notations_software_art.html (accessed 08/14/19).

Depocas, A., J. Ippolito, and C. Jones eds. 2003. Permanence Through Change: The Variable Media Approach. Guggenheim Museum Publications and The Daniel Langlois Foundation for Art, Science, and Technology. https://variablemedia.net/pdf/Permanence.pdf  (accessed 06/03/2020).

Dietrich, D., and F. Adelstein. 2015. Archival science, digital forensics, and new media art. Digital Investigation 14. The Proceedings of the Fifteenth Annual DFRWS Conference. Philadelphia, PA. S137–S145. https://www.sciencedirect.com/science/article/pii/S1742287615000493 (accessed 06/04/2020).

Digi-Key Electronics. 2018. “Shelf life for SCS static shielding bags.” Tech Forum. Last modified January 26, 2018. https://forum.digikey.com/t/shelf-life-for-scs-static-shielding-bags/1126 (accessed 07/18/19).

Engel, D., and G. Wharton. 2014. Reading between the lines: Source code documentation as a conservation strategy for software-based art. Studies in Conservation 59 (6): 404–415.

Espenschied, D. 2017. Emulation and access. Paper presented at the Peer Forum on Disk Imaging, Museum of Modern Art, December 8, 2017. https://vimeo.com/278042613/a78ee5e46b (accessed 07/26/19).

Espenschied, D. 2018. “Acknowledgement, Circulation, Obscurity, System Ambience” Rhizome, August 8, 2018. https://web.archive.org/web/20180808092149/http://rhizome.org:80/editorial/2014/jun/24/emulating-bomb-iraq-arcangel/ (accessed 07/19/19).

Falcao, P., K. Rechert, D. Espenschied, and T. Ensom. 2017. Emulating subtitled public: Towards a preservation workflow for software-based art. Paper presented at the American Institute of Conservation Annual Meeting, Chicago, IL, 2017. https://nc.1x-upon.com/s/25PzbZiWrLQwQqH#pdfviewer (accessed 07/19/19).

Farbowitz, J. 2018. Archiving computer-based artworks. Electronic Media Review. Vol. 5. Washington, DC: AIC. http://resources.conservation-us.org/emg-review/volume-5-2017-2018/farbowitz/ (accessed 08/05/19).

Farrell, M. J. 2012. Born-digital objects in the deeds of gift of collecting repositories: a latent content analysis. Master’s Thesis, University of North Carolina at Chapel Hill. https://cdr.lib.unc.edu/concern/masters_papers/sn00b218p (accessed 07/19/19).

“Forensic Disk Differencing.” Forensics Wiki. Last modified October 22, 2013. https://forensicswiki.xyz/wiki/index.php?title=Forensic_Disk_Differencing (accessed 06/04/20).

Garfinkel, S. L. 2009. “Automating disk forensic processing with SleuthKit, XML and Python.” In Systematic Approaches to Digital Forensic Engineering (IEEE/SADFE 2009). http://simson.net/clips/academic/2009.SADFE.xml_forensics.pdf (accessed 08/14/19).

Garfinkel, S. L. 2012. “Digital Forensics XML project and library.” GitHub. https://github.com/simsong/dfxml (accessed 03/04/19).

Garfinkel, S., D. Malan, C. Stevens, K. Dubec, and C. Pham. 2006. Advanced forensic format: An open, extensible format for disk imaging. In Advances in Digital Forensics II: IFIP International Conference on Digital Forensics, National Center for Forensic Science, Orlando, Florida, January 29–February 1, 2006, ed. M. Olivier and S. Shenoi. New York: Springer. 17–31. http://nrs.harvard.edu/urn-3:HUL.InstRepos:2829932 (accessed 06/04/20).

Goethals, A. and AVPreserve. 2016. “Harvard Library, disk image content model and metadata analysis.” Harvard Wiki. https://wiki.harvard.edu/confluence/display/digitalpreservation/Disk+Image+Formats (accessed 08/14/19).

Hasting, M. n.d. “Master/Slave Settings vs. Cable Select.” PC Hell. http://www.pchell.com/hardware/masterslaveorcableselect.shtml (accessed 07/23/19).

Henry, P. 2011. “How to: Forensically sound Mac acquisition in target mode.” SANS Digital Forensics and Incident Response blog. https://digital-forensics.sans.org/blog/2011/02/02/forensically-sound-mac-acquisition-target-mode (accessed 07/19/19).

Johnston, L. 2014. “Considering emulation for digital preservation.” The Signal. https://blogs.loc.gov/thesignal/2014/02/considering-emulation-for-digital-preservation/ (accessed 04/10/19).

Kirschenbaum, M. G., R. Ovenden, G. Redwine, and R. Donahue. 2010. Digital forensics and born-digital content in cultural heritage collections. CLIR publication no. 149. Washington, DC: Council on Library and Information Resources. https://www.clir.org/pubs/reports/pub149/ (accessed 08/14/19).

Kumar, K., S. Sofat, and N. Aggarwal. 2011. Identification and analysis of hard disk drive in digital forensics. International Journal of Research in Engineering and Technology 6 (2): 1674–1678.

Kunze, J., J. Littman, E. Madden, J. Scancella, and C. Adams. 2018. “The BagIt file packaging format (V1.0).” Last modified September 17, 2018. https://tools.ietf.org/html/draft-kunze-bagit-17 (accessed 03/11/19).

Leighton John, J. 2012. “Digital forensics and preservation.” Digital Preservation Coalition. http://www.dpconline.org/component/docman/doc_download/810-dpctw12-03pdf (accessed 10/18/15).

Lewis, K., and B. Fino-Radin. 2015. “Preparing for exhibition: Feng Mengbo: ‘Long March: Restart’ (2008).” Paper presented at the TechFocus iii: Caring for Software-based Art, Solomon R. Guggenheim Museum, New York. http://resources.conservation-us.org/techfocus/techfocus-iii-caring-for-computer-based-art-software-tw/ (accessed 07/29/19).

Library of Congress. 2015. “Expert Witness Disk Image Format (EWF) Family.” Last modified 2015. https://www.loc.gov/preservation/digital/formats/fdd/fdd000406.shtml (accessed 08/14/19).

———. 2012. “ISO Disk Image File Format.” Sustainability of Digital Formats. Last modified 2012. https://www.loc.gov/preservation/digital/formats/fdd/fdd000348.shtml (accessed 08/14/19).

Lowe, D. 2017. Electronics all-in-one for dummies. 2nd ed. Indianapolis, IN: John Wiley and Sons.

McKinley, M. 2014. Imaging digital media for preservation with LAMMP. Electronic Media Review. Vol. 3. Washington, DC: AIC. http://resources.conservation-us.org/emg-review/volume-three-2013-2014/mckinley/ (accessed 07/18/19).

Phillips, J. 2015. Approaching the challenge: Caring for Software-based Art in Museum Collections. Paper presented at TechFocus III: Caring for Software-based Art, Solomon R. Guggenheim Museum, New York, NY.

Post, C., A. Chassanoff, C. Lee, A. Rabkin, Y. Zhang, K. Skinner, and S. Meister. 2019. Digital curation at work: Modeling workflows for digital archival materials. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 39–48. Champaign, IL: IEEE. https://ieeexplore.ieee.org/document/8791228/ (accessed 08/14/19).

Prael, A., and A. Wickner. 2015. “Getting to know FRED: Introducing workflows for born-digital content.” Practical Technology for Archives. https://practicaltechnologyforarchives.org/issue4_prael_wickner/ (accessed 02/21/19).

Voigt, B. 2014. “What are the specific differences between an ‘emulator’ and a ‘virtual machine’?” Stack Overflow. Last modified June 2, 2014. https://stackoverflow.com/questions/6234711/what-are-the-specific-differences-between-an-emulator-and-a-virtual-machine (accessed 08/27/19).

Voncken, G. 2015. “Guymager’s Source Verification.” Guymager Wiki. Last modified March 21, 2015. https://sourceforge.net/p/guymager/wiki/Guymager%27s%20source%20verification/ (accessed 07/26/19).

Wikipedia. 2018. “Zlib.” https://en.wikipedia.org/w/index.php?title=Zlib&oldid=875113857 (accessed 03/14/19).

Zlib. 2017. “Zlib.” Last modified January 15, 2017. http://www.zlib.net/ (accessed 07/22/19).

AUTHORS

Eddy Colloton
Project Conservator, Time-Based Media
Hirshhorn Museum and Sculpture Garden
CollotonE@si.edu

Jonathan Farbowitz
Fellow in the Conservation of Computer-based Art
Solomon R. Guggenheim Museum
jfarbowitz@guggenheim.org

Caroline Gil
Andrew W. Mellon Fellow in Media Conservation
Museum of Modern Art (MoMA)
caroline_gil@moma.org

Flaminia Fortunato
Andrew W. Mellon Fellow in Media Conservation
Museum of Modern Art (MoMA)
flaminia_fortunato@moma.org