Computational Provenance and Computational Reproducibility: What Can We Learn about the Conservation of Software Art from Current Research in the Sciences?

Mark Hellar and Deena Engel
The Electronic Media Review, Volume Four: 2015-2016

ABSTRACT

The field of art conservation has relied on advances in scientific research throughout history, whether by applying contemporary approaches to chemical analysis of materials, by taking advantage of new imaging techniques, or in many other ways. Our field is the study of the conservation of time-based media and software art. To this end, we have focused on the theoretical framework and practical application of two fields of study in applied mathematics and the sciences known as computational provenance and computational reproducibility. Our goal is to ascertain whether and how these approaches could inform our work on the conservation of time-based media and software art.

We will begin with an overview of the techniques used in the sciences to ensure that results are reproducible. It is a basic premise of the sciences that experimental results must be consistent in order to be validated; however, as current scientific research relies heavily on computational techniques, the software and technologies used to obtain current scientific results must be preserved so that those same results can be achieved in the distant future. This is analogous to the goal of art conservation, as museums will wish to re-exhibit contemporary works of time-based media and software art over time.

Through this model we will consider how the artist’s intention can be preserved and authentically represented throughout an artwork’s lifespan and exhibition history. We will consider in depth what is required to accurately execute or replay digital media or software code by understanding the metadata of its computing environment, such as codecs, compilers, the operating system, and hardware. In addition, we will look at how this model can apply to documentation of physical and environmental factors (such as installation details, sculptural details, lighting, etc.).

INTRODUCTION

The field of art conservation has relied on advances in scientific research throughout history. For example, conservation scientists use contemporary approaches to chemical analysis of materials in order to study pigments and grounds, e.g. the Courtauld Institute’s work on Cézanne’s watercolors in 2004-2008 (Buck 2008). Conservation scientists have a long history of using imaging techniques; for example, early x-ray techniques were used to study Rembrandt’s paintings in 1916 at the Kunsthistorisches Museum in Vienna (Von Sonnenburg 1995), while x-rays and ultraviolet light were used at the Museum of Modern Art in New York City to study and conserve Jackson Pollock’s (1912-1956) One: Number 31 (1950) in 2013 (Vogel 2013), to cite two of many examples.

Our study focuses on the theoretical framework to assess the possible practical application of two current fields of study in applied mathematics and the sciences: the study of computational provenance and the field of computational reproducibility. The goal of our research is to ascertain how and whether these approaches could inform our work on the conservation of time-based media and software art.

CONCEPTUAL FRAMEWORK

A basic premise in the natural sciences is that experimental results must be consistent in order to be validated. Science students learn early on that they must document their materials and techniques (often referred to as a “lab notebook”) so that another scientist can use the same materials and the same techniques to come up with the same result. This is a guiding principle in scientific research.

However, current scientific research often relies heavily on computational methods and computational processes to obtain results. Therefore, it is crucial to the credibility of the results that all of the technologies that are used in the computational processes to produce the results (including all of the hardware and software) must be documented and preserved in some way so that those same scientific results will be achieved in the distant future.

We posed two questions in our research as follows: first, is this work in the natural sciences and applied mathematics analogous and therefore relevant to our research on the conservation of time-based media and software art? Second, if so, can we learn from this approach in order to assist museums and galleries who will wish to re-exhibit works of time-based media and software art in the future? Are there specific practical guidelines, methodologies, and approaches that we can learn from contemporary scientists that would facilitate art conservation practices?

COMPUTATIONAL PROVENANCE

The term computational provenance borrows the word provenance from the art world (Dawson 2012). It refers to documenting all of the steps in a science experiment, including not only the techniques but also the materials used and all of the relevant data required for a future scientist to reproduce the experiment and obtain the same result. Whereas provenance in the art world refers to a documented history of the ownership and/or location of a work of art, scientists use the term in this context to refer to all of the steps in their work. Scientists typically document the design of an experiment, how the experiment was performed, how the raw data were acquired, and all of the steps in the ensuing computation, data manipulation, and data presentation.

Art conservators are typically trained to document all of the steps that they undertake as they assess, repair, and prepare works of art for conservation and re-exhibition. In this sense, the practice of art conservation is in keeping with the principles of computational provenance.

COMPUTATIONAL REPRODUCIBILITY

Reproducibility is a term in the sciences used to capture the concept that an experiment can be repeated with the same results. Computational reproducibility specifically addresses scientific research that relies on computer hardware and software applications to support the results. All computer simulations, calculations, data processing, data analysis, and related tasks must be documented so that they can be reproduced in the future.

Differences Between Computational Reproducibility and the Conservation of Time-based Media and Software Art

We found several general areas of difference between scientific practice and art conservation activities: areas of concern, goals of research, and practical differences in workflow.

With respect to scientific practice, scientists have three general areas of concern in their research that do not apply to the conservation of time-based media and software art. First, floating-point arithmetic is of great importance to scientists, but the level of precision that scientists need is not typically relevant in the creation of software art. Second, while there are some works of art that rely on data and data sources, art conservators do not face the storage and processing concerns that merit the time and attention scientists must devote to research involving big data. Third, works of art are relatively small applications and do not require parallelization or other powerful processing techniques that are important in the sciences.

On the other hand, art conservators have a number of concerns with respect to the conservation of time-based media and software art that are not typically relevant to scientists. Art conservators are necessarily concerned with aesthetic factors such as fidelity to color and color space implementation, visual design, the speed of any animation, the resolution needed to properly display the works, and other aesthetic factors that are crucial to current and future viewers of a given work of art. In essence, scientists seek to reproduce calculated results (“to get the same answer”), while art conservators seek to reproduce the viewer’s experience. These different goals inform different working practices.

From a practical perspective, the workflows differ as well. Research scientists are generally expected to manage concerns about reproducibility before publication. Artists, however, often turn their works over to museums and galleries without specific conservation goals or plans in mind, and it is the responsibility of the museums, galleries, and archival institutions to conserve these works of art. It is possible that, as museums and galleries ask artists to turn over their source code and document their works of software art, this practice will change in the near future.

Similarities Between Computational Reproducibility and the Conservation of Time-based Media and Software Art

We found three general areas that are described in the literature on computational reproducibility that do apply to the conservation of time-based media and software art and are currently in use in some museums. We believe that formally and consistently addressing these three areas will greatly benefit art conservators as they strive to conserve and re-exhibit time-based media and works of software art.

The first and most important common practice in these two fields is the importance of obtaining and preserving source code. There is a growing pressure on scientists to submit all of their source code along with the raw data for every publication that relies on computation for results. This is analogous to galleries and museums setting up standards and protocols to require that artists provide source code along with the work of software art at the time of acquisition.

Along with source code, scientists are now typically asked to document the computation environment, including factors such as the operating system and version, hardware setup (RAM, processor, etc.), and the programming languages used. Scientists are also asked to document any software dependencies (such as external libraries) that are required to maintain and run the source code. Software documentation is a standard software engineering practice and an important component of software maintenance in general. Some museums have undertaken software documentation as part of their conservation practice (Wharton and Engel 2014).
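As a sketch of what such environment documentation might look like in practice, the short Python script below records basic facts about the operating system, hardware architecture, and language runtime. This is our own illustration, not a conservation or scientific standard; the field names are hypothetical.

```python
import json
import platform

# Capture the kind of environment metadata described above.
# The dictionary keys are illustrative, not a standard schema.
environment = {
    "operating_system": platform.system(),      # e.g. "Linux", "Darwin", "Windows"
    "os_version": platform.release(),           # kernel or OS release string
    "machine": platform.machine(),              # hardware architecture, e.g. "x86_64"
    "python_version": platform.python_version() # language runtime version
}

# A JSON record like this could be archived alongside the source code.
print(json.dumps(environment, indent=2))
```

A record of this kind, generated at the time of acquisition, gives a future conservator a snapshot of the environment in which the work last ran.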

Scientists are also required to provide raw data with their results. We believe that artists should also provide all relevant media files (images, sound files, etc.) for their works at the time of acquisition in uncompressed formats. Artists who create works of database art should also provide museums with stand-alone copies of all of the relevant data along with the hardware and software tools to continue retrieving data, if that is relevant to the work of art.

WHAT ART CONSERVATORS CAN LEARN FROM COMPUTATIONAL REPRODUCIBILITY

In our research, we came across two additional areas of practice in software engineering that have had very positive outcomes with respect to computational reproducibility and we believe should inform standard conservation practices for works of software art going forward. These two practices are as follows:

  1. Implementing version control
  2. Differentiating between code and a system configuration and parameters

We then shifted our research to focus our case studies on these two areas, which we believe can further strengthen art conservation efforts for software art.

Software Version Control

Software version control is a system for managing and tracking changes to a collection of information, such as the source code of a computer program. Many of us have developed our own basic systems of version control, such as saving multiple copies of a document with a date stamp or revision number in the filename. It is a familiar approach; however, for a complex collection of computer source code, such a system is prone to error. Software developers recognized the need for more structured systems based on the same principle: keeping track of how a document has changed over time and who made the changes.

One of the earliest systems of software-based version control was the Source Code Control System (SCCS), developed at Bell Labs in 1972 by Marc Rochkind. SCCS allowed one individual at a time to modify a source code file and recorded those changes to a special history file. The history file contained a cumulative record of which files were changed, along with a user id and timestamp. SCCS had a number of commands that allowed one to compare differences across files, revert to older versions, and make a branch or derivative version of the source code, among many others. SCCS was the dominant version control software for many years and was eventually superseded by the Revision Control System (RCS), initially released in 1982 by Walter F. Tichy at Purdue University. RCS introduced a more space-efficient way of tracking changes; this was important at the time, as hard drive space was a precious resource.

Version control systems have continued to evolve in their efficiency and feature sets through today. The current dominant system is Git, which was developed in 2005 by Linus Torvalds, the creator of the Linux operating system.

All of the version control systems share some common core concepts:

– A repository is a database of changes to the code, documents, or information for the project. It contains all edits and historical versions of the files.

– A working copy is a local copy of all the files in the project. These are the files one makes changes to. The set of files is also sometimes referred to as a checkout, as one checks it out from the repository. Once a change has been made to the working copy, it is checked in, or committed, back to the repository with information about the author, a comment on what was added or changed, a record of what changes were made, and a timestamp.

– A branch is a set of files that are forked, or split off, into a new copy of the code. This code can be manipulated independently, in different ways or at a different pace from the code it was branched from. It is possible to merge a branch back into the version it was derived from.
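These core concepts can be illustrated with a toy sketch in Python. This is not a real version control system: the MiniRepository class and its methods are hypothetical, shown only to make the vocabulary of repositories, commits, and differences concrete.

```python
import difflib
import time

class MiniRepository:
    """A toy illustration (not a real VCS) of core version-control concepts:
    a repository holding historical versions, commits recorded with an
    author, comment, and timestamp, and diffs between revisions."""

    def __init__(self):
        self.history = []  # each entry is one committed version of the file

    def commit(self, text, author, comment):
        # Record the new version along with who made it, why, and when.
        self.history.append({
            "text": text,
            "author": author,
            "comment": comment,
            "timestamp": time.time(),
        })
        return len(self.history) - 1  # the new revision number

    def diff(self, rev_a, rev_b):
        # Show line-by-line changes between two committed revisions.
        a = self.history[rev_a]["text"].splitlines()
        b = self.history[rev_b]["text"].splitlines()
        return "\n".join(difflib.unified_diff(a, b, lineterm=""))

repo = MiniRepository()
repo.commit("size(640, 480);", "artist", "initial sketch")
repo.commit("size(1920, 1080);", "conservator", "migrate to HD resolution")
print(repo.diff(0, 1))
```

Real systems such as Git add branching, merging, and efficient storage on top of this basic record-keeping, but the underlying idea, a complete, attributed history of every change, is the same.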

Because version control was developed to manage increasingly complex software projects, it offers many advantages to those who are conserving software-based works of art, provided the source code has been acquired along with the artwork. Some of these advantages are:

– Version control offers the ability to store the history of changes to the artwork. If the creator of a program used version control early on, the conservator would have a granular set of records about its creation. For example, the media art programming language Processing (https://processing.org/) has used version control since its inception and has tracked every modification, addition, and deletion over 14 years (see fig. 1).

Fig. 1 The Processing version control repository has tracked every change to the project from its beginnings in 2001 up through today. There are 10,848 check-ins or commits. (Source: https://github.com/processing/processing )

– Version control offers the ability to make further changes to the code for its continued operation without altering the original. Software and hardware environments change and languages evolve, increasing the possibility of having to migrate source code in order to keep it running. Options such as creating a branch offer the possibility of creating a research copy to test migration strategies without affecting the original acquired code. A documentation branch could also be created in order to add comments to the code so that it can be understood in the future.

Separating System Configuration and Parameters from Source Code

A number of software programs, operating systems, and programming languages offer the ability to configure initial settings for their operation. This is typically done outside of the executable software in the form of a configuration file. Configuration files are typically written in ASCII plain text and contain a number of parameters that inform their accompanying software how to run. The parameters can cover a wide range of options depending on how the software was authored. For example, a configuration file parameter may define how much system memory the program is allowed to use, the color depth and screen resolution it should run at, etc. The possibilities for parameters are limitless and unique to each piece of software. Programs typically read the configuration file on startup, and some may periodically check for changes as they execute.

The following is a snippet of the main configuration file for DOSBox (http://www.dosbox.com/), a Microsoft Disk Operating System (DOS) emulator. This section specifies the language the emulator should run in, the amount of memory it should use, the type of machine it should emulate, and how the text should be encoded:

# language — Select another language file.
# memsize — Amount of memory dosbox has in megabytes.
# machine — The type of machine tries to emulate:hercules,cga,tandy,pcjr,vga.
# codepage — Specify a code page number.
language=EN
machine=vga
codepage=437
memsize=16

One advantage of separating the parameters from the main program is that it gives flexibility in altering how a program runs without having to recompile the source code. Additionally, parameterization of system operating variables can offer a level of portability if a program needs to move to new hardware.
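As an illustration, a fragment like the one above can be read with Python's standard configparser module. The [dosbox] section header is added here because configparser requires one; this sketch is our own, not part of DOSBox itself.

```python
import configparser

# A DOSBox-style configuration fragment based on the snippet above,
# with a [dosbox] section header added so configparser can parse it.
CONFIG_TEXT = """
[dosbox]
language=EN
machine=vga
codepage=437
memsize=16
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# The running program can now consult these parameters at startup,
# rather than having the values hard-coded into the source.
machine = parser.get("dosbox", "machine")
memsize = parser.getint("dosbox", "memsize")
print(machine, memsize)  # vga 16
```

Because the values live in a plain-text file rather than in compiled code, a conservator can adjust, for example, the emulated machine type or memory size for new hardware without touching the program itself.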

CONCLUSIONS

In conclusion, we recommend further periodic research, perhaps annually, so that conservators of software art and time-based media might continue to learn from the model of computational reproducibility in the sciences. We also recommend that museums implement and/or continue to implement the following practices:

  • Require source code from every artist along with the artwork at the time of acquisition if possible.
  • Document the source code, including all software dependencies, to ensure that the museum has acquired all relevant files.
  • Require all relevant media files from the artist (uncompressed if possible), as well as any relevant datasets.
  • Document the environment used to create and to run the artwork.

Museums should also strive to implement the following for works currently under research, conservation treatment, and/or during preparation for re-exhibition:

  • Version control
  • Parameterization of the source code

We believe that further research is needed to define and describe best practices and protocols to implement all of these tasks in order to best suit the needs of software art conservators.

REFERENCES

Buck, S., et al., eds. 2008. The Courtauld Cézannes. London: The Courtauld Gallery, Paul Holberton Publishers.

Dawson, A. 2012. Tutorial: Workflows for reproducible research in computational neuroscience. https://rrcns.readthedocs.io/en/cns2012/ (accessed 6/6/15).

Vogel, C. 2013. A Pollock Restored, A Mystery Revealed. The New York Times, May 27. www.nytimes.com/2013/05/28/arts/design/jackson-pollocks-one-number-31-1950-restored-by-moma.html?_r=0 (accessed 6/6/15).

Von Sonnenburg, H. 1995. Rembrandt/not Rembrandt in the Metropolitan Museum of Art: Aspects of Connoisseurship, vol. 1. New Haven, Connecticut: Metropolitan Museum of Art.

Wharton, G. and D. Engel. 2014. Reading Between the Lines: Source Code Documentation as a Conservation Strategy for Software-Based Art. Studies in Conservation 59(6): 404-415.

Mark Hellar
Technology Consultant
Hellar Studios LLC
mark@hellarstudios.com

Deena Engel
Clinical Professor
Director, Program in Digital Humanities and Social Science
Department of Computer Science
New York University
deena.engel@nyu.edu