The Electronic Media Review, Volume Five: 2017-2018
Software-based artworks possess a curious material status. While rooted in bits stored on a physical medium, they can also be considered performative and ephemeral in that the tangible elements of such works are created on the fly when code is executed as a computational process. When realized, the artwork is experienced primarily in relation to the tangible inputs and outputs of this process. The conservator must navigate this seemingly contradictory nature, a challenge that has required the development of new analytical approaches. Source code analysis is one such approach and has been demonstrated to be a powerful tool in understanding software programs through the close study of the code in which they were written. However, this approach is not suitable—or even possible—in all cases. In this paper, I explore alternative approaches that allow the analysis of compiled software in order to derive useful information for the conservator, including their application to software-based artwork case studies from the Tate collection. In doing so, I consider the potential applications and limitations of these novel methods in relation to existing workflows and argue for their place in the toolbox of the conservator of software-based art.
Over the past decade, software-based artworks have become an increasingly important research priority for those engaged in the long-term care of artworks employing digital technology. Such works present new challenges in their reliance on the unfamiliar medium of software. They are often technically complex and may employ many interrelated (and sometimes bespoke) components embedded in a specific technical environment. As a result, software-based artworks are particularly at risk from the effects of technological loss and obsolescence. Among the most important steps in developing a plan for dealing with these risks is the examination of the software system employed in order to develop a clear understanding of how it works, how its components relate to the artwork and its ongoing display, and how they might be maintained in the future. The materials acquired may vary from complete computer systems to software packages downloaded from a server to source code and development projects. On initial inspection, the function (and even existence) of some components may be unclear. Despite recent advances in source code analysis, alternative and complementary approaches remain relatively unexplored for software-based artworks. In this paper, I discuss approaches to the analysis of software to support the examination and documentation of software-based artworks.
A helpful starting point to the discussion is to consider why conservators have a desire to analyze and document software in the first place. While this process is inherently variable and dependent on the characteristics of the work in question, the following information is usually of a high priority for understanding software systems:
- Description of the hardware and software components and their interrelationships so that these can be maintained in the short term and emulated in the future.
- Description of what the software does (i.e., its functionality) and how (i.e., its implementation) so that it can be rewritten in another programming language (or migrated) if necessary.
- Description of important nonfunctional characteristics of the software’s behavior and a means of assessing their presence in future realizations of the work.
Software-based artworks may be acquired with variable levels of documentation (dependent on their development history); gaps in the above information may need to be identified and filled by conservators during acquisition or when approaching a treatment. Much like digital media of other kinds, analysis of software might be most effectively carried out using software tools. However, unlike other digital media (such as digital video, which has a powerful set of specialized analysis tools), software analysis in a conservation context remains poorly covered by existing literature. To some degree, this is a product of its newness in the field of conservation and digital preservation—we have simply not had the time to develop new tools. However, it is also, to a point, a product of software's complexity—thinking about software in relation to ideas such as file formats only gets us so far. Instead, software requires an outlook that acknowledges a performativity concealing a layered materiality, which can be intercepted at various points. The next few sections of this paper consider this materiality and methods for revealing its layers. I will then present an in-depth software-based artwork case study that demonstrates some of these methods used in practice.
MULTIPLE MATERIALITIES OF SOFTWARE-BASED ART
To understand how we might examine software effectively, we need to understand the multiple levels of materiality at which analysis can be targeted. We can approach this by looking at what is occurring when a software-based artwork is experienced, installed, or accessed by a human observer. The performance model developed by the National Archives of Australia (Heslop, Davis, and Wilson 2002) helps us conceptualize this process. While the model was originally developed to describe how we experience digital materials in archives, I have reformulated it here (fig. 1) to capture how we experience software as the object of preservation.
In this model, at the root of any experience of software is the source of the performance itself: the code. The code is essentially a script: a set of instructions that tells a computer system to carry out some sequence of actions. This is typically understood as a software program. In terms of materiality, this code can be understood on two levels. It is a physical object—for example, a bitstream represented by magnetized regions of a hard disk platter—that we manage carefully in order to avoid loss, using appropriate archival storage systems, much as we would for any other digital thing. In addition, it is a logical object (Thibodeau 2002) in that the physical signs can be understood as meaning something to a computer system—for example, a Windows Portable Executable file is understood as a set of executable instructions (stored in a file) by Windows operating systems. This code is often associated with some kind of data—for example, graphical elements of a user interface or a database with which the program interacts. While these might also be referred to as “software” in many cases (and, indeed, in some cases might be packaged within an executable file), it may be helpful to think of them separately for preservation purposes.
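The notion of the logical object can be made concrete in code. The following minimal sketch checks the structural markers—the DOS “MZ” signature and the “PE\0\0” header it points to—that allow a Windows system to treat a stream of bytes as an executable program. The byte buffer here is synthetic, built purely to illustrate the layout, not a runnable program.

```python
# Sketch: recognizing a Windows Portable Executable as a logical object.
# A PE file begins with the DOS 'MZ' signature; a 4-byte little-endian
# offset at 0x3C points to the 'PE\0\0' header. The bytes below are a
# synthetic stub built only to illustrate the layout.
import struct

def looks_like_pe(data: bytes) -> bool:
    """Check the structural markers that let an OS treat bytes as a program."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    return data[pe_offset:pe_offset + 4] == b"PE\0\0"

# Build a minimal synthetic header: 'MZ', padding, e_lfanew -> 0x40, 'PE\0\0'.
stub = bytearray(0x44)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x40)
stub[0x40:0x44] = b"PE\0\0"

print(looks_like_pe(bytes(stub)))   # True
print(looks_like_pe(b"just text"))  # False
```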
The code’s meaning as a logical object—its executability and internal structure—allows the second part of the model to be initiated: the process. This describes the way in which code and data are brought together and executed as a computational process within a technical environment—a general term for the hardware and software components on which the process is dependent. In this part of the model, the logical object described in the previous section becomes an active process that, in turn, generates the final part of the model: the performance. The performance can be understood as the experiential qualities of the artwork that are perceptible to the observer. This could be as varied as a projected image, the movement of a mechanical part, or a website as seen in a web browser.
In her influential paper on managing change in time-based media artworks, Pip Laurenson (2006) sets out a theoretical foundation for time-based media conservation that conceives of these works as allographic, that is, created in two phases: (1) the artwork, which has some coherent identity, and which is realized as (2) individual installations (or my preferred term, realizations) of that work through time. The software performance might best be understood as an additional level below the realization of the work as a whole: at some point during a realization, the software is activated and its performance contributes to that realization. The extent to which this lower level of performance is allographic is similarly variable. To use Laurenson’s terminology, it might range between thinly and thickly specified modes. A thickly specified software performance is closely tied to a specific underlying technical implementation carried out by the artist—a specific software object. A thinly specified software performance, on the other hand, can be separated from this underlying implementation, thus allowing change in the software object.
Restaging software performances through time is one way of seeing the core aim of software-based art conservators. In this role, they manage changes in the components on the left-hand side of the model (technical environment, code, data, and process), while ensuring that the integrity of the performance (on the right) is maintained. This is challenging because the executable code, process, and technical environment are all somewhat opaque: without special measures to reveal hidden processes, the system is simply a black box with a set of inputs and outputs. Fortunately, the field of software engineering offers a variety of ways of addressing this problem, particularly for cases in which poorly documented legacy systems are inherited and require maintenance.
REVERSE ENGINEERING AND SOURCE CODE ANALYSIS
Software engineering methods for analyzing software are known as reverse engineering techniques, which can be broadly defined as “the process of analysing a subject system to identify the system’s components and their interrelationships and create representations of the system in another form or at a higher level of abstraction” (Chikofsky and Cross 1990, 15). Reverse engineering quite literally reverses the traditional software-engineering process, aiming not to move forward from design documentation to work products such as source code and deployable software but rather to work backward to derive useful information about the design from these artifacts (fig. 2).
Reverse-engineering strategies can be understood in relation to the artifact of the development process that they target, known as a representation of the software, each of which has different utility for a conservator. Programmers usually author a human-readable (and, therefore, writeable) representation of the code, typically in a high-level language such as C++, Java, or Python. This must be compiled into machine code (or binary code) in order for a processor to act on it. However, there is variability between programming languages in terms of when this happens. In some cases, the source code is compiled (a kind of transformation) directly to native machine code for a particular platform, while in others it is translated to an intermediate representation that is interpreted when the program is executed—thus, transformed only into machine code at this stage. I will start this discussion by considering the most common reverse-engineering method: the analysis of the human-readable code, or source code analysis.
In source code analysis, the human-authored representation of the software program is analyzed through a process of program comprehension in order to derive information about the way in which a software program works. In addition to its well-established value in software-engineering processes (Das, Lutters, and Seaman 2007; de Souza, Anquetil, and Oliveira 2006; Singer 1998), it has now been demonstrated as a powerful tool in the conservation of software-based artworks through research undertaken by Deena Engel and a number of collaborators at New York’s Museum of Modern Art (Engel and Wharton 2014, 2015) and Guggenheim (Phillips et al. 2017; Dover 2016). The value of source code analysis in meeting the examination needs I outlined earlier in this article is therefore difficult to dispute. However, there are many situations in which the practical implications of source code analysis prevent or limit its applicability when attempting to assess or examine a particular software-based artwork.
The most important of these situations is that source code may simply be inaccessible for the particular work. This may happen for a multitude of reasons: an artist may have lost the code, may be unwilling to share it with an institution, or may never have had access to it if working with external collaborators. In other cases, notions of source code may not line up with the actuality of the artifacts deriving from software production processes: the use of integrated development environments (IDEs), WYSIWYG (what you see is what you get) interfaces, and node-based editors generates materials—project files, data assets, and the like—that cannot necessarily be analyzed as textual source code. In some cases, source code may be present, but of limited value. There may be an unclear line of provenance between source code and the software binaries acquired or the code may be extremely large and complex, impeding its analysis with available resources. None of this is to say that having access to source code is unhelpful—it is always beneficial to acquire it—but simply that conservators may need alternatives in order to effectively derive useful information about a software-based artwork. This brings us to the set of other reverse-engineering techniques outlined in figure 2—decompilation, binary analysis, and process analysis—which I will discuss in the following sections.
DECOMPILATION AND BINARY ANALYSIS
In the absence of software source code, we might focus instead on how to extract useful information from the executable software itself—be that machine code or some intermediate representation. The techniques I will discuss in this section might be considered static approaches in that they address the code components of the software performance model in a latent, pre-execution form. The first question that we can address is whether it is possible to derive a source code representation from an executable representation. Here we might use a process known as “decompilation,” which, in the strictest use of the term, attempts to transform compiled code back into source code or, at least, something resembling source code (Geffner 2014).
Using software-based artworks from the Tate collection for which original source code was available, it was possible to compare decompiled output directly against the human-authored code. Decompilation was found to be highly effective in several cases, yielding well-structured code with a close resemblance to the human-authored source code at the functional level. I first compared original and decompiled ActionScript 3 source code associated with an artwork using Flash software. Decompilation in this case was carried out using the JPEXS Flash Decompiler (JPEXS 2016). I then carried out a similar comparison for the artwork Brutalism: Stereo Reality Environment 3 (2007) by Jose Carlos Martinat Mendoza, decompiling a Java binary used in the 2011 realization of the work using JD-GUI (Dupuy 2015). Original source and decompiled outputs for a segment of this code relating to variable declaration are compared in figure 3.
As is clear from this comparison, there is a nearly one-for-one matching of functional code lines. However, code comments (in beige text in fig. 3) are completely lost, including some unused code that would be a potentially interesting art historical trace. While this is undoubtedly a significant loss, the decompiled code is still extremely useful in deriving knowledge about the software. ActionScript and Java are both examples of interpreted languages that are transformed not into machine code that a processor can execute directly but rather into an intermediate representation that requires interpretation by additional software before it can be executed. As a result, their executable form remains closer to the original source code (less information is removed than during compilation to machine code), making it easier to decompile. Where the executable representation is machine code, decompilation becomes much more challenging. Indeed, there is debate within the reverse-engineering community about whether decompiling machine code to source code will ever be possible given the important syntactic information that is lost in the compilation process (Jazdzewski 2014; Eilam 2011). Further difficulties arise when we consider the legal and ethical issues in decompilation, which operates in a legal gray area when applied to proprietary software (Behrens and Levary 1998).
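The loss of comments described above can be demonstrated with a rough analogy in Python (standing in for the ActionScript and Java cases discussed): comments are discarded when source is compiled, even to an intermediate representation.

```python
# Sketch: comments do not survive compilation, even to an intermediate
# representation. Python bytecode serves here as an analogy for the
# ActionScript/Java intermediate representations discussed above.
import dis

source = """
# unused legacy note kept by the programmer -- an art historical trace
greeting = "hello"
"""

code = compile(source, "<artwork>", "exec")
listing = "\n".join(str(ins) for ins in dis.get_instructions(code))

print("legacy" in listing)   # False: the comment is gone from the bytecode
print("hello" in listing)    # True: the functional constant survives
```

A decompiler can recover the assignment from the bytecode, but the comment is simply no longer present in any representation it has access to.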
If we cannot derive useful source code from machine code, it is worth considering whether there are other ways to extract useful information from the machine code itself. While machine code is not intended for human reading, it contains a complete logical representation of the same information contained in the source code. As a result, we can use software tools to extract useful information from it. Testing tools on different programs associated with software-based artworks, it is clear that this can be particularly useful in identifying dependencies. For example, working in a Windows environment, we can extract information about Windows Dynamic Link Library (DLL) dependencies from machine code in order to ensure that these are identified. In figure 4, the output of the binary analysis tool CFF Explorer (Pistelli 2012) is shown, demonstrating the extraction of the list of DLL dependencies of a Windows Portable Executable program used in the Subtitled Public (2005) installation by Rafael Lozano-Hemmer (b. 1967). A third-party Intel Computer Vision library has been identified and is highlighted in this screenshot, with a set of embedded metadata extracted from the file, including version number and copyright date. In this case, the information derived was useful in carrying out a subsequent hardware migration at the time of this writing.
While there are clearly ways in which static binary analysis can be used by the conservator, it has limitations in that it does not take into account the action that occurs in the process component of the software performance model. As I will go on to demonstrate, there are various situations in which it is important to understand what is happening at this stage in the model in order to reveal the software structures involved in achieving a particular software performance. Static binary analysis is also limited by available analysis tools, which are highly specialized and, for detailed insight into machine code, must navigate similar problems to those that limit decompilation.
PROCESS ANALYSIS AND INSTRUMENTATION
The software performance model formalizes the idea that software performance emerges from a process—the unfolding execution of the code and related activity. Unfortunately, this process is largely hidden from a user when software is executed unless specific steps are taken to intercept or instrument it. The most obvious way in which this can be achieved is by design. In some cases, programmers and developers will build instrumentation into the software that they distribute, perhaps to debug and test the software during development and in other instances for the benefit of the end user. There are cases in which this can be observed in software-based artworks in the Tate collection. The software at the heart of the work entitled Sow Farm by John Gerrard (b. 1974), for example, features a hidden debug overlay that can be accessed using a certain keystroke combination. This overlay, pictured in figure 5, provides information about the program’s performance and the parameters of the simulation as they unfold. In other cases, the code may be instrumented in such a way that a specific tool (such as a specialized debugger) can be used to hook into the program as it runs and return certain information.
Where instrumentation has not been implemented in the software itself, it may be possible to use third-party tools to achieve similar results. Process analysis (also known as “dynamic analysis”) tools can be selected for a particular purpose by identifying the point at which to hook into the process. Such tools can serve a variety of purposes, but among the most important for the conservator are likely to be tracing tools. These are designed to capture and log information about program events and system interactions as a software program is running in memory. For the conservator, these techniques can be particularly useful in identifying calls to dependencies and other interactions with the software environment as they occur. Intercepting these at the operating system level bypasses the need to understand the machine code itself by observing the effect of the code’s execution rather than its source. For instance, the process requesting read access to a particular file is likely to indicate a dependency on this file. Figure 6 shows a screenshot of file system activity captured by the Sysinternals Process Monitor (Russinovich 2017) tool for Windows.
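Once captured, a trace of this kind can be distilled into a list of dependency candidates. The sketch below parses a Process Monitor-style CSV export; the column layout mirrors a typical export, but the rows and file paths are invented for illustration.

```python
# Sketch: distilling dependency candidates from a Process Monitor-style CSV
# export. The column layout follows a typical ProcMon export; the rows and
# paths are hypothetical, invented for illustration.
import csv
import io

procmon_csv = io.StringIO(
    '"Time of Day","Process Name","Operation","Path","Result"\n'
    '"12:00:01","artwork.exe","ReadFile","C:\\Windows\\System32\\d3d9.dll","SUCCESS"\n'
    '"12:00:01","artwork.exe","ReadFile","C:\\artwork\\config.ini","SUCCESS"\n'
    '"12:00:02","other.exe","ReadFile","C:\\temp\\log.txt","SUCCESS"\n'
    '"12:00:03","artwork.exe","ReadFile","C:\\missing\\asset.png","NAME NOT FOUND"\n'
)

# Keep successful reads made by the process of interest: likely dependencies.
deps = sorted({row["Path"] for row in csv.DictReader(procmon_csv)
               if row["Process Name"] == "artwork.exe"
               and row["Operation"] == "ReadFile"
               and row["Result"] == "SUCCESS"})

print(deps)
```

Filtering out other processes and failed lookups, as here, is usually the first step in reducing a trace of many thousands of events to a reviewable list.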
This kind of approach can produce very large quantities of data to be analyzed, as a similarly large quantity of read/write operations are produced by a running process. While process instrumentation can also be used to trace the program’s execution as it happens—and so reveal the functionality of the program itself—this again requires negotiating machine code (or, more commonly, a mnemonic representation of it known as “assembly” language).
The challenge of applying process analysis is often simply one of tool availability. This is an unusual use case in software engineering and development, and such tools can be used to circumvent protections on closed-source, proprietary products. Their use and development therefore remain on the fringes of computer science, although interest in the analysis of malware in computer security has brought renewed legitimacy to reverse engineering in recent years. Even where tools are available, however, there are inherent limitations to their practical application. As they operate at a low level among machine code, they can generate very large quantities of data, which require careful and painstaking analysis in order to yield a meaningful interpretation. Where particularly difficult cases are encountered, it may be beneficial for conservators to engage with reverse-engineering specialists—examples of which have already occurred in museum environments (Fino-Radin 2016).
So far, I have focused primarily on what we can learn from code—be it source code, binary code, or executing code. When it comes to understanding the parameters of a software performance, however, this alone may be inadequate. Understanding of a particular software performance may also be enhanced by examining the inputs and outputs of the process. This requires the use of instrumentation techniques to intercept data at different places within the software environment. In these situations, data monitoring and logging tools can be used to capture and log data that is sent and received by a software program. Such tools could target a variety of communication protocols. For example, they might be used to monitor network activity (a process known as “packet sniffing”), capture data being sent to a port (for example, to a printer or other hardware device) or intercept frames as they are sent to a graphics card for display. Serial data, for instance, might be captured in order to understand how a software program communicates with an attached device—as shown in figure 7, which illustrates serial data logging used in the analysis of the artwork ‘Astrophotography…The Traditional Measure of Photographic Speed in Astronomy…’ by Siegfried Marx (1987) (2006) by Cerith Wyn Evans (b. 1958).
In a software-based art conservation context, this kind of information can be particularly useful for identifying the nature of a program’s interaction with an external resource or device, assessing what is being transmitted and at what rate, and defining tests that can be used to verify whether a performance is occurring within acceptable parameters. Understanding these characteristics is often key to pinning down the carefully choreographed behaviors seen in software-based artworks so that they can be preserved as accurately as possible. This involves engaging with what is happening across the system and ensuring that appropriate analytical tools—or instruments—are defined through which to verify them. Rather than precluding artist involvement, this kind of approach might be beneficially explored in collaboration with artists where possible. For example, appropriate metrics might be identified through consultation and suitable instrumentation either written into a version of the program or accommodated with agreed upon third-party tools.
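The kind of serial capture shown in figure 7 can be sketched as a simple timestamped hex log. In a live setting the chunks would arrive from a serial port (for instance via a library such as pyserial); here they are synthetic bytes standing in for hypothetical device commands.

```python
# Sketch: timestamped hex + ASCII logging of captured serial traffic. The
# chunks below are synthetic stand-ins for device commands; in practice they
# would be read from a serial port (e.g. with pyserial).
import datetime

def log_chunk(chunk: bytes, when: datetime.datetime) -> str:
    """Render one captured chunk as a timestamped hex + ASCII log line."""
    hexpart = " ".join(f"{b:02x}" for b in chunk)
    asciipart = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk)
    return f"{when.isoformat()}  {hexpart:<24}  {asciipart}"

t0 = datetime.datetime(2018, 1, 1, 12, 0, 0)
for chunk in (b"\x02ON\x03", b"\x02OFF\x03"):
    print(log_chunk(chunk, t0))
```

A log in this form supports exactly the assessments described above: what is transmitted, at what rate, and whether it matches the documented behavior of a reference realization.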
ANALYZING JOHN GERRARD’S SOW FARM
In order to illustrate the practical applications of the methods introduced here, I am going to consider their use in the examination and documentation of a particular software-based artwork case study: Sow Farm (near Libbey, Oklahoma) 2009 (Gerrard 2009) by John Gerrard (figure 8).
This work is in a medium that the artist calls “real-time 3D,” technology similar to that used in video games. The work, usually presented as a large-scale projection, depicts an unmanned pig farm in a remote region of the Great Plains in Oklahoma, United States, seen from the perspective of a slowly circling virtual camera. Unfolding over a period of 365 days in real time, the 3D environment is a meticulous recreation of a real-world location, including simulation of day-night cycles, complete with dynamic sun and stars. Once every 156 days, a truck drives up to the buildings and waits for one hour.
Acquired in 2015, this work presented a difficult case study to examine using established methods, as Tate did not have access to source code. In a sense, this work never had source code per se: Sow Farm was developed in a proprietary engine called Quest3D (Act-3D 2012). While some custom plugins and shaders (specialized 3D rendering programs) were developed to achieve certain effects, Quest3D prepackages much of the code beneath WYSIWYG components and node-based editors. The Quest3D engine is also no longer sold or supported by its original developer, Act-3D, having been superseded by newer commercial products. We know from interviews with Gerrard that he feels that the specific rendering characteristics of the works, as a product of the technology of their time, should be maintained (Gerrard, pers. commun.)—which would include the particular characteristics of the Quest3D engine. Given all of these factors, migrating the work to a new 3D engine in the future would seem to be an unlikely scenario. Instead, it seems more appropriate to focus on how an appropriate technical environment might be maintained while leaving the original software as is in order to achieve future software performances, thus drawing the focus of long-term strategy to emulation and virtualization. Research into the virtualization of Sow Farm at the Tate has been discussed elsewhere by other authors (Falcão and Dekker 2015). Here I will focus on my own strand of research into deriving critical information to support effective application of such strategies in the absence of source code.
With no source materials available, I considered how reverse-engineering tools might address this gap. The first port of call was to consider whether any useful code could be derived from the executable software. In this case, this is a Windows Portable Executable file containing a machine code representation of the code and is therefore very difficult to decompile. Quest3D applications are written in C and C++, and the code is likely to have been heavily altered when compiled to remove information that could aid a decompiler in making sense of it. Decompiling would also be legally questionable, as Quest3D is licensed software owned by a third party. More importantly, this would also be an ethically dubious approach given that the artist preferred not to share the source materials at this point in the work’s history.
The next step was to consider alternative approaches, which would focus on the analysis of the compiled software binaries. Using a static binary analysis tool, it should be possible to pull out a list of the DLL dependencies, which is one important piece of knowledge in documenting the technical environment constituents. However, when analyzed with the tool CFF Explorer (Pistelli 2012), as shown in figure 9, the listed dependencies show only a small set of core Windows libraries. With no evidence of any DirectX libraries in this list—linking with which would be required for this software to utilize the Windows graphics pipeline—it was clear that there was something wrong with the way the analysis was being approached. Perhaps linking with certain dependencies was occurring only when the software was actually executed.
To test this theory, process analysis was applied. In this case I used a debugging tool called x64dbg (x64dbg 2018), which is popular in the reverse-engineering community and has an array of other uses beyond dependency identification. When attached to the sowfarm.exe process, the tool’s logs revealed that files packaged in the executable were being extracted by the process to a temporary directory. The presence of additional executable code here explains why a static binary analysis approach was not identifying dependencies when applied to the original binary. It is useful not only to identify this behavior as something to consider in future migrations to new platforms but also to capture the extracted data as an artifact worthy of preservation itself. Again, using x64dbg, it was possible to consult the list of libraries loaded by the process and thus identify anything unusual or nonstandard. In this case, this included a set of specific DirectX and Microsoft Visual C++ helper libraries (an example of which is highlighted in fig. 10) and a third-party library called Phidgets, which relates to a motion sensor used in certain versions of the work.
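The gap between the static import list and the libraries actually loaded at runtime can be illustrated with a Python analogy: a static scan of a program's import statements misses modules resolved dynamically, just as a PE import table misses DLLs loaded during execution. The short program string here is invented for illustration.

```python
# Sketch: a Python analogy for the gap between static and dynamic dependency
# analysis. A static scan of `import` statements misses modules pulled in at
# runtime, much as a PE import table misses DLLs loaded during execution.
import ast

program = """
import importlib
plugin = importlib.import_module("json")  # resolved only at runtime
"""

# Static view: walk the syntax tree looking for import statements.
static_deps = {alias.name
               for node in ast.walk(ast.parse(program))
               if isinstance(node, ast.Import)
               for alias in node.names}

# Dynamic view: actually execute the program and inspect the result.
namespace = {}
exec(program, namespace)

print(sorted(static_deps))           # ['importlib'] -- no sign of json
print(namespace["plugin"].__name__)  # 'json' -- revealed only at runtime
```

Only by running the program does the dynamically loaded dependency become visible, which is precisely the behavior observed with sowfarm.exe.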
Although the large quantity of data generated in this analysis requires a certain depth of knowledge of the operating system to make sense of it, it can be usefully paired with other workflows. The process of building a backup machine (physical or virtual), for instance, ensures that knowledge derived from analysis can be iteratively tested during reconstruction of the technical environment, allowing for a recipe for an appropriate execution environment to be gathered and verified.
Instrumentation was also found to be useful in understanding the acceptable parameters of the Sow Farm software performance and how they might be verified in future realizations. These issues came to the fore again during the virtualization of Sow Farm as part of a research project at the Tate (Falcão and Dekker 2015). When running the software in a virtual environment, it was clear on examination of screen outputs that there was a small but noticeable variability in the smoothness of the virtual camera’s movement through the environment. With a technical knowledge of rendering, we understand this as relating to the rate at which frames (i.e., single raster images) are being generated by the system as part of the sequence of actions that constitute the performance. This output can be intercepted as frames are passed to the graphics card for display, allowing for the measurement of frame rate (frames per second [FPS]). Instrumentation integrated in the Sow Farm software itself includes an on-screen frame rate counter. However, in this case, this was not helpful: rates displayed by the debugging screen appeared to remain consistently above the 60 FPS minimum required for display with little variance.
When using the third-party tool RivaTuner Statistics Server (Hagedoorn 2017) to log frame rate, a similar pattern was observed. However, by logging an alternative metric using this tool, maximum frame time, the previously hard-to-quantify smoothness issues were revealed. Frame time is the measurement of the time between frames—for a second in which 60 frames have been generated, this measure tells you what the time interval is between each of those 60 frames. Generally, higher frame time values result in a lower overall frame rate, but variance in frame time can create visible problems that are not reflected in the frame rate itself. In figure 11, the maximum frame time values recorded in each second of runtime are plotted for executions of the software in native and virtualized Windows 7 environments.
One clarification to make regarding the data is that while the native frame time values do not exhibit large spikes, they are on average higher than those of the virtualized version. This is likely because certain resource-intensive postprocessing effects cannot be used within the virtual environment.
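The distinction between average frame rate and maximum frame time described above can be illustrated with a short sketch. The values below are synthetic, chosen for illustration rather than taken from Sow Farm logs:

```python
# Two seconds of runtime, each containing 60 frames (values in ms).
# "smooth": steady pacing; "stutter": same average, one 100 ms hitch
# compensated by slightly faster frames elsewhere in the second.
smooth = [1000 / 60] * 60
stutter = [100.0] + [(1000 - 100) / 59] * 59

for label, frame_times in [("smooth", smooth), ("stutter", stutter)]:
    # Average frame rate over the second: frames divided by total time.
    fps = len(frame_times) / (sum(frame_times) / 1000)
    # Worst single frame: the metric that exposes visible stutter.
    max_ft = max(frame_times)
    print(f"{label}: {fps:.1f} FPS, max frame time {max_ft:.1f} ms")
```

Both seconds report an identical 60.0 FPS, but only the maximum frame time metric distinguishes the steady run (about 16.7 ms) from the stuttering one (100.0 ms), which mirrors why the on-screen frame rate counter appeared consistent while the virtualized rendering visibly stuttered.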
With this important performance metric now better understood through the consideration of appropriate instrumentation, it can be formalized as part of the documentation of the work: in addition to achieving a minimum of 60 FPS, a software performance must show a variance in frame time of no more than 2 ms. In arriving at this point, it has been necessary to carefully consider the connection between the software performance and its underlying technical basis. By applying analysis with the layered materiality of software in mind, dependency and variability in the software performance can be better understood and documented by conservators caring for software-based artworks.
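A verification check against these documented criteria might be sketched as follows. The log structure and the reading of “variance” as the spread between the highest and lowest per-second maximum frame times are assumptions for illustration, not part of the Tate documentation:

```python
MIN_FPS = 60            # documented minimum acceptable frame rate
MAX_FT_SPREAD_MS = 2.0  # documented maximum frame time variance (ms)

def verify_performance(fps_per_second, max_frame_time_per_second):
    """Check per-second instrumentation logs against the documented criteria."""
    problems = []
    if min(fps_per_second) < MIN_FPS:
        problems.append(f"frame rate fell to {min(fps_per_second)} FPS")
    # Spread of per-second maximum frame times across the run.
    spread = max(max_frame_time_per_second) - min(max_frame_time_per_second)
    if spread > MAX_FT_SPREAD_MS:
        problems.append(f"frame time varied by {spread:.1f} ms")
    return (len(problems) == 0, problems)

# A steady run passes; a run with one long frame fails on variance.
print(verify_performance([72, 70, 71], [16.5, 16.8, 17.1]))
print(verify_performance([72, 70, 71], [16.5, 16.8, 35.0]))
```

A check of this kind could accompany future realizations of the work, turning the documented performance criteria into a repeatable test rather than a judgment made by eye.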
In this paper, I have introduced and demonstrated a set of analysis approaches that can help us better understand software-based artworks even in situations in which we do not have access to their source code. Ultimately, these approaches are in service of conservation planning and documentation, and might support:
- Identification of the boundaries of software objects and derivation of a “recipe” for reconstructing a suitable technical environment for their execution.
- Derivation of suitable metadata for component description in information systems.
- Identification of key characteristics of a software performance (e.g., speed, quality, stability), including the instrumentation techniques through which they can be verified.
When combined with the acquisition of disk images taken from computer systems, this information can also support long-term preservation strategies, such as emulation (Rechert, Falcão, and Ensom 2016). There are, of course, notable limitations to the techniques I have introduced in comparison with source code analysis. Where decompilation is impossible, it is very difficult to derive information about program functionality without painstakingly reverse engineering the code—a process likely to be time-consuming and costly. This would make the process of migrating a program to a new programming language or platform particularly challenging. Therefore, acquiring source code (or, rather, source materials more broadly) should remain a high priority for those acquiring software-based artworks. A further limitation is that applying the approaches described here is often contingent on available tools. These are highly specialized tools, the development of which occurs on the edge of mainstream computer science owing to a lack of commercial interest (and even hostility) toward reverse engineering. The time-based media conservation and digital preservation communities may benefit from further engagement with those communities more openly engaged in their use, such as computer security and video game modification.
Despite inherent limitations, I propose that the practical methods and overarching theoretical framework described in this paper are worthy of a place in the growing toolbox of the conservator of software-based art. Their potential relevance to the conservator might be seen as having parallels with their relevance to the systems administrator (or sysadmin). Rather like systems administration, caring for software-based art is an activity that will continue to be concerned (among other things) with the configuration, deployment, and maintenance of software systems through time—with, of course, the nontrivial added complexity of obsolescence. While the skills of the programmer have an increasingly important role in the conservation of software-based art, the skills of the sysadmin in understanding the larger system and its hidden structures and processes will also be essential. These can help the conservator address the layered materiality of software, connect it with the characteristics of a software performance, and ensure that this performance carries with it the identity of the artwork.
This research was conducted as part of a PhD project funded by the United Kingdom’s Arts and Humanities Research Council as part of their Collaborative Doctoral Partnership scheme. Thanks to my supervisors Mark Hedges, Pip Laurenson, and Patricia Falcão for guiding my research over the past four years. Thanks also to the Samuel H. Kress Foundation, administered by the Foundation of the American Institute for Conservation, and the AIC Electronic Media Group, for funding my attendance at the AIC Annual Meeting 2018 in Houston, Texas, where this paper was first presented.
Act-3D. 2012. “Quest3D.” https://web.archive.org/web/20170822144850/http://www.quest3d.com/ (accessed 01/30/18).
Behrens, B. C., and R. R. Levary. 1998. “Practical legal aspects of software reverse engineering.” Communications of the ACM 41 (2): 27–29.
Chikofsky, E. J., and J. H. Cross. 1990. “Reverse engineering and design recovery: a taxonomy.” IEEE Software 7 (1): 13–17. https://doi.org/10.1109/52.43044
Das, S., W. G. Lutters, and C. B. Seaman. 2007. “Understanding documentation value in software maintenance.” In Proceedings of the 2007 Symposium on Computer Human Interaction for the Management of Information Technology. Cambridge, MA: ACM. 2.
de Souza, S. C. B., N. Anquetil, and K. M. de Oliveira. 2006. “Which documentation for software maintenance?” Journal of the Brazilian Computer Society 12 (3): 31–44. https://doi.org/10.1007/BF03194494
Dover, C. 2016. “How the Guggenheim and NYU are conserving computer-based art.” Guggenheim blog. November 4, 2016. www.guggenheim.org/blogs/checklist/how-the-guggenheim-and-nyu-are-conserving-computer-based-art-part-1 (accessed 06/27/17).
Eilam, E. 2011. Reversing: secrets of reverse engineering. New York: John Wiley & Sons.
Eltima Software. 2018. RS232 Data Logger (version 7.0.342). www.virtual-serial-port.org/products/rs232-data-logger/ (accessed 09/02/18).
Engel, D., and G. Wharton. 2014. “Reading between the lines: source code documentation as a conservation strategy for software-based art.” Studies in Conservation 59 (6): 404–15. https://doi.org/10.1179/2047058413Y.0000000115
Engel, D., and G. Wharton. 2015. “Source code analysis as technical art history.” Journal of the American Institute for Conservation 54 (2): 91–101.
Falcão, P., and A. Dekker. 2015. Virtualizing John Gerrard’s “Sow Farm” (2009), or Not? Presented at TechFocus III: Caring for Software-based Art, Guggenheim, New York, September 25, 2015. https://vimeo.com/147884591 (accessed 07/16/18).
Fino-Radin, B. 2016. “Art in the age of obsolescence: rescuing an artwork from crumbling technologies.” MoMA blog. December 21, 2016. https://stories.moma.org/art-in-the-age-of-obsolescence-1272f1b9b92e (accessed 06/27/17).
Geffner, J. 2014. “What’s the difference between a disassembler, debugger and decompiler?” Reverse Engineering Stack Exchange. August 6, 2014. http://reverseengineering.stackexchange.com/questions/4635/whats-the-difference-between-a-disassembler-debugger-and-decompiler (accessed 10/04/16).
Gerrard, J. 2009. Sow Farm (near Libbey, Oklahoma) 2009. T14279. Tate.
Hagedoorn, H. 2017. RivaTuner Statistics Server (version 7.0.0). Guru3D. www.guru3d.com/files-details/rtss-rivatuner-statistics-server-download.html (accessed 08/03/18).
Heslop, H., S. Davis, and A. Wilson. 2002. “An approach to the preservation of digital records.” Canberra: National Archives of Australia. www.ltu.se/cms_fs/1.83844!/file/An_approach_Preservation_dig_records.pdf (accessed 06/07/17).
Jazdzewski, C. 2014. “Why can’t native machine code be easily decompiled?” Software Engineering Stack Exchange. February 21, 2014.
JPEXS. 2016. JPEXS Free Flash Decompiler (version 9.0.0). Windows. JPEXS. www.free-decompiler.com/flash/ (accessed 10/04/18).
Laurenson, P. 2006. “Authenticity, change and loss in the conservation of time-based media installations.” Tate Papers 6. www.tate.org.uk/research/publications/tate-papers/authenticity-change-and-loss-conservation-time-based-media (accessed 04/21/15).
Lozano-Hemmer, R. 2005. Subtitled Public. T12565. Tate.
Phillips, J., D. Engel, E. Dickson, and J. Farbowitz. 2017. Restoring Brandon, Shu Lea Cheang’s early web artwork. Guggenheim blog. May 16, 2017. www.guggenheim.org/blogs/checklist/restoring-brandon-shu-lea-cheangs-early-web-artwork (accessed 09/02/18).
Pistelli, D. 2012. Explorer Suite (version 3). NTCore. www.ntcore.com/exsuite.php (accessed 02/14/18).
Rechert, K., P. Falcão, and T. Ensom. 2016. “Introduction to an emulation-based preservation strategy for software-based artworks” (white paper). www.tate.org.uk/research/publications/emulation-based-preservation-strategy-for-software-based-artworks (accessed 03/23/17).
Russinovich, M. 2018. Process Monitor (version 3.50). Windows. Microsoft. https://docs.microsoft.com/en-us/sysinternals/downloads/procmon (accessed 12/03/18).
Singer, J. 1998. “Practices of software maintenance.” In Proceedings of the International Conference on Software Maintenance, 1998, 139–145. IEEE. http://ieeexplore.ieee.org/abstract/document/738502/ (accessed 09/02/18).
Thibodeau, K. 2002. “Overview of technological approaches to digital preservation and challenges in coming years.” Council on Library and Information Resources. https://web.archive.org/web/20160520092136/http://www.clir.org:80/pubs/reports/pub107/thibodeau.html (accessed 11/11/14).
Wyn Evans, C. 2006. ‘Astrophotography…The Traditional Measure of Photographic Speed in Astronomy…’ by Siegfried Marx (1987). T13645. Tate.
x64dbg (version snapshot_2018-04-05_00-33). 2018. https://x64dbg.com/ (accessed 09/02/18).