Nearly all of the practical implementations of artificial worlds to be described are related in some way to the seminal work of John von Neumann. In the late 1940s and early 1950s, he devoted considerable time to the question of how complicated machines could evolve from simple machines.3.9 Specifically, he wished to develop a formal description of a system that could support self-reproducing machines which were robust in the sense that they could withstand some types of mutation and pass these mutations on to their offspring. Such machines could therefore participate in a process of evolution.
Inspired by Alan Turing's earlier work on universal computing machines [Turing 36], von Neumann devised an architecture which could fulfil these requirements. The machine he envisaged was composed of three subcomponents [von Neumann 66]. The first is a general constructive automaton, A, which can construct any automaton X when supplied with a description φ[X] of it:

A + φ[X] ⇒ X        (3.1)

(where + indicates a single machine composed of the components to the left and right suitably arranged, and ⇒ indicates a process of construction.)

The second is a general copying automaton, B, which can copy any description tape presented to it:

B + φ[X] ⇒ φ[X]        (3.2)

The third is a control automaton, C, which coordinates the operation of A and B so that the complete machine, given a description φ[X], constructs the automaton described and also produces a copy of the description:

A + B + C + φ[X] ⇒ X + φ[X]        (3.3)
Now, if we choose X to be (A + B + C), then the end result is:

A + B + C + φ[A + B + C] ⇒ A + B + C + φ[A + B + C]        (3.4)
This complete machine plus tape, A + B + C + φ[A + B + C], is therefore self-reproducing. From the point of view of the evolvability of this architecture, the crucial feature is that we can add the description of an arbitrary additional automaton D to the input tape. This gives us:

A + B + C + φ[A + B + C + D] ⇒ A + B + C + D + φ[A + B + C + D]        (3.5)
Furthermore, notice that if the input tape φ[A + B + C + D] is mutated in such a way that the description of automaton D is changed, but those of A, B and C are unaffected--that is, the mutated tape is φ[A + B + C + D']--then the result of the construction will be:

A + B + C + D + φ[A + B + C + D'] ⇒ A + B + C + D' + φ[A + B + C + D']        (3.6)
The reproductive capability of the architecture is therefore robust to some mutations (specifically, those mutations which only affect the description of D), so the machines are able to evolve. Von Neumann pointed out that it was the action of the general copying automaton, B, which gave his architecture the capacity for evolving machines of increased complexity, because B is able to copy the description of any machine, no matter how complicated [von Neumann 66] (p.121). This ability is clearly demonstrated in Equation 3.5 above.
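The logic of this architecture is simple enough to be captured in a few lines of code. The following Python sketch is purely illustrative (the representation of machines as lists of component names is my own, and is not part of von Neumann's formalism); it mirrors the behaviour of Equations 3.3 to 3.6.

    # Purely illustrative sketch: a 'machine' is a list of component names, and
    # a description tape phi[X] is simply such a list.

    def construct(phi):
        """A, the general constructive automaton: build the machine described
        on the tape (here, trivially, by returning the listed components)."""
        return list(phi)

    def copy_tape(phi):
        """B, the general copying automaton: produce a copy of the tape."""
        return list(phi)

    def reproduce(phi):
        """C, the control automaton: run A and B on the tape and deliver both
        the constructed machine and the copied tape, as in Equation 3.3."""
        return construct(phi), copy_tape(phi)

    # Choosing X = (A + B + C), as in Equation 3.4, gives a self-reproducer:
    child, child_tape = reproduce(['A', 'B', 'C'])
    print(child, child_tape)        # ['A', 'B', 'C'] ['A', 'B', 'C']

    # An extra automaton D on the tape is both constructed and inherited
    # (Equation 3.5), and a mutation confined to D is heritable (Equation 3.6):
    child, child_tape = reproduce(['A', 'B', 'C', "D'"])
    print(child, child_tape)        # the offspring and its tape both carry D'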
The original implementation envisaged by von Neumann was a constructive system, which Burks has referred to both as the `robot model' and as the `kinematic model' [Aspray & Burks 87] (p.374). However, von Neumann decided that the system was too complicated to capture in a set of rules that were both simple and enlightening, so he turned his attention to developing the cellular automata (CA) framework with Stanislaw Ulam. Von Neumann described the detailed design of a self-reproducing machine in a cellular automata space, according to the architecture described above.3.10 Recently, a slightly modified and simplified version of this design was successfully implemented on a computer [Pesavento 95]. One of the major achievements of von Neumann's work was to clarify the logical relationship between description (the instruction tape, or genotype), and construction (the execution of the instructions to eventually build a new individual, or phenotype) in self-replicating systems. However, as already mentioned and as emphasised recently by Barry McMullin (e.g. [McMullin 92a]), his work was always within the context of self-replicating systems which would also possess great evolutionary potential.
Von Neumann concentrated on the logic required for a self-replicating machine to be able to evolve increased complication. He therefore did not specifically deal with various biological concerns, most notably concerns of energy. In fact, von Neumann was well aware of these concerns and warned that by restricting attention to the logic of self-reproduction, ``one has thrown half the problem out the window, and it may be the more important half'' [von Neumann 66].3.11 The other half of the problem to which he is referring is that of the molecular phenotype [Pattee 95a]. Burks says of the kinematic model that ``von Neumann intended to disregard the fuel and energy problem in his first design attempt. He planned to consider it later, perhaps by introducing a battery as an additional elementary part'' [Aspray & Burks 87] (p.485). Another major difference between biological organisms and von Neumann's self-replicators is the capacity of the former, but not the latter, for self-maintenance in the face of environmental perturbations. Alvy Ray Smith has pointed out [Smith 91] that some of the CA-based self-reproduction models developed by von Neumann and more recently by others are very non-biological in other ways as well; for example, reproduction in CA models does not occur by development from an `egg', most models suffer from `overcrowding' such that an individual self-replicator can only reproduce once (or a small number of times) before it runs out of space in which to place its offspring, etc.
A good review of subsequent work on self-reproduction in CAs is presented in [Sipper 98]. Much of this more recent work (e.g. [Langton 84], [Ibáñez et al. 95], [Tempesti 95], [Perrier et al. 96], [Morita & Imai 97]) has the kind of `non-biological' character mentioned above. In fact, as pointed out by Barry McMullin, much of this work does not even seem to share von Neumann's concern with the evolution of increased complication, but addresses the `problem' of self-reproduction in and of itself, as if this were all that von Neumann was concerned with [McMullin 92a]. For example, Chris Langton described a much simpler self-replicator in a CA, by dropping von Neumann's requirement for universal construction [Langton 84]. Langton, like von Neumann, asserts that the self-replicators must treat their stored information in two different manners: ``interpreted, as instructions to be executed (translation), and uninterpreted, as data to be copied (transcription)'' (ibid. p.137, original emphasis), but he only insists on these requirements in order to avoid the problem of `trivial' self-reproduction3.12 rather than from any concerns relating to evolvability. The resulting automaton, referred to as `Langton's loop', is very fragile in that it cannot in general withstand perturbations and mutations, and is certainly not capable of heritable viable mutation. McMullin describes Langton's work as a ``cruel (though of course unintentional) parody'' of von Neumann's [McMullin 92a] (p.181). This `myth' that von Neumann was concerned with self-reproduction in and of itself has by no means been dispelled even now; for example, the introduction to a paper at a recent European Conference on Artificial Life reads ``The first steps in [developing a universal theory of evolution] were the cellular automata of von Neumann, who only studied self-replication. Among other [sic], Langton and Sipper also addressed the question of evolution.'' [de Dinechin 97] (p.560, emphasis added).
As already mentioned, these studies do not generally consider the ability of the automaton to actively maintain its own structure in the face of environmental perturbations. This deficiency has certainly been recognised for a long time (e.g. [Arbib 69] and, more recently, [McMullin 92a]), but very little work has so far been done to create more robust, self-maintaining, self-reproducing CAs (but see Section 3.2.3). Indeed, to anticipate the discussion presented in Chapter 7, it may be that artificial life platforms will have to model the material grounding of self-reproducing entities in some way, rather than being purely logical models, if strong selection pressure for self-maintenance is to exist. Some extensions to the cellular automata framework in this direction have been reported, such as the brief description by Myhill of a model in which machine components moved discretely in a two-dimensional environment [Myhill 64]. Von Neumann's original `kinematic model' also falls into this category [Aspray & Burks 87] (p.374). However, the majority of work still models purely logical self-reproduction, where the reproducing entities are configurations of states with no material grounding; in such systems, no collection of `raw materials' is required for reproduction, and it follows that there is no competition for raw materials between individual reproducers. It is likely that only when such considerations are included in these models can we expect there to be selection pressure for self-reproducers with a capacity for self-maintenance, potentially leading to the evolution of autopoietic organisms.
A short time after von Neumann's death in 1957, Lionel Penrose developed a series of mechanical models of self-reproduction, which bear some similarity to von Neumann's kinematic model [Penrose 59]. Around the same time, a couple of other simple physical models of self-reproduction were also reported ([Jacobson 58], [Morowitz 59]), but Penrose provided the most discussion of his work. Furthermore, he also considered a self-reproduction scheme which was more like the relatively simple template-reproduction mechanism of the nucleic acids (DNA and RNA) than von Neumann's full self-reproduction architecture [Penrose 62]. Penrose mentions a number of deficiencies with this model (e.g. it was hard to get fully-formed copies of molecules to unbind at the right time, and there was no consideration of energy requirements), but was able to draw a number of conclusions about the properties required of a molecule which can form precisely self-replicated chains. He identified three necessary properties: (1) the molecules must have good alignment properties; (2) there must be a mechanism for strong permanent bonding down the chain (in order to form the copy); and (3) there must be a mechanism for cross-linkage between the original and the replica (which appeared to Penrose to be necessary to ensure stability). Although Penrose's approach was not particularly rigorous, his conclusions remain one of the few contributions to the `logic' of template-reproduction to this day. This topic will be discussed further in Chapter 7.
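As a purely illustrative aside, the bare logic of template reproduction can be rendered in a few lines of code; the chain alphabet and the pool of free units below are invented, and the mechanical questions of alignment, bonding and unbinding that Penrose was actually concerned with are abstracted away.

    # Illustrative only: the alphabet, pool and copying rule are invented.

    def template_copy(template, pool):
        """Assemble a copy of a chain against a template, one position at a
        time, consuming free units from a pool; copying fails if the pool lacks
        a required unit.  Penrose's three properties (alignment, permanent
        bonding along the new chain, cross-linkage to the template) are assumed
        rather than modelled."""
        copy = []
        for unit in template:
            if unit not in pool:
                return None               # no raw material for this position
            pool.remove(unit)             # consume a unit from the environment
            copy.append(unit)
        return ''.join(copy)

    pool = list('abab' * 10)
    print(template_copy('abba', pool))    # 'abba'
    print(template_copy('abba', []))      # None: reproduction needs raw materials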
The only other significant contribution to the subject, to my knowledge, is the formal system described by Douglas Hofstadter in his book Gödel, Escher, Bach. Hofstadter called the system `Typogenetics', and explains that he ``tried to capture some ideas of molecular genetics in a typographical system'' ([Hofstadter 79] p.504). In this system, a strand of letters encodes a sequence of operations (such as copying, cutting etc.) which are to be performed upon the strand itself. Typogenetics was later developed and analysed by Harold Morris in his PhD dissertation [Morris 88], in which he showed that it was possible to produce several types of strand which were capable of self-reproduction. Louis Varetto has also reported work with self-reproducing strands in Typogenetics [Varetto 93]. In the state in which the system is presented, each strand can only inspect itself (i.e. there is no notion of a strand competing for resources with other strands in an environment), and the evolutionary potential of the self-reproducing strand is also unclear. These are both significant deficiencies from the point of view of using the system as a model of biological evolution (Hofstadter again was more interested in the process of self-reproduction in its own right). Modifications could conceivably be made to correct these--indeed, some were suggested by Varetto (ibid.)--but I am unaware of any implementations of such modifications.
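To give a flavour of the idea, a strand can be modelled as a string which is read as a sequence of editing operations and then applied to itself. The operation set below is invented for illustration and is far simpler than Hofstadter's actual rules.

    # Toy interpreter in the spirit of Typogenetics; the operations are
    # invented and do not correspond to Hofstadter's actual instruction set.

    OPS = {
        'c': lambda s: s + s,        # append a complete copy of the strand
        'x': lambda s: s[:-1],       # cut off the last letter
        'd': lambda s: s + s[-1],    # duplicate the last letter
    }

    def express(strand):
        """Read the strand as a program and apply its operations to itself."""
        result = strand
        for letter in strand:
            if letter in OPS:
                result = OPS[letter](result)
        return result

    print(express('cx'))    # 'cxc': this particular strand does not reproduce itself
    # A self-reproducing strand is one whose expression yields two separate
    # copies of the original; Morris and Varetto exhibited such strands in the
    # full Typogenetics system.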
The reproduction of structures in Penrose's and Hofstadter's models can be seen as the result of a self-inspection process; atomic subcomponents of the structure are inspected and copied, and these copies are then joined together to create a replica of the whole structure. More elaborate schemes for reproduction by self-inspection have been described in [Laing 77] and [Ibáñez et al. 95]. The relationship between these architectures and that proposed by von Neumann will be discussed in some detail in Section 7.2.3.
The next significantly different, substantial implementation of a system supporting self-reproducing components was reported by Steen Rasmussen and colleagues in the early 1990s.3.13 They conducted a series of experiments using a general approach they named `Coreworld'3.14 (e.g. [Rasmussen et al. 90], [Rasmussen et al. 91]), inspired by an earlier computer game developed by A.K. Dewdney called `Core Wars' [Dewdney 84].3.15 In this approach, a one-dimensional address space is initialised with assembler-level instructions taken from a set of ten (or fewer). A number of execution pointers are distributed through the address space, determining which instructions get executed at any given time. Each address also has a certain `resource' concentration associated with it, and execution of instructions is also conditional upon sufficient resources being available in the local neighbourhood. Noise enters the system both through new execution pointers being created at random with low probability, and through instructions being `mutated' into randomly chosen different instructions under certain circumstances. Although this work was concerned primarily with the emergence of self-organisation and cooperative structures (described in Section 3.2.3), experiments were conducted in which the memory was initialised by inserting a single copy of a hand-written self-reproducing program and letting it reproduce. However, it was found that the self-reproducers soon started to malfunction and die out, owing to the noise in the system and to their copying themselves on top of each other (though Rasmussen and colleagues still found this a useful method by which to create a biased initial distribution of instructions).
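The general flavour of this machinery can be sketched as follows; the three-instruction set, the resource rule and all parameters below are invented for illustration and are not Rasmussen's actual choices.

    # Schematic sketch of a Coreworld-style system; instruction set, resource
    # rule and parameters are invented for illustration.
    import random

    SIZE = 64
    INSTRUCTIONS = ['COPY', 'JUMP', 'NOOP']                     # invented set
    core = [random.choice(INSTRUCTIONS) for _ in range(SIZE)]   # 1-D address space
    resources = [1.0] * SIZE            # resource concentration at each address
    pointers = [random.randrange(SIZE) for _ in range(4)]       # execution pointers
    MUTATION_RATE = 0.01

    def step():
        """Advance every execution pointer by one instruction."""
        for i, ptr in enumerate(pointers):
            if resources[ptr] >= 0.5:              # execution needs local resources
                resources[ptr] -= 0.5              # ...and consumes them
                if core[ptr] == 'COPY':            # copy this instruction forwards
                    core[(ptr + 1) % SIZE] = 'COPY'
                elif core[ptr] == 'JUMP':          # jump a fixed distance
                    ptr = (ptr + 7) % SIZE
                if random.random() < MUTATION_RATE:   # noise: random rewrite
                    core[ptr] = random.choice(INSTRUCTIONS)
            pointers[i] = (ptr + 1) % SIZE
        for a in range(SIZE):                      # resources slowly replenish
            resources[a] = min(1.0, resources[a] + 0.05)

    for _ in range(100):
        step()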
A major advance in the use of computer platforms such as Rasmussen's Coreworld to study the evolution of self-reproducing individuals was made by biologist Tom Ray with his `Tierra' system ([Ray 91], [Ray 94b]). In his work, Ray places central importance upon the notion of self-replication:
``Self-replication is critical to synthetic life because without it, the mechanisms of selection must also be pre-determined by the simulator. Such artificial selection can never be as creative as natural selection. The organisms are not free to invent their own fitness functions. Freely evolving creatures will discover means of mutual exploitation and associated implicit fitness functions that we would never think of. Simulations constrained to evolve with pre-defined genes, alleles, and fitness functions are dead-ended, not alive.'' [Ray 91] (p.372).
Ray's major development was to design an instruction set for his self-replicators which was robust to mutations and therefore evolvable, while at the same time retaining the property of computational universality.3.16 The basic idea is that a virtual operating system is provided, complete with this robust and simple machine language, together with an address space (memory) of fixed size. The robustness of the language is achieved chiefly through the use of relative or template-driven3.17 addressing in branching instructions (rather than absolute addressing), and by avoiding the use of explicit memory addresses as operands to instructions. In such an environment, the proportion of functional programs in the space of all possible programs is greatly increased, and the chances of mutating a functional program and coming up with another program that also works are much higher.
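The effect of template-driven addressing can be illustrated with a short sketch. Tierra marks templates with no-operation instructions of two kinds (nop0 and nop1) and resolves a branch by searching for the complementary pattern; the search routine below is a simplification of my own (for instance, it searches forwards only, whereas Tierra searches outwards in both directions).

    # Simplified illustration of template-driven addressing; not Tierra's
    # actual search routine.

    def complement(template):
        """Complement a template: nop0 <-> nop1."""
        return ['nop1' if t == 'nop0' else 'nop0' for t in template]

    def find_match(program, start, template):
        """Return the address just past the nearest complementary template,
        searching forwards (circularly) from 'start'; None if there is none."""
        target = complement(template)
        n, k = len(program), len(target)
        for offset in range(1, n):
            pos = (start + offset) % n
            if [program[(pos + j) % n] for j in range(k)] == target:
                return (pos + k) % n
        return None

    prog = ['copy', 'jmp', 'nop0', 'nop1',   # branch addressed by [nop0, nop1]
            'inc',  'nop1', 'nop0', 'copy']  # the complementary label sits here
    print(find_match(prog, 1, ['nop0', 'nop1']))   # -> 7, the address after the label

A mutation that corrupts one instruction of a template merely changes where (or whether) the branch lands, rather than leaving a reference to a meaningless absolute address, which is one reason why a larger proportion of mutants remain functional.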
An evolutionary run commences with the introduction of an ancestor program into the otherwise empty memory. The ancestor is a hand-written self-replicator which produces another copy of itself in the computer's memory when it is run. At each iteration of the system, each program in the computer's memory is allowed to execute a certain number of instructions. By only running a limited number of instructions from each program at each time step, the system avoids the halting problem,3.18 which is a potential difficulty for other kinds of program evolution system. A small element of stochastic behaviour is associated with the execution of the machine instructions: for example, an ADD instruction which usually adds one to its operand may occasionally add zero or two instead, or a COPY instruction may sometimes mutate a byte of the data it is copying. The programs are also subject to random point mutations at a given low rate. In either of these ways, variations of the ancestor program begin to appear as a run proceeds. If a variation retains the ability to produce a copy of itself, then it too may be retained in the population of programs over a number of generations. As the available memory begins to fill, a `reaper' operation is performed to kill off a number of the programs. Programs which perform operations which cause their flag to be set3.19 are killed off more quickly than others (by being advanced up the `reaper queue'), but otherwise the order in which programs are killed off is largely determined by their age.
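The overall control flow of such a run can be pictured with the following schematic sketch; the class, parameter values and error rule are invented for illustration and do not correspond to Tierra's actual implementation.

    # Schematic illustration only; names, numbers and rules are not Tierra's.
    import random

    SLICE = 10               # instructions each program may execute per iteration
    MUTATION_RATE = 0.001    # chance that an executed instruction misbehaves
    MEMORY_LIMIT = 1000      # total instructions the memory ('soup') can hold

    class Organism:
        def __init__(self, code):
            self.code = list(code)
            self.age = 0
            self.errors = 0      # flagged faults advance it up the reaper queue

        def execute_slice(self):
            """Run a fixed slice of instructions, each of which has a small
            chance of misbehaving (standing in for flawed ADDs, COPYs, etc.)."""
            for _ in range(SLICE):
                if random.random() < MUTATION_RATE:
                    self.errors += 1

    def reap(soup):
        """Kill programs until the soup fits in memory: the most error-prone go
        first, and otherwise roughly the oldest."""
        soup.sort(key=lambda o: (o.errors, o.age), reverse=True)
        while soup and sum(len(o.code) for o in soup) > MEMORY_LIMIT:
            soup.pop(0)

    soup = [Organism(['copy'] * 80) for _ in range(20)]
    for _ in range(50):
        for org in soup:
            org.execute_slice()
            org.age += 1
        reap(soup)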
A number of interesting results have been obtained from such evolutionary runs, for example the appearance of ``parasites''--short pieces of code which run another program's copying procedure in order to copy themselves. Hyper-parasites (parasites of parasites) have also been observed, along with a number of other interesting ecological phenomena [Ray 91].
It may be true to say that most of the interesting behaviour that has been seen to evolve has done so because facilities for these behaviours were engineered into the original language specification. For example, the fact that parasites emerge is a little less surprising when the mechanisms of template-driven branching are considered. I will return to this issue in Chapter 7. However, what these studies have demonstrated is that it is possible to build a robust operating system in which non-trivial self-replicating computer code can evolve.
The original goal of Tierra was to ``parallel the second major event in the history of life, the origin of diversity'' [Ray 91] (p.373). The event being referred to is known as the Cambrian explosion, and marks the first and sudden appearance in the fossil record, some 550 million years ago, of a great diversity of macroscopic multicellular organisms. Ray explains that ``rather than attempting to create prebiotic conditions from which life may emerge, this approach involves engineering over the early history of life to design complex evolvable organisms, and then attempting to create the conditions that will set off a spontaneous evolutionary process of increasing diversity and complexity of organisms'' (ibid.).
Ray admits that his early experiments with Tierra did not actually achieve this goal, and suggests that the system is better viewed as ``an artificial world which may roughly parallel the RNA world of self-replicated molecules (still falling far short of the Cambrian explosion)'' (ibid.).
In analysing the broad patterns that emerge from evolutionary runs on Tierra, Ray notes that the observed evolutionary innovations fall into two broad categories: `ecological solutions' and `optimisations' [Ray 97]. The former include adaptations involving other programs in the environment (e.g. parasitism, immunity, etc.), and the latter are improvements within an individual organism to make the reproduction algorithm more efficient. Some experiments in which individual programs are allowed to create parallel processes have been reported ([Thearling & Ray 94], [Thearling 94], [Thearling & Ray 96]), and in these cases optimisations include the use of parallelisation to increase the speed of reproduction (but only when the system was inoculated with a two-process parallel ancestor). Parallelisation is achieved by each process owning its own instruction pointer, but these all refer to a single copy of the program's code. Speed-up is achieved by each parallel process copying a different section of this code to create an offspring.
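The division of labour described here can be sketched as follows; threads stand in for Tierra's parallel instruction pointers, and all details are illustrative rather than taken from the actual system. Each process shares the single copy of the parent's code but copies a different section of it into the offspring.

    # Illustrative sketch of parallel copying; not Tierra's implementation.
    from threading import Thread

    def copy_section(parent, offspring, start, end):
        """One process copies its allotted slice of the parent's code."""
        for i in range(start, end):
            offspring[i] = parent[i]

    parent = ['copy', 'inc', 'jmp', 'nop0'] * 20     # an 80-instruction genome
    offspring = [None] * len(parent)

    n_procs = 4                                      # four parallel processes
    chunk = len(parent) // n_procs
    threads = [Thread(target=copy_section,
                      args=(parent, offspring, p * chunk, (p + 1) * chunk))
               for p in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert offspring == parent    # the whole genome has been copied in parallel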
More recently, Ray and his colleagues have been striving towards evolving differentiated parallel programs, where some of the processes may be performing other tasks (e.g. sensory), rather than all being dedicated to reproduction. These attempts have so far been largely unsuccessful, although the most recent milestone has been to create an environment in which a hand-written differentiated ancestor (comprising 10 parallel processes, two of which are devoted to reproduction, and the remaining eight to sensing the environment) is at least maintained over prolonged periods of evolution, rather than being degraded into a purely reproductive organism [Ray & Hart 98]. However, this partial success has only been achieved by adding some high-level operations to the system, such as insertion, deletion and crossover operators, and new forms of mutation. Ray writes that the inclusion of these operators represents ``a change in the philosophy towards this kind of genetic operation. These operations are being imposed by the Tierra operating system, and are not under the control of the creatures themselves. The reason for the shift is that the primary goal and focus of the work is now the attainment of complexity increase. This is a very big, and probably very difficult objective. For this reason I am willing to try anything that might aid in achieving the goal. Among the rewards of achieving the goal, should be creatures that do many more things for themselves, which should make the imposed genetic operators seem to be a fairly trivial sin.''3.20
Comparative studies using different instruction sets have also been reported [Ray 94a]. The results suggest that the evolvability of the system is very sensitive to the underlying genetic system (i.e. small changes to the instruction set lead to very different results).
A number of similar systems have been produced by other researchers. These include `Avida', developed by Chris Adami and Titus Brown [Adami & Brown 94], and `Computer Zoo', written by Jakob Skipper [Skipper 92]. Programs in both of these systems live in a two-dimensional environment. This notion was introduced mainly to provide a form of genetic isolation (to promote heterogeneity), and also to facilitate efficient implementation on parallel hardware. Although the details of these systems differ somewhat, the general mode of operation is the same. Some attempts have also been made to extend the representation of individual programs into two dimensions (i.e. each instruction of a program is physically located at a unique position on a 2D grid), but these have generally proved too brittle to support prolonged evolution (e.g. [Davidge 92], [Maley 93], [de Dinechin 97]).
John Koza has proposed a system of self-replicating programs using a slightly different approach [Koza 94]. Koza's programs are boolean trees whose nodes may be standard logical functions (specifically, AND, OR and NOT) or a number of extra functions which not only produce a boolean result but also have side effects which act upon the program itself or upon other programs in the computer's memory. For example, there is an ADD function which will search the rest of the memory for a program matching a given template, and, if one is found, will have the side effect of substituting this new program in the position where the ADD instruction resided in the original program.
One result of this approach of using boolean functions with side effects is that a program may perform a boolean calculation as well as having the side effect of producing another copy of itself somewhere else in memory. Koza encouraged his programs to perform specific boolean calculations by imposing a fitness function on the system--a program has a good chance of proceeding to the next generation only if it is able to self-replicate and performs well on the target boolean function.
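A minimal sketch may help to convey the flavour of this arrangement; the node set and the self-copying rule below are simplifications of my own (in particular, Koza's actual ADD function locates a template-matching program elsewhere in memory and splices it into the tree, rather than simply depositing a copy).

    # Illustrative sketch only: the SELFCOPY node is invented and stands in
    # for Koza's side-effecting functions.

    memory = []            # shared store into which programs can place copies

    def evaluate(node, inputs, self_tree):
        """Evaluate a boolean expression tree; some node types have side effects."""
        op, children = node[0], node[1:]
        if op == 'VAR':
            return inputs[children[0]]
        if op == 'NOT':
            return not evaluate(children[0], inputs, self_tree)
        if op == 'AND':
            return all(evaluate(c, inputs, self_tree) for c in children)
        if op == 'OR':
            return any(evaluate(c, inputs, self_tree) for c in children)
        if op == 'SELFCOPY':                  # side effect: deposit the whole
            memory.append(self_tree)          # program into the shared store
            return evaluate(children[0], inputs, self_tree)
        raise ValueError(op)

    # A program that computes (x0 AND NOT x1) and also copies itself when run:
    tree = ('AND', ('VAR', 0), ('SELFCOPY', ('NOT', ('VAR', 1))))
    print(evaluate(tree, {0: True, 1: False}, tree))    # True
    print(len(memory))                                  # 1: a copy was deposited

Under a fitness scheme of the kind Koza describes, a tree of this sort would be rewarded both for matching the target boolean function and for succeeding in replicating itself.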
In this respect, Koza's system is very similar to Avida, where programs can gain more CPU-time by successfully completing user-specified tasks. This is an important sense in which Avida and Koza's system differ from Tierra and the other related platforms: they have an externally specified function (or set of functions) which the programs are encouraged to perform in addition to being able to reproduce. There are therefore effectively two components to fitness, one of which--performance on the external task--is completely unrelated to a program's ability to self-replicate.
Although this technique could conceivably be used to evolve programs which perform useful tasks (though it has not been demonstrated that it can scale to larger problems), it restricts the capacity of the system for open-ended evolution. By concentrating on applying selection pressure through the abiotic environment (i.e. through the externally defined functions), the programs are all evolving towards solving the same static problem rather than facing an indefinite variety of environments. Although biological organisms are themselves constrained by their abiotic environment, it is widely agreed (see Chapter 2) that the selection pressure for the most impressive forms of open-ended evolution (such as evolutionary arms races, symbiogenesis, and sexual selection) comes from the ecological relationships between organisms (i.e. the biotic environment). It is likely that only those systems in which organisms are able to interact with each other, and in so doing to directly affect each other's chances of survival, will display truly open-ended evolution. This issue will be discussed further in Chapter 7.