Title: The Exchange - Genetically Evolved Market Agents
Author: David J. Stein, Esq.
Version: 1.0
Written: Fall, 2004


Abstract: This application comprises a market simulation game populated by artificially intelligent trading agents. Each agent is controlled by a virtual machine that executes a simple program based on a customized machine language. The programs may be randomly generated, or may be hand-programmed by the user via an embedded assembler. Alternatively, the program can generate skilled agents automatically via a competitive genetic algorithm, which generates a number of agent programs, competes them for a number of simulations, chooses the most successful agent programs, and randomly crosses over their programs to create new agent programs. The application may also run an instance of the market-trading game, populated by agents running specific programs. As the game progresses, the application provides a realtime graphical display of market conditions and each player's resources, as well as the current events affecting the market. The user may also directly participate as a player in the game.

Installation Instructions: Nothing unusual - simply run the enclosed executable. Note that the application depends on some resources, including XML schema definition (.XSD) files, which it expects to find in the "4 Resources" folder stored with the executable.

Operation Instructions:

Overview:

            This application comprises a market simulation called The Exchange that serves as the environment for artificial intelligence agents created through a genetic programming model.

Market Simulation:

            Overview: The Exchange is a competitive game that can be played against any number of artificial-intelligence-based competitors. Each player is given an amount of credits that may be used to purchase quantities of any of the nine goods on the market at their current prices, which fluctuate in response to various conditions. The goods can also be sold back to the market, and each player may manufacture a small amount of goods (thereby increasing the total stock of goods in the market over the course of the simulation.) The simulation runs continuously for ten minutes (ten “phases”), and every second new prices are computed and player rankings are calculated, based on total player value (price x quantity held for each good + total credits.) The players compete to have the most value at the end of the game.

            Goods: Each of the nine goods traded on the Exchange has specific qualities:

·         Solar power is a commodity good that usually trades at 20-30 credits but at higher rates if the market supply drops too low. Solar power is needed for all manufacturing capabilities (see below), but can be collected for free.

·         Ore is a commodity good that usually trades at 25-35 credits but at higher rates if the market supply drops too low. Ore is used to manufacture galactite – one unit of ore must be consumed (in addition to one unit of power) for every unit of galactite created.

·         Sludge is a commodity good that usually trades at 30-40 credits but at higher rates if the market supply drops too low. Sludge is used to manufacture hydradine – one unit of sludge must be consumed (in addition to one unit of power) for every unit of hydradine created.

·         Galactite is a refined good made from ore. Its price is cyclic, with dynamic amplitude (between 50 and 180) and frequency.

·         Hydradine is a refined good made from sludge. Its price is random with momentum, fluctuating between 40 and 160 credits, and it tends to vary in broad but somewhat unpredictable trends.

·         Spice is a good with a cyclic price, having dynamic amplitude (between 20 and 140 credits) and frequency.

·         Shells are goods with a price that fluctuates randomly but with momentum. The range tends to vary more quickly than for hydradine, and is limited to the range of 20 to 140 credits.

·         Crystals are a bubble-market good. The price typically grows at a steady rate, but is prone to unpredictable crashes (more likely when the price is very high.)

·         Rum is a good with a random-walk price – it changes randomly, with no momentum, within the range of 25-160 credits.

Production: The market starts with a small quantity of each good, and each second, additional goods are manufactured. As noted above, power can always be collected, but the market can’t manufacture other goods unless the necessary resources are available (power, ore, and/or sludge.) If such resources are not available, those goods simply won’t be available, thus leading to shortages.

Players also produce goods. Once the player selects a good to produce, the player will steadily produce and accumulate units of that good.  If the player selects a good that requires other resources (like power), those other resources will automatically be consumed from the player’s holdings – if the resources aren’t available, then the player produces nothing.

Events: Random events occur that modify the market environment. Prices may temporarily double, the market may gain or lose a large stock of goods, etc. News of these events will be displayed at the bottom of the game screen. Events typically last for one phase (one minute) before ending.

Usage: Of course, a human player can play through a full run of the market simulation, either alone or against computer-controlled Exchange agents. First, select “Play Simulation” from the main menu dialog. When the main Exchange simulation window appears, press the Start button at the upper-left-hand corner. A new game menu appears, where you can enter your name and select computer opponents (either created from scratch or previously saved to disk.) When you press OK, the game will begin. The chart shows the prices of goods as they change over time, while the bar chart at the bottom show quantities held by the store and owned by you. Use the controls on the left to buy or sell goods (in 1- or 10-unit increments) or to designate one good for manufacturing. The game will run for ten phases, as shown by the timer at bottom-right.

The game can be paused and resumed by pressing the Start button. Statistics on a particular player can be viewed by clicking on the player in the rankings list at bottom left.

Exchange Agents:

            Overview: The artificial-intelligence competitors in this market simulation function by executing a program. This program is comprised of RISC-like machine instructions, and is executed once per second in a virtual machine. The program may access market information – prices, store quantities, the holdings of the agent’s player, etc. – and its goal is to purchase, sell, or produce specific goods each second that (hopefully) maximize the player’s value over the long term. For example, a very simple agent might buy a particular good when the price is below 40 and sell the good when the price is above 60.

            Virtual Machine: The virtual machine consists of 256 bytes of read-only code memory and four registers (R0 through R3) that hold signed 16-bit integers. It begins each program execution by initializing the four registers to 0, starting program execution at address 0, and running until it hits an END instruction, reaches the end of the memory space, or executes 2,048 instructions (at which point it terminates to break infinite loops.) Runtime errors are not generated – the machine simply ignores instructions that can't be executed (divisions by zero, jumps pointing outside the memory space, etc.)

            Instruction Set: The machine instruction set consists of variable-size eight-bit opcodes. Most are one byte, specifying both an instruction and a register (CLR R0), but some are followed by an eight-bit constant parameter (SET R0, 100). Two compromises have been adopted in the interest of RISC design. First, two-register math and compare operations cannot specify the same register for both operands (ADD R1, R1). Second, load operations are always performed on register R0.

Input instructions are available for loading market data, including prices, price trends (calculated as the current price minus the price from five seconds ago), and store quantities.  Other instructions provide as input the player's current holdings for any good, the player's current credits, the number of players in the game, the game phase, or the number of the current event. The machine also provides a random-number generator, and can provide random values between 0 and 255. Output instructions are available for setting the player’s BUY, SELL, and PRODUCTION values, which determine the good that the agent purchases (must be specified every turn), sells (must be specified every turn), or produces (this choice remains in place until the program makes another change.) A complete reference guide for the instruction set is provided at the end of this document.

Exchange Agent Designer (Code Assembler):

            An agent created from scratch is completely ineffective. Its memory space is filled with random data, which just shuffles numbers around without meaning.  The usual result is that it does nothing in the course of the game, typically ending with a value of about 3,000 credits (from collecting power for 10 rounds, which is the default production good.)

Two mechanisms exist for creating better agents. The first is by writing an agent program by hand, via the Exchange Agent Designer – an assembly compiler. This compiler translates instructions into opcodes – “SET R2, -2” is compiled into binary code “0x07 0xFE”. A full set of valid operations is provided at the end of this document.

The compiler also handles comments. Any line starting with the ‘ character will be considered a programmer comment and will be ignored. Similarly, a comment can be appended to any line of assembly code; the instruction will still be generated, but the ‘ character and everything following it are ignored.

The compiler also handles labels, which greatly simplify the process of moving around the code. A label can be any combination of non-whitespace characters, but must end with a colon. This marks a point in the code that may be jumped to later, e.g., by a JMP instruction. This label may either be prepended to a line of assembly code or may be located in a blank line preceding it.

The following is an example of a valid assembly program:

' Spice and Shell Trading Agent

SET R1, -1        ' This register holds the number of the good to buy

SET R2, -1        ' This register holds the number of the good to sell

LD R0, PRICE5     ' Load the price of spice

CMPLE R0, 40      ' See if the prices is low

SET R1, 5         ' If so, buy it

CMPGE R0, 100     ' See if the prices is high

SET R2, 5         ' If so, sell it

LD R0, PRICE6     ' Load the price of shells

CMPLE R0, 40      ' See if the prices is low

SET R1, 6         ' If so, buy it

CMPGE R0, 100     ' See if the prices is high

SET R2, 6         ' If so, sell it

ST BUY, R1        ' Set purchase for the appropriate good

ST SELL, R2       ' Set sale for the appropriate good

END               ' Exit program

Once the code is written (in the left textbox), If any errors are encountered, the compiler displays a list of error messages. If not, the compiler shows the output code, and the compiled program can then be saved to disk. Saved agents can later be loaded in the simulation game or in the Exchange agent arena.

Agent Arena (Genetic Programming):

            Overview: The second method of creating new agents is by relying on the included genetic programming algorithm for automatically designing new agents. This algorithm takes a particular set of agents, runs sets of them competitively through a small number of market simulations, assesses the fitness of each agent, and genetically reproduces them to create a new generation of agents that are better competitors.

            Fitness: The first step in the genetic evolution algorithm is to assign fitnesses to the population of agents. This is done by running each of them through the simulator against other agents chosen as competitors. The algorithm randomly chooses ten agents in the current agent population, runs a simulation, and records both the rank and the total value produced by each agent. The algorithm then picks ten more agents and runs another simulation. This continues until each agent has played 100 games.

At the end of this process, the algorithm has recorded an average value and average rank for each agent over 100 games. (This high number of repetitions is necessary because conditions for any particular game might produce an unfairly inflated or unfairly reduced set of statistics; averaging over 100 games ensures that the values are representative of typical performance.)

The algorithm then uses these statistics to assess fitness. First, it finds the maximum average value produced by any agent, and the maximum average rank achieved by any agent. It then assesses fitness for any agent as:

Fitness = (1.0 – (agent avg rank / max avg rank) + (agent avg value / max avg rank)) * ½

Thus, both the average rank and the average value are taken into account (equally) in measuring the fitness of the agent. This avoids assigning high ratings to two undesirable types of agents: (a) those that sabotage the market economy, (thereby producing little value but repeatedly finishing first; and (b) those that produce a great deal of value, but simultaneously facilitate competitors to produce even greater value. A good competitor should both produce great value and routinely beat its competitors.

            Genetic Evolution: Once fitnesses for the agents are assessed, the algorithm produces a new generation of agents via a typical crossover breeding scenario. Many of the child agents are produced by randomly selecting two parents (using fitnesses as selection frequencies), choosing a crossover point n, splicing the first n bytes from the first parent into the child, and then splicing the last (256 – n) bytes fro the second parent into the child.

            However, not all agents are chosen through this crossover process. The genetic algorithm also extends the genetics model by adding three other kinds of new agents to this population:

·         Kept Parents: The top-scoring parents are carried over unchanged into the next generation. This ensures that the best solutions found so far aren’t lost in an unlucky breeding round where they aren’t chosen.

·         Budding: Some parents are also chosen for “budding” – a fission-like process where a child agent inherits some of the genetic contents from one parent agent, and has completely new contents for the rest of its genome. This is carried out by randomly choosing only one parent (again using fitness as a selection frequency), randomly choosing a length from the parent agent, and splicing the length into the child agent.

This choice helps maintain diversity in the gene population. One common problem with genetic algorithms is the loss of genetic diversity – after a number of generations, the population may come to resemble a series of clones, with very little diversity. Crossing over thus produces no new agents, even if the genetic background is not the best one available (but happens to have been the best one in the first-generation agent set.) By splicing successful parent genes together with new, random genes, genetic diversity is added into the system to counterbalance the flushing of bad genes through the selection process.

·         Immigration: A small number of brand-new agents, with brand-new genetic content, are created and added to the child agent set. This is the binary equivalent of migratory addition of new agents to a biological population, and this also helps to maintain genetic diversity.

Finally, the genetic algorithm introduces a number of mutations to the population by selecting random agents and changing one byte to a random value. It is noteworthy that kept parents are not mutated – they are carried over from the parent generation – so mutations only occur to agents created by budding, immigration, or crossover.

By default, the algorithm uses a population size of 100, a kept parents rate of 10%, a budding rate of 10%, an immigration rate of 10%, and a mutation rate of 5%. However, the Agent Arena allows these parameters to be modified.

Usage: To run the genetic evolution algorithm, first select “Agent Arena” from the main menu. An initial list of agents (100 by default) is created, but these may be modified by adding new agents (either randomly generated or previously saved to disk.) Specific agents can also be deleted from the list or saved to disk, and the entire set of agents can be saved or loaded as a unit (these are stored as .exs files.) The genetic algorithm parameters can be changed using the control boxes at lower right. Finally, the contents of any agent can be viewed by double-clicking it in the agent list.

Once an agent set has been prepared, the training process can be started by clicking the Train button. This will start the competitive process (on a separate thread.) A progress indicator displays the number of “steps” (generations) that have been completed, and at the end of each generation, the current set of entities, ranked by fitness, will be displayed. This process continues until the Stop button is clicked, which ends training (at the end of the current generation.)

Instruction Set Reference:

Following is a complete description of the instruction set. Register-specific operations store the register in the opcode (e.g., CLR, the operation for setting a register to zero, is opcode 0x01 for R0, 0x02 for R1, etc.) Two-register operations are stored in the order of (Destination register), (Source register), and the registers are specified in the following manner:

(Register #1) * 3 + (Register #2 > Register #1 ? Register #2 + 1 : Register #2).

Some examples for the ADD function, which starts at opcode 0x4A, follow:

                        ADD R0, R1 (adds the value of R1 into R0)             0x4A

                        ADD R2, R1 (adds the value of R1 into R2)             0x51

                        ADD R2, R3 (adds the value of R3 into R2)             0x52

(notice that this skips over ADD R2, R2, which is impossible with this instruction set.)

Finally, it is important to note that opcode parameters are either an unsigned byte (values 0 through 255) or a signed byte (-127 through 128), depending on which is more useful for this operand. ADD Rx, CONST always uses the parameter as an unsigned byte (presuming that adding a negative is better handled by subtraction), while JMP CONST always uses the parameter as a signed byte (to permit backwards as well as forward jumps.)

 

Opcode                       Operation                               Extended?

0x00                            NOOP                                     No

No Operation instruction.

0x01-0x04                   CLR Rx                                  No

Sets a register to zero.

0x05-0x08                   SET Rx, CONST                    Signed

Sets a register to a signed constant.

0x09-0x14                   SET Rx, Ry                             No

Sets a register to the value of another register.

0x15-0x20                   SWAP Rx, Ry                                    No

Swaps the contents of two registers.

0x21-0x29                   LD R0, PRICEx                     No

Loads the current price for the specified good into R0.

0x2A-0x32                  LD R0, TRENDx                   No

Loads the price trend for the specified good into R0.

0x33-0x3B                  LD R0, QUANTITYx                        No

Loads the store quantity for the specified good into R0.

0x3C-0x44                  LD R0, HOLDINGSx                        No

Load the player's current holdings for the specified good into R0.

0x45                            LD R0, CREDITS                  No

Load the player's credit total into R0.

0x46                            LD R0, EVENT                      No

Load the numeric identifier for the current event into R0.

0x47                            LD R0, PHASE                      No

Load the current game phase (0 through 9) into R0.

0x48                            LD R0, PLAYERCOUNT     No

Load the player's credit total into R0.


0x49                            LD R0, RANDOM                 No

Looks at the value of R0, loads a random number from 0 through R0 - 1, and stores it in R0.

0x4A-0x4D                 ADD Rx, CONST                  Unsigned

Adds an unsigned constant to a register.

0x4E-0x59                  ADD Rx, Ry                           No

Adds the value of register Ry into Rx.

0x5A-0x5D                 SUB Rx, CONST                   Unsigned

Subtracts an unsigned constant from a register.

0x5E-0x69                  SUB Rx, Ry                            No

Subtracts the value of register Ry from Rx.

0x6A-0x6D                 MLT Rx, CONST                   Unsigned

Multiplies a register by an unsigned constant.

0x6E-0x79                  MLT Rx, Ry                           No

Multiplies the value of register Ry by Rx.

0x7A-0x7D                 DIV Rx, CONST                    Unsigned

Divides a register by an unsigned constant.

0x7E-0x89                  DIV Rx, Ry                            No

Divides the value of register Ry by Rx.

0x8A                           JMP CONST                           Signed

Unconditional jump. Forward jumps start with the opcode following the constant (so JMP 0 is equivalent to NOOP); backward jumps start with the preceding address (JMP -1 steps to the address preceding the JMP opcode.) Note that this offset is measured in addresses, not opcodes, so extended opcodes should be counted as two in the jump offset. Also, note that this instruction accepts a signed offset, so it can only jump ahead 128 addresses or back 127 addresses.

0x8B-0x8E                  JZ Rx, CONST                       Signed

Conditional jump - jumps if the specified register is zero. Note comments about JMP offset calculation.

0x8F-0x92                   JNZ Rx, CONST                    Signed

Conditional jump - jumps if the specified register is not zero. Note comments about JMP offset calculation.

0x93-0x96                   CMPE Rx, CONST                Unsigned

Conditional execution - execute the following instruction only if the register is equal to the specified value.

0x97-0xA2                  CMPE Rx, Ry                         No

Conditional execution - execute the following instruction only if register Rx is equal to register Ry.

0xA3-0xA6                 CMPNE Rx, CONST             Unsigned

Conditional execution - execute the following instruction only if the register is not equal to the specified value.

0xA7-0xB2                 CMPNE Rx, Ry                      No

Conditional execution - execute the following instruction only if register Rx is not equal to register Ry.

0xB3-0xB6                 CMPG Rx, CONST                Unsigned

Conditional execution - execute the following instruction only if the register is greater than the specified value.


0xB7-0xC2                 CMPG Rx, Ry                                    No

Conditional execution - execute the following instruction only if register Rx is greater than register Ry.

0xC3-0xC6                 CMPGE Rx, CONST             Unsigned

Conditional execution - execute the following instruction only if the register is greater than or equals the specified value.

0xC7-0xD2                 CMPGE Rx, Ry                      No

Conditional execution - execute the following instruction only if register Rx is greater than or equals register Ry.

0xD3-0xD6                 CMPL Rx, CONST                Unsigned

Conditional execution - execute the following instruction only if the register is less than the specified value.

0xD7-0xE2                 CMPL Rx, Ry                         No

Conditional execution - execute the following instruction only if register Rx is less than register Ry.

0xE3-0xE6                  CMPLE Rx, CONST              Unsigned

Conditional execution - execute the following instruction only if the register is less than or equals the specified value.

0xE7-0xF2                  CMPLE Rx, Ry                      No

Conditional execution - execute the following instruction only if register Rx is less than or equals register Ry.

0xF3-0xF6                  ST BUY, Rx                           No

Store Rx as the purchased good for this round. -1 = nothing.

0xF7-0xFA                 ST SELL, Rx                          No

Store Rx as the sold good for this round. -1 = nothing.

0xFB-0xFE                 ST PRODUCE, Rx                 No

Store Rx as the produced good (stays until changed.)

0xFF                            END                                        No

Terminates program execution.

Comments:

A brief observation regarding opcode set design: Some very useful improvements could be made to the instruction set that would more closely tailor the Exchange programs to the problem at hand. The SWAP instructions are essentially pointless, and could have been replaced by operations with more value. The opcode set also lacks the ability to trade more than one unit of any good – given the speed with which prices change, trading at most two units per second is probably a serious disadvantage. One would hope that the machine could compensate for this drawback by its ability to analyze every piece of information every second. But at least the machines would not be on uneven footing compared to the human player.

A brief observation about fixed-length program sets: Due to the presence of the END instruction, virtually no agent fully utilizes the whole program space – many use half, or even less. This means that large portions of the genes of these agents are never executed and have no bearing on the fitness of the agent. Yet, these junk sequences basically ride on the coattails of the functional portion of the agent, and are spliced into child agents with great frequency. This leads to many crossover child entities that should be quite fit, but actually inherit random/junk genetic data and simply flail. This is very inefficient. Biological organisms have the same problem: a very large percentage of the genetic contents of most eukaryotes is noncoding. (A very small number of noncoding regions have known functions, like regulating expression of coding regions. Most have no known function. But it is apparently not “junk DNA,” as every attempt to splice junk DNA out of organisms uniformly results in dead/failed organisms.)

One broad observation was the relative difficulty of evolving a competitive algorithm by this technique. A 2.6GHz Pentium 4 machine was used as a dedicated, continuous agent arena for six days, producing and testing 12,000 generations of 100 agents (thereby running a combined 12,000,000 simulations.) The best result after this dedicated training was an agent capable of averaging about 18,000 credits of value per game (whereas human players can average 30,000+ with little experience.) Given the breadth and flexibility of the instruction set, a long training process should have produced a much savvier agent.

The likely cause of the difficulty is the delicate nature of computer code – its great sensitivity to small changes. An evolutionary model is great for solutions that can be gradually evolved, with increasing fitness, until a very good or best solution is realized. The traveling salesman problem is a good example of this process: incrementally better routes produce incrementally better fitnesses. A solution that is 90% as good as the best solution will be assigned a fitness approximately 90% of the fitness of the best solution.

Software is very different in this regard. Even the very dumb shell and spice trading algorithm is comprised of 25 bytes – and almost all of those opcodes must have a precise value for the algorithm to function. Changing any opcode – CMPGE to CMPLE, or LD PRICE1 to JMP – will have a disastrous impact on the logical operation of the code. Those bytes representing parameters can be slightly changed to produce a better result: CMP R0, 90 (when R0 holds the price of a good) might produce a better agent than CMP R0, 85. But surely CMP R0, 220 and CMP R0, 15 will produce very different results. Thus, almost any single change to this algorithm will not simply produce a slightly better or slightly worse fitness – it will produce a completely different fitness value, due to its completely different logical operation.

Indeed, this observation closely mirrors an ongoing debate within the field of (biological) evolution: How is it possible for the incremental, random mutation process of evolution to produce very complex biological structures? Hundreds of genes contribute in an essential capacity to the development of the eye – if any one gene does not properly function, the whole eye becomes a useless structure. Evolution dictates that this sophisticated organ developed by having virtually all of these genes arise and begin functioning together within a short time span; otherwise, natural selection would have pushed these useless genes out of the gene pool. Some have analogized the creation of sophisticated structures by stepwise evolution to “expecting a Boeing 747 to self-assemble out of the machine parts in a junkyard.”

The same predicament applies here. Despite the very limited machine of 256 bytes, the number of possible opcode sets is immense: 256256 possible programs, approximately 10616 combinations. Only a very small number of those are likely to be good agents. The challenge, then, lies in finding even one of those acceptable programs in such an infinitely vast search space. Evolution is unlikely to help – changing even a single byte in any such program will completely destroy its functional value. It becomes a matter of luck: one must stumble upon the proper solution via chance, and given the vast number of possibilities, this is very unlikely to occur.

More research is needed in this field to establish a better mechanism for evolutionary programming than this stab-in-the-dark approach. It is likely that Dr. John Koza and the founders of the genetic programming movement have advanced far past this rudimentary model of evolutionary programming, and the author of this assignment will continue to explore the state of this field.

Application History: This application was written for an artificial intelligence course taught by Dr. Toshinori Munakata in Fall, 2004. This application is illustrative for many purposes: an experiment with genetic programming, the design of an opcode set, a virtual machine, a two-pass compiler, a market simulation, etc. It may also be useful for a study in object-oriented program design; on which it relies very heavily; and even as an introduction to applied XML.

Questions/Comments: Please contact David J. Stein, Esq. via email at djs10@po.cwru.edu.

Terms and Conditions of Use: Please see the enclosed "License.html" file for terms and conditions of use of this software package.