Author Topic: The Minkowski's challenge
Barbarian
Inactive Member


Message 54 of 120 (354666)
10-06-2006 3:31 AM
Reply to: Message 47 by Albert Godwin
10-05-2006 12:58 PM


Hi, Albert -
quote:
i hope that you will try to work on the subject after that without much questions, because i will be busy those days.
I think it bears repeating that I have not yet accepted the challenge. I might try it anyway, for the reasons stated above, but it might still turn out to be asking for the impossible (and irrelevant).
Obviously, I have to ask questions until everything is clear. Sorry, that's the way it works. We have no deadline at all, so if you cannot answer my questions during the next month, that is not a problem for me.
That being said, there is a good chance that after this round of questions the challenge will be clear enough.

This message is a reply to:
 Message 47 by Albert Godwin, posted 10-05-2006 12:58 PM Albert Godwin has not replied

  
Barbarian
Inactive Member


Message 62 of 120 (355321)
10-09-2006 2:54 AM
Reply to: Message 55 by Albert Godwin
10-08-2006 8:26 AM


Hi, Albert -
clarification:
quote:
> There is a question about whether the decryptor code must be correctly executable in all generations. I would, just to keep it realistic, propose that it does not need to be.
IF it is not correctly executable then the file will not work. Sorry but all the offspring from the first file to the goal must be executable.
The 'files' would be 'executable' all right; more than that, they would be required to produce an exact copy of themselves as a result of being executed (they would also be allowed to produce multiple copies, totally unexecutable code, etc., as long as they also write at least one exact copy of themselves). What I was referring to here was the requirement that the decryption logic work from its inception, in all intermediate generations, not just in the final one(s). My reading is that the intermediate generations don't matter, and only the final result needs to have the mutually dependent, and hence irreducibly complex, structure of decryption + encrypted code.

This message is a reply to:
 Message 55 by Albert Godwin, posted 10-08-2006 8:26 AM Albert Godwin has not replied

  
Barbarian
Inactive Member


Message 63 of 120 (355322)
10-09-2006 3:50 AM
Reply to: Message 56 by Albert Godwin
10-08-2006 8:36 AM


quote:
> We are approaching this stuff from very different positions. The challenge is not yet entirely clear, e.g. see the discussion about encryption above. Depending on the clarifications, it could turn out to ask for the impossible or for the impractical.
So far, do you accept the challenge or think that it is fair enough?
See my previous post for a clarification request; apart from that, pretty much the only thing I still don't have a handle on is the required volume of the simulation. I have a few possible selector sequences which would lead to the irreducibly complex setup we are talking about here, but the system is complex enough to surprise me even before it is ready to run. Apart from the uncertainty regarding the necessary time, another nice problem is that in such a reverse-engineered situation the winner of every 'geological period' must be the one I intended it to be, yet it can always happen that the artificial world discovers a better solution to the constraints of the selector, and this better-adapted program completely displaces the one I was trying to evolve as an intermediate step.
quote:
But to me, it is an excellent method to practically show the programmers how evolution is very unlikely to happen.
Many other posters have already pointed out to you that this challenge has little to do with biological evolution; in particular, the feasibility of one process is all but unrelated to that of the other. I understand that you think irreducibly complex structures cannot evolve at all, but this particular sort of structure might be unevolvable for reasons entirely unrelated to the infeasibility of biological evolution. It could happen that the only selector for which encryption + decryption is the best solution is one similar to the WEASEL selector (measuring the distance of certain dead code from a prepared decryption + encrypted code structure - this would be cheating, and of course I won't use it), while all other selectors understandable by humans admit better solutions than the encryption + decryption one. It could happen that the scaffolding needed to evolve this structure is so huge that the program won't finish evolving and then dismantling it in our lifetime. Etc. etc.
This is why I try to handle this as a programming challenge. The most severe constraint I see right now is that it has to happen within a reasonable time. I do not intend to, nor can I easily, enlist anything beyond a machine with a 2400 MHz Intel processor, and the ratio between what can be done on such a machine and what biological evolution had at hand to work with simply defies imagination.

This message is a reply to:
 Message 56 by Albert Godwin, posted 10-08-2006 8:36 AM Albert Godwin has not replied

  
Barbarian
Inactive Member


Message 66 of 120 (355540)
10-10-2006 8:09 AM
Reply to: Message 65 by Albert Godwin
10-10-2006 4:21 AM


Hi, Albert -
quote:
>It could happen that the only selector for which encryption + decryption is the best solution is one similar to the WEASEL selector (measuring the distance of certain dead code from a prepared decryption + encrypted code structure - this would be cheating and of course I won't use it),
Yes, I object to using it, not because it is cheating, but because many intermediates will not be executable in this way.
Since I do not intend to use such a trick anyway, it might be superfluous to point out that you object to the use of a WEASEL selector for entirely the wrong reason. Executable programs may contain garbage code if said code is never executed or is executed only after an exact copy of the program was written out and declared final (i.e. the 'file' was 'closed'). Such garbage code could be evolved to whatever I want it to be using a WEASEL selector. The reason it should not be used, besides smelling like cheap cheating, is simply that results obtained using such a selector usually fail to impress young-Earth creationists, who can claim that the end result was too directly specified by this method. The original WEASEL program came under exactly this kind of attack.
Perhaps this is the moment when I have to bring up the fact that selectors are situated on a continuum, with WEASEL at one extreme and no selector whatsoever at the other. Given this, it is easy to foresee the opportunity for bitter dispute about a successful selector being "too specific". So far none of the selectors making up the sequences I have considered were even close to being too specific, but I still wonder how you would draw the line between WEASEL-like and acceptably general selectors.
All else considered, I think we have touched on all the necessary basics here. I will summarize my understanding of the details we discussed, probably before Sunday, and try to force the project into the resource envelope of a usual hobby project. Even if it proves doable on small machines it could take a long time, potentially months, since I can only work on it on weekends, although I expect the whole environment to be ready to run by the end of this week.
Edited by Barbarian, : English grammar ...

This message is a reply to:
 Message 65 by Albert Godwin, posted 10-10-2006 4:21 AM Albert Godwin has replied

Replies to this message:
 Message 68 by Albert Godwin, posted 10-11-2006 1:50 PM Barbarian has replied

  
Barbarian
Inactive Member


Message 69 of 120 (356718)
10-15-2006 4:03 PM
Reply to: Message 68 by Albert Godwin
10-11-2006 1:50 PM


As promised earlier, I have summarized the rules I will try to work within. I am doing this for my own amusement - I am done with the tedious work of testing the assembler, the virtual machine, the debugger and the mutation mechanism - and have no idea when I will finish. At the current rate, sometime around the end of the year ...
Markings such as "(-> number)" mean that the notion being used is elaborated under the paragraph marked "(number)". This was needed because I could not be bothered to give the stuff a pedagogical structure, so references run both forward and backward.
So, the rules:
(1) the task is to evolve (-> 2), with appropriate choice of selectors (-> 3), a program (-> 4) featuring viral behavior (-> 11) and encryption (-> 5), starting from one which shows viral behavior only (-> 6).
(2) evolution proceeds over a finite population of possibly different programs (-> 4), by means of the selector (-> 3) granting them a chance to procreate (-> 7) proportional to their fitness value (-> 10). The task is not simply to evolve such a specimen once but to provide a way to repeat the experiment as many times as one wants.
(3) the selector functionality is based upon a fitness function (-> 10), with the fitness values directly influencing the chances of leaving offspring. Multiple selectors can be used either sequentially - i.e. evolving a trait, then switching to the next selector - or in parallel, simulating co-existing biological niches. Most likely only the sequential variant will be used.
(4) programs are code in a low-level language defined by me and runnable in a virtual machine. A program is a list of segments, while a segment is a list of instructions. The language will be publicly specified only together with the solution, because it might still change in order to make evolution fast enough for my ancient PC. (It is a stack-based language much like Forth, but without Forth's structured approach - apart from not having registers, it is as non-structured as any machine code can be.)
(5) presence of encryption is defined as the observed execution of all special instructions during program runs, together with their absence from the program code.
(6) the lack of encryption in the start-up code is established by simply inspecting its assembly source.
(7) procreation proceeds under the supervision of the framework, which selects a member of the population with a probability linked to its fitness value (-> 10); a sketch of this step follows the list below. The selected individual is copied verbatim and then, depending on a random choice, zero or more mutations (-> 9) are applied to it (most likely this will be simplified to zero or one). Finally, the new individual is rated for fitness and admitted into the population, unless it has a fitness rating of zero (stillborn).
(8) special instructions are instructions which, from the POV of the virtual machine, do nothing. As such, they act as NOPs and cannot alter the state of the machine in any significant way (apart from the standard update of the current instruction pointer). They do, however, interact with the environment, and for the sake of this particular simulation I plan to use one of them for writing out a program segment (-> 4), another for declaring the written-out list of segments closed (closing the file), and some number of them to do nothing except register the fact of their execution with the environment. I do not know beforehand how many of them will be usable within a reasonable time frame - originally I proposed a total of 12 such instructions, but it could be fewer if I ever want to finish.
(9) mutations are applied by a mutator mechanism, which has some understanding of the program structure (-> 4). This means that in addition to individual instructions being altered, deleted or inserted, whole segments can be fused, split, duplicated or dropped. The probabilities of these mutation events can be finely adjusted, but they stay unchanged during the whole experiment.
(10) a fitness function attaches a numerical value to a particular program. The value shall be a non-negative integer with a finite upper bound. In order to exclude WEASEL-like selectors, the fitness function is limited to seeing only certain aspects of the program, such as the number and individual lengths of program segments and the presence of special instructions (-> 8) in the code. In addition, the fitness function will be able to inspect the effects of special instructions (-> 8) upon the environment.
(11) viral behavior is defined as the capability of the program to write out an exact copy of itself. The program has a way to delimit units (close files) it writes out. It does not matter if units before or after the exact copy are additional exact copies or perhaps just junk. In addition, if the copy is written out, it does not matter how the program terminates - by error, by exceeding the time limit or by normal termination.
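For illustration, one procreation event under rule (7) might be sketched in OCaml roughly as follows; the names, the fifty-percent mutation chance and the (genotype, fitness) list representation are all placeholders of mine, not the actual framework code:

(* Rule (7) as a sketch: fitness-proportional parent choice, zero or
   one mutation, rating, and admission unless stillborn.  The total
   fitness of the population is assumed to be positive. *)
let pick_parent pop =
  let total = List.fold_left (fun s (_, f) -> s + f) 0 pop in
  let r = Random.int total in
  let rec go acc = function
    | (g, f) :: rest -> if r < acc + f then g else go (acc + f) rest
    | [] -> assert false
  in
  go 0 pop

let procreate ~mutate ~rate pop =
  let parent = pick_parent pop in
  (* the verbatim copy is implicit: genotypes are immutable values *)
  let child = if Random.bool () then mutate parent else parent in
  match rate child with
  | 0 -> pop                              (* stillborn: not admitted *)
  | f -> (child, f) :: pop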

This message is a reply to:
 Message 68 by Albert Godwin, posted 10-11-2006 1:50 PM Albert Godwin has not replied

Replies to this message:
 Message 70 by Percy, posted 10-16-2006 9:43 AM Barbarian has replied

  
Barbarian
Inactive Member


Message 71 of 120 (356992)
10-17-2006 4:54 AM
Reply to: Message 70 by Percy
10-16-2006 9:43 AM


I hear you, Percy. I started this exchange hoping to make Albert re-think his position while we argued the rules; then set the somewhat lower aim - still not entirely dropped - that if this challenge ever makes it into the Index (of creationist claims), it would be nice to be able to post the rebuttal at the same time; and finally settled on treating it as a programming challenge. So no, my primary goal is not to convince Albert, but I do not dismiss the possibility of such an outcome either (mostly because that possibility is inconsequential to me at this point).
I do expect some arguments about the ways my proposed virtual machine programs differ from usual machine code. I included some behaviors not usually found in real-world machine code in order to shorten the necessary evolutionary timeframe, but I always tried to do it in a way that mimics some part of biological evolution. E.g. insertions/deletions of instructions usually do not dislocate jump targets, as jumps refer to labels attached to individual instructions, not to whatever instruction is located at a certain address (this localizes the effect of mutations, as they are localized - in a different way - in the genome, but not localized in usual machine code).

Even so, the analogy between my toy world and biology is, for lack of a better word, fractured. E.g. instructions could be claimed to correspond to codons (I even have a STOP and a START instruction, suspending between them the execution - but not the scanning - of instructions), and perhaps segments can be thought of as corresponding to chromosomes, but there is no direct correspondence to genes (perhaps the execution history between two writings of segments could have a claim to this), and even less correspondence with the expression of genes (I use a mix of "structure of program", "presence of special instructions" and "effect of executing the special instructions" to correspond to the real-world test of the results of gene expression). These programs are more like ribozymes catalyzing their own synthesis - but then why do I include a mechanism like the chromosome? So, there is ample opportunity for arguing away their relevance. This whole experiment is very far removed from biology.

This message is a reply to:
 Message 70 by Percy, posted 10-16-2006 9:43 AM Percy has not replied

Replies to this message:
 Message 72 by Albert Godwin, posted 10-17-2006 5:16 AM Barbarian has replied

  
Barbarian
Inactive Member


Message 73 of 120 (356994)
10-17-2006 6:03 AM
Reply to: Message 72 by Albert Godwin
10-17-2006 5:16 AM


Re: That's enough
Hi, Albert -
thanks for the invitation to go to the other forum, but I have to decline it for now. I have a certain aversion to signing up to new boards without a decent amount of prior lurking, and besides, there is little else to discuss about this particular experiment. The whole challenge has by now become just a programming pet project for me, which I will try to complete under the terms we discussed; beyond that, I would rather complete it first and justify my additional design decisions later.

This message is a reply to:
 Message 72 by Albert Godwin, posted 10-17-2006 5:16 AM Albert Godwin has not replied

  
Barbarian
Inactive Member


Message 87 of 120 (357519)
10-19-2006 4:23 PM
Reply to: Message 85 by Percy
10-19-2006 12:20 PM


quote:
Barbarian will have to confirm, but my understanding of his approach is that the instruction is the unit of mutation, not the individual bits coding for machine instructions. I don't think his abstract machine includes the concept of bit-encoded machine-level instruction
That's correct: one unit of mutation is the instruction, another one is the segment.
In fact, I have a very basic draft specification of the language and the mutation mechanisms, which I feel to be on-topic enough to post here:
Programs are non-empty sequences of segments; segments are non-empty sequences of instructions.
A program is executed starting with the first segment and executing instructions in order - potentially resulting in control transfer to other segments - until either an error condition occurs or execution runs off the end of a segment. Execution does not fall through to the next or any other segment - it has to be explicitly transferred by a JUMP, CALL etc. instruction.
The context of a program consists of a value stack and a return stack. The value stack can contain any instruction; the return stack can contain addresses only. Both stacks start out empty.
Instructions fall into the following categories: literal integers, literal addresses and simple instructions. Simple instructions are further subdivided into special instructions (see below) and regular instructions (all the rest).
Literal integer values (0 to 7, but this is a compile-time option and can go all the way up to include all integers representable on 63 bits) and simple instructions form the class of values. Repeated incrementing starting from literal zero will pass through all integers, then all simple instructions (first regular, then special ones), then loop back to zero; IOW it loops through all values in a certain order. The semantics of addition and negation are derived to be consistent with this behavior. The length of this cycle has a fundamental role in the structure of programs, and it is called FIRST_INVALID (as it is the first invalid numeric value if we start enumerating them from zero). The cycle also establishes a one-to-one correspondence between values and their numeric values (non-negative integers up to FIRST_INVALID - 1).
Technically, addresses are pairs of segment identifiers and offsets, although they are somewhat smarter than that - more about this in the mutation part. Addresses can by construction only point into valid, existing segments, but they can point to invalid (i.e. too high) offsets within those segments. Offsets start with zero for the first instruction in the segment, and extend all the way up to (but not including) FIRST_INVALID. Thus, any value can be used as offset (via its numeric value). Conversely, the numeric value of an address is the numeric value of its segment.
Incrementing an address increments its offset modulo FIRST_INVALID; negating it amounts to negating its offset modulo FIRST_INVALID; both instructions yield an address with a new offset and the original segment. If an address is the top/second operand of an ADD or XOR operation, it behaves as if it were the value corresponding to its offset (the segment gets discarded). If an address is the bottom/first operand of these operations, its offset is used in carrying out the computation, and then the result is constructed to be an address with the original segment of the first operand and the result of the computation as offset.
Equality means complete equality, in the case of addresses the segments and offsets must be pairwise equal, and no address can be equal to any value. As for ordering, values are ordered by their numeric value and are smaller than any address; addresses are ordered by the order of their segments, then by their offsets.
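To illustrate the arithmetic, a small OCaml sketch - with an assumed cycle length, and with instructions collapsed to their numeric values, so this is only a model of the rules above, not the actual implementation:

(* Values live on a cycle of length FIRST_INVALID; INCR steps along it
   and NEG negates modulo its length.  On addresses both act on the
   offset only, keeping the segment. *)
let first_invalid = 32                  (* assumed cycle length *)

type datum =
  | Value of int                        (* numeric value, 0 .. first_invalid - 1 *)
  | Address of int * int                (* segment identifier, offset *)

let incr_datum = function
  | Value n -> Value ((n + 1) mod first_invalid)
  | Address (s, o) -> Address (s, (o + 1) mod first_invalid)

let neg_datum = function
  | Value n -> Value ((first_invalid - n) mod first_invalid)
  | Address (s, o) -> Address (s, (first_invalid - o) mod first_invalid)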
I list the instructions below. Unless otherwise noted, 'expects on stack' or 'continues at address' mean that the program exits with an error condition if the expectation is not met or the address is invalid. Expected stack context is enumerated starting from the top of the stack which is imagined to grow upwards; 'expects an address and two values on the value stack' means that the address is the topmost, last pushed element on the stack.
The list of instructions is the following:
Literal integer (written #N, e.g. #5) - pushes itself (i.e. the integer) onto the value stack.
Literal address (written *LABEL, e.g. *LOOP1, or * <2:4> for segment=2 and offset=4) - pushes itself (i.e. the address) onto the value stack. It does not matter whether the address is valid or invalid.
NOP - does exactly nothing
NOTEMPTY - expects an address on top of the value stack; if the stack beneath the address is empty, jumps to the point denoted by the address, else removes the address from the top of the value stack and continues to the next instruction
ADD - expects two instructions on the top of the value stack, removes them and pushes their sum on the value stack.
INCR - expects an instruction on the top of the value stack, increments it according to the rules above
NEG - expects an instruction on the top of the value stack, negates it as described above
XOR - expects two instructions on the value stack, combines them as described above
NEWSEG - expects an instruction on top of the value stack; removes it and adds a new segment containing the instruction as the only element to the program; pushes the address of its first (and only) element onto the value stack
OFFSET - expects an arbitrary instruction and an address on the value stack; replaces them with the address built by taking the segment from the first/lower operand and the numeric value/offset of the second/top operand for offset.
NEXTSEG - expects an address on top of the value stack, replaces it with the address of the beginning of the next segment if any, or with an integer 0 if there is no segment after the one pointed to by the top one.
LOAD - expects an address on top of the value stack, replaces it with the instruction to which it points
PUSH - expects an address and a value on the value stack, removes them and prepends the value to the segment pointed into by the address (offset gets ignored)
ESC - does nothing, but the immediately next instruction is pushed onto the value stack instead of being executed. This is the only way to explicitly push simple instructions on the stack.
DROP - expects the value stack to hold at least one instruction; discards the topmost element.
DUP - expects an instruction with numeric value N on the top of the value stack; replaces it with the element at depth N in the value stack (N=0 corresponds to the topmost element)
SWAP - expects the value stack to contain at least two elements; swaps the topmost two elements
LIFT - expects an instruction with numeric value N on the top of the value stack; drops it, removes the element at depth N and puts it onto the top of the value stack.
JUMP - expects an address on the top of the value stack; removes it and continues with the instruction pointed to by said address
CALL - expects an address on top of the value stack; removes it and continues with the first instruction in the segment pointed to by said address (the offset is disregarded!) but before doing so pushes the address of the instruction after the CALL onto the return stack
RET - expects the return stack not to be empty; removes the topmost address and continues execution with the instruction pointed to by it
STOP - changes the execution mode such that all subsequent instructions except START count as NOPs. In particular, further STOPs do not open nested levels. Falling off the end of the segment still counts as regular termination
START - restores normal execution mode. If the machine was not operating under the effect of a previous STOP, START acts as a NOP.
IFEQUAL - expects an address and two instructions on the value stack; removes them; if the instructions are exactly equal, continue at the address, else with the next instruction
IFLESS - expects an address and two instructions on the value stack; removes them; if the first/deepest value is less than the second/middle one, it continues at the specified address, else with the next instruction.
INVALIDADDR - expects an address and an instruction on top of the value stack; removes them; if the instruction is not a valid address, it continues at the given (top) address, else with the next instruction
Special instructions, written SPECINSTR N, e.g. SPECINSTR 2 - act as NOPs as far as the machine is concerned, except that they can raise error conditions. These are the instructions whose execution can convey information to the environment. I am undecided about their required number. I am sure I will use SPECINSTR 0, which expects an address on the value stack and - leaving it there - treats it as a request to write out the segment pointed to by the address; I will also use SPECINSTR 1 to signal the end of a set of written segments. A sequence of SPECINSTR 0's followed by a single SPECINSTR 1 is akin to writing and closing a file - this can then be repeated with new values. Other SPECINSTRs simply register the fact of their having been executed.
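For the record, the whole instruction set above fits into a single OCaml variant type; here is my sketch of it (the framework's real type definitions are unpublished, so the names are mine):

type address = { segment : int; offset : int }

type instruction =
  | Int of int                   (* literal integer, e.g. #5      *)
  | Addr of address              (* literal address, e.g. * <2:4> *)
  | Nop | Notempty | Add | Incr | Neg | Xor
  | Newseg | Offset | Nextseg | Load | Push
  | Esc | Drop | Dup | Swap | Lift
  | Jump | Call | Ret | Stop | Start
  | Ifequal | Ifless | Invalidaddr
  | Specinstr of int             (* environment-visible NOP       *)

type segment = instruction list          (* non-empty by construction *)
type program = segment list              (* non-empty by construction *)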
Mutations can affect instructions or structure/segments. First, it is decided whether a mutation takes place at all. If the answer is yes, we decide whether it is to be an instruction-level or a structure mutation.
Instructions can be deleted, inserted or replaced. *EDITED* Insertion is not always possible, but the other two are; the type of mutation is chosen from a uniform distribution of the possible mutation types.
Deletion does not care about what instruction it deletes, but readjusts all addresses pointing into the segment of the deleted instruction and to a point above where the deletion occurred. If the segment had only one instruction, the whole segment is deleted, see below.
Insertions are possible before the first, after the last or between any two instructions, provided that the resulting segment is not longer than FIRST_INVALID. Addresses will be adjusted accordingly. The environment first decides whether to insert an address or a regular instruction. If it is an address, the segment is chosen uniformly at random, while the offset is chosen from a truncated geometric distribution. If it is a regular instruction, it is chosen uniformly at random.
Replacements are sensitive to what they change. The three classes are addresses, integers and the rest. There is a small chance of changing the class; if that happens, the new value is chosen uniformly from the available values in the new class, except for address offsets, which are chosen from a truncated geometric distribution. If there is no change of class, integers turn into a strictly different integer, chosen from a uniform distribution; instructions turn into a strictly different instruction, chosen from a uniform distribution; addresses change their segment, their offset or both: if the segment is changed, it is chosen from a uniform distribution over all valid segments; if the offset is changed, it is chosen from a truncated geometric distribution but must differ from the previous one if the segment did not change.
Structure/segments can be mutated in four ways: duplication, deletion, fusing and splitting. Duplication is always possible; the other three may sometimes be impossible. The actual mutation type is chosen from a uniform distribution of the possible types.
A duplicated segment is placed at the end of the program. No adjustments of addresses will take place. All segments have equal probabilities of being duplicated.
Deletion is accompanied by adjusting all addresses pointing to other segments, and replacing all addresses pointing into the deleted segment by literal integer 0's. All segments have equal probabilities of being deleted.
Fusing is done by removing one segment and concatenating it to another one (the base segment). This is only possible if the length of the new segment is not greater than FIRST_INVALID. Addresses pointing into the removed segment are adjusted accordingly. The last RET or JUMP instruction in the base segment is replaced with a NOP or a DROP, respectively. All joinable segment pairs have the same probability of being chosen for fusing.
Splitting is only possible between two instructions (you can't split before the first or after the last instruction, as that would split off an empty piece). The split-off part is added to the end of the program, after all other segments, and addresses are adjusted accordingly. All splitting points have the same probability of being the site of a split.
[ETA!] The literal address of the start of the new, split-off segment and a JUMP are added to the end of the truncated segment part. Accordingly, segments having maximal length (FIRST_INVALID) cannot be split between the last two instructions.
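The two-level decision above can be sketched like this in OCaml (the probabilities are assumed, and the actual mutation operators are abstracted into functions):

let p_mutation = 0.5        (* assumed: chance that any mutation happens *)
let p_instr_level = 0.5     (* assumed: instruction- vs structure-level  *)

(* Truncated geometric draw on 0 .. bound-1, as used for address
   offsets: flip a p-biased coin, stop at the first success or at the
   bound. *)
let truncated_geometric ~p ~bound =
  let rec go k =
    if k >= bound - 1 || Random.float 1.0 < p then k else go (k + 1)
  in
  go 0

(* Choose uniformly among the mutation types possible for this
   individual (e.g. insertion or splitting may be impossible). *)
let pick_uniform ops = List.nth ops (Random.int (List.length ops))

let mutate ~instr_ops ~struct_ops prog =
  if Random.float 1.0 >= p_mutation then prog
  else
    let ops =
      if Random.float 1.0 < p_instr_level then instr_ops else struct_ops
    in
    (pick_uniform ops) prog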
Edited by Barbarian, : Reason for edit: forgot to copy the last two sentences into the text box...
Edited by Barbarian, : Display of smilies interferes with syntax of literal address
Edited by Barbarian, : Of course, deletion of an instruction is always possible, what I had in mind was insertion ...

This message is a reply to:
 Message 85 by Percy, posted 10-19-2006 12:20 PM Percy has not replied

Replies to this message:
 Message 88 by Jazzns, posted 10-19-2006 4:53 PM Barbarian has replied
 Message 90 by Barbarian, posted 10-20-2006 5:36 AM Barbarian has not replied

  
Barbarian
Inactive Member


Message 89 of 120 (357529)
10-19-2006 5:10 PM
Reply to: Message 88 by Jazzns
10-19-2006 4:53 PM


quote:
Did you design this architecture?
Yes. It is an extreme deviation from another one I designed to serve as an intermediate language in a compiler infrastructure I was (and still am) trying to put together. As you can tell, I have really weird hobbies.
quote:
Are you going to put in an instruction execution limit to avoid infinite looping?
Yes again. The language is Turing-complete (*), so the halting problem is not generally solvable for programs written in it. I toyed with the idea of proposing a non-Turing-complete language, but decided against it. Instruction count limitation is much simpler.
(*): well, theoretically. In practice, the limit embodied by FIRST_INVALID makes it non-Turing-complete, but by that standard few other languages are complete either. Assuming an infinite value for FIRST_INVALID, I am pretty sure Turing completeness would be tedious but straightforward to prove, and I am also sure that the potential unboundedness of the segment count does not rescue me from Turing-incompleteness in the presence of a finite FIRST_INVALID.
Edited by Barbarian, : Added further musings about Turing-completeness

This message is a reply to:
 Message 88 by Jazzns, posted 10-19-2006 4:53 PM Jazzns has not replied

  
Barbarian
Inactive Member


Message 90 of 120 (357636)
10-20-2006 5:36 AM
Reply to: Message 87 by Barbarian
10-19-2006 4:23 PM


Populations and selectors
Some more details, this time mostly about code I haven't written yet (but it should not be an issue - the biggest challenge I see now is saving and reloading the state):
Populations are size-limited collections of genotypes, organized by fitness value. Genotypes with identical fitness values are grouped into fitness classes. After additions (including merges of two populations), the population is always cut back to the size limit.
Procreation builds a second collection of individuals by giving every genotype in the source population as many chances to procreate as indicated by its fitness value. Each procreation event creates a copy with at least one mutation (since the parent population is not discarded but merged with the new one, there is no need to bother with non-mutated offspring), computes its fitness value and adds the copy to the child population (several optimizations are planned here, e.g. offspring with a fitness value lower than the current low in the parent population are not added but discarded). I think for now I can do with exactly one mutation per offspring, but I am not excluding the possibility of drawing the number of mutations from a geometric distribution.
After the child collection is completed, it is merged with the parent population and trimmed to size (low fitness values get eliminated). The last fitness class, potentially straddling the population size limit, is randomly culled to fit.
I plan to use a chain of selectors, connected as follows: every selector has its own associated population. Initially, only the first selector in the chain has a non-empty population. A new generation is produced as follows: first, the first population undergoes a procreation step as discussed above. The next population undergoes the same, except its offspring population gets initialized with the collection of offspring admitted into the first population. This is repeated all the way to the last population, where I will be waiting, watching the fitness of top performers and running the experiment until I see a spike in said fitness.
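In OCaml terms, the chain update could look roughly like this; the names and the representation are assumed, and the random culling of the straddling fitness class is simplified to plain truncation:

type 'a population = {
  limit   : int;                 (* size limit                     *)
  fitness : 'a -> int;           (* this population's own selector *)
  members : ('a * int) list;     (* genotype with cached fitness   *)
}

let rec take n = function
  | x :: xs when n > 0 -> x :: take (n - 1) xs
  | _ -> []

(* Merge offspring into a population, keep the fittest up to the size
   limit, and report which newcomers survived (they migrate onward). *)
let admit pop offspring =
  let rated = List.map (fun g -> (g, pop.fitness g)) offspring in
  let all =
    List.sort (fun (_, a) (_, b) -> compare b a) (rated @ pop.members) in
  let kept = take pop.limit all in
  let admitted = List.filter (fun pair -> List.memq pair kept) rated in
  ({ pop with members = kept }, List.map fst admitted)

(* One full generation: each population procreates, and the newcomers
   admitted into population i seed the offspring pool of population
   i+1.  'procreate' stands for the copying/mutation machinery. *)
let generation ~procreate chain =
  let rec step seed = function
    | [] -> []
    | pop :: rest ->
        let pop', admitted = admit pop (seed @ procreate pop) in
        pop' :: step admitted rest
  in
  step [] chain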
Right now I score almost 3 million instructions executed per second (on an 800 MHz Athlon), while the mutation engine can build new individual programs worth about 4.5 million instructions per second. These numbers combine to an estimated procreation speed of about 1.5 million instructions' worth of programs per second. Making my own work easy, I assume the average size of a program is 30 instructions, and arrive at an estimate of about 50,000 individual procreation events per second. I don't think this can be significantly improved - I can imagine a fivefold or sixfold speedup from a complete rewrite in C or assembler, but the required amount of work and maintenance (and, for the assembler solution, portability issues) mean that the speedup is simply not worth it.
I thought I would implement a sort of predictor mechanism, which would look like the above except for two changes: (1) it would not enforce the population size limit if there was only one fitness class in it, and (2) it would create an offspring with every conceivable single mutation. Optimization demands that in subsequent generations, whoever already had a chance of procreation should not be given such a chance again. With a smaller population, this method would in effect try to find the most likely direction of evolution. It is, unfortunately, easily defeated by the appearance of a population consisting of one single but huge fitness class - the population size is unlimited in that case. I am just playing with this idea for now.
The structure of the programs would make mating / horizontal gene exchange simulation pretty easy, but in order to stay within the valid-analogy radius of the original challenge, I'll go with asexual division / parthenogenesis / whatever it is called.
I planned to write a bit about the selector sequences, but I won't have the time to do both coding and writing. Basically, the most promising idea is to evolve programs which write two copies of themselves; then alter the code in a way that causes one of the copies to contain a freak decryption of a minimal part of the code (thereby making this copy an invalid one) and to execute it (crashing the program) after the valid copy was written out; then wait until the program acquires another mutation which suddenly turns the hitherto invalid copy into a valid one (and the so far valid copy into an invalid one); then force all SPECINSTRs to reside in the decrypted code only. This is just one of the ideas; I have a number of other lines to try.

This message is a reply to:
 Message 87 by Barbarian, posted 10-19-2006 4:23 PM Barbarian has not replied

Replies to this message:
 Message 91 by Jazzns, posted 10-20-2006 12:47 PM Barbarian has replied

  
Barbarian
Inactive Member


Message 95 of 120 (357960)
10-21-2006 2:35 PM
Reply to: Message 91 by Jazzns
10-20-2006 12:47 PM


About helping the project
Thanks for volunteering, folks.
As for the best point at which to open up the project, I think it will come when the framework is complete and experimentation with selector/fitness rater sequences can begin; we can then run parallel experiments, everyone with his/her preferred selectors. I am so close to completion that I don't think I can efficiently involve anyone rather than just sitting down and finishing it (this is a long weekend, Monday is a national holiday here, and bad weather was promised).
Selectors will have to be hard-coded and the whole framework recompiled for them to work, so whoever volunteers to find proper evolutionary pathways will either have to write OCaml code or give me the specification of the preferred selector and compile the code I provide in return. As a preparatory measure, I think you could try to download and install the OCaml distribution. Those on Linux or other Posixoid systems with a C compiler and a basic 'make' will have it easy, but those on Windows will need Cygwin+GCC+GNU Make for running real experiments. One can play with the full functionality of the framework on Windows even in the absence of Cygwin GCC; it's just the optimizing compiler, needed for the millions-of-instructions-per-second class of performance, that won't work without it. OCaml provides another optimizing compiler relying upon the Microsoft compiler suite, but I am not well-versed enough in the use of its 'make' functionality to support it.
I plan to simply release the source code into the public domain, and I welcome any ideas about how to physically do this, given that I do not have a website on which to post it. Right now the code contains no comments whatsoever, but adding them should be easy and fast, and of course I plan to do so before publishing.

This message is a reply to:
 Message 91 by Jazzns, posted 10-20-2006 12:47 PM Jazzns has replied

Replies to this message:
 Message 96 by Jazzns, posted 10-21-2006 3:21 PM Barbarian has not replied
 Message 97 by jar, posted 10-21-2006 4:32 PM Barbarian has not replied

  
Barbarian
Inactive Member


Message 99 of 120 (358788)
10-25-2006 1:41 PM
Reply to: Message 98 by Percy
10-21-2006 5:25 PM


Re: About helping the project
Thanks in advance for the offered help wrt hosting; some delay is even welcome, since I just realized I need to reimplement population management over a new data structure to speed it up, as it turns out to be the slowest part of the framework. I thought I could implement populations using the Map and Set structures from the OCaml library, but what is needed here is a somewhat modified variant of binomial (min-)heaps - which essentially reflects the fact that the populations are actually priority queues, with programs queueing up in them waiting for extinction, reverse-prioritized by their fitness values. I find this funny but also very helpful, as it just feels like the proper paradigm. Otherwise, only the read-eval cycle and the save/restore functionality remain to be implemented, and we are set to go.

This message is a reply to:
 Message 98 by Percy, posted 10-21-2006 5:25 PM Percy has not replied

  
Barbarian
Inactive Member


Message 100 of 120 (362347)
11-07-2006 3:50 AM


Status
Short update on the project.
I am at the final step of connecting up the main command reader-interpreter-result printer loop with the already coded procedures. IOW, I can already compile a complete framework, except that some commands do nothing but tell me they aren't implemented yet. The code they will execute is already there; it just isn't linked up. It should be ready by the end of the week, and then serious testing can begin - probably another week or so.
The framework has a command prompt interface only, with very rudimentary control over the amount of information dumped on request - e.g. there is no paging of any kind; to mitigate this, I use it from within an Emacs shell. I might later create a dedicated Emacs mode for it based on comint, mostly because assembler error messages are of the "something is wrong somewhere in the code" variety, and proper syntax highlighting would help a lot in spotting syntax errors.
I said before that saving and loading of workspaces would be the biggest challenge I saw ahead of me. Well, it is done now, but I had to implement my own stream tokenizer for it, in order to allow me to distribute the load and save functionalities between modules. The main issue to be solved was that the workspace is pretty redundant while residing in memory (speedup tricks et al.). This redundancy does not hurt me in memory, because I took care to share structures whenever possible, but such sharing is destroyed both by the standard binary dump/reload mechanism (Marshal) in the OCaml library and by straightforward visit-all-nodes saving. This would bloat the saved state file and would result in an out-of-memory condition when loading a saved state on the very machine it was saved on. So I had to implement explicit management of structure sharing. There are libraries out there which do this, but I want to keep everything workable with just the basic distribution of OCaml, so I could not use them. The saved state is a pure text document, just for the paranoid who would want to check what kind of files are written by the framework. I certainly would want to do so.
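The sharing management, reduced to a toy example (the tree type and the output format are made up for illustration; the real workspace structures are of course richer):

(* Each physically distinct node is written once under a fresh id; any
   later encounter emits only that id.  Post-order numbering lets a
   loader rebuild the structure in a single pass.  Acyclic data is
   assumed. *)
type tree = Leaf of string | Node of tree * tree

let save out root =
  let seen = ref [] in            (* (node, id) pairs, compared with == *)
  let counter = ref 0 in
  let fresh () = let n = !counter in incr counter; n in
  let rec go t =
    match List.find_opt (fun (t', _) -> t' == t) !seen with
    | Some (_, id) -> id          (* already written: just share the id *)
    | None ->
        let id =
          match t with
          | Leaf s ->
              let id = fresh () in
              Printf.fprintf out "leaf %d %S\n" id s; id
          | Node (l, r) ->
              let lid = go l in
              let rid = go r in
              let id = fresh () in
              Printf.fprintf out "node %d %d %d\n" id lid rid; id
        in
        seen := (t, id) :: !seen;
        id
  in
  ignore (go root)

With let l = Leaf "x" in save stdout (Node (l, l)), the leaf is written only once and the node line refers to its id twice - exactly the sharing that a naive visit-all-nodes dump would duplicate.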
I was trying to find a good data structure for populations, and was toying with binomial heaps for about a week, in what turned out to be a severe bout of overengineering. Careful analysis showed that the best amortized complexity is given by simple sorted lists, assuming that sorting is strategically delayed. Experiments supported this idea, so now I have the most primitive collection type - linked lists - holding the populations.
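The "strategically delayed" sorting amounts to something like this sketch (names assumed):

(* Additions are cheap and unordered; the list is sorted - best fitness
   first - only when an ordered view is actually needed. *)
type 'a delayed = {
  mutable items  : ('a * int) list;     (* genotype, fitness *)
  mutable sorted : bool;
}

let create () = { items = []; sorted = true }

let add pop x =
  pop.items <- x :: pop.items;
  pop.sorted <- false

let sorted_view pop =
  if not pop.sorted then begin
    pop.items <- List.sort (fun (_, a) (_, b) -> compare b a) pop.items;
    pop.sorted <- true
  end;
  pop.items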
I claimed before that, with a finite upper bound on representable values, the language did not look Turing-complete. That turned out to be false: I have found a way to implement any Turing machine in the language, capitalizing on the unboundedness of the segment count alone. So the language is Turing-complete as it stands.
Writing new selectors should be easy. You have to specify the name (a unique identifier), the description and the fitness rater function. The fitness rater receives a program and returns an integer - there are utilities to decide whether the program can write an exact copy of itself and to assemble a histogram of the SPECINSTR calls made during a run. The framework needs to be recompiled for a new selector to start working; earlier saved states can be reloaded into the framework after such a recompilation, unless some already-used selector was changed - then the results are unpredictable. I will provide a template for doing all this in a clean way.
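Until that template is published, a hypothetical selector might be pictured like this; every name is invented, writes_exact_copy and specinstr_histogram are stubs standing in for the utilities just mentioned, and a plain string stands in for the program type:

type 'prog selector = {
  name        : string;            (* unique identifier              *)
  description : string;
  rate        : 'prog -> int;      (* fitness: non-negative, bounded *)
}

(* Stubs standing in for the framework's utilities: *)
let writes_exact_copy (_prog : string) = true
let specinstr_histogram (_prog : string) = Array.make 12 0

(* Example rater: zero unless the program copies itself, then a bonus
   for executions of SPECINSTR 2, capped to keep fitness bounded. *)
let example : string selector = {
  name = "copy_plus_spec2";
  description = "exact self-copy, bonus for SPECINSTR 2 executions";
  rate =
    (fun p ->
       if not (writes_exact_copy p) then 0
       else 1 + min 10 (specinstr_histogram p).(2));
}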
The toy world consists of a chain of populations, each population with its own dedicated selector. The first population can only grow by procreation of members and admission of programs from external files; all subsequent populations can only grow by procreation of members and admission of newly admitted members of the previous population. Populations can be added to the end of this sequence at any time, immediately loaded with the members of the previous population. Any number of populations can be cut from the beginning of the chain, such that some population from the middle becomes the first one. After any such cut, it is no longer possible to admit programs from external files into the first population.
Populations hold distinct genotypes, and are thus limited in diversity, not in count of individuals. Every genotype is present only once, but gets a number of chances to procreate - with a mandatory mutation - equal to its fitness value; the unchanged parent is also admitted into the next generation (see the sketch below). If a population grows beyond its size limit - which is independently settable for every population - it is cut back to size by removing the individuals with the lowest fitness values. One can run the framework for a desired number of generations (full updates along the population chain) or until a certain threshold of fitness value is reached within a certain population. The program can be interrupted from the keyboard, stopping in the latest consistent state.
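As a sketch (assumed names again), this deterministic procreation scheme boils down to:

(* Every genotype procreates exactly fitness-many times, each child
   carrying one mandatory mutation; the unchanged parent is carried
   over as well.  Trimming to the size limit happens afterwards. *)
let next_generation ~mutate ~rate members =
  List.concat_map
    (fun (g, f) ->
       let children =
         List.init f (fun _ -> let c = mutate g in (c, rate c)) in
       (g, f) :: children)
    members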

Replies to this message:
 Message 101 by Jazzns, posted 11-07-2006 10:24 AM Barbarian has replied

  
Barbarian
Inactive Member


Message 102 of 120 (362428)
11-07-2006 12:40 PM
Reply to: Message 101 by Jazzns
11-07-2006 10:24 AM


Re: Status
quote:
Why is this? Is is just that previously conserved elements will no longer be conserved or is there some mechanical reason things will become unpredictable? One of the things that may be necessary to get the desired result it to change a selector. Is that possible or are you concieving that a change in a selection behavior can always be achieved by the creation of a new selector.
I have to start from afar to explain this. I could not make up my mind about the proper use of multiple selectors: should they be applied to a single population one after the other, allowing a fair bit of evolution to happen under each one (the geological-age approach), or should they coexist as a chain, each accepting migration from the previous one (the migration-between-ecosystems approach)? So I chose a generalization of both (described in my previous post). In this approach, if you alter a selector, you are supposed to scrap the associated population and all populations coming after it, because you want to be able to repeat the whole evolutionary exercise as many times as you like, and either you cannot do that, because the scrapped version of the selector is necessary for some phase, or you can, but then you had better make sure of it by actually repeating the evolution of the subsequent populations. This being the case, I decided not to deal with validation issues and simply assume that the selector behind a certain identifier is invariant. Right now changing a selector would not technically break anything (although the evolutionary pathway could be rendered un-repeatable - the old selector could be required for a certain phase), but as the application keeps changing, and I will not care about validation, there is no telling what could happen - this is why I said that the results would be unpredictable.
The way I expect the framework to be used is this: figure out an evolutionary pathway, described by a sequence of selectors. Implement all selectors. Create a world with a single population under the control of the first selector. Start evolution. If the required results are showing, add a second population after the first, controlled by the second selector. Continue evolution, ... and so on. As soon as practical, cut off populations from the beginning to save evaluation time. If at some phase you realize that a selector is buggy or just apt to drive evolution towards the wrong goal - this will most likely happen in the last population - then you fix the selector and scrap the population obtained with the wrong selector variant (either implicitly, by reloading the state saved just before the population with the faulty selector was added, or explicitly, if I add a DropLastPopulation command).
There are exceptions. If the saved state does not contain populations referring to the selector, then the selector can be freely changed. (Indeed, this is the way I expected it to happen: you change the selector and continue from an early save of the state.) The implementation of a selector can of course be changed at any time if the changes preserve its functionality (i.e. for the same program it will return the same rating).

This message is a reply to:
 Message 101 by Jazzns, posted 11-07-2006 10:24 AM Jazzns has not replied

  
Barbarian
Inactive Member


Message 103 of 120 (363401)
11-12-2006 2:42 PM


Framework complete, testing begins
Like the title says. I have committed the first full version back into the version control system. (12 source files, 3082 total lines, 81537 total characters.) Testing + code sanitizing + adding comments should not take more than the next two weeks; after that, the code will be ready to share and experiment with - make it evolve encryption, that sort of thing.

  