EvC Forum: The Minkowski's challenge

Email This Thread

Newer Topic | Older Topic

Author

Topic: The Minkowski's challenge

Barbarian

Inactive Member

Normal Thread Display

Message 104 of 120 (367621)
12-04-2006 3:02 AM

I am not doing well.

I have an evasive bug, which takes ages to manifest itself even in the native code executable, and is therefore likely to need weeks of continuous running in the interpreted version (which is a problem because the debugger only understands the latter). Obviously, I have to revert to other means to find it, and this will take some more time.

Overall performance is good but not as good as I expected it to be. Profiling shows that the largest chunk of time (about 15%) is spent in runtime system functions managing memory allocation and deallocation; no function of mine uses more than 1% of total time, so on one hand this means I am good at before-the-fact optimization, on the other hand I have no handle on further speed improvements.

Apart from that, the stuff works pretty well. According to ancient custom, I have tried to evolve the shortest self-replicating program and got it quite quickly:

prog
     *
     specinstr 0
     specinstr 1
   end

It really sounds like the shortest possible variant.

Replies to this message:
	Message 105 by Percy, posted 12-04-2006 10:31 AM	Barbarian has not replied
	Message 106 by Percy, posted 12-04-2006 2:33 PM	Barbarian has replied

Barbarian

Inactive Member

Normal Thread Display

Message 107 of 120 (367805)
12-05-2006 2:14 PM

Reply to: Message 106 by Percy
12-04-2006 2:33 PM

Actually, it is not me who allocates and deallocates, I just forget about the objects I no longer use and the runtime library frees them when sees so fit. I am not given a chance to write my own suballocator, and considering the years that went into the generic garbage collector of OCaml, I would need lots of optimism to even try. GC is not usually a showstopper, as I would waste more time re-debugging the allocation-deallocation schemes in C whenever I change the code.

I agree it is unusual to have such an even spread of CPU usage. Maybe it would be advisable to try to profile it under Linux too - I did so only over Cygwin, the native-code compiler can produce code which writes logs for gprof to process, but I cannot exactly vouchsafe for the compatibility of gprof and the OCaml compiler.

OCaml can produce both interpretable and native code - the latter boasts the 50% of the performance of a decent C compiler. In my case only native code can be profiled, for the profiler of the interpreted code insists on no preprocessing whatsoever, and I use the standard camlp4o preprocessor for stream support.

I don't think the performance is that bad, actually; it is just not so lightning-fast as I imagined it to be. It's a few hundred births per second for a normal program, whereas I hoped for about 2000 births per second. I just produced a different profile which shows List.length as first, probably due to extensive actual evolution happening. Since you asked for the other functions towards the top of the profile, I quote it here. Functions above 1%, accounting for a total of 153sec out of 162sec total are shown:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 19.72     31.99    31.99                             camlList__length_aux_57
 17.39     60.21    28.22                             camlPervasives__$40_166
 13.38     81.92    21.71                             sweep_slice
 12.38    102.01    20.09                             mark_slice
  6.09    111.89     9.88                             caml_obj_dup
  4.54    119.25     7.36                             camlMachine__exec_instr_306
  3.09    124.27     5.01                             caml_oldify_mopup
  3.04    129.20     4.93                             caml_modify
  3.03    134.12     4.92                             caml_oldify_one
  2.23    137.74     3.62                             caml_fl_merge_block
  2.13    141.19     3.46                             caml_alloc_shr
  1.68    143.92     2.73                             caml_fl_allocate
  1.67    146.63     2.71                             caml_alloc_small
  1.45    148.99     2.35                             caml_oldify_local_roots
  1.43    151.31     2.32                             camlMachine__exec_aux_412
  1.34    153.47     2.17                             allocate_block

Only the two camlMachine_* ones are mine.

I do not think excessive optimization is worth it - I might try to remove excessive calls to List.length(), but only after the mysterious bug is caught.

This message is a reply to:
	Message 106 by Percy, posted 12-04-2006 2:33 PM		Percy has replied

Replies to this message:
	Message 108 by Percy, posted 12-05-2006 2:58 PM		Barbarian has replied

Barbarian

Inactive Member

Normal Thread Display

Message 109 of 120 (368129)
12-07-2006 4:22 AM

Reply to: Message 108 by Percy
12-05-2006 2:58 PM

More like 65%, since camlList__length_aux (with almost 20%) is a library function I might be calling unnecessarily too often. But yes, memory management takes about 65% of the time, and that is unexpectedly high. I suspect that the unusually high List.length() usage and the memory management issue are related somehow; most likely, the program, being excessively functional, keeps recomputing some value unnecessarily and I haven't found the place yet where this happens. It may also be that the code is not doing what I intended it to do, hence the excessive use of such functions. The conclusion is that I have to find the reason for this anomaly because it could be a symptom of a logical error; otherwise, the performance would not bother me that much.

I think the choice of language is justified. For one, I know this language fairly well, so there is no overhead of library lookups etc. Then, I am sure these results are abnormal in some sense, so if I find the mistake, everything is well. Finally, the time I lose when running the program is more than compensated for what I would waste debugging a manual memory management scheme after every change in the code, and changes are bound to happen fairly often to this code.

As for comparison with Perl, I really cannot tell; I haven't used Perl for years now and consequently have little relevant experience. I do not think the OCaml native code compiler is so simple - to me it looks like a bona fide optimizing compiler, which only calls routines for memory management, otherwise operations are performed in generated code. Most of the OCaml library is written in OCaml anyway, and there are extremely few built-in functions.

This message is a reply to:
	Message 108 by Percy, posted 12-05-2006 2:58 PM		Percy has not replied

Barbarian

Inactive Member

Normal Thread Display

Message 110 of 120 (369239)
12-12-2006 6:50 AM

The tool is ready for production

Have found the source of the mysterious bug, fixed it and the tool seems to be able to run now for hours without hitting any errors. IOW, it is ready to use. I could add some cosmetic touches including even more comments to the source code, but all that is optional, really.

As a bonus, unnecessarily complex self-replicating proglets can now evolve into the minimal self-replicating one without throwing exceptions. To illustrate this, let me show a sample session (with the tool run in a DOS command window) aimed at evolving the shortest possible self-replicating proglet from a complex one:

D:\My Data\TheZone\Misc>evot
Evot launched
(evot) help
    The following are valid commands
    newworld SELECTOR FILENAME*
                           : create new world
    load FILENAME          : load world from file
    save FILENAME          : save world to file
    sels                   : list available selectors
    newpop SELECTOR        : add new population based on selector
    back                   : move active population pointer back
    forward                : move active population pointer forward
    cut                    : cut all populations before the current one
    drop                   : drop all populations above the current one
    do COUNT               : run COUNT generations' worth of evolution
    run                    : run evolution until interrupted
    try THRESHOLD          : run until last population reaches threshold
    pops                   : enumerates populations
    mems                   : lists all members
    mems FIRST LAST        : lists members between FIRST and LAST
    mems COUNT             : lists the topmost COUNT members
    list INDEX             : disassemble member identified by INDEX
    dbug INDEX             : launch identified member in debugger
    dump INDEX FILENAME    : save member code to file
    size SIZE              : set max size of current population
    globsize SIZE          : set max size of new populations
                    : print short status
    quit                   : leave program
    help or '?'            : this list
(evot) sels
    Selectors:
    ----------
    Null (max 1) = Accepts any self-reproducing program
    SegmentCount (max 1073741823) = Counts the segments of the program
    Stomp (max 36) = Selects for smaller max segment length
(evot) newworld Stomp "reprod2.gel"
    Current world: 0 cut and 1 active populations, 1 total
    Global max fitness = 9
    Current population is #1
    Selected by 'Stomp', size = 1, top fitness = 9
(evot) try 36
    Top fitness = 36, generation #367, target reached
    Current world: 0 cut and 1 active populations, 1 total
    Global max fitness = 36
    Current population is #1
    Selected by 'Stomp', size = 200, top fitness = 36
(evot)

(evot) is the prompt of the evo-tool. The first command asks for a listing of available commands, just to show off. The second one asks for a listing of installed selectors/fitness evaluators. The third one instructs the tool to set up a new world with one single population, having Stomp as a selector and the proglet whose source can be found in the file "reprod2.gel" as the only individual. The Stomp selector grants a fitness of zero to any proglet which fails to self-replicate, otherwise grants two points for every loss of an instruction from the longest segment, one point for every loss of a complete segment and applies a five-point penalty for proglets finishing by timeout.

The fourth command instructs the tool to run evolution until an individual with fitness =36 appears (36 is the theoretical maximum for self-replicating proglets under Stomp). It took about two minutes to produce the following:

prog
    *Main
    specinstr 0
    specinstr 1
end

which is indeed the shortest possible self-replicating entity.

For your reference, here is the source listing ("reprod2.gel") of the proglet we started with (please refer to my post earlier where I explained the instructions, and please ignore the comments between (* and *), which contain the assertions used to prove the correctness of the proglet and hence find the bugs in it):

prog
    #0 newseg                   (* [*FirstData] *)
    *Main                       (* [*Main,*FirstData]  *)
Cycle1:                         (* [*Addr,*FirstData] *)
    #0 dup                      (* [*Addr,*Addr,*FirstData] *)
    *CopyToStack call           (* [count,val,val,...,val,*Addr,*FirstData] *)
    *CopyToNewSeg call          (* [*NewSeg,*Addr,*FirstData] *)
    specinstr 0                 (* -- *)
    drop                        (* [*Addr,*Firstdata] *)
    nextseg                     (* [*Addr',*FirstData] *)
    #1 dup                      (* [*FirstData,*Addr',*FirstData] *)
    #1 dup                      (* [*Addr',*FirstData,*Addr',*FirstData] *)
    *Done1 ifequal              (* [*Addr',*FirstData] *)
    *Cycle1 jump                (* -- *)
Done1:
    specinstr 1
                                (* *)
seg CopyToStack                 (* [*Addr|...] *)
                                (* [*Addr,|...] *)
    #0 dup                      (* [*Addr,*Addr,|...] *)
    *Done2 invalidaddr          (* [*Addr,|...] *)
    #0 dup                      (* [*Addr,*Addr,|...] *)
    load                        (* [i,*Addr,|...] *)
    swap                        (* [*Addr,i,|...] *)
    incr                        (* [*Addr',i,|...] *)
    *CopyToStack jump           (* [*Addr',i,|...] *)
Done2:                          (* [*Addr,|...] *)
    val                         (* [count,|...] *)
    ret                         (* -- *)
                                (* *)
seg CopyToNewSeg                (* [count,|...] *)
    #1 neg add                  (* [count-1,|...] *)
    swap                        (* [fistval,count-1,|...] *)
    newseg                      (* [*Addr,count-1,|...] *)
Cycle2:                         (* [*Addr,ccnt,|...] *)
    #1 dup                      (* [ccnt,*Addr,ccnt,|...] *)
    #0                          (* [0,ccnt,*Addr,ccnt,|...] *)
    *Done3 ifequal              (* [*Addr,cnt,|...] *)
    #2 lift                     (* [firstval,*Addr,cnt,|...] *)
    #1 dup                      (* [*Addr,firstval,*Addr,cnt,|...] *)
    push                        (* [*Addr,cnt,|...] *)
    swap                        (* [cnt,*Addr,|...] *)
    #1 neg add                  (* [cnt-1,*Addr,|...] *)
    swap                        (* [*Addr,cnt-1,|...] *)
    *Cycle2 jump                (* -- *)
Done3:                          (* [*Addr,0|...] *)
    swap drop                   (* [*Addr|...] *)
    ret                         (* -- *)

end

I might add that in every generation the intermediate form was able to write a perfect replica of itself, and that I find it amazing that there exists a line of single mutations leading from the long quoted proglet to this small one all the while being able to self-replicate. Otherwise, the complete mutation history is preserved for every individual, so it can be inspected for the benefit of the doubters.

Now the project can enter the second phase: proper selectors have to be implemented in order to evolve encryption. We should also return to the issue of how to distribute the source if anyone is interested in playing with it.

Edited by Barbarian, : missing space, missing 'end' from the short proglet

Replies to this message:
	Message 111 by Jazzns, posted 12-12-2006 12:26 PM		Barbarian has not replied

Barbarian

Inactive Member

Normal Thread Display

Message 113 of 120 (369299)
12-12-2006 1:35 PM

Reply to: Message 112 by Jazzns
12-12-2006 12:29 PM

Re: Email Exchange

If that is enough, I do agree with Percy giving my e-mail address to you, although it would also be enough if I got yours, and sent you the packed sources.

Concerning the number of generations needed for the evolution of a minimal replicator: in the particular case I posted, it took 367 generations, but I have seen extremes like 210 and 1580 generations too - the framework re-seeds the random number generator at startup, so evolution is not repeatable (used to be, during debugging ...) and every run takes different paths and different generation counts. The range depends on the ur-proglet, as the first one I wrote only needed about 8-10 generations to get to the minimal case. But the one I posted above calls two subroutines in a cycle, the first subroutine copies a segment on the stack while the second one copies the stacked instructions to a newly allocated segment. In addition the subroutines are - in fact, must be - located in separate segments. From this we get a single-segment, no-subroutines program which then fairly quickly evolves into the minimal one; it takes a lot of time to do away with the subroutine mechanism, and the maximal fitness shows stagnation for the first 85-90% of the total time, after which it quickly goes up.

This message is a reply to:
	Message 112 by Jazzns, posted 12-12-2006 12:29 PM		Jazzns has replied

Replies to this message:
	Message 114 by Jazzns, posted 12-12-2006 1:46 PM		Barbarian has replied

Barbarian

Inactive Member

Normal Thread Display

Message 115 of 120 (369311)
12-12-2006 2:20 PM

Reply to: Message 114 by Jazzns
12-12-2006 1:46 PM

Picking random seeds

Done.

Invocation              Effect
--------------------------------------------------
evot                 => initializes seed from clock
evot -randseed 123   => seed set to 123
evot -randseed 0     => initializes seed from clock

This message is a reply to:
	Message 114 by Jazzns, posted 12-12-2006 1:46 PM		Jazzns has not replied

Barbarian

Inactive Member

Normal Thread Display

Message 117 of 120 (369542)
12-13-2006 2:02 PM

Reply to: Message 116 by Percy
12-12-2006 2:40 PM

I will be away from the internets from now till Friday afternoon (in CET = GMT + 1), so you don't have to hurry.

This message is a reply to:
	Message 116 by Percy, posted 12-12-2006 2:40 PM		Percy has not replied

Replies to this message:
	Message 118 by Barbarian, posted 12-19-2006 11:29 AM		Barbarian has not replied

Barbarian

Inactive Member

Normal Thread Display

Message 118 of 120 (370859)
12-19-2006 11:29 AM

Reply to: Message 117 by Barbarian
12-13-2006 2:02 PM

Is it possible that we are waiting for each other in a deadlock? For instance, did anyone send me an e-mail and holds now the vain hope that it got past the draconian spamfilter of my ISP?

This message is a reply to:
	Message 117 by Barbarian, posted 12-13-2006 2:02 PM		Barbarian has not replied

Replies to this message:
	Message 119 by Jazzns, posted 12-19-2006 11:41 AM		Barbarian has not replied

Date format: mm-dd-yyyy

Timezone: ET (US)

Newer Topic | Older Topic

Do Nothing Button

3 online now:	kjsimons, Minnemooseus (Adminnemooseus), PaulK
Newest Member:	ChatGPT
Post Volume:	Total: 916,916 Year: 4,173/9,624 Month: 1,044/974 Week: 3/368 Day: 3/11 Hour: 0/0