Topic: Information for MPW
Percy (Member, Posts: 22391, From: New Hampshire)
Hi, :æ:!
I'm afraid what you're saying still makes no sense to me. I'll just focus on a small part:
It could be argued, I suppose, that simply defining the message set reduces uncertainty and therefore supplies information, but if we were to define a message set of 5 elements out of the literally infinite number of possible elements that exist in the universe's message set prior to definition, our calculation would be:
When you say, "Simply defining the message set reduces the uncertainty," I think it reflects a fundamental misunderstanding of information theory. The message set is *always* predefined. What the sender wants to communicate to the receiver is not the message set, because they have pre-agreed on that, but individual messages from the message set. It isn't that the receiver doesn't know the message set, because he most certainly does! What he doesn't know, what is uncertain, is which message of the message set will be sent next.

You also need to explain why you keep taking the log base 2 of any number that strikes your fancy, in this case 5 over infinity.

--Percy
[Minor phrasing improvement. --Percy] [This message has been edited by Admin, 02-12-2004]
:æ: (Suspended Member, Posts: 423)
Percy writes:
You also need to explain why you keep taking the log base 2 of any number that strikes your fancy, in this case 5 over infinity.

Let's look at the general formula. I agree that the message set is always predetermined, and my example of trying to calculate the selection of 5 messages out of infinite possible messages I think distracted from the real point, which is that there is no meaningful calculation of information over an entire message set as a whole. Basically what I was trying to illustrate is that in defining a message set, you're selecting 5 messages out of an infinity of possible messages (because they could be literally anything), and that since the definition of the set is somewhat like a selection process, one might think that a meaningful calculation of information could be made over the set as a whole. So in my calculation, 5/∞ was supposed to be the value of p_i, the probability that 5 messages would be selected out of an infinity of possible messages.
Percy writes:
What you're communicating from sender to receiver is not the message set, but a single message. Does that help?

Actually, more than one message can be sent in a single transmission; however, uncertainty is not reduced as much as if only one message were sent. That's why you gain less information taking the negative log base 2 of 2/5 than you do taking the negative log base 2 of 1/5.

[This message has been edited by :æ:, 02-12-2004]
Saviourmachine (Member, Posts: 113, From: Holland)
Percy writes:
Keep in mind that the Creationist argument is that evolution (meaning in this case reproduction and mutation) cannot increase information. This is clearly wrong. If the allele set size for a gene in a population is 8, then I=3. If there's a mutation and the allele set size grows to 9, then I≈3.17. Creationist argument falsified.

I'm afraid it's not falsified in that way. You're simply stating that an extra allele will increase information. Is this a new allele? What are the influences on the cell? What kind of information are you speaking about?

Two examples:
1. this this
2. suppress this

Does encoding for the second 'this' add information? Does adding 'suppress' add information? And what about deleting 'suppress'?

For the rest I agree with Percy, not with :æ:.
Percy (Member, Posts: 22391, From: New Hampshire)
It is possible that you're trying to make one point while I'm making another, but most aspects of your position seem wrong to me:
Selecting an allele for a gene during reproduction is equivalent to sending one message. The number of alleles in the population is the size of the message set for that gene. As Shannon says in his paper right on page 1, "The significant aspect is that the actual message is one selected from a set of possible messages." It is significant that Shannon goes on to deny your post facto claim in the very next sentence when he says, "The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design." Common sense tells us the same thing.

In other words, the communication channel is designed to send one message from the message set. The communications channel must be of a size capable of sending one message at a time, and you can send consecutive messages in order to build more complex higher-level messages. An example: the ASCII character code. It contains 128 different characters and requires log2(128) = 7 bits of channel. A message consists of a single character. You can send consecutive messages (characters) in order to send higher-level messages of greater complexity.

The alleles for a gene of a population can be thought of in the same way. The message set size is equal to the number of different alleles in the population; call it a. The minimum number of bits of communications channel necessary to communicate a single message of message set size a is log2(a). That means each organism needs at least log2(a) bits of capacity to store a single allele, and we can think of the minimum bit capacity as a measure of the amount of information present in the entire population for that gene location. We can think of this gene location as a communication channel for transmitting (through the reproductive process) the allele to the next generation. The analogy with the ASCII character code breaks down after that point.
With heredity, more complex messages are sent through the addition of more genes with their own set of alleles, and not by sending more alleles of the same gene. But the principle should still be clear. --Percy
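The fixed-width channel idea above can be sketched in a few lines of Python (a rough sketch; the function name is mine, and the log is rounded up to whole bits since a real channel or record can't hold fractional bits):

```python
import math

def channel_bits(set_size):
    """Smallest whole number of bits able to hold any one symbol
    from a set of `set_size` equally treated possibilities."""
    return math.ceil(math.log2(set_size))

# A gene location must be wide enough for any allele in the population:
bits_for_9_alleles = channel_bits(9)       # 4 bits

# Consecutive messages build higher-level messages, as with character codes:
bits_for_3_genes = 3 * bits_for_9_alleles  # 12 bits
```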
Percy (Member, Posts: 22391, From: New Hampshire)
Read what I said again. I'm talking about population allele set size:
Percy writes:
Keep in mind that the Creationist argument is that evolution (meaning in this case reproduction and mutation) cannot increase information. This is clearly wrong. If the allele set size for a gene in a population is 8, then I=3. If there's a mutation and the allele set size grows to 9, then I≈3.17. Creationist argument falsified.

It doesn't matter how many individuals in the population have the "this" allele; it still only counts once toward allele set size.

--Percy
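The arithmetic in the quoted claim is easy to verify; a minimal sketch (the function name is mine, computing I as the log base 2 of the allele set size):

```python
import math

def gene_information(allele_set_size):
    """I = log2(a) for a gene with `a` distinct alleles in the population."""
    return math.log2(allele_set_size)

before_mutation = gene_information(8)   # exactly 3.0 bits
after_mutation = gene_information(9)    # about 3.17 bits

# A mutation that adds one allele to the set increases I.
print(before_mutation, after_mutation)
```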
Percy (Member, Posts: 22391, From: New Hampshire)
I was thinking about this some more, and I think I see now where I'm having the biggest problem with your approach. This is from your Message 12 (note that I've changed terminology from Shannon's "message" to your link's "symbol", which seems better suited):
:æ: writes:
Now say all of the members of the first set pass along their traits to one descendent and then die, except one member has two offspring and a new allele appears (the offspring are an a1 and an a5). Meaning now we could have 1 of the a5, 10 of a1, 10 of a2, 10 of a3, and 10 of a4. The probabilities shift so that the probability of a5 is 1/41, the probability of a1 is 10/41, a2 is 10/41 and so on. Pass them through the filter, and suppose that again only the a1's pass through. Now we have:

What you describe in qualitative terms, both here and elsewhere, about the probability of symbols in our symbol set is correct, but your math is nonsense. Check out Shannon's original paper, or reread your own link, especially the section Information, Entropy, and Uncertainty. You won't see anything like the games you're playing with your numerator. Determining the necessary channel width in bits for a symbol set of unequal probability is not a straightforward operation. It certainly isn't equal to the log base 2 of the probability of the particular symbol just sent. You may be doing this because you're confusing the information actually transmitted (e.g., alleles inherited by an offspring) with communication channel capacity. You are correct that the minimum number of bits is much smaller for a very likely allele, but it is *not* the same thing as the minimum channel width necessary to transmit any allele of the set. Concerning unequal probabilities of symbols, this portion from your link is applicable:
Shannon and R.M. Fano independently developed an efficient encoding method, known as the Shannon-Fano technique, in which codeword length increases with decreasing probability of source symbol or source word. The basic idea is that frequently used symbols (like the letter E) should be coded shorter than infrequently used symbols (like the letter Z) to make best use of the channel, unlike ASCII coding where both E and Z require the same seven bits.

Obviously you know this already, but note that heredity doesn't really work this way, at least not at the simple level at which we should initially approach it. Just because a particular allele overwhelmingly dominates the rest and is 99% likely has no impact on the code length necessary to represent that allele. The gene of an individual must be capable of representing the most unlikely allele in the population. The gene of an individual is a static record, not a symbol in a message stream. We cannot gain the efficiencies of special codings because that cannot be done across generations: the only way you could completely decode the allele of an organism would be to know the coding for the alleles of its ancestors back one or more generations. Coding is a science in itself. Many is the time that Shannon's equations have said that some symbol set with some probabilities can be represented in some incredibly small number of bits, but the equations don't tell you the encoding, only that it's possible, and there's the catch. The gene location doesn't change its size in each individual organism based on the likelihood of the inherited allele (in reality I wouldn't put it past nature to contain examples of doing just this, but we want to stick with a simple model for now).
So we can only conclude that the most reasonable measure of information in a population of organisms is the log base 2 of the number of alleles for each gene, independent of the probability of those alleles, because our channel capacity is fixed, at least for the scenarios we are considering at this time.

--Percy
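The contrast drawn in this post, between the entropy of an unequal-probability symbol set and the fixed channel width, can be made concrete with the allele frequencies from the quoted example (1/41 for a5 and 10/41 each for a1 through a4); a rough Python sketch:

```python
import math

def entropy(probs):
    """H = -sum(p * log2(p)): the theoretical minimum average bits
    per symbol achievable by an ideal variable-length code."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Allele frequencies from the five-allele population above:
freqs = [1 / 41] + [10 / 41] * 4

h = entropy(freqs)       # about 2.12 bits/symbol with ideal coding
fixed = math.log2(5)     # about 2.32 bits: the fixed "gene slot" width

# A static gene record cannot exploit allele frequencies,
# so the fixed width is what matters, and it is never below H.
print(h, fixed)
```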
:æ: (Suspended Member, Posts: 423)
Alright, cool... it looks like we're making some good progress toward understanding each other. I'm putting this response here to let you know that I've seen your most recent posts, although I haven't really gone over them in depth. I don't think I'll be able to really get back into this until early next week, but I do intend to continue.

Have a great weekend and take care,

:æ:
Saviourmachine (Member, Posts: 113, From: Holland)
Analogy: If we have 10 text documents containing the words on line 1 (gene 1):
Percy writes:
Keep in mind that the Creationist argument is that evolution (meaning in this case reproduction and mutation) cannot increase information. This is clearly wrong.

So, if you want to disprove this, you have to look at the kind of information involved in at least the analogy above, not some pretty simple thing like Shannon's definition. See here for a definition of information I suggested in the thread about complexity.
Percy (Member, Posts: 22391, From: New Hampshire)
Saviourmachine writes:
Then the number of alleles increases from 2 to 3. And 'information' is added. I doubt that; I doubt the use of Shannon's definition of information in this context.

That's your rebuttal - you doubt it?
Saviourmachine writes:
So, if you want to disprove this, you have to look at the kind of information for at least the analogy above. Not some pretty simple thing like Shannon's definition.

If you really think Shannon's definition is simple, or that it is insufficiently nuanced to represent genetic information, then I suggest you read his paper. Your approach in Message 21 of that other thread lacks rigour and has a confused presentation, and I'm not convinced that time and energy invested in its decipherment would be repaid.

--Percy
Saviourmachine (Member, Posts: 113, From: Holland)
Percy writes:
If you really think Shannon's definition is simple, or that it is insufficiently nuanced to represent genetic information, then I suggest you read his paper.

Now you're talking about genetic information. Doesn't it matter how the genes are decoded? In your claim you were talking about information in general. I know something about telecommunication, compression techniques, and so on, of which Shannon's information type is the very basis. I doubt its use in the context of information contained by a biological system. For example, the informative value of this post isn't easy to calculate. Do you want to count the different words I use? Or the number of letters? So, what the nucleotides encode does matter, what the alleles encode for does matter, the function of the decoded proteins does matter.

If it lacks rigour, that's because I first want to define the terms in a comprehensive way. You both are calculating things without thinking about the context. I look at your maths the same way you did towards :æ:. I'm going to read Shannon's paper anyway, because original papers are most often easy (and even nice) to read. He should have said something about the range of his type of information.
Percy (Member, Posts: 22391, From: New Hampshire)
Saviourmachine writes:
Now you're talking about genetic information. Doesn't it matter how the genes are decoded? In your claim you were talking about information in general.

I said that each allele of a gene in a population of organisms is a symbol of the symbol set, and the symbol set size a is equal to the number of unique alleles for that gene in the population. The information measure for that gene is I = log2(a). That seems fairly specific.

--Percy
Percy (Member, Posts: 22391, From: New Hampshire)
Hi, :æ:!
Saviourmachine's messages prompted another thought. Shannon information isn't really a measure of information, but only a measure of the minimum bits required to represent or transmit members of a symbol set. You tend toward examples where the set members have unequal probabilities, and this can yield very small values for I relative to symbol set size. But allele frequency in a population changes over time, while the genetic code for the alleles of a gene is fixed, so we must pessimistically assume equal probabilities for all set members, which always yields the largest values for I.

We must also keep in mind that the Shannon equations yield a minimum possible value. Genetic codes are nowhere near so dense, though some of the redundancy probably contributes to error tolerance.

--Percy
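The pessimistic-assumption point can be illustrated numerically; a small sketch (the 96%-dominant frequencies are made up purely for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 5
uniform = entropy([1 / n] * n)                      # log2(5), about 2.32 bits
skewed = entropy([0.96, 0.01, 0.01, 0.01, 0.01])    # about 0.32 bits

# Equal probabilities maximize H, so assuming them yields the
# largest (safest) value of I for a fixed-size genetic record.
print(uniform, skewed)
```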
Copyright 2001-2023 by EvC Forum, All Rights Reserved