  
Author Topic:   Gitt Information from Evolution FairyTale
Admin
Director
Posts: 12995
From: EvC Forum
Joined: 06-14-2002
Member Rating: 2.3


Message 1 of 7 (468181)
05-28-2008 8:00 AM


A discussion thread about Gitt information in the Free Scientists Lecture !, Dr. Werner Gitt Genetic Information Specialist thread at the Evolution FairyTale's Discussion Board was temporarily closed by a moderator for being off-topic, and I'm inviting the participants to pick up discussion here if they're so inclined.

--Percy
EvC Forum Director

  
Percy
Member
Posts: 22388
From: New Hampshire
Joined: 12-23-2000
Member Rating: 5.2


Message 2 of 7 (468555)
05-30-2008 8:30 AM


Well, it doesn't look like any of the thread's participants are interested in resuming here, so I'll describe the last issue being discussed in case anyone here wishes to comment.
Deadlock posed this problem:
deadlock writes:
Please, show me what is the probability of 3 mutations happen in 3 specific spots in a bacteria genome of 10^7 positions in 10^6 tries.
And when I gave the "wrong answer" he responded with this solution:
deadlock writes:
It’s a Bernoulli Distribution.
P(Y) = Cn,y * p^y*q^n-y
y = number of success
n = number of tries
p -> We have three possible bases to change in each position, so the probability of mutating a specific base in a specific point is : 1/3^(10^7)
q -> 1 - p
so P( Y >= 3 ) = 1 - ( P(Y=0) + P(Y=1) + P(Y=2) )
Where Deadlock writes "Cn,y" he means combination, i.e., "n things taken y at a time".
As part of his solution Deadlock claims that the probability of a mutation at a specific point in the genome is (1/3)^(10^7) (nice to be back in a land where HTML in messages is legal). When the thread was closed I was attempting to explain to Deadlock why this was incorrect, that this value is many thousands of orders of magnitude smaller than a googol-th and is ridiculous. But he stood by it, then the thread was closed.
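For a sense of scale, the size of that value can be checked with logarithms, since the number itself underflows ordinary floating point. A minimal Python sketch (the variable names are illustrative, not from the original thread):

```python
import math

genome_positions = 10**7  # genome size from Deadlock's problem statement

# log10 of (1/3)^(10^7); the value itself is far too small for a float
log10_claimed = -genome_positions * math.log10(3)

googol_th_log10 = -100  # a "googol-th" is 10^-100

print(f"(1/3)^(10^7) is about 10^{log10_claimed:.1f}")
print(f"orders of magnitude below a googol-th: {googol_th_log10 - log10_claimed:.0f}")
```

The exponent comes out in the millions, which is why the comparison with a googol-th is, if anything, generous.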
Anyone have any idea where Deadlock might have gotten the idea that this was the correct probability? What's the correct value, and how would you persuade Deadlock that it is the correct value?
--Percy
Edited by Percy: Grammar.

Replies to this message:
 Message 3 by PaulK, posted 05-30-2008 2:39 PM Percy has replied

  
PaulK
Member
Posts: 17822
Joined: 01-10-2003
Member Rating: 2.3


Message 3 of 7 (468579)
05-30-2008 2:39 PM
Reply to: Message 2 by Percy
05-30-2008 8:30 AM


To put the obvious first it's so clear that he meant 1/(3 * 10^7) that I'd have asked him straight out if it was a typo. If he insisted then I'd point out that 3 base pairs per codon for 10^7 codons is obviously 3*10^7 base pairs. If that didn't work I'd ask him to explain himself to find out just what absurdity he had in mind.
Looking at the thread:
quote:
It’s a Bernoulli Distribution.
P(Y) = Cn,y * p^y*q^n-y
y = number of success
n = number of tries
That works if the probabilities p and q are constant. Because he insists on mutations in 3 presumably different bases it isn't that simple. It can be used as an estimate, but there is a corrective factor that needs to be applied.
quote:
p -> We have three possible bases to change in each position, so the probability of mutating a specific base in a specific point is : 1/3^(10^7)
With the correction noted above this would be a reasonable probability if a "try" was a mutation. (i.e. if a mutation occurs, assuming equiprobability, the chance of it occurring at a particular location is 1 divided by the number of locations). Unfortunately he states:
A try is a reproduction event. The only moment when a mutation can happen and be passed on
In which case he should forget about the genome size and just use the per-base probability of mutation (a simpler calculation, since there is no need to invoke combinatorials).
Assuming that a "try" is a mutation:
quote:
q -> 1 - p
so P( Y >= 3 ) = 1 - ( P(Y=0) + P(Y=1) + P(Y=2) )
The probability p is the probability of getting ONE specific mutation. But we don't want one specific mutation three times. We want 1 of three mutations, then one of the remaining two, then the last one (the order doesn't matter). Since there are 6 ways that this could happen the real probability is about 6 times higher.
I estimate it as about 4*10^-5 (if I did the calculation right). A simpler estimate (multiply the probability of one occurring by 10^6 and cube the result) comes out at about the same. The real probability ought to be a bit lower, but not greatly so.
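Both estimates can be reproduced roughly as follows. This is a sketch under the assumptions stated above: a "try" is one mutation, the per-try probability is 1/(3 * 10^7), and the factor of 6 is applied to the binomial tail. The helper name binom_pmf is made up for the example.

```python
import math

n = 10**6            # number of tries (each try taken to be one mutation)
sites = 3 * 10**7    # number of locations, per the 1/(3 * 10^7) reading
p = 1 / sites        # chance a given mutation lands on one specific location


def binom_pmf(k: int) -> float:
    """Probability of exactly k successes in n tries with success chance p."""
    return math.comb(n, k) * p**k * math.exp((n - k) * math.log1p(-p))


# P(Y >= 3) from the quoted formula
tail = 1 - (binom_pmf(0) + binom_pmf(1) + binom_pmf(2))

# Correction: 3 * 2 * 1 = 6 orderings of the three distinct target locations
corrected = 6 * tail

# Simpler estimate: expected hits per location, cubed
simple = (n * p) ** 3

print(f"corrected binomial estimate: {corrected:.2e}")
print(f"simple estimate:             {simple:.2e}")
```

Both figures land a little under 4*10^-5, so the two routes agree to within the precision being claimed.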

This message is a reply to:
 Message 2 by Percy, posted 05-30-2008 8:30 AM Percy has replied

Replies to this message:
 Message 4 by Percy, posted 05-30-2008 4:49 PM PaulK has replied

  
Percy
Member
Posts: 22388
From: New Hampshire
Joined: 12-23-2000
Member Rating: 5.2


Message 4 of 7 (468596)
05-30-2008 4:49 PM
Reply to: Message 3 by PaulK
05-30-2008 2:39 PM


PaulK writes:
To put the obvious first it's so clear that he meant 1/(3 * 10^7) that I'd have asked him straight out if it was a typo. If he insisted then I'd point out that 3 base pairs per codon for 10^7 codons is obviously 3*10^7 base pairs. If that didn't work I'd ask him to explain himself to find out just what absurdity he had in mind.
I think he may really have meant (1/3)^(10^7), because 10^7 is the number of base pairs in his genome. I would argue that he was thinking, "There are three wrong bases for any position, and the probability of getting one of those three wrong bases is 1/3. So the probability of the first base pair in the genome being wrong is 1/3, and the probability of the second base pair being wrong is 1/3, and so forth, so multiply 1/3 by itself 10^7 times."
Which is, of course, completely bogus.
What he really wanted to do if he was determined to use that style of approach was properly figure out the probability of an error for a base pair, and 10^-8 seemed to be an acceptable figure to him. This means that the probability of all the base pairs being right would be:
(1 - 10^-8)^(10^7)
This happens to be around .9, which means there's about a .1 probability of one or more mutations.
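A quick numerical check of that figure, as a sketch assuming the 10^-8 per-base error probability and the 10^7 base-pair genome used in this thread (the variable names are illustrative):

```python
import math

per_base_error = 1e-8   # per-base mutation probability used in the thread
genome_size = 10**7     # base pairs in the genome

# (1 - 10^-8)^(10^7), computed via log1p to limit rounding error
log_no_mutation = genome_size * math.log1p(-per_base_error)
p_no_mutation = math.exp(log_no_mutation)

# Complement: one or more mutations somewhere in the genome
p_at_least_one = -math.expm1(log_no_mutation)

print(f"no mutations:     {p_no_mutation:.4f}")
print(f"one or more:      {p_at_least_one:.4f}")
```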
This is so similar to his (1/3)^(10^7) that it makes it seem likely that he was trying to use this particular approach, but what he did was both wrong (the 1/3) and too simplistic (this has to be calculated, it isn't something simple that you write down off the top of your head).
So the probability of three mutations at specific predesignated positions in a single reproductive event is:
(1 - 10^-8)^(10^7 - 3) * (10^-8)^3 = 9.08*10^-25
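The same expression can be evaluated directly. This is only a sketch: it assumes independent per-base errors at 10^-8 and treats "three specific positions" as those three mutating while every other base stays unchanged.

```python
import math

per_base_error = 1e-8   # per-base mutation probability used in the thread
genome_size = 10**7     # base pairs in the genome
targets = 3             # predesignated positions that must mutate

# All non-target bases copied correctly
p_others_unchanged = math.exp((genome_size - targets) * math.log1p(-per_base_error))

# All three target bases mutate in the same reproductive event
p_three_specific = p_others_unchanged * per_base_error**targets

print(f"{p_three_specific:.3e}")
```

In double precision this comes out near 9.05*10^-25, so any difference from the figure above is in the third significant digit and does not affect the argument.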
I'm on the edge of my competency here, so let me know if I've gone wrong. Anyway, it is indeed a very small number. As I told Deadlock, predesignating the positions isn't the way evolution operates. Natural selection doesn't sit around waiting for beneficial mutations, it just works on what it has. And if you apply 10^6 Bernoulli trials like this you still get a very small number:
C(10^6, 1) * 9.08*10^-25 * (1 - 9.08*10^-25)^(10^6 - 1) = 9.08*10^-20
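The "exactly one success in 10^6 trials" term can be checked numerically. A minimal sketch, taking the single-event probability of 9.08*10^-25 as given and using log1p because 1 - q is indistinguishable from 1 in ordinary floating point:

```python
import math

n_events = 10**6   # number of reproductive events (Bernoulli trials)
q = 9.08e-25       # single-event probability quoted above

# C(n, 1) = n, so P(exactly one success) = n * q * (1 - q)^(n - 1)
p_exactly_once = n_events * q * math.exp((n_events - 1) * math.log1p(-q))

print(f"{p_exactly_once:.3e}")
```

The (1 - q)^(n - 1) factor is so close to 1 that the result is essentially n * q, which puts the exponent at -19 rather than -20.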
This is unsurprising because of the unlikelihood of three mutations in predesignated locations in a single reproductive event. As I explained to Deadlock several times, this isn't the way evolution works because of multiple offspring and generations, and because of the post facto fallacy (is that the proper name for this fallacy, calculating the odds of what really happened, like winning a lottery, when all the outcomes were equally unlikely but one of them has to happen?).
Anyway, that's my attempt at things, not the same answer you got, but I had a different interpretation of this problem. What do you think?
--Percy

This message is a reply to:
 Message 3 by PaulK, posted 05-30-2008 2:39 PM PaulK has replied

Replies to this message:
 Message 5 by PaulK, posted 05-30-2008 5:30 PM Percy has replied

  
PaulK
Member
Posts: 17822
Joined: 01-10-2003
Member Rating: 2.3


Message 5 of 7 (468599)
05-30-2008 5:30 PM
Reply to: Message 4 by Percy
05-30-2008 4:49 PM


If you use the probability of a mutation at a single base in a single reproductive event you can do the calculation relatively easily.
If the probability is p, to get mutations at three specific points in n reproductive events would be (1 - (1-p)^n)^3
If you assume that a specific replacement is required (not necessarily true, because of the redundancy in the genetic code) you get p by multiplying the chance of getting any mutation by the chance of getting the right replacement (you can assume 1/3 to simplify but some substitutions are more likely than others).
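As a rough illustration of that formula, the sketch below plugs in the 10^-8 per-base figure used earlier in the thread and the simplifying 1/3 factor for a specific replacement. Both numbers are assumptions for the example, as is the function name p_all_three; the sites are treated as independent.

```python
import math


def p_all_three(p: float, n: int) -> float:
    """(1 - (1 - p)^n)^3: each of three sites mutated at least once in n events."""
    p_site_hit = -math.expm1(n * math.log1p(-p))
    return p_site_hit**3


p_any = 1e-8             # assumed chance of any mutation at one base per event
p_specific = p_any / 3   # simplification: each replacement base equally likely
n = 10**6                # reproductive events

print(f"any replacement:      {p_all_three(p_any, n):.2e}")
print(f"specific replacement: {p_all_three(p_specific, n):.2e}")
```

Requiring a specific replacement at each site cuts the result by roughly a factor of 27, since the per-site probability is cubed.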
The probability of getting all three in a single reproductive event is obviously going to be p^3. So if the probability of a base pair mutating is 10^-8 the probability of all three mutating is 10^-24 - close to your estimate, but a much simpler calculation !
Your second calculation is also not the obvious approach. If you want to calculate the chance of getting all three in a single attempt in 10^6 reproduction events (itself an odd thing to do) you'd normally just calculate the probability of it not happening and subtract from 1, i.e. 1 - (1 - 10^-24)^(10^6) (which means you need 25 places of precision just to do the subtraction in the brackets!)
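The precision problem is easy to demonstrate. In the sketch below, the naive expression collapses to zero in double-precision floating point, while a log1p/expm1 rewrite of the same formula keeps the small value. The rewrite is one common workaround, not necessarily how anyone in the thread did the calculation.

```python
import math

q = 1e-24    # probability of all three mutations in one reproductive event
n = 10**6    # reproductive events

# Naive form: 1 - q rounds to exactly 1.0, so the whole expression becomes 0.0
naive = 1 - (1 - q) ** n

# Stable form of 1 - (1 - q)^n using log1p and expm1
stable = -math.expm1(n * math.log1p(-q))

print(f"naive:  {naive}")
print(f"stable: {stable:.3e}")
```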

This message is a reply to:
 Message 4 by Percy, posted 05-30-2008 4:49 PM Percy has replied

Replies to this message:
 Message 6 by Percy, posted 05-30-2008 5:42 PM PaulK has replied

  
Percy
Member
Posts: 22388
From: New Hampshire
Joined: 12-23-2000
Member Rating: 5.2


Message 6 of 7 (468600)
05-30-2008 5:42 PM
Reply to: Message 5 by PaulK
05-30-2008 5:30 PM


PaulK writes:
Your second calculation is also not the obvious approach. If you want to calculate the chance of getting all three in a single attempt in 10^6 reproduction events (itself an odd thing to do) you'd normally just calculate the probability of it not happening and subtract from 1, i.e. 1 - (1 - 10^-24)^(10^6) (which means you need 25 places of precision just to do the subtraction in the brackets!)
Yep, that's the same answer I get, 9.08*10^-19 (I counted decimal places wrong, the exponent of 10 in my previous message should have been -19, not -20).
I was trying to use the approach Deadlock insisted on using for Bernoulli trials, so it sounds like you're saying I did it properly? I'm amazed!
I used to be more intuitive with probabilities. I was able to follow the approach you used pretty easily, but I'm not sure I would have got there on my own. I use that skill very rarely these days and it is seriously rusty. Thanks for the help!
--Percy

This message is a reply to:
 Message 5 by PaulK, posted 05-30-2008 5:30 PM PaulK has replied

Replies to this message:
 Message 7 by PaulK, posted 05-30-2008 5:57 PM Percy has not replied

  
PaulK
Member
Posts: 17822
Joined: 01-10-2003
Member Rating: 2.3


Message 7 of 7 (468604)
05-30-2008 5:57 PM
Reply to: Message 6 by Percy
05-30-2008 5:42 PM


quote:
I was trying to use the approach Deadlock insisted on using for Bernoulli trials, so it sounds like you're saying I did it properly? I'm amazed!
Strictly speaking it's not exactly the same as the simpler calculation I used. It's the probability of it happening exactly once as opposed to the probability of it happening at least once. Not that it makes a lot of difference when the expected number of successes is << 1, and it's not clear that either one is more accurate anyway.
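How little difference it makes can be seen with arbitrary-precision arithmetic. A sketch using Python's decimal module, with the 10^-24 and 10^6 figures from this exchange; the 80-digit precision is an arbitrary choice that is comfortably more than needed.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # enough digits to resolve differences near 10^-37

q = Decimal("1e-24")    # probability of success in one reproductive event
n = 10**6               # reproductive events

# Exactly one success: C(n, 1) * q * (1 - q)^(n - 1)
p_exactly_once = n * q * (1 - q) ** (n - 1)

# At least one success: 1 - (1 - q)^n
p_at_least_once = 1 - (1 - q) ** n

relative_gap = (p_at_least_once - p_exactly_once) / p_at_least_once

print(f"exactly once:  {p_exactly_once:.6E}")
print(f"at least once: {p_at_least_once:.6E}")
print(f"relative gap:  {relative_gap:.2E}")
```

The two values agree to well beyond any precision that matters here, with "at least once" the larger of the two by a tiny margin.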
My own probability is pretty rusty although I occasionally do the odd exercise (and it was one of my better areas, too).
If you learn just one thing about probability it's that you have to be really, really careful about what you're doing and try to understand exactly what your calculations mean. Too many people just put down something that makes intuitive sense and get it badly wrong.
(If I can find the link I can point to a calculation that Lee Spetner, who really ought to have known better, badly bodged.)

This message is a reply to:
 Message 6 by Percy, posted 05-30-2008 5:42 PM Percy has not replied

  

Copyright 2001-2023 by EvC Forum, All Rights Reserved
