Author Topic:   DNA similarity between Chimpanzee and Human 70%
RAZD
Member (Idle past 1662 days)
Posts: 20714
From: the other end of the sidewalk
Joined: 03-14-2004


Message 16 of 32 (720014)
02-19-2014 4:35 PM
Reply to: Message 13 by Telesto
02-19-2014 4:03 PM


Re: One word: Gaps
Hi Telesto, and welcome to the fray.
First of all, there is no ONE result number. The algorithm compared about 650,000 sequences totalling about 400 million bases (human Y is 60 million bases long, chimp Y is about 20 million bases long), so the compared sequences overlapped many times. I also got mismatched bases. Each sequence had (in my case) a percentage identity, a sequence length and a number of mismatched bases. For example:
97.3% 4552 105
My guess is that the algorithm is similar to other matching algorithms (such as tree rings) ...
So they take one as the baseline and then compare the second one starting with matching both at one end and then shifting the second one along the first one base at a time, recording the degree of matching for each step.
The DNA likely has a lot of regions that were duplicated and then modified, so those would produce matches with lower percentages.
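To make that guess concrete, here is a minimal sketch of the "slide and score" idea. It is only an illustration of the description above, not what BLAST actually does (BLAST is a seed-and-extend heuristic, not an exhaustive slide), and the function names and the toy sequences are made up for the example.

```python
def sliding_identity(baseline, probe):
    """Slide probe along baseline one base at a time.

    Returns a list of (offset, percent identity) pairs, one per position.
    """
    results = []
    for offset in range(len(baseline) - len(probe) + 1):
        window = baseline[offset:offset + len(probe)]
        matches = sum(a == b for a, b in zip(window, probe))
        results.append((offset, 100.0 * matches / len(probe)))
    return results


def best_offset(baseline, probe):
    """Return the (offset, percent identity) pair with the highest identity."""
    return max(sliding_identity(baseline, probe), key=lambda r: r[1])


# Toy data, invented for this example.
baseline = "TTGACCGTAAGCTTAGGC"
probe = "CGTAAGC"
print(best_offset(baseline, probe))
```

Note that this kind of comparison never opens a gap, so a single inserted or deleted base ruins everything downstream of it.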
Enjoy
... as you are new here, some posting tips:
type [qs]quotes are easy[/qs] and it becomes:
quotes are easy
or type [quote]quotes are easy[/quote] and it becomes:
quote:
quotes are easy
also check out (help) links on any formatting questions when in the reply window.
For other formatting tips see Posting Tips
For a quick overview see EvC Forum Primer
If you have problems with replies see Report Discussion Problems Here 3.0

we are limited in our ability to understand
by our ability to understand
Rebel American Zen Deist
... to learn ... to think ... to live ... to laugh ...
to share.



This message is a reply to:
 Message 13 by Telesto, posted 02-19-2014 4:03 PM Telesto has replied

Replies to this message:
 Message 23 by Telesto, posted 02-19-2014 6:32 PM RAZD has seen this message but not replied

  
Taq
Member
Posts: 10302
Joined: 03-06-2009
Member Rating: 7.1


Message 17 of 32 (720017)
02-19-2014 4:52 PM
Reply to: Message 15 by Telesto
02-19-2014 4:27 PM


Re: More Please
The only difference between gapped and ungapped sequences would be in total length of longest sequences
That isn't true. Go back to my post in message #8. Using those sequences, the best hit for the gapped alignment would be 88% similarity. The best hit for the ungapped alignment would be 29%. This isn't because of different length sequences, or comparing different parts of the genome. This is comparing the same two sequences using different parameters.
The creationist article biases their methodology by excluding indels. There is no way around it. They do this in order to get a lower percentage for similarity. They use a different methodology that they know will falsely return a lower percentage, and is different than the methodology used in the other papers.
It's not as if the author re-sequenced the genomes from scratch and found out that the scientists had reported the wrong sequence. They are using deception to con people that aren't familiar with genetics.
Compare two identical chromosomes - expected result is 100%.
No, it isn't. Different chromosomes have diverged at different rates. There is no expectation that the similarities will be the same for a comparison of any two chromosomes.
But first of all I need to get 43% similarity for chromosome Y.
The Y chromosome has 50 million bases, or just 1.6% of the total genome. You do know this, right?
Let's put this another way. If I said that the average life expectancy was 85 years old, could I prove this wrong by pointing to a baby that died at 1 year old? If I said that the average life expectancy was 85, does this mean that everyone dies at 85, and at no other age?
Yes... rat and mouse are more different than human and chimp.
More importantly, chimp and gorilla are more different than chimp and human. Chimp and orangutan are more different than chimp and human. No species is closer to chimps than humans.

This message is a reply to:
 Message 15 by Telesto, posted 02-19-2014 4:27 PM Telesto has replied

Replies to this message:
 Message 18 by Dr Adequate, posted 02-19-2014 5:17 PM Taq has not replied
 Message 20 by Telesto, posted 02-19-2014 6:21 PM Taq has not replied

  
Dr Adequate
Member
Posts: 16113
Joined: 07-20-2006


Message 18 of 32 (720021)
02-19-2014 5:17 PM
Reply to: Message 17 by Taq
02-19-2014 4:52 PM


Re: More Please
No, it isn't. Different chromosomes have diverged at different rates.
He means identical. As a way of calibrating the method --- if he gives it two identical bits of data, it should give him 100% as an answer, or there's something wrong with it.

This message is a reply to:
 Message 17 by Taq, posted 02-19-2014 4:52 PM Taq has not replied

  
Dr Adequate
Member
Posts: 16113
Joined: 07-20-2006


Message 19 of 32 (720022)
02-19-2014 5:20 PM
Reply to: Message 15 by Telesto
02-19-2014 4:27 PM


Re: More Please
I am not sure if this algorithm was used in the particular research.
It wasn't, but they cite it approvingly and the code is there for you to use.

This message is a reply to:
 Message 15 by Telesto, posted 02-19-2014 4:27 PM Telesto has replied

Replies to this message:
 Message 21 by Telesto, posted 02-19-2014 6:25 PM Dr Adequate has not replied

  
Telesto
Junior Member (Idle past 3893 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


Message 20 of 32 (720025)
02-19-2014 6:21 PM
Reply to: Message 17 by Taq
02-19-2014 4:52 PM


Re: More Please
First of all, I think we are misunderstanding each other. That is probably due to my English; as you have noticed, I am not a native speaker.
That isn't true. Go back to my post in message #8. Using those sequences, the best hit for the gapped alignment would be 88% similarity. The best hit for the ungapped alignment would be 29%.
I completely understand. But I think that this is not the case for the blastn algorithm used in the research. I ran a similar experiment. I created two identical strings, 50 bases long. Then I deleted one base from the second one at position 25, so that the second string has only 49 bases and is shifted by one base from that point on.
I understand what you have told me about overall differences. But let's try blastn with the parameters -ungapped and -word_size 11. The results are below (numbers: percent identical, sequence length, mismatched bases):
1) For identical strings - 1 hit
100.00 50 0
2) Second string shortened in the middle - 2 hits
100.00 25 0
100.00 24 0
3) One base changed in the middle - 1 hit
98.00 50 1
These are the results from blastn. What now? What do they say?
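One way to read the three cases above is with a toy model of an ungapped search. The sketch below is not BLAST's actual algorithm: it simply looks along each diagonal for exact runs of at least word_size bases and merges the runs on one diagonal into a single hit, which is a crude stand-in for ungapped extension. The helper names and the generated test sequence are assumptions made for the illustration.

```python
import random


def make_sequence(n, seed=0):
    """Random bases with no two equal neighbours, so a one-base shift never matches by chance."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGT")]
    while len(seq) < n:
        seq.append(rng.choice([b for b in "ACGT" if b != seq[-1]]))
    return "".join(seq)


def ungapped_hits(query, subject, word_size=11):
    """Toy ungapped search: one (percent identity, length, mismatches) hit per diagonal."""
    hits = []
    for diag in range(-(len(query) - 1), len(subject)):
        # Query positions that overlap the subject on this diagonal.
        start = max(0, -diag)
        end = min(len(query), len(subject) - diag)
        seeds = []
        i = start
        while i < end:
            if query[i] != subject[i + diag]:
                i += 1
                continue
            j = i
            while j < end and query[j] == subject[j + diag]:
                j += 1
            if j - i >= word_size:  # keep only exact runs long enough to seed a hit
                seeds.append((i, j))
            i = j
        if seeds:
            # Merge everything from the first seed to the last seed into one hit.
            lo, hi = seeds[0][0], seeds[-1][1]
            length = hi - lo
            mismatches = sum(query[k] != subject[k + diag] for k in range(lo, hi))
            hits.append((round(100.0 * (length - mismatches) / length, 2), length, mismatches))
    return sorted(hits, key=lambda h: h[1], reverse=True)


original = make_sequence(50)
shortened = original[:24] + original[25:]           # delete the 25th base
swapped = "A" if original[24] != "A" else "C"
mutated = original[:24] + swapped + original[25:]   # change the 25th base

print(ungapped_hits(original, original)[0])    # identical strings
print(ungapped_hits(shortened, original)[:2])  # one base deleted
print(ungapped_hits(mutated, original)[0])     # one base changed
```

In this toy model a substitution stays on the same diagonal, so it shows up as one long hit with one mismatch. A deletion moves the rest of the sequence onto the neighbouring diagonal, so without gaps the alignment breaks into two separate perfect pieces, which is the same pattern as the blastn output above.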
No, it isn't. Different chromosomes have diverged at different rates. There is no expectation that the similarities will be the same for a comparison of any two chromosomes.
I was talking about exactly the same chromosomes (e.g. Human Y vs. Human Y).
The Y chromosome has 50 million bases, or just 1.6% of the total genome. You do know this, right?
Sure, I know. I chose this chromosome because of its length and because it had the smallest similarity in the research. I know this has little impact on the whole genome. But the paper also reported chromosome Y separately, and their result was 43%. I tried to reproduce that number.
More importantly, chimp and gorilla are more different than chimp and human. Chimp and orangutan are more different than chimp and human. No species is closer to chimps than humans.
I meant the difference between rat vs. mouse is larger than between chimp vs. human.

This message is a reply to:
 Message 17 by Taq, posted 02-19-2014 4:52 PM Taq has not replied

  
Telesto
Junior Member (Idle past 3893 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


Message 21 of 32 (720026)
02-19-2014 6:25 PM
Reply to: Message 19 by Dr Adequate
02-19-2014 5:20 PM


Re: More Please
It wasn't, but they cite it approvingly and the code is there for you to use.
You're right. They didn't use it. However, I tried these scripts and they do seem to calculate something (I hate Perl).
I tried to use them on some reference sequences but I failed. I am not sure what values I should set. Perl is quite difficult for me to read.

This message is a reply to:
 Message 19 by Dr Adequate, posted 02-19-2014 5:20 PM Dr Adequate has not replied

  
Taq
Member
Posts: 10302
Joined: 03-06-2009
Member Rating: 7.1


Message 22 of 32 (720028)
02-19-2014 6:28 PM
Reply to: Message 13 by Telesto
02-19-2014 4:03 PM


Re: One word: Gaps
I am not sure that the blastn algorithm computes the sequence as you described.
It does. If you leave out gaps you will have a much lower score than if gaps are included.
I didn't try this with gaps (indels) - I used the -ungapped parameter as they did. I think the number would be similar anyway.
Actually, sfs over at Christian Forums has already done some of the leg work. sfs also happens to be an author on the chimp genome paper, for what it is worth.
In message 56 he writes:
"I checked: the low percentage of matches does in fact result from only looking for ungapped alignments. I downloaded the human and chimpanzee genomes and the BLAST executable. As a test set, I pulled 500 randomly sampled, non-overlapping slices from chimpanzee chromosome 12, each 300 base pairs long. After dropping any slices that contained unknown sequence (i.e. 'N's), I had 471 test sequences. I fed these into BLASTN against human chromosome 12, using the parameters specified by Tomkins, with and without allowing gaps in the alignment. With no gaps, 68% of my queries yielded matches, in good agreement with Tomkins's finding. With gaps allowed, 100% of queries matched; of these, one or two were of poor quality and likely represent random matches. So the actual matching rate, when doing a proper alignment, was 99.6%."
It has already been confirmed that changing from ungapped to gapped makes a huge difference.
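For anyone who wants to repeat the sampling step sfs describes, here is a rough sketch. It is not sfs's code; picking slices from a fixed grid of 300-base windows is just one simple way to guarantee that they do not overlap, and the function name and defaults are assumptions.

```python
import random


def sample_slices(sequence, n_slices=500, slice_len=300, seed=0):
    """Pick random non-overlapping slices and drop any that contain unknown bases ('N').

    Returns a list of (start position, slice) pairs.
    """
    n_windows = len(sequence) // slice_len
    rng = random.Random(seed)
    # Sampling distinct window indices guarantees the slices cannot overlap.
    picks = rng.sample(range(n_windows), min(n_slices, n_windows))
    slices = []
    for window in sorted(picks):
        start = window * slice_len
        piece = sequence[start:start + slice_len]
        if "N" not in piece.upper():
            slices.append((start, piece))
    return slices


# Synthetic stand-in for a chromosome: clean sequence with a block of unknown bases.
toy_chromosome = "ACGT" * 75 * 600 + "N" * 300 * 100 + "ACGT" * 75 * 300
queries = sample_slices(toy_chromosome)
print(len(queries), "usable slices out of 500 sampled")
```

With a real chromosome the sequence would be read from a FASTA file, and each surviving slice would be written out as one query for BLASTN.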

This message is a reply to:
 Message 13 by Telesto, posted 02-19-2014 4:03 PM Telesto has replied

Replies to this message:
 Message 28 by Telesto, posted 02-20-2014 10:46 AM Taq has replied

  
Telesto
Junior Member (Idle past 3893 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


Message 23 of 32 (720029)
02-19-2014 6:32 PM
Reply to: Message 16 by RAZD
02-19-2014 4:35 PM


Re: One word: Gaps
Hi RAZD,
So they take one as the baseline and then compare the second one starting with matching both at one end and then shifting the second one along the first one base at a time, recording the degree of matching for each step.
Yes, I understand. Do you think it is possible to get overall genetic similarity with such a method (gapped or ungapped)? I think the BLAST algorithm was not created for this purpose. Anyway, I would like to reproduce the numbers from the research (even if they are wrong).
The bad thing is that I don't know what to do with all the numbers I got. What is the algorithm for getting one number that represents overall similarity? I always got thousands of numbers. How did they get 43% from these numbers? I have no idea...

This message is a reply to:
 Message 16 by RAZD, posted 02-19-2014 4:35 PM RAZD has seen this message but not replied

Replies to this message:
 Message 24 by Taq, posted 02-19-2014 7:01 PM Telesto has not replied

  
Taq
Member
Posts: 10302
Joined: 03-06-2009
Member Rating: 7.1


Message 24 of 32 (720030)
02-19-2014 7:01 PM
Reply to: Message 23 by Telesto
02-19-2014 6:32 PM


Re: One word: Gaps
Yes, I understand. Do you think it is possible to get overall genetic similarity with such a method (gapped or ungapped)? I think the BLAST algorithm was not created for this purpose. Anyway, I would like to reproduce the numbers from the research (even if they are wrong).
The bad thing is that I don't know what to do with all the numbers I got. What is the algorithm for getting one number that represents overall similarity? I always got thousands of numbers. How did they get 43% from these numbers? I have no idea...
sfs over at CF had a good analogy for gapped v. ungapped.
Let's say that you had 2 books, each with 1,000 pages. When you begin looking at the 2 books you realize that they are nearly identical. The only difference between the 2 books is that there is an extra space smack dab in the middle of one of the books. Every word, letter, and piece of punctuation is otherwise identical.
Now, would you say that these two books are nearly 100% identical? Tomkins would say no. He would say that the two books are only 50% identical. Why? Because he ignores the extra space which puts every letter one space off so that they no longer match up. That is how ridiculous Tomkins's comparison is.
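The book analogy is easy to check with a few lines of code. This is only a sketch: the "book" is a made-up 1,000-character string, chosen so that no two neighbouring characters are equal (real text would pick up a few chance matches after the shift), and the two scoring functions are illustrative rather than anything from BLAST.

```python
def ungapped_identity(a, b):
    """Percent of positions that match when the texts are compared position by position."""
    n = min(len(a), len(b))
    matches = sum(x == y for x, y in zip(a, b))
    return round(100.0 * matches / n, 2)


def one_gap_identity(a, b):
    """Percent identity when a single gap is allowed: matching prefix plus matching suffix."""
    shorter = min(len(a), len(b))
    prefix = 0
    while prefix < shorter and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < shorter - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return round(100.0 * (prefix + suffix) / max(len(a), len(b)), 2)


book_a = "abcdefghij" * 100                      # toy 1,000-character "book"
book_b = book_a[:500] + " " + book_a[500:]       # same book with one extra space in the middle

print(ungapped_identity(book_a, book_b))  # position-by-position comparison
print(one_gap_identity(book_a, book_b))   # comparison that allows the one gap
```

The first number comes out at 50% and the second at 99.9%, for the same pair of texts. The only thing that changed is whether the comparison is allowed to open a gap.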

This message is a reply to:
 Message 23 by Telesto, posted 02-19-2014 6:32 PM Telesto has not replied

  
saab93f
Member (Idle past 1651 days)
Posts: 265
From: Finland
Joined: 12-17-2009


(1)
Message 25 of 32 (720062)
02-20-2014 1:10 AM
Reply to: Message 8 by Taq
02-19-2014 1:26 PM


Re: One word: Gaps
The author of the creationist paper has rigged the methodology to ignore gaps, and therefore return a false result.
The projection in the rest of the paper is also worth discussing, but this is the one major issue that the paper has and so it should be discussed first.
I wonder how the creationists reconcile their utter and total lack of integrity with their preconception of moral superiority compared to "secular scientists"?
The scientific community should raise their voice a notch or three and really hammer this deceitful nature of creationism so that every layman can understand it.
Loathable folks them cretins...

This message is a reply to:
 Message 8 by Taq, posted 02-19-2014 1:26 PM Taq has not replied

Replies to this message:
 Message 27 by Pressie, posted 02-20-2014 3:17 AM saab93f has seen this message but not replied

  
Pressie
Member (Idle past 232 days)
Posts: 2103
From: Pretoria, SA
Joined: 06-18-2010


(1)
Message 26 of 32 (720068)
02-20-2014 3:01 AM


Course in genetics
Thanks guys for all the free education.
I'm about six months into my genetics course and I'm starting to understand what you are trying to say, even though I'm not near the level of even attempting a post on genetics here yet! So much to learn.
Edited by Pressie, : Spelling

  
Pressie
Member (Idle past 232 days)
Posts: 2103
From: Pretoria, SA
Joined: 06-18-2010


Message 27 of 32 (720069)
02-20-2014 3:17 AM
Reply to: Message 25 by saab93f
02-20-2014 1:10 AM


Re: One word: Gaps
quote:
The scientific community should raise their voice a notch or three and really hammer this deceitful nature of creationism so that every layman can understand it.
I actually agree with you.
However, I don't think that a lot of scientists are really interested in taking note or even contemplating commenting on the ramblings of crazy people. Those scientists who do that are spread very thin. Especially in countries where creationists are an endangered species.
Those scientists who do read creationist ramblings do it for the fun of it. It's like an early morning dose of comedy just to wake up laughing.

This message is a reply to:
 Message 25 by saab93f, posted 02-20-2014 1:10 AM saab93f has seen this message but not replied

  
Telesto
Junior Member (Idle past 3893 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


(1)
Message 28 of 32 (720097)
02-20-2014 10:46 AM
Reply to: Message 22 by Taq
02-19-2014 6:28 PM


Re: One word: Gaps
Hi Taq,
It does. If you leave out gaps you will have a much lower score than if gaps are included.
Well, I wrote a simple application that works as follows:
1) The reference (subject) sequence is the human chromosome.
2) It takes 500 subsequences from the chimp chromosome, each 300 bases long (as in your quoted comment).
3) blastn runs with these options (as used in the creationist research paper): -word_size 11 -evalue 10 -num_alignments 1 -dust no -soft_masking false
4) The "-ungapped" parameter is optional; I ran two experiments, one with this parameter and one without it.
5) And the calculation. I have no idea what I should calculate, but I made a few calculations:
a) First of all I check whether the chimp subsequence matched. I am not sure what should count as a MATCH. I assume a match means the whole subsequence was found, so a match is 300 bases long (or longer if I use gaps). If the matched subsequence was shorter, I counted it as "not match". For example:
Best sequence is 298 bases long with 5 mismatches - NOT match
Best sequence is 300 bases long with 2 mismatches - MATCH
In the end I calculated the percentage of matched sequences according to the above logic. I think this number has nothing to do with a whole-genome comparison. It just says how many 300-base (or longer) subsequences of the chimp chromosome were found in the human chromosome.
b) Then I tried to calculate a relevant similarity percentage. The first number was taken only from matched subsequences; subsequences shorter than 300 bases were completely ignored. From the hits for each subsequence I take the best match: the longest sequence with the lowest number of mismatches. Example:
300 - 5
298 - 1
300 - 2
the winner is 300 - 2
I summed all these bases and compared them with the summed mismatches.
I think this is not very useful. It ignores shorter sequences that were found. For example, if the best match in the result file is 289 - 2, it is ignored.
c) The next number also took shorter sequences into account, but padded them to full length: the missing bases were counted as mismatches. For example:
Best match from result file 289 - 2 was recalculated to 300 - 13
Not sure if this is right...
d) The next number was taken from the numbers exactly as they were in the result file. Example:
best match from result file 289 - 2 was not changed. In the end the total number of bases was compared with the total number of mismatches, exactly as reported. No changes...
e) The last number was also calculated from all queries in the experiment, both matched (300 or longer) and not matched (shorter). However, if a sequence was marked as not matched (shorter), it was counted as completely wrong. Example:
best match 289 - 3 was marked as not aligned and counted in the sum as 300 - 300 (300 bases long with 300 mismatches = 0% similarity)
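The five numbers a) to e) can be written down compactly. The sketch below is a reading of the description above, not the original application: it assumes every query has exactly one best hit given as (length, mismatches), and the function name, the dictionary keys and the four example hits are invented for illustration.

```python
QUERY_LEN = 300


def summarize(best_hits, query_len=QUERY_LEN):
    """Compute the five summary percentages a) to e) from one best hit per query.

    best_hits is a list of (hit length, mismatches) tuples.
    """
    def similarity(pairs):
        bases = sum(n for n, _ in pairs)
        mismatches = sum(m for _, m in pairs)
        return round(100.0 * (bases - mismatches) / bases, 2) if bases else 0.0

    # a)/b): a query counts as matched only if the hit covers the full query length.
    matched = [(n, m) for n, m in best_hits if n >= query_len]
    # c): pad short hits to full length, counting the missing bases as mismatches.
    padded = [(max(n, query_len), m + max(query_len - n, 0)) for n, m in best_hits]
    # e): short hits are counted as completely wrong (300 bases, 300 mismatches).
    penalized = [(n, m) if n >= query_len else (query_len, query_len) for n, m in best_hits]

    return {
        "a_matched_pct": round(100.0 * len(matched) / len(best_hits), 2),
        "b_matched_only": similarity(matched),
        "c_padded": similarity(padded),
        "d_as_reported": similarity(best_hits),
        "e_full_penalty": similarity(penalized),
    }


# Invented example: two full-length hits and two short ones.
example = [(300, 2), (289, 2), (300, 0), (298, 5)]
print(summarize(example))
```

Written this way, it becomes visible that when every matched hit is exactly 300 bases long (the ungapped case), e) is simply a) multiplied by b). For the ungapped chromosome Y figures in this post, 46.80% × 97.8% ≈ 45.77%. So e) is mostly a count of how many queries matched at full length, not a measure of how similar the aligned bases are.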
And here are results for chromosome Y:
1) Ungapped!
a) Matched vs. all: 234/500 => 46.80%
b) Only matched similarity: 97.8%
c) All results, calculated as full 300 bases long: 81.43%
d) All results as they were, no recalculation: 96.03%
e) All results, with 100% penalty: 45.77%
2) Gapped
a) Matched vs. all: 359/500 => 71.80%
b) Only matched similarity: 97.12%
c) All results, calculated as full 300 bases long: 91.14%
d) All results as they were, no recalculation: 95.76%
e) All results, with 100% penalty: 69.81%
So... what is right and what is wrong? The only thing I can see is that the 45.77% similarity figure is very close to the 43% reported in the research paper. Of course this number is nonsense, but that is another story.
I hope you can follow my "methodology". Or is there a better approach?
As you can see, gapped was better, but not by much. I think the most representative number is d), calculated from the results as they were, with no recalculation and no penalty. But there the ungapped result (96.03%) was better than the gapped one (95.76%), though both are very close.
I would like to run the same experiment for chromosome 1, but it will take much more time, as it is 250 MB in size (in contrast to 60 MB for human chromosome Y).
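For anyone repeating the experiment, steps 3) and 4) amount to two blastn command lines that differ only in the -ungapped switch. The sketch below just builds the argument lists and runs nothing. The -query and -subject options and the FASTA file names are assumptions added for the example; the other options are the ones listed in step 3).

```python
def blastn_command(query_fasta, subject_fasta, ungapped=True):
    """Return the argument list for one blastn run; nothing is executed here."""
    args = [
        "blastn",
        "-query", query_fasta,      # hypothetical file holding the 300-base chimp slices
        "-subject", subject_fasta,  # hypothetical file holding the human chromosome
        "-word_size", "11",
        "-evalue", "10",
        "-num_alignments", "1",
        "-dust", "no",
        "-soft_masking", "false",
    ]
    if ungapped:
        args.append("-ungapped")    # the only difference between the two experiments
    return args


ungapped_run = blastn_command("chimp_slices.fa", "human_chrY.fa", ungapped=True)
gapped_run = blastn_command("chimp_slices.fa", "human_chrY.fa", ungapped=False)
print(" ".join(ungapped_run))
print(" ".join(gapped_run))
```

If BLAST+ is installed, either list could be handed to subprocess.run(args, capture_output=True, text=True) to actually perform the search.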

This message is a reply to:
 Message 22 by Taq, posted 02-19-2014 6:28 PM Taq has replied

Replies to this message:
 Message 29 by Taq, posted 02-20-2014 11:04 AM Telesto has replied

  
Taq
Member
Posts: 10302
Joined: 03-06-2009
Member Rating: 7.1


Message 29 of 32 (720106)
02-20-2014 11:04 AM
Reply to: Message 28 by Telesto
02-20-2014 10:46 AM


Re: One word: Gaps
As you can see, gapped was better, but not by much.
Not much? You went from 47% to 72% for matches. I would call that a pretty massive jump, especially given that Tomkins is comparing a 70% match to 95% similarity.
As cited above, sfs has already run it and he is more familiar with BLAST batch runs than either of us are. He gets results very close to Tomkins for the ungapped alignments, and near 100% results for gapped. I would call that a real problem for Tomkins.

This message is a reply to:
 Message 28 by Telesto, posted 02-20-2014 10:46 AM Telesto has replied

Replies to this message:
 Message 30 by Telesto, posted 02-20-2014 11:20 AM Taq has replied

  
Telesto
Junior Member (Idle past 3893 days)
Posts: 10
From: Zlín
Joined: 02-03-2014


Message 30 of 32 (720111)
02-20-2014 11:20 AM
Reply to: Message 29 by Taq
02-20-2014 11:04 AM


Re: One word: Gaps
Not much? You went from 47% to 72% for matches. I would call that a pretty massive jump, especially given that Tomkins is comparing a 70% match to 95% similarity.
Well, for matching, yes. But I had hoped for almost 100%, going by sfs's results. I know that the human Y chromosome is the most divergent, though, so I will wait for the results from other chromosomes.
But I am really curious about these numbers. Do you really think that Tomkins compares the number of matches with similarity? Unbelievable... I hoped not, but from my preliminary results it really looks like he did.
I will try to contact sfs

This message is a reply to:
 Message 29 by Taq, posted 02-20-2014 11:04 AM Taq has replied

Replies to this message:
 Message 32 by Taq, posted 02-20-2014 11:31 AM Telesto has not replied

  