Topic: DNA similarity between Chimpanzee and Human 70%
RAZD Member (Idle past 1662 days) Posts: 20714 From: the other end of the sidewalk Joined: |
Hi Telesto, and welcome to the fray.
quote: First of all, there is no ONE result number. The algorithm compared about 650,000 sequences with about 400 million bases in total (human Y is 60 million bases long, chimp Y is about 20 million bases long), so the compared sequences overlapped many times. I also got mismatched bases. Each sequence had (in my case) a percentage identity, a sequence length, and a number of mismatched bases. For example: 97.3% 4552 105

My guess is that the algorithm is similar to other matching algorithms (such as those used for tree rings). They take one sequence as the baseline and compare the second one against it, starting with both aligned at one end and then shifting the second one along the first one base at a time, recording the degree of matching at each step. The DNA likely has a lot of regions that were duplicated and then modified, so those would produce matches with lower percentages. Enjoy
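The sliding comparison guessed at above can be sketched in a few lines of Python. This is only an illustration of the brute-force idea (hold one sequence fixed, slide the other along it one base at a time, score each offset). It is not how blastn is implemented, which seeds on short exact words and extends from them. The function name `sliding_identity` and the toy sequences are made up for the example.

```python
def sliding_identity(baseline: str, query: str):
    """Slide `query` along `baseline` one base at a time, with no gaps,
    and return (offset, percent_identity) for every full overlap."""
    results = []
    for offset in range(len(baseline) - len(query) + 1):
        window = baseline[offset:offset + len(query)]
        matches = sum(a == b for a, b in zip(window, query))
        results.append((offset, 100.0 * matches / len(query)))
    return results


# Toy data (hypothetical): the query occurs exactly once in the baseline.
baseline = "ACGTTGCAAGGCTA"
query = "TGCAAG"

scores = sliding_identity(baseline, query)
best_offset, best_identity = max(scores, key=lambda s: s[1])
print(best_offset, best_identity)
```

Every offset gets its own percentage, which is one reason such a comparison yields many numbers rather than a single result.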
... as you are new here, some posting tips: type [qs]quotes are easy[/qs] and it becomes:
quotes are easy
or type [quote]quotes are easy[/quote] and it becomes:
quote: quotes are easy
Also check out the (help) links on any formatting questions when in the reply window.
For other formatting tips see Posting Tips. For a quick overview see EvC Forum Primer. If you have problems with replies see Report Discussion Problems Here 3.0.
Rebel American Zen Deist ... to learn ... to think ... to live ... to laugh ... to share.
|
|||||||||||||||||||||||||||||||||||||||
Taq Member Posts: 10302 Joined: Member Rating: 7.1 |
quote: The only difference between gapped and ungapped sequences would be in total length of longest sequences

That isn't true. Go back to my post in message #8. Using those sequences, the best hit for the gapped alignment would be 88% similarity. The best hit for the ungapped alignment would be 29%. This isn't because of different length sequences, or comparing different parts of the genome. This is comparing the same two sequences using different parameters. The creationist article biases its methodology by excluding indels. There is no way around it. The authors do this in order to get a lower percentage for similarity. They use a methodology that they know will falsely return a lower percentage, and that differs from the methodology used in the other papers. It's not as if the author re-sequenced the genomes from scratch and found out that the scientists had reported the wrong sequence. They are using deception to con people who aren't familiar with genetics.

quote: Compare two identical chromosomes - expected result is 100%.

No, it isn't. Different chromosomes have diverged at different rates. There is no expectation that the similarities will be the same for a comparison of any two chromosomes.

quote: But first of all I need to get 43% similarity for chromosome Y.

The Y chromosome has 50 million bases, or just 1.6% of the total genome. You do know this, right? Let's put this another way. If I said that the average life expectancy was 85 years old, could I prove this wrong by pointing to a baby that died at 1 year old? If I said that the average life expectancy was 85, does this mean that everyone dies at 85, and at no other age?
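The point about averages can be made concrete with a length-weighted mean. The sketch below uses the 1.6% share and the 43% Y-chromosome figure mentioned in this thread. The 95% value for the rest of the genome is a hypothetical placeholder chosen only to show the arithmetic, not a measured result, and `weighted_similarity` is an illustrative name.

```python
def weighted_similarity(parts):
    """parts: list of (share_of_genome, percent_similarity) pairs.
    Returns the average similarity weighted by each part's share."""
    total_share = sum(share for share, _ in parts)
    return sum(share * pct for share, pct in parts) / total_share


y_share = 0.016   # Y chromosome as a fraction of the genome (from the post)
y_pct = 43.0      # figure reported for chromosome Y in the paper under discussion
rest_pct = 95.0   # HYPOTHETICAL value for everything else, for illustration only

overall = weighted_similarity([(y_share, y_pct), (1 - y_share, rest_pct)])
print(round(overall, 3))
```

Under these assumed inputs, the low Y figure pulls the overall number down by less than one percentage point.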
quote: Yes... rat and mouse are more different than human and chimp.

More importantly, chimp and gorilla are more different than chimp and human. Chimp and orangutan are more different than chimp and human. No species is closer to chimps than humans.
|
|||||||||||||||||||||||||||||||||||||||
Dr Adequate Member Posts: 16113 Joined: |
quote: No, it isn't. Different chromosomes have diverged at different rates.

He means identical, as a way of calibrating the method: if he gives it two identical bits of data, it should give him 100% as an answer, or there's something wrong with it.
|
|||||||||||||||||||||||||||||||||||||||
Dr Adequate Member Posts: 16113 Joined: |
quote: I am not sure if this algorithm was used in the particular research.

It wasn't, but they cite it approvingly and the code is there for you to use.
|
|||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 3893 days) Posts: 10 From: Zlín Joined: |
First of all... I think we don't understand each other. It is probably caused by my English; as you have noticed, I am not a native speaker.

quote: That isn't true. Go back to my post in message #8. Using those sequences, the best hit for the gapped alignment would be 88% similarity. The best hit for the ungapped alignment would be 29%.

I completely understand. But I think this is not the case for the blastn algorithm used in the research. I ran a similar experiment. I created two identical strings, each 50 bases long. Then I deleted one base at position 25 in the second one, so that the second string has only 49 bases and is shifted by one base from that point on. I understand what you have told me about overall differences. But let's try blastn with the parameters -ungapped and -word_size 11. The results are below (numbers: percent identity, sequence length, mismatched bases):
1) Identical strings - 1 hit
100.00 50 0
2) Second string shortened in the middle - 2 hits
100.00 25 0
100.00 24 0
3) One base changed in the middle - 1 hit
98.00 50 1
These are the results from blastn. What now? What are they saying?
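The first two results can be mimicked without BLAST by looking for maximal gap-free runs of exact matches. The sketch below is a deliberately simplified stand-in, not the blastn algorithm: it only reports exact runs of at least `word_size` bases along each diagonal, whereas real ungapped BLAST extensions tolerate mismatches (which is why case 3 comes back as a single 50-base hit with one mismatch). The 50-base test sequence and the name `ungapped_hits` are assumptions made for the example; the sequence was chosen so that no 3-base word repeats and the deleted base differs from both neighbours.

```python
def ungapped_hits(subject: str, query: str, word_size: int = 11):
    """Return (length, subject_start, query_start) for every maximal run of
    exact matches of at least `word_size` bases, scanning each diagonal.
    Start positions are 0-based. No gaps are ever opened."""
    hits = []
    for diagonal in range(-(len(query) - 1), len(subject)):
        i = max(diagonal, 0)    # index into subject
        j = max(-diagonal, 0)   # index into query
        run = 0
        while i < len(subject) and j < len(query):
            if subject[i] == query[j]:
                run += 1
            else:
                if run >= word_size:
                    hits.append((run, i - run, j - run))
                run = 0
            i += 1
            j += 1
        if run >= word_size:
            hits.append((run, i - run, j - run))
    return sorted(hits, reverse=True)


# Hypothetical 50-base sequence with no repeated 3-base word.
original = "AACAAGAATACCACGACTAGCAGGAGTATCATGATTCCCGCCTCGGCGTC"
# Delete the base at position 25 (index 24), leaving 49 bases.
shortened = original[:24] + original[25:]

print(ungapped_hits(original, original))   # one run covering all 50 bases
print(ungapped_hits(original, shortened))  # two runs, one on each side of the deletion
```

Without gaps, a single deleted base splits one alignment into two shorter ones, each 100% identical on its own, which is the pattern in result 2.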
quote: No, it isn't. Different chromosomes have diverged at different rates. There is no expectation that the similarities will be the same for a comparison of any two chromosomes.

I was talking about exactly the same chromosome (e.g. human Y vs. human Y).

quote: The Y chromosome has 50 million bases, or just 1.6% of the total genome. You do know this, right?

Sure, I know. I chose this chromosome because of its length and because it had the lowest similarity in the research. I know it has little impact on the whole genome. But the paper also reports chromosome Y separately, and their result was 43%. I tried to reproduce that number too.

quote: More importantly, chimp and gorilla are more different than chimp and human. Chimp and orangutan are more different than chimp and human. No species is closer to chimps than humans.

I meant that the difference between rat and mouse is larger than the difference between chimp and human.
|
|||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 3893 days) Posts: 10 From: Zlín Joined: |
quote: It wasn't, but they cite it approvingly and the code is there for you to use.

You're right. They didn't use it. However, I tried these scripts and they do seem to calculate something (I hate Perl). I tried them on some reference sequences but failed; I am not sure what values I should set. Perl is quite difficult for me to read.
|
|||||||||||||||||||||||||||||||||||||||
Taq Member Posts: 10302 Joined: Member Rating: 7.1 |
quote: I am not sure that the blastn algorithm compute the sequence as you described.

It does. If you leave out gaps you will have a much lower score than if gaps are included.

quote: I didn't try this with gaps (indels) - I uses parameter -ungapped as they did. I think the number would be similar anyway.

Actually, sfs over at Christian Forums has already done some of the leg work. sfs also happens to be an author on the chimp genome paper, for what it is worth. In message 56 he writes:

"I checked: the low percentage of matches does in fact result from only looking for ungapped alignments. I downloaded the human and chimpanzee genomes and the BLAST executable. As a test set, I pulled 500 randomly sampled, non-overlapping slices from chimpanzee chromosome 12, each 300 base pairs long. After dropping any slices that contained unknown sequence (i.e. 'N's), I had 471 test sequences. I fed these into BLASTN against human chromosome 12, using the parameters specified by Tomkins, with and without allowing gaps in the alignment. With no gaps, 68% of my queries yielded matches, in good agreement with Tomkins's finding. With gaps allowed, 100% of queries matched; of these, one or two were of poor quality and likely represent random matches. So the actual matching rate, when doing a proper alignment, was 99.6%."
(Christian Forums)

It has already been confirmed that changing from ungapped to gapped makes a huge difference.
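The sampling step sfs describes (random, non-overlapping 300-base slices, then dropping any slice that contains 'N') could look roughly like the sketch below. How sfs actually drew the slices is not stated. Cutting the chromosome into a fixed grid of 300-base bins and picking bins at random is an assumption made here because it guarantees no overlap, and `sample_slices` is an illustrative name.

```python
import random


def sample_slices(chromosome: str, n_slices: int = 500,
                  slice_len: int = 300, seed: int = 0):
    """Pick `n_slices` non-overlapping slices of `slice_len` bases at random,
    then drop any slice containing unknown sequence ('N').
    Returns a list of (start, sequence) pairs sorted by start."""
    rng = random.Random(seed)
    n_bins = len(chromosome) // slice_len
    if n_slices > n_bins:
        raise ValueError("chromosome too short for that many slices")
    # Sampling distinct bins on a fixed grid ensures slices never overlap.
    starts = sorted(b * slice_len for b in rng.sample(range(n_bins), n_slices))
    slices = [(s, chromosome[s:s + slice_len]) for s in starts]
    return [(s, seq) for s, seq in slices if "N" not in seq.upper()]


# Toy chromosome (hypothetical): known sequence, an unknown stretch, known sequence.
toy = "ACGT" * 75 + "N" * 300 + "ACGT" * 75
kept = sample_slices(toy, n_slices=3, slice_len=300)
print([start for start, _ in kept])
```

In the toy run all three bins are drawn, and the middle one is discarded because it consists of 'N's, mirroring how 500 sampled slices shrank to 471 test sequences.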
|
|||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 3893 days) Posts: 10 From: Zlín Joined: |
Hi RAZD,
quote: So they take one as the baseline and then compare the second one starting with matching both at one end and then shifting the second one along the first one base at a time, recording the degree of matching for each step.

Yes, I understand. Do you think it is possible to get an overall genetic similarity with such a method (gapped or ungapped)? I think the BLAST algorithm was not created for this purpose. Anyway, I would like to reproduce the numbers from the research (even if they are wrong). The trouble is that I don't know what to do with all the numbers I got. What is the algorithm for getting one number that represents overall similarity? I always got thousands of numbers. How did they get 43% from these numbers? I have no idea...
|
|||||||||||||||||||||||||||||||||||||||
Taq Member Posts: 10302 Joined: Member Rating: 7.1 |
quote: Yes, I understand. Do you think it is possible to get an overall genetic similarity with such a method (gapped or ungapped)? I think the BLAST algorithm was not created for this purpose. Anyway, I would like to reproduce the numbers from the research (even if they are wrong). The trouble is that I don't know what to do with all the numbers I got. What is the algorithm for getting one number that represents overall similarity? I always got thousands of numbers. How did they get 43% from these numbers? I have no idea...

sfs over at CF had a good analogy for gapped v. ungapped. Let's say that you had 2 books, each with 1,000 pages. When you begin looking at the 2 books you realize that they are nearly identical. The only difference between the 2 books is that there is an extra space smack dab in the middle of one of the books. Every word, letter, and piece of punctuation is otherwise identical. Now, would you say that these two books are nearly 100% identical? Tomkins would say no. He would say that the two books are only 50% identical. Why? Because he ignores the extra space, which puts every letter after it one position off so that the letters no longer match up. That is how ridiculous Tomkins's comparison is.
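The book analogy is easy to reproduce. The sketch below compares two toy "books" that differ by one extra space in the middle, once position by position (no gaps) and once allowing gaps. The gapped comparison uses longest-common-subsequence length as a simple stand-in for a real alignment score. That choice, the toy text, and the function names are assumptions for illustration, not what BLAST computes.

```python
def positional_matches(a: str, b: str) -> int:
    """Count positions where the two texts agree, with no shifting allowed."""
    return sum(x == y for x, y in zip(a, b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence: the number of characters
    that can be lined up in order when gaps are allowed."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Toy "books" (hypothetical): identical except for one extra space in the middle.
book_a = "abcdefghij" * 10
book_b = book_a[:50] + " " + book_a[50:]

longer = max(len(book_a), len(book_b))
ungapped_pct = 100.0 * positional_matches(book_a, book_b) / longer
gapped_pct = 100.0 * lcs_length(book_a, book_b) / longer
print(f"ungapped: {ungapped_pct:.1f}%  gapped: {gapped_pct:.1f}%")
```

The texts are the same in both runs. Only the rule about gaps changes, and that alone moves the score from about half to nearly 100%.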
|
|||||||||||||||||||||||||||||||||||||||
saab93f Member (Idle past 1651 days) Posts: 265 From: Finland Joined:
|
quote: The author of the creationist paper has rigged the methodology to ignore gaps, and therefore return a false result. The projection in the rest of the paper is also worth discussing, but this is the one major issue that the paper has and so it should be discussed first.

I wonder how the creationists reconcile their utter and total lack of integrity with their preconception of moral superiority compared to "secular scientists"? The scientific community should raise its voice a notch or three and really hammer this deceitful nature of creationism so that every layman can understand it. Loathsome folks them cretins...
|
|||||||||||||||||||||||||||||||||||||||
Pressie Member (Idle past 232 days) Posts: 2103 From: Pretoria, SA Joined:
|
Thanks guys for all the free education.
I'm about six months into my genetics course and I'm starting to understand what you are trying to say, even though I'm not near the level of even attempting a post on genetics here yet! So much to learn.
Edited by Pressie: Spelling
|
|||||||||||||||||||||||||||||||||||||||
Pressie Member (Idle past 232 days) Posts: 2103 From: Pretoria, SA Joined: |
quote: I actually agree with you. However, I don't think that a lot of scientists are really interested in taking note or even contemplating commenting on the ramblings of crazy people. Those scientists who do that are spread very thin. Especially in countries where creationists are an endangered species. Those scientists who do read creationist ramblings do it for the fun of it. It's like an early morning dose of comedy just to wake up laughing.
|
|||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 3893 days) Posts: 10 From: Zlín Joined:
|
Hi Taq,
quote: It does. If you leave out gaps you will have a much lower score than if gaps are included.

Well, I made a simple application that works as follows:
1) The reference (subject) chromosome is the human chromosome.
2) It takes 500 subsequences from the chimp chromosome, each 300 bases long (as in your quoted comment).
3) Blastn uses these parameters (as used in the creationist research paper): -word_size 11 -evalue 10 -num_alignments 1 -dust no -soft_masking false
4) The parameter "-ungapped" is optional; I ran two experiments, one with it and one without.
5) And the calculation. I have no idea what I should calculate, but I made a few calculations:
a) First I check whether the chimp subsequence matched. I am not sure what should count as a MATCH. I guess a match means the whole subsequence was found, so a match is 300 bases long (or longer if I use gaps). If the matched subsequence was shorter, I counted it as "not matched". For example:
Best sequence is 298 bases long with 5 mismatches - NOT a match
Best sequence is 300 bases long with 2 mismatches - MATCH
In the end I calculated the percentage of matched sequences according to this logic. I think this number has nothing to do with a whole-genome comparison. It just says how many 300-base (or longer) subsequences of the chimp chromosome were found in the human chromosome.
b) Then I tried to calculate a relevant similarity percentage. The first number was taken only from matched subsequences; subsequences shorter than 300 bases were completely ignored. From these numbers I take the best match: the longest sequence with the lowest number of mismatches. Example (length - mismatches):
300 - 5
298 - 1
300 - 2
The winner is 300 - 2. I summed all these bases and compared them with the summed mismatches. I think this is not very useful, because it ignores the shorter sequences that were found. For example, if the best match in the result file is 289 - 2, it is ignored.
c) The next number also took shorter sequences into account, but padded them to the full length, counting the missing bases as mismatches. For example, a best match of 289 - 2 was recalculated as 300 - 13. Not sure if this is right...
d) The next number was taken from the values exactly as they were in the result file. Example: a best match of 289 - 2 was not changed; it went into the sums with exactly that number of bases and mismatches. No changes...
e) The last number was also calculated from all steps in the experiment, both matched (300 or longer) and not matched (shorter) sequences. However, if a sequence was marked as not matched (shorter), it was counted as completely wrong. Example: a best match of 289 - 3 was marked as not aligned and entered the sum as 300 - 300 (300 bases long with 300 mismatches = 0% similarity).

And here are the results for chromosome Y:
1) Ungapped
a) Matched vs. all: 234/500 => 46.80%
b) Only matched similarity: 97.8%
c) All results, calculated as full 300 bases long: 81.43%
d) All results as they were, no recalculation: 96.03%
e) All results, with 100% penalty: 45.77%
2) Gapped
a) Matched vs. all: 359/500 => 71.80%
b) Only matched similarity: 97.12%
c) All results, calculated as full 300 bases long: 91.14%
d) All results as they were, no recalculation: 95.76%
e) All results, with 100% penalty: 69.81%

So... what is right and what is wrong? The only thing I can see is that the 45.77% similarity is very close to the 43% reported in the research paper. Of course this number is nonsense, but that is another story. I hope you understand my "methodology". Or is there a better approach?
As you can see, gapped was better, but not by much. I think the most representative number is d), calculated as it was with no recalculation and no penalty. But with the ungapped parameter the result was better (96.03%) than with gaps (95.76%). Both are very close, though.
I would like to do the same experiment for chromosome 1, but it will take much more time, as it is 250 MB large (in contrast to the 60 MB of human chromosome Y).
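The five calculations a) to e) described above can be written down compactly, which makes it easier to see how differently they treat the same hits. This is a sketch of the rules as stated in the post, assuming every query returns one best hit given as (length, mismatches). The toy hit list reuses the example values from the post, and `summarize_hits` is an illustrative name. Nothing here is claimed to be the calculation used in the paper.

```python
def summarize_hits(hits, query_len=300):
    """hits: one (aligned_length, mismatches) pair per query (its best hit).
    Returns the five percentages a) to e) described in the post."""

    def identity(pairs):
        total = sum(length for length, _ in pairs)
        mism = sum(mm for _, mm in pairs)
        return 100.0 * (total - mism) / total if total else 0.0

    # a) a query counts as matched only if the hit spans the full query
    matched = [(n, mm) for n, mm in hits if n >= query_len]
    # c) short hits padded to full length, missing bases counted as mismatches
    padded = [(n, mm) if n >= query_len else (query_len, mm + query_len - n)
              for n, mm in hits]
    # e) short hits counted as completely wrong
    penalized = [(n, mm) if n >= query_len else (query_len, query_len)
                 for n, mm in hits]

    return {
        "a_match_rate": 100.0 * len(matched) / len(hits),
        "b_matched_only": identity(matched),
        "c_padded": identity(padded),
        "d_as_reported": identity(hits),
        "e_full_penalty": identity(penalized),
    }


# Toy input built from the example values in the post (not real BLAST output).
toy_hits = [(300, 2), (300, 5), (298, 5), (289, 2)]
summary = summarize_hits(toy_hits)
for name, value in summary.items():
    print(f"{name}: {value:.2f}%")
```

When all matched hits are exactly 300 bases long, e) equals a) multiplied by b), so it is mostly a match rate rather than a sequence identity. The ungapped chromosome Y figures above fit that pattern roughly: 46.80% x 97.8% is about 45.77%.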
|
|||||||||||||||||||||||||||||||||||||||
Taq Member Posts: 10302 Joined: Member Rating: 7.1 |
quote: As you can see gapped was better but not much.

Not much? You went from 47% to 72% for matches. I would call that a pretty massive jump, especially given that Tomkins is comparing a 70% match to 95% similarity. As cited above, sfs has already run it and he is more familiar with BLAST batch runs than either of us is. He gets results very close to Tomkins for the ungapped alignments, and near 100% results for gapped. I would call that a real problem for Tomkins.
|
|||||||||||||||||||||||||||||||||||||||
Telesto Junior Member (Idle past 3893 days) Posts: 10 From: Zlín Joined: |
quote: Not much? You went from 47% to 72% for matches. I would call that a pretty massive jump, especially given that Tomkins is comparing a 70% match to 95% similarity.

Well, for matching, yes. But I hoped for almost 100%, going by sfs's results. I know that the human Y chromosome is the most diverse, so I will wait for the results from other chromosomes. But I am really curious about these numbers. Do you really think that Tomkins compares the number of matches with similarity? Unbelievable... I hoped not, but from my preliminary results it really looks like he did. I will try to contact sfs.
|
Copyright 2001-2023 by EvC Forum, All Rights Reserved