EvC Forum: DNA similarity between Chimpanzee and Human 70%

Topic: DNA similarity between Chimpanzee and Human 70%

Taq

Member

Posts: 10084

Joined: 03-06-2009

Member Rating: 5.1

Message 8 of 32 (719974)
02-19-2014 1:26 PM

One word: Gaps

The main thesis of the creationist paper falls apart once you understand the underhanded tricks that they use to get their 70%. To the uninitiated, it may seem like a subtle difference, but it isn't. The author uses a non-gapped alignment.

"Gapping was disallowed for a variety of reasons. First, Altschul et al. (1990) determined that the addition of gapping strategies for alignments designed to locate regions of local similarity using BLAST was negligible. Secondly, an objective comparison among all queries negates the use of gapping with the algorithm."
Chimpanzee DNA Sequences Queried Against Human Genome | Answers Research Journal

Those excuses are just utter BS. If you are going to compare genomes in an objective, fair, and unbiased manner you have to include indels. Insertions and deletions really do happen, so they have to be part of any comparison between two genomes. Let's take a look at how massive a difference the choice between a gapped and an ungapped alignment can make, using a short random stretch of DNA:

Gapped alignment

Species A:  TATA-AGCGTAGGCAAT
Species B:  CATAGAGCGTAGGCAAT

With this alignment, there is one substitution at the very beginning and a one-base indel a few positions later, for an overall identity of 15/17, or 88%. Now for the ungapped alignment.

Species A:  TATAAGCGTAGGCAAT
Species B:  CATAGAGCGTAGGCAAT

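The overall identity is now 5/17, or 29%. Without the gap, every base after the indel is shifted one position out of register, so only a handful of positions match by coincidence.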
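For anyone who wants to check those two fractions, here is a minimal Python sketch. It is just a position-by-position count, not what BLAST does internally, and the function name `percent_identity` is my own invention for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Position-by-position identity over the longer of the two strings.

    A gap character ('-') never counts as a match, and positions past the
    end of the shorter string count as mismatches.
    """
    length = max(len(a), len(b))
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / length


# The two alignments from the post above.
gapped_a = "TATA-AGCGTAGGCAAT"
gapped_b = "CATAGAGCGTAGGCAAT"
ungapped_a = "TATAAGCGTAGGCAAT"
ungapped_b = "CATAGAGCGTAGGCAAT"

gapped = percent_identity(gapped_a, gapped_b)
ungapped = percent_identity(ungapped_a, ungapped_b)

print(f"gapped:   {gapped:.0%}")    # 15 of 17 positions match
print(f"ungapped: {ungapped:.0%}")  # 5 of 17 positions match
```

Same two sequences, same bases. The only thing that changed is whether the comparison is allowed to open a gap.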

The author of the creationist paper has rigged the methodology to ignore gaps, and therefore return a false result.

The projection in the rest of the paper is also worth discussing, but this is the one major issue that the paper has and so it should be discussed first.

Replies to this message:
	Message 10 by RAZD, posted 02-19-2014 2:03 PM	Taq has replied
	Message 13 by Telesto, posted 02-19-2014 4:03 PM	Taq has replied
	Message 14 by NosyNed, posted 02-19-2014 4:14 PM	Taq has not replied
	Message 25 by saab93f, posted 02-20-2014 1:10 AM	Taq has not replied

Taq

Message 11 of 32 (719991)
02-19-2014 2:49 PM

Reply to: Message 10 by RAZD
02-19-2014 2:03 PM

Re: intentionally misusing science

"That was my first suspicion, the second would be ignoring reversed sequences that still accomplish the same functions."

As long as the reversed sequence spanned the 300 or 30 base stretches that the author was using, it shouldn't make a difference. Plus/Plus and Plus/Minus strand matches are treated equally.

"We have seen this type of intentionally misusing science in other areas, such as carbon 14 dating and living animals (seals at McMurdo Sound, etc), and several other dating methodologies.

No surprises."

There are other "no surprises" moments as well. For example:

"Non-alignable regions are typically omitted and gaps in alignments are often discarded or obfuscated. . .

One of the first publications to compare large regions of the chimpanzee genome with human, was Britten’s lab in 2002 using an in-house Fortran computer program. The study was based on five large DNA fragments (BAC clones) from chimpanzee known to be homologous to human that were thoroughly sequenced. The total length of the DNA sequence for all 5 BACs was 846,016 bases, but only 92% of the DNA aligned to human and the paper reported on only 779,132 bases. The alignment with insertions and deletions (indels) indicated a human-chimp similarity of 95% (Britten 2002). However, when the complete sequence of all 5 BACs is included, a final DNA similarity of 87% is the final figure for the compared homologous regions between chimp and human."
Chimpanzee DNA Sequences Queried Against Human Genome | Answers Research Journal

The author keeps making a big stink about non-alignable sequence, as if it were left out because it is more different than normal and would drag the percentages down. Of course, that misses the actual truth by a mile. You can't compute the % similarity between two DNA sequences unless you can align them first.

To use an analogy, let's say that you want to find the average weight of a finch on a single island. After 3 months you have snagged 90% of the individuals, and the average weight is 100 grams. Would it be fair to say that the rest of the finches weigh zero grams, so the actual average weight is 90 grams? No. That's not how it works, and yet that is how the author is treating these comparisons. If the sequences can't align he treats them as being 0% similar, which just isn't the case.
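Here is the same arithmetic as a short Python sketch. The function name is made up for illustration, and the assumption baked into it (everything unmeasured scores zero) is exactly the assumption I am objecting to. The finch numbers are from the analogy; the BAC numbers are the ones in the passage quoted above.

```python
def overall_if_unmeasured_is_zero(measured_fraction: float, measured_value: float) -> float:
    """Weighted average in which the unmeasured remainder is scored as 0."""
    return measured_fraction * measured_value + (1 - measured_fraction) * 0.0


# Finch analogy: 90% of the birds caught, averaging 100 grams.
finch_grams = overall_if_unmeasured_is_zero(0.90, 100.0)

# Quoted BAC figures: 779,132 of 846,016 bases reported, at 95% similarity.
aligned_fraction = 779_132 / 846_016
bac_similarity = overall_if_unmeasured_is_zero(aligned_fraction, 0.95)

print(f"finch 'average': {finch_grams:.0f} g")
print(f"aligned fraction: {aligned_fraction:.1%}")
print(f"BAC 'similarity': {bac_similarity:.1%}")
```

The second result lands within about half a point of the 87% in the quoted passage, which is what you would expect if the roughly 8% of sequence that did not align was simply counted as 0% similar.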

This message is a reply to:
	Message 10 by RAZD, posted 02-19-2014 2:03 PM		RAZD has seen this message but not replied

Taq

Message 12 of 32 (719992)
02-19-2014 2:53 PM

Reply to: Message 9 by Dr Adequate
02-19-2014 1:55 PM

Re: More Please

"But what they've actually done is picked a different method of measuring difference. Your weight in kilograms is not in contrast to your weight in ounces, it's just a different metric."

That's a good analogy. To carry it a little further, if I buy 16 ounces of ham at the store and then find that I have 1 pound of ham when I get home, did the ham lose 94% of its weight on the way home (15/16=0.94)?

That's the kind of game that creationists are trying to play with these numbers.

This message is a reply to:
	Message 9 by Dr Adequate, posted 02-19-2014 1:55 PM		Dr Adequate has not replied

Taq

Message 17 of 32 (720017)
02-19-2014 4:52 PM

Reply to: Message 15 by Telesto
02-19-2014 4:27 PM

Re: More Please

"The only difference between gapped and ungapped sequences would be in total length of longest sequences"

That isn't true. Go back to my post in message #8. Using those sequences, the best hit for the gapped alignment would be 88% similarity. The best hit for the ungapped alignment would be 29%. This isn't because of different length sequences, or comparing different parts of the genome. This is comparing the same two sequences using different parameters.

The creationist article biases its methodology by excluding indels. There is no way around it. The author does this in order to get a lower percentage for similarity, using a methodology that he knows will falsely return a lower percentage and that differs from the methodology used in the other papers.

It's not as if the author re-sequenced the genomes from scratch and found out that the scientists had reported the wrong sequence. They are using deception to con people that aren't familiar with genetics.

"Compare two identical chromosomes - expected result is 100%."

No, it isn't. Different chromosomes have diverged at different rates. There is no expectation that the similarities will be the same for a comparison of any two chromosomes.

"But first of all I need to get 43% similarity for chromosome Y."

The Y chromosome has 50 million bases, or just 1.6% of the total genome. You do know this, right?

Let's put this another way. If I said that the average life expectancy was 85 years old, could I prove this wrong by pointing to a baby that died at 1 year old? If I said that the average life expectancy was 85, does this mean that everyone dies at 85, and at no other age?
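To put rough numbers on that, here is a small Python sketch of a length-weighted average. The 1.6% and 43% come from this thread. The 95% for the rest of the genome is only a placeholder I picked for illustration (it is the figure quoted elsewhere in this thread), not a measured result, so swap in whatever value you like.

```python
# Share of the genome that is Y chromosome (figure from this post).
y_fraction = 0.016
# Similarity figure being discussed for the Y chromosome.
y_similarity = 0.43
# ASSUMPTION: placeholder similarity for everything that is not Y.
rest_similarity = 0.95

# Length-weighted average across the whole genome.
genome_wide = y_fraction * y_similarity + (1 - y_fraction) * rest_similarity

# How far the Y chromosome alone pulls the genome-wide figure down.
shift = rest_similarity - genome_wide

print(f"genome-wide: {genome_wide:.1%}")
print(f"shift caused by Y: {shift:.2%}")
```

Even taking the 43% at face value, a chromosome that small moves the genome-wide number by less than one percentage point.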

"Yes... rat and mouse are more different than human and chimp."

More importantly, chimp and gorilla are more different than chimp and human. Chimp and orangutan are more different than chimp and human. Bonobos aside, no living species is closer to chimps than humans are.

This message is a reply to:
	Message 15 by Telesto, posted 02-19-2014 4:27 PM		Telesto has replied

Replies to this message:
	Message 18 by Dr Adequate, posted 02-19-2014 5:17 PM	Taq has not replied
	Message 20 by Telesto, posted 02-19-2014 6:21 PM	Taq has not replied

Taq

Message 22 of 32 (720028)
02-19-2014 6:28 PM

Reply to: Message 13 by Telesto
02-19-2014 4:03 PM

Re: One word: Gaps

"I am not sure that the blastn algorithm compute the sequence as you described."

It does. If you leave out gaps you will have a much lower score than if gaps are included.

"I didn't try this with gaps (indels) - I uses parameter -ungapped as they did. I think the number would be similar anyway."

Actually, sfs over at Christian Forums has already done some of the leg work. sfs also happens to be an author on the chimp genome paper, for what it is worth.

In message 56 he writes:

"I checked: the low percentage of matches does in fact result from only looking for ungapped alignments. I downloaded the human and chimpanzee genomes and the BLAST executable. As a test set, I pulled 500 randomly sampled, non-overlapping slices from chimpanzee chromosome 12, each 300 base pairs long. After dropping any slices that contained unknown sequence (i.e. 'N's), I had 471 test sequences. I fed these into BLASTN against human chromosome 12, using the parameters specified by Tomkins, with and without allowing gaps in the alignment. With no gaps, 68% of my queries yielded matches, in good agreement with Tomkins's finding. With gaps allowed, 100% of queries matched; of these, one or two were of poor quality and likely represent random matches. So the actual matching rate, when doing a proper alignment, was 99.6%."
sfs, message 56 | Christian Forums
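If you want to build a similar test set yourself, the sampling step sfs describes can be sketched in a few lines of Python. This is my own rough version, not his script: the function name is made up, and picking windows from a fixed 300-base grid is a simplification I chose because it guarantees the slices never overlap. The toy sequence at the bottom stands in for a real chromosome.

```python
import random


def sample_slices(sequence: str, n_slices: int = 500, slice_len: int = 300, seed: int = 0) -> list[str]:
    """Pick random non-overlapping slices, then drop any containing 'N'."""
    rng = random.Random(seed)
    # Cut the sequence into a grid of windows so no two slices can overlap.
    n_windows = len(sequence) // slice_len
    picks = rng.sample(range(n_windows), min(n_slices, n_windows))
    slices = [sequence[i * slice_len:(i + 1) * slice_len] for i in sorted(picks)]
    # Discard slices with unknown sequence, as sfs describes.
    return [s for s in slices if "N" not in s.upper()]


# Toy stand-in for a chromosome: random bases with a short run of unknown sequence.
toy_rng = random.Random(1)
toy = "".join(toy_rng.choice("ACGT") for _ in range(300_000))
toy = toy[:1_000] + "N" * 500 + toy[1_500:]

queries = sample_slices(toy)
print(len(queries), "usable query sequences")
```

Each surviving slice would then be written out as a query and run through blastn twice, once with the ungapped option and once without, which is the comparison sfs reports.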

It has already been confirmed that changing from ungapped to gapped makes a huge difference.

This message is a reply to:
	Message 13 by Telesto, posted 02-19-2014 4:03 PM		Telesto has replied

Replies to this message:
	Message 28 by Telesto, posted 02-20-2014 10:46 AM		Taq has replied

Taq

Message 24 of 32 (720030)
02-19-2014 7:01 PM

Reply to: Message 23 by Telesto
02-19-2014 6:32 PM

Re: One word: Gaps

"Yes I understand. Do you think that it is possible to get overall genetical similarity with such method (gapped or ungapped)? I think the blast algorithm is not created for this purpose. Anyway I would like to get the numbers from the research (even if they are wrong).

It is bad that I don't know what to do with all the numbers I got. What is the algorithm to get one number that represent overall similarity. I always got thousands of numbers. How they got 43% from these numbers? I have no idea..."

sfs over at CF had a good analogy for gapped v. ungapped.

Let's say that you had two books, each with 1,000 pages. When you begin looking at the two books you realize that they are nearly identical. The only difference between them is that there is an extra space smack dab in the middle of one of the books. Every word, letter, and piece of punctuation is otherwise identical.

Now, would you say that these two books are nearly 100% identical? Tomkins would say no. He would say that the two books are only 50% identical. Why? Because he refuses to account for the extra space, which shifts every letter after it one position over so that the second half no longer matches up. That is how ridiculous Tomkins's comparison is.
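The book analogy is easy to try out. Below is a minimal Python sketch with a randomly generated "book" of 2,000 characters standing in for the 1,000 pages; the names and the toy text are mine, not sfs's.

```python
import random
import string

# Build a toy "book" of random letters and spaces.
rng = random.Random(0)
alphabet = string.ascii_lowercase + " "
book_a = "".join(rng.choice(alphabet) for _ in range(2_000))

# The second book is identical except for one extra space in the middle.
middle = len(book_a) // 2
book_b = book_a[:middle] + " " + book_a[middle:]


def positional_identity(a: str, b: str) -> float:
    """Fraction of positions that match, measured over the longer text."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


# No gaps allowed: everything after the extra space is off by one.
ungapped = positional_identity(book_a, book_b)

# One gap allowed: mark the missing character in book_a with '-'.
aligned_a = book_a[:middle] + "-" + book_a[middle:]
gapped = positional_identity(aligned_a, book_b)

print(f"ungapped: {ungapped:.1%}")
print(f"gapped:   {gapped:.2%}")
```

The ungapped figure comes out a little over 50% rather than exactly 50%, because a few shifted characters in the second half match by chance. With a single gap allowed, the only mismatch left is the extra space itself.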

This message is a reply to:
	Message 23 by Telesto, posted 02-19-2014 6:32 PM		Telesto has not replied

Taq

Message 29 of 32 (720106)
02-20-2014 11:04 AM

Reply to: Message 28 by Telesto
02-20-2014 10:46 AM

Re: One word: Gaps

"As you can see gapped was better but not much."

Not much? You went from 47% to 72% for matches. I would call that a pretty massive jump, especially given that Tomkins is comparing a 70% match to 95% similarity.

As cited above, sfs has already run it and he is more familiar with BLAST batch runs than either of us is. He gets results very close to Tomkins's for the ungapped alignments, and near 100% for gapped. I would call that a real problem for Tomkins.

This message is a reply to:
	Message 28 by Telesto, posted 02-20-2014 10:46 AM		Telesto has replied

Replies to this message:
	Message 30 by Telesto, posted 02-20-2014 11:20 AM		Taq has replied

Taq

Message 31 of 32 (720114)
02-20-2014 11:28 AM

Not aligned =/= 0% similarity

The other egregious, but easy to miss, deception that Tomkins uses is treating sequence that could not be aligned between humans and chimps as necessarily 0% similar. This is completely false.

When they say that sequence could not be aligned, they are saying that they aren't sure where in the genome that chunk of DNA belongs. This often happens in regions with lots of repeats. If they can't verify where a DNA sequence belongs in a genome, they cannot guarantee that they are looking at orthologous DNA, so they keep it out of the comparisons. For example, if they aren't sure whether there are 10 repeats or 15 repeats between one chunk of DNA and the rest of the genome, it is said to be unaligned. This can be due to something as simple as not having enough sequencing coverage in that part of the genome. It does not necessarily mean that they could not find homologous sequence in the other genome. Even two random DNA sequences will match at about 25% of positions by chance alone (four bases, so roughly a 1-in-4 chance at each position), but Tomkins claims that unaligned sequence necessarily means that there is no homologous sequence in the other genome.
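The 25% floor is easy to confirm with a quick simulation. This is a minimal Python sketch that assumes all four bases are equally likely, which real genomes only approximate.

```python
import random

# Two unrelated sequences of random bases, equal base frequencies assumed.
rng = random.Random(42)
n = 100_000
seq_a = [rng.choice("ACGT") for _ in range(n)]
seq_b = [rng.choice("ACGT") for _ in range(n)]

# Position-by-position identity between the two random sequences.
identity = sum(x == y for x, y in zip(seq_a, seq_b)) / n

print(f"identity between two random sequences: {identity:.1%}")
```

So even sequence with no shared ancestry at all would not score 0%, which makes 0% an impossible value to assign to DNA that merely could not be placed.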

There are several instances where Tomkins makes this claim. This is one example:

"Nevertheless, enough data from the 2005 chimp genome project was available to allow rough estimates of overall genome similarity. Tomkins and Bergman (2012) derived a calculation that included published concurrent information from the human genome project along with the data reported in the 2005 chimpanzee paper and estimated an overall genome DNA similarity of 80.6%, which they proposed as a very conservative figure (see Tomkins and Bergman 2012, for details). "

How did they get that 80.6% figure? They counted the unaligned sequence as 0% similar.

"In summary, only 2.3 Gb of chimp sequence aligned onto the highly accurate and complete human genome (2.85 Gb) an operation that included the masking of low complexity sequences. For the chimp sequence that aligned, the data for substitutions and indels indicates 95.8% similarity, a biased figure which excludes the masked regions. Using these numbers, an overall estimate of chimp compared to human DNA produces a conservative estimate of genome-wide similarity at 80.6%."
Human chimp dna similarity re-evaluated - creation.com

The moral of the story is that you can't compare unaligned sequence because you don't know if it is orthologous or not. That is why it is excluded, not because it lacks any homology to the other genome.

Taq

Message 32 of 32 (720116)
02-20-2014 11:31 AM

Reply to: Message 30 by Telesto
02-20-2014 11:20 AM

Re: One word: Gaps

"Well for matching yes. But I hoped for allmost 100% according to sfs resluts."

You got 46% instead of the 70% reported by Tomkins, so there is a problem with something. If you are using a sex chromosome this is probably the root of the problem. Even Tomkins used an autosomal chromosome.

This message is a reply to:
	Message 30 by Telesto, posted 02-20-2014 11:20 AM		Telesto has not replied

Date format: mm-dd-yyyy

Timezone: ET (US)
