Friday, March 23, 2007

How Many Genes Do We Have?

 
The number of genes in the human genome fluctuates on a monthly basis as the genome annotators add new genes and remove false positives. It's an ongoing process that's not likely to be complete in the near future.

The original draft sequences of the human genome had between 25,000 and 30,000 genes, but these numbers were not reliable since they were based entirely on computer predictions. The programs were still in the testing stage for complex genomes when they were used in 2001. They are much better now, but it really takes human intervention to assess whether a prediction is correct or not. The annotation process is tedious.

The latest summary from NCBI is based on the Oct. 17, 2006 genome assembly [NCBI Reference Assembly]. It lists 28,961 genes for the public genome and 26,245 for the private Celera assembly.

The Ensembl site has better data because the curation seems to be more rigorous. It lists 26,720 genes, of which 3,994 have RNA products (mainly ribosomal RNA, tRNAs, and snoRNAs) [Ensembl Homo sapiens]. This is not much different from the NCBI number. It looks like the total is stabilizing at about 27,000 genes, of which about 23,000 encode proteins.
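As a quick check on that arithmetic, here is a small Python sketch that uses only the Ensembl figures quoted above. The variable names are mine, and it assumes that every gene without an RNA product is a protein-encoding gene.

```python
# Figures quoted in the post from the Ensembl site (March 2007).
ensembl_total_genes = 26_720
ensembl_rna_genes = 3_994  # mainly ribosomal RNA, tRNAs, and snoRNAs

# Assumption: any gene that does not have an RNA product encodes a protein.
ensembl_protein_genes = ensembl_total_genes - ensembl_rna_genes

print(ensembl_protein_genes)  # 22726, i.e. "about 23,000"
```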

Carl Zimmer recently posted an article about the number of genes in the human genome [You Don't Miss Those 8,000 Genes, Do You?]. He referred to the PANTHER database where they quote 25,431 genes on their current website [PANTHER pie chart]. This differs considerably from the 18,308 genes shown in Zimmer's original article at this site [PANTHER filtered NP]. The difference is due to filtering the total number of genes (25,431) by showing only those that have a RefSeq entry in the Entrez database. This is an underestimate since not all genes have been assigned a RefSeq entry, particularly those that produce an RNA product rather than a protein.
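To put a number on how much the RefSeq filter removes, the following Python sketch works only from the two PANTHER counts quoted above. The variable names are illustrative and are not part of the PANTHER database.

```python
# Counts quoted in the post from the PANTHER website.
panther_all_genes = 25_431      # current pie chart, unfiltered
panther_refseq_genes = 18_308   # filtered to genes with a RefSeq entry

# Genes dropped by the filter, and their share of the unfiltered total.
dropped_by_filter = panther_all_genes - panther_refseq_genes
dropped_share = dropped_by_filter / panther_all_genes

print(dropped_by_filter)               # 7123
print(f"{dropped_share * 100:.1f}%")   # 28.0%
```

So, on these figures, the filter hides a little over a quarter of the genes that PANTHER lists.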

[Thanks to Scientia Natura for the cartoon]

5 comments :

Rosie Redfield said...

Do we have a satisfactory definition of 'gene'?

Larry Moran said...

Yes, it's a DNA sequence that's transcribed to produce a functional product [see What Is a Gene?].

Rosie Redfield said...

That definition is probably satisfactory for first-year university biology courses, but I don't think it's very satisfactory when applied to the question of how many human genes there are.

The question of 'how many human genes there are' arises because we want to know how many distinct gene products are needed to specify humans, not how many distinct transcripts our DNA produces.

Many human transcripts produce multiple functional protein products because of alternate splicing. If 'a functional product' means ONLY ONE functional product, then many of what we consider human genes don't meet this criterion. If it means AT LEAST ONE functional product, then the definition fails to capture the point of the question.

Larry Moran said...

Rosie Redfield says,

That definition is probably satisfactory for first-year university biology courses, but I don't think it's very satisfactory when applied to the question of how many human genes there are.

It's perfectly satisfactory. There are exceptions.

The question of 'how many human genes there are' arises because we want to know how many distinct gene products are needed to specify humans, not how many distinct transcripts our DNA produces.

Well, some people may be interested in the total number of protein variants that a cell can produce but most of us aren't. We want to know how many distinct genes there are because each gene will produce one kind of product or a small class of related products.

The reason for adding "functional product" to the definition of a gene is not to quibble about whether genes produce protein (or RNA) variants. It's to eliminate the case where a stretch of DNA is transcribed but no functional product is made (e.g., pseudogenes).

Many human transcripts produce multiple functional protein products because of alternate splicing. If 'a functional product' means ONLY ONE functional product, then many of what we consider human genes don't meet this criterion. If it means AT LEAST ONE functional product, then the definition fails to capture the point of the question.

We've known about alternative splicing for 35 years and it's never been thought to threaten our understanding of what a gene is. The point about the definition is that a gene is a region of DNA that is transcribed and not that it can only produce one particular product.

So, the point of the question is not whether proteins can be modified post-translationally or whether mRNA precursors can be alternatively spliced. The point is to decide how many fundamental genetic units are present in the genome.

Now, as it turns out, many people were unhappy that we had only a few more genes than a fruit fly so they went looking for a way to explain the embarrassment. They needed to find some rationalization to put us back at the top of the complexity heap.

The one they came up with was alternative splicing. They claimed that humans were very special because they had evolved a way of making many different proteins from each of their genes.

This rationalization had the additional advantage of accounting for the anomalous EST data that showed far too much of the genome being transcribed.

Well, unfortunately for the human centrists, the explanation hasn't worked out. The EST data is mostly artifact and it's just not true that human genes are more likely to produce multiple splice variants than the genes of other species.

Only a minor percentage of human genes produce distinct protein products by alternative splicing. The argument was never very logical anyway, since the artifactual EST data of other species showed the same (pseudo)phenomenon of alternative splicing, so we couldn't argue that alternative splicing was what made us special.

Mark Cowan said...

We can afford to be human centrists because only humans are doing this work, and consciously modifying genes in a way that is "utterly foreign" to natural selection, as Richard Dawkins reminds us. By the very act of using human-created technology and commenting on this dimension of difference, which is unique to humans, human centrism will continue because it's a truth. It's what we do with that truth that is important. We have a control over the natural world without parallel among all other life, but we need to take control of that control.