Before the human genome was sequenced and the results published in February 2001, some biologists speculated that there might be 100,000 or more different genes. Even later in 2001, some estimates still ranged from 60,000 to 90,000. More conservative estimates at the time were around 35,000, and that figure gradually fell to about 25,000 over the next several years.
To distinguish such misidentified genes from true ones, the research team, led by Clamp and Broad Institute director Eric Lander, developed a method that takes advantage of another hallmark of protein-coding genes: conservation by evolution. The researchers considered genes to be valid if and only if similar sequences could be found in other mammals – namely, mouse and dog. Applied to nearly 22,000 genes in the Ensembl gene catalog, the analysis revealed 1,177 “orphan” DNA sequences. These orphans looked like protein-coding genes because of their open reading frames, but were not found in either the mouse or dog genomes.
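The conservation filter described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the researchers' actual pipeline: the `Gene` record, the `in_mouse` and `in_dog` flags, the gene names, and the rule that a match in at least one of the two species counts as conserved are all assumptions made here. The hard part of the real analysis, deciding whether a similar sequence exists in another genome, is taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A candidate gene plus hypothetical precomputed similarity flags."""
    gene_id: str
    in_mouse: bool  # assumed: a similar sequence was found in mouse
    in_dog: bool    # assumed: a similar sequence was found in dog


def split_by_conservation(genes):
    """Separate candidates into conserved genes and orphans.

    Assumption: a candidate is an orphan only if it has no similar
    sequence in either mouse or dog.
    """
    conserved, orphans = [], []
    for gene in genes:
        if gene.in_mouse or gene.in_dog:
            conserved.append(gene)
        else:
            orphans.append(gene)
    return conserved, orphans


# Toy catalog with made-up entries, for illustration only.
toy_catalog = [
    Gene("geneA", in_mouse=True, in_dog=True),
    Gene("geneB", in_mouse=True, in_dog=False),
    Gene("geneC", in_mouse=False, in_dog=False),
    Gene("geneD", in_mouse=False, in_dog=True),
]

conserved, orphans = split_by_conservation(toy_catalog)
print("conserved:", [g.gene_id for g in conserved])
print("orphans:", [g.gene_id for g in orphans])
```

In this toy catalog only `geneC` ends up as an orphan, because it is the only entry with no match in either species.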
Tags: genes, human genome, genomics