Initiated by Congress in 1990, the U.S. Human Genome Project was a multidisciplinary effort, jointly administered by the Department of Energy and the National Institutes of Health, to map and sequence the human genome. The International Human Genome Sequencing Consortium included hundreds of scientists at 20 sequencing centers in the U.S., Great Britain, China, France, Germany, and Japan. Sequencing the DNA in all 46 chromosomes took 10 years even with the help of polymerase chain reaction, fluorescent in situ hybridization, cloning of DNA segments, and automated sequencing technology. The resulting map is a highly idealized representation, like an illustration in an anatomy atlas, because no two people except (perhaps) identical twins have exactly the same genetic makeup. The estimated 30,000-40,000 human genes, considerably fewer than had been predicted, encode more than 10 times that number of proteins. More than 1,600 diseases have been identified as due to abnormalities affecting about 1,200 of these genes. Completion of the genomic map has broadened our understanding of human biology and is expected to facilitate the detection and treatment of genetic diseases, permit identification of genes and gene products that can be targeted by custom-designed drugs, and enhance the individualization of medical care. Projects also under way to study the genomes of bacteria, yeasts, crop plants, farm animals, and other organisms will foster advances in agriculture, environmental science, and industrial processes. About 5% of the budget of the Human Genome Project has been devoted to anticipating and resolving the ethical, legal, and social issues likely to arise from this research.
The HGP also funded the sequencing of 5 non-human genomes to provide model systems for comparative genome studies, as follows: the bacterium, ESCHERICHIA COLI, to represent PROKARYOTES; the yeast, SACCHAROMYCES cerevisiae, to represent UNICELLULAR EUKARYOTES; the FRUIT FLY, DROSOPHILA melanogaster, and the NEMATODE worm, Caenorhabditis elegans, to represent MULTICELLULAR animals with moderately complex genomes; and the mouse, Mus musculus, to represent multicellular animals of comparable genetic complexity to humans. In addition, the flowering plant, Arabidopsis thaliana, has been sequenced in another project, to represent plants. Further human genome projects are also being initiated to investigate human diversity by assessing sequence variability in different populations.
Initial working drafts of the human genome sequence, published in 2001 by the IHGSC (Nature 409, 860–921) and by Venter et al. of Celera Genomics (Science 291, 1304–1351), revealed the presence of about 30,000 to 40,000 genes, which is only about twice as many as in the worm or the fruit fly. However, there is more ALTERNATIVE SPLICING in the human to generate a larger number of proteins. It should be noted that these drafts were missing about 10% of the euchromatic part (see EUCHROMATIN) of the genome and about 30% of the whole genome, including the HETEROCHROMATIC parts. The HGP was completed in 2003. However, sequencing of the heterochromatic parts, where the DNA is tightly packaged, generally difficult to clone, and believed to contain few, if any, genes, is still to be finished. The exact number of genes in the human genome remains unknown. In 2004 the estimate was reduced to between 20,000 and 25,000 genes. The genome has a greater proportion of repeated sequences than is found in either the worm or the fruit fly.
Once the genome has been sequenced and checked for accuracy, the next step is to determine the exact location of all the genes and their functions (see ANNOTATION). Comparative genomics (see GENOMICS) is important here because it allows comparisons to be made, particularly with animal genes, which can then be used as model systems for studying disease and so on. It also facilitates an understanding of the genetic differences between humans and other organisms. DNA MICROARRAY/CHIP technology is facilitating studies on the functioning of the genome.
The HGP also considered the Ethical, Legal and Social Implications of the research through the ELSI programme. Issues being studied include the use of genetic testing, privacy and confidentiality, and the fair use of genetic information in, for example, employment and insurance (see BIOETHICS).
Potential applications of genome research are wide ranging and include those in molecular medicine, where emphasis is on identifying the fundamental causes of disease for disease prevention; in DNA forensics, where DNA sequencing should allow precise identification of individuals; and in agriculture, for more nutritious, pesticide-free foods. See also PHARMACOGENOMICS, TOXICOGENOMICS. Furthermore, the evolutionary history of the human genome will be addressed. Other genome projects have been undertaken, including those for the rat, chimpanzee and various BACTERIA, ARCHAEA, FUNGI, PROTOZOA and VIRUSES.