Molecular biology and clinical practice

 

DAVID J. WEATHERALL

 

 

In his monograph Dreams, genes and realities, written as recently as 1971, Macfarlane Burnet predicted that the basic biological sciences were unlikely to make a major impact on clinical practice. The remarkable discoveries of the past few years have proved him wrong. It is now apparent that molecular and cell biology are likely to play an increasingly important role in clinical practice, and that medical research may well be moving into the most exciting phase of its development. Over the next few years there will be a change of emphasis from whole-patient physiology and pathology to the definition of disease at the cellular and molecular level. At first sight this reductionist approach to the study of disease might seem to be at odds with the holistic type of medical care to which all branches of clinical practice are being urged to aspire. On the contrary, however, the study of disease at the cellular and molecular levels will tend to unify the clinical specialities. There will no longer be disparate groups working in the subspecialities of medicine and surgery. Rather, medical scientists from each of these subjects will work together and use common technology to study their particular diseases. The overall effect will be to bring together all the medical specialities and, equally important, to reunite the basic and clinical sciences.

 

This chapter will summarize what we might expect to achieve over the next few years, and will discuss briefly some of the wider implications of the new science of human molecular biology. It will only be possible to highlight a few aspects of this exciting field; for more extensive discussion the reader is referred to several recent monographs and reviews cited at the end of this section.

 

THE STRUCTURE AND FUNCTION OF HUMAN GENES

Proteins consist of one or more peptide chains folded into a three-dimensional structure, the exact shape of which is critical for their normal function, as enzymes or building blocks of tissues, for example. Their conformation depends on the interactions of the different amino acids from which they are constructed. The genetic information that determines the order of amino acids in a peptide chain is encoded in the deoxyribonucleic acid (DNA) which constitutes the gene for that chain. This information is transported from nuclei of cells to their cytoplasm by means of a form of ribonucleic acid (RNA) called messenger RNA (mRNA), which has a structure exactly complementary to that of the DNA from which it is copied, or transcribed. The process whereby a protein chain is synthesized on its mRNA template is called translation. Thus, the flow of genetic information in cells can be written: DNA → mRNA → protein, where the first arrow represents transcription and the second translation.

 

 

The structure of DNA and genes

DNA consists of two chains of nucleotide bases wrapped around each other. There are four bases, adenine (A), guanine (G), cytosine (C), and thymine (T). The building blocks of each chain are deoxyribonucleotides, which consist of a base, deoxyribose, and a phosphate, covalently joined. The backbone of DNA, which is constant throughout the whole molecule, consists of deoxyribose molecules linked by phosphates. Thus the only variable part of a DNA chain is the sequence of bases, which can be in any order along the sugar–phosphate backbone. Because of their particular shapes, A always pairs with T, and C with G. Genetic information is encoded by the order of bases; it is a triplet, non-overlapping code in which three bases determine a particular amino acid.
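The pairing rules just described are enough to derive one strand from the other. The following is a minimal sketch in Python, not part of the original text; the function names and the example sequence are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairing rules stated in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, read in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]


example = "ATGGCC"  # hypothetical sequence, written 5' to 3'
print(complement(example))          # TACCGG
print(reverse_complement(example))  # GGCCAT
```

Because the two strands run in opposite directions, the partner strand is conventionally written as the reverse complement; applying the function twice returns the original sequence.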

 

DNA replication is an extremely complex process whereby the strands are separated and each one is copied to produce new daughter strands. Since each daughter molecule retains one intact parental strand, the process is said to be semiconservative. Through the action of enzymes called DNA polymerases each new strand is synthesized in a 5′ → 3′ direction by the stepwise addition of the four deoxyribonucleotide triphosphates; these nucleotides are added opposite their complementary bases on the parental template strand so that the replication process produces two identical copies of the original molecule.

 

A gene is defined as a length of DNA which carries the information to make a single peptide chain. This information must include not only instructions about the amino acid sequence of the chain but also instructions to ensure that the protein product is made in appropriate amounts in the correct tissues at a particular time during development. Although some genes are transcribed in many tissues at all stages of development, and are therefore called ‘housekeeping’ genes, many are only expressed in specific tissues at particular times of development. Although the ‘one gene–one peptide chain’ rule is generally true, there are some exceptions. For example, it appears that some genes are able to produce more than one product as a result of complex post-transcriptional modification of their mRNAs or by post-translational modification of their protein products.

 

Almost all mammalian genes that have been analysed so far have their coding sequences interrupted by sequences of unknown function called intervening sequences, introns or IVS, at varying positions along their length (Fig. 1). Their number and size vary from gene to gene, and they are often considerably longer than the coding sequences or exons. At the 5′ and 3′ ends of genes there are specific triplets which determine the initiation (ATG) and termination (TAA, TAG, or TGA) of protein synthesis on mRNAs. There are also sequences of varying lengths at both ends which determine the structure of untranslated regions of mRNA. The highly conserved AATAAA sequence in the 3′ non-coding region of all mammalian genes is essential for the normal processing of mRNA.

 

Most mammalian genes have blocks of sequences in their 5′ flanking regions which are similar to those found in Drosophila and many other species. The first, ATA, is located 26 to 30 nucleotides upstream from the RNA initiation site. Another conserved box, CCAAT, is found about 72 to 77 nucleotides upstream. These regions, and another with the general structure CACCC, which occurs twice between about 80 and 110 nucleotides upstream from the beginning of the gene, are involved in the regulation of transcription of mRNA. For this reason they are called promoters, or upstream promoter elements, regions of DNA to which RNA polymerases bind and initiate gene transcription.

 

It is now apparent that there are other major regulatory elements which are involved in determining whether genes are transcribed in particular tissues. These so-called enhancer elements may be at some distance from the structural genes, but are probably brought into apposition to the promoters when they are activated.

 

Transcription and processing of messenger RNA

Messenger RNA is synthesized on its DNA template in a 5′ → 3′ direction by the action of enzymes called RNA polymerases. Chemically, RNA is similar to DNA except for two differences; the sugar of DNA is deoxyribose while in RNA it is ribose, and instead of thymine (T) RNA contains the closely related pyrimidine, uracil (U). The synthesis of RNA on the DNA template is similar in principle to the process of DNA replication, and involves the formation of complementary base pairs, in this case G pairs with C, but A pairs with U instead of T. In this way mRNA carries the faithful replica of the DNA strand from which it is transcribed.
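Transcription can be sketched with the same kind of lookup, using the RNA pairing rules given above. This is an illustrative sketch only; the function name and the example template are hypothetical, and the template strand is assumed to be written 3′ → 5′ so that the mRNA reads 5′ → 3′.

```python
# Pairing of template DNA bases with RNA bases: G pairs with C, and A in the
# template pairs with U (not T) in the RNA.
RNA_PAIRS = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Build the mRNA (5' to 3') complementary to a DNA template given 3' to 5'."""
    return "".join(RNA_PAIRS[base] for base in template_3_to_5)


template = "TACGGA"  # hypothetical template strand, written 3' to 5'
print(transcribe(template))  # AUGCCU
```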

 

The primary transcript is a large mRNA precursor, which contains the entire gene complex including exons and introns. This molecule undergoes a series of processing steps before it is ready for delivery to the cell cytoplasm (Fig. 1). The introns are cut out and the exons are spliced together in a two-stage process. First, the mRNA precursor is cut at the 5′ site to generate two intermediates, a linear first exon and a branched lariat-shaped molecule containing the intron and second exon. Secondly the 3′ splice site is cleaved, the lariat intron released, and the two exons joined. This process involves the interaction of several enzymes and other nuclear proteins. While in the nucleus, mRNA undergoes further processing, including chemical modification of its 5′ end and the attachment of a string of adenylic acid residues (polyA) at its 3′ end, which may serve to stabilize it during its passage to the cytoplasm.
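The net effect of splicing and polyadenylation on the sequence can be illustrated as follows. This sketch is an assumption-laden simplification: the intron coordinates are supplied by hand, whereas the cell recognizes splice sites by sequence signals and proteins that are not modelled here, and the function names, example transcript, and tail length are all hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) positions, end exclusive, and join the exons."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[cursor:start])  # exon lying before this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


def add_poly_a(mrna: str, tail_length: int) -> str:
    """Attach a string of adenylic acid residues to the 3' end."""
    return mrna + "A" * tail_length


# Hypothetical precursor: exon 1, a 12-base intron, exon 2.
precursor = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
mature = splice(precursor, [(6, 18)])
print(mature)                 # AUGGCCUUUUAA
print(add_poly_a(mature, 5))  # AUGGCCUUUUAAAAAAA
```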

 

Once in the cytoplasm, mRNA acts as a template for protein synthesis. Amino acids are brought to mRNA attached to another type of molecule called transfer RNA (tRNA). There is a family of different transfer RNAs, each specific for a particular amino acid and carrying three bases (the anticodon) which are complementary to the appropriate mRNA codon for that amino acid. Protein synthesis occurs on ribosomes, each of which consists of two different-sized subunits. The initiation of protein synthesis occurs when a ribosome is bound to the region of the initiation codon, AUG, and when an initiator tRNA base pairs with this codon. As each amino acid is brought to its appropriate place by its tRNA, it forms a peptide bond with its fellow that is already in place, and hence a peptide chain is formed and gradually lengthened as the ribosomes move along the mRNA (Fig. 1). The ribosomes move over the mRNA in a 5′ → 3′ direction from codon to codon until a specific termination codon is reached. The complete chain is then released from the mRNA and ribosomes.

 

The relationship between the DNA bases and their RNA equivalents that carry the information to make a peptide chain is called the genetic code. It is a triplet, non-overlapping code. Because there are more codewords than amino acids it follows that several amino acids can be encoded by more than one triplet. Thus the code is said to be degenerate.
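Translation and the degeneracy of the code can both be made concrete in a few lines. The sketch below is illustrative rather than definitive: the codon table is the standard genetic code written in the conventional compact form (bases ordered U, C, A, G, amino acids in one-letter symbols, `*` for termination), which is not reproduced in the text; the function names and the example mRNA are hypothetical; and translation is simply assumed to begin at the first AUG encountered.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order ('*' = termination).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read non-overlapping triplets from the first AUG until a termination codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def codons_for(amino_acid: str) -> list[str]:
    """List every triplet that encodes the given amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(translate("GGAUGGCCUUUUAAGG"))  # MAF
print(len(codons_for("L")))           # 6
print(codons_for("W"))                # ['UGG']
```

The table holds 64 codewords for 20 amino acids and three termination signals, so leucine, for instance, is specified by six different triplets while tryptophan has only one.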

 

Many proteins have to undergo a considerable amount of post-translational modification before they are functional. Insulin, for example, is first synthesized as a molecule called preproinsulin, which is 110 amino acids long. The first 24 amino acids constitute a signal peptide that facilitates entry of the molecule into the endoplasmic reticulum; many secreted proteins have signal or leader peptides of this type. The signal peptide is cleaved to produce a shorter molecule called proinsulin, which is then further modified to form the definitive two-chain insulin molecule.

 

Regulation

DNA exists in a highly compressed form in nuclei, complexed with histones and other proteins which constitute chromatin. For this reason its transcriptional activity in individual cells is quite limited. For example, in erythroid cells only a few per cent of the total DNA sequence is capable of being expressed or active. This variability is reflected by major alterations in chromatin structure, which can be demonstrated experimentally as an increased sensitivity to digestion by various nucleases, notably DNAase I. Another useful indicator of the state of activity of genes is their degree of methylation; actively transcribed genes are hypomethylated, and vice versa. Very little is known about the regulation of chromatin structure that leads to genes being in an active or closed conformation. However, at the level of DNA there is some knowledge about the sequences that are involved in gene activation. We have already seen that there are critical promoter and enhancer elements. In recent years some evidence about how these regions interact with regulatory proteins has been obtained.

 

Many DNA-binding proteins have been purified from nuclear extracts. There are several different classes, identified by particular structural motifs. They bind to specific regulatory regions close to, or at a distance from, particular genes, and in this way regulate gene activation or suppression. The overall pattern of gene regulation that is emerging is a complex network of genes that can code for regulatory proteins and which themselves can respond to external signals, so ensuring the synchronous interaction of the activity of many genes with similar functions in different tissues and at different stages of development.

 

THE TOOLS OF RECOMBINANT DNA TECHNOLOGY

Before considering the clinical applications of molecular and cell biology, it is important to outline briefly some of the methods that are involved, particularly as they will play an increasingly important role in medical research and practice over the next few years. It is impossible to describe them in detail here, and readers who wish to explore this field further are referred to several articles and monographs written for non-specialists, which are listed at the end of this chapter.

 

Molecular hybridization and gene probes

The two strands of DNA can be dissociated and reassociated in vitro by heating and cooling. It is also possible to form double-stranded DNA/RNA molecules in this way. This reannealing process is highly specific, and under suitable conditions occurs only between DNA or RNA strands which have identical or almost identical base sequences. If we wish to look for a particular gene buried away in a large amount of DNA we can make a length of DNA with a complementary sequence which will anneal to the gene, but not to the rest of the DNA. This principle underlies the construction of gene probes.
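The specificity of annealing can be mimicked by searching a target strand for the sequence complementary to a probe. The following sketch is a deliberately crude model under stated assumptions: real hybridization depends on temperature, salt, and probe length, none of which is represented, and the `max_mismatches` parameter is a hypothetical stand-in for the stringency of the conditions. All names and sequences are invented for illustration.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(strand))


def anneal_sites(target: str, probe: str, max_mismatches: int = 0) -> list[int]:
    """Return positions in the target where the probe could anneal."""
    partner = reverse_complement(probe)  # sequence the probe pairs with
    hits = []
    for i in range(len(target) - len(partner) + 1):
        window = target[i:i + len(partner)]
        mismatches = sum(a != b for a, b in zip(window, partner))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


target = "TTTTGGCCATAAAAGGCCTTAAAA"  # hypothetical single strand
probe = "ATGGCC"                     # pairs with GGCCAT
print(anneal_sites(target, probe))                    # [4]
print(anneal_sites(target, probe, max_mismatches=1))  # [4, 14]
```

With no mismatches allowed the probe finds only its exact partner; relaxing the threshold also picks up an almost identical sequence, which corresponds to hybridization under less stringent conditions.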

 

Gene probes can be made in a variety of ways. First, an enzyme called reverse transcriptase can be used to synthesize a DNA copy (complementary DNA, or cDNA) from any messenger RNA (mRNA) that can be isolated from human cells. If radioactive bases are added to the reaction, the cDNA can be labelled and hence used as a hybridization probe to ‘look for’ its partner sequences in genomic DNA or cellular RNA. If cDNA probes are made from cellular RNA, they may represent several different mRNA species. However, it is possible to clone cDNA into bacterial plasmids. This is done by synthesizing a second DNA strand on newly synthesized cDNA using a bacterial DNA polymerase. In this way small cDNA duplexes are made which can be incorporated into plasmids and then grown in bacterial cells (see below). It is also possible to generate genomic probes by cloning fragments of genomic DNA into plasmids or bacteriophage and amplifying individual genes in Escherichia coli, as outlined in a later section. Thus, there are three main sources of gene probes: cDNA, cloned genomic DNA, or DNA fragments prepared from genomic DNA.

 

In order to label DNA to make a hybridization probe, a technique called nick translation is used (Fig. 2). Appropriate nicks are made in double-stranded DNA by various nucleases. So treated, DNA can act as a template for the enzyme DNA polymerase I, and it is labelled by incorporating ³²P-labelled deoxyribonucleoside 5′-triphosphates at the 3′ OH terminus of the nick by the action of the polymerase. In this way it is possible to prepare highly radioactive probes.

 

DNA fractionation: restriction endonucleases

Restriction endonucleases are enzymes that occur naturally, mainly in bacteria, and which cleave DNA. They are called restriction endonucleases because they restrict their activity to foreign DNA. For example, if DNA from one strain of E. coli is introduced into another strain it is fragmented by the host restriction endonucleases; the bacterium's own DNA is not attacked because its vulnerable sites are protected by methylation. The restriction enzymes used most commonly in genetic engineering recognize signals consisting of six bases, often palindromes. Over 400 restriction enzymes with 100 different specificities have now been isolated, many of which are in regular use for recombinant DNA technology. They are named according to their organism of origin; Eco RI is derived from E. coli for example.
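The action of a restriction enzyme on a sequence can be sketched as a search for its recognition signal followed by a cut. The example assumes the well-known Eco RI signal GAATTC, cut after the first G, which the text does not spell out; the function names and the DNA sequence are hypothetical, and only one strand is modelled.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindrome(site: str) -> bool:
    """A site is palindromic if it reads the same on both strands, 5' to 3'."""
    return site == "".join(PAIRS[base] for base in reversed(site))


def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the strand cut_offset bases into every occurrence of the site."""
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments


dna = "AAGAATTCGGGGAATTCTT"  # hypothetical sequence with two sites
print(is_palindrome("GAATTC"))  # True
print(digest(dna))              # ['AAG', 'AATTCGGGG', 'AATTCTT']
```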

 

Gene mapping

Restriction endonuclease mapping, or Southern blotting as it is usually called, after its inventor Edward Southern, has become a major tool for the analysis of genetic diseases (Fig. 3). DNA is obtained from any available tissue, usually from peripheral blood white cells, and, after purification, is treated with a particular restriction enzyme. The mixture of fragments is then subjected to electrophoresis on an agarose gel. After separation of the fragments according to their size, the DNA in the gel is denatured by alkali treatment and the separated fragments are transferred to a nitrocellulose filter. The filter is then exposed to a radioactively labelled gene probe. The position of the fragments containing the gene of interest is then determined by autoradiography. By using a series of different enzymes which cleave DNA either within or outside the gene or genes we are studying, and by orientating some of the fragments in the appropriate direction, it is possible to build up restriction enzyme maps of areas of the genome. The power of this technique for studying human diseases is quite remarkable. From the white blood cells from as little as 5 ml of blood it is possible to obtain sufficient DNA to analyse any normal or mutant gene for which we happen to have an appropriate probe.

 

Gene cloning and the preparation of gene libraries

The insertion of foreign DNA into bacterial plasmids or bacteriophage is the keystone of recombinant DNA technology. Plasmids are closed, circular DNA molecules which replicate autonomously in bacteria. A plasmid commonly used for this type of work is illustrated in Fig. 4. It has an origin of replication, which means that it can be replicated in a bacterium by exploiting the latter's DNA synthesizing machinery. It usually contains one or two genes for antibiotic resistance, and sites where restriction enzymes can cleave the DNA circle, so opening it up to produce a linear molecule. The DNA to be inserted into the plasmid is fragmented by the same restriction enzyme (Fig. 5). Plasmid and DNA fragments are then mixed and associate with each other by virtue of the ‘sticky’ ends of the DNA. A permanent union is achieved by adding an enzyme called DNA ligase. Some plasmids rejoin and form the original circular DNA but others, recombinants, incorporate the foreign DNA. Suitable bacteria are then transformed by the plasmids (that is, plasmids and bacteria are mixed and a small number of plasmids enter the bacteria). The frequency of transformation is such that each bacterium usually contains only one plasmid. Bacteria carrying recombinant plasmids are selected by a variety of microbiological tricks, usually by allowing the recombinant plasmids to confer antibiotic resistance on their bacterial hosts and growing the latter on selective media. Bacterial colonies can be screened by hybridization with appropriate gene probes for the presence of foreign DNA inserts, and when such a colony is identified it can be grown in large quantities to provide the required DNA fragment.

 

In this way it is possible to prepare gene libraries. To make a genomic library, DNA is prepared so that it consists of fragments of greater than 100 kilobases (a kilobase (kb) is one thousand nucleotide bases). These pieces are then digested with restriction enzymes so as to provide a random assortment of pieces of DNA. The fragments are inserted into an appropriate vector. Three types of vectors are now commonly used: plasmids, bacteriophage, and cosmids. Plasmids have the disadvantage that they can only be used to clone a piece of DNA of less than 10 kb. On the other hand, bacteriophage (bacterial viruses) can accommodate fragments of 10 to 20 kb. As this field has progressed it has become necessary to be able to handle even larger-sized DNA fragments. A number of ingenious approaches have been developed. One particularly valuable system involves cloning in cosmids. A cosmid is an artificial vector produced by genetic engineering which consists of plasmid DNA packaged into a phage particle. Another valuable technique has been derived from yeast genetics and makes use of the development of methods for taking apart and putting together entire chromosomes. It has been possible to exploit this technology in a novel and ingenious way to develop cloning vectors called yeast artificial chromosomes. It turns out that yeast artificial chromosomes can accommodate human DNA fragments of hundreds of kilobases in length.

 

Gene libraries may contain hundreds of thousands of different recombinants, each representing roughly one gene attached to a plasmid or a bacteriophage DNA. In order to select a particular colony or plaque containing a desired gene from a bacterial plate a technique called colony hybridization is used. A nitrocellulose filter is placed over the bacterial colonies or phage plaques. This absorbs a small amount of DNA. The filter is then incubated under hybridization conditions with a radioactive DNA probe complementary to the sequence of the gene being sought. After the excess probe has been washed away the filter is exposed to an X-ray plate; the position of the desired colony is indicated by a mark on the plate. In constructing libraries care is taken to ensure that the entire genome is accurately represented.

 

A variety of ingenious methods have been used for developing probes for screening libraries. We have already mentioned how it is possible to construct cDNA probes provided purified mRNA is available from the gene that we wish to find. However, this is often not the case and other approaches are required. For example, in many cases we will wish to identify a gene whose mRNA constitutes only a very tiny percentage of the total RNA of the cells in which it is expressed. One approach to this problem is to try to determine at least part of the amino acid sequence of the particular gene product and then to synthesize short (oligonucleotide) probes with sequences deduced from the structure of the particular protein. But sometimes nothing is known about the amino acid sequence of the product of a particular gene that is being sought. One way round this difficulty is by immunological purification of the appropriate mRNAs. Newly formed proteins start to form their three-dimensional structures as they are being assembled on ribosomes. If a suspension of polyribosomes is incubated together with antibodies against the protein product of a gene we wish to find, antigen/antibody complexes form only with those polyribosomes that are producing the particular protein. By using a type of affinity chromatography, it is possible to harvest the polysomes that are bound to the antibody and then to isolate mRNA from them. A variety of other extremely ingenious methods have been devised to isolate low-abundance mRNA. Finally, it is sometimes possible to isolate human genes by transferring genomic DNA into mouse fibroblasts; cells carrying genes coding for proteins that are expressed on the cell surface, such as T-cell-specific antigens, can be identified by fluorescent-antibody screening of the cell population.

 

Cloned DNA can be used for many purposes. Individual genes can be isolated and sequenced, probes for gene mapping can be prepared, and a start has been made in devising transcription systems for analysing the function of abnormal genes in the test tube. Finally, and with enormous potential for the future, cloned genes can sometimes be persuaded to transcribe their products in bacteria.

 

Gene sequencing

Now that genes can be isolated by cloning, the development of rapid methods for DNA sequencing has made it possible to determine the molecular basis for many single-gene disorders. There are two commonly used methods for DNA sequencing, developed independently by Maxam and Gilbert in the United States and Sanger in England. Both require the initial fractionation of DNA but from then on they are fundamentally different. Maxam and Gilbert use a degradative technique while Sanger uses a synthetic method based on stopping the synthesis of a DNA chain at a particular point rather than breaking it. Readers who wish to learn more about these techniques are referred to the monographs cited at the end of this chapter.

 

Speeding up the analysis of human genes

Recently some ingenious techniques have been developed for increasing the speed of analysis of human DNA. The most important is called the polymerase chain reaction (PCR), which is designed to amplify any short DNA sequence over a period of a few hours. Indeed, such is its power that it is possible to amplify sufficient DNA from one or two cells to obtain a genetic diagnosis within 24 hours. The principle of PCR is illustrated in Fig. 6. This method has already had many important uses, not least in the development of extremely rapid methods for gene sequencing. It has also allowed short regions of DNA containing mutations to be amplified and analysed with oligonucleotide probes, thus greatly facilitating the diagnosis of genetic diseases.
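The power of PCR comes from repeated copying: in the idealized case every cycle doubles the number of target molecules, so the yield grows exponentially with the number of cycles. The short calculation below is a sketch under that assumption; the `efficiency` parameter (the fraction of molecules copied in each cycle) and the cycle numbers are hypothetical and are not taken from the text.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after a number of cycles.

    efficiency = 1.0 means perfect doubling in every cycle; lower values
    model the incomplete copying seen in practice.
    """
    return initial_copies * (1 + efficiency) ** cycles


# Starting from the two copies of a gene present in a single diploid cell:
print(pcr_copies(2, 30))                  # 2147483648.0
print(pcr_copies(2, 30, efficiency=0.8))  # fewer copies with imperfect cycles
```

Even from one cell, thirty ideal cycles give about two thousand million copies, which is why so little starting material is needed.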

 

Studying the function of isolated genes

There are several ways to study the function of isolated genes, none entirely satisfactory. First, using what are called transient expression systems, it is possible to insert genes into cells and to study both the quantity and structure of their mRNA transcripts. DNA can be inserted into a cell in the form of calcium microprecipitates, although this is very inefficient. More recently, several types of virus-derived vectors have been used to study the expression of human genes in mammalian cells. A completely different approach is to introduce genes into established cultures of cells of appropriate lineage for the genes we wish to study. For example, human haemoglobin genes can be inserted into mouse erythroleukaemia cells. This can be done either by using purified genes or by transferring intact human chromosomes by techniques of cell fusion. The latter approach usually entails making use of a product of the particular human chromosome to ensure that it confers a selective property that will allow the hybrid cells to grow in culture. Finally, genes can be introduced into embryos by microinjection so that their patterns of integration and expression can be studied over several generations. Currently, the study of transgenic mice derived by this method is providing some extremely important information about the regulation of the expression of genes in different tissues.

 

THE HUMAN GENOME AND REVERSE GENETICS

As early as 1927 J. B. S. Haldane reasoned that if it were possible to map 50 or more inherited characters, they could be used as markers for predicting whether children would carry genes for conditions such as Huntington's disease. The idea is beautifully simple. Suppose we want to follow the progress of a particular genetic trait through a family but have no way of identifying it. The thing to do is to find a gene that we can easily identify and which is linked to the gene for which we are looking. If the two are so close together on the same chromosome that they always pass together through successive generations, we now have a ‘handle’ on the gene that we can't identify; if the marker gene is inherited, so must the gene that is closely linked to it. It follows, therefore, that if we know the chromosomal location of our marker gene we can use this approach to find any gene that happens to be linked to it. This is the reasoning behind the idea of generating a complete map of the human genome in which there are linkage markers spread at convenient distances which could lead us to any gene that we wish to find. Such a map would be called a genetic map. The other type of map that we could prepare would be a physical map, that is, one that shows us the structure of the genome. The ultimate physical map would, of course, entail sequencing the entire genome.

 

Before setting out to produce a map of anything, it is useful to have a rough idea of what kind of distances are involved. In fact they are very large. Current estimates of the human genome put it between 3 and 3.5 × 10⁹ base pairs. It has also been estimated that there may be somewhere between 50 000 and 200 000 important genes to be found and mapped. Furthermore, it is apparent that over half the human genome consists of non-coding DNA of no known function, so-called ‘junk DNA’. Given these rather daunting statistics where might we start in our efforts to map the human genome?
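A rough sense of scale can be had by asking how many cloned fragments would be needed to span the genome once, end to end. The arithmetic below is only a sketch: it uses the lower genome estimate given above and the vector capacities quoted earlier in this chapter, the 300 kb figure for yeast artificial chromosomes is an assumed round number within ‘hundreds of kilobases’, and the names are hypothetical. A real library of random fragments would need several times more clones, because random pieces overlap.

```python
GENOME_BP = 3_000_000_000  # lower estimate of the human genome, in base pairs

# Approximate insert sizes in base pairs. Plasmid and bacteriophage values are
# the upper limits quoted in the chapter; the YAC value is an assumption.
INSERT_BP = {
    "plasmid": 10_000,
    "bacteriophage": 20_000,
    "yeast artificial chromosome": 300_000,
}


def clones_for_one_pass(genome_bp: int, insert_bp: int) -> int:
    """Minimum number of non-overlapping inserts needed to span the genome once."""
    return (genome_bp + insert_bp - 1) // insert_bp  # integer ceiling division


for vector, size in INSERT_BP.items():
    print(vector, clones_for_one_pass(GENOME_BP, size))
# plasmid 300000
# bacteriophage 150000
# yeast artificial chromosome 10000
```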

 

Until the recombinant DNA era, the major difficulty for gene mappers was the lack of markers. There were a few protein markers such as blood group antigens, serum protein polymorphisms, and so on. But they were never sufficient to even start making a map. However, as soon as restriction enzymes were discovered and human DNA was digested, it became clear that all of us show remarkable variability in the structure of our DNA. Single base changes, which are in themselves harmless, can be identified by the altered cutting sites for restriction enzymes. Thus the size of the fragments of DNA generated by such enzymes will vary. This is the basis for what are called restriction fragment length polymorphisms (RFLPs). They offer extremely valuable genetic markers and, if we can identify their chromosomal location, they are excellent markers for hunting genes by linkage analysis. But it turns out that things are even better than this because there are regions of DNA scattered about the genome which are highly polymorphic. Such regions often represent blocks of repeated segments of DNA which vary in length from person to person. Such mini- or microsatellite DNA has turned out to be a particularly valuable source of linkage markers.
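How a harmless single base change produces a restriction fragment length polymorphism can be shown with two invented alleles. In this sketch the enzyme signal (GAATTC, cut after the first G), the sequences, and the function name are all assumptions made for illustration; only fragment lengths are reported, as they would be read from a gel.

```python
def fragment_lengths(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Lengths of the pieces produced by cutting at every occurrence of the site."""
    lengths = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        lengths.append(cut - start)
        start = cut
        position = dna.find(site, position + 1)
    lengths.append(len(dna) - start)  # piece after the last cut
    return lengths


# Hypothetical allele with three cutting sites.
allele_a = "CCGAATTCAAAAGAATTCTTTTGAATTCGG"
# Second allele: one base change (A to G) destroys the middle site.
allele_b = allele_a[:14] + "G" + allele_a[15:]

print(fragment_lengths(allele_a))  # [3, 10, 10, 7]
print(fragment_lengths(allele_b))  # [3, 20, 7]
```

The two 10-base fragments of the first allele are replaced by a single 20-base fragment in the second, so the alleles can be told apart by fragment size alone and followed through a family.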

 

Physical mapping has also moved forward quickly. This can be carried out at various levels. At low resolution one of the most useful approaches has been a technique called somatic cell hybridization. If human cells are mixed with rodent tumour cells grown in culture together with Sendai virus, they tend to fuse together. After fusion the chromosomes of each of the cells become mixed together, and subsequently many of them are lost from the now hybrid cell; human chromosomes are preferentially lost in a random fashion. Thus it is possible to propagate cells in culture that only contain a limited number of human chromosomes and hence to build up a panel of such somatic cell hybrids. These can be used to assign human genes to particular chromosomes by looking for the products of the particular gene in the hybrid line that only contains one or a few human chromosomes.

 

Genes can also be assigned to chromosomes by a technique called in-situ hybridization in which a radioactive probe for the particular gene is used to hybridize directly to complementary sequences on a particular chromosome. Recently, the development of highly sophisticated microscopic techniques has revolutionized the field of chromosome analysis. For example, the development of multichannel confocal fluorescent microscopy has made it possible to label entire chromosomes with chromosome-specific libraries, a pastime called ‘chromosome painting’. Furthermore, equally sophisticated techniques have been developed for sorting human chromosomes and isolating them.

 

At higher resolution, physical mapping involves the isolation of chromosomal DNA, fragmentation of the DNA by restriction enzymes, the generation of a library of cloned fragments, and the ordering of the clones to reflect the original order of the particular fragments along the chromosome. This approach can be used to link physical to genetic maps. For example, starting with an RFLP marker it is possible to build up a series of overlapping phage or cosmid clones and, in essence, to walk along the chromosome from a starting point to a gene that we wish to find. All these techniques have been facilitated recently by our increasing ability to deal with large pieces of DNA by cloning into yeast, as mentioned earlier in this chapter. Finally, it is often possible to obtain a clue as to where a particular gene might be by finding a patient with a particular phenotype associated with a chromosomal abnormality, such as a deletion. In such cases it is reasonable to assume that the patient's clinical picture may be related to loss of particular genes in the deleted region of DNA. Thus the deletion can act as a starting point for finding a particular gene.

 

Mapping techniques of this type have led to some remarkable success stories in clinical genetics. In particular, it has been possible to discover the cause of some important genetic diseases. The first step is to define an RFLP linkage, which may put us within a few million bases of the gene that we are looking for. Next, by various chromosome walking or jumping techniques it is possible to move towards the gene, and finally to define it. The gene is then sequenced and an educated guess is made as to the likely protein product which would be produced by such a sequence. After this, the mutations in the gene are determined and, finally, the function of the particular product is worked out from its protein sequence and insights are gained into how this might be affected by the particular mutations. The discoveries of the genetic defects in Duchenne muscular dystrophy and cystic fibrosis are prime examples of the power of this approach.

 

CLINICAL APPLICATIONS OF RECOMBINANT DNA TECHNOLOGY

It is beyond the scope of this chapter to outline all the possible clinical applications of recombinant DNA technology. The main areas in which this technology will be applied to medical research and practice are summarized in Table 1. In the following sections a few examples will be considered.

 

The molecular pathology of single-gene disorders

Considering that it is only a few years since the first human gene was cloned and sequenced, it is astonishing how much progress has been made in unravelling the molecular pathology of single-gene disorders. Studies of the inherited haemoglobin disorders have been particularly informative and have already given us examples of single base changes, deletions of one or more bases or of entire genes, insertion of new genetic material, inversions of stretches of DNA, mutations of regulatory regions which control the transcription of genes, and base substitutions which interfere with the processing or translation of messenger RNAs.

 

Broadly speaking, there are two main classes of mutations. First, there are those that result in an altered protein product, most of which involve a single base substitution which causes a single amino acid substitution, so-called mis-sense mutations. In many cases they have no ill-effects, but if they change the function or stability of the protein they may cause a clinical disorder. The other group is made up of mutations which cause defective synthesis of proteins without changing their structure.

 

Work on the thalassaemias, and more recently on other single-gene disorders, has told us in detail how a single base change in a gene can profoundly modify its output. For example, it may produce a premature stop codon, so that when messenger RNA is translated, shortened and therefore functionally useless peptide chains are produced (Fig. 7); such lesions are called nonsense mutations. Because amino acids are encoded by a triplet code, the loss or insertion of one, two, or four bases in a gene throws the ‘reading frame’ out of sequence; its messenger RNA cannot be translated normally beyond the frameshift (Fig. 8). As mentioned earlier, most genes have their coding regions (exons) divided up into several pieces by lengths of DNA of unknown function called introns. Since primary RNA transcripts contain both intron and exon sequences, the introns have to be cut out and the exons precisely spliced together before messenger RNAs move into the cell cytoplasm. Several types of thalassaemia and some other single-gene disorders result from mutations which interfere with the splicing mechanism. Single base changes at the junctions between introns and exons may prevent splicing; no normal messenger RNA is produced. More surprisingly, it turns out that base changes within introns or exons can produce alternative splice sites, which result in the production of both normal and abnormally spliced messenger RNA; the latter cannot be used as a template for peptide chain synthesis. Finally, as mentioned earlier, all mammalian genes have sequences in common at their 5′ flanking regions which play a critical role in the regulation of transcription. Base changes in these regions can reduce the rate of gene transcription.
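
The effects of mis-sense, nonsense, and frameshift mutations on a triplet code can be made concrete with a short Python sketch. The coding sequence below is invented for illustration and does not correspond to any real gene; it is written as the DNA coding strand (T in place of the U of messenger RNA), and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read the sequence in triplets from the first base until a stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical coding sequence: ATG GCT AAA GAA CTG TGG CAG TAA
NORMAL = "ATGGCTAAAGAACTGTGGCAGTAA"
MISSENSE = NORMAL[:10] + "T" + NORMAL[11:]   # GAA -> GTA, one amino acid changed
NONSENSE = NORMAL[:6] + "T" + NORMAL[7:]     # AAA -> TAA, premature stop codon
FRAMESHIFT = NORMAL[:3] + NORMAL[4:]         # one base deleted, frame shifted

print(translate(NORMAL))      # MAKELWQ
print(translate(MISSENSE))    # MAKVLWQ
print(translate(NONSENSE))    # MA
print(translate(FRAMESHIFT))  # MLKNCGS
```

The single-base substitutions alter either one residue or the length of the chain, whereas the single-base deletion changes every codon downstream of it and, in this example, also removes the original stop codon from the reading frame.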

 

Thus we may well already have a reasonable idea of the repertoire of molecular mechanisms underlying single-gene disorders. Recent studies of the appropriate genes of patients with Christmas disease, haemophilia, Duchenne muscular dystrophy, cystic fibrosis, growth hormone deficiency, antithrombin III deficiency, and low-density lipoprotein (LDL) receptor deficiency support this prediction; the molecular pathology of these disorders is turning out to be very similar to that of the thalassaemias.

 

Common polygenic diseases

When we consider common disorders such as diabetes, degenerative arterial disease, autoimmune disease, and the major psychoses, the applications of recombinant DNA and cell biology are less obvious. Many of them have a complex polygenic basis and, in addition, environmental factors play a major role in their aetiology. Why do we want to try to determine the important genes involved in these disorders?

 

Studies that have analysed the occurrence of common diseases in identical twins have suggested that the genetic component varies considerably between disorders. For example, the common form of maturity-onset diabetes, type II diabetes, is a genetic disease; if one of a pair of identical twins is affected there is a very high likelihood that the other twin will become diabetic. Similar findings have been obtained in many common diseases, although often the genetic component is very much weaker. However, we know virtually nothing about the aetiology of these disorders, except that in some cases environmental factors may play an important role: smoking and diet in vascular disease, for example. It is likely, therefore, that if we were able to identify a few important genes that are involved in making us more or less susceptible to the action of these environmental agents, and we were able to understand the function of these genes and how it varies between susceptible or resistant individuals, we might gain considerable insights into the underlying causes of these diseases and how such genetic variability interacts with the environment. Similarly, it might be possible for us to identify individuals who are particularly susceptible to environmental agents and hence to develop more focused public health programmes for the prevention of these diseases.

 

There are several different approaches to defining the major genes in these complex polygenic systems. One, and probably the least rewarding, is to find large families in which more than one member is affected. By using random RFLP linkage markers and carrying out extensive family studies it might be possible to obtain a linkage to a susceptibility gene which could then be identified by reverse genetics; this is extremely time consuming and difficult, and so far has rarely been successful.

 

Another approach is to try to make an educated guess about which genes might be involved. For example, in coronary artery disease it would be reasonable to suppose that genes which are involved in cholesterol or vessel wall metabolism might be important players. Thus, having isolated these so-called candidate genes and obtained appropriate RFLP markers, family studies can be carried out to see if any polymorphisms segregate with a particular disease. Another way to tackle this problem is through mouse genetics. There is a large amount of information about the location of genes in the mouse genome and there are many mouse models for common human diseases. Furthermore, although many genes in the mouse are on different chromosomes to man, there is already considerable information about the equivalent chromosomal locations of genes between mouse and man. It is relatively easy to carry out breeding experiments in mice, and therefore the loci involved in some of the diseases that resemble their human counterparts can be defined quickly.

 

Using these different approaches, considerable progress has been made already in defining some of the genes that are likely to be involved in susceptibility or resistance to common diseases. For example, in type I diabetes it is clear that there are at least two major gene systems involved, one being the class II genes of the HLA DR system and the other a locus close to the insulin gene, although probably not insulin itself. At least five loci have been found to be involved in producing the phenotype of non-obese diabetes in mice, a condition similar, though not identical, to human type I diabetes. All these mouse genes will have their human homologues. It is of particular interest that one of them seems to be involved in susceptibility or resistance to a variety of infections.

 

Useful progress has also been made in defining some of the major genes involved in susceptibility to coronary artery disease. For example, certain polymorphisms of the apolipoprotein genes which are involved in cholesterol and lipid metabolism seem to be strongly related to the development of premature vascular disease and, interestingly, obesity. It also seems likely that genetic variability in the coagulation system, particularly fibrinogen production, may be an important factor in the generation of coronary artery disease and myocardial infarction. No doubt a number of other genetic systems will be involved.

 

This is slow and difficult work, and progress in this important field will undoubtedly be easier once we have a detailed linkage map of the human genome. Indeed this area of research is probably one of the best arguments in support of the human genome project.

 

Cancer

Another area of considerable potential for recombinant DNA technology is the study of the mechanisms of malignant transformation. Here we are moving into a different area of human genetics called somatic cell genetics.

 

Undoubtedly the pathogenesis of cancer involves even more complex interactions between environment and genome. Familial cancers are rare, and very little is known about inherited susceptibility to common cancers. However, it is becoming clear that malignant transformation involves fundamental changes in the genome of individual cells which are passed on to their progeny. The current excitement in the cancer field has arisen because the new tools of cell and molecular biology have made it possible to synthesize several long-standing observations about the epidemiology and cytogenetics of cancer into a working model of what may go wrong in a neoplastic cell.

 

One of the most important threads in the development of this story was the discovery of oncogenes. Oncogenic viruses can produce neoplastic transformation in a variety of ways. Although in many cases this is achieved by their insertion close to a critical host regulatory region, insertional mutagenesis, it is also clear that some of them carry specific genes which are involved in producing a neoplastic phenotype. The discovery that has transformed the cancer field was that these viral oncogenes have their equivalent in the cells of almost all species studied. It appears as though the viruses have picked up normal cellular genes during evolution and that they have developed their oncogenic properties in their new home. The cellular homologues of viral oncogenes (v-onc), cellular oncogenes (c-onc), appear to be part of a cell's normal genetic machinery, responsible for the control of proliferation, differentiation, and development.

 

There is increasing evidence for unusual activation of c-onc genes in different human cancers. In some cases this seems to result from structural alterations in the oncogenes, while in others it may follow a change in their chromosomal location, as part of a chromosomal translocation specific for a particular type of tumour for example. Patients with chronic myeloid leukaemia carry an abnormal chromosome called the Philadelphia (Ph¹) chromosome which usually results from a translocation between chromosomes 9 and 22 with breakpoints at 9q34 and 22q11. It turns out that this translocation involves the movement of an oncogene, c-abl, which is normally situated on chromosome 9. The breakpoint of this translocation has been analysed in detail. It involves the juxtaposition of a region of chromosome 22 to sequences near the 5′ end of c-abl. The region on chromosome 22 has been called bcr (breakpoint cluster region). Apparently, the fused bcr–abl locus produces a giant primary messenger RNA transcript which, by a series of splicing events, gives rise to a novel messenger RNA of about 8.7 kb that includes the same sequence in every patient with chronic myeloid leukaemia.

 

Another tumour that has been studied in this way is Burkitt's lymphoma. The cancer cells have specific chromosome changes; 90 per cent of patients have an 8/14 translocation, while others have 8/2 or 8/22 translocations. Chromosomes 14, 2, and 22 carry the genes encoding the immunoglobulin heavy chain, and κ and λ light chains respectively. The cellular oncogene c-myc is located on chromosome 8. The breakpoint of all three translocations is at the site of the c-myc gene. Thus this important regulatory gene is transposed directly into regions of the genome which are undergoing major rearrangement during B-cell maturation.

 

These translocations cannot be the only event leading to Burkitt's lymphoma, however. This tumour is associated with Epstein–Barr virus infection and occurs commonly in parts of the world where malaria is endemic. Furthermore, only a small proportion of children who are infected by Epstein–Barr virus in these regions develop lymphoma. Research on DNA tumour viruses suggests that at least two separate genes are required to produce neoplastic transformation of normal cells, an observation that is in keeping with epidemiological evidence that the development of cancer is a multistep process. One of the genes enables the cells to go on growing indefinitely; the other produces changes in the properties of the cells that are associated with loss of growth constraint. Although the sequence of events in the genesis of Burkitt's lymphoma is not known, it is tempting to speculate that infection with Epstein–Barr virus may immortalize the cells, and that perhaps this is more likely to happen when there is chronic antigenic stimulation due to malarial infection. Later events would include the chromosome translocation and, possibly, the abnormal activation of one or more oncogenes.

 

Another, equally remarkable, development in understanding the cellular basis of cancer is the observation that some childhood neoplasms are associated with small chromosome deletions or point mutations involving oncogenes. It appears that if we remain heterozygous for these lesions we may go through life never knowing we have them; the presence of a normal allele, or antioncogene, appears to be sufficient to maintain a normal phenotype. However, if there is a rearrangement or a deletion of the normal allele, leading to its inactivation, affected cells become homozygous for the absence of activity at these loci and may then undergo neoplastic transformation. Restriction enzyme analyses of tumour cells have shown that loss of heterozygosity of this type occurs in many cancers.
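
The comparison that underlies a search for loss of heterozygosity can be sketched as follows. The marker names and fragment sizes are invented for illustration, and the rule used here (two fragments in normal tissue, only one of them remaining in the tumour) is a deliberately simplified assumption about how such restriction enzyme data would be read.

```python
# Invented RFLP results: fragment sizes (kb) seen at each marker locus.
NORMAL_TISSUE = {
    "marker_1": {4.2, 6.8},   # heterozygous: two different fragments
    "marker_2": {3.1},        # homozygous: uninformative for this test
    "marker_3": {2.5, 9.0},
}
TUMOUR_TISSUE = {
    "marker_1": {4.2},        # one allele has been lost in the tumour
    "marker_2": {3.1},
    "marker_3": {2.5, 9.0},
}


def loss_of_heterozygosity(normal, tumour):
    """List loci heterozygous in normal tissue that retain one allele in tumour."""
    return sorted(
        locus
        for locus, alleles in normal.items()
        if len(alleles) == 2
        and len(tumour[locus]) == 1
        and tumour[locus] <= alleles  # the surviving fragment is a constitutional one
    )


print(loss_of_heterozygosity(NORMAL_TISSUE, TUMOUR_TISSUE))  # ['marker_1']
```

Only loci at which the patient is constitutionally heterozygous can reveal the loss, which is why the homozygous marker is ignored.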

 

Recent studies of the evolution and pathogenesis of colon cancer provide an excellent example of the speed of progress in this field. Analysis of families with familial polyposis coli has shown that this important premalignant condition is determined by a locus on the long arm of chromosome 5 (5q15.22). It is possible that inherited mutations of this gene make it more likely that patients will develop adenomata of the colon. It is also clear from the study of sporadic colon cancers that a number of other gene loci are involved. Another locus that seems to be a major player in this disorder maps to the short arm of chromosome 17 and has been identified as the oncogene p53. Yet another locus that is also often involved in colon cancer maps to the long arm of chromosome 18 (18q21.22). This has been called the DCC gene, standing for deleted colon cancer locus. Its product appears to be similar to certain key adhesion molecules involved in the interactions between nerve cells. Finally, mutations of the ras oncogene occur very frequently in colon cancer. From analyses of different parts of particular cancers, a picture is emerging that suggests that the development of colon cancer involves a minimum of six different mutations. Some of them, such as that involving the familial polyposis coli locus, may be inherited, while it is presumed that the majority reflect somatic mutations occurring during the lifetime of the individual, possibly as the result of environmental carcinogens. The precise order in which these mutations occur may not be important. It appears that what is essential for neoplastic transformation is a critical number of mutations.

 

In summary, therefore, even in the short time in which molecular technology has been applied to the study of cancer, some remarkable insights have been obtained about the general mechanisms of malignant transformation. It is clear that key genes involved in the regulation of growth and division of cells are involved. Sometimes we may inherit mutations that take us part of the way along the road to developing a cancer, but it is clear that a single mutation is rarely sufficient and that most cancers represent multiple different mutations and/or chromosomal rearrangements which combine to reduce a cell's normal facility for orderly division and differentiation.

 

Development, differentiation, repair, and congenital malformation

The most interesting question in human biology is how a single fertilized egg with its 10⁹ base pairs of DNA turns into a human being. This field has enormous implications for all aspects of clinical practice. It is becoming clear that developmental work in such an apparently unpromising organism as the fruitfly, Drosophila, has important implications for understanding major developmental abnormalities in man. Like a lot of modern biological research, the ideas are not new but the availability of the tools of recombinant DNA technology is allowing them to be explored in a novel fashion. In 1894 Bateson suggested that the study of chance deviation in normal developmental patterns might provide clues about the rules that govern the regulation of development. This is turning out to be the case. For example, it has been found that the homeotic genes of Drosophila, which regulate the development of body segments, have DNA sequences in common with many other species, including man. Furthermore, the products of these genes are expressed in different tissues during mammalian development. Homeotic mutations in insects result in major developmental abnormalities, including substitutions of one or more segments normally found elsewhere along the body axis. Thus the discovery of the human equivalent of the homeotic genes suggests a particularly promising new area of research into human development and its abnormalities.

 

Another major area of advance in developmental genetics is the isolation of genes for proteins that are involved in the regulation of growth and differentiation and, incidentally, of particular interest for surgery, of repair. As well as more general regulatory molecules, such as the insulin-like growth factors, there are many proteins that are involved in the differentiation of specific tissues. For example, a whole battery of regulatory proteins has been isolated that plays a role in the complex programme in which haematopoietic stem cells divide and give rise to progeny that mature into red cells, white cells, and platelets. Similarly, many other important tissue-specific growth factors have been isolated and are available for the study of the regulation of growth and differentiation of particular cell populations. Further studies along these lines have important implications for such intractable problems as nerve repair and the control of regenerative and healing processes in general.

 

It has been known for a long time that many congenital abnormalities result from chromosomal defects. Hitherto, it has only been possible to identify gross abnormalities of this type, using light microscopy. Recently, however, it has become apparent that we may be able to identify much more subtle structural changes of chromosomes, submicroscopic deletions or insertions for example, by the use of restriction enzymes.

 

THE DIAGNOSIS AND TREATMENT OF DISEASE

It seems likely that the applications of recombinant DNA technology and cell biology will have practical implications for all aspects of clinical practice. Indeed, the pharmaceutical industry has moved rapidly into this field and is putting major efforts into developing biotechnology facilities for the production of diagnostic and therapeutic agents.

 

Diagnosis

Recombinant DNA and monoclonal antibody technology promise to revolutionize diagnostic medicine over the next few years. The earliest use of gene probes for diagnostic purposes was for the detection of carrier states for genetic diseases and for their prenatal detection using fetal DNA. This new technology promises to revolutionize preventative genetics and offers the possibility of controlling many inherited diseases.

 

Gene probes, together with the use of PCR, will have wide application in diagnostic pathology. Because of their extreme sensitivity and specificity they will be of particular value in microbiology and virology for the identification of micro-organisms. Already a variety of diagnostic kits have been constructed for this purpose. As more is learnt about the activation of oncogenes, or the synthesis of abnormal gene products due to mutations of these genes, as a cause of cancer, DNA probes will become increasingly valuable for the rapid identification of malignant transformation.

 

The other area of biotechnology that has important implications for diagnosis is monoclonal antibody production. Already diagnostic agents based on this technology have been established for a wide range of purposes, including pregnancy testing, the monitoring of ovulation and ovarian function, and the identification of a variety of infections including AIDS, hepatitis, and legionellosis.

 

Again, as our understanding of the cell biology of cancer increases, it should be possible to radiolabel specific probes or monoclonal antibodies for the imaging and more precise localization of tumours.

 

Treatment

There is already enough information to suggest that therapeutics will be changed dramatically by the use of recombinant DNA and monoclonal antibody technology. The ability to clone and express genes for human proteins in micro-organisms provides a remarkably effective way of producing large quantities of absolutely pure products for therapeutic purposes. One of the first success stories in this field was the development of recombinant erythropoietin, which is now in routine use for treating the anaemia of chronic renal failure. A number of other valuable recombinant proteins are now being used in clinical practice, including α and γ interferon, tissue plasminogen activator, human growth hormone, human insulin, and human factor VIII for the treatment of haemophilia. In recent years a variety of growth factors, lymphokines, and other biologically active mediators have been produced and their value in therapeutics is being explored at the present time. They include haematopoietic growth factors, which promise to be extremely valuable in the management of patients with bone-marrow depression following treatment for cancer or after transplantation.

 

Recombinant DNA technology also offers many possibilities for the development of new vaccines or for the replacement of vaccines which are in present use. Recent innovations include subunit vaccines, which include the components causing antibody production but which exclude those which give rise to a particular disease; and anti-idiotype technology, which involves the production of antibodies to the actual antibodies to the disease-causing agent. A vaccine against hepatitis B was one of the early successes of biotechnology. A variety of monoclonal antibodies have been developed for therapeutic use and, by some ingenious genetic engineering, it has been possible to ‘humanize’ rat monoclonal antibodies. These agents have a wide range of uses in the treatment of infectious disease and cancer.

 

Recombinant DNA technology has provided the basis for the newly developing field of human gene therapy. It is possible to insert genes into foreign cells, either directly or by using vectors such as retroviruses. The discovery of the major regulatory regions for many human genes, together with the development of safer and more effective retrovirus vector systems, has raised the possibility that gene therapy may be with us in the near future. Indeed, it has already been possible to insert the appropriate gene into the lymphocytes of children with an immune deficiency disorder and improve their function so that these children are protected against infection. It seems likely that many single-gene disorders will be amenable to gene therapy, particularly those that are expressed in haematopoietic stem cells.

 

Another promising therapeutic area has the slightly less promising title of ‘antisense’ DNA technology. As we saw earlier, when a gene is transcribed it is from a single strand of DNA which is called the sense strand. By making short pieces of DNA with the sequence of the complementary strand it is possible to ‘switch off’ a particular gene. These reagents bind to the messenger RNA of the particular gene because they have sequences which are complementary to that of the gene from which the RNA is transcribed. Thus it is hoped selectively to turn off particular genes, such as those involved in malignant transformation.
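
The principle of designing an antisense reagent is simply one of base complementarity, as the following Python sketch shows. The target messenger RNA segment is invented for illustration, and the function name is hypothetical; the sketch only derives the complementary DNA sequence and says nothing about the chemistry or delivery of real antisense reagents.

```python
# Watson-Crick pairing of each mRNA base with the DNA base that binds it.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense_dna(mrna_segment):
    """Return the DNA oligonucleotide (written 5' to 3') that pairs with the mRNA."""
    # The two strands of a duplex run in opposite directions, so the mRNA is
    # read backwards while each base is replaced by its partner.
    return "".join(COMPLEMENT[base] for base in reversed(mrna_segment))


target_mrna = "AUGGCUAAAGAA"  # hypothetical stretch of messenger RNA, 5' to 3'
print(antisense_dna(target_mrna))  # TTCTTTAGCCAT
```

An oligonucleotide with this sequence would be expected to bind the chosen stretch of messenger RNA and to no sequence that lacks the complementary run of bases.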

 

Another developing area of DNA technology in medical practice is for screening foodstuffs for chemicals or microbial contamination. This involves the use of both gene probes and monoclonal antibodies.

 

There are numerous other potential products from the biotechnology industry. The dream of harmless alternatives for blood products is closer with the recent finding that it is possible to persuade yeast to synthesize human haemoglobin. Vascular implants can be made more compatible by treatment with cells, growth factors, and collagen. The problem of graft rejection is being tackled by removing T cells, using a specific antibody bound to magnetite to permit magnetic removal.

 

THE OVERALL IMPACT OF HUMAN MOLECULAR BIOLOGY

Medical practice in the developed countries

The overall effect of molecular medicine on clinical practice in the developed countries is difficult to forecast; it is unlikely to change it overnight. Nor will it lessen the need for skill at the bedside or the holistic approach to patient care. The most immediate impact will be in preventative genetics, and the production of diagnostics, vaccines, blood products, and a wide variety of therapeutic agents. It is too early to predict whether a greater understanding of the molecular pathology of degenerative arterial disease, autoimmune disease, cancer, congenital malformation, or neuropsychiatric illnesses, which should follow the application of DNA technology to these problems, will have a major impact on their prevention or management. But since we have largely failed to control these diseases, except by high-technology patch-up procedures, this is surely the way that basic medical research must go in the future. Epidemiological studies have already taught us how we might reduce the prevalence of some of these diseases by changes of diet, stopping cigarette smoking, and other modifications of lifestyle; we now need to understand the molecular and cellular basis of individual susceptibility to these bad habits and, based on this information, to find out whether the diseases that they cause can be controlled by a more logical approach.

 

Medicine in the Third World

What will the new tools of biotechnology do for the problems of the developing countries? Again, the most immediate application is for the prevention of genetic disease. Recent figures from the World Health Organization suggest that there are hundreds of thousands of children born each year with sickle-cell anaemia or thalassaemia. If, for example, all the thalassaemic children born in Cyprus were treated with regular blood transfusion and iron-chelating drugs, the health budget on the island would be doubled in the next 15 years, just in treating this one disease. Programmes for prenatal diagnosis of these conditions in the second trimester by fetal blood sampling and radiolabelling of globin chains have already been set up in many developing countries, and the more recently developed techniques of chorion villus sampling and direct fetal DNA analysis are now being incorporated into these programmes. Current data suggest that these interventions have led to a remarkable reduction in the births of affected infants over the past 10 years. This is very encouraging because as the high mortality rate in the first year of life due to malnutrition and infection is controlled in developing countries, genetic diseases will pose an increasingly serious problem; the World Health Organization predicts that by AD 2000 about 7 per cent of the world's population will be carriers for important genetic haemoglobin disorders.

 

But in a world in which millions of children die each year of starvation, the major medical application of recombinant DNA technology should be for improving food supplies and for developing vaccines and diagnostic agents for parasitic and infectious diseases which, together with malnutrition, still look like being the main killers of the twenty-first century.

 

Broader implications

The advent of recombinant DNA technology has raised the expectation that as we gradually gain control over our genome we may be able to modify the human phenotype, more or less as we please. We have been regaled by television accounts of potential parents walking round ‘gene supermarkets’ stocking up their trolleys with genes that they would like to see expressed in their children, and fears of eugenics, and memories of Nazi Germany, have been raised again. Biological determinism is already having a major impact on sociobiology, and the dubious science that underlies this philosophy is providing a convenient peg on which extremist political groups are basing their views on how society should be regulated. But we do not have the faintest idea about the nature of the complex interactions of genome and environment that underlie human behaviour. Indeed, it is debatable whether we shall ever be able to explain much of human behaviour in terms of a DNA sequence. Perhaps in the long term our exploration of the human genome will provide some real insights into why we are what we are, but it would be unwise to pin our hopes for the future on this expectation. For the moment we need to develop an ongoing debate with the public about how far we wish to move in the modification of our genomes for medical advances.

 

FURTHER READING

Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson JD. The molecular biology of the cell. 2nd edn. New York: Garland, 1989.

Lewin B. Genes IV. Oxford: Oxford University Press, 1990.

Singer M, Berg P. Genes and genomes. California: University Science Books, Oxford: Blackwell Scientific Publications, 1991.

Weatherall DJ. The new genetics and clinical practice. 3rd edn. Oxford: Oxford University Press, 1991.
