Sci10_Course_Material

Life on earth

(from “The Creation of Life: From Chemical to Animal”)

by Andrew Scott

…the DNA double helix and replication based on complementary associations, genetic coding between nucleic acids and proteins: such are the great secrets of life at the molecular level.

~Jacques Ninio

Scientists interested in the origins of life on earth are embroiled in a complex detective story in which the mystery to be solved is us. How did we, and all other living things, ever come to be here? A first essential step towards solving that mystery is to find out everything we can about what we are and how we work. Answers to these questions will not only make it easier for us to speculate on the past origins of life on earth, they should also help us to consider how life elsewhere or new life in the future might originate. So in this chapter I want to consider how the living things of the modern earth manage to live, leaving the questions of how they might have originated until later. If the material in this chapter is all new to you, then you may find it a bit of a challenge. Hopefully it will be a stimulating challenge, because it will reveal to you the chemical ‘secrets’ of life, secrets which are in general much simpler than you might suppose.

Humans have puzzled over the mysteries of their own existence ever since there were humans around to puzzle, but reliable answers have been very slow in coming. Only over the last few decades have we begun to understand the complexities of physics and chemistry that make us work. Many profound mysteries remain, especially concerning the minds that do the puzzling, but the barriers between puzzlement and perception do seem to be crashing down at last.

Our living bodies, and the bodies of all other living things, appear to be marvelously intricate chemical machines. We may not be ‘mere’ machines, but machines we certainly are. Actually, creatures such as ourselves are confederations of many millions of individual chemical machines working in cooperation, because the basic machine which makes us work is the living cell.

The living cell

Even those unenlightened souls whose main memories of school science are that chemistry laboratories smell, and dissected dogfish smell even worse, usually also remember that the basic unit of life is the ‘cell’. The differences between ourselves and all other creatures boil down to the types and numbers of cells we contain. An amoeba is made of a single cell which does little else but seek out food, take it in, and use it to grow and multiply. A human is made of around ten trillion cells, divided into many different types, each specialized to do different things.

Although cells can differ enormously in what they do and what they look like, they all have a common core of essential features which let them work. Figure 3.1 outlines these features, using a cell from a ‘higher organism’ such as ourselves as an example. The single cells of ‘lower organisms’ (such as bacteria) have a simpler structure – lacking a separate ‘nucleus’, for example – but the basic chemistry that makes them work is very similar.

Figure 3.1 Living cells contain genes (made of DNA) which direct the manufacture of proteins. Proteins are the chemical catalysts (‘enzymes’) which speed up and direct the chemistry of life (although they have other roles to play as well – see text). RNA is a very similar chemical to DNA, which serves as a working copy of a gene during the process of protein manufacture.

Remember that all the structures labeled in the figure and described in the text are made up of atoms, or molecules (atoms chemically bonded together), or ‘ions’ (atoms or molecules that carry a net electrical charge by virtue of having lost or gained one or more outer electrons). And all the changes and interactions I will describe take place because they are essentially chemical reactions driven by the tendency of energy to become evenly dispersed throughout the universe.

The cells of higher organisms are split into two components – the ‘nucleus’ and the ‘cytoplasm’ – thanks to the presence of the nuclear membrane, and it is inside the nucleus that we find our ‘genes’. Many people who know nothing about the chemical nature or behavior of genes are nonetheless well aware that they play a central role in determining the structure and activities of living things; and also that they make each generation of cells or organisms resemble its ancestors. Genes, in other words, are agents of heredity, ensuring that all mice look like mice and all men look like men.

Genes are actually distinct regions of incredibly long thin molecules of the chemical ‘DNA’ (which stands for DeoxyriboNucleic Acid). Some organisms, such as bacteria, contain only one main DNA molecule. Our own cells, and those of all other higher organisms, contain a number of separate bundles of DNA, each known as a ‘chromosome’; and the DNA of each chromosome contains many genes.

So what do genes do that makes them so important? Very simply, they carry the ‘information’ (in a chemically coded form) needed to make an organism, and they pass that information on to subsequent generations of cells. The information carried by genes is called ‘genetic information’ and it can be thought of as the precise ‘instructions’ needed to construct another vitally important class of chemicals – the proteins. Broadly speaking, one gene is a section of a DNA molecule that contains the information needed to make one particular type of protein molecule. The full complement of genes contained within a cell’s DNA is known as its ‘genome’, and the human genome probably contains about 100 000 distinct genes, all present within every one of our trillions of cells.

If the importance of genes is that they contain the information needed to make proteins, the next big question is what do proteins do? Proteins actually perform a wide range of essential tasks within cells and organisms, but by far their most important function is to act as the molecules that actually construct and maintain all cells. The proteins that do this job of cell construction and maintenance are known as ‘enzymes’. And they perform their seemingly miraculous task in a very simple way.

Enzymes are chemical ‘catalysts’ – with a catalyst being defined as something that speeds up a particular chemical reaction while itself remaining unchanged in the process. I have already emphasized that every living thing is a chemical machine, which implies that the overall activities of a living organism are the result of many interacting chemical reactions. Now countless different chemical reactions could take place between the chemicals found inside cells, most of which would not give rise to life; so for any meaningful form of life to be assembled from all the various possibilities there must be some way of encouraging desirable reactions, preventing (or at least not assisting) undesirable ones, and ensuring that all the detailed chemical reactions of life take place in the right places, at the right times, at suitable speeds and in the correct sequence. This is the job of the enzymes (see figure 3.2).

Each of the thousands of chemical reactions that combine to make you, me, a mouse or a bacterium is catalysed by a particular enzyme. Without the help of catalysis by enzymes many of these reactions would never get going at any reasonable pace at all. It is the enzymes which make the integrated chemistry of life possible. They bring order, structure and balance to the chemical ‘soup’ within our cells. So by containing the information needed to make enzymes (and other proteins), genes ultimately determine the structure and activities of all cells and organisms. But you should bear in mind that enzymes are not the only proteins that matter, for proteins do lots of important things other than speeding up chemical reactions. We will explore some of these other things a bit later.

Figure 3.2 The importance of enzymes. Each enzyme speeds up a particular chemical reaction required for life (solid lines), while giving no help at all to undesirable reactions (dashed lines). See also plate 4.

The overall message so far has been that life on earth is based on genes that direct the manufacture of proteins, and the proteins act together to construct all living things. Before delving more deeply into the activities of genes and proteins, I should briefly mention the other class of chemical shown in figure 3.1 – RNA. RNA (which stands for RiboNucleic Acid) has a very similar structure to DNA, and in figure 3.1 it is shown performing one of its major roles in the cell – acting as a ‘working copy’ of the genetic information which travels out from the nucleus and into the cytoplasm, where protein assembly actually takes place. DNA is the ‘master copy’ of a cell’s genetic information, and it stays secure within the nucleus. RNA copies of the genetic information stored in any gene are made and transported out into the cytoplasm when required, and it is these RNA copies of genes that actually direct the production of proteins. As their full names imply (see above) both RNA and DNA belong to a class of chemicals known as the ‘nucleic acids’. We will be looking at their chemical structure more closely in chapter 4.

Genes and the double-helix

The obvious place to begin a journey through the workings of a cell is at its DNA, which stores the genetic information needed to make the cell. There are two main requirements for any chemical that is to serve as a carrier of genetic information. Firstly, it obviously must be able to contain information; and secondly, there must be some easy way for this information to be copied, so that when cells multiply by dividing there will be a copy available for each of the two new cells. Let’s look at the structure of DNA, beginning with the entire molecule in all its complexity, and then simplifying it step by step to concentrate on only those features that make it a good carrier of genetic information.

Figure 3.3A (and plate 1) is as close as I can get to showing you what DNA actually ‘looks’ like. It shows a short section of DNA with spheres representing all the atoms. DNA is made out of hydrogen, carbon, nitrogen, oxygen and phosphorus atoms, and real DNA molecules are incredibly long compared with their breadth. A single gene, shown to the same scale as figure 3.3A, would be at least 6 metres long (often much longer); and a single DNA molecule usually contains many genes.

Figure 3.3 The structure of the DNA double-helix.

About one and a half million life-sized copies of the short section of DNA shown in figure 3.3A would need to join up to form a DNA molecule 1 centimetre long. That demonstrates how very small atoms are, compared to us; and that when we investigate how life works we must deal with things which are unbelievably tiny compared to the world of our everyday experience.
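
The arithmetic behind that comparison can be checked in a few lines. The sketch below uses only the two figures quoted above (one and a half million copies, 1 centimetre). The spacing of roughly 0.34 nanometres between successive base-pairs is an assumption brought in from standard descriptions of the double helix, not a figure given in this chapter, and the variable names are purely illustrative.

```python
# Rough scale check using the figures quoted in the text.
COPIES_PER_CM = 1.5e6   # copies of the figure 3.3A section per centimetre (from the text)
NM_PER_CM = 1e7         # 1 centimetre = 10 million nanometres

# Length of the short section of DNA shown in figure 3.3A, in nanometres.
section_length_nm = NM_PER_CM / COPIES_PER_CM

# ASSUMPTION (not from the text): about 0.34 nm between neighbouring base-pairs.
NM_PER_BASE_PAIR = 0.34
base_pairs_in_section = section_length_nm / NM_PER_BASE_PAIR

print(f"section length: about {section_length_nm:.1f} nm")
print(f"base-pairs in the section: about {base_pairs_in_section:.0f}")
```

Under that assumption the short section in the figure is only a few nanometres long and holds roughly twenty base-pairs.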

Looking at all its individual atoms makes DNA seem very complex, but it can readily be made to look much simpler. Even the complexity of figure 3.3A, however, can be used to point out the most celebrated feature of DNA’s structure – its ability to form a ‘double-helix’. Two helical ribbons of atoms, referred to as the helical ‘backbones’ of DNA, can be seen spiraling around the central core (part of one of them has been highlighted by drawing lines on either side of it). The double-helix structure can be seen much more clearly in figure 3.3B. This considerably simplifies the structure of DNA by distinguishing between its two main regions – the helical backbones and the central core – and by representing them in diagrammatic form.

The structure of the helical backbones never varies, so these parts of the molecule cannot contain any information. Actually, they serve simply to hold the crucial central core in place. So we can represent all the atoms of the helical backbones simply as two twisting ribbons. The structure of the central core has obviously also been simplified in figure 3.3B, with all the atoms being replaced by merely the first initials of the different types of chemical groups that make up the core. The core of a DNA double-helix is composed of only four different chemical groups, called ‘bases’. These bases are ‘adenine’ (A), ‘thymine’ (T), ‘guanine’ (G) and ‘cytosine’ (C). The differences between different DNA molecules – the differences between the genes that make the proteins of mice or of men – simply involve the different sequences in which the four bases of DNA are arranged. So to understand how DNA works, we now only have to worry about the four bases – ‘A’, ‘T’, ‘G’, and ‘C’ – and the sequences in which they are arranged.

In figure 3.3C, the simplification process has been taken one final step. The helical backbones have been untwisted and are now represented by straight lines. This allows us to concentrate on the central bases which carry the genetic information. Having reached this stage, I should point out that a DNA double-helix is strictly speaking not a single molecule, but is composed of two separate DNA molecules wound around one another. In figure 3.3C the molecule or ‘strand’ labeled X carries one particular sequence of bases, while the other strand, X’, carries another. The two strands are held together by weak forces of electromagnetic attraction (the attraction between positive and negative electrical charge, remember) represented by dashed lines between the bases on opposing strands. So the double-stranded DNA double-helix is held together by ‘base-pairs’, formed when the bases on opposing strands become linked by these weak bonds.

In only two steps the structure of DNA has been reduced from the real-life assembly of five types of atoms, to a simple array of letters (representing different chemical groups). Such arrays must somehow be deciphered to make living cells and organisms, but how?

The first clue to how DNA works is the hidden order present in its structure as shown in figure 3.3C. It isn’t very well hidden, but it might escape a first glance. Take a look at the particular bases that make up individual base-pairs. Wherever an ‘A’ appears on one strand, it is paired with a ‘T’ on the opposite strand (and conversely, all ‘T’s are paired up with ‘A’s). The same applies to all ‘G’s and ‘C’s – every ‘G’ is paired with a ‘C’ and every ‘C’ with a ‘G’. These ‘rules of base-pairing’ result from the chemical structure of the individual bases, and they never vary. Throughout all the DNA of a cell there are only two types of base-pair – ‘A’s paired with ‘T’s, and ‘G’s paired with ‘C’s – although these two types of base-pair can appear either way round. Any two DNA strands whose base sequences ‘match’ according to the rules of base-pairing are called ‘complementary’ strands. Obviously, two DNA strands able to bind together in the form of a double-helix must be complementary to one another.
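
The rules of base-pairing are simple enough to write down as a lookup table. The following is a minimal sketch, not part of the original text: the names `PAIRS` and `are_complementary` and the example sequences are made up for illustration, and the two strands are compared position by position as they are drawn in the simplified figure 3.3C.

```python
# The rules of base-pairing from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def are_complementary(strand_x: str, strand_x_prime: str) -> bool:
    """Return True if every base on one strand is matched, position by
    position, with its permitted partner on the other strand."""
    if len(strand_x) != len(strand_x_prime):
        return False
    return all(PAIRS.get(a) == b for a, b in zip(strand_x, strand_x_prime))

# Hypothetical example strands (not taken from figure 3.3C).
print(are_complementary("ATGCCG", "TACGGC"))  # every pair obeys the rules
print(are_complementary("ATGCCG", "TACGGA"))  # last pair is C with A, which is not allowed
```

A single mismatched position is enough to make the check fail, which is the sense in which the two strands of a double-helix must 'match'.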

Now remember the two things that the genetic material of a cell must be able to do. It must be able to be easily copied, so that copies can be made available for future generations; and it must be able to contain information – the information needed to make specific proteins. Clearly, the information must take the form of variations in the sequence of bases strung out along the length of a DNA molecule, but I will consider the copying problem first.

The answer to the copying problem stares you in the face as you examine figure 3.3C. The structure of DNA makes it easy to produce faithful copies of any particular double-helix, because all the information needed to make an entire double-helix is contained in either of its two strands. To appreciate this, imagine the two strands were pulled apart, and you were given only one of them (the strands can actually be pulled apart rather easily, because the bonds holding the base-pairs together are very weak). Given one strand and a supply of the four bases linked to the atoms that make up the helical backbone, you could quickly reconstruct the original double-helix. You would simply need to pair up the appropriate bases according to the rules of base-pairing – pairing ‘A’s with ‘T’s, ‘T’s with ‘A’s, ‘G’s with ‘C’s and ‘C’s with ‘G’s (see figure 3.4).

Figure 3.4 Anyone provided with only one strand of a DNA double-helix and a supply of nucleotides (bases linked to atoms that form the ‘backbone’ of DNA) could easily re-create the original double-helix by following the rules of base-pairing.

The copying process outlined in figure 3.4 is very close to what actually happens when the DNA of living cells is ‘replicated’. The DNA is unwound or ‘unzipped’ by enzymes, and then other enzymes link up the required bases into the newly forming strands of DNA, paired up in the only way the base-pairing rules allow (see figure 3.5). The bases actually come already joined to the atoms that will form the new helical backbone, as molecules known as ‘nucleotides’.
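
The copying process can be mimicked in a short sketch. This is an illustration only: the function names and the example double-helix are hypothetical, the enzymes are left out entirely, and each strand is simply a string of base letters read position by position.

```python
# The rules of base-pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def build_partner(template: str) -> str:
    """Pair each base on the template with its partner to form a new strand."""
    return "".join(PAIRS[base] for base in template)

def replicate(double_helix: tuple[str, str]) -> list[tuple[str, str]]:
    """'Unzip' a double-helix and build a new partner for each old strand."""
    strand_x, strand_x_prime = double_helix
    return [
        (strand_x, build_partner(strand_x)),              # old X with a new X'
        (build_partner(strand_x_prime), strand_x_prime),  # new X with the old X'
    ]

original = ("ATGCCG", "TACGGC")  # hypothetical pair of complementary strands
copies = replicate(original)
print(copies)
```

Both copies come out identical to the original, and each contains one old strand and one newly built strand.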

Figure 3.5 The replication of double-helix DNA.

It is important to realize that although enzymes catalyse all of the chemical reactions involved in DNA replication, the actual specificity of the process is due to the structure of the bases themselves. The ‘correct’ base-pairs are formed, simply because they are the only ones that can form in a way that allows the available enzymes to link the new bases into the growing helical backbone. If the ‘wrong’ bases should pair up, they will not be in the correct position for the linkage reaction to occur – that is why they are ‘wrong’.

Having seen the answer to the copying problem, we can now turn to the other one – the central question of how DNA manages to contain the information needed to make specific protein molecules.

I have already said that the information carried by DNA takes the form of variations in the sequence of bases strung out along the length of the DNA double-helix. The bases can be present in any sequence at all, provided the base-pairing rules are satisfied between the bases on opposing strands. Obviously, the information needed to make protein molecules must be encoded in the DNA base sequence in some way. The possibilities for using a base-sequence code to direct the production of proteins become more obvious when you take a look at the architecture of protein molecules.

Plate 3 shows you what a typical protein molecule actually ‘looks like’, and it suggests that once again we are dealing with incredibly complex structures which might be very difficult to understand. But just as we were able to simplify the structure of DNA to make it comprehensible, so we can also simplify the structure of proteins to reveal the common architectural plan they all adhere to.

All proteins are long chain-like molecules, as shown in figure 3.6C (the rest of the figure is explained shortly). A protein chain is made by linking together many smaller molecules called ‘amino acids’, each composed of just 10 to 27 individual atoms. Each rectangle of figure 3.6C represents one particular amino acid molecule. A total of 20 different amino acids are used to make proteins, and most proteins contain several hundred of them linked into the protein chain.

Once a protein has been made by linking up the correct amino acids in the correct sequence (all catalysed by enzymes of course), then the long chain usually folds up into a highly specific shape (represented schematically by figure 3.6C, and more realistically by plate 3). Only once a protein has adopted its final precisely folded form can it actually carry out its chemical task, such as acting as an enzyme which speeds up some specific chemical reaction vital to the cell.

It is crucial to realize that the folding process is determined entirely by the types of amino acids in the protein chain, and by the sequence in which they are arranged. The protein is pushed and pulled into its folded structure by electromagnetic forces of attraction and repulsion acting between the amino acids and the watery surroundings inside the cell. So once the correct amino acids have been linked up in the correct sequence, the job of making a protein is essentially complete. The protein chain will fold up into the precise three-dimensional structure that the electromagnetic force pulls it into; and this will be the structure that allows it to carry out its highly specific biological task.

So things should be looking clearer and simpler now. The information needed to make a protein is stored in the form of a linear sequence of bases in DNA, and what that information must do is bring about the formation of a protein composed of a particular linear sequence of amino acids. To solve the problem of how genetic information in DNA is decoded into proteins, we simply need to know the relationship between the base sequence of a gene and the amino acid sequence of the protein it codes for, and how the first gives rise to the second. The essence of the answer can be stated very simply: each sequence of three bases in DNA can direct one particular amino acid to become linked up into a growing protein chain.
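
A simple counting argument suggests why groups of three bases are the smallest that could do this job. The sketch below is an illustration rather than anything stated in the text: it only uses the two numbers given earlier (four bases, 20 amino acids), and the function name is made up.

```python
NUM_BASES = 4          # A, T, G and C
NUM_AMINO_ACIDS = 20   # the amino acids used to make proteins

def smallest_group_size(num_bases: int, num_items: int) -> int:
    """Smallest number of bases per group that gives at least
    num_items distinct groups."""
    size = 1
    while num_bases ** size < num_items:
        size += 1
    return size

for group_size in (1, 2, 3):
    print(f"groups of {group_size}: {NUM_BASES ** group_size} possible combinations")

print("smallest workable group size:", smallest_group_size(NUM_BASES, NUM_AMINO_ACIDS))
```

Single bases and pairs of bases give too few combinations to name 20 amino acids, while groups of three give more than enough.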

Figure 3.6 A summary of gene expression. The DNA of a gene is copied into a single-stranded RNA. This messenger RNA (mRNA) moves out to the cytoplasm and binds to a ribosome. The ribosome moves along the mRNA, allowing appropriate transfer RNAs (tRNAs) to bind to the exposed mRNA codons by forming base-pairs between the codons and complementary anticodons on the tRNAs. Each tRNA brings with it the amino acid encoded by whichever codon the tRNA can bind to. The amino acids are linked up to form the protein encoded by the gene.

That single sentence summarizes the results of a monumental amount of human effort and ingenuity. It tells us how to decode the ‘genetic code’ – the natural code which allows variations in the base sequence of DNA to direct the production of the protein molecules that go on to construct and maintain all living things. It will be explained to you in the next page or so, which may make tough reading if it is all new to you. If so, take things slowly, read each section over a few times and don’t give up! You are learning nothing less than how life on earth manages to live! First of all, we will follow the whole decoding process through in outline (see figure 3.6 A-C).

It begins with DNA – the master copy of the cell’s genetic information. A section of DNA encoding one particular protein is known as a ‘gene’, and the first thing to happen when a gene becomes active is that a working copy of the gene is made in the form of RNA. From the figure you can see two obvious differences between the DNA of a gene and its RNA working copy. Firstly, the RNA is single-stranded. It is actually a replica of just one of the strands of the double-helix (in the figure, the RNA is a replica of the left-hand strand of the unwound double-helix). Secondly, wherever the base ‘T’ appears in DNA, a different base known as ‘U’ appears in its RNA copy. ‘U’ stands for ‘uracil’, a very similar base to ‘T’ and one which forms a base-pair with ‘A’ just as ‘T’ does. So for our present purpose the bases ‘U’ and ‘T’ can be regarded as behaving identically.

The RNA replica of one strand of the double-helix is made in much the same way as new copies of the DNA itself. The two strands of the double-helix become temporarily separated, allowing enzymes to use one strand (the right-hand one in figure 3.6) as a template on which nucleotides (i.e. bases linked to the backbone atoms, remember) can be linked together into a complementary strand of RNA. The production of this RNA is known as ‘transcription’, since the genetic message is being transcribed from a DNA version into an RNA version, and obviously it too depends on the rules of base-pairing to make it work. As the RNA copy of a gene is made, the double-helix snaps together once more, releasing the RNA.
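
Transcription can be sketched in the same way as replication, with ‘U’ taking the place of ‘T’ in the new strand. The sequences and names below are hypothetical and are not read off figure 3.6; as before, strands are treated as plain strings of base letters.

```python
# Template DNA base -> RNA base built opposite it (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand: str) -> str:
    """Build the RNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_strand)

# A made-up gene fragment: two complementary DNA strands.
template_strand = "TACATGATT"   # the strand the enzymes copy from
other_strand = "ATGTACTAA"      # its partner in the double-helix

mrna = transcribe(template_strand)
print(mrna)
```

The RNA produced is a replica of the strand that was not used as the template, except that every ‘T’ has become a ‘U’.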

The RNA copy is known as ‘messenger RNA’ (mRNA), since it then moves out from the nucleus and into the cytoplasm, carrying its genetic message from the cell’s central ‘data bank’ out to the site of protein synthesis in the cytoplasm. Proteins are actually constructed on large conglomerates of protein and RNA known as ‘ribosomes’ (figure 3.6B). A ribosome attaches to an mRNA molecule, and then works its way along the message from one end to the other. Each time it passes a sequence of three bases, the appropriate amino acid (the one ‘encoded’ by the three bases) can be linked up by enzymes into the growing protein chain.

The amino acids are actually brought to the ribosome attached to another class of RNA molecules called ‘transfer RNAs’ (tRNAs). Each tRNA has a set of three bases which can form base-pairs with the three bases on mRNA that code for the amino acid the tRNA carries. The bases ‘UAC’ on mRNA, for example, can form base-pairs with the bases ‘AUG’ on the tRNA that carries the amino acid ‘tyrosine’. This means that the sequence ‘UAC’ in mRNA always causes tyrosine to be linked into a growing protein.
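
The pairing between the three bases on mRNA and the three bases on a tRNA follows the same rules as before, with ‘U’ in place of ‘T’. A minimal sketch, using a hypothetical helper name and writing the bases position by position exactly as the text does:

```python
# Base-pairing rules between two RNA strands: A with U, G with C.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def pairing_bases(mrna_triplet: str) -> str:
    """Return the three tRNA bases that pair, position by position,
    with a set of three bases on mRNA."""
    return "".join(RNA_PAIRS[base] for base in mrna_triplet)

print(pairing_bases("UAC"))  # the example from the text
```

For ‘UAC’ this gives ‘AUG’, matching the tyrosine example above.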

Figure 3.7 shows which amino acids are encoded by each of the sets of three bases (known as ‘codons’) that can be found in mRNA. Notice that three codons (‘UAA’, ‘UAG’ and ‘UGA’) do not code for amino acids, but instead indicate the point on mRNA at which protein synthesis should stop. Also, there is more than one codon available for most amino acids. The sets of three bases on tRNAs that can form base-pairs with the mRNA codons are known as ‘anticodons’, and obviously each anticodon is complementary to its codon.

Figure 3.7 The genetic code table. The amino acids specified by each codon are represented by their common abbreviations.

So as a ribosome travels the length of an mRNA molecule, tRNA molecules bind to the codons as they become exposed at a special site on the ribosome. The tRNAs bring along the amino acids encoded by the codons they can bind to, and as each amino acid arrives, it is linked up into the growing protein chain. Each tRNA is released from the ribosome when the growing protein chain is passed on to the next tRNA and the ribosome moves on. Eventually, the ribosome will reach one of the ‘stop’ signals, causing both the mRNA and the now complete protein to be released. The protein will fold up in the precise way determined by its amino acid sequence, and will then be able to begin to perform whatever chemical task its structure allows it to perform.
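
The journey of a ribosome along an mRNA can be imitated by reading a string three letters at a time and looking each codon up in a table. The sketch below is a simplification under stated assumptions: it uses the standard genetic code with the usual single-letter amino acid abbreviations (which may differ from the abbreviations printed in figure 3.7), it starts reading at the first base of a made-up message, and it ignores ribosomes, tRNAs and enzymes altogether.

```python
from itertools import product

BASES = "UCAG"
# Single-letter amino acid codes for all 64 codons, listed in UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...). '*' marks a 'stop' codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Read an mRNA three bases at a time, adding one amino acid per
    codon, and stop at the first 'stop' codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

stop_codons = sorted(c for c, a in CODON_TABLE.items() if a == "*")
print(stop_codons)
print(translate("AUGUACGGCUAA"))  # hypothetical message; Y is tyrosine (codon UAC)
```

The table contains 61 codons shared among 20 amino acids plus the three ‘stop’ codons, which is why most amino acids have more than one codon.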

This process of protein synthesis is known as ‘translation’, since the genetic message is now being translated from the language of DNA and RNA (i.e. from the language of nucleic acids) into the language of proteins. This entire process of decoding a gene into a protein, involving both transcription (the production of the mRNA) and translation (use of the mRNA to make protein) is known as gene ‘expression’. And remember that all of the enzymes needed to catalyse gene expression are themselves produced by the expression of their genes; and all the other types of RNA, such as tRNAs and the RNAs found in ribosomes, are produced by the transcription of genes that encode them.

So DNA and RNA are needed to make proteins, and proteins are needed to make DNA and RNA, and also to assemble proteins. This presents us with a ‘chicken and egg’ type of dilemma: how could DNA and RNA (or any similar nucleic acid) have first formed and been replicated without proteins; or how could proteins have first formed without the DNA genes and mRNAs needed to encode them, and without the other proteins needed to catalyse their manufacture? That is really the central issue facing scientists trying to explain the origin of life on earth – how did self-replicating systems of genes encoding proteins that make new genes and proteins… first arise? (See figure 3.8.)

Figure 3.8 The central molecules of life on earth are as dependent on one another as chickens and eggs. Proteins are needed to catalyse the chemical reactions which make proteins and RNA, and which replicate DNA. DNA genes and RNA are needed to make proteins. How this interdependent system first arose is the major mystery facing scientists trying to explain the origin of life on earth.

If you have found the last few pages confusing, don’t despair! They contain all the essential secrets of the chemistry of life. If you have understood them, then you know how life on earth lives and multiplies – simply by containing genes which replicate (to pass their ‘information’ on to succeeding generations), and which direct the manufacture of proteins (which make all living things by catalyzing the necessary chemical reactions). Humans have been searching for these secrets for millennia, so if you found them hard to understand on the first reading of only a few pages, you should at least be prepared to have another go! But first of all, run through the quick summary given in the paragraph below.

Genes are long sections of DNA in which four different bases are arranged in different sequences. This sequence of bases is converted into a sequence of amino acids in a protein molecule according to a code; and in this ‘genetic code’ each group of three bases directs the incorporation of one particular amino acid into a protein molecule. Once formed, the protein automatically folds up and, if it is an enzyme, begins to catalyse a specific type of chemical reaction involved in the construction or maintenance of cells (and therefore of organisms). The differences between all organisms result from the different chemical reactions taking place within them. Which reactions occur is dependent on which proteins are encoded in an organism’s DNA. The ‘secret of life’ is that replicating genes direct the manufacture of the proteins (and some RNAs) that make life live. All life on earth is essentially based on genes that encode proteins.

The powers of proteins

Since proteins are the molecular ‘workers’ that build cells and organisms, we should at least briefly consider how they manage to do that work; and we should also look at some of the other things they do in addition to acting as enzymes.

The first problem is how do enzymes manage to catalyse so many different chemical reactions with such speed and efficiency, often increasing their rate many thousand fold, while giving no help at all to unwanted reactions? In essence, the answer is very simple. In its final folded form an enzyme has grooves and clefts on its surface into which only the chemicals involved in the reaction it catalyses can fit (see figure 3.9 and plate 3). When bound to the enzyme surface, the reacting chemicals are held in an orientation that makes the desired reaction much more likely. Chemical groups on the enzyme itself, belonging to the various amino acids, can also participate in the reaction – pushing or pulling the electrons of the reacting chemicals in ways that encourage the reaction to proceed (by lowering the activation energy of the reaction). There are amino acids that can act as acids, and others that can act as alkalis; some that carry a positive charge and some whose charge is negative; some that bind tightly to water molecules and others that can hold on to ‘oily’ hydrocarbons. Together, they comprise a formidable chemical repertoire which makes the chemistry of life work.
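
To get a feel for how lowering the activation energy can speed a reaction up ‘many thousand fold’, one can use the Arrhenius relation from physical chemistry. This relation is not discussed in the text and is brought in here as an assumption, along with the illustrative numbers: a drop of 20 kilojoules per mole, a temperature of 310 kelvin (roughly body temperature), and everything else about the reaction left unchanged.

```python
import math

GAS_CONSTANT = 8.314  # joules per mole per kelvin

def rate_increase(energy_drop_kj_per_mol: float, temperature_k: float = 310.0) -> float:
    """Factor by which a reaction speeds up when its activation energy is
    lowered by the given amount, assuming the Arrhenius relation
    rate ~ exp(-Ea / (R * T)) with all other factors unchanged."""
    energy_drop_j = energy_drop_kj_per_mol * 1000
    return math.exp(energy_drop_j / (GAS_CONSTANT * temperature_k))

# Illustrative value only: a 20 kJ/mol reduction at about body temperature.
print(f"speed-up: roughly {rate_increase(20):.0f}-fold")
```

Under these assumptions a fairly modest reduction in activation energy already gives a speed-up of a few thousand fold, and because the relation is exponential, larger reductions give far larger effects.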

Some enzymes are assisted in their catalytic wizardry by ‘co-enzymes’ – simple small molecules, or even single metal ions, that can bind to such enzymes to provide chemical assistance. The ability to grab hold of and utilize such co-enzymes is of course a result of the amino acid sequence of the enzymes involved.

So enzymic catalysis depends on the folding of protein chains to produce a final folded structure in which appropriate amino acids come together in an appropriate conformation to allow the reacting chemicals (and perhaps some co-enzyme) to bind to the surface of the enzyme and react. The chances that any protein with a randomly chosen amino acid sequence will act as an efficient catalyst for any particular reaction are very small. But throughout the course of evolution countless numbers of amino acid sequences must have been ‘tried out’, with only the most useful being ‘selected’ and modified further. The superbly efficient enzymes found in modern organisms are presumed to be the end products of millennia of gradual improvement – a process which no doubt often began with very coarse ‘enzymes’ whose catalytic effects were only very slight.
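
The sheer number of possible amino acid sequences gives some sense of why a randomly chosen one is so unlikely to be useful. The following sketch simply counts them; the chain length of 100 is a hypothetical and rather short example (most proteins contain several hundred amino acids), and the function name is made up.

```python
NUM_AMINO_ACIDS = 20  # different amino acids used to make proteins

def possible_chains(length: int) -> int:
    """Number of different amino acid sequences of the given length."""
    return NUM_AMINO_ACIDS ** length

count = possible_chains(100)
# The number of digits minus one gives the power of ten.
print(f"chains of 100 amino acids: about 10^{len(str(count)) - 1} possibilities")
```

Even for this short chain the count runs to well over a hundred digits, so only a vanishingly small fraction of all possible sequences can ever have been ‘tried out’.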

Figure 3.9 Enzymes have grooves and clefts on their surface into which only the chemicals involved in the reaction they catalyse can fit. This allows the chemicals to bind to the appropriate enzymes in orientations and environments that greatly encourage the reactions to proceed.

The way in which different protein sequences could have been ‘tried out’ and ‘selected’ is a major part of our story.

Enzymes are so spectacular and so efficient that it is easy to forget that proteins do lots of other very important things in addition to acting as enzymes. You will find some of them listed in figure 3.10. Apart from enzymes, the next most fundamental type of protein is probably the ‘structural proteins’ which, as their name suggests, form much of the structural framework or ‘scaffolding’ holding cells and organisms together. The exterior surfaces of many organisms, including us, are made largely of protein. Other structural proteins form much of the ‘connective tissue’ which is found in bones, ligaments and tendons, and which generally holds the cells of our bodies together. And many cells contain a complex intracellular ‘scaffolding’ of tube-like proteins known as the ‘cytoskeleton’. This is not a static skeleton, but one that can contract and disassemble and then assemble again where and when required. It pulls the components of the cell around and gives cells the ability to move.

This brings us on to the ‘contractile proteins’ – a class of structural proteins that clump together into bundles or fibers which can contract and relax in response to various stimuli. The contractile proteins of your muscles are straining and stretching as you sit reading this book. Complex cycles of their contraction and relaxation are needed to prevent you from slumping to the desk or to the floor, and to perform the intricate process of flipping from one page to the next.

Figure 3.10 Some of the things that proteins do.

Some proteins are able to bind to specific regions of a cell’s DNA and switch various genes ‘on’ or ‘off’ as appropriate. These ‘gene-regulating’ proteins must play a vital role in allowing cells to grow and develop properly, and to respond in appropriate ways to changes in their environment. Thanks to them, the various cells of an organism can become specialized to perform many different tasks (as muscle cells, liver cells, brain cells etc.) despite containing identical genomes.

Many other small proteins act as signaling or ‘messenger’ molecules within our bodies. These are made and released by one type of cell to influence the activity of other types of cells in appropriate ways. The most famous protein messengers are some of the ‘hormones’, such as ‘insulin’ or ‘growth hormone’, which circulate through the bloodstream until they meet up with the cells whose activities they control. Other less well-known messengers called ‘tissue factors’ are constantly being passed around the various tissues of your body. These act a bit like ‘local hormones’, restricted to just one type of tissue rather than being circulated widely throughout the bloodstream. Some protein messengers can be passed from one organism to another, such as some of the sex pheromones used by various creatures to attract and prepare a suitable mate.

The celebrated ‘defensive proteins’, such as the ‘antibodies’ which allow us to fight off infection, form another vital and almost infinitely variable class of protein molecules. Other defensive proteins such as ‘interferon’, ‘interleukin’, and many others with complex unfamiliar names, all cooperate to save us many times from the threat of viruses, bacteria, fungi and so on.

The final class of proteins I want to mention is the ‘transport proteins’, responsible for transporting various essential substances around large creatures such as ourselves. The best-known example is probably haemoglobin, which sits in our red blood cells and carries oxygen molecules from our lungs to all parts of the body that need them.

So the proteins are a remarkable class of chemicals which make cells and organisms into the marvelously intricate chemical machines they are. Genes are important simply because they contain the information needed to make the life-creating proteins. The major task of proteins is to catalyse all of the chemical reactions of life, although they perform other important functions as well. Before winding up this chapter, I should introduce you to two very important types of chemical which are assembled or manipulated by enzymes, and which have vital roles to play in the structure of life – carbohydrates and lipids.

The most familiar carbohydrates are the various simple ‘sugars’ (sucrose, glucose, fructose etc.), and the more complex types known as starch and cellulose. They play a major role in many cells and organisms, as well as acting as versatile sources of chemical energy (see chapter 8). All carbohydrates are essentially composed of carbon, hydrogen and oxygen atoms. The simple carbohydrates known as sugars can form more complex ones by becoming linked up into chains, as you can see in figure 3.11. The ‘amylopectin’ shown in the figure, and formed by the linkage of many glucose molecules, is a major constituent of starch. Carbohydrates are often found attached to the surface of various protein molecules, forming protein-carbohydrate complexes called ‘glycoproteins’.

The substances that biochemists call lipids are a diverse group of chemicals with one major property in common – they are not soluble in water. The term includes everything that a layman would call ‘fat’. Lipids act as energy storage molecules, and also play many other important roles within cells; but their major function is to form the thin membranes found at the boundary of all cells and also of the various intracellular bodies such as the nucleus (see chapter 5).

In addition to carbohydrates and lipids, living things contain a wide variety of different types of chemicals all reacting and interacting in ways that give rise to life. But all of the many thousands of chemical reactions are catalysed by the proteins we call enzymes, which are encoded by genes (see plate 4). So the mystery of how living things ever got going in the first place really boils down to a question of how genes first arose and began to direct the manufacture of proteins. And the mystery of how the first living things evolved into such complex organisms as ourselves boils down to the question of how the first protein-making gene systems were amplified and diversified to make the thousands of different proteins that make us work.

Figure 3.11 The structure of some carbohydrates. Simple sugars such as glucose and fructose can become linked into slightly larger sugar molecules such as sucrose (common ‘sugar’). They can also become linked into complex carbohydrates such as amylopectin (a major constituent of starch, made of linked glucose molecules).

The common foundation of genes that encode proteins has given rise to a myriad of very different living things, all based on different types of cells. Genes and proteins manage to make bacterial cells, plant cells and animal cells; cells specialized to form leaf and stem, muscle, bone, liver and kidney; cells adapted to transport oxygen around our bodies and to fight off disease; and many more, including the cells of the brain which presumably contain the still-secret chemistry of consciousness which lets us think and study and worry about it all.

Many such different cells are different because they contain different genes, but many others are different because a common set of genes is used in different ways. Consider all of the different cells within your own body. They all came from just a single fertilized egg cell which began to divide until it had given rise to you. Each of your cells is believed to contain the same complement of genes as was present in the original cell, and yet the cells of your skin are obviously very different from those of your bones, muscles, brain and so on. These differences must be due to differences in the activity of the genes, with the genes that make blood cells being active only in blood cells, the genes that make nerve cells being active only in nerve cells, and so on. Each human cell probably contains the genetic information needed to make any type of human cell, and therefore an entire human, but in each cell only a selected portion of that information is used.
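
The principle of ‘same genes, different activity’ can be pictured as a shared list with a different set of switches in each cell type. The sketch below is purely illustrative: the gene names, the cell types chosen and the on/off settings are hypothetical placeholders, not real genetic data.

```python
# Hypothetical gene names; every cell type carries this same full set.
GENOME = frozenset({"gene_a", "gene_b", "gene_c", "gene_d"})

# Hypothetical switch settings: which genes are 'on' in each cell type.
SWITCHED_ON = {
    "blood cell": {"gene_a", "gene_b"},
    "nerve cell": {"gene_a", "gene_c"},
    "skin cell": {"gene_a", "gene_d"},
}


def genes_in_use(cell_type):
    """Genes a cell actually uses: the shared genome filtered by its switches."""
    return GENOME & SWITCHED_ON[cell_type]


if __name__ == "__main__":
    for cell_type in SWITCHED_ON:
        print(cell_type, "uses", sorted(genes_in_use(cell_type)))
```

Every cell type here holds the identical `GENOME`, yet each ends up using a different portion of it, which is all that is needed in this toy picture to make the cells differ.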

How this control (or ‘regulation’) of gene activity within different cells is achieved is one of the major mysteries facing biology today. Scientists do know of lots of ways in which genes can be switched on or off when appropriate (largely by starting or stopping their copying into messenger RNA), but they still have little idea about how all the various gene control systems mesh together to let a single fertilized egg cell create a superbly efficient thinking being rather than a chaotic proteinaceous mess.

So the diversity of living cells is literally bewildering. Nobody should let statements that attribute it all to different genes and different proteins seduce them into thinking that we know precisely how the different genes and proteins actually integrate to create such complex variety – we don’t. And the baffling diversity of cells is at least equaled by the more familiar diversity of creatures all around us. Think of tigers and spiders, elephants and flies, geese and gooseberries, mushrooms and whales, yeasts, lobsters, bacteria, geraniums, mussels, mice and men. All work because they contain genes that encode proteins. All are powered by the fundamental forces pushing and pulling atoms and sub-atomic particles into place, as energy gradually disperses; and all are supposedly derived from a common origin that dates from a time when mere clusters of atoms spontaneously took on characteristics that changed them from what we call ‘dead’ to what we call ‘alive’. It is at last time to look more closely at modern ideas about how atoms might have ‘come alive’.