Tuesday, November 21, 2006

What is DNA Fingerprinting

What is DNA Fingerprinting?

The chemical structure of everyone's DNA is the same. The only difference between people (or any animal) is the order of the base pairs. There are so many millions of base pairs in each person's DNA that every person has a different sequence.

Using these sequences, every person could be identified solely by the sequence of their base pairs. However, because there are so many millions of base pairs, the task would be very time-consuming. Instead, scientists are able to use a shorter method, because of repeating patterns in DNA.

These patterns do not, however, give an individual "fingerprint," but they are able to determine whether two DNA samples are from the same person, related people, or non-related people. Scientists use a small number of sequences of DNA that are known to vary among individuals a great deal, and analyze those to get a certain probability of a match.

How is DNA Fingerprinting Done?

1. Performing a Southern Blot

The Southern Blot is one way to analyze the genetic patterns which appear in a person's DNA. Performing a Southern Blot involves:

1. Isolating the DNA in question from the rest of the cellular material in the nucleus. This can be done either chemically, by using a detergent to wash the extra material from the DNA, or mechanically, by applying a large amount of pressure in order to "squeeze out" the DNA.

2. Cutting the DNA into several pieces of different sizes. This is done using one or more restriction enzymes.

3. Sorting the DNA pieces by size. This size separation, or "size fractionation," is done by gel electrophoresis. The DNA is loaded into a gel, such as agarose, and an electric field is applied across the gel, with the positive electrode at the bottom and the negative electrode at the top. Because DNA carries a negative charge, the pieces of DNA are drawn toward the bottom of the gel; the smaller pieces move more quickly, and thus travel further toward the bottom, than the larger pieces. The different-sized pieces of DNA are therefore separated by size, with the smaller pieces toward the bottom and the larger pieces toward the top.

4. Denaturing the DNA, so that all of the DNA is rendered single-stranded. This can be done either by heating or chemically treating the DNA in the gel.

5. Blotting the DNA. The gel with the size-fractionated DNA is applied to a sheet of nitrocellulose paper, and then baked to permanently attach the DNA to the sheet. The Southern Blot is now ready to be analyzed.

In order to analyze a Southern Blot, a radioactive genetic probe is used in a hybridization reaction with the DNA in question (see next topics for more information). If an X-ray is taken of the Southern Blot after a radioactive probe has been allowed to bond with the denatured DNA on the paper, only the areas where the radioactive probe binds [red] will show up on the film. This allows researchers to identify, in a particular person's DNA, the occurrence and frequency of the particular genetic pattern contained in the probe.
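The steps above can be mimicked on strings as a rough sketch. Everything here is an illustrative assumption rather than a lab protocol: the sample sequence is made up, the enzyme is modeled as a recognition site plus a cut position (EcoRI, which cuts GAATTC after the G, is used as the example), and "binding" is reduced to checking whether a fragment contains the probe sequence.

```python
def digest(dna, site, cut_offset):
    """Cut dna wherever the recognition site occurs.

    cut_offset is how many bases into the site the cut falls.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return [f for f in fragments if f]  # drop empty pieces


def southern_blot(dna, site, cut_offset, probe):
    """Return (fragment length, probe bound?) pairs, top of gel first."""
    fragments = digest(dna, site, cut_offset)
    # Electrophoresis: large pieces stay near the top, small ones run to the bottom.
    lanes = sorted(fragments, key=len, reverse=True)
    # Probing: simplified to "does this fragment contain the probe sequence?"
    return [(len(f), probe in f) for f in lanes]


# Made-up sample with three EcoRI sites (GAATTC, cut after the G).
sample = "AAGAATTCCACACACAGAATTCTTTTGAATTCGG"
bands = southern_blot(sample, site="GAATTC", cut_offset=1, probe="CACACA")
print(bands)  # [(14, True), (10, False), (7, False), (3, False)]
```

Only the 14-base fragment would show up on the film in this toy run, because it is the only one that contains the probe sequence.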



2. Making a radioactive probe
1. Obtain some DNA polymerase [pink]. Put the DNA to be made radioactive (radiolabeled) into a tube.

2. Introduce nicks, or horizontal breaks along a strand, into the DNA you want to radiolabel. At the same time, add individual nucleotides to the nicked DNA, one of which, *C [light blue], is radioactive.

3. Add the DNA polymerase [pink] to the tube with the nicked DNA and the individual nucleotides. The DNA polymerase will become immediately attracted to the nicks in the DNA and attempt to repair the DNA, starting from the 5' end and moving toward the 3' end.

4. The DNA polymerase [pink] begins repairing the nicked DNA. It destroys all the existing bonds in front of it and places the new nucleotides, gathered from the individual nucleotides mixed in the tube, behind it. Whenever a G base is read in the lower strand, a radioactive *C [light blue] base is placed in the new strand. In this fashion, the nicked strand, as it is repaired by the DNA polymerase, is made radioactive by the inclusion of radioactive *C bases.

5. The nicked DNA is then heated, splitting the two strands of DNA apart. This creates single-stranded radioactive and non-radioactive pieces. The radioactive DNA, now called a probe [light blue], is ready for use.


3. Creating a Hybridization Reaction
1. Hybridization is the coming together, or binding, of two genetic sequences. The binding occurs because of the hydrogen bonds [pink] between base pairs. Between an A base and a T base, there are two hydrogen bonds; between a C base and a G base, there are three hydrogen bonds.

2. When making use of hybridization in the laboratory, DNA must first be denatured, usually by using heat or chemicals. Denaturing is a process by which the hydrogen bonds of the original double-stranded DNA are broken, leaving a single strand of DNA whose bases are available for hydrogen bonding.

3. Once the DNA has been denatured, a single-stranded radioactive probe [light blue] can be used to see if the denatured DNA contains a sequence similar to that on the probe. The denatured DNA is put into a plastic bag along with the probe and some saline liquid; the bag is then shaken to allow sloshing. If the probe finds a fit, it will bind to the DNA.

4. The fit of the probe to the DNA does not have to be exact. Sequences of varying homology can stick to the DNA even if the fit is poor; the poorer the fit, the fewer the hydrogen bonds between the probe [light blue] and the denatured DNA. The ability of low-homology probes to still bind to DNA can be manipulated through varying the temperature of the hybridization reaction environment, or by varying the amount of salt in the sloshing mixture.
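As a minimal sketch of the idea that a poorer fit means fewer hydrogen bonds, the snippet below counts bonds between a probe and an equally long stretch of denatured DNA. The function names and the `min_fraction` cutoff are hypothetical; the cutoff is only a crude stand-in for the effect of temperature and salt, not a model of real hybridization chemistry. Both strings are assumed to be written so that position i of the probe faces position i of the target.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # A-T pairs: 2 bonds, G-C pairs: 3 bonds


def hydrogen_bonds(probe, target):
    """Count hydrogen bonds formed where probe and target bases pair up."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    return sum(H_BONDS[p] for p, t in zip(probe, target) if COMPLEMENT[p] == t)


def binds(probe, target, min_fraction):
    """Hypothetical stringency rule: bind if enough of the possible bonds form."""
    possible = sum(H_BONDS[p] for p in probe)
    return hydrogen_bonds(probe, target) / possible >= min_fraction


probe = "GATTACA"
print(hydrogen_bonds(probe, "CTAATGT"))  # 16 (perfect fit)
print(hydrogen_bonds(probe, "CTAAGGT"))  # 14 (one mismatched base)
print(binds(probe, "CTAAGGT", 0.9))      # False (strict conditions)
print(binds(probe, "CTAAGGT", 0.8))      # True (relaxed conditions)
```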


4. VNTRs
Every strand of DNA has pieces that contain genetic information which informs an organism's development (exons) and pieces that, apparently, supply no relevant genetic information at all (introns). Although the introns may seem useless, it has been found that they contain repeated sequences of base pairs. These sequences, called Variable Number Tandem Repeats (VNTRs), can contain anywhere from twenty to one hundred base pairs.

Every human being has some VNTRs. To determine if a person has a particular VNTR, a Southern Blot is performed, and then the Southern Blot is probed, through a hybridization reaction, with a radioactive version of the VNTR in question. The pattern which results from this process is what is often referred to as a DNA fingerprint.
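What varies between people is how many times a repeat unit occurs back to back. A small sketch of counting that number is given below; the four-base unit "GATA" and both alleles are invented to keep the example short (the repeat units described above are twenty to one hundred base pairs long), and the function name is hypothetical.

```python
def longest_tandem_run(dna, unit):
    """Return the largest number of back-to-back copies of unit found in dna."""
    if not unit:
        raise ValueError("unit must not be empty")
    best = 0
    for start in range(len(dna)):
        count = 0
        pos = start
        # Keep stepping one unit at a time while the repeat continues.
        while dna.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


allele_a = "CC" + "GATA" * 3 + "TT"  # invented allele with 3 repeats
allele_b = "CC" + "GATA" * 5 + "TT"  # invented allele with 5 repeats
print(longest_tandem_run(allele_a, "GATA"))  # 3
print(longest_tandem_run(allele_b, "GATA"))  # 5
```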

A given person's VNTRs come from the genetic information donated by his or her parents; he or she could have VNTRs inherited from his or her mother or father, or a combination, but never a VNTR that neither parent has. Shown below are the VNTR patterns for Mrs. Nguyen [blue], Mr. Nguyen [yellow], and their four children: D1 (the Nguyens' biological daughter), D2 (Mr. Nguyen's step-daughter, child of Mrs. Nguyen and her former husband [red]), S1 (the Nguyens' biological son), and S2 (the Nguyens' adopted son, not biologically related [his parents are light and dark green]).

Because VNTR patterns are inherited genetically, a given person's VNTR pattern is more or less unique. The more VNTR probes used to analyze a person's VNTR pattern, the more distinctive and individualized that pattern, or DNA fingerprint, will be.

Practical Applications of DNA Fingerprinting

1. Paternity and Maternity
Because a person inherits his or her VNTRs from his or her parents, VNTR patterns can be used to establish paternity and maternity. The patterns are so specific that a parental VNTR pattern can be reconstructed even if only the children's VNTR patterns are known (the more children produced, the more reliable the reconstruction). Parent-child VNTR pattern analysis has been used to solve standard father-identification cases as well as more complicated cases of confirming legal nationality and, in instances of adoption, biological parenthood.
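The inheritance rule behind such cases (a child never carries a VNTR that neither parent has) can be sketched as a set comparison. The band values below are invented for illustration and the function name is hypothetical; a real analysis would also have to allow for measurement error.

```python
def unexplained_bands(child, mother, father):
    """Return the child's bands that appear in neither parent's pattern."""
    return set(child) - (set(mother) | set(father))


# Invented VNTR patterns, each band written as a number.
mother = {3, 7, 12, 15}
father = {5, 7, 9, 18}
biological_child = {3, 9, 12, 18}
unrelated_child = {4, 7, 11, 18}

print(unexplained_bands(biological_child, mother, father))  # set()
print(sorted(unexplained_bands(unrelated_child, mother, father)))  # [4, 11]
```

An empty result is consistent with the claimed parents; any leftover band argues against at least one of them.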

2. Criminal Identification and Forensics
DNA isolated from blood, hair, skin cells, or other genetic evidence left at the scene of a crime can be compared, through VNTR patterns, with the DNA of a criminal suspect to determine guilt or innocence. VNTR patterns are also useful in establishing the identity of a homicide victim, either from DNA found as evidence or from the body itself.

3. Personal Identification
The notion of using DNA fingerprints as a sort of genetic bar code to identify individuals has been discussed, but this is not likely to happen anytime in the foreseeable future. The technology required to isolate, keep on file, and then analyze millions of very specified VNTR patterns is both expensive and impractical. Social security numbers, picture ID, and other more mundane methods are much more likely to remain the prevalent ways to establish personal identification.

Problems with DNA Fingerprinting

Like nearly everything else in the scientific world, nothing about DNA fingerprinting is 100% assured. The term DNA fingerprint is, in one sense, a misnomer: it implies that, like a fingerprint, the VNTR pattern for a given person is utterly and completely unique to that person. Actually, all that a VNTR pattern can do is present a probability that the person in question is indeed the person to whom the VNTR pattern (of the child, the criminal evidence, or whatever else) belongs. Granted, that probability might be 1 in 20 billion, which would indicate that the person can be reasonably matched with the DNA fingerprint; then again, that probability might only be 1 in 20, leaving a large amount of doubt regarding the specific identity of the VNTR pattern's owner.

1. Generating a High Probability
The probability of a DNA fingerprint belonging to a specific person needs to be reasonably high--especially in criminal cases, where the association helps establish a suspect's guilt or innocence. Using certain rare VNTRs or combinations of VNTRs to create the VNTR pattern increases the probability that the two DNA samples do indeed match (as opposed to look alike, but not actually come from the same person) or correlate (in the case of parents and children).
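A rough sketch of why combining several rare VNTRs helps: if the VNTRs are assumed to occur independently of one another (an assumption made here only for illustration), the chance that an unrelated person shares all of them is the product of the individual frequencies. The frequencies below are invented, and the function name is hypothetical.

```python
from fractions import Fraction


def random_match_probability(frequencies):
    """Multiply per-VNTR frequencies, assuming they are independent."""
    result = Fraction(1)
    for freq in frequencies:
        result *= freq
    return result


# Invented population frequencies for four VNTR patterns.
frequencies = [Fraction(1, 10), Fraction(1, 20), Fraction(1, 50), Fraction(1, 100)]

print(random_match_probability(frequencies[:1]))  # 1/10
print(random_match_probability(frequencies))      # 1/1000000
```

One probe alone leaves a 1 in 10 chance of a coincidental match in this made-up case; all four together bring it down to 1 in a million.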

2. Problems with Determining Probability

A. Population Genetics
VNTRs, because they are the result of genetic inheritance, are not distributed evenly across the human population. A given VNTR cannot, therefore, have a stable probability of occurrence; it will vary depending on an individual's genetic background. The difference in probabilities is particularly visible across racial lines. Some VNTRs that occur very frequently among Hispanics will occur very rarely among Caucasians or African-Americans. Currently, not enough is known about the VNTR frequency distributions among ethnic groups to determine accurate probabilities for individuals within those groups; the heterogeneous genetic composition of interracial individuals, who are growing in number, presents an entirely new set of questions. Further experimentation in this area, known as population genetics, has been surrounded with and hindered by controversy, because the idea of identifying people through genetic anomalies along racial lines comes alarmingly close to the eugenics and ethnic purification movements of the recent past, and, some argue, could provide a scientific basis for racial discrimination.

B. Technical Difficulties
Errors in the hybridization and probing process must also be figured into the probability, and often the idea of error is simply not acceptable. Most people will agree that an innocent person should not be sent to jail, a guilty person allowed to walk free, or a biological mother denied her legal right to custody of her children, simply because a lab technician did not conduct an experiment accurately. This is an especially important consideration when the available DNA sample is minuscule, because there is little room for error. The risk is greatest when the analysis involves amplification of the sample (creating a much larger sample of genetically identical DNA from what little material is available): if the wrong DNA is amplified (e.g. a skin cell from the lab technician), the consequences can be profoundly detrimental. Until recently, the standards for determining DNA fingerprinting matches, and for laboratory security and accuracy which would minimize error, were neither stringent nor universally codified, causing a great deal of public outcry.

What is DNA ?

1. Nucleotides are the building stones of DNA.

    There are 4 different nucleotides :
    • dATP : deoxyadenosine triphosphate
    • dGTP : deoxyguanosine triphosphate
    • dTTP : deoxythymidine triphosphate
    • dCTP : deoxycytidine triphosphate
    For convenience, these 4 nucleotides are called dNTP's (deoxynucleoside triphosphates). A nucleotide is made of three major parts : a nitrogen base, a sugar molecule and a triphosphate. Only the nitrogen base is different in the 4 nucleotides.


    Figure: The components of nucleotides.

2. How do the nucleotides form a DNA chain ?


    Figure: From nucleotide to DNA.

    DNA is formed by coupling the phosphate group of a nucleotide (which is positioned on the 5th C-atom of its sugar molecule) to the hydroxyl group on the 3rd C-atom of the sugar molecule of the previous nucleotide. To accomplish this, a diphosphate molecule is split off, which releases energy. This means that new nucleotides are always added on the 3' side of the chain.

What is DNA Sequence Alignment?

To compare two or more sequences, it is necessary to align the conserved and unconserved residues across all the sequences. These residues form a pattern from which the relationship between sequences can be determined with phylogenetic programs. Once the sequences are aligned, it is possible to identify the locations of insertions or deletions that have occurred since their divergence from a common ancestor. For each position there are three possibilities :

  • The bases match : this means that there is no change since their divergence.
  • The bases mismatch : this means that there is a substitution since their divergence.
  • There is a base in one sequence, no base in the other : there is an insertion or a deletion since their divergence.
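The three possibilities can be sketched as a column-by-column check of two already aligned sequences. This is only an illustration: the two short sequences are invented, "-" is assumed to mark a gap, and the function name is hypothetical.

```python
def classify_columns(seq_a, seq_b):
    """Label each aligned position as match, substitution or indel."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    labels = []
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == "-" or base_b == "-":
            labels.append("indel")         # insertion or deletion
        elif base_a == base_b:
            labels.append("match")         # no change since divergence
        else:
            labels.append("substitution")  # bases mismatch
    return labels


print(classify_columns("AC-GT", "ACTGA"))
# ['match', 'match', 'indel', 'match', 'substitution']
```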
Figure: The comparison of sequences.

A good alignment is important for the next step : the construction of phylogenetic trees. The alignment affects the distances between 2 different species, and this in turn influences the inferred phylogeny. There are several programs available on the net for aligning sequences. They are based on different mathematical models, but all try to find the best score: as many matching bases as possible with a minimum number of inserted gaps (with enough gaps inserted, every base could be made to match another).

Example : two sequences :
TCAGACGATTG
TCGGAGCTG

How can we get the best alignment ? There are several possibilities :

1. Reduce the number of mismatches :
TCAG-ACG-ATTG
|| | | |  | |    0 mismatches, 7 matches, 6 gaps
TC-GGA-GC-T-G

2. Reduce the number of gaps :
TCAGACGATTG
|| ||            5 mismatches, 4 matches, 2 gaps
TCGGAGCTG--

3. Reduce neither the number of gaps nor the number of mismatches :
TCAG-ACGATTG
|| | | | |       2 mismatches, 6 matches, 4 gaps
TC-GGA-GCTG-

4. Same as 3. but one base (or gap) moved :
TCAG-ACGATTG
|| | | | | |     1 mismatch, 7 matches, 4 gaps
TC-GGA-GCT-G

Which of these is the best alignment ? There are several alignment algorithms for choosing the best alignment. Let's use a simple one in this example :

D = y + sum over k of (w_k x z_k)

with :

D : distance
y : number of mismatches
w_k : penalty for a gap of length k
z_k : number of gaps of length k

Take gap penalty for gap length 1 = 2
Take gap penalty for gap length 2 = 6 (short gaps occur more frequently than long gaps)

in 1. : 0 + {(2 x 6) + (6 x 0)} = 12
in 2. : 5 + {(2 x 0) + (6 x 1)} = 11
in 3. : 2 + {(2 x 4) + (6 x 0)} = 10
in 4. : 1 + {(2 x 4) + (6 x 0)} = 9

We choose alignment 4 because it has the minimum distance.
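The four distances can be recomputed with a short script. This is a sketch of the simple scoring rule used in this example only, not of what any real alignment program does; the function names are hypothetical, and the penalty table covers just the two gap lengths defined above. A run of consecutive "-" characters in one sequence is counted as one gap of that length.

```python
from collections import Counter
from itertools import groupby

GAP_PENALTY = {1: 2, 2: 6}  # penalty per gap, keyed by gap length (from the text)


def gap_runs(seq):
    """Count runs of '-' in one aligned sequence, keyed by run length."""
    return Counter(len(list(run)) for char, run in groupby(seq) if char == "-")


def distance(seq_a, seq_b, penalty=GAP_PENALTY):
    """D = mismatches + sum over gap lengths of (penalty x number of gaps)."""
    mismatches = sum(
        1 for a, b in zip(seq_a, seq_b) if a != b and "-" not in (a, b)
    )
    runs = gap_runs(seq_a) + gap_runs(seq_b)
    return mismatches + sum(penalty[k] * n for k, n in runs.items())


alignments = [
    ("TCAG-ACG-ATTG", "TC-GGA-GC-T-G"),
    ("TCAGACGATTG", "TCGGAGCTG--"),
    ("TCAG-ACGATTG", "TC-GGA-GCTG-"),
    ("TCAG-ACGATTG", "TC-GGA-GCT-G"),
]
print([distance(a, b) for a, b in alignments])  # [12, 11, 10, 9]
```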
Figure: The alignment of sequences.

This alignment was made with ClustalW 1.74, and as you can see, the more variable areas are not optimally aligned (indicated with red boxes). It is therefore often necessary to improve the alignment by hand. In this case it is obvious how to improve the alignment, but in other cases improvements can be more difficult to make.

The Principle of DNA Sequencing

The purpose of sequencing is to determine the order of the nucleotides of a gene. For sequencing, we don't start from genomic DNA (gDNA), as in PCR, but mostly from PCR fragments or cloned genes.

  1. The sequencing reaction :
     There are three major steps in a sequencing reaction (as in PCR), which are repeated for 30 or 40 cycles.

    1. Denaturation at 94°C :
       During denaturation, the double strand melts open into single-stranded DNA, and all enzymatic reactions stop (for example, the extension from a previous cycle).

    2. Annealing at 50°C :
       In sequencing reactions only one primer is used, so only one strand is copied (in PCR two primers are used, so two strands are copied). The primer jiggles around because of Brownian motion. Hydrogen bonds are constantly formed and broken between the single-stranded primer and the single-stranded template. The more stable bonds (primers that fit exactly) last a little longer, and on that little piece of double-stranded DNA (template and primer) the polymerase can attach and start copying the template. Once a few bases have been built in, the bonding between the template and the primer is so strong that it no longer breaks.

    3. Extension at 60°C :
       This is the working temperature used for the polymerase. Normally it is 72°C, but because the polymerase has to incorporate ddNTP's, which are chemically modified with a fluorescent label, the temperature is lowered so that it has time to incorporate the 'strange' molecules. Primers that already have a few bases built in are held to the template more strongly than the forces breaking these attractions. Primers on positions with no exact match come loose again and do not give an extension of the fragment.

       The bases (complementary to the template) are coupled to the primer on the 3' side: the polymerase adds dNTP's or ddNTP's from 5' to 3', reading the template from 3' to 5'.

       When a ddNTP is incorporated, the extension reaction stops, because a ddNTP has an H-atom on the 3rd carbon atom (dNTP's have an OH-group at that position). Since the ddNTP's are fluorescently labeled, it is possible to detect the color of the last base of each fragment on an automated sequencer.

    Figure 7 : The different steps in sequencing.

    Because only one primer is used, only one strand is copied during sequencing, so the number of copies of that strand increases linearly. Therefore, there has to be a large number of copies of the gene in the starting mixture for sequencing. Suppose there are 1000 copies of the wanted gene before the cycling starts. After one cycle there will be 2000 copies : the 1000 original templates and 1000 complementary strands, each with one fluorescent label on the last base. After two cycles there will be 2000 complementary strands, after three cycles 3000 complementary strands, and so on.


    Figure 8 : The linear amplification of the gene in sequencing.
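    The difference between this linear increase and the doubling in PCR can be put in a few lines. The sketch below uses the 1000 starting copies from the example above; the PCR function assumes ideal doubling of every strand in every cycle, which is a simplification, and both function names are hypothetical.

```python
def sequencing_strands(templates, cycles):
    """One primer: each cycle adds one new labelled strand per template."""
    return templates * cycles


def pcr_copies(templates, cycles):
    """Two primers: idealised doubling of all copies in every cycle."""
    return templates * 2 ** cycles


for cycle in (1, 2, 3, 30):
    print(cycle, sequencing_strands(1000, cycle), pcr_copies(1000, cycle))
```

    After three cycles the sequencing reaction has produced 3000 complementary strands, while ideal PCR would already have reached 8000 copies.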

  2. Separation of the molecules :
     After the sequencing reactions, the mixture of strands, all of different lengths and all ending in a fluorescently labelled ddNTP, has to be separated. This is done by gel electrophoresis on an acrylamide gel, which is capable of separating a molecule of 30 bases from one of 31 bases, but also a molecule of 750 bases from one of 751 bases. DNA has a negative charge and migrates to the positive side. Smaller fragments migrate faster, so the DNA molecules are separated by size.

    Figure 9 : The separation of the molecules with electrophoresis.

  3. Detection on an automated sequencer :
     The fluorescently labelled fragments that migrate through the gel pass a laser beam at the bottom of the gel. The laser excites the fluorescent molecule, which emits light of a distinct color. That light is collected and focused by lenses into a spectrograph. Based on the wavelength, the spectrograph separates the light across a CCD camera (charge coupled device). Each base has its own color, so the sequencer can detect the order of the bases in the sequenced gene.
    Figure 10 : The scanning and detection system on the ABI Prism 377 sequencer.

Figure 11 : A snapshot of the detection of the molecules on the sequencer.
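The whole chain of reaction, separation and detection can be imitated on a short string. This is a toy sketch under several assumptions: the template is invented, every possible stopping point is assumed to occur exactly once, and the color assigned to each base is illustrative rather than the dye set of any particular machine. All names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}  # illustrative colors


def terminated_fragments(template_3_to_5):
    """Build the new strand 5' to 3' and stop once at every position."""
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_sequence(fragments):
    """Sort fragments by size and read the dye color of each last base."""
    # Electrophoresis: the shortest fragments reach the laser first.
    ordered = sorted(fragments, key=len)
    colors = [DYE[fragment[-1]] for fragment in ordered]
    color_to_base = {color: base for base, color in DYE.items()}
    return "".join(color_to_base[color] for color in colors)


fragments = terminated_fragments("TACGGT")  # invented template, written 3' to 5'
random.Random(0).shuffle(fragments)         # the reaction tube is an unordered mix
print(read_sequence(fragments))             # ATGCCA
```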


  4. Assembling the sequenced parts of a gene :
     For publication purposes, each sequence of a gene has to be confirmed in both directions. To accomplish this, the gene has to be sequenced with forward and reverse primers. Since it is only possible to sequence a stretch of 750 to 800 bases in one run, a gene of, for example, 1800 bases has to be sequenced with internal primers as well. When all these fragments are sequenced, a computer program tries to fit the different parts together and assembles the total gene sequence.

    Figure 12 : The assemblage of the gene.
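How a program might fit two overlapping parts together can be sketched as follows. This is a deliberately naive illustration that requires an exact overlap between the end of one read and the start of the next; real assembly programs also have to cope with sequencing errors and reverse-strand reads. The reads, the function name and the `min_overlap` parameter are all invented for this example.

```python
def merge_reads(left, right, min_overlap=3):
    """Join two reads if the end of left exactly matches the start of right.

    Returns the merged sequence, or None if no overlap is long enough.
    """
    longest = min(len(left), len(right))
    # Try the longest possible overlap first, then shorter ones.
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


print(merge_reads("ATGCCGTA", "CGTAGGCT"))  # ATGCCGTAGGCT
print(merge_reads("AAAA", "CCCC"))          # None
```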