Friday, July 27, 2007

Transposon

Transposons are sequences of DNA that can move to different positions within the genome of a single cell, a process called transposition. In the process, they can cause mutations and change the amount of DNA in the genome. Transposons are also called "jumping genes" and are examples of mobile genetic elements. They were discovered by Barbara McClintock early in her career[1], work for which she received a Nobel Prize in 1983. There are a variety of mobile genetic elements, and they can be grouped by their mechanism of transposition. Class I mobile genetic elements, or retrotransposons, move in the genome by being transcribed to RNA and then copied back to DNA by reverse transcriptase, while class II mobile genetic elements move directly from one position to another using a transposase to "cut and paste" them within the genome. Transposons are very useful to researchers as a means of altering DNA inside a living organism. Transposons make up a large fraction of many genomes, which is reflected in the C-values of eukaryotic species. For example, about 48% of the human genome is composed of transposons and their defunct remnants.

Types of transposons

Transposons are classified into two classes based on their mechanism of transposition.

Class I: Retrotransposons


Retrotransposons work by copying themselves and pasting the copies back into the genome in multiple places. A retrotransposon is first transcribed into RNA; the RNA is then copied into DNA by a reverse transcriptase (often encoded by the transposon itself), and the DNA copy is inserted back into the genome.

Retrotransposons behave very similarly to retroviruses, such as HIV, giving a clue to the evolutionary origins of such viruses.

There are three main classes of retrotransposons:

  • Viral: encode reverse transcriptase (to reverse transcribe RNA into DNA), have long terminal repeats (LTRs), similar to retroviruses
  • LINEs: encode reverse transcriptase, lack LTRs, transcribed by RNA polymerase II
  • Nonviral superfamily: do not code for reverse transcriptase, transcribed by RNA polymerase III

Class II: DNA transposons

The major difference of Class II transposons from retrotransposons is that their transposition mechanism does not involve an RNA intermediate. Class II transposons usually move by cut and paste, rather than copy and paste, using the transposase enzyme. Different types of transposase work in different ways. Some can bind to any part of the DNA molecule, and the target site can therefore be anywhere, while others bind to specific sequences. Transposase makes a staggered cut at the target site producing sticky ends, cuts out the transposon and ligates it into the target site. A DNA polymerase fills in the resulting gaps from the sticky ends and DNA ligase closes the sugar-phosphate backbone. This results in target site duplication and the insertion sites of DNA transposons may be identified by short direct repeats (a staggered cut in the target DNA filled by DNA polymerase) followed by inverted repeats (which are important for the transposon excision by transposase).
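The cut-and-paste step and the resulting target site duplication can be illustrated with a toy string model. This is only a sketch: the function name, the 3-base stagger and the sequences are invented for illustration, and real transposases differ in how they choose and cut the target site.

```python
def cut_and_paste(genome, start, end, target, stagger=3):
    """Move genome[start:end] to a new site and duplicate the target site.

    Coordinates are 0-based and end-exclusive. `target` is an index in the
    genome *after* the transposon has been excised. `stagger` is the length
    of the staggered cut (a hypothetical value chosen for the example).
    """
    transposon = genome[start:end]
    donor = genome[:start] + genome[end:]      # excision; donor site rejoined
    site = donor[target:target + stagger]      # bases spanned by the staggered cut
    # Gap filling copies each single-stranded overhang, so the target site
    # ends up on both flanks of the inserted transposon (direct repeats).
    return donor[:target] + site + transposon + site + donor[target + stagger:]


# Host DNA in lower case, transposon (with inverted repeats CAGT...ACTG) in upper case.
genome = "ccgga" + "CAGTTTTTACTG" + "ttcgatcgaagc"
print(cut_and_paste(genome, 5, 17, target=9))
# -> ccggattcgatcCAGTTTTTACTGatcgaagc
```

In the output the short direct repeat "atc" flanks the transposon on both sides, which is the signature described above.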

Not all DNA transposons transpose through a cut-and-paste mechanism. In some cases replicative transposition is observed, in which the transposon replicates itself to a new target site.

Both classes of transposon may lose their ability to synthesise reverse transcriptase or transposase through mutation, yet continue to jump through the genome because other transposons are still producing the necessary enzyme.

Examples

  • The first transposons were discovered in maize (Zea mays) by Barbara McClintock in 1948, work for which she was awarded a Nobel Prize in 1983. She noticed insertions, deletions, and translocations caused by these transposons. These changes in the genome could, for example, lead to a change in the color of corn kernels. About 50% of the total genome of maize consists of transposons. The Ac/Ds system McClintock described consists of class II transposons.
  • One family of transposons in the fruit fly Drosophila melanogaster are called P elements. They seem to have first appeared in the species only in the middle of the twentieth century. Within 50 years, they have spread through every population of the species. Artificial P elements can be used to insert genes into Drosophila by injecting the embryo. For the use of P elements as a genetic tool see: "transposons as a genetic tool".
  • Transposons in bacteria usually carry an additional gene for a function other than transposition, often antibiotic resistance. In bacteria, transposons can jump from chromosomal DNA to plasmid DNA and back, allowing for the transfer and permanent addition of genes such as those encoding antibiotic resistance (multi-antibiotic resistant bacterial strains can be generated in this way). Bacterial transposons of this type belong to the Tn family. When the transposable elements lack additional genes, they are known as insertion sequences.
  • The most common form of transposon in humans is the Alu sequence. The Alu sequence is approximately 300 bases long and can be found between 300,000 and a million times in the human genome.
  • Mu phage transposition is the best-known example of replicative transposition. Its transposition mechanism is somewhat similar to homologous recombination.

Transposons causing diseases

Transposons are mutagens. They can damage the genome of their host cell in different ways:

  • A transposon or a retroposon that inserts itself into a functional gene will most likely disable that gene.
  • After a transposon leaves a gene, the resulting gap will probably not be repaired correctly.
  • Multiple copies of the same sequence, such as Alu sequences, can hinder precise chromosomal pairing during mitosis and meiosis, resulting in unequal crossovers, one of the main reasons for chromosome duplication.

Diseases that are often caused by transposons include hemophilia A and B, severe combined immunodeficiency, porphyria, predisposition to cancer, and Duchenne muscular dystrophy.

Additionally, many transposons contain promoters which drive transcription of their own transposase. These promoters can cause aberrant expression of linked genes, causing disease or mutant phenotypes.

Evolution of transposons

The evolution of transposons and their effect on genome evolution is currently a dynamic field of study.

Transposons are found in all major branches of life. They may or may not have originated in the last universal common ancestor, or arisen independently multiple times, or perhaps arisen once and then spread to other kingdoms by horizontal gene transfer. While transposons may confer some benefits on their hosts, they are generally considered to be selfish DNA parasites that live within the genome of cellular organisms. In this way, they are similar to viruses. Viruses and transposons also share features in their genome structure and biochemical abilities, leading to speculation that they share a common ancestor.

Since excessive transposon activity can destroy a genome, many organisms seem to have developed mechanisms to reduce transposition to a manageable level. Bacteria may undergo high rates of gene deletion as part of a mechanism to remove transposons and viruses from their genomes while eukaryotic organisms may have developed the RNA interference (RNAi) mechanism as a way of reducing transposon activity. In the nematode Caenorhabditis elegans, some genes required for RNAi also reduce transposon activity.

Transposons may have been co-opted by the vertebrate immune system as a means of producing antibody diversity. The V(D)J recombination system operates by a mechanism similar to that of transposons.

Evidence exists that transposable elements may act as mutators in bacteria.

Applications

Transposons were first discovered in maize (Zea mays); the first element described was named Dissociator (Ds). Likewise, the first transposon to be molecularly isolated was from a plant (snapdragon). Appropriately, transposons have been an especially useful tool in plant molecular biology. Researchers use transposons as a means of mutagenesis. In this context, a transposon jumps into a gene and produces a mutation. Compared with chemical mutagenesis methods, the presence of the transposon provides a straightforward means of identifying the mutant allele.

Sometimes the insertion of a transposon into a gene can disrupt that gene's function in a reversible manner; transposase mediated excision of the transposon restores gene function. This produces plants in which neighboring cells have different genotypes. This feature allows researchers to distinguish between genes that must be present inside of a cell in order to function (cell-autonomous) and genes that produce observable effects in cells other than those where the gene is expressed.

Transposons are also a widely used tool for mutagenesis in Drosophila melanogaster and in a wide variety of bacteria, where they are used to study gene function.

Gene silencing

Gene silencing is a general term describing epigenetic processes of gene regulation. The term gene silencing is generally used to describe the "switching off" of a gene by a mechanism other than genetic modification. That is, a gene which would be expressed (turned on) under normal circumstances is switched off by machinery in the cell.

Genes are regulated at either the transcriptional or post-transcriptional level.

Transcriptional gene silencing is the result of histone modifications, creating an environment of heterochromatin around a gene that makes it inaccessible to transcriptional machinery (RNA polymerase, transcription factors, etc.).

Post-transcriptional gene silencing is the result of mRNA of a particular gene being destroyed. The destruction of the mRNA prevents translation to form an active gene product (in most cases, a protein). A common mechanism of post-transcriptional gene silencing is RNAi.

Both transcriptional and post-transcriptional gene silencing are used to regulate endogenous genes. Mechanisms of gene silencing also protect the organism's genome from transposons and viruses. Gene silencing thus may be part of an ancient immune system protecting from such infectious DNA elements.

What is RNAi

RNA interference (RNAi) is a highly evolutionarily conserved process of post-transcriptional gene silencing (PTGS) by which double-stranded RNA (dsRNA), when introduced into a cell, causes sequence-specific degradation of homologous mRNA sequences. It was first discovered in 1998 by Andrew Fire and Craig Mello in the nematode worm Caenorhabditis elegans and later found in a wide variety of organisms, including mammals.

Mechanism of RNA interference

A. On entering the cell, long dsRNAs act as a trigger of the RNAi process.

B. The dsRNA is first processed by the RNase III enzyme Dicer in an ATP-dependent reaction.

C. Dicer processes dsRNAs into 21-23 nt short interfering RNAs (siRNAs) with 2-nt 3' overhangs. siRNAs can also be synthesized outside the cell and then introduced into a cell.

D. The siRNAs are incorporated into the RNA-induced silencing complex (RISC), which has an Argonaute (Ago) protein as one of its main components. Ago cleaves and discards the passenger (sense) strand of the siRNA duplex, leading to activation of the RISC.

E and F. The remaining guide (antisense) strand of the siRNA guides RISC to its homologous mRNA, resulting in the endonucleolytic cleavage of the target mRNA.
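The sequence logic of steps A to F can be sketched on strings. This is a deliberately simplified model, not a description of the enzymes: the function names are invented, the 2-nt overhangs are ignored, only perfect guide-to-mRNA pairing is considered, and the sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the antisense strand of an RNA sequence, read 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense_strand, size=21):
    """Cut the sense strand of a long dsRNA into consecutive siRNA-sized pieces."""
    return [sense_strand[i:i + size]
            for i in range(0, len(sense_strand) - size + 1, size)]


def cleavage_sites(mrna, guide):
    """Positions in the mRNA that pair perfectly with the guide (antisense) strand."""
    target = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(target) + 1)
            if mrna.startswith(target, i)]


# A 42-nt trigger that is homologous to part of a hypothetical mRNA.
trigger = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUU"
mrna = "GGGAAA" + trigger + "CCCUUU"

for passenger in dice(trigger):
    guide = reverse_complement(passenger)   # strand kept by RISC
    print(guide, cleavage_sites(mrna, guide))
```

Each guide strand finds the stretch of mRNA it was derived from, which is why the silencing is sequence-specific.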

Thursday, July 26, 2007

CATH

CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H).

Class, derived from secondary structure content, is assigned for more than 90% of protein structures automatically. Architecture, which describes the gross orientation of secondary structures, independent of connectivities, is currently assigned manually. The topology level clusters structures into fold groups according to their topological connections and numbers of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to fold groups and homologous superfamilies are made by sequence and structure comparisons.
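The four-level hierarchy maps naturally onto a small data structure. The sketch below assumes the dotted four-number codes used by the CATH database (for example 3.40.50.300), which are not shown in this post; the class and method names are made up for illustration.

```python
from typing import NamedTuple


class CathNode(NamedTuple):
    """One homologous superfamily, identified by its C, A, T and H numbers."""
    class_: int
    architecture: int
    topology: int
    superfamily: int

    @classmethod
    def parse(cls, code):
        """Parse a dotted code such as '3.40.50.300' (assumed format)."""
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated numbers, got {code!r}")
        return cls(*map(int, parts))

    def lineage(self):
        """Codes of the enclosing groups, from Class down to Homologous superfamily."""
        parts = [str(p) for p in self]
        return [".".join(parts[:n]) for n in range(1, 5)]


node = CathNode.parse("3.40.50.300")
print(node.lineage())
# -> ['3', '3.40', '3.40.50', '3.40.50.300']
```

Two domains share a fold group when the first three entries of their lineages agree, and a homologous superfamily when all four do.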

The boundaries and assignments for each protein domain are determined using a combination of automated and manual procedures. These include computational techniques, empirical and statistical evidence, literature review and expert analysis.


DNA databases

DDBJ (DNA Data Bank of Japan) began DNA data bank activities in earnest in 1986 at the National Institute of Genetics (NIG). DDBJ has been functioning as the international nucleotide sequence database in collaboration with EBI/EMBL and NCBI/GenBank. DNA sequence records organismic evolution more directly than other biological materials and thus is invaluable not only for research in the life sciences but also for human welfare in general. The databases are, so to speak, a common treasure of human beings. With this in mind, DDBJ makes the databases accessible online to anyone in the world.

The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource. Its main sources of DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications.

The EMBL database is produced in an international collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups on a daily basis. The current database release (Release 91, June 2007), with its accompanying release notes and user manual, is available from the EBI servers.

GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. There are approximately 65,369,091,950 bases in 61,132,599 sequence records in the traditional GenBank divisions and 80,369,977,826 bases in 17,960,667 sequence records in the WGS division as of August 2006.

The complete release notes for the current version of GenBank are available on the NCBI ftp site. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.

Chromosome jumping

Chromosome jumping is a technique of molecular biology that is used as a tool in the physical mapping of genomes. It is related to several other tools used for the same purpose, including chromosome walking.

Chromosome jumping is used to bypass regions difficult to clone, such as those containing repetitive DNA, that cannot be easily mapped by chromosome walking, and is useful in moving along a chromosome rapidly in search of a particular gene.

In chromosome jumping, the DNA of interest is identified, cut into fragments with restriction enzymes, and circularised (the beginning and end of each fragment is joined together to form a circular loop). From a known sequence a primer is designed to sequence across the circularised junction. This primer is used to jump 100 kb-300 kb intervals: a sequence 100 kb away would have come near the known sequence on circularisation. Thus, sequences not reachable by chromosome walking can be sequenced. Chromosome walking can be used from the new jump position (in either direction) to look for gene-like sequences, or additional jumps can be used to progress further along the chromosome.
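Why circularisation produces a "jump" can be seen in a toy string model. This is a sketch only: the function names and the tiny 24-base "genome" are invented, whereas real jumps span 100 kb-300 kb and involve cloning the junction rather than slicing strings.

```python
def circularise_junction(fragment, flank=6):
    """Sequence spanning the junction formed when the fragment's end is joined to its start."""
    return fragment[-flank:] + fragment[:flank]


def jump(genome, known_start, fragment_len, flank=6):
    """Return the far-end sequence that lands next to the known sequence, and its position."""
    fragment = genome[known_start:known_start + fragment_len]
    junction = circularise_junction(fragment, flank)
    far_end = junction[:flank]          # read across the junction from the known side
    far_pos = known_start + fragment_len - flank
    return far_end, far_pos


genome = "AAAAAACCCCCCGGGGGGTTTTTT"
print(jump(genome, known_start=0, fragment_len=24))
# -> ('TTTTTT', 18)
```

The sequence read across the junction comes from the other end of the fragment, so everything in between has been skipped without being cloned.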

Gene Chips

Microarrays, or gene chips, have had a big impact on DNA probe technology by making it possible to detect tens of thousands of sequences almost simultaneously. A gene chip is a device in which a large number of different probes are placed at specific, known locations, either spotted onto a glass slide (spotted arrays) or otherwise fixed at specific positions on some other surface.

The use of gene chips involves labeling the sample instead of the probe, spreading thousands of copies of the labeled sample across the chip and then washing away any copies of the sample that do not remain attached to some probe. Because the probes are attached to specified positions on the chip, if labeled sample is detected at any position on the chip, it is easy to tell which probe hybridized to its complement.
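The read-out logic (known probe positions, labeled sample, wash, detect) can be sketched as a lookup from chip position to signal. The names, the probe layout and the sequences below are hypothetical, and only perfect complementarity is modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def hybridize(chip, labeled_sample):
    """Return {position: number of labeled molecules bound} for positions that light up.

    `chip` maps a (row, column) position to the probe fixed there. A sample
    molecule stays bound only if it contains the probe's reverse complement;
    everything else is treated as washed away.
    """
    signal = {}
    for position, probe in chip.items():
        target = reverse_complement(probe)
        bound = sum(target in molecule for molecule in labeled_sample)
        if bound:
            signal[position] = bound
    return signal


chip = {(0, 0): "ATGGCC", (0, 1): "GGATTC"}
sample = ["TTGGCCATAA", "CCGGCCATCC", "ACACACAC"]
print(hybridize(chip, sample))
# -> {(0, 0): 2}
```

Because the position of each probe is known in advance, a signal at (0, 0) immediately identifies which sequence was present in the sample.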

Gene chips are most commonly used to measure the expression level of various genes in an organism. Each expression level gives a picture of the rate at which a specific protein is being produced in an organism's cells at a given time. Novel uses for gene chips continue to be developed, which makes this an active field.

Shotgun Cloning

Shotgun cloning is the practice of randomly cutting a large DNA fragment into smaller pieces that can then be cloned.

The DNA can be cut into smaller pieces either with a restriction enzyme or by physical methods that shear it into fragments. The resulting fragments are then collected and cloned into a vector. The original DNA can be either genomic DNA (the process is then called genome shotgun cloning) or a clone such as a YAC (yeast artificial chromosome) that carries a large piece of genomic DNA that needs to be split into fragments.

If the DNA needs to be in a certain cloning vector, but the vector can carry only small amounts of DNA, then the shotgun method can be employed. The method is usually used to generate small fragments of DNA for sequencing.

For example, if a geneticist is studying a 50 kb gene, it could be difficult to work out the restriction map directly. By breaking the DNA sequence into smaller fragments and then mapping these, a master restriction map can be deduced.
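The two ways of fragmenting DNA mentioned above can be sketched as follows. This is an illustration only: the function names are invented, shearing is modelled as uniformly random break points, and the digest uses the EcoRI recognition site GAATTC (cut after the G) as an example enzyme that the post itself does not name.

```python
import random


def shear(dna, n_cuts, rng):
    """Physical fragmentation: break the molecule at n_cuts random positions."""
    cuts = sorted(rng.sample(range(1, len(dna)), n_cuts))
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


def digest(dna, site="GAATTC", offset=1):
    """Restriction digest: cut `offset` bases into every occurrence of `site`."""
    cuts = []
    i = dna.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = dna.find(site, i + 1)
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


print(digest("AAGAATTCTTGAATTCCC"))
# -> ['AAG', 'AATTCTTG', 'AATTCCC']

rng = random.Random(0)          # fixed seed so the run is repeatable
pieces = shear("ACGT" * 15, n_cuts=6, rng=rng)
print(len(pieces), "".join(pieces) == "ACGT" * 15)
```

A restriction digest always gives the same fragments for the same sequence, while shearing gives a different, overlapping set from every copy of the molecule, which is what makes the shotgun approach useful for later reassembly.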

Chromosome walking

Chromosome walking is a technique for cloning everything in the genome around a known piece of DNA (the starting probe). You screen a genomic library for all clones hybridizing with the probe, and then figure out which one extends furthest into the surrounding DNA. The most distal piece of this most distal clone is then used as a probe, so that ever more distal regions can be cloned. This has been used to move as much as 200 kb away from a given starting point (an immense undertaking). It is typically used to "walk" from a starting point towards some nearby gene in order to clone that gene. It is also used to obtain the remainder of a gene when part of it has been isolated.
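The screening loop described above can be sketched with strings standing in for clones. This is a toy model under stated assumptions: hybridization is reduced to exact substring matching, the walk goes in one direction only, and the function name, probe length and sequences are all invented.

```python
def walk(library, probe, probe_len=6, max_steps=20):
    """Walk rightwards through a library of overlapping clones.

    At each step, find the clones containing the probe, keep the one that
    reaches furthest beyond it, and use that clone's distal end as the next probe.
    """
    path = []
    for _ in range(max_steps):
        hits = [clone for clone in library if probe in clone]
        if not hits:
            break
        reach = lambda clone: len(clone) - clone.index(probe) - len(probe)
        best = max(hits, key=reach)
        if path and reach(best) == 0:   # no clone extends past the current probe
            break
        path.append(best)
        probe = best[-probe_len:]
    return path


genome = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGTACCAGT"
library = [genome[0:20], genome[12:32], genome[26:48]]
for clone in walk(library, probe=genome[2:8]):
    print(clone)
```

Starting from a probe near the left end, the walk visits the three overlapping clones in order and stops when no clone extends any further.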

Sunday, July 22, 2007

FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance, or whether it can be used to infer homology.

A web interface for submitting sequences to search the European Bioinformatics Institute (EBI)'s online databases is also available, under the name fasta.

The FASTA file format used as input for this software is now widely used by other sequence database search tools (such as BLAST) and sequence alignment programs.
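As an illustration of the file format, here is a minimal reader. It assumes the usual convention that each record starts with a ">" header line followed by one or more sequence lines; the function name and the example records are made up.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical DNA
ACGTACGTAC
"""
print(parse_fasta(example))
```

Wrapped sequence lines are joined back together, so the line length used in the file does not affect the parsed sequence.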

Search method

FASTA takes a given nucleotide or amino-acid sequence and searches a corresponding sequence database, using local sequence alignment to find similar database sequences.

The FASTA program follows a largely heuristic method which contributes to the high speed of its execution. It initially observes the pattern of word hits, word-to-word matches of a given length, and marks potential matches before performing a more time-consuming optimized search using a Smith-Waterman type of algorithm. The size taken for a word, given by the parameter ktup, controls the sensitivity and speed of the program. Increasing the ktup value decreases the number of background hits that are found. From the word hits that are returned, the program looks for segments that contain a cluster of nearby hits. It then investigates these segments for a possible match.
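The word-hit stage can be sketched with a lookup table of query words. This is a simplified illustration rather than the FASTA source code: the function names are invented, and hits are merely counted per diagonal without the scoring and mismatch penalties the real program applies.

```python
from collections import defaultdict


def word_hits(query, subject, ktup=2):
    """Return (i, j) for every ktup-length word shared by query[i:] and subject[j:]."""
    table = defaultdict(list)                 # word -> positions in the query
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)
    hits = []
    for j in range(len(subject) - ktup + 1):
        for i in table.get(subject[j:j + ktup], ()):
            hits.append((i, j))
    return hits


def diagonal_counts(hits):
    """Count hits per diagonal (j - i); clusters of hits share a diagonal."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return dict(counts)


query, subject = "ACGTACGT", "TTACGTAA"
print(diagonal_counts(word_hits(query, subject, ktup=4)))
# -> {-2: 2, 2: 2}
print(len(word_hits(query, subject, ktup=2)), len(word_hits(query, subject, ktup=4)))
```

Raising ktup from 2 to 4 leaves fewer hits to examine, which is the speed versus sensitivity trade-off described above.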

There are some differences between fastn and fastp relating to the type of sequences used but both use four steps and calculate three scores to describe and format the sequence similarity results. These are:

  • Identify regions of highest density in each sequence comparison, taking ktup equal to 1 or 2.
In this step all or a group of the identities between two sequences are found using a lookup table. The ktup value determines how many consecutive identities are required for a match to be declared. Thus the lower the ktup value, the more sensitive the search. ktup=2 is frequently chosen for protein sequences and ktup=4 or 6 for nucleotide sequences. Short oligonucleotides are usually run with ktup=1. The program then finds all similar local regions, represented as diagonals of a certain length in a dot plot, between the two sequences by counting ktup matches and penalizing for intervening mismatches. This way, local regions of highest-density matches in a diagonal are isolated from background hits. For protein sequences BLOSUM50 values are used for scoring ktup matches. This ensures that groups of identities with high similarity scores contribute more to the local diagonal score than identities with low similarity scores. Nucleotide sequences use the identity matrix for the same purpose. The 10 best local regions selected from all the diagonals are then saved.
  • Rescan the regions taken using the scoring matrices, trimming the ends of each region to include only the residues contributing to the highest score.
Rescan the 10 regions taken. This time the relevant scoring matrix is used while rescoring, so that runs of identities shorter than the ktup value are allowed. Conservative replacements that contribute to the similarity score are also taken into account. Although protein sequences use the BLOSUM50 matrix, scoring matrices based on the minimum number of base changes required for a specific replacement, on identities alone, or on an alternative measure of similarity can also be used with the program. For each of the diagonal regions rescanned this way, a subregion with the maximum score is identified. The initial scores found in step 1 are used to rank the library sequences. The highest score is referred to as the init1 score.
  • If several initial regions with scores greater than a CUTOFF value are found in an alignment, check whether the trimmed initial regions can be joined to form an approximate alignment with gaps. Calculate a similarity score that is the sum of the scores of the joined regions, with a penalty of 20 points for each gap. This initial similarity score (initn) is used to rank the library sequences. The score of the single best initial region found in step 2 is reported (init1).
Here the program calculates an optimal alignment of initial regions as a combination of compatible regions with maximal score. This optimal alignment of initial regions can be rapidly calculated using a dynamic programming algorithm. The resulting score initn is used to rank the library sequences. This joining process increases sensitivity but decreases selectivity. A carefully calculated cut-off value is thus used to control where this step is implemented, a value that is approximately one standard deviation above the average score expected from unrelated sequences in the library. A 200-residue query sequence with ktup=2 uses a value of 28.
  • Use a banded Smith-Waterman algorithm to calculate an optimal score for alignment.
This step uses a banded Smith-Waterman algorithm to create an optimised score (opt) for each alignment of the query sequence to a database (library) sequence. It takes a band of 32 residues centered on the init1 region of step 2 for calculating the optimal alignment. After all sequences are searched, the program plots the initial scores of each database sequence in a histogram and calculates the statistical significance of the "opt" score. For protein sequences, the final alignment is produced using a full Smith-Waterman alignment. For DNA sequences, a banded alignment is provided.
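The banded alignment of the last step can be sketched as ordinary Smith-Waterman scoring restricted to cells near one diagonal. This is a minimal sketch, not the package's implementation: it uses a simple match/mismatch score and a linear gap penalty with made-up values instead of BLOSUM50, returns only the score, and the parameter names are invented. A half width of 16 corresponds to the 32-residue band mentioned above.

```python
def banded_smith_waterman(a, b, diagonal=0, half_width=16,
                          match=2, mismatch=-1, gap=-2):
    """Best local alignment score using only cells with |(j - i) - diagonal| <= half_width.

    i indexes `a` and j indexes `b` (both 1-based inside the matrix).
    Cells outside the band are treated as score 0, i.e. an alignment may
    start at the edge of the band but cannot pass through the outside.
    """
    best = 0
    prev = {}                                   # row i-1, keyed by column j
    for i in range(1, len(a) + 1):
        curr = {}
        lo = max(1, i + diagonal - half_width)
        hi = min(len(b), i + diagonal + half_width)
        for j in range(lo, hi + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score = max(0,
                        prev.get(j - 1, 0) + pair,   # extend along the diagonal
                        prev.get(j, 0) + gap,        # gap in b
                        curr.get(j - 1, 0) + gap)    # gap in a
            curr[j] = score
            best = max(best, score)
        prev = curr
    return best


a, b = "AAAACGTACGT", "CGTACGT"
print(banded_smith_waterman(a, b, diagonal=-4, half_width=1))   # band covers the match
print(banded_smith_waterman(a, b, diagonal=4, half_width=1))    # band misses it
```

The band keeps the work proportional to the sequence length times the band width, but the score is only meaningful if the band is centered on the right diagonal, which is why the init1 region from step 2 is used to place it.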

Mass spectrometry


Mass spectrometry (also known by the deprecated term mass spectroscopy, or informally as "mass-spec" or MS) is an analytical technique used to measure the mass-to-charge ratio of ions. It is most generally used to find the composition of a physical sample by generating a mass spectrum representing the masses of sample components. The mass spectrum is measured by a mass spectrometer.

All mass spectrometers consist of three basic parts: an ion source, a mass analyzer, and a detector system. The stages within the mass spectrometer are:

  1. Producing ions from the sample
  2. Separating ions of differing masses
  3. Detecting the number of ions of each mass produced
  4. Collating the data and generating the mass spectrum

The technique has several applications, including:

  • identifying unknown compounds by the mass of the compound molecules or their fragments
  • determining the isotopic composition of elements in a compound
  • determining the structure of a compound by observing its fragmentation
  • quantifying the amount of a compound in a sample using carefully designed methods (mass spectrometry is not inherently quantitative)
  • studying the fundamentals of gas phase ion chemistry (the chemistry of ions and neutrals in vacuum)
  • determining other physical, chemical, or even biological properties of compounds with a variety of other approaches

Instrumentation

Ion source

The ion source is the part of the mass spectrometer that ionizes the material under analysis (the analyte). The ions are then transported by magnetic or electric fields to the mass analyzer.

Techniques for ionization have been key to determining what types of samples can be analyzed by mass spectrometry. Electron ionization and chemical ionization are used for gases and vapors. In chemical ionization sources, the analyte is ionized by chemical ion-molecule reactions during collisions in the source. Two techniques often used with liquid and solid biological samples include electrospray ionization (due to John Fenn) and matrix-assisted laser desorption/ionization (MALDI, due to K. Tanaka and separately, M. Karas and F. Hillenkamp). Inductively coupled plasma sources are used primarily for metal analysis on a wide array of sample types. Others include glow discharge, fast atom bombardment (FAB), thermospray, desorption/ionization on silicon (DIOS), Direct Analysis in Real Time (DART), atmospheric pressure chemical ionization (APCI), secondary ion mass spectrometry (SIMS), spark ionization and thermal ionisation.

Mass analyzer

Mass analyzers separate the ions according to their mass-to-charge ratio. All mass spectrometers are based on dynamics of charged particles in electric and magnetic fields in vacuum where the following two laws apply:

\mathbf{F} = q (\mathbf{E} + \mathbf{v} \times \mathbf{B}) (Lorentz force law)
\mathbf{F}=m\mathbf{a} (Newton's second law of motion)

where F is the force applied to the ion, m is the mass of the ion, a is the acceleration, q is the ionic charge, E is the electric field, and v × B is the vector cross product of the ion velocity and the magnetic field.

Equating the above expressions for the force applied to the ion yields:

(m/q)\mathbf{a} = \mathbf{E}+ \mathbf{v} \times \mathbf{B}

This differential equation is the classic equation of motion of charged particles. Together with the particle's initial conditions it completely determines the particle's motion in space and time and therefore is the basis of every mass spectrometer. It immediately reveals that two particles with the same physical quantity m/q behave exactly the same. Thus all mass spectrometers actually measure m/q and strictly speaking should be called mass-to-charge spectrometers. When presenting data, it is common to use the officially dimensionless m/z, where z is the number of elementary charges (e) on the ion (z=q/e). This quantity is called the mass-to-charge ratio, although more accurately it represents the ratio of the mass number to the charge number.

There are many types of mass analyzers, using either static or dynamic fields, and magnetic or electric fields, but all operate according to this same law. Each analyzer type has its strengths and weaknesses. Many mass spectrometers use two or more mass analyzers for tandem mass spectrometry (MS/MS). In addition to the more common mass analyzers listed below, there are other less common ones designed for special situations.

Sector

A sector field mass analyzer uses an electric and/or magnetic field to affect the path and/or velocity of the charged particles in some way. Sector instruments change the direction of ions that are accelerated through the mass analyzer. The ions enter a magnetic or electric field which bends the ion paths depending on their mass-to-charge ratios, deflecting the more highly charged, faster-moving and lighter ions more. The ions eventually reach the detector and their relative abundances are measured. The analyzer can be used to select a narrow range of m/q or to scan through a range of m/q to catalog the ions present.
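As a worked step, the bending radius in a magnetic sector follows from the equation of motion given earlier. The derivation below is a sketch under simplifying assumptions that the post does not state: a purely magnetic sector (E = 0), a uniform field of magnitude B perpendicular to the ion velocity, a constant ion speed v, and ions that start at rest and are accelerated through a potential difference U before entering the sector. The symbols r and U are introduced here for illustration.

```latex
% Circular motion: the magnitude of the acceleration is v^2 / r
\frac{m}{q}\,\frac{v^{2}}{r} = vB
\quad\Longrightarrow\quad
r = \frac{m}{q}\,\frac{v}{B}

% Acceleration through U fixes the speed: (1/2) m v^2 = qU
v = \sqrt{\frac{2qU}{m}}
\quad\Longrightarrow\quad
r = \frac{1}{B}\sqrt{\frac{2U\,m}{q}}
```

At fixed B and U the radius depends only on m/q, so ions with a smaller mass-to-charge ratio follow a tighter curve, and scanning B or U brings different m/q values onto the detector.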

Time-of-flight

Perhaps the easiest to understand is the Time-of-flight (TOF) analyzer. It uses an electric field to accelerate the ions through the same potential, and then measures the time they take to reach the detector. If the particles all have the same charge, then their kinetic energies will be identical, and their velocities will depend only on their masses. Lighter ions will reach the detector first.
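The dependence of flight time on mass can be made concrete with a short calculation. This sketch assumes an idealised instrument in which ions start at rest, gain kinetic energy qU from the accelerating potential and then drift at constant speed; the function name, the 20 kV potential and the 1 m drift length are hypothetical values chosen only for the example.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge in coulombs
DALTON = 1.66053906660e-27      # unified atomic mass unit in kilograms


def flight_time(mass_da, charge_number, accel_voltage, drift_length):
    """Time in seconds for an ion to cross the drift region.

    Energy balance: (1/2) m v^2 = q U, so v = sqrt(2 q U / m) and t = L / v.
    """
    m = mass_da * DALTON
    q = charge_number * E_CHARGE
    v = math.sqrt(2 * q * accel_voltage / m)
    return drift_length / v


for mass in (100, 400, 1600):
    t = flight_time(mass, charge_number=1, accel_voltage=20_000, drift_length=1.0)
    print(f"{mass:5d} Da  {t * 1e6:.2f} microseconds")
```

Flight time grows with the square root of m/q, so a fourfold increase in mass doubles the arrival time, and an ion with twice the mass and twice the charge arrives together with the lighter, singly charged ion.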

Quadrupole

Quadrupole mass analyzers use oscillating electrical fields to selectively stabilize or destabilize ions passing through a radio frequency (RF) quadrupole field. A quadrupole mass analyzer acts as a mass-selective filter and is closely related to the quadrupole ion trap, particularly the linear quadrupole ion trap, except that it operates without trapping the ions. A common variation of the quadrupole is the triple quadrupole.

Quadrupole ion trap

The quadrupole ion trap works on the same physical principles as the quadrupole mass analyzer, but the ions are trapped and sequentially ejected. Ions are created and trapped in a mainly quadrupole RF potential and separated by m/q, non-destructively or destructively.

There are many mass/charge separation and isolation methods, but the most commonly used is the mass instability mode, in which the RF potential is ramped so that the orbits of ions with mass a > b remain stable while ions with mass b become unstable and are ejected along the z-axis onto a detector.

Ions may also be ejected by the resonance excitation method, whereby a supplemental oscillatory excitation voltage is applied to the endcap electrodes, and the trapping voltage amplitude and/or excitation voltage frequency is varied to bring ions into a resonance condition in order of their mass/charge ratio.

The cylindrical ion trap mass spectrometer is a derivative of the quadrupole ion trap mass spectrometer.

Linear quadrupole ion trap

A linear quadrupole ion trap (LTQ) is similar to a QIT, but traps ions in a 2D quadrupole field, instead of a 3D quadrupole field as in a QIT. Ions can be stored along the entire length of the LTQ which results in a higher ion capacity.

Fourier transform ion cyclotron resonance

Fourier transform mass spectrometry, or more precisely Fourier transform ion cyclotron resonance MS, measures mass by detecting the image current produced by ions cyclotroning in the presence of a magnetic field. Instead of measuring the deflection of ions with a detector such as an electron multiplier, the ions are injected into a Penning trap (a static electric/magnetic ion trap) where they effectively form part of a circuit. Detectors at fixed positions in space measure the electrical signal of ions which pass near them over time, producing a cyclical signal. Since the frequency of an ion's cycling is determined by its mass-to-charge ratio, the signal can be deconvoluted by performing a Fourier transform. FTMS has the advantage of high sensitivity (since each ion is 'counted' more than once) and much higher resolution and thus precision.[8][9]
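The link between cycling frequency and mass-to-charge ratio can be illustrated with the unperturbed cyclotron relation f = qB/(2πm). The sketch below assumes an ideal uniform magnetic field and ignores the shift introduced by the trapping electric field, so it is not a calibration procedure; the function names and the 7 T field are hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per unified atomic mass unit


def cyclotron_frequency_hz(mass_da, charge_state, field_tesla):
    """Unperturbed cyclotron frequency f = q*B / (2*pi*m)."""
    mass_kg = mass_da * DALTON
    charge_c = charge_state * ELEMENTARY_CHARGE
    return charge_c * field_tesla / (2.0 * math.pi * mass_kg)


def mz_from_frequency(freq_hz, field_tesla):
    """Invert the same relation: mass in Da per elementary charge."""
    return ELEMENTARY_CHARGE * field_tesla / (2.0 * math.pi * freq_hz * DALTON)


if __name__ == "__main__":
    f = cyclotron_frequency_hz(1000.0, 2, 7.0)
    print(f"frequency: {f / 1e3:.1f} kHz")
    print(f"recovered m/z: {mz_from_frequency(f, 7.0):.3f}")
```

Each peak in the Fourier-transformed signal sits at one such frequency, so converting the frequency axis with the second function turns it into a mass-to-charge axis.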

Ion cyclotron resonance is an older mass analysis technique similar to FTMS except that ions are detected with a traditional detector. Ions trapped in a Penning trap are excited by an RF electric field until they impact the wall of the trap where the detector is located with ions of different mass being resolved in time.

Orbitrap

The Orbitrap is the most recently introduced mass analyser (commercially available from Thermo Electron since 2005). In the Orbitrap, ions are electrostatically trapped in an orbit around a central, spindle-shaped electrode. The electrode confines the ions so that they both orbit around the central electrode and oscillate back and forth along the central electrode's long axis. This oscillation generates an image current in the detector plates which is recorded by the instrument. The frequencies of these image currents depend on the mass-to-charge ratios of the ions in the Orbitrap. Mass spectra are obtained by Fourier transformation of the recorded image currents.

Similar to Fourier transform ion cyclotron resonance mass spectrometers, Orbitraps have a high mass accuracy, high sensitivity and a good dynamic range.

Detector

The final element of the mass spectrometer is the detector. The detector records the charge induced or current produced when an ion passes by or hits a surface. In a scanning instrument the signal produced in the detector during the course of the scan versus where the instrument is in the scan (at what m/q) will produce a mass spectrum, a record of ions as a function of m/q.

Typically, some type of electron multiplier is used, though other detectors including Faraday cups and ion-to-photon detectors are also used. Because the number of ions leaving the mass analyzer at a particular instant is typically quite small, significant amplification is often necessary to get a signal. Microchannel plate detectors are commonly used in modern commercial instruments.[11] In FTMS and Orbitraps, the detector consists of a pair of metal surfaces within the mass analyzer/ion trap region which the ions only pass near as they oscillate. No DC current is produced; only a weak AC image current flows in a circuit between the electrodes. Other inductive detectors have also been used.

Tandem MS (MS/MS)

Tandem mass spectrometry involves multiple steps of mass selection or analysis, usually separated by some form of fragmentation. A tandem mass spectrometer is one capable of multiple rounds of mass spectrometry. For example, one mass analyzer can isolate one peptide from many entering a mass spectrometer. A second mass analyzer then stabilizes the peptide ions while they collide with a gas, causing them to fragment by collision-induced dissociation (CID). A third mass analyzer then catalogs the fragments produced from the peptides. Tandem MS can also be done in a single mass analyzer over time as in a quadrupole ion trap. There are various methods for fragmenting molecules for tandem MS, including collision-induced dissociation (CID), electron capture dissociation (ECD), electron transfer dissociation (ETD), infrared multiphoton dissociation (IRMPD) and blackbody infrared radiative dissociation (BIRD). An important application using tandem mass spectrometry is in protein identification.

Tandem mass spectrometry enables a variety of experiments. Although it allows for many uniquely designed experiments, some types are commonly used and built into many commercial mass spectrometers. Examples include selected reaction monitoring (SRM), multiple reaction monitoring (MRM) and precursor ion scan. In selected reaction monitoring the first analyzer allows only a single mass through and the second analyzer monitors for a specifically defined fragment ion. MRM is nearly identical except that the second analyzer monitors multiple user-defined fragment ions. These monikers are most often used with scanning instruments where the second mass analysis event is duty cycle limited. These experiments are used to increase specificity of detection of known molecules, such as in pharmacokinetic studies. Precursor ion scan refers to monitoring for a specific loss from the precursor ion. The first and second mass analyzers scan across the spectrum separated by a user-defined m/z value. This experiment is used to detect specific motifs within unknown molecules.

Common Mass Spectrometer Configurations & Techniques

When all of the elements (source, analyzer and detector) of a mass spectrometer are combined to form a complete instrument and the specific configuration becomes common, a new name, often an abbreviation of one or more of the internal components, becomes attached to that configuration and can become, within certain circles, better known than the internal components themselves. The most ubiquitous example of this is MALDI-TOF, which simply refers to combining a matrix-assisted laser desorption/ionization source with a time-of-flight mass analyzer. The MALDI-TOF moniker is, however, often more widely recognized by scientists who are not mass spectrometrists than MALDI or TOF individually, as if the two were inseparable. Other examples include inductively coupled plasma-mass spectrometry (ICP-MS), accelerator mass spectrometry (AMS), thermal ionization-mass spectrometry (TIMS) and spark source mass spectrometry (SSMS). Sometimes the use of the generic "MS" actually implies a very specific mass analyzer and detection system, as with AMS, which is always sector based. In other cases a common configuration may be implied, but not necessarily.

Certain applications of mass spectrometry have developed monikers that although technically referring to a broad application also tend to indicate a specific or a limited number of instrument configurations. An example of this is isotope ratio mass spectrometry (IRMS). Despite only specifically indicating an application, the use of a limited number of sector based mass analyzers is implied and the name is used to refer to both the application and the instrument used for the application.

Other Separation Techniques Combined with Mass spectrometry

An important enhancement to the mass resolving and determining capacity of mass spectrometry is its combination with analysis techniques that resolve mixtures of compounds in a sample based on other characteristics before introduction into the mass spectrometer.

Gas chromatography/MS


A common form of mass spectrometry is gas chromatography-mass spectrometry (GC/MS or GC-MS). In this technique, a gas chromatograph is used to separate different compounds. This stream of separated compounds is fed on-line into the ion source, a metallic filament to which voltage is applied. This filament emits electrons which ionize the compounds. The ions can then further fragment, yielding predictable patterns. Intact ions and fragments pass into the mass spectrometer's analyser and are eventually detected.

Liquid chromatography/MS

Similar to gas chromatography MS (GC/MS), liquid chromatography mass spectrometry (LC/MS or LC-MS) separates compounds chromatographically before they are introduced to the ion source and mass spectrometer. It differs from GC/MS in that the mobile phase is liquid, usually a combination of water and organic solvents, instead of gas. Most commonly, an electrospray ionization source is used in LC/MS.

IMS/MS

Ion mobility spectrometry/mass spectrometry is a technique where ions are first separated by drift time through some pressure of neutral gas given an electrical potential gradient before being introduced into a mass spectrometer.

The drift time is a measure of the radius relative to the charge of the ion. The duty cycle of IMS (the time over which the experiment takes place) is longer than that of most mass spectrometric techniques, so the mass spectrometer can sample along the course of the IMS separation. This produces data about the IMS separation and the mass-to-charge ratio of the ions in a manner similar to LC/MS.

The duty cycle of IMS is short relative to liquid chromatography or gas chromatography separations and can thus be coupled to such techniques producing triply hyphenated techniques such as LC/IMS/MS.

Data and analysis

Data representations

Mass spectrometry produces various types of data. The most ubiquitous data representation is the mass spectrum.

Certain types of mass spectrometry data are best represented as a mass chromatogram. Types of chromatograms include selected ion monitoring (SIM), total ion current (TIC), and selected reaction monitoring chromatogram (SRM), among many others.

Other types of mass spectrometry data are well represented as a contour map of mass-to-charge on one axis, intensity on another and an additional experimental parameter (often time) on the third axis, thus producing a three dimensional surface.

Data analysis

Basics

Mass spectrometry data analysis is a complicated subject matter that is very specific to the type of experiment producing the data. There are several general subdivisions of data that are fundamental to beginning to understand any data.

Many mass spectrometers work in either negative ion mode or positive ion mode. It is very important to know whether the observed ions are negatively or positively charged. This is often important in determining the neutral mass but it also indicates something about the nature of the molecules.
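Why the ion mode matters for the neutral mass can be shown with a small calculation. The sketch below assumes the simplest case, in which positive ions are formed by adding one proton per charge and negative ions by removing one proton per charge; other adducts (sodium, for instance) would need a different mass offset. The function name is hypothetical.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def neutral_mass(mz, charge_state, mode="positive"):
    """Neutral mass in Da from an observed m/z, assuming (de)protonation only."""
    if charge_state < 1:
        raise ValueError("charge_state must be a positive integer")
    if mode == "positive":
        # [M + zH]^z+ : subtract the added protons.
        return charge_state * (mz - PROTON_MASS_DA)
    if mode == "negative":
        # [M - zH]^z- : add back the removed protons.
        return charge_state * (mz + PROTON_MASS_DA)
    raise ValueError("mode must be 'positive' or 'negative'")


if __name__ == "__main__":
    print(neutral_mass(501.007276, 2, "positive"))
    print(neutral_mass(498.992724, 2, "negative"))
```

The same observed m/z therefore corresponds to neutral masses that differ by two proton masses per charge depending on which mode is assumed.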

There are many different types of ion sources that behave very differently from each other. A source such as an electron ionization source produces many fragments and mostly odd electron species with one charge, whereas a source such as an electrospray source usually produces quasimolecular even electron species that may be multiply charged.

Tandem mass spectrometry purposely produces fragment ions post-source and can drastically change the sort of data achieved by an experiment.

By understanding the origin of a sample certain expectations can be assumed. For example, if the sample is coming from a synthesis/manufacturing process impurities are likely to be present that are related to the major component. If the sample is a relatively crude preparation of a biological sample, the sample likely contains a certain amount of salt that may form adducts with the analyte molecules in certain analyses.

Results can also depend heavily on how the sample was prepared and how it was run or introduced. An important example is which matrix was used for MALDI spotting, since much of the energetics of the desorption/ionization event is controlled by the matrix rather than the laser power. Sometimes samples are spiked with sodium or another ion-carrying species to produce adducts rather than a protonated species.

The most commonly overlooked basic question by non-mass spectrometrists trying to use mass spectrometry or interact with a mass spectrometrist is what is the over-arching goal of the project. To interpret data one must know the desired outcome (and have collected the right data in the first place). There are many bits of information that can be gleaned from mass spectrometry data, such as the masses of the molecules, the purity of the sample, and the structure of the molecules. Each of these questions requires a different approach. Simply asking for a "mass-spec" will most likely not answer the real question at hand.

Applications

Isotope ratio MS: isotope dating and tracking

Mass spectrometer to determine the 16O/18O and 12C/13C isotope ratio on biogenous carbonate

Mass spectrometry is also used to determine the isotopic composition of elements within a sample. Differences in mass among isotopes of an element are very small, and the less abundant isotopes of an element are typically very rare, so a very sensitive instrument is required. These instruments, sometimes referred to as isotope ratio mass spectrometers (IR-MS), usually use a single magnet to bend a beam of ionized particles towards a series of Faraday cups which convert particle impacts to electric current. A fast on-line analysis of the deuterium content of water can be done using flowing afterglow mass spectrometry (FA-MS). Probably the most sensitive and accurate mass spectrometer for this purpose is the accelerator mass spectrometer (AMS). Isotope ratios are important markers of a variety of processes. Some isotope ratios are used to determine the age of materials, as in carbon dating. Labelling with stable isotopes is also used for protein quantification (see Protein quantitation below).

Trace gas analysis

Several techniques use ions created in a dedicated ion source and injected into a flow tube or a drift tube. Selected ion flow tube (SIFT-MS) and proton transfer reaction (PTR-MS) mass spectrometry are variants of chemical ionization dedicated to trace gas analysis of air, breath or liquid headspace. They use a well-defined reaction time, which allows analyte concentrations to be calculated from the known reaction kinetics without the need for an internal standard or calibration.

Atom Probe


An atom probe is an instrument that combines time-of-flight mass spectrometry and field ion microscopy (FIM) to map the location of individual atoms.

Pharmacokinetics


Pharmacokinetics is often studied using mass spectrometry because of the complex nature of the matrix (often blood or urine) and the need for high sensitivity to observe low dose and long time point data. The most common instrumentation used in this application is LC-MS with a triple quadrupole mass spectrometer. Tandem mass spectrometry is usually employed for added specificity. Standard curves and internal standards are used for quantitation of usually a single pharmaceutical in the samples. The samples represent different time points as a pharmaceutical is administered and then metabolized or cleared from the body. Blank or t=0 samples taken before administration are important in determining background and ensuring data integrity with such complex sample matrices. Much attention is paid to the linearity of the standard curve; however, it is not uncommon to use curve fitting with more complex functions such as quadratics, since the response of most mass spectrometers is less than linear across large concentration ranges.
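How a standard curve turns instrument response into concentration can be sketched as follows. This is a minimal illustration that assumes a straight-line response, which, as noted above, may not hold over large concentration ranges; the response values stand for analyte-to-internal-standard peak area ratios, and all numbers and function names are hypothetical.

```python
def fit_line(concentrations, responses):
    """Ordinary least-squares fit of response = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_response(response, slope, intercept):
    """Read an unknown sample off the fitted standard curve."""
    return (response - intercept) / slope


if __name__ == "__main__":
    # Hypothetical calibration standards (concentration, area ratio).
    standards = [1.0, 2.0, 5.0, 10.0]
    ratios = [2.5, 4.5, 10.5, 20.5]
    slope, intercept = fit_line(standards, ratios)
    print(concentration_from_response(8.5, slope, intercept))
```

A quadratic fit would follow the same pattern, with the inversion step solving a quadratic instead of a linear equation.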

There is currently considerable interest in the use of very high sensitivity mass spectrometry for microdosing studies, which are seen as a promising alternative to animal experimentation.

Mass spectrometry of proteins

Mass spectrometry is an important emerging method for the characterization of proteins. The two primary methods for ionization of whole proteins are electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). In keeping with the performance and mass range of available mass spectrometers, two approaches are used for characterizing proteins. In the first, intact proteins are ionized by either of the two techniques described above, and then introduced to a mass analyser. In the second, proteins are enzymatically digested into smaller peptides using an agent such as trypsin or pepsin. Other proteolytic digest agents are also used. The collection of peptide products is then introduced to the mass analyser. This is often referred to as the "bottom-up" approach of protein analysis.

Whole protein mass analysis is primarily conducted using either time-of-flight (TOF) MS, or Fourier transform ion cyclotron resonance (FT-ICR). These two types of instrument are preferable here because of their wide mass range, and in the case of FT-ICR, its high mass accuracy. Mass analysis of proteolytic peptides is a much more popular method of protein characterization, as cheaper instrument designs can be used for characterization. Additionally, sample preparation is easier once whole proteins have been digested into smaller peptide fragments. The most widely used instrument for peptide mass analysis is the quadrupole ion trap. Multiple stage quadrupole-time-of-flight and MALDI time-of-flight instruments also find use in this application.

Protein and peptide fractionation coupled with mass spectrometry

Proteins of interest to biological researchers are usually part of a very complex mixture of other proteins and molecules that co-exist in the biological medium. This presents two significant problems. First, the two ionization techniques used for large molecules only work well when the mixture contains roughly equal amounts of constituents, while in biological samples, different proteins tend to be present in widely differing amounts. If such a mixture is ionized using electrospray or MALDI, the more abundant species have a tendency to "drown" signals from less abundant ones. The second problem is that the mass spectrum from a complex mixture is very difficult to interpret because of the overwhelming number of mixture components. This is exacerbated by the fact that enzymatic digestion of a protein gives rise to a large number of peptide products.

To contend with this problem, two methods are widely used to fractionate proteins, or their peptide products from an enzymatic digestion. The first method fractionates whole proteins and is called two-dimensional gel electrophoresis. The second method, high performance liquid chromatography is used to fractionate peptides after enzymatic digestion. In some situations, it may be necessary to combine both of these techniques.

Gel spots identified on a 2D Gel are usually attributable to one protein. If the identity of the protein is desired, the gel spot can be excised, and digested proteolytically. The peptide masses resulting from the digestion can be determined by mass spectrometry using peptide mass fingerprinting. If this information does not allow unequivocal identification of the protein, its peptides can be subject to tandem mass spectrometry.

Characterization of protein mixtures using HPLC/MS is also called shotgun proteomics or MudPIT. A peptide mixture that results from digestion of a protein mixture is fractionated by one or two steps of liquid chromatography. The eluent from the chromatography stage can be either directly introduced to the mass spectrometer through electrospray ionization, or laid down on a series of small spots for later mass analysis using MALDI.

Protein identification

There are two main ways MS is used to identify proteins. Peptide mass fingerprinting (mentioned in the previous section) uses the masses of proteolytic peptides as input to a search of a database of predicted masses that would arise from digestion of a list of known proteins. If a protein sequence in the reference list gives rise to a significant number of predicted masses that match the experimental values, there is some evidence that this protein was present in the original sample.
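The matching step of peptide mass fingerprinting can be sketched in a few lines. This is a toy illustration, not the scoring used by any particular search engine: it simply counts how many predicted peptide masses of each candidate protein fall within a mass tolerance of an observed mass. The protein names, masses and the 0.2 Da tolerance are hypothetical.

```python
def count_matches(observed_masses, predicted_masses, tolerance_da=0.2):
    """Number of predicted peptide masses matched by at least one observed mass."""
    return sum(
        1
        for predicted in predicted_masses
        if any(abs(predicted - observed) <= tolerance_da
               for observed in observed_masses)
    )


def rank_proteins(observed_masses, database, tolerance_da=0.2):
    """Rank candidate proteins by their number of matching peptide masses."""
    scores = {
        name: count_matches(observed_masses, masses, tolerance_da)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical in-silico digest results for two candidate proteins.
    database = {
        "protein_A": [500.3, 812.4, 1200.6],
        "protein_B": [450.2, 812.4, 999.9],
    }
    observed = [500.25, 812.45, 1200.7]
    for name, score in rank_proteins(observed, database):
        print(name, score)
```

Real search tools additionally weigh how likely such a number of matches would be by chance, which a raw count does not capture.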

Full MS and MS2 spectra of a peptide.

Tandem MS is becoming a more popular experimental method for identifying proteins. Collision-induced dissociation is used in mainstream applications to generate a set of fragments from a specific peptide ion. The fragmentation process primarily gives rise to cleavage products that break along peptide bonds. Because of this simplicity in fragmentation, it is possible to use the observed fragment masses to match with a database of predicted masses for one of many given peptide sequences. Tandem MS of whole protein ions has been investigated recently using electron capture dissociation and has demonstrated extensive sequence information in principle, but is not in common practice. This is sometimes referred to as the "top-down" approach, in that it involves starting with the whole mass and then pulling it apart rather than starting with pieces (proteolytic fragments) and piecing the protein back together using de novo sequencing (bottom-up).

A number of different algorithmic approaches have been described to identify peptides and proteins from tandem mass spectrometry (MS/MS) data, including peptide de novo sequencing and sequence tag based searching.


A popular option that combines a comprehensive range of data analysis features is PEAKS.

Other existing mass spectrometry analysis software includes tools for peptide fragment fingerprinting (SEQUEST, Mascot, OMSSA and X!Tandem), peptide de novo sequencing (LuteFisk, PepNovo, and Sherenga) and peptide sequence tag based searching (SPIDER, InsPecT, and GutenTAG).


Protein quantitation

Several recent methods allow for the quantitation of proteins by mass spectrometry. Typically, stable (i.e. non-radioactive) heavier isotopes of carbon (13C) or nitrogen (15N) are incorporated into one sample while the other one is labelled with corresponding light isotopes (e.g. 12C and 14N). The two samples are mixed before the analysis. Peptides derived from the different samples can be distinguished due to their mass difference. The ratio of their peak intensities corresponds to the relative abundance ratio of the peptides (and proteins). The most popular methods for isotope labelling are SILAC (stable isotope labelling with amino acids in cell culture), trypsin-catalyzed 18O labeling, ICAT (isotope coded affinity tagging) and iTRAQ (isotope tags for relative and absolute quantitation). “Semi-quantitative” mass spectrometry can be performed without labeling of samples. Typically, this is done with MALDI analysis (in linear mode). The peak intensity, or the peak area, from individual molecules (typically proteins) is here correlated to the amount of protein in the sample. However, the individual signal depends on the primary structure of the protein, on the complexity of the sample, and on the settings of the instrument.
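The intensity-ratio step of isotope-label quantitation can be sketched as below. It is an illustration under stated assumptions: light and heavy peaks have already been paired for each peptide, and the median of the peptide ratios is used as the protein-level summary, which is one common but not the only choice. Peptide names and intensities are hypothetical.

```python
import statistics


def heavy_to_light_ratios(peak_pairs):
    """Map each peptide to heavy/light intensity ratio.

    peak_pairs maps a peptide label to (light_intensity, heavy_intensity).
    Peptides with a non-positive light intensity are skipped.
    """
    return {
        peptide: heavy / light
        for peptide, (light, heavy) in peak_pairs.items()
        if light > 0
    }


def protein_ratio(peptide_ratios):
    """Summarize peptide ratios for one protein using the median (an assumption)."""
    return statistics.median(peptide_ratios.values())


if __name__ == "__main__":
    # Hypothetical paired peak intensities for three peptides of one protein.
    pairs = {
        "peptide_1": (100.0, 200.0),
        "peptide_2": (50.0, 150.0),
        "peptide_3": (10.0, 25.0),
    }
    ratios = heavy_to_light_ratios(pairs)
    print(ratios)
    print(protein_ratio(ratios))
```

The median is used here because a single badly measured peptide pair then has limited influence on the protein-level estimate.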

Protein structure

Characteristics indicative of the 3 dimensional structure of proteins can be probed with mass spectrometry in various ways. By using chemical crosslinking to couple parts of the protein that are close in space, but far apart in sequence, information about the overall structure can be inferred. By following the exchange of amide protons with deuterium from the solvent, it is possible to probe the solvent accessibility of various parts of the protein.

Protein microarray

A protein microarray is a piece of glass on which different molecules of protein have been affixed at separate locations in an ordered manner thus forming a microscopic array. These are used to identify protein-protein interactions, to identify the substrates of protein kinases, or to identify the targets of biologically active small molecules. The most common protein microarray is the antibody microarray, where antibodies are spotted onto the protein chip and are used as capture molecules to detect proteins from cell lysate solutions.

Applications

Protein microarrays (also biochip, proteinchip) are measurement devices used in biomedical applications to determine the presence and/or amount (referred to as quantitation) of proteins in biological samples, e.g. blood. They have the potential to be an important tool for proteomics research. Usually a multitude of different capture agents, most frequently monoclonal antibodies, are deposited on a chip surface (glass or silicon) in a miniature array. This format is often also referred to as a microarray (a more general term for chip based biological measurement devices).

Types of Protein Chips

There are several types of protein chips, however the most common are glass slide chips and nano-well arrays.

Production of Protein Arrays

The proteins can be externally synthesised, purified and attached to the array. Alternatively they can be in-situ synthesised and directly attached to the array.

The proteins can be synthesised through biosynthesis, cell-free DNA expression or chemical synthesis. In-situ synthesis is possible with the latter two. With cell-free DNA expression, proteins are attached to the support right after their production. Peptides chemically produced by solid phase peptide synthesis are already attached to the support. Selective deprotection is carried out through lithographic methods or by the so-called SPOT-synthesis.

Types of Capture Molecules

Capture molecules used are most commonly antibodies; however, more recently there has been a push towards other types of capture molecules which are more similar in their nature such as peptides or aptamers. Antibodies have several problems including the fact that there are not antibodies for most proteins and also problems with specificity in some commercial antibody preparations. Nevertheless, antibodies still represent the most well characterized and effective protein capture agent for microarrays. Recently, nucleic acids, receptors, enzymes, and proteins have been spotted onto chips and used as capture molecules. This will allow a vast variety of experiments to be conducted on protein-protein interactions, and all other protein binding substrates.

Detection methods

Although protein microarrays may use detection methods similar to those of DNA microarrays, a problem is that protein concentrations in a biological sample may differ by many more orders of magnitude than those of mRNAs. Therefore, protein chip detection methods must have a much larger range of detection.

The preferred method of detection currently is fluorescence detection. Fluorescent detection is safe, sensitive, and can have a high resolution. The method is compatible with standard microarray scanners, although some minor alterations to software may be needed.