The origin of the Soybean

LITERATURE REVIEW

Soybean

Soybean [Glycine max (L.) Merr.], an East Asian crop, is an important legume. It is a rich source of protein and a substitute for meat and dairy products, it accounts for about 56% of global oilseed production, and it is a source of various vitamins, nutraceuticals and antioxidants. The United States is the world’s largest producer and exporter of soybean, with approximately twenty-six percent of its land acreage used for soybean cultivation. Other important soybean-producing countries are Brazil and Argentina. Annual soybean production has increased from 70 million metric tons to 200 million metric tons since 1972. Today’s soybean plants stand approximately one meter high and one meter wide and bear sixty to eighty pods containing three beans each (Saucer 1993).

Origin and Domestication of Soybean

The study of the genetic structure of the ancestor population is important to determine the current pattern of variations in the modern crop species (Harrison 1991).

Soybean (G. max) was domesticated around 5,000 years ago from the wild soybean (Glycine soja Sieb. and Zucc.), found in China. Within the genus Glycine, the subgenus Soja consists of these two species. G. soja is distributed in the central and northern parts of East Asia, including China (the center of origin and diversification), the Korean peninsula, Japan, and the Russian Far East. G. soja is an annual, weedy climber with small black seeds that shatter before the plant attains maturity. The protein concentration of its seeds varies from 31% to 52% (similar to that of G. max). Its seed oil concentration is approximately 9 to 12% (G. max has more than 16%). It has less than 10% oleic acid (G. max has more than 20%) and more than 10% linolenic acid (G. max has less than 9.0%) (Hymowitz et al. 1972). Fukuda (1993) suggested that the subgenus contains another form, intermediate between G. soja and G. max, known as G. gracilis. Numerical taxonomic analyses of the three species (G. soja, G. gracilis and G. max) by Broich and Palmer (1980, 1981), based on phenotypic traits, suggested that G. gracilis is an intermediate form of G. max. According to Smart (1994), G. max, G. soja and G. gracilis should be classified as subspecies.

The transition from G. soja to G. max is the result of three genetic bottlenecks: (i) domestication in Asia, which produced many Asian landraces; (ii) a founder effect, in which only a few landraces were selected and introduced into the northern and southern United States; and (iii) selective breeding, which produced the present cultivars (Hyten et al. 2006). Soybean germplasm can hence be classified into five groups: (i) northern U.S. ancestral cultivars, the genotypes that formed the germplasm base of present cultivars grown in the northern United States and Canada; (ii) northern U.S. cultivars, developed from the northern U.S. ancestral cultivars through a number of breeding cycles; (iii) southern U.S. ancestral cultivars, which form the germplasm base of cultivars grown in the southern United States; (iv) southern U.S. cultivars, developed from the southern U.S. ancestral cultivars through a number of breeding cycles; and (v) G. soja (wild soybean) from a range of origins in the Far East (Gizlice et al. 1993).

The three genetic bottlenecks have reduced the genetic diversity of G. max in several ways: (i) genetic drift, a result of the founder effect in soybean, has changed the allele frequencies in subsequent populations, which are hence more likely to be susceptible to newly emerging diseases or pests; (ii) positive selection, i.e. the fixing of a beneficial mutation at a particular locus, can cause a selective sweep of adjacent loci and hence eliminate genetic polymorphisms in regions of the genome undergoing selection (Hyten et al. 2006); and (iii) rare alleles are eliminated from the newly formed population. The extent of the reduction in genetic diversity depends on (i) the number of soybean accessions that passed through the genetic bottlenecks, (ii) the selection intensity, and (iii) the duration of the genetic bottlenecks (Hyten et al. 2006).

History of genetic diversity studies between G.soja and G.max

Delannay et al. (1983) studied the genetic relatedness among 14 accessions that contributed approximately 70% of the gene pool found in northern and southern cultivars released between 1971 and 1981. Similar work by Gizlice (1994) found that the ancestry of the 258 publicly released cultivars between 1947 and 1988 could be defined by 80 landraces. The coefficient of parentage for the landraces ranged from 0.0001 to 12.1539%. Approximately 86% of the collective parentage was contributed by just 17 of the 80 landraces, while the remaining 63 landraces contributed less than 1% each.

Molecular marker studies using restriction fragment length polymorphism (RFLP), ribosomal DNA, random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers have helped to identify the erosion of genetic diversity between wild soybeans and the present cultivars.

Iron Chlorosis in Plants

Importance of Iron: Iron (Fe) is an important nutrient for every organism. It is an important component of many enzymes, such as the cytochromes of the electron transport chain, and is also required for a wide range of biological functions (Schmidt 1999). The functions of iron are based on the reversible redox reaction between Fe2+ (ferrous) and Fe3+ (ferric) ions, its ability to form octahedral complexes with many ligands, and its varying redox potential in response to different environmental conditions.

Iron is a transition metal. Fe(II) is relatively soluble but can be readily oxidized by atmospheric oxygen. Iron, nitrogen and phosphorus are the three important nutrients that limit plant growth in alkaline soil (Marschner 1995). Fe is an essential microelement for plant metabolism, growth and crop productivity. It is generally present in the soil in two ionic forms, Fe2+ and Fe3+. Fe ions are involved in redox reactions in the electron transport chains of photosynthesis and respiration, where energy is conserved in the form of ATP. Fe is also required for the synthesis of chlorophyll and is necessary to maintain chloroplast structure and function (Albadia 1992). Fe is also required for symbiotic nitrogen fixation in the root nodules of legumes because of its role in leghemoglobin and nitrogenase (Dakora 1995, Kaiser et al. 2003). Leghemoglobin is an important heme protein. Root nodules are formed through a symbiotic association between plants of the Leguminosae family and soil bacteria of the genera Sinorhizobium, Rhizobium, Bradyrhizobium and Azorhizobium. Fe is an important component of many plant proteins. There are two families of Fe-containing proteins in plants, the Fe-S proteins and the heme proteins, which account for approximately 19% and 9% of the total leaf protein iron, respectively (Miller et al. 1984).

Cytochromes are the major heme proteins involved in electron transport in plants. A heme group contains a tetrapyrrole (porphyrin) ring surrounding a central iron atom, which is coordinated to the four pyrrole nitrogen atoms. Cytochromes function in photosystem I and photosystem II, and in the electron transport chain of mitochondrial respiration.

Plant Strategies for the Iron uptake

Elements are classified as macronutrients (N, S, P, Ca, K, Mg), micronutrients (Fe, Mn, B, Cl, Zn, Cu and Mo) and beneficial elements (e.g. Na, Si, Se, F, Co) (Marschner 1995).

Chlorosis in plants can be due to a deficiency of any of several nutrients. Chlorosis due to Fe deficiency develops on the youngest leaves, while chlorosis due to manganese deficiency affects leaves of intermediate age.

Soybean plants are classified as iron efficient or iron inefficient depending on their response to Fe stress. Fe-efficient plants respond to iron deficiency stress by inducing biochemical reactions that make Fe available in a more useful form, while Fe-inefficient plants are unable to do so (Brown 1978). Plant breeders classify Fe-efficient plants as resistant to Fe stress, and Fe-inefficient plants as susceptible to Fe stress (Fehr 1984).

Plants use one of two strategies to take up iron. The first, Strategy I, is employed by non-graminaceous plants, such as the dicots Arabidopsis thaliana, Lycopersicon esculentum (tomato) and Pisum sativum (pea). Strategy I plants regulate the uptake of iron in three ways. First, a proton-pumping H+-ATPase releases H+ from the root surface at the expense of ATP, lowering the pH of the soil rhizosphere. Acidification of the soil results in the dissociation of Fe(OH)3 complexes into ferric ions. Lowering the pH by one unit increases the solubility of Fe3+ ions by a factor of one thousand (Connolly 1998). Second, an Fe3+ chelate reductase reduces Fe3+ to the more soluble Fe2+ form. Ferrous ions are 10^6 times more soluble than ferric ions at neutral pH. Third, iron transporters carry Fe2+ across the plasmalemma. Strategy I plants also undergo changes in root morphology, including an increase in root hair formation that enlarges the surface available for iron uptake, and show an increased citrate concentration in the phloem (Schmidt 1999).
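The roughly thousand-fold change in Fe3+ solubility per pH unit follows from the stoichiometry of Fe(OH)3 dissolution, in which three hydroxide ions are released per ferric ion. The short sketch below illustrates this under stated assumptions: solubility is treated as controlled only by the equilibrium Fe(OH)3 = Fe3+ + 3 OH-, and the solubility product used is a placeholder order of magnitude, not a value taken from the cited studies.

```python
# Assumption: [Fe3+] is set only by Fe(OH)3 <-> Fe3+ + 3 OH-.
# LOG_KSP is an illustrative placeholder, not a measured constant from the text.
LOG_KSP = -38.0   # hypothetical log10 of the solubility product
LOG_KW = -14.0    # log10 of the ion product of water at 25 degrees C


def log10_fe3_solubility(ph, log_ksp=LOG_KSP):
    """Return log10 of the equilibrium Fe3+ concentration (mol/L) at a given pH."""
    log_oh = ph + LOG_KW            # log10[OH-] = pH - 14
    return log_ksp - 3.0 * log_oh   # Ksp = [Fe3+][OH-]^3


def fold_change(ph_from, ph_to):
    """Factor by which Fe3+ solubility changes when pH moves from ph_from to ph_to."""
    return 10 ** (log10_fe3_solubility(ph_to) - log10_fe3_solubility(ph_from))


if __name__ == "__main__":
    # Acidifying the rhizosphere from pH 8 to pH 7
    print(f"pH 8 -> 7: solubility x {fold_change(8.0, 7.0):.0f}")
    print(f"pH 8 -> 6: solubility x {fold_change(8.0, 6.0):.0f}")
```

The fold change depends only on the pH difference, not on the placeholder solubility product, which is why the per-unit factor holds regardless of the exact constant.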

The importance of Fe3+ chelate reductase activity was identified in Arabidopsis thaliana, which is an iron-efficient plant. A. thaliana mutants (frd1-1, frd1-2 and frd1-3) that do not show Fe3+ chelate reductase activity under iron-deficient growth conditions were identified, and FRO2 was shown to be allelic to the frd1 mutations affecting Fe3+-chelate reductase activity (Robinson et al. 1999). frd1 mutants were not able to translocate radiolabeled iron to the shoot (Yi 1996). The gene was identified through its homology to the known yeast Fe3+-chelate reductase. The FRO genes are up-regulated in roots in response to Fe deficiency (Robinson 1999). Under iron deficiency, a set of genes encoding a proton ATPase of the AHA family (Palmgren 2001), the ferric-chelate reductase FRO2 (Connolly et al. 2003, Robinson et al. 1999) and the metal transporter IRT1 (Eide et al. 1996) is induced in the roots. Together they enable the uptake of Fe2+ into the cytoplasm. Several iron transporter families, such as NRAMP, ZIP and YSL, have also been identified in Arabidopsis (Curie and Briat 2003). IRT1 and its close homolog IRT2, transporters belonging to the ZIP family, were first identified through functional complementation of a yeast mutant defective in iron uptake (Eide et al. 1996, Vert et al. 2001). In A. thaliana, IRT1 is expressed in the root epidermis and cortex under iron-deficient conditions (Eide et al. 1996). Only irt1 mutants, and not irt2 mutants, of A. thaliana show symptoms of Fe deficiency under normal conditions. A knock-out mutant in the IRT1 gene, irt1, showed that under iron-deficient conditions IRT1 is the main route of entry of iron into the plant (Vert et al. 2002).

The second strategy, Strategy II, is employed by graminaceous species (grasses) such as wheat (Triticum aestivum), rice (Oryza sativa), barley (Hordeum vulgare) and maize (Zea mays). Under iron-deficient conditions, these plants synthesize phytosiderophores (PS) of the mugineic acid family (MAs) and secrete them from the apical zones of the roots. PS have a higher affinity for Fe3+ than for other metals. The Fe3+-PS complex is then taken up from the soil by the iron uptake system. The biochemical pathway has been worked out by comparing iron-efficient and iron-inefficient plants. Most of the genes underlying the Strategy II mechanism have been cloned from barley and rice (Negishi et al. 2002).

In maize, a transporter for Fe-PS complexes has been cloned and classified as a member of the oligopeptide transporter (OPT) family. The transporter was named yellow stripe1 (YS1) after the phenotype of a maize mutant deficient in Fe-PS uptake (Curie et al. 2001). Under Fe-deficient conditions, the levels of released phytosiderophores and of YS1 are increased in grasses (Mori 1999, Curie et al. 2001). Schaaf et al. (2004b) reported that YS1 is an H+-Fe3+-phytosiderophore transporter that depends on proton co-transport and pH, and that YS1-mediated transport occurs even at alkaline pH.

Allocation and transport of the Fe in the plant:

After iron enters the root symplast through the Strategy I or Strategy II mechanism, it has to be transferred to other parts of the plant. However, it has to be protected from oxygen to prevent its precipitation and the generation of harmful oxygen radicals. The precipitation of iron is controlled by nicotianamine (NA), the default chelator of iron, which prevents both its precipitation and the formation of oxygen radicals. NA forms stable complexes with both oxidation states of Fe at neutral and weakly alkaline pH (Stephen et al. 1996). The Fe3+-NA complex has a higher formation constant, while the Fe2+-NA complex is more stable under aerobic conditions (Wiren et al. 1992). NA is abundant in all plant tissues (Scholz et al. 1992). The NA-deficient tomato mutant chloronerva shows increased levels of antioxidant enzymes (Herbik et al. 1996) and precipitation of iron in vacuoles and mitochondria (Liu et al. 1998).

The transport of iron from the root epidermis to the xylem vessels takes place as an Fe2+-NA complex along a symplastic route (Stephen et al. 1996). The release of iron into the xylem is mediated by parenchyma or transfer cells. It is believed that iron is oxidized again when released into the xylem vessel and is then transported as an Fe3+-citrate complex (Tiffin 1996). The distribution of iron from the xylem to the leaf lamina is again mediated by the NA-iron complex. This was shown in the NA-deficient mutant chloronerva, in which most of the iron was deposited along the veins. The mechanism of the uptake of iron by the roots and its release into the cells is explained in the figure.

Iron Deficiency Chlorosis: Iron deficiency chlorosis (IDC) is a very common and important yield-limiting factor in soybean grown on calcareous soil in the north-central region of the United States. The region extends from central Iowa to central Minnesota and into southeast South Dakota. The yearly loss of soybean due to Fe deficiency is estimated to be worth $120 million. In Iowa and Minnesota alone, IDC leads to a loss of over ten million dollars in soybean production (Fleming et al. 1984).

The symptoms of Fe deficiency appear in the interveinal tissue of young leaves as a result of the plant’s inability to take up Fe from the soil. Fe deficiency in plants is referred to as IDC because iron is required to make chlorophyll, the green pigment, so a lack of iron reduces chlorophyll production and the leaves turn yellow. Soybean plants show a wide range of IDC symptoms, from a mild form in which the plant lacks a dark green color, to severe symptoms in which the leaves become yellow with green veins, to extreme cases in which the leaves turn entirely yellow to bleached white, which eventually results in plant death. IDC is considered a quantitative trait, as the complex phenotypic variation suggests the segregation of multiple genes at many loci. The interaction of genes with large and small phenotypic effects, and their interactions with the environment, produce the wide variation in the IDC trait. The loci carrying these genes are referred to as quantitative trait loci (QTL).

Reasons for Iron Deficiency Chlorosis:

Fe is considered the fourth most abundant element in the soil, so Fe deficiency results not from a lack of iron but from the interaction of several factors that control its solubility in the soil solution and plant sap (Brown 1959). Walker (1966) likewise explained that IDC in sorghum generally results from factors other than a lack of iron in the soil.

The Fe3+ form of iron is less soluble in soil than the Fe2+ form. The more soluble Fe2+ can be readily oxidized by atmospheric oxygen to the less soluble Fe3+, which then precipitates (Guerinot and Yi 1994). The solubility of Fe3+ in aerobic soil at neutral pH is between 10^-11 and 10^-10, which is lower than the optimal concentration required by plants, which ranges between 10^-8 and 10^-4. Multiple factors can make the abundant iron in the soil inaccessible to the plant: (i) high precipitation; (ii) high soil water content, which influences the concentration of Fe in solution by regulating soil aeration, since CO2 concentration influences soil pH and HCO3- concentration, and O2 concentration influences the soil redox potential and the ratio of Fe2+ to the less soluble Fe3+ (Chen and Barak 1982); (iii) low soil temperature; (iv) high concentrations of lime, ethylene, bicarbonate or heavy metals in the soil; (v) soil pH: iron is adequately available when the soil pH is between 6.0 and 6.5, Fe deficiency is observed at high soil pH, and in alkaline soil availability is controlled by the presence of carbonates; (vi) high soil compaction, which hinders the movement of Fe; and (vii) poor root growth. Depending on the soil composition, the intensity of IDC symptoms in soybean can therefore vary within a given area, and both the size and shape of chlorotic regions may vary from year to year (Inskeep and Bloom 1986).

The identification of the causes of IDC in soybean is further complicated by factors such as infestation with soybean cyst nematode (SCN, Heterodera glycines Ichinohe), which often produces symptoms similar to IDC (Tylka 2001), and by the fact that cultivars differ in their sensitivity to IDC (Cianzio et al. 1979).

Reducing Soybean productivity loss from IDC:

There are two ways of reducing the loss of soybean production caused by iron deficiency chlorosis. The first is to develop high-yielding, chlorosis-resistant cultivars through breeding programs. This is the primary aim of plant breeders, i.e. to develop improved cultivars mainly through selection (Bernardo 2008). The second is to study the genetic inheritance of iron utilization in soybean plants. This is the primary aim of geneticists, i.e. to understand the underlying mechanism of inheritance and variation (Bernardo 2008).

Development of an Iron Efficient cultivar: Cultivar development is conducted by public and private institutions. The initial breeding programs in the United States were conducted by the USDA in many regions of the country. At present, the breeding programs of private seed companies such as Monsanto and Pioneer International Inc. (2004), and of public institutions such as North Dakota State University, Iowa State University, the University of Minnesota and Kansas State University, are trying to develop IDC-resistant cultivars that can be made commercially available. Cultivars and germplasm with better resistance to iron chlorosis are available for oat (Avena byzantina C. Koch), sorghum (Sorghum bicolor (L.) Moench), dry bean (Phaseolus vulgaris L.) and soybean (Glycine max (L.) Merr.). The development of a cultivar involves the following steps: (i) making crosses between genotypes with desirable traits to generate variation in the segregating populations (such as F1 and F2); (ii) selecting genotypes that show the desirable traits of both parents; and (iii) pedigree breeding, recurrent selection and repeated backcrossing to develop improved cultivars.

Soybean is a self-pollinating legume with an outcrossing rate of less than 0.5 to 1% (Carlson and Lersten 1987). Commonly used plant-breeding procedures include backcrossing, single pod descent, pedigree breeding and bulk population breeding (Poehlman and Sleper 1995). Breeders cannot sort soybean cleanly into iron-efficient and iron-inefficient cultivars, because genotypes show continuous variation in chlorosis resistance, e.g. from the highly iron-inefficient T-203 genotype (a selection from PC 54619 of the soybean germplasm collection) to the highly iron-efficient A15 (developed by recurrent selection for chlorosis resistance).

Weiss (1943) was the first to publish results on the inheritance pattern of IDC in soybean. He generated F1, F2, F3 and backcross generations from multiple combinations of ten inbred genotypes. Six were susceptible to chlorosis (PA 54610-5, T-203, PI88508, PI88358, PI87619 and PI88345) and four were resistant to chlorosis (Dunfield, Mandell, Illini and Mukden). The populations were grown in the greenhouse in nutrient solution. Weiss (1943) concluded that iron utilization in soybean is controlled by a single major gene with a dominant allele (Fe) for iron efficiency and a recessive allele (fe) for iron inefficiency, and that dominance was complete. He disregarded some variation in the degree of iron efficiency among inefficient cultivars, as the effect of the modifying genes was very small compared with that of the major gene.
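A single gene with complete dominance predicts a 3 efficient : 1 inefficient segregation ratio in the F2 generation, which is usually checked with a chi-square goodness-of-fit test. The sketch below shows that calculation. The plant counts are hypothetical and are not Weiss's data; 3.841 is the standard 5% critical value for one degree of freedom.

```python
def chi_square_gof(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for observed class counts
    against an expected ratio such as (3, 1)."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, alpha = 0.05, 1 df

if __name__ == "__main__":
    # Hypothetical F2 counts: efficient (green) vs. inefficient (chlorotic) plants
    f2_counts = [78, 22]
    stat = chi_square_gof(f2_counts, (3, 1))
    fits = stat < CRITICAL_5PCT_DF1
    print(f"chi-square = {stat:.2f}; consistent with 3:1 at 5% level: {fits}")
```

A statistic below the critical value means the data do not reject the single-gene model; it does not by itself rule out additional modifying genes.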

The single-gene inheritance of IDC reported by Weiss (1943) was further confirmed by Bernard (1947), who developed iron-inefficient isolines by backcrossing the fe allele from T-203 into the cultivars ‘Clark’ and ‘Harsoy’. Building on these studies, Cianzio and Fehr (1980) crossed Anoka (an IDC-susceptible cultivar) with ICR EX-5003 (an IDC-resistant line) and evaluated F2-derived lines in the F4 generation, as well as progeny developed by backcrossing to both parents. Their results further confirmed the observations made by Weiss (1943).

Past researchers indicated that backcrossing the segregating population to the resistant genotype is an ideal way to introgress the dominant Fe allele into a new segregating population. Cianzio and Fehr (1982), working in the Iowa State University breeding program, made a cross between Pride B-216 (a high-yielding cultivar susceptible to IDC) and A2 (a low-yielding cultivar resistant to IDC). The F1 progeny had chlorosis scores intermediate between those of the two parents. In addition, the segregating populations of F2-derived lines and of F2 plants backcrossed to A2 showed a continuous distribution of chlorosis scores. None of these lines had IDC resistance similar to that of A2. This indicated that several genes, rather than one, control the chlorosis resistance trait.

Fehr et al. (1984) released A7, an IDC-resistant inbred breeding line developed through recurrent selection. Jessen et al. (1988b) released A15, another resistant breeding line developed through recurrent selection. However, these lines lacked the agronomic traits needed to be made commercially available.

Screening for iron-efficient cultivars:

Most screening is done in field nurseries, where environmental factors such as rainfall patterns, temperature, soil heterogeneity and soil water content contribute to non-genetic variation. This non-genetic variation makes the results difficult to reproduce across years and environments. To minimize it, breeders replicate the entries within a field nursery and grow each variety over several years to estimate its resistance accurately.

An alternative to a field nursery is a laboratory technique that uses a nutrient solution to screen genotypes for Fe efficiency. The nutrient solutions are formulated to identify iron-efficient genotypes under conditions resembling calcareous field soils. Strategy I plants such as soybean release H+ and reduce Fe(III) at the root plasmalemma, and a nutrient-solution technique was developed to measure the relationship between H+ release and IDC. Weiss (1943) collected soils from field nurseries and mixed them in an attempt to screen soybean genotypes for chlorosis under greenhouse conditions. He concluded that nutrient solution was a more reliable estimator of chlorosis resistance than field tests. Cianzio and Fehr (1982) tested the same genotypes as Weiss (1943) both in the field and in nutrient solution and concluded that the results did not correlate. Bryan and Lambert (1943) homogenized potted soil in growth chamber experiments but observed fewer chlorosis symptoms than in field nurseries. Inskeep and Bloom (1984) concluded that a variety of soil factors were associated with chlorotic soybeans, such as soil water content, bicarbonate concentration and soil density. Potted calcareous soil packed to a density of 1.1 Mg m^-3, undrained and with the water level maintained just below saturation, was used to screen genotypes for Fe-efficient lines (Fairbanks et al. 1987). Chlorosis symptoms can result from an increased concentration of bicarbonate (Jessen et al. 1986) and from nutritional deficiencies of nitrogen, manganese, phosphorus, boron, calcium and iron (Taiz and Zeiger 1991). Jessen et al. (1988) and Coulombe et al. (1984b) reported that IDC scores of lines grown in the field and in nutrient solution systems were the same. Their nutrient solution system had a high HCO3- concentration and a low Fe concentration. Jessen et al. (1988a) tested soybean varieties both in the field and in the greenhouse and found a rank correlation of 0.98 between field scores and nutrient solution scores.
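A rank correlation of this kind compares the ordering of genotypes in two screening systems rather than their raw scores. The sketch below computes a Spearman rank correlation (Pearson correlation of average ranks, which handles ties) in plain Python. It is offered on the assumption that the reported value is a Spearman-type coefficient, and the chlorosis scores shown are hypothetical, not data from the cited studies.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical visual chlorosis scores (1 = green, 5 = severe) for six lines
    field_scores = [1.2, 2.5, 3.1, 3.8, 4.4, 4.9]
    solution_scores = [1.0, 2.2, 3.5, 3.3, 4.6, 4.8]
    print(f"rank correlation = {spearman(field_scores, solution_scores):.3f}")
```

Because only ranks enter the calculation, the two systems can use different scoring scales and still be compared directly.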

The main drawbacks of soybean breeding are as follows: (i) the narrow germplasm base of soybean limits genetic diversity; (ii) private companies rarely share germplasm for crossing, which further narrows the germplasm base; (iii) IDC symptoms can vary from severe to nonexistent within a meter because of soil heterogeneity, and large fields with uniform calcareous soils for evaluating Fe efficiency are hard to find, which makes variety screening difficult (Diers et al. 1991, Lin et al. 1999); and (iv) variety breeding and selection are time consuming and expensive. It is estimated that at least six years of intensive testing are needed before a new plant variety is commercially available. This means that soybean crosses made in 2008 would reach preliminary trials in 2011 and become commercially available in 2014.

To overcome some of the drawbacks of traditional plant breeding, such as the low efficiency of the breeding procedures, scientists seek alternative techniques for selecting IDC-resistant varieties. Molecular markers may be used to study the genome regions responsible for desirable traits, such as the genes governing resistance in iron-efficient soybean varieties.

Application of Molecular markers: The development of molecular markers has increased our knowledge of plant genetics and of the inheritance of the QTL affecting a trait. Molecular markers are polymorphisms in the DNA between two genotypes. They are used to construct genetic linkage maps, which represent the chromosomal locations of the genes responsible for the phenotype of the plant. They offer advantages over phenotypic markers in conventional breeding because they are more abundant and are unaffected by the environment.

The following paragraphs summarize the molecular markers used widely in genetic studies.

Restriction fragment length polymorphisms (RFLPs) are detected as variations in the DNA fragments produced by digestion of a DNA sample with restriction endonucleases (Grodzicker et al. 1974, Botstein et al. 1980). Variation in the DNA nucleotide sequence between two genotypes results in a different distribution of restriction enzyme sites, and hence in a different combination of restriction fragments upon digestion. Upon hybridization with a specific radioactively labeled probe, the fragments appear at different positions (Zabeau and Roberts 1979). Keim et al. (1990) generated the first soybean genetic linkage map from a cross between G. max (A81-356022) and G. soja (PI 468916), comprising 26 linkage groups based on 150 RFLPs. Soybean has low levels of genetic diversity in morphological and RFLP markers. The two alleles at an RFLP locus typically have asymmetric frequencies, e.g. p>0.9 and q<0.1, so the likelihood of two genotypes being polymorphic at a particular RFLP locus is relatively low (Keim et al. 1989, Keim et al. 1992). Lark et al. (1993) constructed another linkage map with 31 linkage groups based on 132 RFLP markers, isozymes and one morphological marker. Shoemaker and Specht (1995) reported that out of 365 polymorphic RFLP markers mapped in a G. max x G. soja population, only 118 were polymorphic in the Clark x Harsoy mapping population. In addition, because soybean has a tetraploid origin (Hymowitz and Singh 1987), RFLP probes often detect multiple DNA fragments. Other molecular markers, such as random amplified polymorphic DNA (RAPD), simple sequence repeats (SSR) and single nucleotide polymorphisms (SNP), were therefore employed.
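The effect of asymmetric allele frequencies on marker usefulness can be made concrete with a short calculation. The sketch below assumes that each parent is a fully homozygous inbred line and that the two parents are drawn independently from a population with the given allele frequencies; under those assumptions the chance that a locus is polymorphic between them is one minus the sum of squared allele frequencies (2pq for two alleles). The ten-allele example is hypothetical and only illustrates why multi-allelic markers are more informative.

```python
def polymorphism_probability(allele_freqs):
    """Probability that two independently drawn homozygous lines carry
    different alleles at a locus, given population allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


if __name__ == "__main__":
    # Two-allele locus with asymmetric frequencies, as in the RFLP example
    print(f"p = 0.9, q = 0.1: {polymorphism_probability([0.9, 0.1]):.2f}")
    # Two-allele locus with equal frequencies (the two-allele maximum)
    print(f"p = 0.5, q = 0.5: {polymorphism_probability([0.5, 0.5]):.2f}")
    # Hypothetical marker with ten equally frequent alleles
    print(f"ten equal alleles: {polymorphism_probability([0.1] * 10):.2f}")
```

With p = 0.9 the locus distinguishes fewer than one in five random pairs of parents, which is consistent with the low share of RFLP markers that remained polymorphic in crosses between cultivars.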

Random amplified polymorphic DNAs (RAPDs) are based on the presence of inverted repeats, which are generally 9-11 bases long. A single arbitrary oligonucleotide primer is used to amplify template DNA without prior knowledge of the amplified sequence. A PCR product is produced when the single primer binds to the template DNA on opposite strands at sites less than 3,000 base pairs apart. Specht (1995) generated a linkage map of 110 RFLP, eight RAPD and seven isozyme markers in an F2 population from a cross between Clark and Harsoy, two important soybean cultivars.

Simple sequence repeats (SSRs) are DNA markers that contain ten to twenty di- or tri-nucleotide repeats. Previous research suggests that the (AT)n repeat is the most abundant SSR type, followed by (A)n, (CT)n, (AAT)n, (ATT)n, (AAC)n, (AGC)n, (AAG)n, (AATT)n, (AAAT)n and (AC)n repeats (Wang Z. et al. 1994). The frequency of (AT)n SSRs was estimated at approximately two per 100 kbp of soybean sequence (Akkaya et al. 1995). SSR-specific primers are designed from the nucleotide sequences flanking the repeat regions, which are usually conserved between genotypes of the same species, so that the intervening SSR can be amplified in many genotypes. The amplification product is electrophoresed on an agarose gel to visualize the length variation among genotypes that results from variation in the number of nucleotide repeats. Highly polymorphic microsatellite loci may have as many as 26 alleles (Maughan et al. 1995). SSR markers are useful for genetic mapping experiments because they show co-dominant Mendelian inheritance. SSR markers have been used to develop genetic maps in plant species such as maize (Sharopova et al. 2002), rice (McCouch et al. 2002), Arabidopsis (Ponce et al. 1999), common bean (Yu et al. 2000) and barley (Barr and Langridge 1999). Cregan et al. (1999a) mapped 606 SSR, 689 RFLP, 79 RAPD, 11 AFLP, ten isozyme and 26 classical loci on 20 consensus linkage groups in one or more of three mapping populations: the USDA/Iowa State G. max x G. soja F2 population, the University of Utah ‘Minsoy’ x ‘Noir 1’ population of 240 recombinant inbred lines (RIL), and the University of Nebraska ‘Clark’ x ‘Harsoy’ F2 population. Earlier SSR markers were derived from genomic DNA fragments in genomic libraries. More recently, expressed sequence tag (EST) sequencing projects have revealed numerous SSR markers. In a number of crop species, such as rice (Cho et al. 2000), grape (Scott et al. 2000), barley (Kota et al. 2001) and wheat (Eujayl et al. 2002), ESTs have been used as a source for developing SSR markers.
Song et al. (2004) generated a detailed SSR genetic map with 391 SSR markers designed from genomic DNA library clones of ‘Williams’ soybean. These SSRs were mapped in five soybean mapping populations, three of which were the same as those used by Cregan et al. (1999a). The other two mapping populations were from the University of Utah: ‘Minsoy’ x ‘Archer’ (MA) and ‘Archer’ x ‘Noir 1’.
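The way SSR candidates are located in sequence data, whether genomic clones or ESTs, can be illustrated with a simple pattern search. The sketch below scans a DNA string for perfect di- or tri-nucleotide repeats with a minimum number of repeat units. The function name, the ten-unit threshold and the example sequence are illustrative assumptions, not the procedure used in the cited studies, and dedicated SSR-mining tools handle imperfect and compound repeats that this sketch ignores.

```python
import re


def find_ssrs(sequence, min_repeats=10):
    """Find perfect di- and tri-nucleotide repeats in a DNA sequence.

    Returns a list of (start_index, motif, repeat_count) tuples.
    Runs of a single base (e.g. AAAA...) are skipped.
    """
    # A 2-3 base motif followed by at least (min_repeats - 1) further copies
    pattern = re.compile(r"([ACGT]{2,3})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # mononucleotide run, not a di-/tri-nucleotide SSR
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: flanking bases around an (AT)12 repeat
    example = "GGC" + "AT" * 12 + "CCG"
    for start, motif, count in find_ssrs(example):
        print(f"({motif}){count} at position {start}")
```

Two genotypes that differ in repeat count at such a locus give amplification products of different lengths (here two bases per AT unit), which is the length polymorphism scored on the gel.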

Amplified fragment length polymorphisms (AFLPs) are based on the selective amplification of restriction enzyme-digested DNA fragments. They are scored by the presence or absence of a band on an electrophoretic gel (Piepho and Koch 2000).

Single nucleotide polymorphisms (SNPs) are single base-pair differences between genotypes. They are considered the most abundant source of DNA polymorphism. SNPs are found more frequently in non-coding regions than in coding regions of DNA (Zhu et al. 2003).

Once a genomic region with a major effect on a trait has been detected with the help of an adjacent marker locus, breeders can perform repeated backcrosses and select individuals that carry the marker allele, which signals introgression of the favorable QTL allele for the phenotypic trait.

QTL mapping studies in soybean: In soybean, a lack of genetic variation in the germplasm, difficulty in performing crosses and a lack of RFLPs initially prevented genetic mapping (Doyle and Beachy, 1985). A genetic map with twenty-six linkage groups covering 1200 recombination units was later generated with 150 RFLPs in an F2 population segregating from a cross between G. max (A81-356022, an Iowa State University breeding line) and a G. soja accession (PI 468916) for the study of several reproductive and morphological traits (Keim et al. 1990).

Diers et al. (1992) screened 272 mapped RFLP markers on 13 F2-derived lines developed from a cross between G. max (Fe-inefficient) and G. soja (Fe-efficient) and identified three markers linked to QTL for Fe-efficiency. The markers were significantly (P<0.01) associated with QTL for Fe-efficiency and explained 31%, 25% and 17% of the phenotypic variation. However, these linkage associations were not reproducible in a second tester population.

Lien et al. (1999) mapped QTL for Fe-efficiency in two bi-parental G. max x G. max crosses: the Pride population, developed from Pride B216 (Fe-inefficient) x A15 (Fe-efficient), and the Anoka population, developed from Anoka (Fe-inefficient) x A7 (Fe-efficient). The populations were scored visually, and chlorophyll concentration was measured in the laboratory. Ninety RFLP and ten SSR markers were used to construct the linkage map in the Pride population, which comprised 120 F2 plants. Eighty-two RFLP, fourteen SSR and one morphological (hilum color) marker were used to construct the linkage map in the Anoka population, which comprised 92 F2 plants. Replicated field trials of the F2-derived lines were conducted using a randomized complete block design, and interval mapping was used for QTL detection. A multigenic model of inheritance was reported in the Pride population, in which four QTL were detected: two on linkage group B2, one on G and one on N. These QTL individually explained 7.7 to 10.8% of the phenotypic variation in the iron deficiency chlorosis (IDC) visual scores. This result confirmed the work of Cianzio and Fehr (1982) on multiple gene action for Fe-efficiency. In the Anoka population, two QTL were mapped, on linkage groups A1 and N, explaining 35.2% (LOD=13.1) and 72.7% (LOD=7.3) of the total phenotypic variation in the IDC visual scores, respectively. The QTL on linkage group N was regarded as a major region because it was mapped with a high LOD score and accounted for a large share of the phenotypic variation. This also confirmed the earlier results of Cianzio and Fehr (1980). Because QTL located on linkage groups I and N were common to both populations, markers from the Anoka population were evaluated in the Pride population and vice versa in order to enhance their accuracy for marker-assisted selection, but in neither case could the markers be scored in the other population.

Charlson et al. (2003) developed F2 and F2:4 populations from the parents A97-770012 (resistant to Fe deficiency, moderate yield) and Pioneer 9254 (P9254; moderately resistant, superior yield). Chlorosis scores were evaluated for the parents and the F2:4-derived lines in replicated field trials on calcareous soils at two locations in Iowa, Ames and Humboldt, while genotyping was conducted on the F2 lines. Of 103 SSR markers previously identified as linked to QTL for IDC resistance, only 23% were polymorphic between P9254 and A97-770012. Three different SSR markers were associated with IDC at each location (P<0.1).

To overcome the limited transferability of QTL-linked markers between bi-parental populations, the association mapping approach has been used to discover QTL with greater accuracy. Association mapping is a population-level approach that depends on the linkage disequilibrium relationships within a population.

Association Mapping: Association mapping, an alternative to bi-parental mapping, can detect genetic contributions to quantitative traits with greater confidence than linkage analysis (Risch, 2000). The basic concept is to look for marker-trait associations in a population of unrelated individuals rather than in a population with known relationships (individuals with known pedigree information, or the offspring of an experimental cross) (Nordborg et al. 2002). Individuals in such a population are more distantly related to each other than the individuals in an experimental cross. A genome-wide scan in a bi-parental linkage analysis helps to find the approximate location of a QTL, while linkage disequilibrium (LD) mapping locates the QTL controlling the trait under investigation more precisely (Mackay 2001, Glazier et al. 2002). Association mapping using LD has several advantages over bi-parental linkage mapping. First, no pedigree or crosses are required. Variation in the trait is studied not through loci segregating between two genotypes, but through loci segregating in a population. Second, because many generations of recombination have accumulated in a population consisting of many genotypes, the resolution for finding a marker in close proximity to a QTL is higher than in an F2 population or in recombinant inbred lines, where homozygosity is reached after a few generations. Third, more than two alleles per locus can be identified in a population, whereas in a bi-parental population only two alleles segregate at a locus.

The steps involved in association mapping are: (i) selecting a group of individuals from a natural population or germplasm collection that represents a wide range of phenotypic diversity; (ii) collecting phenotypic data from the selected population in replicated trials in different environments; (iii) genotyping the population with molecular markers; (iv) determining the LD decay from the molecular marker data; (v) determining the population structure (the level of genetic differentiation among groups in the population) and kinship (the coefficient of relatedness between each pair of individuals within the population); and (vi) applying appropriate statistics to the phenotypic and genotypic data, together with the results of the LD, population structure and kinship analyses, to detect molecular markers in close proximity to the QTL of interest.

The power of association mapping relies on the linkage disequilibrium between marker and trait loci. Linkage disequilibrium, also known as gametic phase disequilibrium, is defined as the inheritance together of two alleles, located on the same or different chromosomes, more frequently than expected by chance. LD is also defined as the non-random association of alleles at more than one locus, so that haplotype frequencies in a population are higher or lower than expected under random combination of alleles at the different loci. Over a series of generations in a random-mating population, only the correlation between the QTL affecting the trait and molecular markers closely linked to that QTL will persist (Mackay and Powell 2007).
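The definition of LD as a departure of haplotype frequencies from random combination can be made concrete with a small Python sketch. It computes the standard pairwise measures D, D' and r² for two biallelic loci. These symbols and the function name ld_stats are not used in the text above and are introduced here only for illustration; the sketch assumes that haplotype phase is known and that both loci are polymorphic.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly
              between 0 and 1; monomorphic loci are not handled)
    """
    # D is the excess of the AB haplotype over random combination.
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude possible at these frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Alleles A and B always occur together: complete LD.
print(ld_stats(0.5, 0.5, 0.5))
# AB haplotype at exactly the product of allele frequencies: no LD.
print(ld_stats(0.25, 0.5, 0.5))
```

A value of D equal to zero corresponds to random combination of alleles; D' and r² are commonly reported because D itself depends strongly on allele frequencies.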

The resolution of association mapping depends on careful estimation of the decay of LD across the genome. LD decay is not constant throughout the genome (Nordborg 2002), nor is it constant for the same genomic regions across multiple populations. Genetic bottlenecks and founder populations influence the LD pattern across the entire genome, while selection, recombination rate and mutation have more localized effects. The desired cause of LD is physical linkage, but linkage disequilibrium can be generated by a number of other factors, which are summarized in the table (Rafalski and Morgante 2004).

Recombination rate: Recombination frequency depends on the degree of polymorphism between the homologous chromosomes. Recombination shuffles regions of chromosomes and hence shuffles the alleles they carry. After many generations of recombination, LD persists only between tightly linked alleles. The extent of LD decay between two polymorphic markers is a function of the recombination rate and of time, measured as the number of generations.

$D_t = (1 - r)^t D_0$, where $D_0$ is the initial LD, $D_t$ is the LD remaining after $t$ generations, and $r$ is the recombination rate, which is a function of the genetic distance between the polymorphic markers. If $r = 0.001$, D falls to about 13.5% of its initial value after 2000 generations, and about 2300 generations are needed for a tenfold decrease.
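The arithmetic behind the decay relationship can be checked with a few lines of Python. This is a minimal sketch that assumes the standard random-mating form in which the fraction of LD remaining after t generations is (1 - r) raised to the power t; the function names are hypothetical.

```python
import math

def ld_remaining(r, t):
    """Fraction of the initial LD left after t generations of random mating."""
    return (1.0 - r) ** t

def generations_to_reduce(r, fold):
    """Generations needed for LD to fall by the given fold factor."""
    # Solve (1 - r)^t = 1 / fold for t.
    return math.log(fold) / -math.log(1.0 - r)

print(round(ld_remaining(0.001, 2000), 3))       # 0.135
print(round(generations_to_reduce(0.001, 10)))   # 2301
print(round(generations_to_reduce(0.5, 10), 1))  # unlinked loci: about 3.3
```

The contrast between tightly linked markers (r = 0.001) and unlinked loci (r = 0.5) shows why, after many generations, only LD between closely linked loci is expected to remain.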

Population Structure/genetic stratification: The range of LD depends on the population history, i.e. population genetic bottlenecks followed by geographical expansion, and on the subpopulations in which it is measured (Rafalski and Morgante 2004). In population genetics, a population is a group of individuals that mate freely, so that there is no restriction on gene flow. Individuals in a population are heterozygous at many loci, and more than two alleles may be present at a particular locus controlling a quantitative trait. However, a population usually contains subpopulations, with gene flow occurring only among the individuals within each subpopulation (Wright 1951). A subpopulation is formed when a few individuals become isolated and act as the founders of a new population. There are two sources of genetic differentiation among newly formed subpopulations. First, regardless of the source of the founders, the subpopulations have less genetic diversity than the original population. This reduced diversity is reflected in the proportion of heterozygous individuals averaged over loci, in gene diversity (i.e. the complement of the sum of squared allele frequencies, averaged over loci) and in the number of segregating alleles (Crow and Kimura 1970). Second, new mutations are not transferred among subpopulations, and the subpopulations diverge further through genetic drift. Genetic drift is the random change in allele frequency that occurs because the gametes transmitted from one generation to the next carry only a sample of the alleles present in the parental generation. In a large population, changes in allele frequency due to drift are small, but in a small population allele frequencies may change dramatically (Ellstrand and Elam 1993). The factors mentioned above act on individual loci; however, population structure can also affect multiple loci. A small number of founders for a subpopulation means a restricted set of multi-locus combinations of alleles on chromosomes.
Inbreeding (attributed to geographic and reproductive isolation) leads to homozygous genomic regions and, as a result, to strong LD, because recombination between homozygous regions reproduces the same allelic combinations.
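The dependence of genetic drift on population size described above can be quantified with a short sketch. It assumes an idealized random-mating diploid population of constant size N (the Wright-Fisher model, which is not named in the text above), in which each generation is formed by sampling 2N gametes; the function names are hypothetical.

```python
import math

def drift_sd(p, n_individuals):
    """Standard deviation of the one-generation change in allele frequency.

    Under binomial sampling of 2N gametes, the variance of the change
    is p(1 - p) / (2N).
    """
    return math.sqrt(p * (1 - p) / (2 * n_individuals))

def expected_heterozygosity(h0, n_individuals, generations):
    """Expected heterozygosity after drift alone (no mutation or migration)."""
    return h0 * (1 - 1 / (2 * n_individuals)) ** generations

for n in (10, 100, 10000):
    print(n, drift_sd(0.5, n), expected_heterozygosity(0.5, n, 50))
```

With ten founders the expected per-generation change in allele frequency is roughly thirty times larger than in a population of ten thousand, and heterozygosity is lost correspondingly faster.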

Apart from the presence of subpopulations within a crop species, the mating system of the species also determines the genetic variability present within a subpopulation. Because gene flow in a selfing species is smaller than in an outcrossing species, greater differences are expected between subpopulations of a selfing species than between subpopulations of an outcrossing species.

Mating system (selfing/outcrossing): Selfing species such as A. thaliana (Nordborg 2002) or G. max (Zhu et al. 2003) have a high level of homozygosity, so recombination between homologous chromosomes does not reduce LD. LD in these species can extend up to hundreds of kilobases. In their study, Nordborg et al. (2002) estimated LD decay by sequencing 13 widely separated short segments of 0.5-1 kb from a 250 kb region on chromosome 4, using 83 polymorphic sites from 20 individuals. The distribution of distances between markers was non-uniform. LD decayed within approximately 1 cM on chromosome 4. In outcrossing species such as humans and maize, LD declines more rapidly. In maize, in four of the six genes sampled, LD decayed to less than 0.1 within 2000 bp (Remington et al. 2001). In humans, LD decay may range from a few kilobases to several hundred kilobases. In some maize populations, LD declines over a few hundred base pairs (Tenaillon et al. 2001). However, the extent of LD is comparable between A. thaliana and humans. A. thaliana has more polymorphism than Homo sapiens and therefore has many haplotype structures. In species with mixed mating systems, LD decays at a rate that is a function of both recombination and the level of outcrossing. Outcrossing introduces new alleles at different loci that, within a few generations, become homozygous due to recombination (Allard 1973). LD studies in different crop species, grouped by mating system, are summarized in the table.

Population admixture: Admixture is gene flow between individuals of genetically distinct populations followed by intermating (Flint-Garcia et al. 2003). It changes allele frequencies and creates LD that can extend even to unlinked sites. Random mating in the newly formed population then breaks down the linkage disequilibrium caused by admixture (Flint-Garcia et al. 2003).

Selection: Selection at a particular locus is expected to decrease genetic diversity and increase LD in the surrounding region, a phenomenon known as a selective sweep. The extent of a selective sweep can be considerable. For example, artificial selection for the dominant allele Y1, which encodes phytoene synthase and is responsible for the yellow endosperm of many maize lines (in contrast to the white endosperm of the progenitor of maize), has depleted nucleotide diversity up to 500 kb from the gene (Palaisa et al. 2002). In A. thaliana, LD decayed within 1 cM in a 250 kb region on chromosome 4 analyzed in 20 global accessions with 89 non-uniformly spaced SNP markers. However, in an analysis of 163 SNP markers on 76 global accessions, LD decayed more rapidly elsewhere on chromosome 4 than in the FRI region, perhaps within 50 kb. This suggests that the FRI region may have undergone local adaptation, which decreased heterozygosity and hence increased LD at that locus, while the other region, containing the disease resistance locus RPP5, has not. In soybean, three regions greater than 300 kb were analyzed in four populations: G. soja (PI), landraces (selected from a wide range of geographic origins and maturity groups), 17 N. Am. ancestors (the G. max accessions that contributed 86% of the genes present in the N. Am. gene pool; Gizlice et al. 1994) and elite cultivars. LD had largely decayed throughout the three regions in G. soja, with an average haplotype block length of 4.8 kb and blocks covering 18% of the sequence length. In the landraces, haplotype blocks spanned 186 kb, although many were <1 kb, and in the N. Am. ancestors haplotype blocks spanned 89 kb of the three regions studied. Haplotype blocks were largest in the elite cultivars. The pattern of LD decay in the four soybean populations is attributed largely to domestication, selection and founder effects (Flint-Garcia et al. 2003).
Divergent selection for time of maturation in different regions might have created LD among the chromosomal regions containing major genes for this trait (Remington et al. 2001).

Mutation rate: The introduction of a new mutation can disrupt the LD between pairs of alleles. Over generations, crossing over then degrades the LD between the older pairs of alleles and the new mutant allele.

LD studies in Plants

LD studies have been conducted in many plants. These studies include (i) estimation of the extent of LD in different plant genomes and in different parts of a plant genome, (ii) measurement of nucleotide diversity to assess genetic diversity in a population, (iii) assessment of the effect of selection and domestication on the crop species, and (iv) detection of marker-trait associations by LD analysis (Gupta et al. 2005).

Association mapping in a structured population: Population structure, or stratification, is an important factor contributing to Type I error (declaring a marker-trait association when none exists), because population structure creates LD between unlinked loci. If the population is homogeneous (unstructured), molecular markers associated with the putative QTL for a trait can be inferred by studying differences in marker allele frequencies between genotypes that differ for the trait. However, in a structured population (one containing subpopulations), two factors combine to produce false-positive marker-trait associations: (i) variation in the trait (for example, the rate of disease susceptibility or resistance) across subpopulations, and (ii) variation in allele frequencies across subpopulations.
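How population structure alone creates LD between unlinked loci can be shown with a small numerical sketch. It pools two subpopulations that are each in linkage equilibrium but differ in allele frequencies, and computes the disequilibrium D in the pooled sample. The function name, the mixing proportion m and the example frequencies are illustrative assumptions, not values from any cited study.

```python
def pooled_ld(m, p_a1, p_b1, p_a2, p_b2):
    """D between loci A and B when two subpopulations are pooled.

    m: proportion of the pooled sample drawn from subpopulation 1
    p_a1, p_b1: allele frequencies at loci A and B in subpopulation 1
    p_a2, p_b2: allele frequencies at loci A and B in subpopulation 2
    Each subpopulation is assumed to be in linkage equilibrium, so its
    AB haplotype frequency is the product of its allele frequencies.
    """
    p_ab = m * p_a1 * p_b1 + (1 - m) * p_a2 * p_b2
    p_a = m * p_a1 + (1 - m) * p_a2
    p_b = m * p_b1 + (1 - m) * p_b2
    return p_ab - p_a * p_b

# Both loci differ in frequency between the groups: spurious LD appears.
print(pooled_ld(0.5, 0.9, 0.9, 0.1, 0.1))
# Locus A has the same frequency in both groups: pooled D is zero
# (up to floating-point rounding).
print(pooled_ld(0.5, 0.3, 0.7, 0.3, 0.2))
```

The pooled D equals m(1 - m) multiplied by the between-group frequency differences at the two loci, which is why a marker and a trait locus that both differ between subpopulations can appear associated without being linked.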

Pritchard and Rosenberg (1999) and Pritchard et al. (2000a) developed two model-based approaches, (i) the no-admixture model and (ii) the admixture model, which use a small collection of unlinked markers to define population structure. They developed a software package called STRUCTURE for this analysis. Pritchard et al. (2000b) developed an association mapping method for structured populations. Both models assume that the markers are unlinked and hence provide independent information about an individual's ancestry. Pritchard et al. (2003) introduced a third model, the "linkage model", which extends the admixture model to account for correlations between linked markers that arise as a result of admixture.

Factors affecting linkage disequilibrium (LD) in a population

Linkage disequilibrium (LD) in Plants

References

Wright, S. (1951). The genetical structure of populations. Annals of Eugenics 15: 323-354.

Crow, J.F. and Kimura, M. (1970). An introduction to population genetics theory. Harper and Row, New York.

Ellstrand, N.C. and Elam, D.R. (1993). Population genetic consequences of small population size: implications for plant conservation. Annu. Rev. Ecol. Syst. 24: 217-242.
