03.02.2020

With the advent of new sequencing technologies, deep sequencing is becoming the standard approach to obtain complete genomes. In particular, there has been a rapid increase in the number of sequenced organelle genomes, which have been widely used for evolutionary and phylogenetic studies. As of 16 July 2015, 858 plastid genomes and 137 mitochondrial genomes were publicly available for plants in the National Center for Biotechnology Information (NCBI) database. Advances in DNA sequencing technologies are providing a new cost‐effective option not only for genome comparisons at a large scale but also for the study of interactions between organelle and nuclear genomes in plants.

The plastid genome is haploid, but there are several copies per organelle that do not recombine. In angiosperms, they usually exhibit uniparental (maternal) inheritance and considerable sequence and structural conservation.

The plastid genome has a quadripartite organization with two single‐copy regions, a long single copy (LSC, 80–90 kb) and a short single copy (SSC, 16–27 kb), and two inverted repeat regions (IR; 12–25 kb). Despite their high degree of structural and sequence conservation, chloroplast genomes usually display enough variation to support inter‐ and intraspecific variability studies based on whole chloroplast genome comparisons.

It is also known that several fragments, and in some cases almost entire copies of the chloroplast genome, may be found in the nuclear genomes of plants.

However, few studies have described these sequences and their evolutionary processes in detail. These nuclear copies of the organelle genomes are the product of a continuous transfer of plastid sequences to the nucleus. After their insertion into the nuclear genome, plastid sequences exhibit a high rate of fragmentation and accumulation of single nucleotide substitutions. Specifically, there is evidence of at least a 10‐fold increase in the nucleotide substitution rate in nuclear‐inserted plastid DNA compared with the counterparts that remain in the chloroplast genome. In addition, the fragmentation of these DNA segments in the nucleus is expected to yield many sequencing reads with a chimeric matching pattern (discussed below). Therefore, depending on the time of the DNA transfer events, these sequences may retain different degrees of similarity to the original plastid genome and consequently introduce noise that complicates chloroplast sequence assembly from whole genome data sets.

As a consequence, these nuclear DNA segments of chloroplast origin should be taken into account for chloroplast genome recovery from total DNA sequence data, not only because they can introduce distortions in the assembly, but also because they can provide valuable evolutionary information.

Different strategies to obtain whole chloroplast sequences have been reported, and they often involve prior plastid DNA isolation or plastid DNA enrichment. Although successful for a range of less‐studied species, these approaches can be time consuming and costly. An alternative strategy consists of sequencing the total genomic DNA of a plant and subsequently isolating the chloroplast sequences using in silico approaches. Such methods may require both a reference genome and resequencing, or the use of paired‐end or mate‐pair libraries to recover whole chloroplast genome sequences without using reference genomes. Owing to the well‐documented difficulty of removing all plastid DNA, even when nuclear DNA enrichment protocols are used, raw read samples produced by projects aimed at obtaining nuclear genomes also contain plastid‐derived reads, usually in sufficient amounts to assemble the corresponding genomes. Therefore, developing strategies to efficiently recover and analyze this type of data deposited in public repositories is highly desirable.

In this study, we describe an approach to recover high‐quality complete chloroplast genome sequences from a whole plant DNA single‐read data set produced on a 454 FLX Titanium platform (454 Life Sciences, a Roche Company, Branford, Connecticut, USA).

We used weedy rice (Oryza sativa L.) as a model plant. This choice allowed us both to obtain a chloroplast sequence of interest for research and to take advantage of the wealth of available information to validate our results. Publicly available information on rice genomes includes two complete nuclear genomes representing the two main domesticated subspecies, O. sativa subsp. japonica Kato and O. sativa subsp. indica Kato; five chloroplast genomes representing different taxa within the O. sativa complex: the two cultivated subspecies O. sativa subsp. japonica and O. sativa subsp. indica, and three wild species, O. nivara Sharma & Shastry, O. rufipogon Griff., and O. meridionalis N. Ng; and one mitochondrial genome from O. sativa subsp. japonica.

Weedy rice, also called “red rice” because of its colored endosperm, is a clear example of a conspecific weed that is a major problem for the irrigated rice production system. Like many domesticated plants, rice occurs as part of a crop‐weed‐wild complex. Complete plastid genomes may provide a wealth of information and genetic markers that can be applied to evolutionary studies. A better understanding of the evolution of these weeds can contribute to unraveling the genetic basis underlying their ecological success.

Plant material

Red rice biotypes were collected on a farm in Cerro Largo, Uruguay (31°46′S, 54°26′W), and maintained in a greenhouse (approximately 25°C) with regular irrigation. After two weeks of growth, fresh green leaves from one of these individuals (AM356‐8) were collected and genomic DNA was extracted. For DNA extraction, 0.2–0.4 g of fresh green plant tissue was ground in liquid nitrogen. Then 700 μL of cetyltrimethylammonium bromide (CTAB) extraction buffer (2% CTAB, 1.4 M NaCl, 20 mM EDTA pH 8, 100 mM Tris pH 8, 2% PVP, 0.125% β‐mercaptoethanol) was added, and the mix was incubated at 65°C for 20 min. After incubation, 700 μL of chloroform:isoamyl alcohol (24:1) was added and samples were centrifuged at 12,000 × g for 20 min at 4°C. The aqueous phase was precipitated with 0.7 volumes of isopropanol, and the precipitate was washed with 70% ethanol. The pellet was dissolved in 100 μL of bidistilled water.

Library construction and sequencing

We used 5 μg of purified DNA to construct the sequencing genomic libraries using the GS FLX Titanium Rapid Library with Multiplex Identifier (MID) 5 adapters for barcoding (454 Life Sciences, a Roche Company). Briefly, genomic DNA was mechanically sheared to obtain 400–1000‐bp fragments, ligated to the A and B adapters, and amplified using adapter‐specific primers. We used Agencourt AMPure XP beads (Beckman Coulter, Brea, California, USA) to discard fragments smaller than 350 bp. A TBS 380 Fluorometer (Turner BioSystems, Sunnyvale, California, USA) was used to adjust aliquot concentrations to 1 × 10⁷ molecules/μL.

Emulsion PCR (emPCR) was performed with the GS FLX Titanium SV emPCR Kit (Lib‐L) (454 Life Sciences, a Roche Company) for 50 amplification cycles as follows: 30 s at 94°C, 4.5 min at 58°C, and 30 s at 68°C. For sequencing, we used the GS Titanium Sequencing XLR70 kit (454 Life Sciences, a Roche Company) for 1/4 of a GS Titanium PicoTiterPlate (PTP) 70 × 75 in a 454 Genome Sequencer FLX System (454 Life Sciences, a Roche Company). The raw reads obtained were deposited in the NCBI Sequence Read Archive (SRA) public repository (Bioproject ID PRJNA284786).

Identification and classification of chloroplast sequences and de novo assembly

The identification of chloroplast reads was performed by comparative analysis with BLAST against the three reference plastid genomes available in March 2011 (O. nivara AP006728, O. sativa subsp. indica 9311 AY522329.1, and O. sativa subsp. japonica Nipponbare AY522330.1) from the NCBI FTP server. Two filters were applied to the results: one on minimum alignment length (100 nucleotides) and a second on minimum overlap percentage (99%; alignment overlap, from now on referred to as O%).
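The two filters can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not state: O% is taken to be the aligned length divided by the read length, both thresholds are treated as inclusive, and the `Hit` record and function names are hypothetical. It is not the pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One best BLAST hit for a read (hypothetical record layout)."""
    read_id: str
    read_length: int       # length of the sequencing read, in bases
    alignment_length: int  # aligned length reported for the hit


def overlap_percent(hit: Hit) -> float:
    """O%: share of the read covered by the alignment (assumed definition)."""
    return 100.0 * hit.alignment_length / hit.read_length


def split_reads(hits, min_length=100, min_overlap=99.0):
    """Split hits into complete (RC) and incomplete (RI) alignment sets.

    Hits shorter than min_length are discarded. Whether the original
    thresholds were strict or inclusive is not known; inclusive is assumed.
    """
    complete, incomplete = [], []
    for hit in hits:
        if hit.alignment_length < min_length:
            continue
        if overlap_percent(hit) >= min_overlap:
            complete.append(hit.read_id)
        else:
            incomplete.append(hit.read_id)
    return complete, incomplete


if __name__ == "__main__":
    example = [
        Hit("read_a", 300, 300),  # fully aligned
        Hit("read_b", 300, 150),  # half of the read aligned
        Hit("read_c", 90, 90),    # too short to keep
    ]
    print(split_reads(example))
```

Because gapped alignments can be slightly longer than the read itself, a production version would probably compute the overlap from the query start and end coordinates rather than from the alignment length.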


The resulting set of reads was identified as the set of reads with complete alignment (RC). One such set was generated for each of the three reference genomes used at this stage (RC japonica, RC indica, RC nivara). Reads exhibiting incomplete alignment, namely those with overlap percentages below the 99% threshold, formed the sets of reads with incomplete alignment (RI).

Search for divergent regions between public reference genomes and chloroplast AM356‐8 in silico classification

Divergent regions among rice chloroplast genomes were identified by comparing four chloroplast genomes: O. rufipogon (NC_017835.1), O. nivara, O. sativa subsp. indica, and O. sativa subsp. japonica. Oryza meridionalis was excluded from this analysis because the species is a distant relative of cultivated rice; its genetic distance from O. sativa (subsp. indica and subsp. japonica) is 20 times that between O. sativa and Asian O. rufipogon. The alignment was carried out with Whole Genome VISTA Tools. We visually inspected the alignment using a sliding window 600 nucleotides in length to identify regions containing variable sites such as single‐nucleotide polymorphisms (SNPs) and indels.
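The inspection described here was visual, but the idea of scanning an alignment window by window can be sketched in Python. This is only an illustration under assumptions: the input is a list of equal-length, gap-padded aligned sequences in a consistent case, a column counts as variable when it holds more than one distinct character (so gaps flag indels), and the step between windows, which the text does not give, defaults to the window size. All names are hypothetical.

```python
def variable_columns(aligned):
    """Return 0-based indices of alignment columns where sequences differ."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]


def windows_with_variation(aligned, window=600, step=600):
    """List (start, end, n_variable) for windows with at least one variable site.

    Coordinates are 0-based and end-exclusive, in alignment columns.
    """
    variable = variable_columns(aligned)
    length = len(aligned[0])
    flagged = []
    for start in range(0, length, step):
        end = min(start + window, length)
        count = sum(1 for i in variable if start <= i < end)
        if count:
            flagged.append((start, end, count))
    return flagged


if __name__ == "__main__":
    toy_alignment = [
        "ACGTACGT",
        "ACGTAC-T",  # one-base gap (indel) at column 6
        "ACCTACGT",  # substitution at column 2
    ]
    print(windows_with_variation(toy_alignment, window=4, step=4))
```

A smaller step than the window size would give overlapping windows, which helps when an indel sits close to a window boundary.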

A 600‐bp region containing each informative indel was used for comparative analyses. These regions were compared to the RC japonica set with BLASTN, using an E‐value of 1 × 10⁻¹⁰ and the flag -FF to keep low‐complexity sequences. We selected reads using thresholds of 90% for sequence identity (ID%) and 100 bases for alignment length. Finally, the selected reads were aligned to the reference regions with CLUSTALW to confirm the presence of variants in the AM356‐8 chloroplast read set.

Strategy to identify chloroplast‐nucleus DNA transfer events

The read sets with complete and incomplete alignment to the O. sativa subsp. japonica chloroplast genome (RC japonica and RI japonica) were compared to the O. sativa subsp. japonica nuclear genome to identify segments of chloroplast origin inserted in the nucleus.

We then classified these segments on the basis of a tentative estimation of the age of insertion into the nuclear genome. To help understand how different types of reads were identified, we present a schematic representation of the evolutionary process that the inserted chloroplast DNA segments undergo after their insertion in the nuclear genome.

As the schematic shows, one should consider the degree of overlap (O%) between the read and both genomes as well as the degree of sequence identity (ID%) with both the nuclear and chloroplast genomes. Modern inserts (recently transferred) are somewhat difficult to identify because they are expected to show a high sequence identity with both the nuclear and chloroplast genomes. Consequently, reads derived from internal parts of the transferred DNA (reads type 1A) are indistinguishable from reads derived from the chloroplast genome. However, when sequencing reads derived from recent transfers include the insertion edges (boundary of insertion), they can be readily identified. This type of read (represented by 1B) will align only partially with the chloroplast genome (with a high degree of identity), but completely and with very high identity with the nuclear genome. Therefore, we identified these sequences by using a filtering criterion based on O%.

Schematic representation of the evolutionary process that occurs after the insertion of a chloroplast DNA segment in the nuclear genome. The inserted fragment is represented by a blue box in the nuclear genome, whereas the homolog fragment that remains in the chloroplast (referred to as “donor” DNA) is represented by a green box. The main evolutionary events are depicted: accumulation of point mutations in both genomes (represented by yellow vertical lines) and fragmentation of inserts in the nuclear genome. The different predicted types of sequencing reads (1A, 1B, 2A, 2B, 3A, and 3B) and how they are expected to match with both genomes are also schematized.

Reads derived from older transfers (represented by 2A and 2B) can be identified by using the sequence identity level. Specifically, they are expected to exhibit a very high nucleotide identity with the nuclear genome and noticeably less identity with the chloroplast genome. Reads of type 2B can also be identified using the O%. Consequently, we looked for reads with an identity percentage threshold lower than 98% (ID% 99% with both genomes (further details in ).
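The decision logic for read types 1A, 1B, 2A, and 2B can be sketched in Python. The exact filtering criteria are incomplete in this text, so every threshold below is an assumption: the 98% chloroplast identity cut-off comes from the paragraph above, the 99% overlap value is borrowed from the complete-alignment filter, and the 99% nuclear identity value is hypothetical. Read types 3A and 3B are not covered because the text does not describe how they were detected.

```python
from dataclasses import dataclass


@dataclass
class ReadAlignment:
    """Best alignments of one read to both genomes (hypothetical record)."""
    cp_overlap: float    # O% against the chloroplast genome
    cp_identity: float   # ID% against the chloroplast genome
    nuc_overlap: float   # O% against the nuclear genome
    nuc_identity: float  # ID% against the nuclear genome


def classify(read, full_overlap=99.0, high_identity=99.0, diverged_identity=98.0):
    """Assign a read to one of the transfer categories (assumed thresholds)."""
    matches_nucleus = (
        read.nuc_overlap >= full_overlap and read.nuc_identity >= high_identity
    )
    if not matches_nucleus:
        return "no confident nuclear match"
    if read.cp_identity < diverged_identity:
        # High nuclear identity, clearly lower chloroplast identity.
        return "older transfer (types 2A/2B)"
    if read.cp_overlap < full_overlap:
        # Spans an insertion edge: complete in the nucleus, partial in the plastid.
        return "recent transfer, insertion edge (type 1B)"
    return "indistinguishable from chloroplast read (type 1A)"


if __name__ == "__main__":
    print(classify(ReadAlignment(60.0, 99.5, 100.0, 100.0)))
    print(classify(ReadAlignment(100.0, 95.0, 100.0, 99.8)))
```

The identity test is applied before the overlap test so that a diverged read is reported as an older transfer whether or not it also spans an insertion edge.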

Sequence data

A set of 295,159 single reads of red rice biotype AM356‐8 was generated with a mean length of 277 bp (96 Mb data) from a 1/4 run on a 454 GS FLX (454 Life Sciences, a Roche Company). Read quality was satisfactory, with a low ratio of duplicates (9.6%). The representation of the three plant cell genomes in the data was as follows: 177,920 reads were mapped to the nuclear genome with a coverage level of 0.13×, 17,003 reads showed similarity with the mitochondrial genome (coverage level 10×), and 47,817 reads showed similarity with the chloroplast genome (coverage level 106×).

Identification and classification of chloroplast sequences

The identification of chloroplast sequences within the complete sequence data set was performed by following the comparative genomics strategy described above. On average, 47,800 reads were identified with similarity to at least one of the chloroplast reference genomes, which represents 16.2% of the data. After applying the two previously defined filters (minimum alignment length, 100 nucleotides; minimum overlap percentage, 99%), we obtained three new sequence data sets: RC japonica with 34,091 reads, RC indica with 33,888 reads, and RC nivara with 33,925 reads.

Assembly of the chloroplast genome

The de novo assembly of the chloroplast sequence of AM356‐8 was obtained with Newbler software using default parameters. The assembler generated two contigs: a larger one of 101,363 bp and a shorter one of 12,637 bp, with a total length of 114 kb corresponding to 85% of the expected length of the chloroplast genome. Because Newbler collapses repeated sequences, we interpreted this difference as the result of collapsing the two IR regions into a single sequence. To confirm this, the two contigs were aligned against the O. sativa subsp. japonica chloroplast genome with BLAST/ACT (Artemis Comparison Tool). The alignment confirmed this interpretation because the longer contig had a very high sequence identity to the LSC region and the inverted repeat, whereas the shorter contig showed high similarity with the SSC region.

Search for divergent regions among public reference genomes and AM356‐8 chloroplast in silico classification

We identified 47 indels and 111 SNPs among the four Oryza public chloroplast genomes. All variable sites, with the exception of one of the indels, were found in the single copy regions. This spatial distribution of SNPs and indels is congruent with the divergence rates reported for chloroplast genomes, as previous studies showed that the IR region has a slower rate of divergence than the LSC and SSC regions. This observation is in line with the well‐established concept that evolutionarily divergent regions exhibit higher intraspecific variability; the link between both levels of variability is given by the degree of functional constraints (less constrained regions evolve faster and have higher polymorphism levels).
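The collapsed-repeat interpretation of the assembly can be checked with simple arithmetic on the contig lengths reported above. The Python sketch below assumes that the assembly is missing exactly one IR copy and that the 85% figure is a rounded value, so the implied IR length is only approximate. The function name is hypothetical.

```python
def implied_ir_length(contig_lengths, assembled_fraction):
    """Estimate the length of the collapsed IR copy.

    Assumes the assembly lacks exactly one IR copy, so the missing length
    equals the expected genome length minus the assembled length.
    """
    assembled = sum(contig_lengths)
    expected_total = assembled / assembled_fraction
    return assembled, expected_total, expected_total - assembled


if __name__ == "__main__":
    assembled, expected, ir = implied_ir_length([101_363, 12_637], 0.85)
    print(f"assembled length: {assembled} bp")
    print(f"expected genome length: {expected:.0f} bp")
    print(f"implied IR length: {ir:.0f} bp")
    # The IR range quoted earlier for plastid genomes is 12-25 kb.
    print("within 12-25 kb:", 12_000 <= ir <= 25_000)
```

The implied value of roughly 20 kb falls inside the 12–25 kb range quoted for inverted repeats, which is consistent with one IR copy having been collapsed.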

Indels showed the following interspecific distribution: 14 indels were exclusive to the O. sativa subsp. japonica chloroplast genome, 12 to O. rufipogon, seven to O. sativa subsp. indica, and nine to O. nivara. Three indels were shared between O. sativa subsp. japonica and O. rufipogon, and another three were shared between O. sativa subsp. indica and O. nivara. We did not identify indels shared between both O. sativa subspecies, nor between the two wild species.

Indels encompassing at least two nucleotides were searched for in the data set used for the AM356‐8 chloroplast assembly (RC japonica set). Eight of these indels were identified in the AM356‐8 chloroplast DNA, among which three correspond to those shared by the O. sativa subsp. japonica and O. rufipogon chloroplast sequences and the remaining five were shared only with the O. sativa subsp. japonica chloroplast sequence.

Indel position (kb) | Variant genome | AM356‐8 (no. of reads) | Length (bp) | Variable sequence | Annotation
8 | japonica/rufipogon | 23 | 69 | GAATCCTATTTTTGTTCTTATACCCATGCAATAGAGAGGAGTGGGAAAAGGGAGGTTACTTTTTTTCA | Nonannotated predicted ORF
12 | japonica | 14 | 4 | AGGG | Intergenic
14 | japonica | 1 | 2 | AC | Intergenic
46 | japonica | 8 | 5 | TATAT | Intergenic
57 | japonica/rufipogon | 10 | 16 | TTTTTTAGAATACTAA | Intergenic
60 | japonica | 32 | 5 | deletion: TATTG | Intergenic
65 | japonica | 23 | 2 | TT | Intergenic
77 | japonica/rufipogon | 43 | 3 | deletion: TGG | Intergenic