Bisulfite Oligonucleotide Capture Sequencing (BOCS)

Description

Epigenetic regulation through DNA methylation (5mC) plays an important role in development, in a variety of diseases, and in the aging process. We have developed an oligonucleotide-based capture approach to interrogate human, mouse and rat 5mC levels in genomic regions of interest with base- and strand-specific absolute 5mC quantitation in CG and non-CG contexts. Probesets are available for human, mouse and rat that target the promoters of all RefSeq/MGC genes and all CG islands, shores and shelves not in repeat regions. Targeting these specific regions reduces the amount of sequencing required by ~85% compared to whole genome bisulfite sequencing and allows for increased sequencing depth at these regions, thereby increasing quantitative accuracy. Advantages of this approach include: focusing sequencing reads on targeted regions; enabling analysis of more samples by decreasing the amount of sequence required per sample; and fully utilizing the strengths of bisulfite sequencing, namely base- and strand-specific absolute quantitation of CG and non-CG methylation levels.

Our approach is analogous to exome sequencing for genomics in that selected regions of the genome are captured with thousands of oligonucleotide probes.  The rationale for using a capture approach is that quantitative accuracy of bisulfite DNA sequencing requires high levels of coverage depth (the number of times a specific C is sequenced), with ≥20X coverage being optimal. 
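The link between coverage depth and quantitative accuracy can be illustrated with a short calculation. The following Python sketch is not part of the core's pipeline; it assumes that the reads covering a cytosine behave as independent draws, and it uses a Wilson score interval (our choice for illustration) to show how the uncertainty around a measured methylation level narrows as depth increases. The function names are hypothetical.

```python
import math
from typing import Tuple


def methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of reads at one cytosine that report it as methylated."""
    depth = methylated + unmethylated
    if depth == 0:
        raise ValueError("no coverage at this cytosine")
    return methylated / depth


def wilson_interval(methylated: int, depth: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the methylation level.

    Assumption: each read is an independent observation of the same
    underlying methylation level (a simple binomial model).
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    p = methylated / depth
    denom = 1.0 + z ** 2 / depth
    center = (p + z ** 2 / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z ** 2 / (4 * depth ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # A cytosine measured at 50% methylation, seen at three different depths.
    for depth in (10, 20, 100):
        low, high = wilson_interval(depth // 2, depth)
        print(f"{depth:>3}X coverage: interval width {high - low:.2f}")
```

Under this simple model the interval around the same measured level shrinks steadily with depth, which is the reasoning behind concentrating reads on the captured regions rather than spreading them across the whole genome.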

Figure 1. BOCS analysis. A) Definitions of the genomic regions targeted in the genome-wide analysis. B) BOCS uses oligonucleotide capture to isolate the gene regulatory regions of interest (100-200 Mb of genome) after sequencing library generation.

Experimental Methods

Genomic DNA is used to generate whole genome bisulfite-converted libraries (KAPA Biosystems) for enrichment by hybridization to oligonucleotide probes for regions of interest (Roche NimbleGen/Agilent Technologies). One µg of each DNA sample is sonicated to an average peak size of 200 bp using the Covaris M220. Average fragment size is confirmed by capillary electrophoresis (DNA 1000, Agilent Bioanalyzer, Agilent). Libraries with methylated adapters are created through blunt end repair, A-tailing, and adapter ligation. After library clean-up and size selection, libraries are bisulfite converted using the EZ DNA Methylation–Lightning method. Bisulfite-converted libraries are then amplified and purified using AMPure XP beads. Libraries are then hybridized in solution to oligonucleotide probes for ~72 hours at 47 °C in a thermocycler with a heated lid. After hybridization, captured libraries are washed and isolated using magnetic beads. After isolation, bisulfite-converted and captured libraries are amplified and cleaned using AMPure XP beads. Sizes and concentrations of libraries are determined using a DNA 1000 chip on the Bioanalyzer (Agilent) and KAPA library quantification qPCR, respectively. Quantified libraries are sequenced on a HiSeq 2500 in a paired-end fashion.

FASTQ files from the sequencing run are assessed for quality metrics with FastQC and then trimmed for quality with CLC Genomics Workbench. Trimmed reads are aligned and mapped with Bismark/Bowtie, with alignment parameters adjusted for the highest sensitivity and for alignment against both strands. Data analysis and file management utilize NGSUtils, samtools, and bedtools.

Data provided to core users include the FASTQ sequencing files and data files with location and level of each cytosine.  
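As an illustration of how per-cytosine data files might be used downstream, the Python sketch below reads a tab-separated file and keeps only cytosines with at least 20X coverage. The exact layout of the files delivered by the core is not specified here; the sketch assumes the six-column layout of a Bismark coverage file (chromosome, start, end, percent methylation, methylated count, unmethylated count). The class and function names, and the example rows, are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple


class CytosineCall(NamedTuple):
    chrom: str
    position: int
    methylated: int
    unmethylated: int

    @property
    def depth(self) -> int:
        return self.methylated + self.unmethylated

    @property
    def level(self) -> float:
        # Recomputed from the counts rather than read from the percent column.
        return self.methylated / self.depth


def read_calls(handle: Iterable[str]) -> Iterator[CytosineCall]:
    """Parse rows of: chrom, start, end, percent, n_methylated, n_unmethylated.

    Assumption: the file follows the Bismark coverage layout described above.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, start, _end, _percent, n_meth, n_unmeth = row[:6]
        yield CytosineCall(chrom, int(start), int(n_meth), int(n_unmeth))


def filter_by_depth(calls: Iterable[CytosineCall], min_depth: int = 20) -> List[CytosineCall]:
    """Keep only cytosines covered by at least min_depth reads."""
    return [call for call in calls if call.depth >= min_depth]


# Made-up example rows, for illustration only.
EXAMPLE = (
    "chr1\t10468\t10468\t85.0\t17\t3\n"
    "chr1\t10470\t10470\t50.0\t2\t2\n"
    "chr1\t10483\t10483\t10.0\t3\t27\n"
)

if __name__ == "__main__":
    for call in filter_by_depth(read_calls(io.StringIO(EXAMPLE))):
        print(call.chrom, call.position, call.depth, round(call.level, 2))
```

For a real file, `io.StringIO(EXAMPLE)` would be replaced by an open file handle. The 20X default mirrors the coverage recommendation given in the Description; a different threshold may suit a particular study.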

Sample Preparation Guidelines

Total genomic DNA (both nuclear and mitochondrial) should be extracted from tissues using a standard column preparation (e.g., Qiagen AllPrep DNA/RNA) or using solution-based (Trizol/TriReagent) methods. Core users should prepare their own DNA according to their experience, as isolation techniques vary with sample type (tissues, cells, blood). DNA quality should be determined by spectrophotometry, with attention to organic contamination (absorbance at 230 nm), as high levels of organic contamination can inhibit PCR amplification. A starting amount of 2-10 µg of DNA per sample is preferred.
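The spectrophotometry check can be sketched as a short calculation. The Python example below is only an illustration, not a core requirement. It assumes a 1 cm path length, the standard conversion of 50 ng/µL of double-stranded DNA per A260 unit, and commonly used purity cutoffs (A260/A280 of at least 1.8 and A260/A230 of at least 2.0). The cutoffs, the 2 µg minimum mass, and the function name are assumptions chosen for this example.

```python
from typing import Dict, Union


def assess_dna(
    a260: float,
    a280: float,
    a230: float,
    volume_ul: float,
    dilution: float = 1.0,
    min_260_280: float = 1.8,   # assumed purity cutoff (protein)
    min_260_230: float = 2.0,   # assumed purity cutoff (organics, salts)
    min_mass_ug: float = 2.0,   # assumed lower bound of the preferred amount
) -> Dict[str, Union[float, bool]]:
    """Summarize yield and purity from absorbance readings (1 cm path)."""
    if min(a260, a280, a230) <= 0 or volume_ul <= 0 or dilution <= 0:
        raise ValueError("absorbances, volume and dilution must be positive")

    # 1 A260 unit of double-stranded DNA corresponds to about 50 ng/uL.
    concentration = a260 * 50.0 * dilution
    total_mass_ug = concentration * volume_ul / 1000.0
    ratio_260_280 = a260 / a280
    ratio_260_230 = a260 / a230
    organic_flag = ratio_260_230 < min_260_230

    return {
        "concentration_ng_per_ul": concentration,
        "total_mass_ug": total_mass_ug,
        "ratio_260_280": ratio_260_280,
        "ratio_260_230": ratio_260_230,
        "organic_contamination_flag": organic_flag,
        "passes": (
            ratio_260_280 >= min_260_280
            and not organic_flag
            and total_mass_ug >= min_mass_ug
        ),
    }


if __name__ == "__main__":
    # Hypothetical readings for a 100 uL sample measured undiluted.
    for key, value in assess_dna(0.5, 0.27, 0.24, volume_ul=100.0).items():
        print(f"{key}: {value}")
```

A low A260/A230 ratio is the signal most relevant to the organic contamination concern described above. Samples flagged this way may benefit from an additional clean-up step before submission.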