Genomic Sequencing
Sequencing the first human genome took 13 years and cost about 3 billion dollars. It relied on Sanger sequencing, a chain-termination method whose products were originally visualised by gel electrophoresis. Since then, Sanger sequencing has been automated for higher throughput using capillary electrophoresis, with many capillaries run in parallel.
Second-generation sequencing methods, more commonly referred to as Next-Generation Sequencing (NGS), offer affordable, rapid, massively parallel, comprehensive sequencing of large genomes. These advances have made it possible to sequence a human genome in one day for under $1000.
There are four broad classes of NGS:
- Sequencing by synthesis: simultaneously identifies fluorescent DNA bases added to nucleic acid chains.
- Pyrosequencing: detects light released upon the incorporation of DNA bases into nucleic acid chains.
- Sequencing by ligation: uses sequential ligation of short oligonucleotides to sequence short reads.
- Ion Torrent: measures the direct release of H+ during the incorporation of nucleotides.
All four techniques generate short reads, which are assembled using their overlapping regions. Every region of the genome is sequenced multiple times; the number of reads covering a given position is described as the depth of coverage. With sufficient depth and good-quality reads, it is possible to identify individual nucleotides that differ in newly sequenced genomes. NGS can be used to sequence the whole genome or just areas of interest, such as the exome or RNA.
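To make the idea of depth of coverage concrete, the following is a minimal Python sketch. It assumes reads have already been aligned and are represented as hypothetical (start, length) pairs on a toy 10-base genome; real pipelines would read these positions from alignment files rather than a hand-written list.

```python
def per_base_depth(reads, genome_length):
    """Count how many reads cover each position of a toy genome.

    reads: list of (start, length) tuples, 0-based (an illustrative
    representation, not a standard file format).
    """
    depth = [0] * genome_length
    for start, length in reads:
        # Clip each read to the genome boundaries before counting.
        for pos in range(max(start, 0), min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def mean_depth(depth):
    """Average number of reads covering each position."""
    return sum(depth) / len(depth)


# Four overlapping 5-base reads on a 10-base genome (made-up values).
reads = [(0, 5), (2, 5), (4, 5), (5, 5)]
depth = per_base_depth(reads, 10)
print(depth)              # [1, 1, 2, 2, 3, 3, 3, 2, 2, 1]
print(mean_depth(depth))  # 2.0
```

The mean here equals the total number of sequenced bases (4 reads of 5 bases) divided by the genome length, which is a common back-of-the-envelope estimate of coverage. Positions with low depth, such as the first and last bases in this example, are where a differing nucleotide is hardest to call with confidence.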
Data Analysis Workflows
NGS data is produced as either FASTQ or CSFASTA files. These files contain sequence reads with quality values associated with each base. The reads need to be aligned and mapped to a reference genome if one is available, or assembled de novo if not. Modern short-read alignment algorithms are faster than the traditional sequence alignment algorithms. Once the reads are aligned, their quality and the depth of coverage are assessed. If the quality of the data is sufficient, variants that differ from the reference genome may be identified. The variants are then annotated based on existing knowledge and visualised using genome browsers.
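As an illustration of how reads and per-base quality values are stored together, here is a minimal Python sketch of a FASTQ reader. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; older data may use a different offset, and the function name `parse_fastq` is chosen for this example only. For real projects, an established parsing library is a safer choice.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, qualities) from a FASTQ text stream.

    Assumes four lines per record and Phred+33 encoded qualities.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored here
        quality_line = handle.readline().strip()
        # Phred+33: quality score = ASCII code of the character minus 33.
        qualities = [ord(ch) - 33 for ch in quality_line]
        yield header[1:], sequence, qualities


# A single made-up record, held in memory instead of a file.
example = io.StringIO("@read1\nACGT\n+\nII5#\n")
records = list(parse_fastq(example))
print(records)  # [('read1', 'ACGT', [40, 40, 20, 2])]
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so the last base in this example (Q = 2) is far less reliable than the first two (Q = 40). Quality assessment steps typically trim or filter such bases before alignment and variant calling.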
Next-Generation Sequencing analysis consists of multiple steps. To ensure that data processing is consistent, H3A BioNet has compiled several robust workflows from trusted tools.
Bioinformaticians are available through DIPLOMICS to assist you with your project.
The earlier you contact them, the more assistance they will be able to offer. Omics research is costly, so choosing the most appropriate technology for your experiment and budget is crucial. In particular, it is essential to discuss the experimental design with a statistician to ensure your experiment has sufficient statistical power.
It is advisable to run a pilot study and have an expert check the quality of the results before continuing with the bulk of the analysis. The pilot project will familiarise you with the sample processing, the data generated, and the data processing before you embark on the main project. Resolving issues identified at this point can improve the quality of the data generated, thereby making better use of the funds spent.
Omics technologies produce masses of data and require expertise for processing. Fortunately, bioinformatics tools and resources are available to store and process omics data. Contact our team of expert bioinformaticians for assistance at all levels of your project. The earlier, the better.