Genomic Sequencing

Sequencing the first human genome took 13 years to complete and cost 3 billion dollars, making use of Sanger sequencing, a chain-termination sequencing method originally visualised by gel electrophoresis. Since then, Sanger sequencing has been automated for higher throughput using capillary electrophoresis, which runs many samples in parallel.

Second-generation sequencing methods, more commonly referred to as Next-Generation Sequencing (NGS), offer affordable, rapid, massively parallel, comprehensive sequencing of large genomes. These advances have made it possible to sequence a human genome in one day for under $1000.


There are four broad classes of NGS:

  • Sequencing by synthesis: identifies fluorescent DNA bases as they are added to growing nucleic acid chains.

  • Pyrosequencing: detects the light released upon incorporation of DNA bases into nucleic acid chains.

  • Sequencing by ligation: uses sequential ligation of short oligonucleotides to sequence short reads.

  • Ion Torrent: measures the direct release of H+ during the incorporation of nucleotides.
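Pyrosequencing and Ion Torrent share a signal-to-base-call step: nucleotides are flowed over the template one at a time, and the signal strength (light or H+) indicates how many bases were incorporated in that flow. The toy Python sketch below illustrates the idea only; the flow order and the simple rounding are illustrative assumptions, not any instrument's actual chemistry.

```python
# Toy base-caller for flow-based sequencing (pyrosequencing / Ion Torrent).
# Each "flow" exposes the template to one nucleotide; the measured signal is
# roughly proportional to how many bases were incorporated in that flow.

FLOW_ORDER = "TACG"  # simplified cyclic flow order (an illustrative assumption)

def call_bases(signals, flow_order=FLOW_ORDER):
    """Round each flow signal to an incorporation count and emit bases."""
    bases = []
    for i, signal in enumerate(signals):
        count = round(signal)  # e.g. a signal near 2.0 means a homopolymer "CC"
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)

# Signals near 0 mean no base was incorporated on that flow.
print(call_bases([1.05, 0.02, 2.1, 0.9]))  # -> TCCG
```

Real instruments must also correct for signal decay and homopolymer noise, which this sketch ignores.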


All four techniques generate short reads that are assembled using overlapping regions. Every area of the genome is sequenced multiple times, described as the depth of coverage. With good-quality coverage, it is possible to identify individual nucleotides that differ between a newly sequenced genome and the reference. Using NGS, either the whole genome is sequenced or just regions of interest, such as the exome or the transcriptome (RNA).
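The two ideas in this paragraph, assembling reads by their overlaps and counting depth of coverage, can be sketched in a few lines of Python. The functions below are hypothetical toy illustrations, far simpler than real assemblers and aligners:

```python
def merge_by_overlap(a, b, min_overlap=3):
    """Merge read b onto read a via their longest suffix/prefix overlap."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return None  # reads do not overlap enough to merge

def depth_of_coverage(genome_len, alignments):
    """Per-position read depth, given (start, length) of each mapped read."""
    depth = [0] * genome_len
    for start, length in alignments:
        for pos in range(start, min(start + length, genome_len)):
            depth[pos] += 1
    return depth

print(merge_by_overlap("GATTACA", "ACAGGT"))           # overlap "ACA" -> GATTACAGGT
print(depth_of_coverage(8, [(0, 5), (2, 5), (4, 4)]))  # [1, 1, 2, 2, 3, 2, 2, 1]
```

Real tools handle sequencing errors, repeats, and millions of reads, which makes both problems far harder than this sketch suggests.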

Next-Generation Sequencers available locally

Thermo Fisher Ion Proton

  • ChIP sequencing

  • Exome sequencing

  • Gene expression sequencing

  • De novo sequencing

  • Small RNA sequencing

  • Whole transcriptome sequencing

Illumina NextSeq

"The NextSeq 550 System brings the power of a high-throughput sequencing system to your benchtop. With tunable output and high data quality, it provides the flexible power you need for whole-genome, transcriptome, and targeted resequencing plus the ability to scan microarrays."

Thermo Fisher Ion S5

The Ion S5™ next-generation sequencing system enables a simple targeted sequencing workflow.

Illumina MiSeq

Small genome sequencing provides comprehensive analysis of microbial or viral genomes for public health, epidemiology, and disease studies. Sequence up to 24 small genomes per MiSeq run.

Illumina HiSeq

The HiSeq 2500 System is a powerful high-throughput sequencing system. High-quality data using proven Illumina SBS chemistry has made it the instrument of choice for major genome centers and research institutions throughout the world.


Data Analysis Workflows

NGS data is produced as either FASTQ or CSFASTA files. These files contain sequence reads with a quality value associated with each base. The reads need to be aligned and mapped to a reference genome, if one is available, or assembled de novo. Modern alignment algorithms are much faster than traditional sequence alignment algorithms. Once aligned, the quality of the reads and the depth of coverage are assessed. If the quality of the data is sufficient, variants that differ from the reference genome may be identified. The variants are then annotated based on existing knowledge and visualised using genome browsers.
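As a small illustration of the first step, the sketch below parses FASTQ records and decodes their per-base quality scores. It assumes well-formed four-line records and the common Phred+33 quality encoding; real pipelines use tools such as FastQC rather than hand-rolled parsers.

```python
import statistics

def parse_fastq(lines):
    """Yield (read_id, sequence, qualities) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it)
        next(it)             # '+' separator line
        qual_line = next(it)
        # Phred+33 encoding: quality score = ASCII code - 33
        yield header[1:], seq, [ord(c) - 33 for c in qual_line]

# A single made-up record; the character 'I' encodes a Phred quality of 40,
# i.e. a 1-in-10,000 chance that the base call is wrong.
for read_id, seq, quals in parse_fastq(["@read1", "ACGT", "+", "IIII"]):
    print(read_id, seq, statistics.mean(quals))
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), which is why reads are commonly filtered or trimmed below a quality threshold before alignment.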

Next-Generation Sequencing analysis consists of multiple steps. To ensure that data processing is consistent, H3ABioNet has compiled several robust workflows from trusted tools.

Variant calling

Outlines the essential steps in calling short germline variants and recommends tools that have gained community acceptance for this purpose.
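The core idea of germline variant calling can be shown with a deliberately naive pileup-based sketch: at each position, count the bases reported by overlapping reads and flag well-supported differences from the reference. Production callers such as GATK model sequencing errors and genotype likelihoods far more rigorously; the thresholds below are arbitrary illustrative choices.

```python
from collections import Counter

def call_variants(reference, pileups, min_depth=10, min_frac=0.2):
    """Report positions where an alternate base is supported by at least
    min_frac of the reads in a pileup of at least min_depth reads."""
    variants = []
    for pos, bases in enumerate(pileups):
        depth = len(bases)
        if depth < min_depth:
            continue  # too little coverage to make a confident call
        for base, count in Counter(bases).items():
            if base != reference[pos] and count / depth >= min_frac:
                variants.append((pos, reference[pos], base, count / depth))
    return variants

# At position 2 (0-based), half the reads support T over the reference G,
# which is what a heterozygous germline variant would look like.
pileups = [list("AAAAAAAAAA"), list("CCCCCCCCCC"), list("GGGGGTTTTT"), list("TTTT")]
print(call_variants("ACGT", pileups))  # [(2, 'G', 'T', 0.5)]
```

Note that position 3 is skipped despite all reads disagreeing formats aside: its depth of 4 is below the minimum, illustrating why depth of coverage matters for calling.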

16S rDNA diversity analysis

16S rDNA diversity analysis of bacteria and archaea enables their identification and the determination of their relative abundance.
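Two quantities commonly computed in such diversity analyses are per-taxon relative abundance and the Shannon diversity index. A minimal sketch, using made-up OTU counts (the taxon names and numbers are purely illustrative):

```python
import math

def relative_abundance(otu_counts):
    """Convert raw OTU counts to per-taxon proportions of the sample."""
    total = sum(otu_counts.values())
    return {taxon: n / total for taxon, n in otu_counts.items()}

def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxon proportions."""
    proportions = relative_abundance(otu_counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)

sample = {"Bacteroides": 50, "Prevotella": 30, "Akkermansia": 20}
print(relative_abundance(sample))        # {'Bacteroides': 0.5, ...}
print(round(shannon_index(sample), 3))   # higher H' = more even community
```

The index rises both with the number of taxa and with how evenly the reads are spread across them, which is why it is a standard summary in 16S studies.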

Genome Wide Association Studies

This is the key H3Africa workflow, designed for bioinformaticians doing GWAS.
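At its core, a GWAS tests each variant for an association between allele counts and case/control status. The standard-library sketch below runs a 1-degree-of-freedom chi-square test on a 2x2 allele-count table; the counts are invented for illustration, and real GWAS pipelines (e.g. PLINK) additionally handle quality control, population structure, and multiple-testing correction across millions of variants.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """1-df chi-square test on a 2x2 allele-count table (cases vs controls).
    The p-value uses the chi2(1 df) = Z^2 identity: p = erfc(sqrt(x / 2))."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Invented counts: the alternate allele is enriched in cases (60/100 vs 40/100).
chi2, p = allelic_chi2(60, 40, 40, 60)
print(round(chi2, 2), round(p, 5))  # 8.0 0.00468
```

Because a GWAS repeats this test at very many variants, a single nominal p-value like this one would still need to clear a genome-wide significance threshold (conventionally 5e-8).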

Genome Analysis Toolkit

The primary focus of the toolkit is variant discovery and genotyping.

Bioinformaticians are available through DIPLOMICS to assist you with your project.

The earlier you contact them, the more assistance they will be able to offer. Omics research is costly; choosing the most appropriate technology for your experiment and budget is therefore crucial. In particular, it is essential to discuss the experimental design with a statistician to ensure your experiment has sufficient statistical power.

It is advisable to run a pilot study and have an expert check the quality of the results before continuing with the bulk of the analysis. The pilot project will familiarise you with the sample processing, the data generated, and the data processing before embarking on the main project. Issues identified at this stage can be corrected, improving the quality of the data generated and making better use of the funds spent.

Omics technologies produce masses of data and require expertise to process. Fortunately, bioinformatics tools and resources are available to store and process omics data. Contact our team of expert bioinformaticians for assistance at all stages of your project. The earlier, the better.

Dr Katie Lennard

Bioinformatician at the Institute of Infectious Diseases & Molecular Medicine


  • Genomics

  • Transcriptomics

  • Differential Abundance Statistics



  • 16S rRNA gene amplicon sequencing

  • WGS metagenomics sequencing

  • RNAseq

  • Pathogen isolate profiling


  • Multivariate analyses: PCA, NMDS, MDS, PERMANOVA, PLSDA, RDA

  • Machine learning techniques: random forests

  • Statistical tools for differential abundance testing

  • Nextflow

  • edgeR

  • metagenomeSeq (R)


©2018 by SA-DIPLOMICS.