
Vinayak S, BS-MS Student, IISER Berhampur
Next-Generation Sequencing to Predict the Parasitic Nature of the Apple Scab Pathogen: Venturia inaequalis
Venturia inaequalis is a fungal pathogen that infects members of the family Rosaceae. It causes scab disease on apples, which leads to major economic losses. The pathogen produces brown or black lesions on the surface of leaves, buds and fruits. It does not kill the plant but reduces the quality and yield of the fruit. These lesions make the fruit less marketable, so a solution is required. However, it is still debated whether V. inaequalis feeds on living apple tissue (a biotroph), on dead tissue (a necrotroph) or on both (a hemi-biotroph). Sequencing the genome of the apple scab fungus could help to identify the parasitic nature of the pathogen, and might make it possible to create a genetically modified fungus that is less infectious. The genome of V. inaequalis has been sequenced previously[1]; however, there is a need for a more accurate assembly with a larger size, fewer fragments, more repeat elements and better functional annotation. We therefore sequenced the V. inaequalis genome using PacBio, a high-end third-generation sequencing technology that reads nucleotides at the single-molecule level, with the aim of improving the genome size, assembly and functional annotation relative to the published genome.
MATERIALS AND METHODS
PacBio Single Molecule Real Time DNA sequencing technology, developed by Pacific Biosciences, was used at 75X coverage (higher coverage generally gives a better-quality genome) to sequence the genome of the apple pathogen. Genome assembly is the process of reconstructing a genome from the DNA sequences obtained from sequencing platforms. Sequencing platforms are devices that work on various principles to output the sequence of DNA or RNA as data (raw reads). Assemblers are software tools that use various pipelines to produce assembled genomes. The efficiency of an assembler can vary from species to species, so it is important to compare the genome assemblies produced by different assemblers. The PacBio output (long reads, so called because the sequences produced are long) was assembled with various assemblers, and the results were compared to obtain the best assembly of the genome. Contigs are the contiguous fragments produced during assembly; scaffolds are larger fragments formed by further analysis and joining of contigs. The contigs from each assembler were joined into scaffolds using SSPACE[2], which takes the contigs from the assembler and the long reads from the sequencer as input. Fewer scaffolds and contigs indicate a more complete genome. Scaffolding with SSPACE has been seen to give high N50 values (the N50 is the minimum contig length needed to cover 50% of the genome, and so indicates the typical length of a contig) and to reduce the number of scaffolds to almost half the number of contigs, giving more continuous and complete genomes.
After assembly, the assembled genomes were evaluated for completeness using BUSCO (Benchmarking Universal Single-Copy Orthologs)[3]. BUSCO gives a quantitative assessment of the completeness of a genome assembly and of the proteins predicted during annotation. Annotation refers to the prediction of all the proteins that can be produced from the genes present in the genome. BUSCO marks genes as complete, duplicated, fragmented or missing by comparing them with the data available in its database. A BUSCO score above 95% represents a nearly complete genome. We then identified the locations of the genes, all the coding regions, and the proteins they produce. Annotation is an important step in sequencing as it gives meaning to the genome. We used FunGAP (Fungal Genome Annotation Pipeline)[4] for the functional annotation of our fungal genome. The proteins obtained from the annotation pipeline were then used for downstream processing, in which they were examined for the presence of various enzymes and genes to establish the nature of the infection.
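As an illustration of the N50 statistic used to compare assemblies, the following is a minimal Python sketch, not the code used in this study. The function name n50 and the toy contig lengths are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the contig at which the running
    total first reaches half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (not data from this study); total length is 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

A higher N50 for the same total length means that more of the assembly sits in long pieces, which is why it is read together with the contig and scaffold counts.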
RESULTS
Comparison of the genome assemblies obtained from the different assemblers showed that the assembly produced by the Canu assembler[5] was better, with a larger size and 564 contigs. Further scaffolding gave 356 scaffolds, whereas the previously sequenced genome had 1012 scaffolds, showing that the genome we obtained is less fragmented and more complete than the published one. Evaluation by BUSCO showed that the Canu assembly was 98.2% complete, and it was therefore used for the functional annotation.
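To show how a BUSCO-style completeness percentage relates to the gene categories, here is a minimal Python sketch. It assumes the usual BUSCO convention that complete genes are the sum of single-copy and duplicated genes; the function name and the counts are hypothetical and are not the counts behind the figures reported here.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages of all genes searched."""
    total = single + duplicated + fragmented + missing
    if total <= 0:
        raise ValueError("the category counts must sum to a positive number")
    complete = single + duplicated  # complete = single-copy + duplicated
    return {
        "complete": round(100 * complete / total, 1),
        "duplicated": round(100 * duplicated / total, 1),
        "fragmented": round(100 * fragmented / total, 1),
        "missing": round(100 * missing / total, 1),
    }


# Hypothetical counts for illustration only (100 genes searched).
summary = busco_percentages(single=95, duplicated=3, fragmented=1, missing=1)
print(summary["complete"])  # (95 + 3) / 100 genes, so this prints 98.0
```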
Functional annotation of the Canu assembly with FunGAP, using 5 DPI (Days Post Inoculation) transcriptome reads (the transcriptome is the assembly of all the mRNA that has been expressed) obtained from the CSIR-IHBT database[6], yielded 13,310 proteins with a BUSCO completeness of 99.3%. Table 1 lists the gene functions assigned by alignment to various databases. The SwissProt IDs of the proteins were identified by aligning them to the SwissProt database using BLAST[7]. The motifs and domains of the proteins were identified using InterProScan[8] with public databases.
Proteins of the phytopathogens Botrytis cinerea, Colletotrichum higginsianum, Magnaporthe oryzae, Puccinia triticina, Pyrenophora tritici-repentis and Ustilago maydis were downloaded from the Joint Genome Institute (JGI)[9] and EnsemblFungi[10] for phylogenetic comparison. A phylogenetic tree was constructed from the output of OrthoFinder[11], using orthologs (proteins of the same family found in different species) from the above-mentioned species, with S. cerevisiae as the outgroup. The tree revealed that V. inaequalis was most closely related to a clade containing only one necrotroph, P. tritici-repentis. The other necrotroph, B. cinerea, was more closely related to two hemi-biotrophs.
This finding suggests that V. inaequalis is likely a necrotroph. Analysis of protein groups such as transporters and kinases also shows that our species is closely similar to the other necrotrophs. We therefore conclude that V. inaequalis is a necrotrophic pathogen.
References
1. Deng, Cecilia Hong et al. “Comparative analysis of the predicted secretomes of Rosaceae scab pathogens Venturia inaequalis and V. pirina reveals expanded effector families and putative determinants of host range.” BMC Genomics (2017).
2. Marten Boetzer, Christiaan V. Henkel, Hans J. Jansen, Derek Butler, Walter Pirovano; Scaffolding pre-assembled contigs using SSPACE, Bioinformatics, Volume 27, Issue 4, 15 February 2011, Pages 578–579
3. Simão FA, Waterhouse RM, Ioannidis P, et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015;31(19):3210–12
4. Byoungnam Min, Igor V. Grigoriev, In-Geol Choi; FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation, Bioinformatics, Volume 33, Issue 18, 15 September 2017, Pages 2936–2937, https://doi.org/10.1093/bioinformatics/btx353
5. Berlin K, Koren S, Chin CS et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol 2015;33(6):623–30.
6. Thakur K, Chawla V, Bhatti S, Swarnkar MK, Kaur J, Shankar R, et al. (2013) De Novo Transcriptome Sequencing and Analysis for Venturia inaequalis, the Devastating Apple Scab Pathogen. PLoS ONE 8(1): e53937. https://doi.org/10.1371/journal.pone.0053937
7. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) “Basic local alignment search tool.” J. Mol. Biol. 215:403-410.
8. Jones P, Binns D, Chang H-Y, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30(9):1236-1240. doi:10.1093/bioinformatics/btu031.
9. Grigoriev IV, Nordberg H, Shabalov I, Aerts A, Cantor M, Goodstein D, Kuo A, Minovitsky S, Nikitin R, Ohm RA, Otillar R, Poliakov A, Ratnere I, Riley R, Smirnova T, Rokhsar D, Dubchak I. The genome portal of the Department of Energy Joint Genome Institute. Nucleic Acids Res. 2012 Jan;40(Database issue):D26-32.
10. P.J. Kersey, J.E. Allen, A. Allot, M. Barba, S. Boddu, B.J. Bolt, D. Carvalho-Silva, M. Christensen, P. Davis, C. Grabmueller, N. Kumar, Z. Liu, T. Maurel, B. Moore, M.D. McDowall, U. Maheswari, G. Naamati, V. Newman, C.K. Ong, D.M. Bolser, N. De Silva, K.L. Howe, N. Langridge, G. Maslen, D.M. Staines, A. Yates. Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species. Nucleic Acids Research 2018;46(D1):D802–D808
11. David M. Emms and Steven Kelly. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biology 2015;16:157

