Microsatellite Mining and Protein Dynamics of 43 Species of Alphapolyomavirus
Posted: 12 Feb 2020
Date Written: February 10, 2020
Microsatellites, or Simple Sequence Repeats (SSRs), are tandemly repeated genetic sequences with motifs 1-6 nucleotides long, present in genomes across the biosphere. They are involved in the regulation of transcription and translation and are used as molecular markers because of their polymorphic and codominant nature. The conservation and divergence of microsatellites in coding and non-coding regions make them ideal therapeutic agents and diagnostic tools, which in turn leads to drug discovery. Alphapolyomaviruses, members of the family Polyomaviridae, are small (diameter 40-45 nm), icosahedral, non-enveloped dsDNA viruses that produce multiple tumours in their hosts. The genome is either circular or linear, approximately 5000 base pairs long, and encodes two types of proteins: early regulatory proteins, expressed during infection, and late structural proteins, expressed after the onset of viral DNA replication. The regulatory proteins, namely large tumour antigen (LTAg), small tumour antigen (STAg), middle tumour antigen (MTAg), alternative tumour antigen (ATAg) and putative alternative large tumour antigen (PALTAg), play a role in replication, transcription, maturation and egress. The structural proteins comprise the major capsid protein VP1 and the minor capsid proteins VP2 and VP3, which play a pivotal role in capsid formation.
Whole-genome sequences of the 43 species of Alphapolyomavirus listed by the ICTV (https://talk.ictvonline.org/ictv-reports/ictv_online_report/dsdna-viruses/w/polyomaviridae) were retrieved from NCBI. SSR extraction was carried out using the ‘Advanced-Mode’ of the IMEx web server with minimum repeat numbers of 6-3-3-3-3-3 (mono- to hexanucleotide motifs); other parameters were left at their defaults. To explore the protein dynamics, we used the MATLAB-based tools IGLNNF for gene locations and IGLSF for the distribution of SSRs across coding and non-coding regions. Microsoft Office Excel 2019 and Tableau 2018 were used for data presentation and visualization, respectively.
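To illustrate what the 6-3-3-3-3-3 thresholds mean in practice, the following is a minimal Python sketch of a perfect-repeat SSR scan. It is not the IMEx algorithm (which also handles imperfect repeats and further options); the function names `is_primitive` and `find_perfect_ssrs` and the example sequence are hypothetical and chosen only for illustration.

```python
# Minimum repeat counts by motif length (mono- to hexanucleotide),
# following the 6-3-3-3-3-3 setting stated in the text.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself built from a shorter repeating unit
    (e.g. 'AT' is primitive, 'ATAT' and 'AA' are not)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(seq):
    """Return (start, end, motif, copies) for every perfect tandem repeat
    meeting MIN_REPEATS; coordinates are 0-based, end-exclusive.
    Simplified illustration only, not the IMEx implementation."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        i = 0
        while i + size <= len(seq):
            motif = seq[i:i + size]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Extend the tract while the next unit equals the motif.
                j = i + size
                while seq[j:j + size] == motif:
                    j += size
                copies = (j - i) // size
                if copies >= min_rep:
                    hits.append((i, j, motif, copies))
                    i = j  # resume scanning after this tract
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any polyomavirus genome.
    toy = "GG" + "A" * 7 + "CC" + "AT" * 3 + "GCA" * 3 + "GTT"
    for start, end, motif, copies in find_perfect_ssrs(toy):
        print(f"{start}\t{end}\t({motif}){copies}")
```

Motifs that are themselves repeats of a shorter unit are skipped so that, for example, a run of adenines is reported once as a mononucleotide SSR rather than again as `AA` or `AAA` repeats.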
Genome size ranged from 4752 bp (BM93) to 5387 bp (BM15), with an average of 5145.11 bp. A total of 1315 SSRs were extracted from the 221,240 bp of the 43 Alphapolyomavirus genomes. The maximum of 39 SSRs was found in BM27 and the minimum of 18 in BM21, showing considerable diversity in SSR count relative to genome size. Estimation of the SSR distribution across the Alphapolyomavirus genomes revealed that non-coding sequences accounted for 23.57% of SSRs, whereas coding sequences accounted for 76.43%. Four non-overlapping protein-coding genes accounted for 82.49% of SSRs, of which the LTAg gene alone accounted for over 473 SSRs, whereas twelve overlapping protein-coding genes accounted for 17.51% of SSRs, of which the LTA/STA gene hosted the maximum (74).
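The summary figures above can be cross-checked with simple arithmetic. The sketch below uses only the totals and percentages reported in the text; the derived values (SSRs per genome, SSRs per kb, approximate coding and non-coding counts) are back-calculations for illustration, not results stated by the authors, and the function name `summarize` is hypothetical.

```python
def summarize(total_bp=221_240, n_genomes=43, total_ssrs=1315,
              coding_pct=76.43, noncoding_pct=23.57):
    """Derive simple summary statistics from the reported totals.
    All inputs are the figures quoted in the text; outputs are derived."""
    return {
        # Mean genome length in bp (text reports 5145.11).
        "mean_genome_bp": total_bp / n_genomes,
        # Average number of SSRs per genome.
        "ssrs_per_genome": total_ssrs / n_genomes,
        # Relative density: SSRs per kilobase of sequence analysed.
        "ssrs_per_kb": total_ssrs / total_bp * 1000,
        # Approximate counts implied by the reported percentages.
        "coding_ssrs": round(total_ssrs * coding_pct / 100),
        "noncoding_ssrs": round(total_ssrs * noncoding_pct / 100),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these figures the mean genome length agrees with the reported 5145.11 bp to within rounding, and the data imply roughly 30.6 SSRs per genome and about 5.9 SSRs per kb.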
The SSRs from the 43 Alphapolyomavirus genomes exhibit some interesting variations. The presence of SSRs in overlapping regions has a significant impact on gene regulation and protein synthesis. We believe that these data provide a foundation for understanding viral pathogenicity, host determination and genome evolution, which may be unearthed in due course, not only for viruses but for other species as well.
Keywords: Microsatellite, Protein, Polyomaviridae, Simple Sequence Repeat (SSR), Virus
JEL Classification: J10