Robersy Sanchez

Department of Biology. Eberly College of Science.

Pennsylvania State University, University Park, PA 16802

genomicmath@gmail.com

ORCID: orcid.org/0000-0002-5246-1453

## Overview

This repository contains a ‘beta’ version of the GenomAutomorphism R package.

This is an R package to compute the automorphisms between pairwise-aligned DNA sequences represented as elements of a Genomic Abelian group, as described in the paper *Genomic Abelian Finite Groups*. In a general scenario, whole chromosomes or genomic regions from a population (of a single species or of closely related species) can be algebraically represented as a direct sum of cyclic groups or, more specifically, of Abelian *p*-groups. Specifically, we propose representing a multiple sequence alignment (MSA) of length *N* as a finite Abelian group given by the direct sum of homocyclic Abelian groups of *prime-power order*:

$$G = (\mathbb{Z}_{p_1^{\alpha_1}})^{n_1} \oplus (\mathbb{Z}_{p_2^{\alpha_2}})^{n_2} \oplus \dots \oplus (\mathbb{Z}_{p_k^{\alpha_k}})^{n_k}$$

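where the $p_i$ are prime numbers (not necessarily distinct), $\alpha_i \in \mathbb{N}$, $n_i$ is the number of copies of the corresponding cyclic factor, and $\mathbb{Z}_{p_i^{\alpha_i}}$ is the group of integers modulo $p_i^{\alpha_i}$.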
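Because each factor $\mathbb{Z}_{p_i^{\alpha_i}}$ has $p_i^{\alpha_i}$ elements and appears $n_i$ times, the order of $G$ follows directly from the definition above. The numerical case below uses a hypothetical choice of factors, purely for illustration:

$$
|G| = \prod_{i=1}^{k} \left(p_i^{\alpha_i}\right)^{n_i},
\qquad
\text{e.g.}\quad
\left|(\mathbb{Z}_{5})^{2} \oplus (\mathbb{Z}_{2^{6}})^{3}\right| = 5^{2}\cdot 2^{18} = 6\,553\,600.
$$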

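For computing automorphisms between two aligned DNA sequences, $p_i^{\alpha_i} \in \{5, 2^6, 5^3\}$: $\mathbb{Z}_5$ for single bases (the four DNA bases plus the gap symbol), $\mathbb{Z}_{64}$ for the $4^3 = 2^6$ codons, and $\mathbb{Z}_{125}$ for the $5^3$ triplets over the five-letter alphabet that includes the gap.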
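For a cyclic factor $\mathbb{Z}_n$, every automorphism has the form $f(z) = kz \bmod n$ for a unit $k$ (that is, $\gcd(k, n) = 1$), so comparing two aligned codons $x$ and $y$ amounts to finding the units $k$ with $kx \equiv y \pmod{n}$. The base-R sketch below illustrates this idea for $\mathbb{Z}_{64}$. It is not the GenomAutomorphism API: the helpers `codon2int()` and `find_autm()` are hypothetical, and the base ordering A = 0, C = 1, G = 2, T = 3 is an assumption (the package's own encoding may differ).

```r
# Illustrative sketch only: codon2int() and find_autm() are hypothetical helpers,
# not functions exported by GenomAutomorphism.

# ASSUMPTION: base ordering A = 0, C = 1, G = 2, T = 3 (the package may use another)
base_int <- c(A = 0L, C = 1L, G = 2L, T = 3L)

# Encode a codon as an element of Z_64: a base-4 number, first base most significant
codon2int <- function(codon) {
  b <- base_int[strsplit(codon, "")[[1]]]
  sum(b * c(16L, 4L, 1L))
}

# Greatest common divisor (Euclid's algorithm)
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Return every unit k of Z_n with k * x = y (mod n); each such k defines the
# automorphism f(z) = k * z mod n, which maps x to y
find_autm <- function(x, y, n = 64L) {
  units <- Filter(function(k) gcd(k, n) == 1, seq_len(n - 1L))
  units[(units * x) %% n == y]
}

find_autm(codon2int("ACT"), codon2int("GCT"))  # 33 (unique, because x = 7 is a unit)
find_autm(codon2int("ACG"), codon2int("AGG"))  # 23 55 (two automorphisms)
find_autm(codon2int("ACG"), codon2int("AGT"))  # integer(0): none exists
```

Depending on the codon pair, there may be one such $k$, several, or none. The same search would apply to $\mathbb{Z}_{125}$ with `n = 125L`, provided codons are encoded over a five-letter alphabet that includes the gap.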

## Status

The package is available from Bioconductor (release 3.18): https://doi.org/doi:10.18129/B9.bioc.GenomAutomorphism. Watch this repository or check Bioconductor for updates.

## Tutorials

Several tutorials on how to use the package are available on the GenomAutomorphism website:

- Get started with GenomAutomorphism
- Analysis of Automorphisms on a DNA Multiple Sequence Alignment
- Analysis of Automorphisms on an MSA of Primate BRCA1 Gene
- A Short Introduction to Algebraic Taxonomy on Gene Regions
- Automorphism analysis on COVID-19 data
- Modular Matrix Operations of Mutational Events

## Installation of R dependencies

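```r
# Install BiocManager from CRAN if it is missing
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# Bioconductor and CRAN dependencies ("parallel" ships with base R, so it is not listed)
BiocManager::install(c("Biostrings", "GenomicRanges", "S4Vectors",
                       "BiocParallel", "GenomeInfoDb", "BiocGenerics", "numbers",
                       "devtools", "doParallel", "data.table", "foreach"),
                     dependencies = TRUE)
```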
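Since the package is distributed through Bioconductor, it can then be installed and loaded with the standard BiocManager pattern sketched below (a network connection is required; BiocManager selects the Bioconductor release that matches the installed R version).

```r
# Install BiocManager from CRAN if it is missing, then install the package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GenomAutomorphism")

# Load the package
library(GenomAutomorphism)
```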

## References

Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol. 2005;51(4): 431–457. doi:10.1007/s00285-005-0332-8. PMID: 16012800.

Sanchez R, Grau R, Morgado E. A novel Lie algebra of the genetic code over the Galois field of four DNA bases. Math Biosci. 2006;202: 156–174. doi:10.1016/j.mbs.2006.03.017

Sanchez R, Grau R. An algebraic hypothesis about the primeval genetic code architecture. Math Biosci. 2009;221: 60–76. doi:10.1016/j.mbs.2009.07.001

Sanchez R, Barreto J. Genomic Abelian Finite Groups. 2021. doi:10.1101/2021.06.01.446543

José MV, Morgado ER, Sánchez R, Govezensky T. The 24 possible algebraic representations of the standard genetic code in six or in three dimensions. Adv Stud Biol. 2012;4: 119–152.

Sanchez R. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun Math Comput Chem. 2018;79: 527–560.

Sanchez R. Evolutionary Analysis of DNA-protein-coding regions based on a genetic code cube metric. Curr Top Med Chem. 2014;14(3): 407–417. doi:10.2174/1568026613666131204110022