A simple function to represent DNA bases as elements from the Abelian group of integers modulo 4 (Z4), 5 (Z5), or 2 (Z2).
Usage
base2int(base, ...)
# S4 method for character
base2int(
base,
group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3", "Z2"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL)
)
# S4 method for data.frame
base2int(
base,
group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3", "Z2"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL)
)
Arguments
- base
A character vector, a string, or a data frame of letters from the DNA/RNA alphabet.
- ...
Not in use.
- group
A character string denoting the group representation for the given base or codon, as shown in references (2-3).
- cube
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3).
- phychem
Optional. It can be useful to represent DNA bases by numerical values of measured physicochemical properties. If provided, this argument must be a named numerical list. For example, the scale values of deoxyribonucleic acids proton affinity (available at https://www.wolframalpha.com/ and in the cell phone app Wolfram Alpha): list('A' = 0.87, 'C' = 0.88, 'T' = 0.82, 'G' = 0.89, 'N' = NA), where the symbol 'N' provides the value for any letter outside the DNA base alphabet. In this example, we could write NA or 0 (see example section).
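The lookup implied by the phychem argument can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name phychem_sketch is hypothetical, it does not treat 'U' as 'T', and the package's actual handling of unknown letters may differ.

```r
# Proton-affinity scale values quoted in the argument description above
phychem <- list(A = 0.87, C = 0.88, T = 0.82, G = 0.89, N = NA)

# Hypothetical helper (not part of the package): replace each base by its
# value in the named list; letters outside the alphabet take the 'N' value.
phychem_sketch <- function(seq, phychem) {
  letters_in <- strsplit(toupper(seq), "")[[1]]
  known <- letters_in %in% setdiff(names(phychem), "N")
  letters_in[!known] <- "N"
  unlist(phychem[letters_in], use.names = FALSE)
}

phychem_sketch("ACGD", phychem)
#> [1] 0.87 0.88 0.89   NA
```

Writing 'N' = 0 instead of 'N' = NA in the list would make the sketch return 0 for the letter 'D'.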
Details
For Z2 (binary representation of DNA bases), the cube bases are represented, in their order, by '00', '01', '10', and '11' (see the Examples section).
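The Z2 rule can be illustrated with a short base R sketch. The helper z2_sketch is hypothetical and only mirrors the rule stated above (the position of a base in the cube string, written as two bits); it is not the package's code.

```r
# Hypothetical helper (not part of the package): map each base to two bits
# according to its position in the cube string.
z2_sketch <- function(seq, cube = "ACGT") {
  bases <- strsplit(cube, "")[[1]]
  # Positions 0..3 written as two bits: 00, 01, 10, 11
  bits <- list(c(0L, 0L), c(0L, 1L), c(1L, 0L), c(1L, 1L))
  names(bits) <- bases
  letters_in <- strsplit(toupper(seq), "")[[1]]
  unlist(bits[letters_in], use.names = FALSE)
}

z2_sketch("ACGT", cube = "ACGT")
#> [1] 0 0 0 1 1 0 1 1
```

Under this reading, changing the cube only changes which base receives which two-bit code.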
References
1. Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543
2. M. V. Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.
3. R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
See also
base_coord, codon_coord, and dna_phychem.
Author
Robersy Sanchez https://genomaths.com
Examples
## A triplet with a letter not from DNA/RNA alphabet
## 'NA' is introduced by coercion!
base2int("UDG")
#> [1] 3 NA 2
## The base replacement in cube "ACGT" and group "Z4"
base2int("ACGT")
#> [1] 0 1 2 3
## The base replacement in cube "ACGT" and group "Z5"
base2int("ACGT", group = "Z5")
#> [1] 1 2 3 4
## A vector of DNA base triplets
base2int(c("UTG", "GTA"))
#> [,1] [,2] [,3]
#> [1,] 3 3 2
#> [2,] 2 3 0
## A vector of DNA sequences with different numbers of triplets.
## Codon 'CGA' is recycled!
base2int(base = c("UTGGTA", "CGA"), group = "Z5")
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 4 4 3 3 4 1
#> [2,] 2 3 1 2 3 1
## Data frames
base2int(data.frame(x1 = c("UTG", "GTA"), x2 = c("UTG", "GTA")))
#> x1 x1 x1 x2 x2 x2
#> [1,] 3 2 3 2 0 3
#> [2,] 3 3 2 3 2 0
## Cube bases are represented in their order by: '00', '01', '10', and '11'.
## For example, for cube = "ACGT" we have the mapping: A -> '00', C -> '01',
## G -> '10', and T -> '11'.
base2int("ACGT", group = "Z2", cube = "ACGT")
#> [1] 0 0 0 1 1 0 1 1
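The Z4 and Z5 outputs in the examples above are consistent with reading each base's position in the cube string: 0 to 3 for Z4 and 1 to 4 for Z5, with 'U' treated as 'T'. The base R sketch below reproduces them under that assumption. The helper cube_sketch is hypothetical, and its behavior for cubes other than "ACGT" is an extrapolation rather than documented package behavior.

```r
# Hypothetical helper (not part of the package): position-in-cube encoding.
cube_sketch <- function(seq, group = c("Z4", "Z5"), cube = "ACGT") {
  group <- match.arg(group)
  bases <- strsplit(cube, "")[[1]]
  # Treat RNA 'U' as 'T', as the "UDG" example suggests
  letters_in <- strsplit(chartr("U", "T", toupper(seq)), "")[[1]]
  pos <- match(letters_in, bases)        # 1..4, NA for letters not in the cube
  if (group == "Z4") pos - 1L else pos   # Z4: 0..3, Z5: 1..4
}

cube_sketch("UDG")
#> [1]  3 NA  2
cube_sketch("ACGT", group = "Z5")
#> [1] 1 2 3 4
```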