Replace bases with integers from Z4 and Z5

A simple function to represent DNA bases as elements from the Abelian group of integers modulo 4 (Z4), 5 (Z5), or 2 (Z2).

Usage

base2int(base, ...)

# S4 method for class 'character'
base2int(
  base,
  group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3", "Z2"),
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL)
)

# S4 method for class 'data.frame'
base2int(
  base,
  group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3", "Z2"),
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL)
)

Arguments

base

A character vector, a string, or a data frame of letters from the DNA/RNA alphabet.

...

Not in use.

group

A character string denoting the group representation for the given base or codon, as shown in references (2-3).

cube

A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3).

phychem

Optional. It can be useful to represent DNA bases by the numerical values of measured physicochemical properties. If provided, this argument must be a named numerical list. For example, the scale values of the proton affinity of deoxyribonucleic acids (available at https://www.wolframalpha.com/ and in the Wolfram Alpha cell phone app):

list('A' = 0.87, 'C' = 0.88, 'T' = 0.82, 'G' = 0.89, 'N' = NA)

where the symbol 'N' provides the value for any letter outside the DNA base alphabet. In this example, we could write NA or 0 (see the Examples section).
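The lookup described for 'phychem' can be illustrated with a minimal base R sketch. The helper name encode_by_property is hypothetical and is not part of the package; it only assumes that each base is replaced by its entry in the named list and that any other letter receives the 'N' entry. How base2int itself handles this internally is not shown here.

```r
# Hypothetical helper, not the package's code: looks up each base in a
# named list of property values and falls back to the 'N' entry.
encode_by_property <- function(seq, phychem) {
  # Split the sequence into single upper-case letters
  bases <- strsplit(toupper(seq), "")[[1]]
  # Flatten the named list into a named numeric vector
  values <- unlist(phychem)
  # Letters with their own entry keep it; all others get the 'N' value
  known <- bases %in% names(values)
  out <- rep(values[["N"]], length(bases))
  out[known] <- values[bases[known]]
  unname(out)
}

proton_affinity <- list(A = 0.87, C = 0.88, T = 0.82, G = 0.89, N = NA)

# 'D' is not a DNA base, so it takes the value stored under 'N'
encode_by_property("ACGTD", proton_affinity)
#> [1] 0.87 0.88 0.82 0.89   NA
```

Setting N = 0 in the list would place 0 instead of NA at the position of the unknown letter.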

Value

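A numerical vector or, when 'base' is a character vector of several sequences or a data frame, a numerical matrix (see the Examples section).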

Details

For Z2 (binary representation of DNA bases), the cube bases are represented, in their order, by '00', '01', '10', and '11' (see the Examples section).
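The way the order of bases in a cube determines the encoding can be sketched in base R. The helper encode_by_cube below is hypothetical and is not the package's implementation. It assumes that each base is encoded by its zero-based position in the cube string for Z4, by that position plus one for Z5, and by the two binary digits of that position for Z2, with 'U' treated as 'T' and any other letter yielding NA.

```r
# Hypothetical helper, not part of the package: encodes one sequence
# using the position of each base in the cube string.
encode_by_cube <- function(seq, group = c("Z4", "Z5", "Z2"), cube = "ACGT") {
  group <- match.arg(group)
  # Split into single letters; treat the RNA base U as T (assumption)
  bases <- chartr("U", "T", strsplit(toupper(seq), "")[[1]])
  cube_bases <- strsplit(cube, "")[[1]]
  # Zero-based position of each base in the cube (NA if not found)
  idx <- match(bases, cube_bases) - 1L
  switch(group,
    Z4 = idx,
    Z5 = idx + 1L,
    # Z2: two bits per base, high bit first, interleaved base by base
    Z2 = as.vector(rbind(idx %/% 2L, idx %% 2L))
  )
}

encode_by_cube("ACGT", group = "Z2", cube = "ACGT")
#> [1] 0 0 0 1 1 0 1 1

# The same sequence under a different cube order
encode_by_cube("ACGT", group = "Z4", cube = "TGCA")
#> [1] 3 2 1 0
```

For the cube "ACGT" this sketch gives the same values as the Z4, Z5, and Z2 outputs listed in the Examples section.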

References

1. Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543
2. M. V. Jose, E. R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.
3. R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.

Author

Robersy Sanchez https://genomaths.com

Examples

## A triplet with a letter not from DNA/RNA alphabet
## 'NA' is introduced by coercion!
base2int("UDG")
#> [1]  3 NA  2

## The base replacement in cube "ACGT" and group "Z4"
base2int("ACGT")
#> [1] 0 1 2 3

## The base replacement in cube "ACGT" and group "Z5"
base2int("ACGT", group = "Z5")
#> [1] 1 2 3 4

## A vector of DNA base triplets
base2int(c("UTG", "GTA"))
#>      [,1] [,2] [,3]
#> [1,]    3    3    2
#> [2,]    2    3    0

##  A vector of DNA sequences with different numbers of triplets.
##  Codon 'CGA' is recycled!
base2int(base = c("UTGGTA", "CGA"), group = "Z5")
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    4    4    3    3    4    1
#> [2,]    2    3    1    2    3    1

## Data frames 

base2int(data.frame(x1 = c("UTG", "GTA"), x2 = c("UTG", "GTA")))
#>      x1 x1 x1 x2 x2 x2
#> [1,]  3  2  3  2  0  3
#> [2,]  3  3  2  3  2  0


## Cube bases are represented in their order by: '00', '01', '10', and '11'.
## For example, for cube = "ACGT" we have the mapping: A -> '00', C -> '01',
## G -> '10', and T -> '11'.

base2int("ACGT", group = "Z2", cube = "ACGT")
#> [1] 0 0 0 1 1 0 1 1