Modular Matrix Operations of Mutational Events
Robersy Sanchez
Department of Biology. Pennsylvania State University, University Park, PA 16802rus547@psu.edu
28 April 2024
Source:vignettes/Modular_Matrix_Operation_of_Mutational_Events.Rmd
Modular_Matrix_Operation_of_Mutational_Events.Rmd
Abstract
A fast introduction into the analysis of DNA mutational events as modular matrix operation, i.e., as automorphisms over a ring.
Introduction
In the Abelian p-group defined on \(\mathbb{Z}_{p^{\alpha_{i}}_i}\) the endomorphisms \(\eta_i \in End(\mathbb{Z}_{p^{\alpha_{i}}_i})\) are described as functions \(f(x)=kx\) mod \(p^{\alpha_{i}}_i\), where \(k\) and \(x\) are elements from the set of integers modulo \(p^{\alpha_{i}}_i\).
Next, we proceed to load the R package required libraries
Modular operation on homocyclic group
The mutational events transforming the codon sequence ATACCCATGGCCAAC into the sequence ACACCCATGACCAAC can be represented by means of automorphisms.
In the cube ACGT the codon sequence ATACCCATGGCCAAC is represented by the vector \((48,21,50,25,1) \in (\mathbb{Z}_{2^6})^5\), where \((\mathbb{Z}_{2^6})^5\) stands for the homocyclic Abelian group of vectors of integers modulo \(\mathbb{Z}_{2^6}\), while the sequence ACACCCATGACCAAC is represented by \((16,21,50,17,1)\).
To translate the DNA sequence into numerical vectors from \((\mathbb{Z}_{2^6})^5\) they are previously incorporated into a DNAStringSet object.
dna <- DNAStringSet(c('ATACCCATGGCCAAC', 'ACACCCATGACCAAC'))
dna
#> DNAStringSet object of length 2:
#> width seq
#> [1] 15 ATACCCATGGCCAAC
#> [2] 15 ACACCCATGACCAAC
cd <- codon_coord(dna, cube = "ACGT", group = "Z64")
cd
#> CodonGroup object with 5 ranges and 4 metadata columns:
#> seqnames ranges strand | seq1 seq2 coord1 coord2
#> <Rle> <IRanges> <Rle> | <character> <character> <character> <character>
#> [1] 1 1 + | ATA ACA 48 16
#> [2] 1 2 + | CCC CCC 21 21
#> [3] 1 3 + | ATG ATG 50 50
#> [4] 1 4 + | GCC ACC 25 17
#> [5] 1 5 + | AAC AAC 1 1
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
A list of coordinates is retrieved from object cd
cd <- coordList(get_coord(cd))
cd
#> $coord1
#> [1] 48 21 50 25 1
#>
#> $coord2
#> [1] 16 21 50 17 1
Homomorphism matrices
Most of the mutational events in small DNA segments can be represented by means homomorphisms with which, in turns, are presented by diagonal matrices. The diagonal elements of matrix a transforming coordinate elements from coord1 into coord2 are estimated as shown below
dg <- modlineq(cd$coord1, cd$coord2, 64)
dg
#> [1] 3 1 1 57 1
That is, we have the matrix:
diag(dg)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 3 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 57 0
#> [5,] 0 0 0 0 1
The multiplication of vector coord1 by the matrix diag(dg) yields vector coord2:
(cd$coord1 %*% diag(dg)) %% 64
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 16 21 50 17 1
# Or
cat("\n---- \n")
#>
#> ----
(cd$coord1 %*% diag(dg)) %% 64 == cd$coord2
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] TRUE TRUE TRUE TRUE TRUE
dna2 <- DNAStringSet(c('GACAGAGCAGTATTAGCTTCACAC', 'GAAAAC---GTATTA---TCAAAG'))
dna2
#> DNAStringSet object of length 2:
#> width seq
#> [1] 24 GACAGAGCAGTATTAGCTTCACAC
#> [2] 24 GAAAAC---GTATTA---TCAAAG
cd2 <- codon_coord(dna2, cube = "ACGT", group = "Z64")
cd2
#> CodonGroup object with 8 ranges and 4 metadata columns:
#> seqnames ranges strand | seq1 seq2 coord1 coord2
#> <Rle> <IRanges> <Rle> | <character> <character> <character> <character>
#> [1] 1 1 + | GAC GAA 9 8
#> [2] 1 2 + | AGA AAC 32 1
#> [3] 1 3 + | GCA --- 24 <NA>
#> [4] 1 4 + | GTA GTA 56 56
#> [5] 1 5 + | TTA TTA 60 60
#> [6] 1 6 + | GCT --- 27 <NA>
#> [7] 1 7 + | TCA TCA 28 28
#> [8] 1 8 + | CAC AAG 5 2
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
cd2$coord2[ is.na(cd2$coord2) ] <- 0
cd2
#> CodonGroup object with 8 ranges and 4 metadata columns:
#> seqnames ranges strand | seq1 seq2 coord1 coord2
#> <Rle> <IRanges> <Rle> | <character> <character> <character> <character>
#> [1] 1 1 + | GAC GAA 9 8
#> [2] 1 2 + | AGA AAC 32 1
#> [3] 1 3 + | GCA --- 24 0
#> [4] 1 4 + | GTA GTA 56 56
#> [5] 1 5 + | TTA TTA 60 60
#> [6] 1 6 + | GCT --- 27 0
#> [7] 1 7 + | TCA TCA 28 28
#> [8] 1 8 + | CAC AAG 5 2
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
cd2 <- coordList(get_coord(cd2))
cd2
#> $coord1
#> [1] 9 32 24 56 60 27 28 5
#>
#> $coord2
#> [1] 8 1 0 56 60 0 28 2
Coordinates of seq2 can be obtained from seq1 solving the equation: \(seq1 \cdot x = seq2\) mod \(n\)
modulo <- c(64,125,64,64,64,64,64,64)
coord1 <- cd2$coord1
coord2 <- cd2$coord2
dg <- modlineq(coord1, coord2, modulo)
dg
#> [1] 8 43 0 1 1 0 1 26
Or in a form of diagonal matrix:
m <- diag(dg)
That is, the matrix multiplication below yields coordinates of seq2.
Or
Affine Transformations
Next, we search for the homomorphims transforming coordinates coord2 into coodinates coord1:
modulo <- c(64,125,64,64,64,64,64,64)
dg1 <- modlineq(a = coord2, b = coord1, n = modulo, no.sol = 0L)
dg1
#> $diag
#> [1] 0 32 0 1 1 0 1 0
#>
#> $translation
#> [1] 9 0 24 0 0 27 0 5
This is an affine transformation, which involves the sum of a homomorphism:
m1 <- diag(dg1$diag)
m1
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 32 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 1 0 0 0 0
#> [5,] 0 0 0 0 1 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 0 0 0 1 0
#> [8,] 0 0 0 0 0 0 0 0
Plus a translation. That is:
(coord2 %*% m1 + dg1$translation) %% modulo
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 9 32 24 56 60 27 28 5
# Or
cat("\n---- \n")
#>
#> ----
(coord2 %*% m1 + dg1$translation) %% modulo == coord1
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
However, the solution is not unique. For example, setting no.sol = 1, we have another solution
dg2 <- modlineq(a = coord2, b = coord1, n = modulo, no.sol = 1)
dg2
#> $diag
#> [1] 1 32 1 1 1 1 1 1
#>
#> $translation
#> [1] 1 0 24 0 0 27 0 3
m2 <- diag(dg2$diag)
(coord2 %*% m2 + dg2$translation) %% modulo
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 9 32 24 56 60 27 28 5
# Or
cat("\n---- \n")
#>
#> ----
(coord2 %*% m2 + dg2$translation) %% modulo == coord1
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Since the second element of seq2 has coordinate is 1, to build a homomorphism that transform coordinates from seq2 into coordinates from seq1, we can just sum the translation vector to the second column of matrix m2:
m3 <- m2
m3[, 2] <- (m2[, 2] + dg2$translation) %% modulo
m3
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 1 0 0 0 0 0 0
#> [2,] 0 32 0 0 0 0 0 0
#> [3,] 0 24 1 0 0 0 0 0
#> [4,] 0 0 0 1 0 0 0 0
#> [5,] 0 0 0 0 1 0 0 0
#> [6,] 0 27 0 0 0 1 0 0
#> [7,] 0 0 0 0 0 0 1 0
#> [8,] 0 3 0 0 0 0 0 1
It is easy to see that:
(coord2 %*% t(m3)) %% modulo
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 9 32 24 56 60 27 28 5
Or, as a column vector: