Codon coordinates on a given a given Abelian group representation.
Source:R/codon_coord.R
codon_coord.Rd
Given a string denoting a codon or base from the DNA (or RNA) alphabet and a genetic-code Abelian group as given in reference (1).
Usage
codon_coord(codon = NULL, ...)
# S4 method for BaseGroup
codon_coord(codon, group = NULL)
# S4 method for DNAStringSet_OR_NULL
codon_coord(
codon = NULL,
filepath = NULL,
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3"),
start = NA,
end = NA,
chr = 1L,
strand = "+"
)
# S4 method for matrix_OR_data_frame
codon_coord(
codon,
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
group = c("Z64", "Z125", "Z4^3", "Z5^3")
)
Arguments
- codon
An object from
BaseGroup-class
(generated with functionbase_coord
),DNAStringSet
or fromDNAMultipleAlignment
class carrying the DNA pairwise alignment of two sequences.- ...
Not in use.
- group
A character string denoting the group representation for the given base or codon as shown in reference (2-3).
- filepath
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided.
- cube
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3).
- start, end, chr, strand
Optional parameters required to build a
GRanges-class
. If not provided the default values given for the function definition will be used.
Value
A CodonGroup-class
object.
Details
Symbols "-" and "N" usually found in DNA sequence alignments to denote gaps and missing/unknown bases are represented by the number: '-1' on Z4 and '0' on Z5. In Z64 the symbol 'NA' will be returned for codons including symbols "-" and "N".
This function returns a GRanges-class
object
carrying the codon sequence(s) and their respective coordinates in the
requested Abelian group or simply, when group = 'Z5^3'
3D-coordinates, which are derive from Z5 as indicated in reference (3).
Notice that the coordinates can be 3D or just one-dimension ("Z64" or
"Z125"). Hence, the pairwise alignment provided in argument
codon must correspond to codon sequences.
References
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
Author
Robersy Sanchez https://genomaths.com
Examples
## Load a pairwise alignment
data("aln", package = "GenomAutomorphism")
aln
#> DNAStringSet object of length 2:
#> width seq
#> [1] 51 ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT
#> [2] 51 ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT
## DNA base representation in the Abelian group Z5
bs_cor <- codon_coord(
codon = aln,
cube = "ACGT",
group = "Z5"
)
bs_cor ## 3-D coordinates
#> CodonGroup object with 17 ranges and 4 metadata columns:
#> seqnames ranges strand | seq1 seq2 coord1
#> <Rle> <IRanges> <Rle> | <character> <character> <character>
#> [1] 1 1 + | ACC ATC 1,2,2
#> [2] 1 2 + | TAT TAT 4,1,4
#> [3] 1 3 + | GTT GTT 3,4,4
#> [4] 1 4 + | GGT GGT 3,3,4
#> [5] 1 5 + | ATT ATT 1,4,4
#> ... ... ... ... . ... ... ...
#> [13] 1 13 + | TCT TCC 4,2,4
#> [14] 1 14 + | AGC --- 1,3,2
#> [15] 1 15 + | TCA --- 4,2,1
#> [16] 1 16 + | CTA CTC 2,4,1
#> [17] 1 17 + | CAT CTT 2,1,4
#> coord2
#> <character>
#> [1] 1,4,2
#> [2] 4,1,4
#> [3] 3,4,4
#> [4] 3,3,4
#> [5] 1,4,4
#> ... ...
#> [13] 4,2,2
#> [14] 0,0,0
#> [15] 0,0,0
#> [16] 2,4,2
#> [17] 2,4,4
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## DNA base representation in the Abelian group Z64
bs_cor <- codon_coord(
codon = aln,
cube = "ACGT",
group = "Z64"
)
bs_cor
#> CodonGroup object with 17 ranges and 4 metadata columns:
#> seqnames ranges strand | seq1 seq2 coord1
#> <Rle> <IRanges> <Rle> | <character> <character> <character>
#> [1] 1 1 + | ACC ATC 17
#> [2] 1 2 + | TAT TAT 15
#> [3] 1 3 + | GTT GTT 59
#> [4] 1 4 + | GGT GGT 43
#> [5] 1 5 + | ATT ATT 51
#> ... ... ... ... . ... ... ...
#> [13] 1 13 + | TCT TCC 31
#> [14] 1 14 + | AGC --- 33
#> [15] 1 15 + | TCA --- 28
#> [16] 1 16 + | CTA CTC 52
#> [17] 1 17 + | CAT CTT 7
#> coord2
#> <character>
#> [1] 49
#> [2] 15
#> [3] 59
#> [4] 43
#> [5] 51
#> ... ...
#> [13] 29
#> [14] <NA>
#> [15] <NA>
#> [16] 53
#> [17] 55
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Giving a matrix of codons
codon_coord(base2codon(x = aln))
#> seq1 seq2 coord1 coord2
#> 1 ACC ATC 17 49
#> 2 TAT TAT 15 15
#> 3 GTT GTT 59 59
#> 4 GGT GGT 43 43
#> 5 ATT ATT 51 51
#> 6 --- ACG NA 18
#> 7 GCG ACG 26 18
#> 8 CTC CTC 53 53
#> 9 CAA CAA 4 4
#> 10 CTC TTC 53 61
#> 11 CTT CTT 55 55
#> 12 GGC GGG 41 42
#> 13 TCT TCC 31 29
#> 14 AGC --- 33 NA
#> 15 TCA --- 28 NA
#> 16 CTA CTC 52 53
#> 17 CAT CTT 7 55