This function builds the coordinate matrix for each sequence from an aligned set of DNA codon sequences.
Usage
codon_matrix(base, ...)
# S4 method for BaseSeqMatrix
codon_matrix(base, num.cores = 1L, tasks = 0L, verbose = TRUE, ...)
# S4 method for DNAStringSet
codon_matrix(
  base,
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  group = c("Z4", "Z5"),
  num.cores = 1L,
  tasks = 0L,
  verbose = TRUE
)
# S4 method for DNAMultipleAlignment
codon_matrix(
  base,
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  group = c("Z4", "Z5"),
  num.cores = 1L,
  tasks = 0L,
  verbose = TRUE
)
Arguments
- base
A DNAMultipleAlignment, a DNAStringSet, or a BaseSeqMatrix.
- ...
Not in use yet.
- num.cores, tasks
Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e. at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).
- verbose
If TRUE, prints the function log to stdout.
- cube
A character string denoting one of the 24 Genetic-code cubes, as given in references (3-4).
- group
A character string denoting the group representation for the given base or codon, as shown in references (3-4).
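To illustrate how the cube and group arguments relate to the printed coordinates, here is a minimal base-R sketch. It assumes that, for group "Z4", each base is mapped to its 0-based position in the cube string; this reading matches the default-cube output in the Examples section (0, 3, 2 for the first codon) but is an assumption, not the package's implementation. The helper base_to_z4 is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): map each base of a codon
# to its 0-based position in the cube string. This is an assumed reading
# of the "Z4" representation.
base_to_z4 <- function(codon, cube = "ACGT") {
  alphabet <- strsplit(cube, "")[[1]]          # e.g. c("A", "C", "G", "T")
  bases <- strsplit(toupper(codon), "")[[1]]   # split the codon into bases
  match(bases, alphabet) - 1L                  # NA for symbols not in the cube
}

base_to_z4("ATG")                  # default cube "ACGT"
base_to_z4("ATG", cube = "TGCA")   # same codon under a different cube
```

Under this assumption, changing the cube only relabels the four bases, so the same alignment yields different integer coordinates for each of the 24 cubes.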
Value
A ListCodonMatrix class object with the codon coordinates in its metadata columns.
Details
This function makes the codon coordinates from multiple sequence alignments (MSA) available for downstream statistical analyses, such as those reported in references (1) and (2).
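As a rough idea of what a downstream analysis could start from, the base-R sketch below works on a plain numeric matrix with one row per alignment position and one column per sequence. The values are copied from columns S1 to S4 of the 'codon.1' and 'codon.2' elements printed in the Examples section; the matrix layout and the names coords, n_distinct and conserved are illustrative assumptions, not objects returned by the package.

```r
# Toy coordinates copied from columns S1-S4 of the 'codon.1' and 'codon.2'
# elements shown in the Examples output (rows = positions, cols = sequences).
coords <- rbind(
  matrix(rep(c(0, 3, 2), times = 4), nrow = 3),  # codon.1, positions 1-3
  matrix(rep(c(2, 0, 3), times = 4), nrow = 3)   # codon.2, positions 4-6
)
dimnames(coords) <- list(paste0("pos", 1:6), paste0("S", 1:4))

# Number of distinct coordinate values observed at each alignment position
n_distinct <- apply(coords, 1, function(x) length(unique(x)))

# A position is conserved when every sequence carries the same coordinate
conserved <- n_distinct == 1L
conserved
```

The same row-wise summary could be replaced by any statistic computed across sequences, since the coordinates are ordinary numeric values.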
References
1. Lorenzo-Ginori, Juan V., Aníbal Rodríguez-Fuentes, Ricardo Grau Ábalo, and Robersy Sánchez Rodríguez. "Digital signal processing in the analysis of genomic sequences." Current Bioinformatics 4, no. 1 (2009): 28-40.
2. Sanchez, Robersy. "Evolutionary analysis of DNA-protein-coding regions based on a genetic code cube metric." Current Topics in Medicinal Chemistry 14, no. 3 (2014): 407-417.
3. Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543
4. M. V. Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.
5. R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
See also
codon_coord, base_coord and base2int.
Author
Robersy Sanchez https://genomaths.com
Examples
## Load the MSA of Primate BRCA1 DNA repair genes
data("brca1_aln")
## Get the DNAStringSet for the first 33 codons and apply 'codon_matrix'
brca1 <- unmasked(brca1_aln)
brca1 <- subseq(brca1, start = 1, end = 33)
codon_matrix(brca1)
#> ListCodonMatrix object of length: 11
#> Seq.Alias: codon.1 codon.2 codon.3 codon.4 codon.5 codon.6 ...
#>
#> -------
#> $codon.1
#> CodonMatrix object with 3 ranges and 20 metadata columns:
#> seqnames ranges strand | S1 S2 S3 S4
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric>
#> [1] 1 1-3 + | 0 0 0 0
#> [2] 1 1-3 + | 3 3 3 3
#> [3] 1 1-3 + | 2 2 2 2
#> S5 S6 S7 S8 S9 S10 S11
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> [1] 0 0 0 0 0 0 0
#> [2] 3 3 3 3 3 3 3
#> [3] 2 2 2 2 2 2 2
#> S12 S13 S14 S15 S16 S17 S18
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> [1] 0 0 0 0 0 0 0
#> [2] 3 3 3 3 3 3 3
#> [3] 2 2 2 2 2 2 2
#> S19 S20
#> <numeric> <numeric>
#> [1] 0 0
#> [2] 3 3
#> [3] 2 2
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#>
#> $codon.2
#> CodonMatrix object with 3 ranges and 20 metadata columns:
#> seqnames ranges strand | S1 S2 S3 S4
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric>
#> 4 1 4-6 + | 2 2 2 2
#> 5 1 4-6 + | 0 0 0 0
#> 6 1 4-6 + | 3 3 3 3
#> S5 S6 S7 S8 S9 S10 S11
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 4 2 2 2 2 2 2 2
#> 5 0 0 0 0 0 0 0
#> 6 3 3 3 3 3 3 3
#> S12 S13 S14 S15 S16 S17 S18
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 4 2 2 2 2 2 2 2
#> 5 0 0 0 0 0 0 0
#> 6 3 3 3 3 3 3 3
#> S19 S20
#> <numeric> <numeric>
#> 4 2 2
#> 5 0 0
#> 6 3 3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> ...
#>
#> <9 more CodonMatrix element(s)>
#> Three slots: 'DataList', 'group', 'cube' & 'seq_alias'
#> -------
## Converting back to an alignment object and applying 'codon_matrix' gives
## the same result.
brca1 <- DNAMultipleAlignment(as.character(brca1))
codon_matrix(brca1)
#> ListCodonMatrix object of length: 11
#> Seq.Alias: codon.1 codon.2 codon.3 codon.4 codon.5 codon.6 ...
#>
#> -------
#> $codon.1
#> CodonMatrix object with 3 ranges and 20 metadata columns:
#> seqnames ranges strand | S1 S2 S3 S4
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric>
#> [1] 1 1-3 + | 0 0 0 0
#> [2] 1 1-3 + | 3 3 3 3
#> [3] 1 1-3 + | 2 2 2 2
#> S5 S6 S7 S8 S9 S10 S11
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> [1] 0 0 0 0 0 0 0
#> [2] 3 3 3 3 3 3 3
#> [3] 2 2 2 2 2 2 2
#> S12 S13 S14 S15 S16 S17 S18
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> [1] 0 0 0 0 0 0 0
#> [2] 3 3 3 3 3 3 3
#> [3] 2 2 2 2 2 2 2
#> S19 S20
#> <numeric> <numeric>
#> [1] 0 0
#> [2] 3 3
#> [3] 2 2
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#>
#> $codon.2
#> CodonMatrix object with 3 ranges and 20 metadata columns:
#> seqnames ranges strand | S1 S2 S3 S4
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric>
#> 4 1 4-6 + | 2 2 2 2
#> 5 1 4-6 + | 0 0 0 0
#> 6 1 4-6 + | 3 3 3 3
#> S5 S6 S7 S8 S9 S10 S11
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 4 2 2 2 2 2 2 2
#> 5 0 0 0 0 0 0 0
#> 6 3 3 3 3 3 3 3
#> S12 S13 S14 S15 S16 S17 S18
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 4 2 2 2 2 2 2 2
#> 5 0 0 0 0 0 0 0
#> 6 3 3 3 3 3 3 3
#> S19 S20
#> <numeric> <numeric>
#> 4 2 2
#> 5 0 0
#> 6 3 3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> ...
#>
#> <9 more CodonMatrix element(s)>
#> Three slots: 'DataList', 'group', 'cube' & 'seq_alias'
#> -------