This function builds the coordinate matrix for each sequence from an aligned set of DNA codon sequences.

Usage

codon_matrix(base, ...)

# S4 method for BaseSeqMatrix
codon_matrix(base, num.cores = 1L, tasks = 0L, verbose = TRUE, ...)

# S4 method for DNAStringSet
codon_matrix(
  base,
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  group = c("Z4", "Z5"),
  num.cores = 1L,
  tasks = 0L,
  verbose = TRUE
)

# S4 method for DNAMultipleAlignment
codon_matrix(
  base,
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  group = c("Z4", "Z5"),
  num.cores = 1L,
  tasks = 0L,
  verbose = TRUE
)

Arguments

base

A DNAMultipleAlignment, a DNAStringSet, or a BaseSeqMatrix.

...

Not in use yet.

num.cores, tasks

Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e., the maximum number of child processes run simultaneously (see bplapply), and the number of tasks per job (Linux only).

verbose

If TRUE, prints the function log to stdout.

cube

A character string denoting one of the 24 genetic-code cubes, as given in references (3-4).

group

A character string denoting the group representation for the given base or codon, as shown in references (3-4).

Value

A ListCodonMatrix object with the codon coordinates in its metadata columns.

Details

The purpose of this function is to make the codon coordinates from multiple sequence alignments (MSA) available for downstream statistical analyses, such as those reported in references (1) and (2).

References

  1. Lorenzo-Ginori, Juan V., Aníbal Rodríguez-Fuentes, Ricardo Grau Ábalo, and Robersy Sánchez Rodríguez. "Digital signal processing in the analysis of genomic sequences." Current Bioinformatics 4, no. 1 (2009): 28-40.

  2. Sanchez, Robersy. "Evolutionary analysis of DNA-protein-coding regions based on a genetic code cube metric." Current Topics in Medicinal Chemistry 14, no. 3 (2014): 407-417.

  3. Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543

  4. M. V. Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.

  5. R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.

Author

Robersy Sanchez https://genomaths.com

Examples

## Load the MSA of Primate BRCA1 DNA repair genes
data("brca1_aln")

## Get the DNAStringSet for the first 33 nucleotides (11 codons) and apply
## 'codon_matrix'
brca1 <- unmasked(brca1_aln)
brca1 <- subseq(brca1, start = 1, end = 33)
codon_matrix(brca1)
#> ListCodonMatrix object of length: 11
#> Seq.Alias: codon.1 codon.2 codon.3 codon.4 codon.5 codon.6 ...
#> 
#> ------- 
#> $codon.1
#> CodonMatrix object with 3 ranges and 20 metadata columns:
#>       seqnames    ranges strand |        S1        S2        S3        S4
#>          <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric>
#>   [1]        1       1-3      + |         0         0         0         0
#>   [2]        1       1-3      + |         3         3         3         3
#>   [3]        1       1-3      + |         2         2         2         2
#>              S5        S6        S7        S8        S9       S10       S11
#>       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#>   [1]         0         0         0         0         0         0         0
#>   [2]         3         3         3         3         3         3         3
#>   [3]         2         2         2         2         2         2         2
#>             S12       S13       S14       S15       S16       S17       S18
#>       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#>   [1]         0         0         0         0         0         0         0
#>   [2]         3         3         3         3         3         3         3
#>   [3]         2         2         2         2         2         2         2
#>             S19       S20
#>       <numeric> <numeric>
#>   [1]         0         0
#>   [2]         3         3
#>   [3]         2         2
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> 
#> $codon.2
#> CodonMatrix object with 3 ranges and 20 metadata columns:
#>     seqnames    ranges strand |        S1        S2        S3        S4
#>        <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric>
#>   4        1       4-6      + |         2         2         2         2
#>   5        1       4-6      + |         0         0         0         0
#>   6        1       4-6      + |         3         3         3         3
#>            S5        S6        S7        S8        S9       S10       S11
#>     <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#>   4         2         2         2         2         2         2         2
#>   5         0         0         0         0         0         0         0
#>   6         3         3         3         3         3         3         3
#>           S12       S13       S14       S15       S16       S17       S18
#>     <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#>   4         2         2         2         2         2         2         2
#>   5         0         0         0         0         0         0         0
#>   6         3         3         3         3         3         3         3
#>           S19       S20
#>     <numeric> <numeric>
#>   4         2         2
#>   5         0         0
#>   6         3         3
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> ...
#> 
#> <9 more CodonMatrix element(s)>
#> Three slots: 'DataList', 'group', 'cube' & 'seq_alias'
#> ------- 

## Converting back to an alignment object and applying 'codon_matrix' gives
## the same result.
brca1 <- DNAMultipleAlignment(as.character(brca1))
codon_matrix(brca1)
#> ListCodonMatrix object of length: 11
#> Seq.Alias: codon.1 codon.2 codon.3 codon.4 codon.5 codon.6 ...
#> 
#> ------- 
#> $codon.1
#> CodonMatrix object with 3 ranges and 20 metadata columns:
#>       seqnames    ranges strand |        S1        S2        S3        S4
#>          <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric>
#>   [1]        1       1-3      + |         0         0         0         0
#>   [2]        1       1-3      + |         3         3         3         3
#>   [3]        1       1-3      + |         2         2         2         2
#>              S5        S6        S7        S8        S9       S10       S11
#>       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#>   [1]         0         0         0         0         0         0         0
#>   [2]         3         3         3         3         3         3         3
#>   [3]         2         2         2         2         2         2         2
#>             S12       S13       S14       S15       S16       S17       S18
#>       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#>   [1]         0         0         0         0         0         0         0
#>   [2]         3         3         3         3         3         3         3
#>   [3]         2         2         2         2         2         2         2
#>             S19       S20
#>       <numeric> <numeric>
#>   [1]         0         0
#>   [2]         3         3
#>   [3]         2         2
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> 
#> $codon.2
#> CodonMatrix object with 3 ranges and 20 metadata columns:
#>     seqnames    ranges strand |        S1        S2        S3        S4
#>        <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric>
#>   4        1       4-6      + |         2         2         2         2
#>   5        1       4-6      + |         0         0         0         0
#>   6        1       4-6      + |         3         3         3         3
#>            S5        S6        S7        S8        S9       S10       S11
#>     <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#>   4         2         2         2         2         2         2         2
#>   5         0         0         0         0         0         0         0
#>   6         3         3         3         3         3         3         3
#>           S12       S13       S14       S15       S16       S17       S18
#>     <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#>   4         2         2         2         2         2         2         2
#>   5         0         0         0         0         0         0         0
#>   6         3         3         3         3         3         3         3
#>           S19       S20
#>     <numeric> <numeric>
#>   4         2         2
#>   5         0         0
#>   6         3         3
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> ...
#> 
#> <9 more CodonMatrix element(s)>
#> Three slots: 'DataList', 'group', 'cube' & 'seq_alias'
#> -------
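
## A minimal sketch, not part of the original examples: the same call with
## the genetic-code cube and the group representation chosen explicitly
## instead of relying on the defaults. "TGCA" and "Z5" are picked purely for
## illustration from the values listed under Usage; 'brca1' is the alignment
## object created above. 'verbose = FALSE' suppresses the function log.
## Output is omitted here because this call was not run for this page.
codon_matrix(brca1, cube = "TGCA", group = "Z5", verbose = FALSE)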