Skip to contents

Given a string denoting a codon or base from the DNA (or RNA) alphabet and a genetic-code Abelian group as given in reference (1).

Usage

codon_coord(codon = NULL, ...)

# S4 method for BaseGroup
codon_coord(codon, group = NULL)

# S4 method for DNAStringSet_OR_NULL
codon_coord(
  codon = NULL,
  filepath = NULL,
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3"),
  start = NA,
  end = NA,
  chr = 1L,
  strand = "+"
)

# S4 method for matrix_OR_data_frame
codon_coord(
  codon,
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  group = c("Z64", "Z125", "Z4^3", "Z5^3")
)

Arguments

codon

An object from BaseGroup-class (generated with function base_coord), DNAStringSet or from DNAMultipleAlignment class carrying the DNA pairwise alignment of two sequences.

...

Not in use.

group

A character string denoting the group representation for the given base or codon as shown in reference (2-3).

filepath

A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided.

cube

A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3).

start, end, chr, strand

Optional parameters required to build a GRanges-class. If not provided the default values given for the function definition will be used.

Value

A CodonGroup-class object.

Details

Symbols "-" and "N" usually found in DNA sequence alignments to denote gaps and missing/unknown bases are represented by the number: '-1' on Z4 and '0' on Z5. In Z64 the symbol 'NA' will be returned for codons including symbols "-" and "N".

This function returns a GRanges-class object carrying the codon sequence(s) and their respective coordinates in the requested Abelian group or simply, when group = 'Z5^3' 3D-coordinates, which are derive from Z5 as indicated in reference (3). Notice that the coordinates can be 3D or just one-dimension ("Z64" or "Z125"). Hence, the pairwise alignment provided in argument codon must correspond to codon sequences.

References

  1. Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543

  2. M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.

  3. R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.

Author

Robersy Sanchez https://genomaths.com

Examples

## Load a pairwise alignment
data("aln", package = "GenomAutomorphism")
aln
#> DNAStringSet object of length 2:
#>     width seq
#> [1]    51 ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT
#> [2]    51 ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT

## DNA base representation in the Abelian group Z5
bs_cor <- codon_coord(
    codon = aln,
    cube = "ACGT",
    group = "Z5"
)
bs_cor ## 3-D coordinates
#> CodonGroup object with 17 ranges and 4 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2      coord1
#>           <Rle> <IRanges>  <Rle> | <character> <character> <character>
#>    [1]        1         1      + |         ACC         ATC       1,2,2
#>    [2]        1         2      + |         TAT         TAT       4,1,4
#>    [3]        1         3      + |         GTT         GTT       3,4,4
#>    [4]        1         4      + |         GGT         GGT       3,3,4
#>    [5]        1         5      + |         ATT         ATT       1,4,4
#>    ...      ...       ...    ... .         ...         ...         ...
#>   [13]        1        13      + |         TCT         TCC       4,2,4
#>   [14]        1        14      + |         AGC         ---       1,3,2
#>   [15]        1        15      + |         TCA         ---       4,2,1
#>   [16]        1        16      + |         CTA         CTC       2,4,1
#>   [17]        1        17      + |         CAT         CTT       2,1,4
#>             coord2
#>        <character>
#>    [1]       1,4,2
#>    [2]       4,1,4
#>    [3]       3,4,4
#>    [4]       3,3,4
#>    [5]       1,4,4
#>    ...         ...
#>   [13]       4,2,2
#>   [14]       0,0,0
#>   [15]       0,0,0
#>   [16]       2,4,2
#>   [17]       2,4,4
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths


## DNA base representation in the Abelian group Z64
bs_cor <- codon_coord(
    codon = aln,
    cube = "ACGT",
    group = "Z64"
)
bs_cor
#> CodonGroup object with 17 ranges and 4 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2      coord1
#>           <Rle> <IRanges>  <Rle> | <character> <character> <character>
#>    [1]        1         1      + |         ACC         ATC          17
#>    [2]        1         2      + |         TAT         TAT          15
#>    [3]        1         3      + |         GTT         GTT          59
#>    [4]        1         4      + |         GGT         GGT          43
#>    [5]        1         5      + |         ATT         ATT          51
#>    ...      ...       ...    ... .         ...         ...         ...
#>   [13]        1        13      + |         TCT         TCC          31
#>   [14]        1        14      + |         AGC         ---          33
#>   [15]        1        15      + |         TCA         ---          28
#>   [16]        1        16      + |         CTA         CTC          52
#>   [17]        1        17      + |         CAT         CTT           7
#>             coord2
#>        <character>
#>    [1]          49
#>    [2]          15
#>    [3]          59
#>    [4]          43
#>    [5]          51
#>    ...         ...
#>   [13]          29
#>   [14]        <NA>
#>   [15]        <NA>
#>   [16]          53
#>   [17]          55
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

## Giving a matrix of codons
codon_coord(base2codon(x = aln))
#>    seq1 seq2 coord1 coord2
#> 1   ACC  ATC     17     49
#> 2   TAT  TAT     15     15
#> 3   GTT  GTT     59     59
#> 4   GGT  GGT     43     43
#> 5   ATT  ATT     51     51
#> 6   ---  ACG     NA     18
#> 7   GCG  ACG     26     18
#> 8   CTC  CTC     53     53
#> 9   CAA  CAA      4      4
#> 10  CTC  TTC     53     61
#> 11  CTT  CTT     55     55
#> 12  GGC  GGG     41     42
#> 13  TCT  TCC     31     29
#> 14  AGC  ---     33     NA
#> 15  TCA  ---     28     NA
#> 16  CTA  CTC     52     53
#> 17  CAT  CTT      7     55