
Given a string denoting a codon or base from the DNA (or RNA) alphabet, and a genetic-code Abelian group as given in reference (1), return the coordinates of each base in that group.

Usage

base_coord(base = NULL, filepath = NULL, cube = "ACGT", group = "Z4", ...)

# S4 method for DNAStringSet_OR_NULL
base_coord(
  base = NULL,
  filepath = NULL,
  cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
    "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
    "CTGA", "GACT", "GCAT", "TACG", "TCAG"),
  group = c("Z4", "Z5"),
  start = NA,
  end = NA,
  chr = 1L,
  strand = "+"
)

Arguments

base

An object from a DNAStringSet or DNAMultipleAlignment class carrying the DNA pairwise alignment of two sequences.

filepath

A character vector containing the path to a FASTA file to be read. This argument must be given if the base argument is not provided.

cube

A character string denoting one of the 24 Genetic-code cubes, as given in references (2, 3).

group

A character string denoting the group representation for the given base or codon, either "Z4" or "Z5", as shown in reference (1).

...

Not in use.

start, end, chr, strand

Optional parameters required to build a GRanges-class object. If not provided, the default values given in the function definition are used.
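The `cube` argument accepts the 24 strings listed in the Usage section. As a quick sanity check, and not as part of the package, the base-R sketch below confirms that these are exactly the 4! = 24 orderings of the bases A, C, G and T. The helper `permutations()` is hypothetical and written only for this illustration.

```r
## The 24 values accepted by 'cube', copied from the Usage section
cubes <- c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC",
           "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT",
           "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG")

## Hypothetical helper: recursively list every ordering of a vector
permutations <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (rest in permutations(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], rest)
    }
  }
  out
}

## Collapse each ordering into a 4-letter string such as "ACGT"
all_orders <- vapply(permutations(c("A", "C", "G", "T")),
                     paste, character(1), collapse = "")

length(all_orders)           # 24, i.e. 4!
setequal(cubes, all_orders)  # TRUE: each cube is one ordering of the bases
anyDuplicated(cubes)         # 0: no cube is listed twice
```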

Value

A BaseGroup-class object.

Details

The symbols "-" and "N", commonly found in DNA sequence alignments to denote gaps and missing/unknown bases, are represented by the number -1 in Z4 and 0 in Z5. In Z64, NA is returned for codons that include "-" or "N".
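The one-dimensional mapping described above can be sketched in base R. This is a minimal illustration, not the internal code of `base_coord()`: the helper `to_coord()` is hypothetical, and it assumes that the i-th letter of the cube string receives coordinate i - 1 in Z4 and i in Z5. That rule reproduces the outputs shown for cube = "ACGT" in the Examples; for the other 23 cubes it is an assumption. Note that -1 is not one of the Z4 representatives 0 to 3, so in Z4 it works as a marker, whereas Z5 can hold gaps as its fifth element, 0.

```r
## Hypothetical helper (not the package implementation): map single-letter
## bases to Z4 or Z5 coordinates, ASSUMING the i-th letter of 'cube'
## gets i - 1 (Z4) or i (Z5).
to_coord <- function(bases, cube = "ACGT", group = c("Z4", "Z5")) {
  group <- match.arg(group)
  cube_bases <- strsplit(cube, "")[[1]]
  offset <- if (group == "Z4") 0 else 1
  lookup <- setNames(seq_along(cube_bases) - 1 + offset, cube_bases)
  ## Gaps ("-") and unknown bases ("N"): -1 in Z4, 0 in Z5 (see Details)
  gap_value <- if (group == "Z4") -1 else 0
  lookup <- c(lookup, "-" = gap_value, N = gap_value)
  unname(lookup[toupper(bases)])
}

to_coord(c("A", "C", "G"))                    # 0 1 2, as for "ACG" in Example 1
to_coord(c("T", "G", "C"))                    # 3 2 1, as for "TGC" in Example 1
to_coord(c("A", "-", "N", "T"), group = "Z5") # 1 0 0 4
```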

This function returns a BaseGroup object carrying the DNA sequence(s) and their respective coordinates in the requested Abelian group of base representation (one-dimensional, "Z4" or "Z5"). Note that coordinates in the set of integers ("Z") can also be obtained, but they are not intended as an Abelian-group representation of the bases; they are only used for the subsequent embedding of the codon set in 3D space (R^3).
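To make the distinction concrete: assuming the group operation is ordinary addition modulo n, which is the standard meaning of Z4 and Z5, every sum of coordinates stays inside {0, ..., n - 1}, the operation is commutative, and every element has an inverse, whereas plain integer sums in Z leave the set {0, 1, 2, 3}. The base-R sketch below only checks these textbook properties; the helper `cayley()` is hypothetical and the package is not called.

```r
## Hypothetical helper: addition (Cayley) table of Z_n, ASSUMING the group
## operation is addition modulo n
cayley <- function(n) {
  elems <- 0:(n - 1)
  outer(elems, elems, function(a, b) (a + b) %% n)
}

z4 <- cayley(4)
z5 <- cayley(5)

## Closure: every sum is again one of the group elements
all(z4 %in% 0:3)                                  # TRUE
all(z5 %in% 0:4)                                  # TRUE
## Abelian: a + b == b + a, i.e. the table is symmetric
all(z4 == t(z4))                                  # TRUE
## Every row is a permutation of the elements, so each x has an inverse
all(apply(z4, 1, function(r) setequal(r, 0:3)))   # TRUE
## The inverse of x in Z_n is (n - x) %% n; e.g. in Z4, 1 + 3 = 0
(1 + 3) %% 4                                      # 0
## In Z the same sum is 4, which is not one of the Z4 coordinates 0..3
1 + 3                                             # 4
```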

References

  1. Robersy Sanchez, Jesus Barreto (2021). Genomic Abelian Finite Groups. doi:10.1101/2021.06.01.446543

  2. M. V. Jose, E. R. Morgado, R. Sanchez, T. Govezensky (2012). The 24 possible algebraic representations of the standard genetic code in six or in three dimensions. Adv. Stud. Biol. 4, 119-152.

  3. R. Sanchez (2018). Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun. Math. Comput. Chem. 79, 527-560.

Author

Robersy Sanchez https://genomaths.com

Examples

## Example 1. Let's get the base coordinates for codons "ACG"
## and "TGC":
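library(Biostrings)         # provides DNAStringSet()
library(GenomAutomorphism)  # provides base_coord() and the 'aln' data set
x0 <- c("ACG", "TGC")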
x1 <- DNAStringSet(x0)
x1
#> DNAStringSet object of length 2:
#>     width seq
#> [1]     3 ACG
#> [2]     3 TGC

## Get the base coordinates for cube = "ACGT" in the Abelian group Z4
base_coord(x1, cube = "ACGT", group = "Z4")
#> BaseGroup object with 3 ranges and 4 metadata columns:
#>       seqnames    ranges strand |        seq1        seq2    coord1    coord2
#>          <Rle> <IRanges>  <Rle> | <character> <character> <numeric> <numeric>
#>   [1]        1         1      + |           A           T         0         3
#>   [2]        1         2      + |           C           G         1         2
#>   [3]        1         3      + |           G           C         2         1
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

## Example 2. Load a pairwise alignment
data(aln, package = "GenomAutomorphism")
aln
#> DNAStringSet object of length 2:
#>     width seq
#> [1]    51 ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT
#> [2]    51 ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT

## DNA base representation in the Abelian group Z4 (the default group)
bs_cor <- base_coord(
    base = aln,
    cube = "ACGT"
)
bs_cor
#> BaseGroup object with 51 ranges and 4 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2    coord1    coord2
#>           <Rle> <IRanges>  <Rle> | <character> <character> <numeric> <numeric>
#>    [1]        1         1      + |           A           A         0         0
#>    [2]        1         2      + |           C           T         1         3
#>    [3]        1         3      + |           C           C         1         1
#>    [4]        1         4      + |           T           T         3         3
#>    [5]        1         5      + |           A           A         0         0
#>    ...      ...       ...    ... .         ...         ...       ...       ...
#>   [47]        1        47      + |           T           T         3         3
#>   [48]        1        48      + |           A           C         0         1
#>   [49]        1        49      + |           C           C         1         1
#>   [50]        1        50      + |           A           T         0         3
#>   [51]        1        51      + |           T           T         3         3
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

## Example 3. DNA base representation in the Abelian group Z5
bs_cor <- base_coord(
    base = aln,
    cube = "ACGT",
    group = "Z5"
)
bs_cor
#> BaseGroup object with 51 ranges and 4 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2    coord1    coord2
#>           <Rle> <IRanges>  <Rle> | <character> <character> <numeric> <numeric>
#>    [1]        1         1      + |           A           A         1         1
#>    [2]        1         2      + |           C           T         2         4
#>    [3]        1         3      + |           C           C         2         2
#>    [4]        1         4      + |           T           T         4         4
#>    [5]        1         5      + |           A           A         1         1
#>    ...      ...       ...    ... .         ...         ...       ...       ...
#>   [47]        1        47      + |           T           T         4         4
#>   [48]        1        48      + |           A           C         1         2
#>   [49]        1        49      + |           C           C         2         2
#>   [50]        1        50      + |           A           T         1         4
#>   [51]        1        51      + |           T           T         4         4
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
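Rows 6 to 46 are elided in the printed outputs of Examples 2 and 3. Because the two aligned sequences of `aln` are printed in full under Example 2, their coordinates can be reproduced for inspection. The base-R sketch below is a cross-check, not a replacement for `base_coord()`: it assumes the cube "ACGT" lookup implied by the outputs above (A, C, G, T as 0 to 3 in Z4 and 1 to 4 in Z5) together with the gap values stated in Details. The lookup vectors `z4_map` and `z5_map` are hypothetical names introduced here.

```r
## The two aligned sequences printed for data(aln) in Example 2
s1 <- "ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT"
s2 <- "ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT"

## Hypothetical lookup tables for cube = "ACGT" (gap/N values from Details)
z4_map <- c(A = 0, C = 1, G = 2, T = 3, "-" = -1, N = -1)
z5_map <- c(A = 1, C = 2, G = 3, T = 4, "-" = 0, N = 0)

b1 <- strsplit(s1, "")[[1]]
b2 <- strsplit(s2, "")[[1]]

coord1_z4 <- unname(z4_map[b1]); coord2_z4 <- unname(z4_map[b2])
coord1_z5 <- unname(z5_map[b1]); coord2_z5 <- unname(z5_map[b2])

## Gap positions, which fall in the elided rows
which(b1 == "-")   # 16 17 18
which(b2 == "-")   # 40 41 42 43 44 45

## Positions where both sequences carry a base but the bases differ
differ <- which(b1 != b2 & b1 != "-" & b2 != "-")
differ             # 2 19 28 36 39 48 50
data.frame(pos = differ, seq1 = b1[differ], seq2 = b2[differ],
           coord1 = coord1_z4[differ], coord2 = coord2_z4[differ])
```

Rows 1 to 5 and 47 to 51 of this reconstruction agree with the printed `coord1` and `coord2` columns of Examples 2 and 3.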