
Given two codon sequences represented in a given Abelian group, this function computes the automorphisms describing the codon mutational events. Basically, this function is a wrapper that calls the corresponding function for the specified Abelian group.

Usage

automorphisms(seqs = NULL, filepath = NULL, group = "Z4", ...)

# S4 method for DNAStringSet_OR_NULL
automorphisms(
  seqs = NULL,
  filepath = NULL,
  group = c("Z5", "Z64", "Z125", "Z5^3"),
  cube = c("ACGT", "TGCA"),
  cube_alt = c("CATG", "GTAC"),
  nms = NULL,
  start = NA,
  end = NA,
  chr = 1L,
  strand = "+",
  num.cores = multicoreWorkers(),
  tasks = 0L,
  verbose = TRUE
)

Arguments

seqs

An object of class DNAStringSet or DNAMultipleAlignment carrying the DNA pairwise alignment of two sequences. The pairwise alignment provided in argument seqs, or in the 'fasta' file given in filepath, must correspond to codon sequences.

filepath

A character vector containing the path to a file in fasta format to be read. This argument must be given if argument seqs is not provided.

group

A character string denoting the group representation for the codon set, as shown in reference (1): one of "Z5", "Z64", "Z125", or "Z5^3".

...

Not in use.

cube, cube_alt

A character vector of two strings denoting a pair of the 24 genetic-code cubes, as given in references (2-3). The base pairs from the two given cubes must be complementary to each other. Such a cube pair is called dual cubes and, as shown in reference (3), each pair integrates a group.

nms

Optional. Only used if the provided DNA sequence alignment carries more than two sequences. A character string giving short names for the alignments to be compared. If not given, the automorphisms between pairwise alignments are named 'aln_1', 'aln_2', and so on.

start, end, chr, strand

Optional parameters required to build a GRanges-class object. If not provided, the default values given in the function definition are used.

num.cores, tasks

Parameters for parallel computation using package BiocParallel-package: the number of cores to use, i.e. at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).

verbose

If TRUE, prints the progress bar.

Value

This function returns an Automorphism-class object whose metadata columns are named seq1, seq2, aa1, aa2, coord1, coord2, autm, and cube (see the Examples).

Details

Herein, automorphisms are algebraic descriptions of the mutational events observed in codon sequences represented on different Abelian groups. In particular, as described in references (3-4), for each representation of the codon set on a defined Abelian group there are 24 possible isomorphic Abelian groups. These Abelian groups can be labeled based on the DNA base order used to generate them. The set of 24 Abelian groups can be described as a group isomorphic to the symmetric group of degree four (\(S_4\), see reference (4)). Function automorphismByRanges permits the classification of the pairwise alignment of protein-coding subregions based on the mutational events observed in it and on the genetic-code cubes that describe them.
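To make the count of 24 concrete, the base-R sketch below enumerates all base orders as the permutations of A, C, G, and T. There are 4! = 24 of them, which matches the order of \(S_4\). The sketch assumes, as stated above, that each cube is labeled by a base order such as "ACGT". The helper permutations() is hypothetical and is not part of GenomAutomorphism.

```r
# Hypothetical helper: all orderings of a vector, built recursively.
permutations <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    # Fix x[i] in front, then append every ordering of the remaining elements.
    for (p in permutations(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], p)
    }
  }
  out
}

# Each ordering of A, C, G, T labels one genetic-code cube (base order).
cubes <- vapply(permutations(c("A", "C", "G", "T")), paste, character(1),
                collapse = "")
length(cubes)  # 24, i.e. 4! = |S_4|
head(cubes)    # "ACGT" "ACTG" "AGCT" "AGTC" "ATCG" "ATGC"
```

The default cubes used by automorphisms ("ACGT", "TGCA", "CATG", "GTAC") all appear in this list.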

Automorphisms in Z5, Z64, and Z125 are described as functions \(f(x) = k x \bmod 5\), \(f(x) = k x \bmod 64\), and \(f(x) = k x \bmod 125\), where k and x are elements of the set of integers modulo 5, 64, or 125, respectively. If an automorphism cannot be found on any of the cubes provided in argument cube, then function automorphisms searches for automorphisms on the cubes provided in argument cube_alt.
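The sketch below illustrates \(f(x) = k x \bmod 64\) on the "Z64" example further down. It uses a codon encoding inferred from the coord1/coord2 columns of that example, not taken from the package source. Under this encoding, the bases of the cube take the values 0-3 in cube order, and codon \(b_1 b_2 b_3\) maps to \(x = 16 b_2 + 4 b_1 + b_3\). The helpers codon_z64() and solve_k() are hypothetical. solve_k() tries every k in \(\mathbb{Z}_n\), so it applies equally to n = 5 or n = 125 when given suitable coordinates.

```r
# Hypothetical helpers, not GenomAutomorphism functions.
# Assumed encoding (inferred from the "Z64" example output): bases take values
# 0-3 in the order given by `cube`, and codon b1 b2 b3 maps to 16*b2 + 4*b1 + b3.
codon_z64 <- function(codon, cube = "ACGT") {
  b <- match(strsplit(codon, "")[[1]], strsplit(cube, "")[[1]]) - 1
  16 * b[2] + 4 * b[1] + b[3]
}

# Every k in Z_n such that f(x) = k * x mod n sends x to y (brute force).
solve_k <- function(x, y, n) {
  k <- 0:(n - 1)
  k[(k * x) %% n == y]
}

solve_k(codon_z64("ACC"), codon_z64("ATC"), 64)  # 33 (17 -> 49), row [1]
solve_k(codon_z64("CTA"), codon_z64("CTC"), 64)  # integer(0): none in cube ACGT
solve_k(codon_z64("CTA", "TGCA"), codon_z64("CTC", "TGCA"), 64)  # 30, row [16]
```

With the assumed encoding, CTA (52) cannot be sent to CTC (53) in cube ACGT, because 52k mod 64 is always a multiple of 4. Under base order TGCA the same codons encode to 11 and 10, and the sketch returns k = 30. That is the autm value the example reports on cube TGCA for row [16].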

Automorphisms in Z5^3 are described as functions \(f(x) = A x \bmod 5\), where A is a diagonal matrix.
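Because A is diagonal, \(A = \mathrm{diag}(a_1, a_2, a_3)\) can be found one coordinate at a time by solving \(a_i x_i \equiv y_i \pmod 5\). In \(\mathbb{Z}_5\), every nonzero \(x_i\) is invertible, so each \(a_i\) is unique. The sketch below assumes the encoding A=1, C=2, G=3, T=4 for cube ACGT. This encoding is inferred from the coord1/coord2 columns of the "Z5^3" example rather than from the package source. Gaps are not handled, and codon_z5() and diag_autm() are hypothetical helpers.

```r
# Hypothetical helpers, not GenomAutomorphism functions.
# Assumed Z5^3 encoding for cube "ACGT": A=1, C=2, G=3, T=4, one value per base.
codon_z5 <- function(codon, cube = "ACGT") {
  match(strsplit(codon, "")[[1]], strsplit(cube, "")[[1]])
}

# Diagonal entries a_i with (a_i * x_i) mod 5 == y_i, for nonzero x_i and y_i.
diag_autm <- function(x, y) {
  vapply(seq_along(x), function(i) {
    a <- 1:4
    a[(a * x[i]) %% 5 == y[i]]
  }, integer(1))
}

diag_autm(codon_z5("ACC"), codon_z5("ATC"))  # 1 2 1, as in row [1]
diag_autm(codon_z5("TCT"), codon_z5("TCC"))  # 1 1 3, as in row [13]
```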

Arguments cube and cube_alt must each be a pair of dual cubes (see section 2.4 of reference 4).
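A minimal base-R check of this condition is sketched below. It assumes that "dual" means each base of one cube is the Watson-Crick complement (A-T, C-G) of the base at the same position in the other cube, as stated for arguments cube and cube_alt. The helper is_dual_cube() is hypothetical and is not a GenomAutomorphism function.

```r
# Hypothetical helper: TRUE when every base of `cube1` is the complement of the
# base at the same position in `cube2`.
is_dual_cube <- function(cube1, cube2) {
  complement <- c(A = "T", C = "G", G = "C", T = "A")
  b1 <- strsplit(cube1, "")[[1]]
  b2 <- strsplit(cube2, "")[[1]]
  length(b1) == length(b2) && all(unname(complement[b1]) == b2)
}

is_dual_cube("ACGT", "TGCA")  # TRUE, default `cube`
is_dual_cube("CATG", "GTAC")  # TRUE, default `cube_alt`
is_dual_cube("ACGT", "CATG")  # FALSE, not a dual pair
```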

Methods

automorphismByRanges

This function returns a GRanges-class object. Consecutive mutational events (on the codon sequence) described by automorphisms on the same cube are grouped into a single range.

automorphism_bycoef

This function returns a GRanges-class object. Consecutive mutational events (on the codon sequence) described by the same automorphism coefficients are grouped into a single range.

getAutomorphisms

This function returns an AutomorphismList-class object as a list of Automorphism-class objects, which inherits from GRanges-class objects.

conserved_regions

Returns an AutomorphismByCoef-class object containing the requested regions.
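The grouping described for automorphismByRanges (by cube) and automorphism_bycoef (by coefficient) amounts to merging runs of equal consecutive values. The base-R sketch below shows only that idea, using rle() on toy labels. group_runs() is hypothetical, is not the package implementation, and ignores details such as seqnames and strand.

```r
# Hypothetical helper (not the package implementation): merge runs of equal
# consecutive values into [start, end] codon ranges.
group_runs <- function(values) {
  r <- rle(values)
  end <- cumsum(r$lengths)
  data.frame(start = end - r$lengths + 1, end = end, value = r$values)
}

# Toy per-codon cube labels and automorphism coefficients (made up).
cube <- c("ACGT", "ACGT", "ACGT", "Trnl", "Trnl", "TGCA", "ACGT")
autm <- c(33, 1, 1, 0, 0, 30, 1)

group_runs(cube)
#   start end value
# 1     1   3  ACGT
# 2     4   5  Trnl
# 3     6   6  TGCA
# 4     7   7  ACGT
group_runs(autm)  # five ranges: 1, 2-3, 4-5, 6, 7
```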

References

  1. Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol. 2005 Oct;51(4):431-57. doi: 10.1007/s00285-005-0332-8. Epub 2005 Jul 13. PMID: 16012800.

  2. Sanchez R, Barreto J. Genomic Abelian Finite Groups. 2021. doi: 10.1101/2021.06.01.446543.

  3. Jose MV, Morgado ER, Sanchez R, Govezensky T. The 24 possible algebraic representations of the standard genetic code in six or in three dimensions. Adv Stud Biol. 2012;4:110-152.

  4. Sanchez R. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun Math Comput Chem. 2018;79:527-560.


Author

Robersy Sanchez (https://genomaths.com).

Examples

## Load a pairwise alignment
data("aln", package = "GenomAutomorphism")
aln
#> DNAStringSet object of length 2:
#>     width seq
#> [1]    51 ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT
#> [2]    51 ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT

## Automorphism on "Z5^3"
autms <- automorphisms(seqs = aln, group = "Z5^3", verbose = FALSE)
autms
#> Automorphism object with 17 ranges and 8 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2         aa1
#>           <Rle> <IRanges>  <Rle> | <character> <character> <character>
#>    [1]        1         1      + |         ACC         ATC           T
#>    [2]        1         2      + |         TAT         TAT           Y
#>    [3]        1         3      + |         GTT         GTT           V
#>    [4]        1         4      + |         GGT         GGT           G
#>    [5]        1         5      + |         ATT         ATT           I
#>    ...      ...       ...    ... .         ...         ...         ...
#>   [13]        1        13      + |         TCT         TCC           S
#>   [14]        1        14      + |         AGC         ---           S
#>   [15]        1        15      + |         TCA         ---           S
#>   [16]        1        16      + |         CTA         CTC           L
#>   [17]        1        17      + |         CAT         CTT           H
#>                aa2   coord1   coord2        autm        cube
#>        <character> <matrix> <matrix> <character> <character>
#>    [1]           I    1:2:2    1:4:2       1,2,1        ACGT
#>    [2]           Y    4:1:4    4:1:4       1,1,1        ACGT
#>    [3]           V    3:4:4    3:4:4       1,1,1        ACGT
#>    [4]           G    3:3:4    3:3:4       1,1,1        ACGT
#>    [5]           I    1:4:4    1:4:4       1,1,1        ACGT
#>    ...         ...      ...      ...         ...         ...
#>   [13]           S    4:2:4    4:2:2       1,1,3        ACGT
#>   [14]           -    1:3:2    0:0:0           0        Trnl
#>   [15]           -    4:2:1    0:0:0           0        Trnl
#>   [16]           L    2:4:1    2:4:2       1,1,2        ACGT
#>   [17]           L    2:1:4    2:4:4       1,4,1        ACGT
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

## Automorphism on "Z64"
autms <- automorphisms(seqs = aln, group = "Z64", verbose = FALSE)
autms
#> Automorphism object with 17 ranges and 8 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2         aa1
#>           <Rle> <IRanges>  <Rle> | <character> <character> <character>
#>    [1]        1         1      + |         ACC         ATC           T
#>    [2]        1         2      + |         TAT         TAT           Y
#>    [3]        1         3      + |         GTT         GTT           V
#>    [4]        1         4      + |         GGT         GGT           G
#>    [5]        1         5      + |         ATT         ATT           I
#>    ...      ...       ...    ... .         ...         ...         ...
#>   [13]        1        13      + |         TCT         TCC           S
#>   [14]        1        14      + |         AGC         ---           S
#>   [15]        1        15      + |         TCA         ---           S
#>   [16]        1        16      - |         CTA         CTC           L
#>   [17]        1        17      + |         CAT         CTT           H
#>                aa2    coord1    coord2      autm        cube
#>        <character> <numeric> <numeric> <numeric> <character>
#>    [1]           I        17        49        33        ACGT
#>    [2]           Y        15        15         1        ACGT
#>    [3]           V        59        59         1        ACGT
#>    [4]           G        43        43         1        ACGT
#>    [5]           I        51        51         1        ACGT
#>    ...         ...       ...       ...       ...         ...
#>   [13]           S        31        29         3        ACGT
#>   [14]           -        33        NA         0        Trnl
#>   [15]           -        28        NA         0        Trnl
#>   [16]           L        52        53        30        TGCA
#>   [17]           L         7        55        17        ACGT
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

## Automorphism on "Z64" from position 1 to 33
autms <- automorphisms(
    seqs = aln,
    group = "Z64",
    start = 1,
    end = 33,
    verbose = FALSE
)
autms
#> Automorphism object with 11 ranges and 8 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2         aa1
#>           <Rle> <IRanges>  <Rle> | <character> <character> <character>
#>    [1]        1         1      + |         ACC         ATC           T
#>    [2]        1         2      + |         TAT         TAT           Y
#>    [3]        1         3      + |         GTT         GTT           V
#>    [4]        1         4      + |         GGT         GGT           G
#>    [5]        1         5      + |         ATT         ATT           I
#>    [6]        1         6      + |         ---         ACG           -
#>    [7]        1         7      + |         GCG         ACG           A
#>    [8]        1         8      + |         CTC         CTC           L
#>    [9]        1         9      + |         CAA         CAA           Q
#>   [10]        1        10      + |         CTC         TTC           L
#>   [11]        1        11      + |         CTT         CTT           L
#>                aa2    coord1    coord2      autm        cube
#>        <character> <numeric> <numeric> <numeric> <character>
#>    [1]           I        17        49        33        ACGT
#>    [2]           Y        15        15         1        ACGT
#>    [3]           V        59        59         1        ACGT
#>    [4]           G        43        43         1        ACGT
#>    [5]           I        51        51         1        ACGT
#>    [6]           T        NA        18         0        Trnl
#>    [7]           T        26        18        45        ACGT
#>    [8]           L        53        53         1        ACGT
#>    [9]           Q         4         4         1        ACGT
#>   [10]           F        53        61        41        ACGT
#>   [11]           L        55        55         1        ACGT
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths