Base coordinates on a given Abelian group representation
Given a string denoting a codon or base from the DNA (or RNA) alphabet, function base_coord return the base coordinates in the specify genetic-code Abelian group, as given in reference (1).
DNA sequences to GRanges
of bases.
Function seq2granges transform an object from
DNAStringSet
,
DNAMultipleAlignment-class
or a character into
an object from BaseSeq.
BaseSeq-class object to DNAStringSet-class object.
Function base_seq2string_set transforms an object from
BaseSeq into an object from DNAStringSet-class
.
Usage
base_coord(base = NULL, filepath = NULL, cube = "ACGT", group = "Z4", ...)
# S4 method for DNAStringSet_OR_NULL
base_coord(
base = NULL,
filepath = NULL,
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
group = c("Z4", "Z5"),
start = NA,
end = NA,
chr = 1L,
strand = "+"
)
seq2granges(base = NULL, filepath = NULL, ...)
# S4 method for DNAStringSet_OR_NULL
seq2granges(
base = NULL,
filepath = NULL,
start = NA,
end = NA,
chr = 1L,
strand = "+",
seq_alias = NULL,
...
)
base_seq2string_set(x, ...)
# S4 method for BaseSeq
base_seq2string_set(x)
base_matrix(base, ...)
# S4 method for DNAStringSet_OR_NULL
base_matrix(
base,
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
group = c("Z4", "Z5"),
seq_alias = NULL
)
Arguments
- base
An object from a
DNAStringSet
orDNAMultipleAlignment
class carrying the DNA pairwise alignment of two sequences.- filepath
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided.
- cube
A character string denoting one of the 24 Genetic-code cubes, as given in references (2 2 3).
- group
A character string denoting the group representation for the given base or codon as shown in reference (1).
- ...
Not in use yet.
- start, end, chr, strand
Optional parameters required to build a
GRanges-class
. If not provided the default values given for the function definition will be used.- seq_alias
DNA sequence alias/ID and description.
- x
A 'BaseSeq' class object.
Value
Depending on the function called, different object will be returned:
Function 'base_coord'
This function returns a BaseGroup
object
carrying the DNA sequence(s) and their respective coordinates in the
requested Abelian group of base representation (one-dimension, "Z4" or
"Z5"). Observe that to get coordinates in the set of of integer numbers
("Z") is also possible but they are not defined to integrate a Abelian
group. These are just used for the further insertion the codon set in the
3D space (R^3).
Function 'seq2granges'
This function returns a BaseGroup object carrying the DNA sequence(s), one
base per ranges. A BaseGroup class object inherits from
GRanges-class
.
Function 'base_seq2string_set'
This function returns a DNAStringSet-class
.
A BaseGroup-class object.
Details
Function 'base_coord'
Function base_coord is defined only for pairwise aligned sequences. Symbols "-" and "N" usually found in DNA sequence alignments to denote gaps and missing/unknown bases are represented by the number: '-1' on Z4 and '0' on Z5. In Z64 the symbol 'NA' will be returned for codons including symbols "-" and "N".
References
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi:10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
Author
Robersy Sanchez https://genomaths.com
Examples
## Example 1. Let's get the base coordinates for codons "ACG"
## and "TGC":
x0 <- c("ACG", "TGC")
x1 <- DNAStringSet(x0)
x1
#> DNAStringSet object of length 2:
#> width seq
#> [1] 3 ACG
#> [2] 3 TGC
## Get the base coordinates on cube = "ACGT" on the Abelian group = "Z4"
base_coord(x1, cube = "ACGT", group = "Z4")
#> BaseGroup object with 3 ranges and 4 metadata columns:
#> seqnames ranges strand | seq1 seq2 coord1 coord2
#> <Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
#> [1] 1 1 + | A T 0 3
#> [2] 1 2 + | C G 1 2
#> [3] 1 3 + | G C 2 1
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Example 2. Load a pairwise alignment
data("aln", package = "GenomAutomorphism")
aln
#> DNAStringSet object of length 2:
#> width seq
#> [1] 51 ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT
#> [2] 51 ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT
## DNA base representation in the Abelian group Z4
bs_cor <- base_coord(
base = aln,
cube = "ACGT"
)
bs_cor
#> BaseGroup object with 51 ranges and 4 metadata columns:
#> seqnames ranges strand | seq1 seq2 coord1 coord2
#> <Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
#> [1] 1 1 + | A A 0 0
#> [2] 1 2 + | C T 1 3
#> [3] 1 3 + | C C 1 1
#> [4] 1 4 + | T T 3 3
#> [5] 1 5 + | A A 0 0
#> ... ... ... ... . ... ... ... ...
#> [47] 1 47 + | T T 3 3
#> [48] 1 48 + | A C 0 1
#> [49] 1 49 + | C C 1 1
#> [50] 1 50 + | A T 0 3
#> [51] 1 51 + | T T 3 3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Example 3. DNA base representation in the Abelian group Z5
bs_cor <- base_coord(
base = aln,
cube = "ACGT",
group = "Z5"
)
bs_cor
#> BaseGroup object with 51 ranges and 4 metadata columns:
#> seqnames ranges strand | seq1 seq2 coord1 coord2
#> <Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
#> [1] 1 1 + | A A 1 1
#> [2] 1 2 + | C T 2 4
#> [3] 1 3 + | C C 2 2
#> [4] 1 4 + | T T 4 4
#> [5] 1 5 + | A A 1 1
#> ... ... ... ... . ... ... ... ...
#> [47] 1 47 + | T T 4 4
#> [48] 1 48 + | A C 1 2
#> [49] 1 49 + | C C 2 2
#> [50] 1 50 + | A T 1 4
#> [51] 1 51 + | T T 4 4
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Example 4. Load a multiple sequence alignment (MSA) of primate BRCA1 DNA
## repair genes
data("brca1_aln2", package = "GenomAutomorphism")
brca1_aln2
#> DNAMultipleAlignment with 41 rows and 2283 columns
#> aln names
#> [1] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [2] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA U64805.1:1-2280_H...
#> [3] ATGGATTTCTCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [4] ATGGATTTGTCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [5] ATGGATTTTTCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [6] ATGGATTTTTCTGCTCTTCGGGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [7] ATGGATTTTTCTGCTCTTCGCTTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [8] ATGGATTTTTCTGCTCTTCGCCTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [9] ATGGATTTTTCTGCTCTTCGCATTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> ... ...
#> [33] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_034941185.1:24...
#> [34] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_034941182.1:25...
#> [35] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163757.1:14...
#> [36] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163756.1:14...
#> [37] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163758.1:13...
#> [38] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_030923119.1:18...
#> [39] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_030923118.1:18...
#> [40] ATGGATTTACCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_025363316.1:14...
#> [41] ATGGATTTATCTGCTGTTCGTGTTG...CCAGATCCCTCACAGCCACTACTGA XM_039475995.1:49...
## Get BaseSeq-class object
gr <- seq2granges(brca1_aln2)
gr
#> BaseSeq object with 2283 ranges and 41 metadata columns:
#> seqnames ranges strand | S1 S2 S3
#> <Rle> <IRanges> <Rle> | <character> <character> <character>
#> [1] 1 1 + | A A A
#> [2] 1 2 + | T T T
#> [3] 1 3 + | G G G
#> [4] 1 4 + | G G G
#> [5] 1 5 + | A A A
#> ... ... ... ... . ... ... ...
#> [2279] 1 2279 + | A A A
#> [2280] 1 2280 + | C C C
#> [2281] 1 2281 + | T T T
#> [2282] 1 2282 + | G G G
#> [2283] 1 2283 + | A A A
#> S4 S5 S6 S7 S8
#> <character> <character> <character> <character> <character>
#> [1] A A A A A
#> [2] T T T T T
#> [3] G G G G G
#> [4] G G G G G
#> [5] A A A A A
#> ... ... ... ... ... ...
#> [2279] A A A A A
#> [2280] C C C C C
#> [2281] T T T T T
#> [2282] G G G G G
#> [2283] A A A A A
#> S9 S10 S11 S12 S13
#> <character> <character> <character> <character> <character>
#> [1] A A A A A
#> [2] T T T T T
#> [3] G G G G G
#> [4] G G G G G
#> [5] A A A A A
#> ... ... ... ... ... ...
#> [2279] A A A A A
#> [2280] C C C C C
#> [2281] T T T T T
#> [2282] G G G G G
#> [2283] A A A A A
#> S14 S15 S16 S17 S18
#> <character> <character> <character> <character> <character>
#> [1] A A A A A
#> [2] T T T T T
#> [3] G G G G G
#> [4] G G G G G
#> [5] A A A A A
#> ... ... ... ... ... ...
#> [2279] A A A A A
#> [2280] C C C C C
#> [2281] T T T T T
#> [2282] G G G G G
#> [2283] A A A A A
#> S19 S20 S21 S22 S23
#> <character> <character> <character> <character> <character>
#> [1] A A A A A
#> [2] T T T T T
#> [3] G G G G G
#> [4] G G G G G
#> [5] A A A A A
#> ... ... ... ... ... ...
#> [2279] A A A A A
#> [2280] C C C C C
#> [2281] T T T T T
#> [2282] G G G G G
#> [2283] A A A A A
#> S24 S25 S26 S27 S28
#> <character> <character> <character> <character> <character>
#> [1] A A A A A
#> [2] T T T T T
#> [3] G G G G G
#> [4] G G G G G
#> [5] A A A A A
#> ... ... ... ... ... ...
#> [2279] A A A A A
#> [2280] C C C C C
#> [2281] T T T T T
#> [2282] G G G G G
#> [2283] A A A A A
#> S29 S30 S31 S32 S33
#> <character> <character> <character> <character> <character>
#> [1] A A A A A
#> [2] T T T T T
#> [3] G G G G G
#> [4] G G G G G
#> [5] A A A A A
#> ... ... ... ... ... ...
#> [2279] A A A A A
#> [2280] C C C C C
#> [2281] T T T T T
#> [2282] G G G G G
#> [2283] A A A A A
#> S34 S35 S36 S37 S38
#> <character> <character> <character> <character> <character>
#> [1] A A A A A
#> [2] T T T T T
#> [3] G G G G G
#> [4] G G G G G
#> [5] A A A A A
#> ... ... ... ... ... ...
#> [2279] A A A A A
#> [2280] C C C C C
#> [2281] T T T T T
#> [2282] G G G G G
#> [2283] A A A A A
#> S39 S40 S41
#> <character> <character> <character>
#> [1] A A A
#> [2] T T T
#> [3] G G G
#> [4] G G G
#> [5] A A A
#> ... ... ... ...
#> [2279] A A A
#> [2280] C C C
#> [2281] T T T
#> [2282] G G G
#> [2283] A A A
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Transform the BaseSeq-class object into a DNAStringSet-class object
str_set <- base_seq2string_set(gr)
str_set
#> DNAStringSet object of length 41:
#> width seq names
#> [1] 2283 ATGGATTTATCTGCTCTTCGCGT...GATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [2] 2283 ATGGATTTATCTGCTCTTCGCGT...GATCCCCCACAGCCACTACTGA U64805.1:1-2280_H...
#> [3] 2283 ATGGATTTCTCTGCTCTTCGCGT...GATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [4] 2283 ATGGATTTGTCTGCTCTTCGCGT...GATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [5] 2283 ATGGATTTTTCTGCTCTTCGCGT...GATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> ... ... ...
#> [37] 2283 ATGGATTTATCTGCTGTTCGCGT...GATCCCCCACAGCCACTACTGA XM_032163758.1:13...
#> [38] 2283 ATGGATTTATCTGCTCTTCGCGT...GATCCCCCACAGCCACTACTGA XM_030923119.1:18...
#> [39] 2283 ATGGATTTATCTGCTCTTCGCGT...GATCCCCCACAGCCACTACTGA XM_030923118.1:18...
#> [40] 2283 ATGGATTTACCTGCTGTTCGCGT...GATCCCCCACAGCCACTACTGA XM_025363316.1:14...
#> [41] 2283 ATGGATTTATCTGCTGTTCGTGT...GATCCCTCACAGCCACTACTGA XM_039475995.1:49...
## Recovering the original MSA
DNAMultipleAlignment(as.character(str_set))
#> DNAMultipleAlignment with 41 rows and 2283 columns
#> aln names
#> [1] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [2] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA U64805.1:1-2280_H...
#> [3] ATGGATTTCTCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [4] ATGGATTTGTCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [5] ATGGATTTTTCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [6] ATGGATTTTTCTGCTCTTCGGGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [7] ATGGATTTTTCTGCTCTTCGCTTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [8] ATGGATTTTTCTGCTCTTCGCCTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> [9] ATGGATTTTTCTGCTCTTCGCATTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
#> ... ...
#> [33] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_034941185.1:24...
#> [34] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_034941182.1:25...
#> [35] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163757.1:14...
#> [36] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163756.1:14...
#> [37] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163758.1:13...
#> [38] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_030923119.1:18...
#> [39] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_030923118.1:18...
#> [40] ATGGATTTACCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_025363316.1:14...
#> [41] ATGGATTTATCTGCTGTTCGTGTTG...CCAGATCCCTCACAGCCACTACTGA XM_039475995.1:49...
## Example 5.
base_matrix(base = aln, cube = "CGTA", group = "Z5")
#> BaseSeqMatrix object with 51 ranges and 2 metadata columns:
#> seqnames ranges strand | S1 S2
#> <Rle> <IRanges> <Rle> | <numeric> <numeric>
#> [1] 1 1 + | 4 4
#> [2] 1 2 + | 1 3
#> [3] 1 3 + | 1 1
#> [4] 1 4 + | 3 3
#> [5] 1 5 + | 4 4
#> ... ... ... ... . ... ...
#> [47] 1 47 + | 3 3
#> [48] 1 48 + | 4 1
#> [49] 1 49 + | 1 1
#> [50] 1 50 + | 4 3
#> [51] 1 51 + | 3 3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Example 5.