This function computes the weighted Manhattan distance between codons from two sequences, as given in reference (1). That is, given two codons \(x\) and \(y\) with coordinates on the set of integers modulo 5 ("Z5"), \(x = (x_1, x_2, x_3)\) and \(y = (y_1, y_2, y_3)\) (see (1)), the weighted Manhattan distance between these two codons is defined as:
$$d_w(x,y) = |x_1 - y_1|/5 + |x_2 - y_2| + |x_3 -y_3|/25$$
If the codon coordinates are given on "Z4", then the weighted Manhattan distance is defined as:
$$d_w(x,y) = |x_1 - y_1|/4 + |x_2 - y_2| + |x_3 -y_3|/16$$
Herein, we move to the generalized version given in reference (3), for which:
$$d_w(x,y) = |x_1 - y_1| w_1 + |x_2 - y_2| w_2 + |x_3 -y_3| w_3$$
where \(weight = (w_1, w_2, w_3)\) is the vector of weights.
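The generalized formula can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: the helpers `weighted_manhattan()` and `codon_coord()` are hypothetical, and the base coding on "Z5" (gap = 0, A = 1, C = 2, G = 3, T = 4) is an assumption chosen because it is consistent with the "Z5" outputs printed in the Examples section for the default cube.

```r
## Hypothetical helper, not part of GenomAutomorphism: weighted Manhattan
## distance between two codons given as numeric coordinate triplets.
weighted_manhattan <- function(x, y, weight) {
    stopifnot(length(x) == 3L, length(y) == 3L, length(weight) == 3L)
    sum(abs(x - y) * weight)
}

## Assumed base coding on Z5 (gap = 0, A = 1, C = 2, G = 3, T = 4).
z5 <- c("-" = 0, A = 1, C = 2, G = 3, T = 4)

## Hypothetical helper: split a codon into bases and look up coordinates.
codon_coord <- function(codon, coding = z5) {
    unname(coding[strsplit(codon, "")[[1]]])
}

## Weights of the Z5 formula above: (1/5, 1, 1/25).
w5 <- c(1/5, 1, 1/25)

d <- weighted_manhattan(codon_coord("ACG"), codon_coord("TGC"), w5)
```

Under these assumptions, `d` is 3/5 + 1 + 1/25 = 1.64, which agrees with the first value returned by `codon_dist(x, y, group = "Z5")` in the Examples.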
Usage
codon_dist(x, y, ...)
# S4 method for DNAStringSet
codon_dist(
x,
weight = NULL,
group = c("Z4", "Z5"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
num.cores = 1L,
tasks = 0L,
verbose = FALSE
)
# S4 method for character
codon_dist(
x,
y,
weight = NULL,
group = c("Z4", "Z5"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
num.cores = 1L,
tasks = 0L,
verbose = FALSE
)
# S4 method for CodonGroup_OR_Automorphisms
codon_dist(
x,
weight = NULL,
group = c("Z4", "Z5"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
num.cores = 1L,
tasks = 0L,
verbose = FALSE
)
Arguments
- x, y
A character string of codon sequences, i.e., sequences of DNA base-triplets. If only the 'x' argument is given, then it must be a DNAStringSet-class object.
- ...
Not in use yet.
- weight
A numerical vector of weights to compute weighted Manhattan distance between codons. If \(weight = NULL\), then \(weight = (1/4,1,1/16)\) for \(group = "Z4"\) and \(weight = (1/5,1,1/25)\) for \(group = "Z5"\).
- group
A character string denoting the group representation for the given codon sequence as shown in reference (2-3).
- cube
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3).
- num.cores, tasks
Parameters for parallel computation using the BiocParallel-package: the number of cores to use, i.e., at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).
- verbose
If TRUE, prints the progress bar.
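The default weights described for the weight argument can be sketched as follows. The function `default_weight()` is a hypothetical helper written for illustration only; it is not exported by the package and simply mirrors the rule stated above (the documented defaults per group when weight is NULL).

```r
## Hypothetical helper (not part of GenomAutomorphism): returns the weights
## documented for each group when 'weight' is NULL, otherwise the user's
## own weights after a basic sanity check.
default_weight <- function(weight = NULL, group = c("Z4", "Z5")) {
    group <- match.arg(group)  # the first choice, "Z4", is the default
    if (is.null(weight)) {
        weight <- switch(group,
            Z4 = c(1/4, 1, 1/16),
            Z5 = c(1/5, 1, 1/25)
        )
    }
    stopifnot(is.numeric(weight), length(weight) == 3L)
    weight
}

w_z4 <- default_weight()
w_z5 <- default_weight(group = "Z5")
w_custom <- default_weight(weight = c(1, 1, 1), group = "Z5")
```

Going by the argument description, calling codon_dist with `weight = c(1/5, 1, 1/25)` and `group = "Z5"` should be equivalent to leaving weight as NULL; a custom vector such as `c(1, 1, 1)` would give the unweighted Manhattan distance.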
References
Sanchez R. Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric. Curr Top Med Chem. 2014;14: 407–417. https://doi.org/10.2174/1568026613666131204110022.
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
Examples
## Let's write two small DNA sequences
x = "ACGCGTGTACCGTGACTG"
y = "TGCGCCCGTGACGCGTGA"
codon_dist(x, y, group = "Z5")
#> [1] 1.64 1.28 1.32 1.24 1.28 1.48
## Alternatively, data can be vectors of codons, i.e., vectors of DNA
## base-triplets (including the gap symbol "-").
x = c("ACG","CGT","GTA","CCG","TGA","CTG","ACG")
y = c("TGC","GCC","CGT","GAC","---","TGA","A-G")
## Gaps are not defined on "Z4"
codon_dist(x, y, group = "Z4")
#> [1] 0.8750 0.4375 0.5000 0.3750 NA 0.6875 NA
## Gaps are considered on "Z5"
codon_dist(x, y, group = "Z5")
#> [1] 1.64 1.28 1.32 1.24 3.84 1.48 2.00
## Load an Automorphism-class object
data("autm", package = "GenomAutomorphism")
codon_dist(x = head(autm,20), group = "Z4")
#> [1] 0.0000 0.0000 0.0000 0.0000 0.1250 0.1250 0.0000 0.2500 0.0000 0.0000
#> [11] 0.0000 0.0000 0.0000 0.1875 0.0625 0.0000 0.0000 0.0000 0.0000 0.1250
## Load a pairwise alignment
data("aln", package = "GenomAutomorphism")
aln
#> DNAStringSet object of length 2:
#> width seq
#> [1] 51 ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT
#> [2] 51 ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT
codon_dist(x = aln, group = "Z5")
#> [1] 2.00 0.00 0.00 0.00 0.00 2.32 0.40 0.00 0.00 0.40 0.00 0.04 0.08 3.28 2.84
#> [16] 0.04 3.00