This function computes the distance between amino acids as a summary statistic of the distances between their corresponding codons. The available statistics are 'mean', 'median', or a user-defined function.
Usage
aminoacid_dist(aa1, aa2, ...)
# S4 method for character,character
aminoacid_dist(
aa1,
aa2,
weight = NULL,
stat = c("mean", "median", "user_def"),
genetic_code = "1",
group = c("Z4", "Z5"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
num.cores = 1L,
tasks = 0L,
verbose = FALSE
)
# S4 method for DNAStringSet,ANY
aminoacid_dist(
aa1,
weight = NULL,
stat = c("mean", "median", "user_def"),
group = c("Z4", "Z5"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
num.cores = 1L,
tasks = 0L,
verbose = FALSE
)
# S4 method for AAStringSet,ANY
aminoacid_dist(
aa1,
weight = NULL,
stat = c("mean", "median", "user_def"),
group = c("Z4", "Z5"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
num.cores = 1L,
tasks = 0L,
verbose = FALSE
)
# S4 method for CodonGroup_OR_Automorphisms,ANY
aminoacid_dist(
aa1,
weight = NULL,
stat = c("mean", "median", "user_def"),
group = c("Z4", "Z5"),
cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG",
"ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA",
"CTGA", "GACT", "GCAT", "TACG", "TCAG"),
num.cores = 1L,
tasks = 0L,
verbose = FALSE
)
Arguments
- aa1, aa2
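A character string of amino acid symbols (see Details). If only 'aa1' is given, then it must be a DNAStringSet-class object of codon sequences (i.e., sequences of DNA base-triplets), an AAStringSet-class object, or a CodonGroup_OR_Automorphisms object (see Usage).
- ...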
Not in use yet.
- weight
A numerical vector of weights to compute the weighted Manhattan distance between codons. If \(weight = NULL\), then \(weight = (1/4,1,1/16)\) for \(group = "Z4"\) and \(weight = (1/5,1,1/25)\) for \(group = "Z5"\) (see codon_dist).
- stat
The name of a statistical function used to summarize the data: 'mean', 'median', or a user-defined function ('user_def'). If \(stat = 'user_def'\), then the function must have a logical argument named 'na.rm' that controls the removal of missing (NA) data (see, e.g., mean).
- genetic_code
A single string that uniquely identifies the genetic code to extract. Should be one of the values in the id or name2 columns of GENETIC_CODE_TABLE.
- group
A character string denoting the group representation for the given codon sequence as shown in reference (2-3).
- cube
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3).
- num.cores, tasks
Parameters for parallel computation using package BiocParallel-package: the number of cores to use, i.e., at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).
- verbose
If TRUE, prints the progress bar.
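The roles of 'weight', 'cube', and 'stat' can be illustrated with a minimal base-R sketch. It is not the package implementation: the helpers base_coord(), codon_wdist(), and aa_wdist() are hypothetical names, and the sketch assumes that each base is coded by its position in the cube string and that the distance of an amino acid to itself is summarized over the distinct pairs of its synonymous codons. Under those assumptions it reproduces several of the default values shown in the Examples.

```r
# Assumed coding: each base is mapped to its position in the cube string
# (0, 1, 2, 3). Only differences matter, so an offset of 1 gives the same result.
base_coord <- function(cube = "ACGT") {
  bases <- strsplit(cube, "", fixed = TRUE)[[1]]
  setNames(seq_along(bases) - 1L, bases)
}

# Weighted Manhattan distance between two codons (hypothetical helper).
codon_wdist <- function(c1, c2, weight = c(1/4, 1, 1/16), cube = "ACGT") {
  coord <- base_coord(cube)
  x <- coord[strsplit(c1, "", fixed = TRUE)[[1]]]
  y <- coord[strsplit(c2, "", fixed = TRUE)[[1]]]
  sum(weight * abs(x - y))
}

# Summarize codon distances between two amino acids given as codon sets.
# 'stat' is any function with an 'na.rm' argument, e.g. mean or median.
aa_wdist <- function(codons1, codons2, stat = mean,
                     weight = c(1/4, 1, 1/16), cube = "ACGT") {
  if (setequal(codons1, codons2)) {
    # Same amino acid: use distinct unordered pairs of synonymous codons.
    if (length(codons1) < 2L) return(0)
    pairs <- combn(codons1, 2)
    a <- pairs[1, ]
    b <- pairs[2, ]
  } else {
    grid <- expand.grid(a = codons1, b = codons2, stringsAsFactors = FALSE)
    a <- grid$a
    b <- grid$b
  }
  d <- mapply(codon_wdist, a, b,
              MoreArgs = list(weight = weight, cube = cube))
  stat(d, na.rm = TRUE)
}

# Codon sets from the standard genetic code
ala <- c("GCA", "GCC", "GCG", "GCT")
his <- c("CAT", "CAC")
asp <- c("GAT", "GAC")
leu <- c("TTA", "TTG", "CTA", "CTC", "CTG", "CTT")
met <- "ATG"

aa_wdist(ala, ala)                 # 0.1041667
aa_wdist(his, asp)                 # 0.3125
aa_wdist(leu, met)                 # 0.4791667
aa_wdist(his, asp, stat = median)  # 0.3125

# A user-defined summary must accept 'na.rm'
max_dist <- function(x, na.rm = FALSE) max(x, na.rm = na.rm)
aa_wdist(leu, met, stat = max_dist)  # 0.875

# Group "Z5" defaults with cube "TCGA"
aa_wdist(his, asp, weight = c(1/5, 1, 1/25), cube = "TCGA")  # 0.22
```

The sketch does not cover the stop-codon and gap symbols, whose distances follow the rule given in Details.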
Details
Only amino acid sequences written in the following alphabet are accepted: "A", "R", "N", "D", "C", "Q", "E", "G", "H", "I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V", "*", "-", and "X", where the symbols "*" and "-" denote the presence of a stop codon and of a gap, respectively, and the letter "X" denotes missing information, which is then treated as a gap.
The distance between any amino acid and any of the non-amino-acid symbols is the ceiling of the greatest distance found in the corresponding amino acid distance matrix.
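Before calling the function, it may help to check that a sequence uses only the accepted alphabet. The following base-R sketch is not part of the package: the names accepted_symbols and invalid_symbols() are hypothetical, and the stop-codon symbol is assumed to be "*", as used in the Examples.

```r
# Accepted alphabet (assumption: "*" is the stop-codon symbol)
accepted_symbols <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                      "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V",
                      "*", "-", "X")

# Return the distinct symbols of a sequence that are not in the alphabet
# (hypothetical helper).
invalid_symbols <- function(aa) {
  symbols <- strsplit(aa, "", fixed = TRUE)[[1]]
  unique(symbols[!symbols %in% accepted_symbols])
}

invalid_symbols("A*LTHMC")  # character(0)
invalid_symbols("AAMTDM-")  # character(0)
invalid_symbols("ABZ")      # "B" "Z"
```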
References
1. Sanchez R. Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric. Curr. Top. Med. Chem. 2014;14: 407–417. https://doi.org/10.2174/1568026613666131204110022.
2. M. V. Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.
3. R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
Examples
## Write down two amino acid sequences
x <- "A*LTHMC"
y <- "AAMTDM-"
aminoacid_dist(aa1 = x, aa2 = y)
#> A.A *.A L.M T.T H.D M.M C.-
#> 0.1041667 2.0000000 0.4791667 0.1041667 0.3125000 0.0000000 2.0000000
## Let's create an AAStringSet-class object
aa <- AAStringSet(c(x, y))
aminoacid_dist(aa1 = aa)
#> A.A *.A L.M T.T H.D M.M C.-
#> 0.1041667 2.0000000 0.4791667 0.1041667 0.3125000 0.0000000 2.0000000
## Let's select cube "TCGA" and group "Z5"
aminoacid_dist(aa1 = aa, group = "Z5", cube = "TCGA")
#> A.A *.A L.M T.T H.D M.M C.-
#> 0.06666667 4.00000000 0.50000000 0.06666667 0.22000000 0.00000000 4.00000000