This function applies numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids to protein-coding DNA sequences or to amino acid sequences. As a result, the sequences are represented as numerical vectors that can be used in downstream statistical analysis and digital signal processing.
Usage
peptide_phychem_index(aa, ...)
# S4 method for character
peptide_phychem_index(
aa,
acc = NULL,
aaindex = NA,
userindex = NULL,
alphabet = c("AA", "DNA"),
genetic.code = getGeneticCode("1"),
no.init.codon = FALSE,
if.fuzzy.codon = "error",
...
)
# S4 method for DNAStringSet_OR_DNAMultipleAlignment
peptide_phychem_index(
aa,
acc = NULL,
aaindex = NA,
userindex = NULL,
alphabet = c("AA", "DNA"),
genetic.code = getGeneticCode("1"),
no.init.codon = FALSE,
if.fuzzy.codon = "error",
num.cores = 1L,
tasks = 0L,
verbose = FALSE,
...
)
Arguments
- aa
A character string, a DNAStringSet, or a DNAMultipleAlignment class object carrying the DNA pairwise alignment of two sequences.
- ...
Not in use.
- acc
Accession id for a specified mutation or contact potential matrix.
- aaindex
Database where the requested accession id is located and from which the amino acid indices can be obtained. The possible values are: "aaindex2" or "aaindex3".
- userindex
User-provided amino acid indices. This can be a numerical vector or a matrix (20 x 20). If a numerical matrix is provided, then the amino acid indices are computed as column averages.
- alphabet
Whether the alphabet is the 20-letter amino acid alphabet (AA) or the four-letter DNA/RNA base alphabet (DNA). This prevents ambiguity: for example, the string "ACG" could be a base-triplet in the DNA alphabet or the amino acid sequence alanine, cysteine, glycine.
- genetic.code, no.init.codon, if.fuzzy.codon
The same as given in function translation.
- num.cores, tasks
Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e., at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).
- verbose
If TRUE, prints the function log to stdout.
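The column-averaging rule described for userindex can be illustrated with a minimal base R sketch. It does not call the package: the matrix values are arbitrary placeholders, the names `pair_matrix`, `user_index`, and `encoded` are hypothetical, and whether the function requires row and column names in this form is an assumption.

```r
# Hypothetical 20 x 20 pairwise matrix over the 20 standard amino acids.
# The values are arbitrary placeholders, not taken from any database.
aa_letters <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
set.seed(1)
pair_matrix <- matrix(runif(400), nrow = 20, ncol = 20,
                      dimnames = list(aa_letters, aa_letters))

# Collapse the matrix to one value per amino acid by averaging each column.
user_index <- colMeans(pair_matrix)

# Map the peptide "TS*S" to numbers; the stop symbol '*' has no entry,
# so name-based lookup returns NA for it.
peptide <- strsplit("TS*S", "")[[1]]
encoded <- unname(user_index[peptide])
encoded
```

The sketch only mirrors the documented description (one index per amino acid, obtained as a column average); the function's internal handling may differ.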
Value
Depending on the user specifications, a mutation or contact potential matrix, a list of the ids of available matrices (indices), or index names can be returned. More specifically:
- aa_mutmat:
Returns an amino acid mutation matrix or a statistical protein contact potentials matrix.
- aa_index:
Returns the specified amino acid physicochemical indices.
Details
If a DNA sequence is given, it is assumed to be a sequence of DNA base-triplets, i.e., its length must be a multiple of 3.
Errors can arise if the given sequences carry letters that are not from the DNA or amino acid alphabet.
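The two conditions above can be checked before calling the function. The following base R sketch is deliberately strict and is not part of the package: `is_codon_sequence` and `split_codons` are hypothetical helpers, and the function itself may treat gaps or ambiguity codes differently (for example, depending on if.fuzzy.codon).

```r
# Hypothetical helper: TRUE when a string contains only the four DNA bases
# and its length is a multiple of 3.
is_codon_sequence <- function(x) {
  x <- toupper(x)
  only_bases <- grepl("^[ACGT]+$", x)
  whole_codons <- nchar(x) %% 3 == 0
  only_bases & whole_codons
}

# Hypothetical helper: split a single string into consecutive base-triplets.
split_codons <- function(x) {
  starts <- seq(1, nchar(x), by = 3)
  substring(x, starts, starts + 2)
}

valid <- is_codon_sequence(c("ACGTCATAAAGT", "ACGTC", "ACGTNA"))
codons <- split_codons("ACGTCATAAAGT")
```

Here only the first string passes: the second has a length that is not a multiple of 3, and the third carries the letter N, which this strict check rejects.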
Author
Robersy Sanchez https://genomaths.com
Examples
## Let's create a DNAStringSet-class object
base <- DNAStringSet(x = c(seq1 = 'ACGTCATAAAGT',
                           seq2 = 'GTGTAATACAGT',
                           seq3 = 'TCCTCATAAGGT'))
## The stop codon 'TAA' yields NA
aa <- peptide_phychem_index(base, acc = "EISD840101")
aa
#> MatrixSeq with 3 rows (sequences) and 4 columns (aminoacids/codons):
#> -------
#> A1 A2 A3 A4
#> S1 -0.18 -0.26 NA -0.26
#> S2 0.54 NA 0.02 -0.26
#> S3 -0.26 -0.26 NA 0.16
#> -------
#> Slots: 'seqs', 'matrix', 'names', 'aaindex', 'phychem', 'accession
## Description of the physicochemical index
slot(aa, 'phychem')
#> [1] "EISD840101 Consensus normalized hydrophobicity scale (Eisenberg, 1984)"
## Get the aminoacid sequences. The stop codon 'TAA' is replaced by '*'.
slot(aa, 'seqs')
#> seq1 seq2 seq3
#> "TS*S" "V*YS" "SS*G"
aa <- peptide_phychem_index(base, acc = "MIYS850103", aaindex = "aaindex3")
aa
#> MatrixSeq with 3 rows (sequences) and 4 columns (aminoacids/codons):
#> -------
#> A1 A2 A3 A4
#> S1 -0.0045 0.0015 NA 0.0015
#> S2 0.0895 NA -0.0565 0.0015
#> S3 0.0015 0.0015 NA 0.0220
#> -------
#> Slots: 'seqs', 'matrix', 'names', 'aaindex', 'phychem', 'accession
## Description of the physicochemical index
slot(aa, 'phychem')
#> [1] "MIYS850103 Quasichemical energy of interactions in an average buried environment"
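As a small illustration of downstream analysis on such numerical representations, the sketch below copies the EISD840101 values printed above into a plain base R matrix and summarizes each sequence while skipping the NA entries produced by stop codons. The object names `idx` and `row_means` are hypothetical, and working on a hand-copied matrix rather than on the returned object is a simplification.

```r
# Values copied from the EISD840101 output shown above
# (rows are sequences, columns are codon positions).
idx <- matrix(c(-0.18, -0.26,   NA, -0.26,
                 0.54,    NA, 0.02, -0.26,
                -0.26, -0.26,   NA,  0.16),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("S1", "S2", "S3"),
                              c("A1", "A2", "A3", "A4")))

# Mean hydrophobicity per sequence, ignoring positions with NA (stop codons).
row_means <- rowMeans(idx, na.rm = TRUE)

# Number of positions per sequence that actually carry a value.
n_scored <- rowSums(!is.na(idx))
```

Dropping NA positions is one possible choice; depending on the analysis, removing stop codons before encoding may be preferable.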