
This function applies numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids to DNA protein-coding or aminoacid sequences. As a result, the DNA protein-coding or aminoacid sequences are represented as numerical vectors, which can be subjected to further downstream statistical analysis and digital signal processing.

Usage

peptide_phychem_index(aa, ...)

# S4 method for character
peptide_phychem_index(
  aa,
  acc = NULL,
  aaindex = NA,
  userindex = NULL,
  alphabet = c("AA", "DNA"),
  genetic.code = getGeneticCode("1"),
  no.init.codon = FALSE,
  if.fuzzy.codon = "error",
  ...
)

# S4 method for DNAStringSet_OR_DNAMultipleAlignment
peptide_phychem_index(
  aa,
  acc = NULL,
  aaindex = NA,
  userindex = NULL,
  alphabet = c("AA", "DNA"),
  genetic.code = getGeneticCode("1"),
  no.init.codon = FALSE,
  if.fuzzy.codon = "error",
  num.cores = 1L,
  tasks = 0L,
  verbose = FALSE,
  ...
)

Arguments

aa

A character string, a DNAStringSet or a DNAMultipleAlignment class object carrying the DNA pairwise alignment of two sequences.

...

Not in use.

acc

Accession id for a specified mutation or contact potential matrix.

aaindex

Database where the requested accession id is located and from which the aminoacid indices can be obtained. The possible values are: "aaindex2" or "aaindex3".

userindex

User-provided aminoacid indices. This can be a numerical vector or a matrix (20 x 20). If a numerical matrix is provided, then the aminoacid indices are computed as column averages.
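
The column-average reduction described for a matrix userindex can be illustrated in base R. This is only a sketch: the matrix toy_index holds random values, and its one-letter row and column labels are assumptions made for illustration; how the function itself expects the matrix to be labeled is not stated here.

```r
## Hypothetical 20 x 20 matrix of pairwise amino acid scores (random toy values)
aa_letters <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
set.seed(1)
toy_index <- matrix(runif(400), nrow = 20, ncol = 20,
                    dimnames = list(aa_letters, aa_letters))

## One value per amino acid: the average of each column
aa_values <- colMeans(toy_index)

## Map a peptide to a numeric vector by looking up each residue
peptide <- strsplit("ACG", "")[[1]]
unname(aa_values[peptide])
```

A plain numeric vector of 20 values would skip the averaging step and be used directly.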

alphabet

Whether the sequence is written in the 20-letter aminoacid alphabet ("AA") or in the four-letter DNA/RNA base alphabet ("DNA"). This prevents ambiguity: for example, the string "ACG" could be a base-triplet on the DNA alphabet or the amino acid sequence alanine, cysteine, glycine.

genetic.code, no.init.codon, if.fuzzy.codon

The same as given in function translation.

num.cores, tasks

Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e., at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).

verbose

If TRUE, prints the function log to stdout.

Value

Depending on the user specifications, a mutation or contact potential matrix, a list of available matrices (indices) ids or index names can be returned. More specifically:

aa_mutmat:

Returns an aminoacid mutation matrix or a statistical protein contact potentials matrix.

aa_index:

Returns the specified aminoacid physicochemical indices.

Details

If a DNA sequence is given, then it is assumed to be a DNA base-triplet sequence, i.e., the sequence length must be a multiple of 3.

Errors may arise if the given sequences carry letters that are not from the DNA or aminoacid alphabet.
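
The two conditions above can be checked before calling the function. The helper split_codons below is hypothetical (it is not part of the package) and uses base R only. It is stricter than the function may be: it accepts only the four bases A, C, G, T, whereas the handling of ambiguous codons in the function is governed by if.fuzzy.codon.

```r
## Hypothetical helper: validate a DNA string and split it into codons
split_codons <- function(dna) {
    dna <- toupper(dna)
    ## Only the four DNA bases are accepted in this sketch
    if (grepl("[^ACGT]", dna))
        stop("Sequence carries letters outside the DNA alphabet")
    ## A base-triplet sequence must have a length that is a multiple of 3
    if (nchar(dna) %% 3 != 0)
        stop("Sequence length is not a multiple of 3")
    starts <- seq(1, nchar(dna), by = 3)
    substring(dna, starts, starts + 2)
}

codons <- split_codons("ACGTCATAAAGT")
codons
#> [1] "ACG" "TCA" "TAA" "AGT"
```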

Author

Robersy Sanchez https://genomaths.com

Examples

## Let's create a DNAStringSet-class object
base <- DNAStringSet(x = c( seq1 ='ACGTCATAAAGT',
                            seq2 = 'GTGTAATACAGT',
                            seq3 = 'TCCTCATAAGGT'))

## The stop codon 'TAA' yields NA
aa <- peptide_phychem_index(base, acc = "EISD840101")
aa
#> MatrixSeq with 3 rows (sequences) and 4 columns (aminoacids/codons):
#> ------- 
#>       A1    A2   A3    A4
#> S1 -0.18 -0.26   NA -0.26
#> S2  0.54    NA 0.02 -0.26
#> S3 -0.26 -0.26   NA  0.16
#> ------- 
#> Slots: 'seqs', 'matrix', 'names', 'aaindex', 'phychem', 'accession

## Description of the physicochemical index
slot(aa, 'phychem')
#> [1] "EISD840101 Consensus normalized hydrophobicity scale (Eisenberg, 1984)"

## Get the aminoacid sequences. The stop codon 'TAA' is replaced by '*'.
slot(aa, 'seqs')
#>   seq1   seq2   seq3 
#> "TS*S" "V*YS" "SS*G" 


aa <- peptide_phychem_index(base, acc = "MIYS850103", aaindex = "aaindex3")
aa
#> MatrixSeq with 3 rows (sequences) and 4 columns (aminoacids/codons):
#> ------- 
#>         A1     A2      A3     A4
#> S1 -0.0045 0.0015      NA 0.0015
#> S2  0.0895     NA -0.0565 0.0015
#> S3  0.0015 0.0015      NA 0.0220
#> ------- 
#> Slots: 'seqs', 'matrix', 'names', 'aaindex', 'phychem', 'accession

## Description of the physicochemical index
slot(aa, 'phychem')
#> [1] "MIYS850103 Quasichemical energy of interactions in an average buried environment"
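
## A sketch of downstream handling of the NA values produced by stop codons.
## The plain matrix 'm' below is typed in by hand from the first printed
## MatrixSeq (EISD840101); with the real object, the numeric values would
## presumably be taken from the 'matrix' slot listed in the printout, which
## is an assumption not verified here.

```r
## Values copied from the EISD840101 output shown earlier
m <- matrix(c(-0.18, -0.26,   NA, -0.26,
               0.54,    NA, 0.02, -0.26,
              -0.26, -0.26,   NA,  0.16),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("S1", "S2", "S3"),
                            c("A1", "A2", "A3", "A4")))

## Positions (sequence, codon) holding a stop codon
stop_pos <- which(is.na(m), arr.ind = TRUE)
stop_pos

## Mean index value per sequence, ignoring stop codons
seq_means <- rowMeans(m, na.rm = TRUE)
seq_means
```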