DNA base/codon sequence and coordinates represented on a given Abelian group.
Source: R/get_coord.R
Given a string denoting a codon or base from the DNA (or RNA) alphabet and a genetic-code Abelian group as given in reference (1), this function returns an object of CodonGroup-class carrying the DNA base/codon sequence and coordinates represented on the given Abelian group.
Arguments
- x
An object from a BaseGroup-class, CodonGroup-class, DNAStringSet, or DNAMultipleAlignment class carrying the DNA pairwise alignment of two sequences. Objects from BaseGroup-class and CodonGroup-class are generated with the functions base_coord and codon_coord, respectively.
- ...
Not in use.
- output
See 'Value' section.
- base_seq
Logical. Whether to return the base (TRUE) or codon (FALSE) coordinates on the selected Abelian group. If codon coordinates are requested, then the number of DNA bases in the given sequences must be a multiple of 3.
- filepath
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided.
- cube
A character string denoting one of the 24 genetic-code cubes, as given in references (2, 3).
- group
A character string denoting the group representation for the given base or codon as shown in reference (1).
- start, end, chr, strand
Optional parameters required to build a GRanges-class object. If not provided, the default values given in the function definition will be used.
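The length requirement stated for base_seq (codon coordinates need sequences whose number of bases is a multiple of 3) can be checked before calling the function. The following is a minimal base-R sketch; `split_codons` is a hypothetical helper written for illustration and is not part of GenomAutomorphism.

```r
# Hypothetical helper (not part of GenomAutomorphism): verifies that every
# sequence has a length that is a multiple of 3, then splits it into codons.
split_codons <- function(seqs) {
    widths <- nchar(seqs)
    if (any(widths %% 3 != 0)) {
        stop("All sequences must have a length that is a multiple of 3.")
    }
    lapply(seqs, function(s) {
        # Start positions 1, 4, 7, ...; empty for a zero-length sequence
        starts <- 3 * seq_len(nchar(s) %/% 3) - 2
        substring(s, starts, starts + 2)
    })
}

seqs <- c(seq1 = "ACCTATGTT", seq2 = "ATCTATGTT")
codons <- split_codons(seqs)
codons$seq1
#> [1] "ACC" "TAT" "GTT"
```

A sequence such as "ACGT" (4 bases) makes the helper stop with an error, which mirrors the constraint described for codon coordinates.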
Value
An object of CodonGroup-class is returned when output = 'all'. This has two slots, the first one carrying a list of matrices and the second one carrying the codon/base sequence information. That is, if x is an object of CodonGroup-class, then the list of matrices of codon coordinates can be retrieved as x@CoordList and the information on the codon sequence as x@SeqRanges.
If output = 'matrix.list', then an object of MatrixList class is returned.
Details
The symbols '-' and 'N', usually found in DNA sequence alignments to denote gaps and missing/unknown bases, are represented by the number '-1' on Z4 and '0' on Z5. In Z64, 'NA' is returned for codons that include the symbols '-' or 'N'.
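The Z5 convention described above can be sketched in base R. The lookup table below is an assumption read off the example output on this page for cube "ACGT" (A = 1, C = 2, G = 3, T = 4); `base_to_z5` is a hypothetical helper, not a function of the package, and other cubes would need a different table.

```r
# Assumed lookup for cube "ACGT" on Z5, inferred from the example output on
# this page; gaps ('-') and unknown bases ('N') are mapped to 0.
z5_lookup <- c(A = 1, C = 2, G = 3, T = 4, "-" = 0, N = 0)

# Hypothetical helper (not part of GenomAutomorphism): converts one DNA
# string into a numeric vector of Z5 coordinates.
base_to_z5 <- function(dna) {
    bases <- strsplit(toupper(dna), "", fixed = TRUE)[[1]]
    unname(z5_lookup[bases])
}

base_to_z5("ACCTATGTTGGTATT---GCG")
#> [1] 1 2 2 4 1 4 3 4 4 3 3 4 1 4 4 0 0 0 3 2 3
```

The three zeros correspond to the gap '---' in the first aligned sequence of the example data.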
Although the CodonGroup-class objects returned by the functions codon_coord and base_coord are useful for storing genomic information, the base and codon coordinates are not given in them as numeric magnitudes. The function get_coord provides a way to get the coordinates as a numeric object while still preserving the base/codon sequence information.
Examples
## Load the package and a pairwise alignment
library(GenomAutomorphism)
data("aln", package = "GenomAutomorphism")
aln
#> DNAStringSet object of length 2:
#> width seq
#> [1] 51 ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT
#> [2] 51 ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT
## DNA base representation in the Abelian group Z5
coord <- get_coord(
    x = aln,
    cube = "ACGT",
    group = "Z5"
)
coord ## A list of vectors
#> CodonSeq object of length: 2
#> names(2): coord1 coord2
#> -------
#> Vector of length: 51
#> [1] 1 2 2 4 1 4 3 4 4 3 3 4 1 4 4 0 0 0 3 2 3 2 4 2 2 1 1 2 4 2 2 4 4 3 3 2 4 2
#> [39] 4 1 3 2 4 2 1 2 4 1 2 1 4
#> ...
#> <1 more numeric element(s)>
#> Two slots: 'CoordList' & 'SeqRanges'
#> -------
## Extract the coordinate list
coordList(coord)
#> $coord1
#> [1] 1 2 2 4 1 4 3 4 4 3 3 4 1 4 4 0 0 0 3 2 3 2 4 2 2 1 1 2 4 2 2 4 4 3 3 2 4 2
#> [39] 4 1 3 2 4 2 1 2 4 1 2 1 4
#>
#> $coord2
#> [1] 1 4 2 4 1 4 3 4 4 3 3 4 1 4 4 1 2 3 1 2 3 2 4 2 2 1 1 4 4 2 2 4 4 3 3 3 4 2
#> [39] 2 0 0 0 0 0 0 2 4 2 2 4 4
#>
## Extract the sequence list
seqRanges(coord)
#> GRanges object with 51 ranges and 2 metadata columns:
#> seqnames ranges strand | seq1 seq2
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] 1 1 + | A A
#> [2] 1 2 + | C T
#> [3] 1 3 + | C C
#> [4] 1 4 + | T T
#> [5] 1 5 + | A A
#> ... ... ... ... . ... ...
#> [47] 1 47 + | T T
#> [48] 1 48 + | A C
#> [49] 1 49 + | C C
#> [50] 1 50 + | A T
#> [51] 1 51 + | T T
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## DNA codon representation in the Abelian group Z64
coord <- get_coord(
    x = aln,
    base_seq = FALSE,
    cube = "ACGT",
    group = "Z64"
)
coord
#> CodonSeq object of length: 2
#> names(2): coord1 coord2
#> -------
#> Vector of length: 17
#> [1] 17 15 59 43 51 NA 26 53 4 53 55 41 31 33 28 52 7
#> ...
#> <1 more numeric element(s)>
#> Two slots: 'CoordList' & 'SeqRanges'
#> -------
## Extract the coordinate list
coordList(coord)
#> $coord1
#> [1] 17 15 59 43 51 NA 26 53 4 53 55 41 31 33 28 52 7
#>
#> $coord2
#> [1] 49 15 59 43 51 18 18 53 4 61 55 42 29 NA NA 53 55
#>
## Extract the sequence list
seqRanges(coord)
#> GRanges object with 17 ranges and 2 metadata columns:
#> seqnames ranges strand | seq1 seq2
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] 1 1 + | ACC ATC
#> [2] 1 2 + | TAT TAT
#> [3] 1 3 + | GTT GTT
#> [4] 1 4 + | GGT GGT
#> [5] 1 5 + | ATT ATT
#> ... ... ... ... . ... ...
#> [13] 1 13 + | TCT TCC
#> [14] 1 14 + | AGC ---
#> [15] 1 15 + | TCA ---
#> [16] 1 16 + | CTA CTC
#> [17] 1 17 + | CAT CTT
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
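As a cross-check of the Z64 output above, the printed coordinates for cube "ACGT" are reproduced by the arithmetic 4 * x1 + 16 * x2 + x3, where x1, x2, x3 are the codon bases encoded as A = 0, C = 1, G = 2, T = 3. This relation is inferred from the printed values only; whether the package computes the coordinates this way internally is an assumption, and `codon_to_z64` is a hypothetical helper, not part of GenomAutomorphism.

```r
# Assumed base encoding for cube "ACGT", inferred from the output above.
z4 <- c(A = 0, C = 1, G = 2, T = 3)

# Hypothetical helper (not part of GenomAutomorphism): maps one codon to a
# Z64 coordinate, returning NA when the codon contains '-' or 'N'.
codon_to_z64 <- function(codon) {
    b <- strsplit(toupper(codon), "", fixed = TRUE)[[1]]
    x <- unname(z4[b])  # symbols outside A, C, G, T become NA
    if (length(x) != 3 || anyNA(x)) {
        return(NA_real_)
    }
    4 * x[1] + 16 * x[2] + x[3]
}

codons <- c("ACC", "TAT", "GTT", "GGT", "ATT", "---")
vapply(codons, codon_to_z64, numeric(1), USE.NAMES = FALSE)
#> [1] 17 15 59 43 51 NA
```

These six values match the first six entries of coord1 in the output above, including the NA for the gap codon.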