Given a string denoting a codon or base from the DNA (or RNA) alphabet and a genetic-code Abelian group as given in reference (1), this function returns an object from CodonGroup-class carrying the DNA base/codon sequence and coordinates represented on the given Abelian group.

Usage

get_coord(x, ...)

# S4 method for BaseGroup_OR_CodonGroup
get_coord(x, output = c("all", "matrix.list"))

# S4 method for DNAStringSet_OR_NULL
get_coord(
  x,
  output = c("all", "matrix.list"),
  base_seq = TRUE,
  filepath = NULL,
  cube = "ACGT",
  group = "Z4",
  start = NA,
  end = NA,
  chr = 1L,
  strand = "+"
)

Arguments

x

An object from BaseGroup-class, CodonGroup-class, DNAStringSet, or DNAMultipleAlignment class carrying the DNA pairwise alignment of two sequences. Objects from BaseGroup-class and CodonGroup-class are generated with the functions base_coord and codon_coord, respectively.

...

Not in use.

output

See 'Value' section.

base_seq

Logical. Whether to return the base (TRUE) or codon (FALSE) coordinates on the selected Abelian group. If codon coordinates are requested, then the number of DNA bases in the given sequences must be a multiple of 3.

filepath

A character vector containing the path to a file in fasta format to be read. This argument must be given if x is not provided.

cube

A character string denoting one of the 24 genetic-code cubes, as given in references (2, 3).

group

A character string denoting the group representation for the given base or codon as shown in reference (1).

start, end, chr, strand

Optional parameters used to build a GRanges-class object. If not provided, the default values given in the function definition are used.
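The multiple-of-3 requirement stated for base_seq can be checked before calling get_coord with base_seq = FALSE. The following is a minimal base-R sketch, not package code: the helpers is_codon_ready and split_codons are hypothetical names introduced only for illustration, and the sequences are copied from the 'aln' alignment shown in the Examples section.

```r
## Aligned sequences copied from the 'aln' example on this page
seqs <- c(
    seq1 = "ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT",
    seq2 = "ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT"
)

## Hypothetical helper (not part of GenomAutomorphism): TRUE when every
## sequence length is a multiple of 3, i.e., it can be read as whole codons
is_codon_ready <- function(s) all(nchar(s) %% 3 == 0)

## Hypothetical helper: split one sequence into consecutive triplets
split_codons <- function(s) {
    starts <- seq(1, nchar(s), by = 3)
    substring(s, starts, starts + 2)
}

ok <- is_codon_ready(seqs)
codons <- lapply(seqs, split_codons)
```

Both sequences have 51 bases, so the check passes and each sequence splits into 17 triplets, which agrees with the 17 ranges reported by seqRanges in the codon example.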

Value

An object from CodonGroup-class is returned when output = 'all'. This object has two slots: the first carries a list of matrices and the second carries the codon/base sequence information. That is, if x is an object from CodonGroup-class, then the list of matrices of codon coordinates can be retrieved as x@CoordList and the information on the codon sequence as x@SeqRanges.

If output = 'matrix.list', then an object from MatrixList class is returned.
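The two values of output can be compared side by side. This is a hedged sketch that assumes the GenomAutomorphism package is installed and uses only the functions and dataset that appear on this page (get_coord, coordList, and the 'aln' alignment). The printed class names are not reproduced here, since they depend on the installed version.

```r
library(GenomAutomorphism)

## Pairwise alignment shipped with the package (see Examples)
data("aln", package = "GenomAutomorphism")

## output = "all": coordinates plus sequence information
full <- get_coord(x = aln, cube = "ACGT", group = "Z5", output = "all")

## output = "matrix.list": only the coordinates
mats <- get_coord(x = aln, cube = "ACGT", group = "Z5",
                  output = "matrix.list")

## Inspect what each call returned
class(full)
class(mats)

## The coordinate list of the full object has one element per sequence
names(coordList(full))
```

Requesting "matrix.list" is convenient when only the numeric coordinates are needed downstream and the GRanges annotation can be dropped.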

Details

The symbols '-' and 'N', commonly used in DNA sequence alignments to denote gaps and missing/unknown bases, are represented by the number '-1' on Z4 and '0' on Z5. On Z64, 'NA' is returned for codons that include the symbols '-' or 'N'.
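The gap convention can be illustrated with a small base-R sketch. It is not the package implementation: the lookup table z5_lookup is inferred from the Z5 output shown in the Examples section for cube = "ACGT" (A = 1, C = 2, G = 3, T = 4, gap = 0), and the helper encode_z5 is a hypothetical name used only here.

```r
## Lookup table inferred from the example output on this page
## (cube = "ACGT", group = "Z5"); an illustration, not the package code
z5_lookup <- c(A = 1, C = 2, G = 3, T = 4, "-" = 0, N = 0)

## Hypothetical helper: encode one sequence base by base
encode_z5 <- function(s) {
    bases <- strsplit(s, "")[[1]]
    unname(z5_lookup[bases])
}

## Bases 12 to 21 of the first aligned sequence, plus a trailing 'N'
z5_coord <- encode_z5("TATT---GCGN")
z5_coord
#> [1] 4 1 4 4 0 0 0 3 2 3 0

## Codons holding '-' or 'N' are the ones reported as NA on Z64
example_codons <- c("ATT", "---", "GCG", "ANT")
with_gap <- grepl("[-N]", example_codons)
with_gap
#> [1] FALSE  TRUE FALSE  TRUE
```

The first ten values agree with positions 12 to 21 of coord1 in the Z5 example, where the three gap symbols are reported as 0.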

Although the CodonGroup-class objects returned by the functions codon_coord and base_coord are useful for storing genomic information, the base and codon coordinates are not given in them as numeric magnitudes. The function get_coord provides a way to obtain the coordinates in a numeric object while still preserving the base/codon sequence information.

Examples

## Load a pairwise alignment
data("aln", package = "GenomAutomorphism")
aln
#> DNAStringSet object of length 2:
#>     width seq
#> [1]    51 ACCTATGTTGGTATT---GCGCTCCAACTCCTTGGCTCTAGCTCACTACAT
#> [2]    51 ATCTATGTTGGTATTACGACGCTCCAATTCCTTGGGTCC------CTCCTT

## DNA base representation in the Abelian group Z5
coord <- get_coord(
    x = aln,
    cube = "ACGT",
    group = "Z5"
)
coord ## A list of vectors
#> CodonSeq object of length: 2
#> names(2): coord1 coord2 
#> ------- 
#> Vector of length: 51 
#>  [1] 1 2 2 4 1 4 3 4 4 3 3 4 1 4 4 0 0 0 3 2 3 2 4 2 2 1 1 2 4 2 2 4 4 3 3 2 4 2
#> [39] 4 1 3 2 4 2 1 2 4 1 2 1 4
#> ...
#> <1 more numeric element(s)>
#> Two slots: 'CoordList' & 'SeqRanges'
#> ------- 

## Extract the coordinate list
coordList(coord)
#> $coord1
#>  [1] 1 2 2 4 1 4 3 4 4 3 3 4 1 4 4 0 0 0 3 2 3 2 4 2 2 1 1 2 4 2 2 4 4 3 3 2 4 2
#> [39] 4 1 3 2 4 2 1 2 4 1 2 1 4
#> 
#> $coord2
#>  [1] 1 4 2 4 1 4 3 4 4 3 3 4 1 4 4 1 2 3 1 2 3 2 4 2 2 1 1 4 4 2 2 4 4 3 3 3 4 2
#> [39] 2 0 0 0 0 0 0 2 4 2 2 4 4
#> 

## Extract the sequence list
seqRanges(coord)
#> GRanges object with 51 ranges and 2 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2
#>           <Rle> <IRanges>  <Rle> | <character> <character>
#>    [1]        1         1      + |           A           A
#>    [2]        1         2      + |           C           T
#>    [3]        1         3      + |           C           C
#>    [4]        1         4      + |           T           T
#>    [5]        1         5      + |           A           A
#>    ...      ...       ...    ... .         ...         ...
#>   [47]        1        47      + |           T           T
#>   [48]        1        48      + |           A           C
#>   [49]        1        49      + |           C           C
#>   [50]        1        50      + |           A           T
#>   [51]        1        51      + |           T           T
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

## DNA codon representation in the Abelian group Z64
coord <- get_coord(
    x = aln,
    base_seq = FALSE,
    cube = "ACGT",
    group = "Z64"
)
coord
#> CodonSeq object of length: 2
#> names(2): coord1 coord2 
#> ------- 
#> Vector of length: 17 
#>  [1] 17 15 59 43 51 NA 26 53  4 53 55 41 31 33 28 52  7
#> ...
#> <1 more numeric element(s)>
#> Two slots: 'CoordList' & 'SeqRanges'
#> ------- 

## Extract the coordinate list
coordList(coord)
#> $coord1
#>  [1] 17 15 59 43 51 NA 26 53  4 53 55 41 31 33 28 52  7
#> 
#> $coord2
#>  [1] 49 15 59 43 51 18 18 53  4 61 55 42 29 NA NA 53 55
#> 

## Extract the sequence list
seqRanges(coord)
#> GRanges object with 17 ranges and 2 metadata columns:
#>        seqnames    ranges strand |        seq1        seq2
#>           <Rle> <IRanges>  <Rle> | <character> <character>
#>    [1]        1         1      + |         ACC         ATC
#>    [2]        1         2      + |         TAT         TAT
#>    [3]        1         3      + |         GTT         GTT
#>    [4]        1         4      + |         GGT         GGT
#>    [5]        1         5      + |         ATT         ATT
#>    ...      ...       ...    ... .         ...         ...
#>   [13]        1        13      + |         TCT         TCC
#>   [14]        1        14      + |         AGC         ---
#>   [15]        1        15      + |         TCA         ---
#>   [16]        1        16      + |         CTA         CTC
#>   [17]        1        17      + |         CAT         CTT
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths