Statistic of Genomic Regions

A function to compute summary statistics for a specified variable stored in a GRanges object (a column from the metacolumns of the GRanges object) after splitting the GRanges object into intervals.

getGRangesStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  stat2 = NULL,
  stat3 = NULL,
  column = NULL,
  absolute = FALSE,
  absolute2 = FALSE,
  absolute3 = FALSE,
  select.strand = NULL,
  maxgap = -1L,
  minoverlap = 0L,
  select = "all",
  ignore.strand = TRUE,
  type = c("within", "start", "end", "equal", "any"),
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  naming = FALSE,
  na.rm = TRUE,
  num.cores = 1L,
  tasks = 0L,
  verbose = TRUE,
  ...
)

# S4 method for pDMP
getGRangesStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  stat2 = NULL,
  stat3 = NULL,
  column = NULL,
  absolute = FALSE,
  absolute2 = FALSE,
  absolute3 = FALSE,
  select.strand = NULL,
  maxgap = -1L,
  minoverlap = 0L,
  select = "all",
  ignore.strand = TRUE,
  type = c("within", "start", "end", "equal", "any"),
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  naming = FALSE,
  na.rm = TRUE,
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

# S4 method for InfDiv
getGRangesStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  stat2 = NULL,
  stat3 = NULL,
  column = NULL,
  absolute = FALSE,
  absolute2 = FALSE,
  absolute3 = FALSE,
  select.strand = NULL,
  maxgap = -1L,
  minoverlap = 0L,
  select = "all",
  ignore.strand = TRUE,
  type = c("within", "start", "end", "equal", "any"),
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  naming = FALSE,
  na.rm = TRUE,
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

# S4 method for list
getGRangesStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  stat2 = NULL,
  stat3 = NULL,
  column = NULL,
  absolute = FALSE,
  absolute2 = FALSE,
  absolute3 = FALSE,
  select.strand = NULL,
  maxgap = -1L,
  minoverlap = 0L,
  select = "all",
  ignore.strand = TRUE,
  type = c("within", "start", "end", "equal", "any"),
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  naming = FALSE,
  na.rm = TRUE,
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

Arguments

GR

A GRanges-class object or a GRangesList-class object carrying the variables of interest in the GRanges metacolumn(s).

win.size

An integer giving the size of the windows/regions into which the genomic regions are split.

step.size

Interval at which the regions/windows must be defined.

grfeatures

A GRanges object corresponding to an annotated genomic feature. For example, gene regions, transposable elements, exons, intergenic regions, etc. If provided, then parameters 'win.size' and 'step.size' are ignored and the statistics are estimated for 'grfeatures'.

stat

Statistic used to estimate the summarized value of the variable of interest in each interval/window. Possible options are:

'mean':: The mean of values inside each region.
'gmean':: The geometric mean of values inside each region.
'median':: The median of values inside each region.
'density':: The density of values inside each region. That is, the sum of values found in each region divided by the width of the region.
'count':: The number/count of positions with values greater than zero inside each region.
'denCount':: The number of sites with value > 0 inside each region divided by the width of the region.
'sum':: The sum of values inside each region.

If GR has no metacolumns, then stat is set to "count" and all the sites are included in the computation.
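
As an illustration only, the following base R sketch computes each of the summaries listed above by hand for a single window. The vector x, the width w, and the helper window_stats are hypothetical names introduced for this example and are not part of the package. Taking the geometric mean as exp(mean(log(x))) is one common definition and is an assumption here; the package may treat zeros differently.

```r
# Hypothetical values of the variable of interest at the sites that fall
# inside one window, and the window width in bp (both made up).
x <- c(0.2, 0.5, 0.8, 0.1)
w <- 4

# Illustrative helper (not part of the package).
window_stats <- function(x, width) {
    c(
        sum      = sum(x),
        mean     = mean(x),
        gmean    = exp(mean(log(x))),  # assumed definition; needs x > 0
        median   = median(x),
        density  = sum(x) / width,     # sum of values per bp
        count    = sum(x > 0),         # sites with value > 0
        denCount = sum(x > 0) / width  # sites with value > 0 per bp
    )
}

st <- window_stats(x, w)
print(round(st, 4))
```

Note that 'density' and 'denCount' are the only summaries that depend on the window width, so they are the ones to prefer when regions of different widths (for example, annotated features) are compared.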

stat2, stat3

The same as for the 'stat' argument. If provided, the statistics selected in 'stat2' and 'stat3' will also be reported.

column

Integer number denoting the column where the variable of interest is located in the metacolumn of the GRanges object. Default is 1L if the number of columns is greater than 1, otherwise NULL.

absolute

Optional. Logical (default: FALSE). Whether to use the absolute values of the variable provided. For example, the difference of methylation levels (TV) could take negative values, and we may be interested in the sum of abs(TV), which is the sum of the total variation distance.
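
A small base R sketch of why this matters, using made-up methylation-level differences; the vector tv is hypothetical and the code only mirrors the arithmetic described above, not the package internals.

```r
# Hypothetical methylation-level differences (TV) at five sites in a region.
tv <- c(-0.4, 0.3, -0.2, 0.1, 0.2)

# Signed differences partly cancel each other out.
signed_sum <- sum(tv)

# Sum of abs(TV): the quantity described for absolute = TRUE.
abs_sum <- sum(abs(tv))

print(c(signed = signed_sum, absolute = abs_sum))
```

With signed values, a region with large gains and large losses can look unchanged; summing absolute values keeps both contributions.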

absolute2, absolute3

The same as for 'absolute' argument, but applied when 'stat2' and 'stat3' are not null, respectively.

select.strand

Optional. If provided ('+' or '-'), then the summarized statistic is computed only for the specified DNA strand.

maxgap, minoverlap, type

See findOverlaps-methods in the IRanges package for a description of these arguments.

ignore.strand

When set to TRUE, the strand information is ignored in the overlap calculations.

scaling

Integer (default: 1000L). Scaling factor to be used when stat = 'density'. For example, if scaling = 1000, then density * scaling denotes the sum of values in 1000 bp.
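
The arithmetic can be checked with a short base R sketch. The numbers and variable names below are hypothetical and chosen only to make the scaling easy to follow.

```r
region_sum   <- 12     # hypothetical sum of values inside a region
region_width <- 4000   # hypothetical region width in bp
scaling      <- 1000   # report the density per 1000 bp

# Density: sum of values divided by the width of the region.
density <- region_sum / region_width

# Scaled density: sum of values per 1000 bp.
scaled_density <- density * scaling
print(scaled_density)
```

Here a sum of 12 over 4000 bp corresponds to 3 per 1000 bp.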

logbase

A positive number: the base with respect to which logarithms are computed when parameter 'entropy = TRUE' (default: logbase = 2).

missings

Whether to write '0' or 'NA' in regions where there are no data to compute the statistic.

naming

Logical value. If TRUE, the rows of the returned GRanges object will be given the names(grfeatures). Default is FALSE.

na.rm

Logical value. If TRUE, the NA values will be removed.

num.cores, tasks

Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e. at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS). Only used when GR is a list of GRanges-class objects.

verbose

Logical. Default is TRUE. If TRUE, then the progress of the computational tasks is given.

Value

A GRanges-class object or a GRangesList-class object with the new genomic regions and their corresponding summarized statistic.

Details

This function splits a GRanges object into genomic regions (GRs) of fixed size. A summarized statistic (mean, median, geometric mean or sum) is calculated for the specified variable values from each region. Notice that if win.size == step.size, then non-overlapping windows are obtained.
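
The relation between win.size and step.size can be sketched in base R as follows. The helper make_windows and the vector pos are hypothetical and only illustrate the windowing idea; this is not the package's implementation.

```r
# Illustrative helper (not part of the package): fixed-size windows that
# start every `step.size` positions between the first and the last site.
make_windows <- function(pos, win.size, step.size) {
    starts <- seq(min(pos), max(pos), by = step.size)
    data.frame(start = starts, end = starts + win.size - 1)
}

pos <- 1:5  # hypothetical site positions on one chromosome

# win.size == step.size: consecutive windows do not overlap.
nonoverlapping <- make_windows(pos, win.size = 4, step.size = 4)
print(nonoverlapping)

# step.size < win.size: consecutive windows share positions.
sliding <- make_windows(pos, win.size = 4, step.size = 2)
print(sliding)
```

For sites at positions 1 to 5, the first call yields the windows 1-4 and 5-8, while the second yields 1-4, 3-6 and 5-8, where each window overlaps the next by two positions.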

Author

Robersy Sanchez (https://github.com/genomaths).

Examples

library(GenomicRanges)
set.seed(1)
gr <- GRanges(seqnames = Rle( c('chr1', 'chr2', 'chr3', 'chr4'),
            c(5, 5, 5, 5)),
            ranges = IRanges(start = 1:20, end = 1:20),
            strand = rep(c('+', '-'), 10),
            A = seq(1, 0, length = 20))
gr$B <- runif(20)
grs <- getGRangesStat(gr, win.size = 4, step.size = 4)
grs
#> GRanges object with 8 ranges and 1 metadata column:
#>       seqnames    ranges strand |       sum
#>          <Rle> <IRanges>  <Rle> | <numeric>
#>   [1]     chr1       1-4      * |  3.684211
#>   [2]     chr1       5-8      * |  0.789474
#>   [3]     chr2       6-9      * |  2.631579
#>   [4]     chr2     10-13      * |  0.526316
#>   [5]     chr3     11-14      * |  1.578947
#>   [6]     chr3     15-18      * |  0.263158
#>   [7]     chr4     16-19      * |  0.526316
#>   [8]     chr4     20-23      * |  0.000000
#>   -------
#>   seqinfo: 4 sequences from an unspecified genome; no seqlengths

## Selecting the positive strand
grs <- getGRangesStat(gr, win.size = 4,
             step.size = 4, select.strand = '+')
grs
#> GRanges object with 4 ranges and 1 metadata column:
#>       seqnames    ranges strand |       sum
#>          <Rle> <IRanges>  <Rle> | <numeric>
#>   [1]     chr1       1-4      + |  1.894737
#>   [2]     chr1       5-8      + |  0.789474
#>   [3]     chr3     11-14      + |  0.842105
#>   [4]     chr3     15-18      + |  0.263158
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

## Selecting the negative strand
grs <- getGRangesStat(gr, win.size = 4,
             step.size = 4, select.strand = '-')
grs
#> GRanges object with 4 ranges and 1 metadata column:
#>       seqnames    ranges strand |       sum
#>          <Rle> <IRanges>  <Rle> | <numeric>
#>   [1]     chr2       6-9      - |  1.368421
#>   [2]     chr2     10-13      - |  0.526316
#>   [3]     chr4     16-19      - |  0.315789
#>   [4]     chr4     20-23      - |  0.000000
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths
