Estimates summary statistics of a specified variable (a column from the metacolumns of a GRanges object) after splitting the GRanges object into intervals.
getGRangesStat(
GR,
win.size = 1,
step.size = 1,
grfeatures = NULL,
stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
stat2 = NULL,
stat3 = NULL,
column = NULL,
absolute = FALSE,
absolute2 = FALSE,
absolute3 = FALSE,
select.strand = NULL,
maxgap = -1L,
minoverlap = 0L,
select = "all",
ignore.strand = TRUE,
type = c("within", "start", "end", "equal", "any"),
scaling = 1000L,
logbase = 2,
missings = 0,
naming = FALSE,
na.rm = TRUE,
num.cores = 1L,
tasks = 0L,
verbose = TRUE,
...
)
# S4 method for pDMP
getGRangesStat(
GR,
win.size = 1,
step.size = 1,
grfeatures = NULL,
stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
stat2 = NULL,
stat3 = NULL,
column = NULL,
absolute = FALSE,
absolute2 = FALSE,
absolute3 = FALSE,
select.strand = NULL,
maxgap = -1L,
minoverlap = 0L,
select = "all",
ignore.strand = TRUE,
type = c("within", "start", "end", "equal", "any"),
scaling = 1000L,
logbase = 2,
missings = 0,
naming = FALSE,
na.rm = TRUE,
num.cores = 1L,
tasks = 0,
verbose = TRUE,
...
)
# S4 method for InfDiv
getGRangesStat(
GR,
win.size = 1,
step.size = 1,
grfeatures = NULL,
stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
stat2 = NULL,
stat3 = NULL,
column = NULL,
absolute = FALSE,
absolute2 = FALSE,
absolute3 = FALSE,
select.strand = NULL,
maxgap = -1L,
minoverlap = 0L,
select = "all",
ignore.strand = TRUE,
type = c("within", "start", "end", "equal", "any"),
scaling = 1000L,
logbase = 2,
missings = 0,
naming = FALSE,
na.rm = TRUE,
num.cores = 1L,
tasks = 0,
verbose = TRUE,
...
)
# S4 method for list
getGRangesStat(
GR,
win.size = 1,
step.size = 1,
grfeatures = NULL,
stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
stat2 = NULL,
stat3 = NULL,
column = NULL,
absolute = FALSE,
absolute2 = FALSE,
absolute3 = FALSE,
select.strand = NULL,
maxgap = -1L,
minoverlap = 0L,
select = "all",
ignore.strand = TRUE,
type = c("within", "start", "end", "equal", "any"),
scaling = 1000L,
logbase = 2,
missings = 0,
naming = FALSE,
na.rm = TRUE,
num.cores = 1L,
tasks = 0,
verbose = TRUE,
...
)
A GRanges-class object or a GRangesList-class object carrying the variables of interest in the GRanges metacolumn(s).
An integer giving the size of the windows/regions into which the genomic regions are split.
An integer giving the interval (step) at which consecutive regions/windows are defined.
A GRanges object corresponding to an annotated genomic feature, for example gene regions, transposable elements, exons, intergenic regions, etc. If provided, the parameters 'win.size' and 'step.size' are ignored and the statistics are estimated for 'grfeatures'.
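To make the idea of summarizing by annotated features concrete, here is a minimal base R sketch that sums the values of single-base sites falling within each feature. The data frames `sites` and `features`, the feature names, and the single-chromosome setting are hypothetical and chosen only for illustration; the package itself works on GRanges objects and relies on overlap methods rather than this loop.

```r
# Hypothetical single-base sites on one chromosome, with one value each
sites <- data.frame(pos = c(2, 5, 7, 12, 15),
                    value = c(0.2, 0.4, 0.1, 0.9, 0.3))

# Hypothetical annotated features (e.g. two gene regions)
features <- data.frame(name = c("geneA", "geneB"),
                       start = c(1, 10),
                       end = c(8, 16))

# Sum the values of the sites lying within each feature
feature_sum <- vapply(seq_len(nrow(features)), function(i) {
  inside <- sites$pos >= features$start[i] & sites$pos <= features$end[i]
  sum(sites$value[inside])
}, numeric(1))
names(feature_sum) <- features$name

print(feature_sum)
```

Each feature receives one summarized value, whatever its width, which is why window and step sizes play no role in this mode.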
Statistic used to estimate the summarized value of the variable of interest in each interval/window. Possible options are:
"mean": The mean of values inside each region.
"gmean": The geometric mean of values inside each region.
"median": The median of values inside each region.
"density": The density of values inside each region. That is, the sum of values found in each region divided by the width of the region.
"count": The number of positions with values greater than zero inside each region.
"denCount": The number of sites with value > 0 inside each region divided by the width of the region.
"sum": The sum of values inside each region.
If GR has no metacolumns, then stat is set to "count" and all the sites are included in the computation.
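As a rough illustration of these options, the following base R sketch computes each statistic for the values found in a single hypothetical region. The names `x` and `width` are illustrative, and the geometric mean formula used here (exponential of the mean log of the strictly positive values) is an assumption; the package may handle zeros differently.

```r
# Hypothetical values of the variable of interest at the sites of one region
x <- c(0.5, 0, 2, 8)
width <- 4  # region width in bp (assumed)

stats <- c(
  sum      = sum(x),
  mean     = mean(x),
  # Geometric mean over strictly positive values only (an assumption)
  gmean    = exp(mean(log(x[x > 0]))),
  median   = median(x),
  density  = sum(x) / width,      # sum of values per bp
  count    = sum(x > 0),          # number of sites with value > 0
  denCount = sum(x > 0) / width   # sites with value > 0 per bp
)

print(stats)
```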
The same as for the 'stat' argument. If provided, the statistics selected in 'stat2' and 'stat3' will also be reported.
Integer number denoting the column where the variable of interest is located in the metacolumn of the GRanges object. Default is 1L if the number of columns is greater than 1, otherwise NULL.
Optional. Logical (default: FALSE). Whether to use the absolute values of the variable provided. For example, the difference of methylation levels (TV) could take negative values, and we may be interested in the sum of abs(TV), which is the sum of the total variation distance.
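The effect of taking absolute values before summarizing can be seen in a small base R sketch. The vector `tv` holds hypothetical methylation level differences for one region; it is not data from the package.

```r
# Hypothetical methylation level differences (TV) at four sites of a region
tv <- c(-0.4, 0.3, -0.1, 0.2)

# Signed sum: negative and positive differences cancel each other out
sum_signed <- sum(tv)

# Sum of absolute values: every difference contributes to the total,
# analogous to what absolute = TRUE with stat = "sum" is described to do
sum_absolute <- sum(abs(tv))

print(c(signed = sum_signed, absolute = sum_absolute))
```

In this example the signed sum is essentially zero while the sum of absolute differences is 1, so a region with strong but opposite changes is not mistaken for an unchanged one.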
The same as for 'absolute' argument, but applied when 'stat2' and 'stat3' are not null, respectively.
Optional. If provided ('+' or '-'), the summarized statistic is computed only for the specified DNA strand.
See findOverlaps-methods in the IRanges package for a description of these arguments.
When set to TRUE, the strand information is ignored in the overlap calculations.
Integer (default 1000L). Scaling factor to be used when stat = 'density'. For example, if scaling = 1000, then density * scaling denotes the sum of values per 1000 bp.
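The arithmetic behind the scaling factor can be sketched in base R as follows. The values and the region width are hypothetical; only the relation density * scaling comes from the description above.

```r
values  <- c(1, 3, 2)  # hypothetical values found in one region
width   <- 500         # hypothetical region width in bp
scaling <- 1000L

# Density: sum of values divided by the width of the region (per bp)
density <- sum(values) / width

# Scaled density: sum of values per 1000 bp
scaled <- density * scaling

print(c(density = density, scaled = scaled))
```

Scaling makes densities from regions of different widths easier to read and compare, since all of them are expressed per the same number of base pairs.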
A positive number: the base with respect to which logarithms are computed when parameter 'entropy = TRUE' (default: logbase = 2).
Whether to write '0' or 'NA' in regions where there are no data to compute the statistic.
Logical value. If TRUE, the rows of the returned GRanges object will be given the names(grfeatures). Default is FALSE.
Logical value. If TRUE, the NA values will be removed.
Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e. at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS). Only used when GR is a list of GRanges-class objects.
Logical. Default is TRUE. If TRUE, then the progress of the computational tasks is given.
A GRanges-class object or a GRangesList-class object with the new genomic regions and their corresponding summarized statistic.
This function splits a GRanges object into genomic regions (GRs) of fixed size. A summarized statistic (mean, median, geometric mean or sum) is calculated for the specified variable values from each region. Notice that if win.size == step.size, then non-overlapping windows are obtained.
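The relation between win.size and step.size can be checked with a small base R sketch. The helper `make_windows` and the check `has_overlap` are hypothetical and not part of the package; in particular, how the final (possibly partial) window is handled here is an assumption.

```r
# Hypothetical helper: window coordinates for sites spanning from..to
make_windows <- function(from, to, win.size, step.size) {
  starts <- seq(from, to, by = step.size)
  data.frame(start = starts, end = starts + win.size - 1)
}

# Consecutive windows overlap when a window ends at or after the next start
has_overlap <- function(w) any(head(w$end, -1) >= tail(w$start, -1))

# win.size == step.size: windows tile the region without overlapping
tiles <- make_windows(1, 20, win.size = 4, step.size = 4)

# step.size < win.size: consecutive windows share positions
sliding <- make_windows(1, 20, win.size = 4, step.size = 2)

print(has_overlap(tiles))    # FALSE
print(has_overlap(sliding))  # TRUE
```

With a step smaller than the window size, each site contributes to more than one window, which smooths the summarized signal at the cost of correlated neighboring values.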
library(GenomicRanges)
set.seed(1)
gr <- GRanges(seqnames = Rle( c('chr1', 'chr2', 'chr3', 'chr4'),
c(5, 5, 5, 5)),
ranges = IRanges(start = 1:20, end = 1:20),
strand = rep(c('+', '-'), 10),
A = seq(1, 0, length.out = 20))
gr$B <- runif(20)
grs <- getGRangesStat(gr, win.size = 4, step.size = 4)
grs
#> GRanges object with 8 ranges and 1 metadata column:
#> seqnames ranges strand | sum
#> <Rle> <IRanges> <Rle> | <numeric>
#> [1] chr1 1-4 * | 3.684211
#> [2] chr1 5-8 * | 0.789474
#> [3] chr2 6-9 * | 2.631579
#> [4] chr2 10-13 * | 0.526316
#> [5] chr3 11-14 * | 1.578947
#> [6] chr3 15-18 * | 0.263158
#> [7] chr4 16-19 * | 0.526316
#> [8] chr4 20-23 * | 0.000000
#> -------
#> seqinfo: 4 sequences from an unspecified genome; no seqlengths
## Selecting the positive strand
grs <- getGRangesStat(gr, win.size = 4,
step.size = 4, select.strand = '+')
grs
#> GRanges object with 4 ranges and 1 metadata column:
#> seqnames ranges strand | sum
#> <Rle> <IRanges> <Rle> | <numeric>
#> [1] chr1 1-4 + | 1.894737
#> [2] chr1 5-8 + | 0.789474
#> [3] chr3 11-14 + | 0.842105
#> [4] chr3 15-18 + | 0.263158
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
## Selecting the negative strand
grs <- getGRangesStat(gr, win.size = 4,
step.size = 4, select.strand = '-')
grs
#> GRanges object with 4 ranges and 1 metadata column:
#> seqnames ranges strand | sum
#> <Rle> <IRanges> <Rle> | <numeric>
#> [1] chr2 6-9 - | 1.368421
#> [2] chr2 10-13 - | 0.526316
#> [3] chr4 16-19 - | 0.315789
#> [4] chr4 20-23 - | 0.000000
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths