R: Linear density of DMPs at a given genomic region

R Documentation

Linear density of DMPs at a given genomic region

Description

The linear density of DMPs in a given genomic region (GR) is defined according to the classical terminology in physics, i.e., as the measure of the quantity of any characteristic value per unit of length. In the current case, it is the number of DMPs per nucleotide base.

Usage

dmpDensity(GR, column = 1, cut.col = 1, cutoff, Chr = NULL,
  start.pos = NULL, end.pos = NULL, int.size1 = NULL,
  int.size2 = NULL, breaks = NULL, scaling = TRUE, plot = FALSE,
  noDMP.dens = TRUE, xlabel = "Coordinate",
  ylabel = "Normalized density", col.dmp = "red", col.ndmp = "blue",
  yintercept = 0.25, col.yintercept = "magenta",
  type.yintercept = "dashed", dig.lab = 3)

Arguments

`GR`	A genomic GRanges object carrying the genomic region where the estimation of the DMP density will be accomplished.
`cut.col`	Integer denoting the GR metacolumn where the decision variable about whether a position is DMP is located. Default cut.col = 1.
`cutoff`	Cut value used to decide whether the value of the variable used to estimate the density is a DMP at each position. If missing, then cutoff is estimated as the first quantile greater than zero from the values given in the GR column cut.col.
`Chr`	A character string. Default NULL. If the GR object comprises several chromosomes, then one chromosome must be specified. Otherwise the density of the first chromosome will be returned.
`start.pos, end.pos`	Start and end positions, respectively, of the GR where the density of DMPs will be estimated. Default NULL. If NULL, densities will be estimated for the whole GR and the specified chromosome.
`int.size1, int.size2`	The interval/window sizes where the densities of DMPs and no-DMPs are computed. Default NULL.
`breaks`	Integer. Number of windows/intervals to split the GR. Default NULL. If provided, then it is applied to compute the densities of DMPs and no-DMPs. If 'int.size1', 'int.size2', and 'breaks' are NULL, then the breaks are computed as: `breaks <- min(150, max(start(x))/nclass.FD(start(x)), na.rm = TRUE)`, where the function nclass.FD (see `nclass`) applies the Freedman-Diaconis algorithm.
`scaling`	Logical value to decide whether to perform the scaling of the estimated density values or not. Default is TRUE.
`plot`	Logical. Whether to produce a graphic or not. Default, plot = FALSE.
`noDMP.dens`	Logical. Whether to produce the graphic for the no-DMP density. Default is TRUE.
`xlabel`	X-axis label. Default xlabel = "Coordinate".
`ylabel`	Y-axis label. Default ylabel = "Normalized density".
`col.dmp`	Color for the density of DMPs in the graphic.
`col.ndmp`	Color for the density of no DMPs in the graphic.
`yintercept`	If plot == TRUE, this is the position of a horizontal line that intercepts the y-axis. Default yintercept = 0.25.
`col.yintercept`	Color for the horizontal line 'yintercept'. Default col.yintercept = "magenta".
`type.yintercept`	Line type for the horizontal line 'yintercept'. Default type.yintercept = "dashed".
`dig.lab`	Integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.
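
The defaults described for 'cutoff' and 'breaks' can be illustrated with a minimal base R sketch. This is not the package's internal code. The vectors `x` and `starts` are hypothetical stand-ins for the GR metacolumn and for `start(GR)`, and the use of the default probabilities of `quantile()` is an assumption, since the probabilities are not stated above.

```r
set.seed(1)
## Hypothetical decision variable: many zeros plus some positive values
x <- c(rep(0, 60), runif(40))

## Assumption: "first quantile greater than zero" is taken over the default
## quantile() probabilities (0%, 25%, 50%, 75%, 100%)
q <- quantile(x)
cutoff <- unname(q[q > 0][1])

## Hypothetical start coordinates standing in for start(GR)
starts <- seq(1, 30000, by = 3)

## Default number of breaks, following the expression given for 'breaks'.
## nclass.FD() implements the Freedman-Diaconis rule.
breaks <- min(150, max(starts) / grDevices::nclass.FD(starts), na.rm = TRUE)

cutoff
breaks
```

Note that the expression caps the number of breaks at 150, so long regions are never split into more than 150 windows by default.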

Details

Since the number of DMPs along the DNA sequence varies, the local density of DMPs ρ_i at a fixed interval Δ l_i is defined by the quotient ρ_i = Δ DMP_i/Δ l_i, where Δ DMP_i is the number of DMPs at the fixed interval. Likewise, the local density of non-DMPs is defined as ρ_i = Δ nonDMP_i/Δ l_i. Notice that for a specified methylation context, e.g., CG, Δ nonDMP_i = Δ CG_i - Δ DMP_i, where Δ CG_i is the number of CG positions at the given interval. The linear densities are normalized as ρ_i/ρ_max, where ρ_max is the maximum linear density found in a given GR.
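
The definitions above can be sketched in base R on simulated positions. This is an illustration only, not the implementation used by dmpDensity. The names `pos`, `is.dmp`, and `delta.l` are hypothetical, and normalizing each series by its own maximum is an assumption about how ρ_max is chosen.

```r
set.seed(1)
## Hypothetical inputs: coordinates of cytosines in one context and a
## logical flag marking which of them are DMPs
pos    <- sort(sample.int(30000, 3000))
is.dmp <- runif(length(pos)) < 0.2

## Fixed-size windows of length delta.l covering the region
delta.l <- 1000
edges   <- seq(0, 30000, by = delta.l)
window  <- cut(pos, breaks = edges, dig.lab = 6)

## Counts per window: Delta DMP_i and Delta nonDMP_i
n.dmp    <- as.vector(table(window[is.dmp]))
n.nondmp <- as.vector(table(window[!is.dmp]))

## Local linear densities rho_i = count_i / delta.l
rho.dmp    <- n.dmp / delta.l
rho.nondmp <- n.nondmp / delta.l

## Normalization rho_i / rho_max (assumption: one maximum per series)
dens <- data.frame(window = levels(window),
                   dmp    = rho.dmp / max(rho.dmp),
                   nondmp = rho.nondmp / max(rho.nondmp))
head(dens)
```

Because each window has the same length here, the normalized densities depend only on the counts per window. With unequal windows, the division by Δ l_i would matter.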

Value

If plot is TRUE, a graphic with the densities of DMPs and no-DMPs is returned. If plot is FALSE, a data frame with the densities of DMPs and no-DMPs is returned.

Author(s)

Robersy Sanchez

Examples

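## GenomicRanges provides GRanges() and mcols(), and attaches IRanges
library(GenomicRanges)
library(MethylIT.utils)

set.seed(349)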
## An auxiliary function to generate simulated hypothetical values from a
## variable with normal distribution
hypDT <- function(mean, sd, n, num.pos, noise) {
    h <- hist(rnorm(n, mean = mean, sd = sd), breaks = num.pos, plot = FALSE)
    hyp <- h$density * 60 + runif(length(h$density)) * noise
    return(hyp)
}

## Generate a vector of values with variations introduced by noise
hyp <- hypDT(mean = 5, sd = 30, n = 10^5, noise = 4, num.pos = 8000)
## A GRanges object is built, which will carry the previous values in its
## meta-columns
l <- length(hyp)
starts <- seq(0, 30000, 3)[1:l]
ends <- starts
GR <- GRanges(seqnames = "chr1", ranges = IRanges(start = starts,
                end = ends))
mcols(GR) <- data.frame(signal = hyp)

# If plot is TRUE a graphic is printed. Otherwise a data frame is returned.
p <- dmpDensity(GR, plot = FALSE)

# If the ggplot2 package is installed, then the graphic can be customized
# using the returned data frame 'p':

# library(ggplot2)
## Auxiliary function to write scientific notation in the graphics
# fancy_scientific <- function(l) {
#   #'turn into a character string in scientific notation
#   l <- format( l, scientific = TRUE, digits = 1 )
#   l <- gsub("0e\\+00","0",l)
#   #'quote the part before the exponent to keep all the digits
#   l <- gsub("^(.*)e", "'\\1'e", l)
#   #'turn the 'e+' into plotmath format
#   l <- gsub("e", "%*%10^", l)
#   l <- gsub("[+]", "", l )
#   #'return this as an expression
#   parse(text=l)
# }
#
# max.pos = max(p$DMP.coordinate)
# ggplot(data=p) +
#   geom_line(aes(x=DMP.coordinate, y=DMP.density), color="red") +
#   geom_hline(aes(yintercept=0.25), linetype="dashed",
#              colour="blue", show.legend=FALSE ) +
#   geom_line(aes(x=coordinate, y=density), color="blue") +
#   xlab("Coordinate") + ylab("Normalized density") +
#   scale_y_continuous(breaks=c(0.00, 0.25, 0.50, 0.75, 1.00)) +
#   scale_x_continuous(breaks=c(0.00, 0.25 *max.pos, 0.50*max.pos,
#                               0.75*max.pos, max.pos),
#                      labels = fancy_scientific) +
#   expand_limits(y=0)

[Package MethylIT.utils version 0.3.1 ]