This function reads files containing methylation count tables, such as those commonly generated after the alignment of BS-seq data or found in the GEO database.
readCounts2GRangesList(
filenames = NULL,
sample.id = NULL,
pattern = NULL,
remove = FALSE,
columns = c(seqnames = NULL, start = NULL, end = NULL, strand = NULL, fraction = NULL,
percent = NULL, mC = NULL, uC = NULL, coverage = NULL, context = NULL, signal = NULL),
chromosome.names = NULL,
chromosomes = NULL,
verbose = TRUE,
...
)
Character vector with the file names
Character vector with the names of the samples corresponding to each file
Chromosome name pattern. On Linux, users can restrict reading to specific lines of each file by supplying a regular expression.
Logical (default: FALSE). Supplementary files from GEO datasets are usually 'gz' compressed and must be decompressed before they can be read. If set to TRUE, the decompressed files are removed after reading.
Named vector of integers giving the table columns to read. The column numbers for 'seqnames' (chromosomes), 'start', and 'end' (if different from 'start') must be given. The possible fields are: 'seqnames' (chromosomes), 'start', 'end', 'strand', 'fraction', 'percent' (methylation percentage), 'mC' (methylated cytosine), 'uC' (non-methylated cytosine), 'coverage', and 'context' (methylation context). The files do not need to contain these column headers. An optional column named 'signal' can be used to include relevant information about the methylation signal.
If provided, the chromosome names of each GRanges object are replaced with those given in 'chromosome.names' by applying 'seqlevels(x) <- chromosome.names'. This option gives access to the full functionality of the 'seqlevels' function from the 'GenomeInfoDb' package, which can rename, add, and reorder the seqlevels all at once (see ?seqlevels).
If provided, it must be a character vector with the names of the chromosomes that you want to include in the final GRanges objects.
If TRUE, prints the function log to stdout
Additional parameters passed to the 'fread' function from the 'data.table' package
A list of GRanges objects
Reads methylation count tables from files using the 'fread' function from the 'data.table' package and yields a list of GRanges objects with the information provided.
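The role of the named integer vector passed as 'columns' can be illustrated with a small base R sketch. This is only an illustration of the idea (select columns by position, then label them by field name), not the function's actual implementation; the toy table, the object names 'tab' and 'sel', and the rule that 'end' falls back to 'start' are assumptions made for the example.

```r
## Illustrative only: mimic how a named integer vector such as 'columns'
## can select and label table columns. This is NOT the package's code.
txt <- c("chr pos strand ratio context CT",
         "chr1 1 + 0.9 CG 20",
         "chr1 2 - 0.5 CG 30")
tab <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

## Field name = column position in the file
columns <- c(seqnames = 1, start = 2, strand = 3,
             fraction = 4, context = 5, coverage = 6)

## Pick the requested columns by position and rename them by field name
sel <- tab[, columns, drop = FALSE]
names(sel) <- names(columns)

## Assumption for this sketch: 'end' equals 'start' when no separate
## 'end' column is requested
if (!"end" %in% names(sel)) sel$end <- sel$start

str(sel)
```

Because the fields are matched by position, the headers in the file itself ('chr', 'pos', 'ratio', 'CT') do not need to match the field names.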
## Create a small coverage ('cov') file to read back in
filename <- './file.cov'
gr1 <- data.frame(chr = c('chr1', 'chr1'), post = c(1,2),
                  strand = c('+', '-'), ratio = c(0.9, 0.5),
                  context = c('CG', 'CG'), CT = c(20, 30))
write.table(gr1, file = filename,
            col.names = TRUE, row.names = FALSE, quote = FALSE)
## Read the file. This call fails because of a typo in a field
## name: 'fractions' instead of 'fraction'
LR <- try(readCounts2GRangesList(filenames = filename, remove = FALSE,
                                 sample.id = 'test',
                                 columns = c(seqnames = 1, start = 2,
                                             strand = 3, fractions = 4,
                                             context = 5, coverage = 6)),
          silent = TRUE)
file.remove(filename) # Remove the file
#> [1] TRUE
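A misspelled field name such as 'fractions' can be caught before the file is read. The following is a minimal sketch of such a check, assuming the set of valid field names is exactly the one listed for the 'columns' argument and that 'seqnames' and 'start' are always required. The helper 'check_columns' is hypothetical and is not part of the package.

```r
## Hypothetical helper, NOT part of the package: compare the names of a
## 'columns' vector with the field names listed in the documentation.
check_columns <- function(columns) {
  allowed <- c("seqnames", "start", "end", "strand", "fraction", "percent",
               "mC", "uC", "coverage", "context", "signal")
  ## Names that are not recognized fields (e.g. typos)
  unknown <- setdiff(names(columns), allowed)
  ## Required fields that were not supplied
  missing_required <- setdiff(c("seqnames", "start"), names(columns))
  list(ok = length(unknown) == 0 && length(missing_required) == 0,
       unknown = unknown,
       missing = missing_required)
}

## The misspelled vector from the failing call
bad <- check_columns(c(seqnames = 1, start = 2, strand = 3,
                       fractions = 4, context = 5, coverage = 6))
## The same vector with the field name spelled correctly
good <- check_columns(c(seqnames = 1, start = 2, strand = 3,
                        fraction = 4, context = 5, coverage = 6))

bad$unknown
good$ok
```

Here 'bad$unknown' reports the offending name, while 'good$ok' confirms that every name is a recognized field and the required ones are present.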