Parallel Job Split — pjob_split • MethylIT

This function sets up parallel jobs on a multicore server, in a way analogous to a computer cluster. The purpose is to prevent nested parallel loops, so that each job can run its own independent parallel computation.

This function is useful when working with methylomes from species with large chromosomes, e.g., human methylomes, where the computation must be done chromosome by chromosome. In such cases, 'pjob_split' permits the application of a 'divide-and-conquer' strategy.

The maximum number of cores in use will be the number of screen sessions multiplied by the number of cores used by the functions called in the script given in the 'filename' argument.
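
As a rough illustration of this core budget, the sketch below multiplies the number of jobs by the cores each job requests and compares the product with the cores reported by the machine. The helper 'check_core_budget' and its argument 'cores.per.job' are hypothetical names used only for illustration; they are not part of MethylIT.

```r
## Hypothetical helper (not part of MethylIT): estimate the peak number of
## cores used when 'num.jobs' screen sessions each run functions that use
## 'cores.per.job' cores.
check_core_budget <- function(num.jobs, cores.per.job,
                              available = parallel::detectCores()) {
    needed <- as.integer(num.jobs) * as.integer(cores.per.job)
    ## detectCores() can return NA on some platforms
    fits <- if (is.na(available)) NA else needed <= available
    list(needed = needed, available = available, fits = fits)
}

## Four jobs, each calling a function with num.cores = 10L
budget <- check_core_budget(num.jobs = 4L, cores.per.job = 10L)
if (!isTRUE(budget$fits)) {
    message("The jobs may request more cores than the machine provides.")
}
```

If the budget does not fit, reducing either 'num.jobs' or the number of cores requested inside the script keeps the server from being oversubscribed.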

pjob_split(filename, num.jobs = 1L, args = NULL, verbose = TRUE)

Arguments

filename: The path to the file where the script is found.
num.jobs: An integer. The number of jobs.
args: Additional arguments to pass to the script. If NULL, the job index is still passed, and users can use it in the downstream analysis.
verbose: If TRUE, the progress of the job is reported.
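
Inside the script, values given on the command line arrive as character strings through 'commandArgs()'. The sketch below shows one way to turn them into job indices, assuming the job index is the first trailing argument and an optional second value marks the end of a range. The helper name 'parse_job_index' is hypothetical and not part of MethylIT.

```r
## Hypothetical helper (not part of MethylIT): convert command-line
## arguments into the integer indices that the script should process.
parse_job_index <- function(args = commandArgs(trailingOnly = TRUE)) {
    idx <- suppressWarnings(as.integer(args))
    if (length(idx) == 0L || is.na(idx[1])) {
        stop("A job index must be supplied as the first argument.")
    }
    ## With two indices, treat them as start and end; otherwise a single job
    if (length(idx) > 1L && !is.na(idx[2])) {
        seq(idx[1], idx[2], 1L)
    } else {
        idx[1]
    }
}

## Simulated command lines: 'Rscript script.R 3' and 'Rscript script.R 2 4'
single_job <- parse_job_index("3")
job_range <- parse_job_index(c("2", "4"))
```

Failing early on a missing or non-numeric index makes a broken job visible right away instead of partway through a long computation.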

Details

So far, this function works only on Linux. Users must check the status of their jobs in a Linux terminal, for example, by typing 'top' or 'htop' and searching for the R jobs linked to their user name.

The Linux 'screen' app must be installed. Users can create their own scripts, and must test that a script works before starting any big computation.
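
Both requirements can be checked from R before launching any job. The sketch below is a minimal pre-flight check using only base R; the helper name 'can_use_screen' is hypothetical and not part of MethylIT.

```r
## Hypothetical pre-flight check (not part of MethylIT): TRUE only when
## running on Linux and the 'screen' executable is found on the PATH.
can_use_screen <- function() {
    ## Sys.info() may return NULL on some platforms, hence the '[' indexing
    is_linux <- identical(unname(Sys.info()["sysname"]), "Linux")
    ## Sys.which() returns "" when the program is not found
    has_screen <- nzchar(unname(Sys.which("screen")))
    is_linux && has_screen
}

if (!can_use_screen()) {
    message("Linux with 'screen' installed is required to run the jobs.")
}
```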

For further details, please see the example.

Examples

## Save the script below in some place, say, '~/script4pjob_split.R'
if (FALSE) {
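    ## Arguments passed from the system terminal to the script execution.
    ## In the current case, chromosome indices. The arguments arrive as
    ## character strings, so they are converted to integers.
    arg <- as.integer(commandArgs(trailingOnly = TRUE))
    ## With a single argument (the job index), process only that chromosome
    k <- if (length(arg) > 1) seq(arg[1], arg[2], 1) else arg[1]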

    library(MethylIT)

    ## Most of the time chromosome names are just numbers (as frequently
    ## found in Arabidopsis). In human there are numbers and letters ('X'
    ## and 'Y')
    chr <- c("chr1", "chr2", "chr3", "chr4")

    ## Loading information divergence dataset provided with MethylIT
    data(hdiv)
    ## In a real analysis, this would be the path to the data folder
    paths <- "~"


    d <- c("Weibull2P", "Gamma2P")
    for (h in k) {

        ## A plausible example:
        ## load(paste0(paths, "/filename.RData"))
        ## -------------------------------------
        ## Here, we want to compute the best fitted model per chromosome.
        ## In the current case, we assume that each sample comes from
        ## Arabidopsis with five chromosomes.

        ## Select chromosome 'h'
        hd <- lapply(hdiv, function(x) {
            seqlevels(x, pruning.mode = "coarse") <- chr[h]
            return(x)
        }, keep.attr = TRUE)


        gof_jd <- gofReport(
            HD = hd,
            model = d,
            column = 10L,
            num.cores = 10L,
            alt_models = TRUE,
            r.cv = TRUE,
            output = "all",
            verbose = FALSE)

        save(gof_jd,
            file = paste0(paths, "/gof_jd_test_", chr[h], ".RData"))
    }
}

## Next, make sure that your script works by typing in a
## terminal: 'Rscript ~/script4pjob_split.R 1'. Next, run:
if (FALSE) {
 pjob_split( filename = "~/script4pjob_split.R",
             args = NULL,
             num.jobs = 4L,
             verbose = TRUE)
}
## Four 'RData' files named 'gof_jd_test_chr[i].RData'
## (i = 1, ..., 4) should be created.
## If the computation takes long enough, you can type 'screen -r' in the
## terminal to see the computations taking place. If all the
## sessions are still working, you will see them named as:
## PID.s18 to PID.s23, where PID is the process ID. A specific screen session
## can be retrieved by typing, e.g., 'screen -r PID.s18'.
## To remove the saved files:
if (FALSE) {
 ## 'paths' and 'chr' take the same values as in the script
 paths <- "~"
 chr <- c("chr1", "chr2", "chr3", "chr4")
 for (h in seq_along(chr)) {
    file.remove(paste0(paths, "/gof_jd_test_", chr[h], ".RData"))
 }
}