Regression Test for Count

Perform Poisson and Negative Binomial regression analysis to compare the counts from different groups, treatment and control. The difference between functions 'countTest2' and 'countTest' resides in the estimation of the prior weights used in Negative Binomial generalized linear model.

countTest2(
  DS,
  num.cores = 1,
  countFilter = TRUE,
  CountPerBp = NULL,
  minCountPerIndv = 3,
  maxGrpCV = NULL,
  FilterLog2FC = TRUE,
  pAdjustMethod = "BH",
  pvalCutOff = 0.05,
  MVrate = 0.98,
  Minlog2FC = 0.5,
  test = c("Wald", "LRT"),
  scaling = 1L,
  tasks = 0L,
  saveAll = FALSE,
  verbose = TRUE
)

Arguments

DS: A 'glmDataSet' object, which is created with function glmDataSet.
num.cores, tasks: Parameters for parallel computation using package BiocParallel-package: the number of cores to use, i.e. at most how many child processes will be run simultaneously (see bplapply and the number of tasks per job (only for Linux OS).
countFilter: whether or not to filter the counts according to the minimum count per region per each individual/sample, which is setting by 'minCountPerIndv'.
CountPerBp: for each group the count per bp must be equal or greater than CountPerBp. The filter is applied if 'CountPerBp' is given and if 'x' DESeqDataSet object has the rowRanges as a GRanges object on it
minCountPerIndv: each gene or region must have more than 'minCountPerIndv' counts (on average) per individual in at least one group.
maxGrpCV: A numerical vector. Maximum coefficient of variance for each group. Default maxGrpCV = NULL. The numbers 'maxGrpCV1' and 'maxGrpCV2' will be taken as the maximun variances values permitted in control and in treatment groups, respectively. If only 'maxGrpCV1' is provided, then maxGrpCV = c('maxGrpCV1', 'maxGrpCV1'). This parameter is addressed to prevent testing regions where intra-group variations are very large, e.g.: control = c(1,0,1,1) and treatment = c(1, 0, 1, 40). The coefficient of variance for the treatment group is 1.87, very high. The generalized linear regression analysis would yield statistical significant group differences, but evidently there is something wrong in one of the treatment samples. We would try the application of further statistical smoothing approach, but we prefer to leave the user decide which regions to test.
FilterLog2FC: if TRUE, the results are filtered using the minimum absolute value of log2FoldChanges observed to accept that a gene in the treatment is differentially expressed in respect to the control
pAdjustMethod: method used to adjust the results; default: BH
pvalCutOff: cutoff used then a p-value adjustment is performed.
MVrate: Minimum Mean/Variance rate.
Minlog2FC: minimum logarithm base 2 of fold changes.
test: A character string matching one of 'Wald' or 'LRT'. If test = 'Wald', then the p-value of the Wald test for the coefficient of the independent variable (treatment group) will be reported. If test = 'LRT', then the p-value from a likelihood ratio test given by anova function from stats packages will be the reported p-value for the group comparison when the best fitted model is the negative binomial. As suggested for glm, if best fitted model is Poisson or quasi-Poisson, then the best test is 'Chi-squared' or 'F-test', respectively. So, for the sake of simplicity, the corresponding suitable test will be applied when test = 'LRT'.
scaling: integer (default 1). Scaling factor estimate the signal density as: scaling x 'DMP-Count-Per-Bp'. For example, if scaling = 1000, then signal density denotes the number of DMPs in 1000 bp.
saveAll: if TRUE all the temporal results are returned
verbose: if TRUE, prints the function log to stdout

Value

a data frame or GRanges object (if the DS contain the GRanges information for each gene) with the test results and original count matrix, plus control and treatment signal densities and their variation.

Details

A pairwise group comparison, control versus treatment, is performed. The experimental design settings must be introduced using function glmDataSet to provide dataset (DS) object.

Examples

set.seed(133) # Set a seed
## A GRanges object with the count matrix in the metacolumns is created
countData <- matrix(sample.int(200, 500, replace = TRUE), ncol = 4)
colnames(countData) <- c('A1','A2','B1','B2')

start <- seq(1, 25e4, 2000)
end <- start + 1000
chr <- c(rep('chr1', 70), rep('chr2', 55))
GR <- GRanges(seqnames = chr, IRanges(start = start, end = end))
mcols(GR) <- countData

## Gene IDs
names(GR) <- paste0('gene', 1:length(GR))

## An experiment design is set.
colData <- data.frame(condition = factor(c('A','A','B','B')),
                    c('A1','A2','B1','B2'), row.names = 2)

## A RangedGlmDataSet is created
ds <- glmDataSet(GR = GR, colData = colData)

## The gneralized linear model pairwise group comparison, group 'A'
## ('control') versus 'B' (treatment) is performed.
countTest2(ds, num.cores = 1L, maxGrpCV = c(0.4, 0.4), verbose = FALSE)
#> GRanges object with 4 ranges and 12 metadata columns:
#>           seqnames        ranges strand |        A1        A2        B1
#>              <Rle>     <IRanges>  <Rle> | <integer> <integer> <integer>
#>    gene69     chr1 136001-137001      * |        57        67       117
#>    gene98     chr2 194001-195001      * |        34        39       116
#>   gene101     chr2 200001-201001      * |       166       168       117
#>   gene114     chr2 226001-227001      * |        33        34       111
#>                  B2    log2FC scaled.deviance      pvalue        model
#>           <integer> <numeric>       <numeric>   <numeric>  <character>
#>    gene69       102  0.568790         26.6591 4.17025e-07 Neg.Binomial
#>    gene98       151  1.296789         82.6185 9.57148e-18 Neg.Binomial
#>   gene101        71 -0.574699         13.0830 3.00075e-04 Neg.Binomial
#>   gene114       158  1.390019         64.7083 7.92421e-15 Neg.Binomial
#>              adj.pval CT.SignalDensity TT.SignalDensity SignalDensityVariation
#>             <numeric>        <numeric>        <numeric>              <numeric>
#>    gene69 5.56034e-07        0.0619381        0.1093906              0.0474525
#>    gene98 3.82859e-17        0.0364635        0.1333666              0.0969031
#>   gene101 3.00075e-04        0.1668332        0.0939061             -0.0729271
#>   gene114 1.58484e-14        0.0334665        0.1343656              0.1008991
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

Arguments

Value

Details

See also

Examples