Principal components (PCs) are estimated from the predictor variables provided as input data. The individuals' coordinates (scores) on the selected PCs are then used as predictors in the logistic regression.

Logistic regression using Principal Components from PCA as predictor variables

pcaLogisticR(
  formula = NULL,
  data = NULL,
  n.pc = 1,
  scale = FALSE,
  center = FALSE,
  tol = 1e-04,
  max.pc = NULL
)

predict.pcaLogisticR(
  object,
  newdata,
  type = c("class", "posterior", "pca.ind.coord", "all"),
  ...
)

Arguments

formula

Same as in 'glm' from package 'stats'. A single term representing the interaction between two variables can be included (using the notation described in the help page of function 'formula').

data

Same as in 'glm' from package 'stats'.

n.pc

Number of principal components to use in the logistic regression.

scale

Same as in 'prcomp' from package 'stats'.

center

Same as in 'prcomp' from package 'stats'.

tol

Same as in 'prcomp' from package 'stats'.

max.pc

Same as parameter 'rank.' in 'prcomp' from package 'stats'.
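The sketch below shows what the 'prcomp' arguments behind 'scale', 'center', 'tol' and 'max.pc' (namely scale., center, tol and rank.) do, calling stats::prcomp directly. It assumes 'pcaLogisticR' forwards these values unchanged; the column 'PL.plus.PW' is a hypothetical, exactly collinear predictor added only to make the effect of 'tol' visible.

```r
# Sketch: effect of prcomp()'s center, scale., tol and rank. arguments.
# Assumption: pcaLogisticR passes 'center', 'scale', 'tol' and 'max.pc'
# to prcomp as center, scale., tol and rank.
library(stats)

dat <- iris[iris$Species != "virginica", ]
X <- dat[, c("Petal.Length", "Sepal.Length", "Petal.Width")]

# rank. (max.pc): keep at most two components
pca2 <- prcomp(X, center = TRUE, scale. = TRUE, tol = 1e-04, rank. = 2)
dim(pca2$rotation)   # 3 variables x 2 PCs
dim(pca2$x)          # 100 individuals x 2 PCs

# tol: components whose standard deviation is less than or equal to
# tol * (sd of the first component) are omitted.
# 'PL.plus.PW' is a hypothetical column, an exact linear combination of two others.
X$PL.plus.PW <- X$Petal.Length + X$Petal.Width
pca.tol <- prcomp(X, center = TRUE, scale. = TRUE, tol = 1e-04)
ncol(pca.tol$rotation)  # 3 rather than 4: the collinear direction is dropped
```

Setting 'max.pc' below the number of predictors therefore bounds how many PCs are available to 'n.pc'.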

object

To use with function 'predict'. A 'pcaLogisticR' object, i.e., a list that includes 1) an object of class inheriting from 'glm' and 2) an object of class inheriting from 'prcomp'.

newdata

To use with function 'predict'. New data for which class predictions are required. It should contain the predictor variables used in 'formula'.

type

To use with function 'predict'. The type of prediction required: 'class', 'posterior', 'pca.ind.coord', or 'all'. If type = 'all', function 'predict.pcaLogisticR' ('predict') returns a list with: 1) 'class': the individual classification; 2) 'posterior': the probabilities for the positive class; 3) 'pca.ind.coord': the individual coordinates on the PCs. Each element of this list can be requested independently using parameter 'type'.
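The three elements returned with type = 'all' build on one another: new individuals are projected onto the fitted PCs, the logistic model turns those coordinates into posterior probabilities, and the probabilities are turned into classes. The following base R sketch reproduces that chain with predict() methods for 'prcomp' and 'glm' objects. It is not the package's code: the helper 'sketch_predict' and the 0.5 cut-off are hypothetical, and versicolor vs virginica is used because that pair is not perfectly separable.

```r
# Base R sketch (not the package's code) of the 'pca.ind.coord' ->
# 'posterior' -> 'class' chain. 'sketch_predict' and the 0.5 cut-off
# are hypothetical choices made for illustration.
vars <- c("Petal.Length", "Sepal.Length", "Petal.Width")
dat <- iris[iris$Species != "setosa", ]
dat$Species <- droplevels(dat$Species)   # versicolor (reference), virginica

# Fit: PCA on the predictors, then logistic regression on the first 2 PCs
pca <- prcomp(dat[, vars], center = TRUE, scale. = TRUE, rank. = 2)
train <- as.data.frame(pca$x)
train$Species <- dat$Species
fit <- glm(Species ~ PC1 + PC2, family = binomial(link = "logit"), data = train)

sketch_predict <- function(newdata, cutoff = 0.5) {
  # 'pca.ind.coord': project the new individuals onto the fitted PCs
  coord <- as.data.frame(predict(pca, newdata = newdata[, vars]))
  # 'posterior': probability of the positive (second) factor level
  post <- predict(fit, newdata = coord, type = "response")
  # 'class': threshold the posterior at the hypothetical cut-off
  lev <- levels(dat$Species)
  cls <- factor(ifelse(post > cutoff, lev[2], lev[1]), levels = lev)
  list(class = cls, posterior = post, pca.ind.coord = coord)
}

set.seed(123)
res <- sketch_predict(iris[sample.int(150, 40), 1:4])
table(res$class)
```

Because predict() for 'prcomp' objects reapplies the stored centering and scaling, new data can be passed on the original measurement scale.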

...

Not in use.

Value

Function 'pcaLogisticR' returns an object ('pcaLogisticR' class) containing a list of four objects:

  1. 'logistic': an object of class 'glm' from package 'stats'.

  2. 'pca': an object of class 'prcomp' from package 'stats'.

  3. 'reference.level': response level used as reference.

  4. 'positive.level': response level that corresponds to a 'positive' result. The probabilities returned by 'predict' with type = 'posterior' are the probabilities of each individual being a positive result, i.e., of belonging to the class of the positive level.
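For a two-level factor response, stats::glm with a binomial family treats the first factor level as the reference ("failure") and models the probability of the second level. The sketch below illustrates that convention with glm directly; it assumes, without confirmation from this page, that 'reference.level' and 'positive.level' follow the same rule.

```r
# Sketch: which factor level glm(family = binomial) models.
# Assumption: pcaLogisticR's 'reference.level' / 'positive.level'
# follow this glm convention.
dat <- iris[iris$Species != "setosa", ]
dat$Species <- droplevels(dat$Species)
levels(dat$Species)   # "versicolor" "virginica"

fit <- glm(Species ~ Petal.Length + Petal.Width, family = binomial, data = dat)
p <- fitted(fit)      # estimated P(Species == "virginica")

# Mean fitted probability per class: higher for the positive level
tapply(p, dat$Species, mean)

# Making "virginica" the reference flips which probability is modelled
dat2 <- dat
dat2$Species <- relevel(dat2$Species, ref = "virginica")
fit2 <- glm(Species ~ Petal.Length + Petal.Width, family = binomial, data = dat2)
max(abs(fitted(fit2) - (1 - p)))   # essentially zero
```

Reordering the factor levels of the response before fitting is therefore the usual way to choose which class counts as 'positive'.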

For information on how to use these objects see ?glm and ?prcomp.

Details

The principal components (PCs) are obtained with function 'prcomp', while the logistic regression is performed with function 'glm', both from R package 'stats'. The current implementation uses only the basic functionality of these functions. As shown in the example, the 'pcaLogisticR' function can be used for general classification problems.
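The two-step procedure described above can be sketched with base R alone. The function 'pca_logistic_sketch' below is hypothetical and is not the package's source code: it runs prcomp on the predictors named in the formula, keeps the first 'n.pc' score columns, fits a binomial glm on them, and returns a list shaped like the one listed under Value. Unlike 'pcaLogisticR', it does not handle interaction terms.

```r
# Hypothetical re-implementation sketch (not the package's source code).
# Uses only stats::model.frame, stats::prcomp and stats::glm.
pca_logistic_sketch <- function(formula, data, n.pc = 1, scale = FALSE,
                                center = FALSE, tol = 1e-04, max.pc = NULL) {
  mf <- model.frame(formula, data = data)
  y <- droplevels(as.factor(model.response(mf)))
  stopifnot(nlevels(y) == 2)                # binary response only
  X <- mf[, -1, drop = FALSE]               # predictor columns (response is first)

  pca <- prcomp(X, center = center, scale. = scale, tol = tol, rank. = max.pc)
  n.pc <- min(n.pc, ncol(pca$x))            # cannot use more PCs than were kept
  scores <- as.data.frame(pca$x[, seq_len(n.pc), drop = FALSE])
  scores$response <- y

  fit <- glm(response ~ ., family = binomial(link = "logit"), data = scores)
  list(logistic = fit, pca = pca,
       reference.level = levels(y)[1], positive.level = levels(y)[2])
}

# Usage on a two-class iris subset (versicolor vs virginica)
dat <- iris[iris$Species != "setosa", ]
m <- pca_logistic_sketch(Species ~ Petal.Length + Sepal.Length + Petal.Width,
                         data = dat, n.pc = 2, scale = TRUE, center = TRUE,
                         max.pc = 2)
m$positive.level     # "virginica"
coef(m$logistic)
```

Fitting the glm on the scores rather than on the raw predictors is what lets correlated predictors enter the model as uncorrelated components.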

Examples

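data(iris)
# Keep two classes (setosa and versicolor) to obtain a binary response
data <- iris[ iris$Species != 'virginica', ]
data$Species <- droplevels(data$Species)
formula <- Species ~ Petal.Length + Sepal.Length + Petal.Width
# Logistic regression on the first two PCs of the centered and scaled predictors
pca.logistic <- pcaLogisticR(formula = formula,
                            data = data, n.pc = 2, scale = TRUE,
                            center = TRUE, max.pc = 2)
# Predict 40 randomly sampled iris individuals; virginica individuals are
# assigned to one of the two classes the model was trained on
set.seed(123)
newdata <- iris[sample.int(150, 40), 1:4]
newdata.prediction <- predict(pca.logistic, newdata, type = 'all')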