Optimal Subsampling Method for Softmax (multinomial logistic) Regression Model
Source:R/softmax_main_function.R
ssp.softmax.Rd
Draw subsample from full dataset and fit softmax(multinomial logistic) regression model on the subsample. Refer to vignette for a quick start.
Usage
ssp.softmax(
formula,
data,
subset,
n.plt,
n.ssp,
criterion = "MSPE",
sampling.method = "poisson",
likelihood = "MSCLE",
constraint = "summation",
control = list(...),
contrasts = NULL,
...
)
Arguments
- formula
A model formula object of class "formula" that describes the model to be fitted.
- data
A data frame containing the variables in the model. Denote \(N\) as the number of observations in
data
.- subset
An optional vector specifying a subset of observations from
data
to use for the analysis. This subset will be viewed as the full data.- n.plt
The pilot subsample size (first-step subsample size). This subsample is used to compute the pilot estimator and estimate the optimal subsampling probabilities.
- n.ssp
The expected size of the optimal subsample (second-step subsample). For
sampling.method = 'withReplacement'
, The exact subsample size isn.ssp
. Forsampling.method = 'poisson'
,n.ssp
is the expected subsample size.- criterion
The criterion of optimal subsampling probabilities. Choices include
optA
,optL
,MSPE
(default),LUC
anduniform
.MSPE
Minimizes the mean squared prediction error between subsample estimator and full data estimator.optA
Minimizes the trace of the asymptotic covariance matrix of the subsample estimator.optL
Minimizes the trace of a transformation of the asymptotic covariance matrix, which reduces computational costs thanoptA
.LUC
Local uncertainty sampling method, serving as a baseline subsampling strategy. See Wang and Kim (2022).uniform
Assigns equal subsampling probability \(\frac{1}{N}\) to each observation, serving as a baseline subsampling strategy.
- sampling.method
The sampling method to use. Choices include
withReplacement
andpoisson
(default).withReplacement
draws exactlyn.ssp
subsamples from size \(N\) full dataset with replacement, using the specified subsampling probabilities.poisson
draws observations independently by comparing each subsampling probability with a realization of uniform random variable \(U(0,1)\). Differences between methods:Sample size:
withReplacement
draws exactlyn.ssp
subsamples whilepoisson
draws subsamples with expected sizen.ssp
, meaning the actual size may vary.Memory usage:
withReplacement
requires the entire dataset to be loaded at once, whilepoisson
allows for processing observations sequentially (will be implemented in future version).Estimator performance: Theoretical results show that the
poisson
tends to get a subsample estimator with lower asymptotic variance compared to thewithReplacement
- likelihood
A bias-correction likelihood function is required for subsample since unequal subsampling probabilities introduce bias. Choices include
weighted
andMSCLE
(default).weighted
Applies a weighted likelihood function where each observation is weighted by the inverse of its subsampling probability.MSCLE
It uses a conditional likelihood, where each element of the likelihood represents the density of \(Y_i\) given that this observation was drawn.
- constraint
The constraint for identifiability of softmax model. Choices include
baseline
andsummation
(default). The baseline constraint assumes the coefficient for the baseline category are \(0\). Without loss of generality, we set the category \(Y=0\) as the baseline category so that \(\boldsymbol{\beta}_0=0\). The summation constraint \(\sum_{k=0}^{K} \boldsymbol{\beta}_k\) is also used in the subsampling method for the purpose of calculating subsampling probability. These two constraints lead to different interpretation of coefficients but are equal for computing \(P(Y_{i,k} = 1 \mid \mathbf{x}_i)\). The estimation of coefficients returned byssp.softmax()
is under baseline constraint.- control
A list of parameters for controlling the sampling process. There are two tuning parameters
alpha
andb
. Default islist(alpha=0, b=2)
.alpha
\(\in [0,1]\) is the mixture weight of the user-assigned subsampling probability and uniform subsampling probability. The actual subsample probability is \(\pi = (1-\alpha)\pi^{opt} + \alpha \pi^{uni}\). This protects the estimator from extreme small subsampling probability. The default value is 0.b
is a positive number which is used to constaint the poisson subsampling probability.b
close to 0 results in subsampling probabilities closer to uniform probability \(\frac{1}{N}\).b=2
is the default value. See relevant references for further details.
- contrasts
An optional list. It specifies how categorical variables are represented in the design matrix. For example,
contrasts = list(v1 = 'contr.treatment', v2 = 'contr.sum')
.- ...
A list of parameters which will be passed to
nnet::multinom()
.
Value
ssp.softmax returns an object of class "ssp.softmax" containing the following components (some are optional):
- model.call
The original function call.
- coef.plt
The pilot estimator. See Details for more information.
- coef.ssp
The estimator obtained from the optimal subsample.
- coef
The weighted linear combination of
coef.plt
andcoef.ssp
, under baseline constraint. The combination weights depend on the relative size ofn.plt
andn.ssp
and the estimated covariance matrices ofcoef.plt
andcoef.ssp.
We blend the pilot subsample information into optimal subsample estimator since the pilot subsample has already been drawn. The coefficients and standard errors reported by summary arecoef
and the square root ofdiag(cov)
.- coef.plt.sum
The pilot estimator under summation constrraint.
coef.plt.sum = G %*% as.vector(coef.plt)
.- coef.ssp.sum
The estimator obtained from the optimal subsample under summation constrraint.
coef.ssp.sum = G %*% as.vector(coef.ssp)
.- coef.sum
The weighted linear combination of
coef.plt
andcoef.ssp
, under summation constrraint.coef.sum = G %*% as.vector(coef)
.- cov.plt
The covariance matrix of
coef.plt
.- cov.ssp
The covariance matrix of
coef.ssp
.- cov
The covariance matrix of
coef.cmb
.- cov.plt.sum
The covariance matrix of
coef.plt.sum
.- cov.ssp.sum
The covariance matrix of
coef.ssp.sum
.- cov.sum
The covariance matrix of
coef.sum
.- index.plt
Row indices of pilot subsample in the full dataset.
- index.ssp
Row indices of of optimal subsample in the full dataset.
- N
The number of observations in the full dataset.
- subsample.size.expect
The expected subsample size.
- terms
The terms object for the fitted model.
Details
A pilot estimator for the unknown parameter \(\beta\) is required because MSPE, optA and
optL subsampling probabilities depend on \(\beta\). There is no "free lunch" when determining optimal subsampling probabilities. For softmax regression, this
is achieved by drawing a size n.plt
subsample with replacement from full
dataset with uniform sampling probability.
References
Yao, Y., & Wang, H. (2019). Optimal subsampling for softmax regression. Statistical Papers, 60, 585-599.
Han, L., Tan, K. M., Yang, T., & Zhang, T. (2020). Local uncertainty sampling for large-scale multiclass logistic regression. Annals of Statistics, 48(3), 1770-1788.
Wang, H., & Kim, J. K. (2022). Maximum sampled conditional likelihood for informative subsampling. Journal of machine learning research, 23(332), 1-50.
Yao, Y., Zou, J., & Wang, H. (2023). Optimal poisson subsampling for softmax regression. Journal of Systems Science and Complexity, 36(4), 1609-1625.
Yao, Y., Zou, J., & Wang, H. (2023). Model constraints independent optimal subsampling probabilities for softmax regression. Journal of Statistical Planning and Inference, 225, 188-201.
Examples
# softmax regression
d <- 3 # dim of covariates
K <- 2 # K + 1 classes
G <- rbind(rep(-1/(K+1), K), diag(K) - 1/(K+1)) %x% diag(d)
N <- 1e4
beta.true.baseline <- cbind(rep(0, d), matrix(-1.5, d, K))
beta.true.summation <- cbind(rep(1, d), 0.5 * matrix(-1, d, K))
set.seed(1)
mu <- rep(0, d)
sigma <- matrix(0.5, nrow = d, ncol = d)
diag(sigma) <- rep(1, d)
X <- MASS::mvrnorm(N, mu, sigma)
prob <- exp(X %*% beta.true.summation)
prob <- prob / rowSums(prob)
Y <- apply(prob, 1, function(row) sample(0:K, size = 1, prob = row))
n.plt <- 500
n.ssp <- 1000
data <- as.data.frame(cbind(Y, X))
colnames(data) <- c("Y", paste("V", 1:ncol(X), sep=""))
head(data)
#> Y V1 V2 V3
#> 1 2 -0.3756189 -0.17727086 -0.9816025
#> 2 2 0.2912939 0.60753208 -0.4489936
#> 3 2 -1.0530547 0.02079337 -1.0146024
#> 4 0 0.1854791 2.45385260 1.2682922
#> 5 0 0.8687332 0.21941612 -0.2810234
#> 6 1 -0.8336174 -0.32556141 -0.8505501
formula <- Y ~ . -1
WithRep.MSPE <- ssp.softmax(formula = formula,
data = data,
n.plt = n.plt,
n.ssp = n.ssp,
criterion = 'MSPE',
sampling.method = 'withReplacement',
likelihood = 'weighted',
constraint = 'baseline')
#> [1] "Message from nnet::multinom: "
#> # weights: 12 (6 variable)
#> initial value 549.306144
#> iter 10 value 367.125951
#> final value 365.836516
#> converged
#> [1] "Message from nnet::multinom: "
#> # weights: 12 (6 variable)
#> initial value 11677047.319684
#> iter 10 value 7149711.444798
#> iter 20 value 6901692.808182
#> final value 6901684.550943
#> converged
summary(WithRep.MSPE)
#> Model Summary
#>
#>
#> Call:
#>
#> ssp.softmax(formula = formula, data = data, n.plt = n.plt, n.ssp = n.ssp,
#> criterion = "MSPE", sampling.method = "withReplacement",
#> likelihood = "weighted", constraint = "baseline")
#>
#> Subsample Size:
#>
#> 1 Total Sample Size 10000
#> 2 Expected Subsample Size 1000
#> 3 Actual Subsample Size 1000
#> 4 Unique Subsample Size 927
#> 5 Expected Subample Rate 10%
#> 6 Actual Subample Rate 10%
#> 7 Unique Subample Rate 9.27%
#>
#> Coefficients:
#>
#> [,1] [,2]
#> V1 -1.455068 -1.406786
#> V2 -1.647116 -1.492484
#> V3 -1.248385 -1.435806
#>
#> Std. Errors:
#>
#> [,1] [,2]
#> V1 0.1127325 0.1122913
#> V2 0.1133850 0.1108925
#> V3 0.1202163 0.1178532