Qingkai Dong

I am a PhD student in Statistics at the University of Connecticut, advised by HaiYing Wang and Jun Yan. My work centers on massive-data subsampling, model averaging, and survival analysis, with a steady interest in building practical statistical software.

Current Focus

Efficient statistical modeling for large-scale data, statistical software development, and real-world applications.

Software

Maintainer of the R package subsampling on CRAN.

Research Interests

Subsampling methods for scalable statistical modeling

I study principled subsampling strategies that preserve statistical efficiency while reducing computational cost. This includes work on generalized linear models, quantile regression, rare events, and rare binary features.

Survival analysis and model averaging

My background includes accelerated failure time models, additive hazards models, and frequentist model averaging methods for censored outcomes.

Methodological research motivated by applied scientific settings

I am especially interested in work where theoretical development, computation, and application remain tightly connected, including clinical trial design and other collaborative data problems.

My Research Projects

Statistical Methodology

Preserving Rare Features in Big Data Regression: Balanced Subsampling

Research Assistant • Aug 2023 - Present • University of Connecticut

I am developing methodology for regression with rare binary covariates: theory that clarifies why estimation becomes unstable, and a balanced subsampling framework that improves rare-feature representation without pilot sampling.

Software

R package subsampling

University of Connecticut

I implemented scalable subsampling methods in R for generalized linear models, generalized linear models with rare features, softmax regression, rare event logistic regression, and quantile regression, with accompanying documentation and reproducible examples.

Academia-Industry Collaboration

Adaptive clinical trial design with Servier

Research Assistant • Mar 2024 - Dec 2025 • Servier Pharmaceuticals and University of Connecticut

I coauthored work on predictive-modeling-assisted interim analysis for censored time-to-event trials, including covariate-informed prediction for censored participants and evaluation metrics for conditional power accuracy and futility decisions.

Applied Collaboration

SDOH, frailty, and accelerated aging in breast cancer patients

Research Assistant • Jul 2024 - Aug 2024

I contributed data preprocessing and statistical analysis of All of Us Research Program data to study associations among social determinants of health (SDOH), frailty, and accelerated aging.

Applied Collaboration

Sensor-response analysis for nitroaromatic compounds

Research Project • Mar 2024 - May 2024

I analyzed fluorescence response data from porphyrinoid sensors using clustering and statistical summaries to support compound differentiation and sensor selection.

Cross-Disciplinary Collaboration

Time series explainability and LLM-enabled semantic interpretation

Collaboration • 2025 - Present

I am contributing to a survey project on time series explainability with an emphasis on LLM-enabled semantic explanations, including benchmark curation and related research synthesis; the manuscript is currently under review.

Publications

Selected papers and links to PDFs.

Weighted Least Squares Model Averaging for Accelerated Failure Time Models

Dong Q., Liu B., Zhao H. • Computational Statistics and Data Analysis, 2023

The Jackknife Model Averaging of Accelerated Failure Time Model with Current Status Data

Zhao H., Liu B., Dong Q., Zhang X. • Acta Mathematicae Applicatae Sinica, 2023

In Chinese.

A Variable Selection Method for the Additive Hazards Model with Current Status Data

Zhao H., Dong Q. • Journal of Systems Science and Mathematical Sciences, 2022

In Chinese.

Education

Formal training and academic background.

University of Connecticut

PhD in Statistics • Sep 2023 - Present • Storrs, CT

Advisors: HaiYing Wang and Jun Yan.

Zhongnan University of Economics and Law

MS in Mathematical Statistics • Sep 2020 - Jun 2023 • Wuhan, China

Thesis: Variable Selection and Model Averaging Methods of Accelerated Failure Time Models.

Qingdao University

BS in Applied Statistics • Sep 2016 - Jun 2020 • Qingdao, China

Thesis: Automatic Marking System based on Convolutional Neural Networks.

Teaching

Courses

  • Principal Instructor, STAT 3675Q Statistical Computing, University of Connecticut, Aug 2025 - May 2026.
  • Teaching Assistant, Probability Theory, Zhongnan University of Economics and Law, Sep 2020 - Jan 2021.

Presentations, Service, and Awards

Presentations

  • New England Rare Disease Statistics Workshop, Boston, MA, poster presentation, Oct 2025.
  • Dahshu Data Science Symposium, Storrs, CT, poster presentation, Oct 2025.

Service and Awards

  • Reviewer for Sankhya B, Statistical Papers, and Journal of Systems Science and Mathematical Sciences.
  • Predoc Fellowship, University of Connecticut, Jan 2025.
  • First-class Scholarship, Zhongnan University of Economics and Law, 2020 and 2022.

Contact

Documents, profiles, and a few quick facts.

Programming: R, Python, MATLAB, C++, LaTeX
Languages: English, Chinese