Gintare Karolina Dziugaite
In deep learning, networks are trained with stochastic gradient descent to zero training error. Although deep networks interpolate the training data, they often perform well in practice. In this talk, I will present some of my recent work on interpolating predictors. I will discuss some of the roadblocks that arise in using uniform bounds to explain the performance of complex models. I will then describe a new role that uniform bounds may play in studying interpolating predictors by (i) defining uniform convergence when the model complexity grows with the sample size; and (ii) studying the generalization error of an interpolating predictor in terms of a “derandomized” surrogate hypothesis, where a predictor is partially derandomized or rerandomized, e.g., fit to the training data but with modified label noise. As an application of “derandomized” surrogate analysis, I will present our results on over-parameterized linear regression. This is joint work with Jeffrey Negrea and Daniel M. Roy.