Hey there, great piece. You cleared things up for me while I was trying to read this paper: https://arxiv.org/pdf/1904.05835.pdf. In it, the authors run a knowledge transfer experiment from a CNN to an MLP on the CIFAR-10 dataset. The method is called Variational Information Distillation (VID). I suggest you take a look at it. It looks like an improvement over the approach you described.
Well done!!!