Deep Generative Model Applications

This section organizes research on deep generative models by the type of data they are applied to.

  • Images

    • Kingma, D. P., and Welling, M. (2014), “Auto-Encoding Variational Bayes,” in Proceedings of the International Conference on Learning Representations.
    • Gregor, K., Besse, F., Rezende, D. J., Danihelka, I., and Wierstra, D. (2016), “Towards conceptual compression,” in Advances In Neural Information Processing Systems, pp. 3549–3557.
    • Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. (2016), “Improved variational inference with inverse autoregressive flow,” in Advances in Neural Information Processing Systems, pp. 4743–4751.
    • Gulrajani, I., Kumar, K., Ahmed, F., Taiga, A. A., Visin, F., Vazquez, D., and Courville, A. (2016), “PixelVAE: A latent variable model for natural images,” arXiv preprint arXiv:1611.05013.
    • Bachman, P. (2016), “An Architecture for Deep, Hierarchical Generative Models,” in Advances in Neural Information Processing Systems 29, eds. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Curran Associates, Inc., pp. 4826–4834.
    • Maaløe, L., Fraccaro, M., Liévin, V., and Winther, O. (2019), “BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling,” arXiv:1902.02102 [cs, stat].
  • Scenes

    • S. A. Eslami, N. Heess, T. Weber, Y. Tassa, D. Szepesvari, and G. E. Hinton, “Attend, infer, repeat: Fast scene understanding with generative models,” in Advances in Neural Information Processing Systems, 2016, pp. 3225–3233.
    • D. J. Rezende, S. A. Eslami, S. Mohamed, P. Battaglia, M. Jaderberg, and N. Heess, “Unsupervised learning of 3D structure from images,” in Advances in Neural Information Processing Systems, 2016, pp. 4996–5004.
    • S. A. Eslami et al., “Neural scene representation and rendering,” Science, vol. 360, no. 6394, pp. 1204–1210, 2018.
  • Audio

    • Speech and human voices
      • K. Akuzawa, Y. Iwasawa, and Y. Matsuo, “Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder,” arXiv:1804.02135 [cs, eess], Apr. 2018.
      • W.-N. Hsu, Y. Zhang, and J. Glass, “Learning Latent Representations for Speech Generation and Transformation,” arXiv:1704.04222 [cs, stat], Apr. 2017.
    • Mehri, S., Kumar, K., Gulrajani, I., et al. (2016), “SampleRNN: An unconditional end-to-end neural audio generation model,” arXiv preprint arXiv:1612.07837.
    • Oord, A. van den, Dieleman, S., Zen, H., et al. (2016), “WaveNet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499.
  • Video

    • Kalchbrenner, N., van den Oord, A., Simonyan, K., et al. (2017), “Video pixel networks,” in Proceedings of the 34th International Conference on Machine Learning (Volume 70), JMLR.org, pp. 1771–1779.
    • Finn, C., Goodfellow, I., and Levine, S. (2016), “Unsupervised learning for physical interaction through video prediction,” in Advances in Neural Information Processing Systems, pp. 64–72.
  • Chemistry or physics

    • R. Gómez-Bombarelli et al., “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules,” ACS Cent. Sci., vol. 4, no. 2, pp. 268–276, Feb. 2018.
    • M. M. Sultan, H. K. Wayment-Steele, and V. S. Pande, “Transferable neural networks for enhanced sampling of protein dynamics,” Journal of chemical theory and computation, vol. 14, no. 4, pp. 1887–1894, 2018.
    • C. X. Hernández, H. K. Wayment-Steele, M. M. Sultan, B. E. Husic, and V. S. Pande, “Variational encoding of complex dynamics,” Physical Review E, vol. 97, no. 6, p. 062412, 2018.
  • Biology or medicine

  • Robotics and control

    • H. van Hoof, N. Chen, M. Karl, P. van der Smagt, and J. Peters, “Stable reinforcement learning with autoencoders for tactile and visual data,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 3928–3934.
    • T. Inoue, S. Chaudhury, G. De Magistris, and S. Dasgupta, “Transfer learning from synthetic to real images using variational autoencoders for robotic applications,” arXiv:1709.06762 [cs], Sep. 2017.
    • D. Park, Y. Hoshi, and C. C. Kemp, “A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1544–1551, 2018.