% ************* proposal.tex *************
% authors: Jeffery Russell, Ryan M, Kyle R
%
% Project Proposal for CSCI-431
% Initial Draft March 6, 2020
\documentclass[12pt,
reprint,
%superscriptaddress,
%groupedaddress,
%unsortedaddress,
%runinaddress,
%frontmatterverbose,
%preprint,
%preprintnumbers,
%nofootinbib,
%nobibnotes,
%bibnotes,
amsmath,amssymb,
aps,
%pra,
%prb,
%rmp,
%prstab,
%prstper,
%floatfix,
]{revtex4-2}
\usepackage{graphicx}% Include figure files
\usepackage{dcolumn}% Align table columns on decimal point
\usepackage{bm}% bold math
%\usepackage{hyperref}% add hypertext capabilities
%\usepackage[mathlines]{lineno}% Enable numbering of text and display math
%\linenumbers\relax % Commence numbering lines
%\usepackage[showframe,%Uncomment any one of the following lines to test
%%scale=0.7, marginratio={1:1, 2:3}, ignoreall,% default settings
%%text={7in,10in},centering,
%%margin=1.5in,
%%total={6.5in,8.75in}, top=1.2in, left=0.9in, includefoot,
%%height=10in,a5paper,hmargin={3cm,0.8in},
%]{geometry}
\begin{document}
\preprint{APS/123-QED}
\title{A Comparison of Different GANs for Generating Handwritten Digits on MNIST}
\thanks{Submitted as a CSCI-431 assignment at RIT}%
\author{Jeffery B. Russell}
\email{jeffery@jrtechs.net, jxr8142@rit.edu}
\affiliation{%
Fourth Year Computer Science Student at RIT\\
CUBRC Research Assistant\\
RITlug President
}%
\author{Ryan Missel}
\email{rxm7244@rit.edu}
\affiliation{%
Fifth Year Computer Science Student at RIT\\
CASCI Research Assistant
}%
\author{Kyle Rivenburgh}
\email{ktr5669@rit.edu}
\affiliation{%
Fifth Year Computer Science Student at RIT\\
}%
\date{\today}% It is always \today, today,
% but any date may be explicitly specified
\begin{abstract}
Generative Adversarial Networks (GANs) have emerged as a powerful and customizable class of machine learning algorithms over the past half decade. They learn the distribution of a dataset in order to generate realistic synthetic samples. GAN research is an active field with substantial improvements each year that address fundamental limitations of the class and improve the quality of generated samples. GANs have been successfully applied to music synthesis, face generation, and text-to-image translation.
In this work, we examine a variety of GAN architectures and compare them qualitatively on the popular MNIST dataset. We explore how differing architectures affect time to convergence, quality of the resulting images, and complexity of training. The theoretical justifications and shortcomings of each methodology are explored in detail so that readers can form an intuition for choosing the right architecture for a problem.
\begin{description}
\item[Keywords]
Computer Vision, Generative Adversarial Networks, Machine Learning, MNIST
\end{description}
\end{abstract}
\maketitle
%\tableofcontents
\section{\label{sec:background}Background}
% discuss the background of Neural networks
Among the earliest neural networks (NNs) was MADALINE (Multiple Adaptive Linear Elements), developed by Bernard Widrow and Marcian Hoff at Stanford in 1959 \cite{widrow1962generalization}.
Neural networks were designed with inspiration taken from biological neurons in the human brain.
Artificial neurons aggregate information from other neurons and fire a signal depending on the strength of their inputs, which is analogous to how biological neurons operate.
Neural networks are most commonly trained with supervised learning, a branch of artificial intelligence (AI).
Under supervised learning, the algorithm must be fed labeled data in order to make future classifications or predictions.
This is opposed to unsupervised learning, which needs no labeled data; clustering is one example of unsupervised learning.
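Returning to the artificial neuron described above, its aggregation step can be sketched with a single equation. The symbols are our own illustrative notation rather than that of the cited work: $x_i$ are the inputs received from other neurons, $w_i$ the corresponding connection strengths, $b$ a bias term, and $\varphi$ an activation function such as a threshold.
\begin{equation}
y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right)
\end{equation}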

% Goodfellow et al. paper on GANs
GANs were first proposed in 2014 by Ian J. Goodfellow and colleagues at the Université de Montréal \cite{goodfellow2014generative}.
The proposed architecture is a dual neural network system in which a generative model learns to produce realistic samples from a distribution in order to fool a discriminator that tries to distinguish real samples from generated ones.
The two models are trained in tandem, each learning from random initialization how to best the other.
Training is successful when a Nash equilibrium between the two models is reached.
This occurs when the generator has learned the distribution of the data well enough that the discriminator performs no better than random chance.
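This competition can be written as the two-player minimax game given in \cite{goodfellow2014generative}, where $D$ is the discriminator, $G$ the generator, $p_{\mathrm{data}}$ the data distribution, and $p_z$ the noise prior from which the generator's inputs are sampled:
\begin{align}
\min_G \max_D V(D, G) ={}& \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] \nonumber \\
& + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right].
\end{align}
For a fixed generator whose samples follow a distribution $p_g$, the optimal discriminator is
\begin{equation}
D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)},
\end{equation}
so when $p_g = p_{\mathrm{data}}$ the discriminator outputs $D^{*}(x) = 1/2$ for every input, which is the chance-level behavior described above.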
\begin{figure}[h!]
\centering
\includegraphics[width=9cm]{gan-arch.jpg}
\caption{Architecture of a GAN}
\label{fig:jupyter_server}
\end{figure}
% current state of the art
Since their introduction in 2014, GANs have improved substantially and have become a major topic in AI research.
State-of-the-art GAN research currently focuses on applications to video and voice data.
\subsection{\label{sec:applications}Applications}
GANs have been applied to many problems \cite{overviewDocument}.
A sample of these problems is listed below.
\begin{itemize}
\item Image Generation
\item Music Generation
\item Style Transfer
\item Video Prediction
\item Super Resolution
\item Text to Image
\end{itemize}
\subsection{\label{sec:level2}Deep Convolutional Generative Adversarial Network}
The Deep Convolutional Generative Adversarial Network (DCGAN) is an architectural modification of the original GAN in which the generator and discriminator models are reflections of one another.
Each network is a multi-layer, deep convolutional neural network. The idea behind this architecture is that by mirroring the network structure between the two, the computational capacities of the networks to learn their respective tasks are equal \cite{radford2015unsupervised}.
This should stabilize competitive learning between the two agents and result in smoother learning, avoiding cases where one network dominates the other.
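To make the mirroring concrete, consider one hypothetical configuration for $28 \times 28$ MNIST images. The kernel size $k = 4$, stride $s = 2$, and padding $p = 1$ are illustrative assumptions, not the settings used in this project or prescribed by \cite{radford2015unsupervised}. A strided convolution in the discriminator maps a feature map of width $i$ to width
\begin{equation}
o_{\mathrm{conv}} = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1,
\end{equation}
while a transposed convolution in the generator with the same parameters maps width $i$ to width
\begin{equation}
o_{\mathrm{transposed}} = (i - 1)\,s - 2p + k.
\end{equation}
Under these assumptions, two discriminator layers reduce the width as $28 \rightarrow 14 \rightarrow 7$, and two generator layers expand it as $7 \rightarrow 14 \rightarrow 28$, so each generator layer undoes the spatial change of the corresponding discriminator layer.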
\subsection{\label{sec:wgan}Wasserstein Generative Adversarial Networks}
Wasserstein Generative Adversarial Networks (WGANs) are an improvement on the vanilla GAN, proposed by Martin Arjovsky et al.\ in 2017 \cite{arjovsky2017wasserstein}.
The motivation behind this work is to modify the task of the discriminator in order to stabilize training between the networks.
Instead of acting as a simple binary classifier that predicts whether an image is real or fake, the discriminator (called a critic in the WGAN paper) outputs an unbounded real-valued score of the ``realness'' or ``fakeness'' of an image.
The theoretical idea is that this continuous estimate gives the generator a stronger incentive than the standard discriminator design to minimize the distance between the distribution of its generated images and that of the real images.
Empirically, this design has shown better results than the standard GAN architecture in terms of training stability and robustness to architecture and hyperparameter choices.
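The distance in question is the Wasserstein-1 (earth mover's) distance. Adapting the notation of \cite{arjovsky2017wasserstein}, with $p_{\mathrm{data}}$ the real data distribution, $p_g$ the distribution of generated images, and $\Pi(p_{\mathrm{data}}, p_g)$ the set of joint distributions whose marginals are $p_{\mathrm{data}}$ and $p_g$, it is defined as
\begin{equation}
W(p_{\mathrm{data}}, p_g) = \inf_{\gamma \in \Pi(p_{\mathrm{data}}, p_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right].
\end{equation}
By the Kantorovich--Rubinstein duality, it can equivalently be written as a supremum over 1-Lipschitz functions $f$, which is the role the modified discriminator approximates:
\begin{align}
W(p_{\mathrm{data}}, p_g) ={}& \sup_{\lVert f \rVert_L \le 1} \Big( \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[f(x)\right] \nonumber \\
& - \mathbb{E}_{x \sim p_g}\left[f(x)\right] \Big).
\end{align}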
\section{\label{sec:goals}Goals}
% project overview... what we are doing
This project applies three different GAN architectures to generating images of handwritten digits from the MNIST dataset.
We compare vanilla GANs, DCGANs, and WGANs.
Using the results from the three architectures, we judge their performance on three criteria:
\begin{itemize}
\item Perceived quality of images
\item Time required to train
\item Training data required
\end{itemize}
% MNIST data set
The Modified National Institute of Standards and Technology database (MNIST) is a dataset comprising seventy thousand images of handwritten digits.
Sixty thousand of those images are partitioned for training and the remaining ten thousand are reserved for testing and validation.
We use the MNIST dataset because it is a de facto standard benchmark for machine learning on images.
\subsection{\label{sec:researchQuestions}Research Questions}
\begin{itemize}
\item Which GAN architecture performs best on the MNIST dataset?
\item What are the quantitative differences between these architectures in terms of training stability and quality of results?
\item How do required training time and convergence rate differ between GAN architectures?
\end{itemize}
% ------------------------------------------------ Implementation -----
\section{\label{sec:implementation}Implementation}
% go over how each algorithm was implemented,
% possibly link to github with code
\subsection{\label{sec:impVanilla}Vanilla Generative Adversarial Network}
% section covering basic GAN implementation
\subsection{\label{sec:impDCGAN}Deep Convolutional Generative Adversarial Network}
% section covering code used to run DCGAN
\subsection{\label{sec:impWGAN}Wasserstein Generative Adversarial Network}
% section covering WGAN code
%---------------------------------------------- end implementation
%---------------------------------------------- experiment --------------
\section{\label{sec:exp}Experiments}
% goes over the tests run in the experiment
This section describes in depth the experiments run in this project and the results they produced.
\subsection{\label{sec:dataSet}Data Set}
% describe the mnist data set
\subsection{\label{sec:expQuality}Quality}
% simple test where we show our best outputs from each gan
\subsection{\label{sec:expTime}Time for Training}
% time for each generation? Sorta wishy washy on this one
\subsection{\label{sec:expData}Quantity of Training Data}
% vary the amount of training data available to the gans
%---------------------------------------- end experiment ----------------
\section{\label{sec:conclusions}Conclusions}
% high level conclusion of results and future work
This project provides a useful survey and comparison of three popular GAN architectures. Based on the results we can conclude that....
Future work for this project would entail researching more GAN architectures such as Conditional GANs (CGANs), Least Squares GANs (LSGANs), Auxiliary Classifier GANs (ACGANs), and InfoGAN \cite{cGAN, lsgan, acgan, infogan}. Another avenue of research would be to examine how the results of our experiments on the MNIST dataset hold up on different datasets.
Because GANs are still a new family of algorithms in artificial intelligence, they remain an area of active, cutting-edge research. As GANs become more widely used in the public and private sectors, we expect to see considerably more research into their applications.
\section{Acknowledgment}
This project was submitted as an RIT CSCI-431 project for Professor Sorkunlu's class.
% bibliography via bibtex
\bibliographystyle{ACM-Reference-Format}
\bibliography{proposal}
\end{document}
%
% ****** End of file proposal.tex ******