"I Thought of the GAN Three Years Before Goodfellow" - Big Data Digest's Caixin Blog - Caixin

Author: 魏子敏 (Wei Zimin)

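One night in 2014, Ian Goodfellow went out drinking to celebrate with a PhD student who had just graduated. At a bar in Montreal, some friends asked him to help with a thorny project they were working on: how to make a computer generate images on its own.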

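Researchers had already been using neural networks (algorithms loosely modeled on the network of neurons in the human brain) as generative models to create plausible new data. But the results were often disappointing: computer-generated faces were usually either blurry or missing ears or a nose.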

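The approach Goodfellow's friends proposed was to run a complex statistical analysis of the elements that make up an image, so that the machine could generate images by itself. That would require an enormous amount of computation, and Goodfellow told them it simply would not work.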

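As he thought the problem over with a beer, an idea suddenly came to him. What would happen if two neural networks were pitted against each other? His friends were skeptical.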

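When he got home, his girlfriend was already asleep, and he decided to try the idea right away. He wrote code into the early hours of the morning and then tested it. It worked on the first run!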

The method he came up with that night is now called a GAN, or generative adversarial network.

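By pitting two neural networks against each other, Ian Goodfellow created a powerful AI tool, the generative adversarial network (GAN). The method has since had a huge impact on machine learning, and it has made its creator, Goodfellow, a prominent figure in the AI world.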

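The story of how the GAN was born is already well known in tech circles. But Ian Goodfellow does not seem to be the only person who came up with this adversarial idea.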

For example, another leading figure in machine learning, Jurgen Schmidhuber, claims to have done similar work earlier.

The related dispute at NIPS 2016:

https://media.nips.cc/nipsbooks/nipspapers/paper_files/nips27/reviews/1384.html

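Today, a blog post from 2010 also sparked heated discussion on reddit. The post is very short, but it states the basic idea of the GAN precisely, and the figure that accompanies it directly shows how a GAN is set up.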

https://web.archive.org/web/20120312111546/http://yehar.com:80/blog/?p=167

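The post drew a great deal of discussion. Many commenters thought it was a pity, saying that if the author had taken his own idea more seriously, "he might have been the one to change the world."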

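Others, of course, argued that having such an idea matters, but that it only counts once it is actually put into practice, and that the hardware of 2010 may not have been able to support the applications that later made GANs take off. One commenter even brought up Columbus and the New World: "Columbus may have been the first discoverer, but surely many people had long predicted that 'maybe there are some islands in the Atlantic'?"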

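In fact, the author of the blog post, Olli Niemitalo, has taken it far better than the onlookers. Olli is an electrical engineer from Finland, and in a 2017 post he described how he felt when he first discovered GANs: "In May 2017 I saw Ian Goodfellow's tutorial on YouTube, and it made my day! What I had written down was only a basic idea, and a lot of work has been done to make it give good results. The talk answered the questions I had had, and many more."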

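As his personal homepage shows, Olli is a lively thinker who enjoys putting forward new ideas, a real treasure trove of a person. Since 2007 he has used his blog to record a great many ideas of his own, from "bicycle brakes that can sing" to "a watch that is never late," and among them, of course, this early form of the GAN.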

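As Goodfellow put it: "If you have an idea that you think will work, and you also have the domain knowledge to recognize that it really does work, then your idea becomes truly valuable. It took me only about an hour to come up with the GAN, and two weeks to write the paper. This is definitely a '99% inspiration, 1% perspiration' story, but before that I had spent four years working on a PhD in related topics."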

Finally, here is the short write-up of the GAN idea that was proposed three years before Goodfellow's.

A method for training artificial neural networks to generate missing data within a variable context. As the idea is hard to put in a single sentence, I will use an example:

An image may have missing pixels (let's say, under a smudge). How can one restore the missing pixels, knowing only the surrounding pixels? One approach would be a "generator" neural network that, given the surrounding pixels as input, generates the missing pixels.

But how to train such a network? One can't expect the network to exactly produce the missing pixels. Imagine, for example, that the missing data is a patch of grass. One could teach the network with a bunch of images of lawns, with portions removed. The teacher knows the data that is missing, and could score the network according to the root mean square difference (RMSD) between the generated patch of grass and the original data. The problem is that if the generator encounters an image that is not part of the training set, it would be impossible for the neural network to put all the leaves, especially in the middle of the patch, in exactly the right places. The lowest RMSD error would probably be achieved by the network filling the middle area of the patch with a solid color that is the average of the color of pixels in typical images of grass. If the network tried to generate grass that looks convincing to a human and as such fulfills its purpose, there would be an unfortunate penalty by the RMSD metric.
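The averaging effect described in the paragraph above can be checked with a small sketch. It is only an illustration under a strong assumption: "grass" is modeled as independent, uniformly random pixel brightness values, and the names `rmsd` and `grass_patch` are made up for this example rather than taken from the post. It compares a solid fill at the average brightness against a different but equally plausible random patch.

```python
import math
import random

def rmsd(a, b):
    """Root mean square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def grass_patch(rng, n=256):
    """Toy 'grass' (an assumption): n independent brightness values in [0, 1]."""
    return [rng.random() for _ in range(n)]

rng = random.Random(0)
trials = 200
flat = [0.5] * 256  # solid fill with the average brightness

flat_err = 0.0
realistic_err = 0.0
for _ in range(trials):
    truth = grass_patch(rng)   # the pixels that were actually removed
    guess = grass_patch(rng)   # a plausible-looking patch, but not the same one
    flat_err += rmsd(flat, truth) / trials
    realistic_err += rmsd(guess, truth) / trials

print(f"solid fill RMSD:      {flat_err:.3f}")
print(f"realistic guess RMSD: {realistic_err:.3f}")
```

Under this toy model the solid fill should score roughly 0.29 and the realistic-looking guess roughly 0.41, so the RMSD teacher rewards the blurry answer, which is the penalty the post describes.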

My idea is this (see figure below): Train simultaneously with the generator a classifier network that is given, in random or alternating sequence, generated and original data. The classifier then has to guess, in the context of the surrounding image context, whether the input is original (1) or generated (0). The generator network is simultaneously trying to get a high score (1) from the classifier. The outcome, hopefully, is that both networks start out really simple, and progress towards generating and recognizing more and more advanced features, approaching and possibly defeating human's ability to discern between the generated data and the original. If multiple training samples are considered for each score, then RMSD is the correct error metric to use, as this will encourage the classifier network to output probabilities.
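The alternating training described above can be sketched in a few lines. This is a heavily simplified illustration, not the post's design or Goodfellow's implementation: the data is one-dimensional, the "generator" is a single shift parameter `theta`, the classifier is a logistic unit, the surrounding-context input is dropped, and a log-loss gradient is used in place of the RMSD scoring the post proposes. All names and constants are assumptions chosen for the example.

```python
import math
import random

def sigmoid(t):
    # Clamp the input so math.exp cannot overflow on extreme values.
    t = max(-30.0, min(30.0, t))
    return 1.0 / (1.0 + math.exp(-t))

def train(steps=3000, batch=64, lr=0.05, real_mean=1.0, seed=0):
    rng = random.Random(seed)
    theta = -1.0      # generator parameter: generated sample = theta + noise
    w, b = 0.0, 0.0   # classifier parameters: score = sigmoid(w * x + b)
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Classifier step: push scores toward 1 on original data
        # and toward 0 on generated data.
        gw = gb = 0.0
        for x in real:
            err = 1.0 - sigmoid(w * x + b)
            gw += err * x
            gb += err
        for x in fake:
            err = -sigmoid(w * x + b)
            gw += err * x
            gb += err
        w += lr * gw / batch
        b += lr * gb / batch

        # Generator step: shift theta so the classifier scores
        # generated samples closer to 1.
        gtheta = sum((1.0 - sigmoid(w * x + b)) * w for x in fake) / batch
        theta += lr * gtheta
    return theta, w, b

theta, w, b = train()
print(f"generator mean after training: {theta:.2f} (real mean is 1.00)")
```

In this toy setting the generator's mean should drift from its starting value of -1 toward the real mean of 1 as the classifier loses its ability to tell the two sources apart. Real image models replace both one-parameter pieces with deep networks and are considerably harder to train.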

If you are interested in the story of how the GAN was born, see also Big Data Digest's related coverage:

Ian Goodfellow, Father of the GAN: The Human Who Gave Machines Imagination
