论文复现赛：对抗攻击

更新时间：2026-03-27 13:28:37

taptap官方最新客户端2026

类型：生活服务
大小：28.8m
语言：简体中文
评分：

查看详情

论文复现赛：对抗攻击

本文在PaddlePaddle上实现的Towards Deep Learning Models Resistant to Adversarial Attacks复现实验，使用MNIST数据集进行对抗训练，并运用Min-Max策略与FGSM、PGD生成对抗样本，结果精度显著优于原文献。详细分析了超参数对模型性能的影响及框架结构差异，为对抗防御提供参考方案。

论文复现：Towards Deep Learning Models Resistant to Adversarial Attacks

一、简介

对抗防御：为了应对精准的数值型攻击，研究人员探索了生成错误示例的方法，旨在削弱机器学习模型的准确性。这便是所谓的对抗性保护策略，其目标是开发技术来检测并修正这些类型的攻击，确保系统的安全性得到保障。

根据条件不同，对抗攻击主要分为以下几种类型：黑盒攻击与白盒攻击：黑盒攻击指的是攻击者仅了解模型的输入输出而无法访问其内部结构和参数。在这种情况下，攻击者生成的对抗样本仅在测试期间暴露给目标系统，并且在训练过程中可能不涉及任何真实数据。白盒攻击则完全不同于此；它假设攻击者对目标模型有深入的理解，包括其架构、训练方法以及使用的数据集。有目标攻击与无目标攻击：无目标攻击的目标是导致网络出错，而目标攻击不仅需要实现这一效果，还要求指定特定的错误情况。前者通常是随机生成对抗样本以最小化系统的准确率；后者则在确保系统功能正常的同时，试图产生某个具体的错误。基于梯度的方法、优化方法以及其他：这些是多种不同的攻击实现方式，它们通过改变输入数据的微小扰动来推动模型预测结果向正确的方向偏移。这可以包括但不限于随机梯度下降法（SGD）、生成对抗网络（GANs）等技术的应用。总体而言，这种分类不仅有助于区分不同类型的对抗攻击，也提供了策略性的思路来进行防御和应对这些挑战。

本文探讨了神经网络对抗鲁棒性的优化策略，旨在构建一种防御所有潜在威胁的技术。

对抗攻击是源于神经网络输入信息敏感性的问题，当输入信息与训练样本分布特征有显著偏差时，网络可能产生错误预测。通过精修改变输入信息，可导致网络给出极不可能的结果，如下图所示：尽管人眼无法察觉这些细微差异，但足以干扰网络的正常运作。

采用基于原始样本生成对抗样本的方法，接着利用对抗样本进行风险期望的计算。

这种鞍点问题，作者称之为Max-Min问题。Max求解的是，在固定网络参数的情况下，找出原始样本的一个偏移量，使得Loss函数在局部取得最大值，即为求出此时的对抗样本；Min则是在得到对抗样本之后，根据梯度下降法，使该对抗样本的期望风险最小化。

通过优化模型参数，采用常规梯度下降法；核心在于发现样本的高仿品，解决最大值难题。

寻找最优的对抗样本，常用的有两种方法：Fast Gradient Sign Method（FGSM）和Projected Gradient Decent（PGD）方法。

有趣的一个发现是，模型越大的话，通过这种方法得到的结果反而更好。而我认为模型对对抗攻击比较脆弱的主要原因是过度拟合。但是文章提供的结论却显示出了一个完全不同的原因：正如上图所示，模型复杂实际上能够提高其鲁棒性。所以，我们可以看到在面对复杂的模型时，其实会更加安全。

通过这种方法，文本中提到的方法实质上是将大量的数据元素融入到训练过程中，其中ε值越大代表使用的样本越多，因此模型的复杂度也随之增加。

论文链接：Towards Deep Learning Models Resistant to Adversarial Attacks

二、复现精度

基于paddlepaddle深度学习框架，对文献算法进行复现后，汇总各测试条件下的测试精度，如下表所示。任务本项目精度原文献精度 PGD-steps100-restarts20-sourceA 92.555% 89.3% PGD-steps100-restarts20-sourceA‘ 97.3955% 95.7% PGD-steps40-restarts1-sourceB 97.22% 96.4%

超参数配置如下：超参数名设置值 lr 0.0003 batch_size 128 epochs 50 alpha 0.01 steps 100 steps 100/40 restarts 20 epsilon 0.3

三、数据集

本项目使用的是MNIST数据集。

MNIST数据集在机器学习领域极为经典，它由张训练图像与张测试图像构成，每幅图像为素的灰阶手写字母图片。

该数据集由美国国家标准与技术研究所（NIST）发起整理，包括了来自不同来源的人手写数字图片，其中包括高中生和人口普查局工作人员的数据比例为各半。其主要目的是在于通过算法手段实现对这些手写数字的识别。

数据集链接：MNIST

四、环境依赖

硬件： x86 cpu NVIDIA GPU

框架： PaddlePaddle = 2.1.2

其他依赖项： numpy==1.19.3 matplotlib==3.3.4

五、快速开始

1、执行以下命令启动训练：

python train.py --net robust --seed 0

建立robust network，运行完毕后，模型参数文件保存在./checkpoints/MNIST/目录下，手动将该文件保存至./checkpoints/MNIST_Robust_Model目录下。 In []

%cd /home/aistudio/TDLMRAAI_paddle/ !python train.py --net robust --seed 0登录后复制

python train.py --net robust --seed 10建立不同参数的network，用于黑盒攻击A

构建多样化的网络并完成后，模型参数存于./checkpoints/MNIST/路径，手动复制至./checkpoints/MNIST_BlackboxA路径。

%cd /home/aistudio/TDLMRAAI_paddle/ !python train.py --net robust --seed 10登录后复制

/home/aistudio/TDLMRAAI_paddle /opt/conda/envs/python35-padd from collections import /opt/conda/envs/python35-padd from collections import /opt/conda/envs/python35-padd from collections import Sized /home/aistudio/TDLMRAAI_paddl This call to matplotlib.use() been chosen; matplotlib.use() or matplotlib.backends The backend was *originally* File "train.py", import helper File "/home/aistudio/T import matplotlib.pyplot as plt File "/opt/conda/envs/ from matplotlib.backends File "/opt/conda/envs/ line for line in matplotlib.use('Agg') Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. Time passed [minutes]: 0.00. W0817 15:12:37.464154 W0817 15:12:37.468314 Time passed [minutes]: 0.11. Time passed [minutes]: 0.11. Full Train(training /opt/conda/envs/python35-padd tensor.grad will return warnings.warn(warning_msg) /opt/conda/envs/python35-padd Deprecated in NumPy if data.dtype == np.object: Time passed [minutes]: 2.50. Time passed [minutes]: 4.85. Time passed [minutes]: 7.27. Time passed [minutes]: 9.65. Time passed [minutes]: 11.98. Time passed [minutes]: 14.32. Time passed [minutes]: 16.65. Time passed [minutes]: 18.98. Time passed [minutes]: 21.33. Time passed [minutes]: 23.75. Time passed [minutes]: 26.16. Time passed [minutes]: 28.60. Time passed [minutes]: 30.99. Time passed [minutes]: 33.43. Time passed [minutes]: 35.83. Time passed [minutes]: 38.21. Time passed [minutes]: 40.60. Time passed [minutes]: 43.01. Time passed [minutes]: 45.42. Time passed [minutes]: 47.80. Time passed [minutes]: 50.19. Time passed [minutes]: 52.60. Time passed [minutes]: 54.99. Time passed [minutes]: 57.41. Time passed [minutes]: 59.80. Time passed [minutes]: 62.17. Time passed [minutes]: 64.56. Time passed [minutes]: 66.95. Time passed [minutes]: 69.33. Time passed [minutes]: 71.73. Time passed [minutes]: 74.19. Time passed [minutes]: 76.72. Time passed [minutes]: 79.12. Time passed [minutes]: 81.50. Time passed [minutes]: 83.87. Time passed [minutes]: 86.32. Time passed [minutes]: 88.70. Time passed [minutes]: 91.14. Time passed [minutes]: 93.64. Time passed [minutes]: 96.08. Time passed [minutes]: 98.48. Time passed [minutes]: 100.91. Time passed [minutes]: 103.34. Time passed [minutes]: 105.77. Time passed [minutes]: 108.19. Time passed [minutes]: 113.07. Time passed [minutes]: 115.47. Time passed [minutes]: 117.85. Time passed [minutes]: 120.26. Time passed [minutes]: 120.26. Time passed [minutes]: 120.27. Time passed [minutes]: 120.27. Time passed [minutes]: 120.38. Time passed [minutes]: 120.38. Time passed [minutes]: 120.99. Time passed [minutes]: 120.99. Time passed [minutes]: 120.99. Time passed [minutes]: 120.99. Time passed [minutes]: 120.99. le120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working MutableMapping le120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working Iterable, Mapping le120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working e/helper.py:13: UserWarning: has no effect because the backend has already must be called *before* pylab, matplotlib.pyplot, is imported for the first time. set to 'module://ipykernel.pylab.backend_inline' by the following code: line 9, in <module> DLMRAAI_paddle/helper.py", line 2, in <module> python35-paddle120-env/lib/python3.7/site-packages/matplotlib/pyplot.py", line 71, in <module> import pylab_setup python35-paddle120-env/lib/python3.7/site-packages/matplotlib/backends/__init__.py", line 16, in <module> traceback.format_stack() execution date (d/m/y): 17/08/2021, 15:12:33 Dataset name: MNIST checkpoints folder: ./checkpoints/MNIST save checkpoints: True load checkpoints: False results folder: ./results_folder/MNIST show results: False save results: False seed: 10 execution device: CUDAPlace(0) 2714 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1 2714 device_context.cc:422] device: 0, cuDNN Version: 7.6. Train model on MNIST-ORIGIN-NET on all training dataset) with selected hp: {'lr': 0.0003, 'batch_size': 128, 'alpha': 0.01, 'steps': 100, 'epsilon': 0.3} le120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py:382: UserWarning: Warning: the tensor value of the gradient. This is an incompatible upgrade for tensor.grad API. It's return type changes from numpy.ndarray in version 2.0 to paddle.Tensor in version 2.1.0. If you want to get the numpy value of the gradient, you can use :code:`x.grad.numpy()` le120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:125: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations Epoch 1. Train adversarial accuracy: 0.054538.Train accuracy: 0.21, Train loss: 2.57 Epoch 2. Train adversarial accuracy: 0.130375.Train accuracy: 0.43, Train loss: 2.29 Epoch 3. Train adversarial accuracy: 0.286514.Train accuracy: 0.76, Train loss: 2.05 Epoch 4. Train adversarial accuracy: 0.368876.Train accuracy: 0.87, Train loss: 1.81 Epoch 5. Train adversarial accuracy: 0.429538.Train accuracy: 0.90, Train loss: 1.60 Epoch 6. Train adversarial accuracy: 0.486385.Train accuracy: 0.92, Train loss: 1.42 Epoch 7. Train adversarial accuracy: 0.530661.Train accuracy: 0.93, Train loss: 1.28 Epoch 8. Train adversarial accuracy: 0.568308.Train accuracy: 0.94, Train loss: 1.16 Epoch 9. Train adversarial accuracy: 0.600074.Train accuracy: 0.94, Train loss: 1.08 Epoch 10. Train adversarial accuracy: 0.622018.Train accuracy: 0.94, Train loss: 1.01 Epoch 11. Train adversarial accuracy: 0.643901.Train accuracy: 0.95, Train loss: 0.96 Epoch 12. Train adversarial accuracy: 0.665317.Train accuracy: 0.95, Train loss: 0.90 Epoch 13. Train adversarial accuracy: 0.683136.Train accuracy: 0.95, Train loss: 0.86 Epoch 14. Train adversarial accuracy: 0.696867.Train accuracy: 0.96, Train loss: 0.82 Epoch 15. Train adversarial accuracy: 0.718922.Train accuracy: 0.96, Train loss: 0.77 Epoch 16. Train adversarial accuracy: 0.743626.Train accuracy: 0.96, Train loss: 0.71 Epoch 17. Train adversarial accuracy: 0.764692.Train accuracy: 0.96, Train loss: 0.66 Epoch 18. Train adversarial accuracy: 0.781033.Train accuracy: 0.96, Train loss: 0.61 Epoch 19. Train adversarial accuracy: 0.794921.Train accuracy: 0.96, Train loss: 0.58 Epoch 20. Train adversarial accuracy: 0.808735.Train accuracy: 0.96, Train loss: 0.54 Epoch 21. Train adversarial accuracy: 0.819841.Train accuracy: 0.97, Train loss: 0.51 Epoch 22. Train adversarial accuracy: 0.835027.Train accuracy: 0.97, Train loss: 0.47 Epoch 23. Train adversarial accuracy: 0.850030.Train accuracy: 0.97, Train loss: 0.43 Epoch 24. Train adversarial accuracy: 0.868759.Train accuracy: 0.97, Train loss: 0.38 Epoch 25. Train adversarial accuracy: 0.879603.Train accuracy: 0.97, Train loss: 0.35 Epoch 26. Train adversarial accuracy: 0.885767.Train accuracy: 0.97, Train loss: 0.33 Epoch 27. Train adversarial accuracy: 0.890181.Train accuracy: 0.97, Train loss: 0.32 Epoch 28. Train adversarial accuracy: 0.894540.Train accuracy: 0.97, Train loss: 0.31 Epoch 29. Train adversarial accuracy: 0.899015.Train accuracy: 0.97, Train loss: 0.30 Epoch 30. Train adversarial accuracy: 0.900337.Train accuracy: 0.98, Train loss: 0.29 Epoch 31. Train adversarial accuracy: 0.903568.Train accuracy: 0.98, Train loss: 0.28 Epoch 32. Train adversarial accuracy: 0.905889.Train accuracy: 0.98, Train loss: 0.27 Epoch 33. Train adversarial accuracy: 0.906944.Train accuracy: 0.98, Train loss: 0.27 Epoch 34. Train adversarial accuracy: 0.908571.Train accuracy: 0.98, Train loss: 0.27 Epoch 35. Train adversarial accuracy: 0.908443.Train accuracy: 0.98, Train loss: 0.26 Epoch 36. Train adversarial accuracy: 0.909648.Train accuracy: 0.98, Train loss: 0.26 Epoch 37. Train adversarial accuracy: 0.911075.Train accuracy: 0.98, Train loss: 0.26 Epoch 38. Train adversarial accuracy: 0.912408.Train accuracy: 0.98, Train loss: 0.26 Epoch 39. Train adversarial accuracy: 0.912608.Train accuracy: 0.98, Train loss: 0.25 Epoch 40. Train adversarial accuracy: 0.912519.Train accuracy: 0.98, Train loss: 0.25 Epoch 41. Train adversarial accuracy: 0.914440.Train accuracy: 0.98, Train loss: 0.25 Epoch 42. Train adversarial accuracy: 0.915734.Train accuracy: 0.98, Train loss: 0.24 Epoch 43. Train adversarial accuracy: 0.917128.Train accuracy: 0.98, Train loss: 0.24 Epoch 44. Train adversarial accuracy: 0.917283.Train accuracy: 0.98, Train loss: 0.24 Epoch 45. Train adversarial accuracy: 0.919504.Train accuracy: 0.98, Train loss: 0.23 Epoch 47. Train adversarial accuracy: 0.920781.Train accuracy: 0.98, Train loss: 0.23 Epoch 48. Train adversarial accuracy: 0.922358.Train accuracy: 0.98, Train loss: 0.22 Epoch 49. Train adversarial accuracy: 0.923474.Train accuracy: 0.99, Train loss: 0.22 Epoch 50. Train adversarial accuracy: 0.924007.Train accuracy: 0.99, Train loss: 0.22 training selected hyperparams: {'lr': 0.0003, 'batch_size': 128, 'alpha': 0.01, 'steps': 100, 'epsilon': 0.3} %accuracy on attack: 0.9287109375 with hp: {'epsilon': 0.3} FGSM attack selected hyperparams: {'epsilon': 0.3} %accuracy on attack: 0.8938802083333334 with hp: {'alpha': 0.01, 'steps': 100, 'epsilon': 0.3} PGD attack selected hyperparams: {'alpha': 0.01, 'steps': 100, 'epsilon': 0.3} TEST SCORES of MNIST-ORIGIN-NET with PGD adversarial training: accuracy on test: 0.9807 accuracy on FGSM constructed examples: 0.946 accuracy on PGD constructed examples: 0.9163 save network to ./checkpoints/MNIST/MNIST-ORIGIN-NET with PGD adversarial training.pdparams登录后复制

python train.py --net diff_arch --seed 0建立不同架构的network，用于黑盒攻击B

构建多层网络，完成训练后，模型参数存于./checkpoints/MNIST目录，需手动移动到./checkpoints/MNIST_BlackboxB目录。

%cd /home/aistudio/TDLMRAAI_paddle/ !python train.py --net diff_arch --seed 0登录后复制

2、执行以下命令进行评估

python test.py --method white --num_restarts 20用于白盒测试，对应PGD-steps100-restarts20-sourceA In []

%cd /home/aistudio/TDLMRAAI_paddle/ !python test.py --method white --num_restarts 20登录后复制

python test.py --method blackA --num_restarts 20用于网络架构已知，参数未知的黑盒测试，对应PGD-steps100-restarts20-sourceA’ In []

%cd /home/aistudio/TDLMRAAI_paddle/ !python test.py --method blackA --num_restarts 20登录后复制

python test.py --method blackB --num_restarts 1用于网络架构未知黑盒测试，对应PGD-steps40-restarts1-sourceB In []

%cd /home/aistudio/TDLMRAAI_paddle/ !python test.py --method blackB --num_restarts 1登录后复制

六、实验数据比较及复现心得

1、实验数据比较

在不同超参数配置下，模型的收敛效果和达到的精度指标存在显著差异。以下是各种配置下的实验结果对比，有助于进行准确的比较分析。

（1）学习率：

优化器一致采用Adam，原论文学习率为在多次实验中调整后发现学习率使模型更快收敛。不同学习率对样本精准度与损失函数有显著影响，观察图表可得最佳结果。

（2）训练时，是否加入原始样本：

根据本项目参考的repo描述，加入原始样本对模型精度有积极影响。我们在相同超参数下进行了实验对比，结果显示：测试项 | 加入原始样本 | 不使用原始样本 ---|---|--- 测试集ACC | | FGSM攻击ACC | | PGD攻击ACC | | 由此可见，加入原始样本有助于提升模型的测试准确性。

原文献中算法在训练期间仅关注生成的对抗样本，不包含实际输入的数据。研究发现，使用更多的原始数据可以小幅提高真实的识别准确性（提升了）。然而，这可能会导致攻击样本准确性的轻微降低。

但总体上，对指标影响很小。

（3）initialer

（4）batch_size

（5）epoch轮次

在本次项目中，我们设定的epochs数量为在此过程中，损失值及对抗性正确率趋于平稳，表明模型正逐渐趋向于收敛点，如图所示。

2、复现心得

本次论文复现由于作者未提供原始模型参考实现，仅提供了部分基于TensorFlow的部分代码，难度较大。前期研读了文献并上网查找相关资料和相似实现代码。在复现过程中，学到了大量领域的知识，并对对抗攻击的研究有了更深入的理解。

纸上得来终觉浅，绝知此事要躬行。古人诚不欺我！

在深入理解文献理论的同时，我们还对PyTorch和PaddlePaddle这两个框架进行了细致比较。例如，在模型初始化方面，PyTorch使用了Kaiming Normal初始化方法；而Paddle则采用了Xavier Normal初始化方法。虽然这种微小的初始值设置差异可能不会立即显现出来，但在项目实施中可能会带来一些意外的结果。

本项目最初采用的是paddle的默认初始化方法Xavier_normal，但在进行一段时间后发现网络收敛速度过慢且精确度不高。为了解决这个问题，我们选择改用kaiming_normal，并调整了网络参数以达到与PyTorch一致的效果。通过实验对比，使用新的初始化方式后，不仅提升了网络的训练效率和精度，还完美符合论文中的指标要求。

七、模型信息

训练完成后，模型保存在checkpoints目录下。

训练数据存储于results_folder，测试数据存放在test_results_folder。发布者：hrdwsong；日期：框架版本：Paddle 应用场景：对抗训练；支持硬件：GPU、CPU；Github地址：https：//github.com/hrdwsong/TDLMRA-Paddle

以上就是论文复现赛：对抗攻击的详细内容，更多请关注其它相关文章！