The DeepFool Algorithm

We are more than familiar with this one. It is essentially a white-box, untargeted attack based on the L2 norm. In the multi-class case, each iteration computes the minimal Euclidean perturbation that pushes the sample point across the nearest decision boundary. A sketch of the update step is given below.
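
Rather than copying the paper's pseudocode, here is a minimal NumPy sketch of one multi-class DeepFool step written from the formulas; the toy linear model (W, b) and every name in it are made up purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(10, 3072))   # toy linear classifier: 10 classes, flattened 32x32x3 input
    b = rng.normal(size=10)

    def logits(x):
        return W @ x + b

    def class_grad(x, k):
        return W[k]   # for a linear model, d f_k / d x is simply the k-th weight row

    def deepfool_step(x, k_hat, tol=1e-8):
        """One DeepFool update: the smallest L2 step towards the nearest decision boundary."""
        f = logits(x)
        w_prime = np.stack([class_grad(x, k) - class_grad(x, k_hat) for k in range(10)])
        f_prime = f - f[k_hat]
        ratio = np.abs(f_prime) / (np.linalg.norm(w_prime, axis=1) + tol)
        ratio[k_hat] = np.inf                      # never "move towards" the class we are already in
        l_hat = np.argmin(ratio)                   # index of the closest decision boundary
        return np.abs(f_prime[l_hat]) / (np.linalg.norm(w_prime[l_hat]) ** 2 + tol) * w_prime[l_hat]

    x0 = rng.normal(size=3072)
    r = deepfool_step(x0, int(np.argmax(logits(x0))))

In the full algorithm this step is applied repeatedly, accumulating r into the input until the predicted class flips or an iteration limit is reached; ART's implementation of that loop is what the rest of this post walks through.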


Reading the ART Source

Constructor

    def __init__(self, classifier, max_iter=100, epsilon=1e-6, nb_grads=10, batch_size=1):
  • classifier: the wrapped model to attack
  • max_iter: maximum number of iterations
  • epsilon: the overshoot hyperparameter from the paper (as I recall, the paper's default is 0.02, whereas ART defaults to 1e-6 here)
  • nb_grads: the number of classes (the top nb_grads by predicted score) whose gradients are computed; restricting this speeds things up when there are many classes
  • batch_size: number of samples per batch; we won't worry about it here
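
For context, this is roughly how the attack is driven from user code. It is only a sketch: the import paths assume a reasonably recent ART release, and model / x_test stand in for a pre-trained CIFAR-10 Keras model and its test images.

    from art.attacks.evasion import DeepFool
    from art.estimators.classification import KerasClassifier

    # Wrap the (hypothetical) pre-trained Keras model so ART can query predictions and gradients
    classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

    attack = DeepFool(classifier, max_iter=100, epsilon=1e-6, nb_grads=10, batch_size=1)
    x_adv = attack.generate(x=x_test)   # x_test: (N, 32, 32, 3) images in [0, 1]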

The Generation Process

Preparation

    def generate(self, x, y=None, **kwargs):

Assume a batch size of 1 and take CIFAR-10 as the example, so the input tensor x has shape 1 x 32 x 32 x 3. The code below has been trimmed a little.

x_adv = x.astype(ART_NUMPY_DTYPE)
preds = self.classifier.predict(x)

# Determine the class labels for which to compute the gradients
use_grads_subset = self.nb_grads < self.classifier.nb_classes()
if use_grads_subset:
    # TODO compute set of unique labels per batch
    grad_labels = np.argsort(-preds, axis=1)[:, : self.nb_grads]
    labels_set = np.unique(grad_labels)
else:
    labels_set = np.arange(self.classifier.nb_classes())
sorter = np.arange(len(labels_set))

# Pick a small scalar to avoid division by 0
tol = 10e-8

This looks like an unfinished piece of code (note the TODO). In short, it does four things (a toy illustration of the label-subset branch follows the list):

  • Create x_adv as a copy of x cast to ART_NUMPY_DTYPE
  • Get the model's predictions preds for x
  • Build labels_set, the class labels whose gradients will be computed (all classes, or only the top nb_grads)
  • Define a small tolerance tol to guard against division by zero
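
As a toy illustration of the label-subset branch (made-up prediction values, 4 classes instead of 10, nb_grads = 2):

    import numpy as np

    preds = np.array([[0.05, 0.60, 0.10, 0.25]])              # fake softmax output
    nb_grads = 2
    grad_labels = np.argsort(-preds, axis=1)[:, :nb_grads]    # -> [[1, 3]], the top-2 classes
    labels_set = np.unique(grad_labels)                        # -> [1, 3]
    sorter = np.arange(len(labels_set))                        # -> [0, 1]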

To keep things simple, assume nb_grads == self.classifier.nb_classes(). The code below has again been trimmed.

batch = x_adv.copy()  # the real source copies the current slice of x_adv, so x_adv itself keeps the original inputs
# Get predictions and gradients for batch
f_batch = preds
fk_hat = np.argmax(f_batch, axis=1)  # one hot -> sparse
# Compute gradients for all classes
grd = self.classifier.class_gradient(batch)

After trimming, the code above has computed the class-gradient tensor grd, which matters a lot in what follows.
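
For orientation, under the batch-size-1 CIFAR-10 assumption the shapes should be roughly as follows (this is my reading of classifier.class_gradient when it is called without a label argument; worth double-checking against your ART version):

    # Illustrative shapes only, not executed against ART:
    # batch   : (1, 32, 32, 3)       the single image being perturbed
    # f_batch : (1, 10)              class scores for the 10 CIFAR-10 classes
    # fk_hat  : (1,)                 index of the currently predicted class
    # grd     : (1, 10, 32, 32, 3)   gradient of every class score w.r.t. the input pixels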

The Iteration Loop

With the gradients computed, we now enter the iteration loop.

# Get current predictions
active_indices = np.arange(len(batch))
current_step = 0

while active_indices.size > 0 and current_step < self.max_iter:
    # Compute difference in predictions and gradients only for selected top predictions
    labels_indices = sorter[np.searchsorted(labels_set, fk_hat, sorter=sorter)]
    grad_diff = grd - grd[np.arange(len(grd)), labels_indices][:, None]
    f_diff = f_batch[:, labels_set] - f_batch[np.arange(len(f_batch)), labels_indices][:, None]

To be honest, the indexing here is not obvious at first glance; it should be computing
$$\begin{aligned}
&\boldsymbol{w}_{k}^{\prime} \leftarrow \nabla f_{k}\left(\boldsymbol{x}_{i}\right)-\nabla f_{\hat{k}\left(\boldsymbol{x}_{0}\right)}\left(\boldsymbol{x}_{i}\right)\\
&f_{k}^{\prime} \leftarrow f_{k}\left(\boldsymbol{x}_{i}\right)-f_{\hat{k}\left(\boldsymbol{x}_{0}\right)}\left(\boldsymbol{x}_{i}\right)
\end{aligned}$$

these two quantities.
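
To convince myself, here is a tiny NumPy mock-up of that indexing with made-up tensors (1 sample, 4 classes, 3 input dimensions); row k of grad_diff ends up holding ∇f_k − ∇f_k̂ and entry k of f_diff holds f_k − f_k̂:

    import numpy as np

    labels_set = np.arange(4)
    sorter = np.arange(len(labels_set))

    grd = np.arange(1 * 4 * 3, dtype=float).reshape(1, 4, 3)    # fake class gradients
    f_batch = np.array([[0.10, 0.70, 0.05, 0.15]])               # fake predictions
    fk_hat = np.argmax(f_batch, axis=1)                          # current class, here 1

    # Map each sample's current class to its position inside labels_set
    labels_indices = sorter[np.searchsorted(labels_set, fk_hat, sorter=sorter)]   # -> [1]

    grad_diff = grd - grd[np.arange(len(grd)), labels_indices][:, None]
    f_diff = f_batch[:, labels_set] - f_batch[np.arange(len(f_batch)), labels_indices][:, None]

    assert np.allclose(grad_diff[0, 2], grd[0, 2] - grd[0, 1])
    assert np.allclose(f_diff[0, 0], f_batch[0, 0] - f_batch[0, 1])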

while active_indices.size > 0 and current_step < self.max_iter:
    ...
    # Choose coordinate and compute perturbation
    norm = np.linalg.norm(grad_diff.reshape(len(grad_diff), len(labels_set), -1), axis=2) + tol
    value = np.abs(f_diff) / norm 
    value[np.arange(len(value)), labels_indices] = np.inf
    l_var = np.argmin(value, axis=1) # ===== A

    absolute1 = abs(f_diff[np.arange(len(f_diff)), l_var])
    draddiff = grad_diff[np.arange(len(grad_diff)), l_var].reshape(len(grad_diff), -1)
    pow1 = pow(np.linalg.norm(draddiff, axis=1), 2,) + tol
    r_var = absolute1 / pow1 # ===== B

    r_var = r_var.reshape((-1,) + (1,) * (len(x.shape) - 1))
    r_var = r_var * grad_diff[np.arange(len(grad_diff)), l_var] # ===== C

The computation that follows is a bit involved, so let's take it step by step (a toy numeric mock-up follows the list):

  • A: computes $\hat{l} \leftarrow \arg\min_{k \neq \hat{k}\left(\boldsymbol{x}_{0}\right)} \frac{\left|f_{k}^{\prime}\right|}{\left\|\boldsymbol{w}_{k}^{\prime}\right\|_{2}}$
    • l_var records the index of this closest class
  • B: computes the scalar $\frac{\left|f_{\hat{l}}^{\prime}\right|}{\left\|\boldsymbol{w}_{\hat{l}}^{\prime}\right\|_{2}^{2}}$
  • C: computes $\boldsymbol{r}_{i} \leftarrow \frac{\left|f_{\hat{l}}^{\prime}\right|}{\left\|\boldsymbol{w}_{\hat{l}}^{\prime}\right\|_{2}^{2}} \boldsymbol{w}_{\hat{l}}^{\prime}$
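
Here is a small self-contained mock-up of A, B and C with made-up tensors (1 sample, 4 classes, 3 input dimensions), so the argmin and the step construction can be followed by hand:

    import numpy as np

    tol = 10e-8
    labels_indices = np.array([1])                      # current class of the single sample
    f_diff = np.array([[-0.6, 0.0, -0.65, -0.55]])      # fake f_k - f_khat values
    grad_diff = np.ones((1, 4, 3))                      # fake gradient differences
    grad_diff[0, 3] *= 2.0                              # class 3 has a larger gradient difference

    # A: pick the class whose boundary is closest in the |f'| / ||w'|| sense
    norm = np.linalg.norm(grad_diff.reshape(1, 4, -1), axis=2) + tol
    value = np.abs(f_diff) / norm
    value[np.arange(1), labels_indices] = np.inf        # never pick the current class
    l_var = np.argmin(value, axis=1)                    # -> [3]

    # B: scalar step length |f'_l| / ||w'_l||^2
    flat = grad_diff[np.arange(1), l_var].reshape(1, -1)
    r_len = np.abs(f_diff[np.arange(1), l_var]) / (np.linalg.norm(flat, axis=1) ** 2 + tol)

    # C: the step itself, the scalar times w'_l
    r_var = r_len.reshape(1, 1) * grad_diff[np.arange(1), l_var]
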
while active_indices.size > 0 and current_step < self.max_iter:
    ...
    # Add perturbation and clip result
    if hasattr(self.classifier, "clip_values") and self.classifier.clip_values is not None:
        batch[active_indices] = np.clip(
            batch[active_indices] + r_var[active_indices],
            self.classifier.clip_values[0],
            self.classifier.clip_values[1],
        )
    else:
        batch[active_indices] += r_var[active_indices]

Apply the perturbation: add r_var to the still-active samples and clip the result into the classifier's valid input range (clip_values), if one is defined.

while active_indices.size > 0 and current_step < self.max_iter:
    ...
    # Recompute prediction for new x
    f_batch = self.classifier.predict(batch)
    fk_i_hat = np.argmax(f_batch, axis=1)
    # Recompute gradients for new x
    grd = self.classifier.class_gradient(batch)
    # Stop if misclassification has been achieved
    active_indices = np.where(fk_i_hat == fk_hat)[0]

    current_step += 1

Recompute the predictions and gradients for the perturbed batch; samples that are now misclassified drop out of active_indices, so the loop only keeps working on the ones that still resist.
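
A tiny illustration of how the active set shrinks (made-up labels):

    import numpy as np

    fk_hat   = np.array([3, 7, 1])                        # originally predicted classes
    fk_i_hat = np.array([3, 2, 1])                        # predictions after this step
    active_indices = np.where(fk_i_hat == fk_hat)[0]      # -> [0, 2]; sample 1 is already fooled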

Post-processing

Post-processing applies the overshoot parameter, as follows:

# Apply overshoot parameter
# batch_index_1 / batch_index_2 delimit the current batch inside x_adv (0 and 1 under our
# batch-size-1 assumption); x_adv still holds the original inputs at this point, while
# `batch` holds the perturbed ones.
x_adv1 = x_adv[batch_index_1:batch_index_2]
x_adv2 = (1 + self.epsilon) * (batch - x_adv[batch_index_1:batch_index_2])
x_adv[batch_index_1:batch_index_2] = x_adv1 + x_adv2
if hasattr(self.classifier, "clip_values") and self.classifier.clip_values is not None:
    np.clip(
        x_adv[batch_index_1:batch_index_2],
        self.classifier.clip_values[0],
        self.classifier.clip_values[1],
        out=x_adv[batch_index_1:batch_index_2],
    )

Clearly, once everything has run, the accumulated perturbation (batch − x) is scaled by 1 + overshoot and added back onto the original input, i.e. x_adv = x + (1 + ε)(batch − x); the result is clipped and returned.
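
A tiny numeric check of that rescaling (made-up numbers; epsilon is set to the paper's 0.02 here so the effect is visible, ART's default 1e-6 would barely change anything):

    import numpy as np

    x     = np.array([0.50, 0.50])             # original input
    batch = np.array([0.60, 0.40])             # input after the iterative perturbation
    eps   = 0.02                               # overshoot
    x_adv = x + (1 + eps) * (batch - x)        # -> [0.602, 0.398]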