Posted by LitchiCheng on 2024-10-27 19:15

Reading 《动手学深度学习(PyTorch版)》 (Dive into Deep Learning, PyTorch Edition) Together - Weight Decay

<p>Weight decay is a technique for mitigating overfitting. It relies on a norm penalty: taking the squared Euclidean (L2) norm of the weight vector gives the penalty term Sum(w^2) / 2. To keep the weight vector small, the most common approach is to add this norm as a penalty term to the loss-minimization problem: the training objective changes from minimizing the prediction loss on the training labels to minimizing the sum of the prediction loss and the penalty term. If the weight vector then grows too large, the optimization algorithm shifts its focus toward minimizing the weight norm instead.</p>

<p>The loss function for linear regression:</p>

<p>L(w, b) = (1/n) * Sum_i (1/2) * (w^T x_i + b - y_i)^2</p>

<p>The new objective is the sum of the prediction loss and the penalty term:</p>

<p>L(w, b) + (lambda/2) * ||w||^2</p>
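<p>To make the penalized objective concrete, here is a minimal from-scratch sketch (my own illustration, not code from the book or from this post; lambd, w, and b are made-up names) that adds the L2 penalty directly to the squared loss before backpropagation:</p>

<pre>
<code>import torch

def l2_penalty(w):
    # Squared L2 norm divided by 2, matching Sum(w^2) / 2 above
    return torch.sum(w ** 2) / 2

# Toy parameters for a hand-rolled linear model (200 matches num_inputs below)
w = torch.normal(0, 1, size=(200, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def penalized_loss(X, y, lambd):
    y_hat = torch.matmul(X, w) + b
    squared_loss = (y_hat - y.reshape(y_hat.shape)) ** 2 / 2
    # Prediction loss plus lambd times the weight-norm penalty
    return squared_loss.mean() + lambd * l2_penalty(w)</code></pre>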

<pre>
<code>import torch
from torch.utils import data
import matplotlib.pyplot as plt
from torch import nn

def get_dataloader_workers():
    # Number of worker processes used by the DataLoader
    return 6

class Accumulator:
    """Accumulate running sums over n variables."""
    def __init__(self, n) -> None:
        self.data = [0.0] * n

    def add(self, *args):
        # args is a tuple of values, added element-wise to the running sums
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

def set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend):
    """Configure labels, scales, limits, legend, and grid for a matplotlib axis."""
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel)
    axes.set_xscale(xscale)
    axes.set_yscale(yscale)
    axes.set_xlim(xlim)
    axes.set_ylim(ylim)
    if legend:
        axes.legend(legend)
    axes.grid()

class Animator:
    """Incrementally plot training curves."""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None,
                 xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'),
                 nrows=1, ncols=1, figsize=(3.5, 2.5)):
        if legend is None:
            legend = []
        self.fig, self.axes = plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        self.config_axes = lambda: set_axes(self.axes[0], xlabel, ylabel, xlim,
                                            ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # y may be a scalar or a sequence of per-curve values
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()

def evaluate_loss(net, data_iter, loss):
    """Average loss over a dataset: total loss / number of elements."""
    metric = Accumulator(2)
    for X, y in data_iter:
        out = net(X)
        y = y.reshape(out.shape)
        l = loss(out, y)
        metric.add(l.sum(), l.numel())
    return metric[0] / metric[1]

def load_array(data_arrays, batch_size, is_train=True):
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train, num_workers=get_dataloader_workers())

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b plus Gaussian noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

# Only 20 training examples for 200 input dimensions: far more features
# than samples, so the linear model overfits easily
n_train, n_test, num_inputs, batch_size = 20, 100, 200, 5
true_w, true_b = torch.ones((num_inputs, 1)) * 0.01, 0.05
train_data = synthetic_data(true_w, true_b, n_train)
train_iter = load_array(train_data, batch_size)
test_data = synthetic_data(true_w, true_b, n_test)
test_iter = load_array(test_data, batch_size, is_train=False)

def train_concise(wd):
    net = nn.Sequential(nn.Linear(num_inputs, 1))
    for param in net.parameters():
        param.data.normal_()
    loss = nn.MSELoss(reduction='none')
    num_epochs, lr = 100, 0.003
    # weight_decay applies L2 regularization to the weights only, not the bias
    trainer = torch.optim.SGD([
        {"params": net[0].weight, 'weight_decay': wd},
        {"params": net[0].bias}], lr=lr)
    animator = Animator(xlabel='epochs', ylabel='loss', yscale='log',
                        xlim=[5, num_epochs], legend=['train', 'test'])
    for epoch in range(num_epochs):
        for X, y in train_iter:
            trainer.zero_grad()
            l = loss(net(X), y)
            l.mean().backward()
            trainer.step()
        if (epoch + 1) % 5 == 0:
            animator.add(epoch + 1,
                         (evaluate_loss(net, train_iter, loss),
                          evaluate_loss(net, test_iter, loss)))
    print('weight', net[0].weight.norm().item())

train_concise(0)
plt.show()</code></pre>

<p>As the plot shows, with weight decay disabled the test loss barely decreases even as the training loss falls, which is the classic sign of overfitting.</p>

<pre>
<code>train_concise(5)</code></pre>

<p>With weight decay enabled and the strength set to 5, the test loss now decreases steadily as well.</p>
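<p>As a side note on why this is called "decay": with the penalty (lambda/2) * ||w||^2 added to the loss, each SGD step becomes w &lt;- (1 - lr*lambda) * w - lr * grad(L(w)), i.e. the weights are shrunk by the factor (1 - lr*lambda) before the usual gradient update. This multiplicative shrinkage is exactly what PyTorch's weight_decay parameter implements for plain SGD.</p>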

<p>We can strengthen the weight-decay effect further by increasing the decay value, as in the sketch below.</p>
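<p>A minimal sketch of that experiment (the wd values here are illustrative, not from the original post): sweep several decay strengths and compare the weight norms that train_concise prints.</p>

<pre>
<code># Illustrative sweep: larger wd should shrink the learned weight norm further
for wd in [0, 3, 5, 10]:
    print(f'wd = {wd}')
    train_concise(wd)
plt.show()</code></pre>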