Experiment Tracking and Hyperparameter Tuning with TensorBoard in PyTorch
Introduction
Experiment tracking involves logging and monitoring machine learning experiment data, and TensorBoard is a useful tool for visualizing and analyzing that data. It helps researchers understand experiment behavior, compare models, and make informed decisions.
Hyperparameter tuning is the process of finding the best values for the configuration settings that affect how a model learns. Examples include the learning rate, the batch size, and the number of hidden layers. Proper tuning can improve model performance and generalization.
Hyperparameter tuning strategies include manual search, grid search, random search, Bayesian optimization, and automated techniques. These methods systematically explore and evaluate different hyperparameter values.
During tuning, you assess model performance with evaluation metrics such as accuracy or mean squared error. Effective hyperparameter tuning improves model results on unseen data.
In this blog, we will walk through hyperparameter tuning using grid search, the FashionMNIST dataset, and a custom VGG-style model. Stay tuned for future blogs on other tuning algorithms.
Let's get started!
Install and Import Dependencies
First, open a new Python notebook in Jupyter or Google Colab. Run these commands in a code cell to install and import the dependencies.
%pip install -q torchinfo torchmetrics tensorboard
import torch
import torchvision
import os
from datetime import datetime
from torch import nn
from torchvision.transforms import Resize, Compose, ToTensor
import matplotlib.pyplot as plt
from torchinfo import summary
import torchmetrics
from tqdm.auto import tqdm
from torch.utils.tensorboard import SummaryWriter

# Use a GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
Load the Dataset and DataLoaders
BATCH_SIZE = 64

if not os.path.exists("data"):
    os.mkdir("data")

train_transform = Compose([Resize((64, 64)), ToTensor()])
test_transform = Compose([Resize((64, 64)), ToTensor()])

training_dataset = torchvision.datasets.FashionMNIST(
    root="data",
    download=True,
    train=True,
    transform=train_transform,
)
test_dataset = torchvision.datasets.FashionMNIST(
    root="data",
    download=True,
    train=False,
    transform=test_transform,
)

train_dataloader = torch.utils.data.DataLoader(
    training_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
)
test_dataloader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
)
- Here, we set the batch size to 64. In general, you want to choose the largest batch size your GPU can handle without raising a `CUDA out of memory` error.
- We define transforms that resize the images to 64×64 and convert them to tensors.
- We instantiate the training and test datasets from the FashionMNIST dataset built into torchvision. We set the `root` folder to `data`, pass `download=True` because we want to download the dataset, and set `train=True` for the training data and `train=False` for the test data.
- Finally, we define the training and test dataloaders.
We can check how many images are in the training and test datasets with these commands.
print(f"Number of Images in test dataset is {len(test_dataset)}")
print(f"Number of Images in training dataset is {len(training_dataset)}")
Output:

Number of Images in test dataset is 10000
Number of Images in training dataset is 60000
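As a quick sanity check, you can also visualize a sample with matplotlib, which we imported earlier. This is a minimal sketch; the index `0` is an arbitrary choice.

# Grab one (image, label) pair and display it.
image, label = training_dataset[0]
plt.imshow(image.squeeze(), cmap="gray")  # squeeze the channel dim: (1, 64, 64) -> (64, 64)
plt.title(training_dataset.classes[label])
plt.axis("off")
plt.show()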
Create the TinyVGG Model
I am using this custom model to demonstrate experiment tracking, but you can use any model of your choice.
class TinyVGG(nn.Module):
    """
    A small VGG-like network for image classification.

    Args:
        in_channels (int): The number of input channels.
        n_classes (int): The number of output classes.
        hidden_units (int): The number of hidden units in each convolutional block.
        n_conv_blocks (int): The number of convolutional blocks.
        dropout (float): The dropout rate.
    """

    def __init__(self, in_channels, n_classes, hidden_units, n_conv_blocks, dropout):
        super().__init__()
        self.in_channels = in_channels
        self.out_features = n_classes
        self.dropout = dropout
        self.hidden_units = hidden_units

        # Input block
        self.input_block = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=hidden_units, kernel_size=3, padding=0, stride=1),
            nn.Dropout(dropout),
            nn.ReLU(),
        )

        # Convolutional blocks
        self.conv_blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, padding=0, stride=1),
                nn.Dropout(dropout),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
            ) for _ in range(n_conv_blocks)
        ])

        # Classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(out_features=256),
            nn.Dropout(dropout),
            nn.Linear(in_features=256, out_features=64),
            nn.Linear(in_features=64, out_features=n_classes),
        )

    def forward(self, x):
        """
        Forward pass of the network.

        Args:
            x (torch.Tensor): The input tensor.

        Returns:
            torch.Tensor: The output tensor.
        """
        x = self.input_block(x)
        for conv_block in self.conv_blocks:
            x = conv_block(x)
        x = self.classifier(x)
        return x
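Since `torchinfo` is already imported, you can sanity-check the architecture and parameter counts before training. A minimal sketch, assuming a single-channel 64×64 input; the hyperparameter values here are just placeholders:

# Instantiate a throwaway model and print a layer-by-layer summary.
# torchinfo runs a forward pass, which also materializes the LazyLinear layer.
model = TinyVGG(in_channels=1, n_classes=10, hidden_units=128, n_conv_blocks=2, dropout=0.25)
summary(model, input_size=(1, 1, 64, 64))  # (batch, channels, height, width)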
Define the Training and Test Functions
def train_step(dataloader, model, optimizer, criterion, device, train_acc_metric):
    """
    Perform a single training epoch.

    Args:
        dataloader (torch.utils.data.DataLoader): The dataloader for the training data.
        model (torch.nn.Module): The model to train.
        optimizer (torch.optim.Optimizer): The optimizer for the model.
        criterion (torch.nn.Module): The loss function for the model.
        device (torch.device): The device to train the model on.
        train_acc_metric (torchmetrics.Accuracy): The accuracy metric for the model.

    Returns:
        The accuracy of the model on the training data.
    """
    model.train()
    for X, y in tqdm(dataloader):
        # Move the data to the device.
        X = X.to(device)
        y = y.to(device)
        # Forward pass.
        y_preds = model(X)
        # Calculate the loss.
        loss = criterion(y_preds, y)
        # Accumulate the accuracy.
        train_acc_metric.update(y_preds, y)
        # Backpropagate the loss.
        loss.backward()
        # Update the parameters.
        optimizer.step()
        # Zero the gradients.
        optimizer.zero_grad()
    return train_acc_metric.compute()
def test_step(dataloader, model, device, test_acc_metric):
    """
    Perform a single test epoch.

    Args:
        dataloader (torch.utils.data.DataLoader): The dataloader for the test data.
        model (torch.nn.Module): The model to test.
        device (torch.device): The device to test the model on.
        test_acc_metric (torchmetrics.Accuracy): The accuracy metric for the model.

    Returns:
        The accuracy of the model on the test data.
    """
    model.eval()
    with torch.inference_mode():  # disable gradient tracking for evaluation
        for X, y in tqdm(dataloader):
            # Move the data to the device.
            X = X.to(device)
            y = y.to(device)
            # Forward pass.
            y_preds = model(X)
            # Accumulate the accuracy.
            test_acc_metric.update(y_preds, y)
    return test_acc_metric.compute()
TensorBoard Summary Writer
def create_writer(
    experiment_name: str, model_name: str, conv_layers, dropout, hidden_units
) -> SummaryWriter:
    """
    Create a SummaryWriter object for logging the training and test results.

    Args:
        experiment_name (str): The name of the experiment.
        model_name (str): The name of the model.
        conv_layers (int): The number of convolutional layers in the model.
        dropout (float): The dropout rate used in the model.
        hidden_units (int): The number of hidden units in the model.

    Returns:
        SummaryWriter: The SummaryWriter object.
    """
    timestamp = str(datetime.now().strftime("%d-%m-%Y_%H-%M-%S"))
    log_dir = os.path.join(
        "runs",
        timestamp,
        experiment_name,
        model_name,
        f"{conv_layers}",
        f"{dropout}",
        f"{hidden_units}",
    ).replace("\\", "/")
    return SummaryWriter(log_dir=log_dir)
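For example, a call like the following (with placeholder values) writes logs under a nested, timestamped directory such as runs/01-01-2024_12-00-00/demo/tiny_vgg/2/0.25/128, so every run shows up as its own line in TensorBoard:

# Hypothetical usage example; the argument values are arbitrary.
writer = create_writer(
    experiment_name="demo",
    model_name="tiny_vgg",
    conv_layers=2,
    dropout=0.25,
    hidden_units=128,
)
writer.close()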
Hyperparameter Tuning
Here you can see several hyperparameters: the learning rate, number of epochs, optimizer type, number of convolutional layers, dropout rate, and number of hidden units. We can first fix the learning rate and number of epochs and search for the best number of convolutional layers, dropout rate, and number of hidden units. Once we have those, we can tune the number of epochs and the learning rate.
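Note that grid search is combinatorial: with three values for each of the three hyperparameters below, we get 3 × 3 × 3 = 27 training runs. If you prefer a flat loop over nested ones, `itertools.product` yields the same grid; here is a minimal sketch, equivalent to the nested loops in the code that follows:

from itertools import product

hparams_config = {
    "n_conv_layers": [1, 2, 3],
    "dropout": [0.0, 0.25, 0.5],
    "hidden_units": [128, 256, 512],
}

# Every combination of the three hyperparameter lists.
grid = list(product(*hparams_config.values()))
print(f"Number of experiments: {len(grid)}")  # 27
for n_conv_layers, dropout, hidden_units in grid:
    ...  # train and evaluate one configuration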
# Fixed hyperparameters
EPOCHS = 10
LEARNING_RATE = 0.0007
"""
This code performs hyperparameter tuning for a TinyVGG model.
The hyperparameters that are tuned are the number of convolutional layers, the dropout rate, and the number of hidden units.
The results of the hyperparameter tuning are logged to a TensorBoard file.
"""
experiment_number = 0
# hyperparameters to tune
hparams_config = {
"n_conv_layers": [1, 2, 3],
"dropout": [0.0, 0.25, 0.5],
"hidden_units": [128, 256, 512],
}
for n_conv_layers in hparams_config["n_conv_layers"]:
    for dropout in hparams_config["dropout"]:
        for hidden_units in hparams_config["hidden_units"]:
            experiment_number += 1
            print(
                f"\nTuning Hyper Parameters || Conv Layers: {n_conv_layers} || Dropout: {dropout} || Hidden Units: {hidden_units} \n"
            )
            # create the model
            model = TinyVGG(
                in_channels=1,
                n_classes=len(training_dataset.classes),
                hidden_units=hidden_units,
                n_conv_blocks=n_conv_layers,
                dropout=dropout,
            ).to(device)
            # create the optimizer and loss function
            optimizer = torch.optim.Adam(params=model.parameters(), lr=LEARNING_RATE)
            criterion = torch.nn.CrossEntropyLoss()
            # create the accuracy metrics
            train_acc_metric = torchmetrics.Accuracy(
                task="multiclass", num_classes=len(training_dataset.classes)
            ).to(device)
            test_acc_metric = torchmetrics.Accuracy(
                task="multiclass", num_classes=len(training_dataset.classes)
            ).to(device)
            # create the TensorBoard writer
            writer = create_writer(
                experiment_name=f"{experiment_number}",
                model_name="tiny_vgg",
                conv_layers=n_conv_layers,
                dropout=dropout,
                hidden_units=hidden_units,
            )
            # train the model
            for epoch in range(EPOCHS):
                # reset the metrics so each epoch is logged independently
                train_acc_metric.reset()
                test_acc_metric.reset()
                train_acc = train_step(
                    train_dataloader,
                    model,
                    optimizer,
                    criterion,
                    device,
                    train_acc_metric,
                )
                test_acc = test_step(test_dataloader, model, device, test_acc_metric)
                writer.add_scalar(
                    tag="Training Accuracy",
                    scalar_value=train_acc,
                    global_step=epoch,
                )
                writer.add_scalar(
                    tag="Test Accuracy",
                    scalar_value=test_acc,
                    global_step=epoch,
                )
            # add the hyperparameters and final-epoch metrics to TensorBoard
            writer.add_hparams(
                {
                    "conv_layers": n_conv_layers,
                    "dropout": dropout,
                    "hidden_units": hidden_units,
                },
                {
                    "train_acc": train_acc,
                    "test_acc": test_acc,
                },
            )
            writer.close()
This will take a while to run, depending on your hardware.
Check the Results in TensorBoard
If you are using Google Colab or a Jupyter notebook, you can open the TensorBoard dashboard with these commands.
%load_ext tensorboard
%tensorboard --logdir=runs
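If you are working from a terminal instead, you can launch TensorBoard directly and open http://localhost:6006 in your browser:

tensorboard --logdir runs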
From this dashboard, you can now find the best hyperparameters.
That's it. That's how you tune hyperparameters with TensorBoard. Here we used grid search for simplicity, but you can take a similar approach with other tuning algorithms and use TensorBoard to watch them run in real time.
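As a taste of what that might look like, here is a minimal random-search sketch (hypothetical, not part of the tutorial above) that reuses the `hparams_config` dictionary defined earlier: instead of enumerating the full grid, you sample a fixed number of configurations.

import random

N_TRIALS = 5  # arbitrary trial budget for this sketch

for trial in range(N_TRIALS):
    # Sample one value for each hyperparameter.
    n_conv_layers = random.choice(hparams_config["n_conv_layers"])
    dropout = random.choice(hparams_config["dropout"])
    hidden_units = random.choice(hparams_config["hidden_units"])
    # ...then build the model, train, and log to TensorBoard exactly as in the grid-search loop.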