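warmup_steps is an important parameter for SFTTrainer. It sets the number of steps over which the learning rate is warmed up. During warmup the learning rate is raised gradually in the early phase of training, which reduces early instability and can help avoid problems such as exploding or vanishing gradients.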

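What warmup_steps does

  1. Gradually raises the learning rate: early in training, the learning rate climbs from a low value to the configured learning rate. This makes the start of training more stable.
  2. Reduces early instability: ramping up the learning rate avoids the exploding or vanishing gradients that too high a learning rate can cause at the start of training.
  3. Speeds up convergence: a suitable warmup period can help the model settle onto a good optimization path sooner, which improves training efficiency.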
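
The ramp described above can be sketched in a few lines. This is a minimal illustration that assumes a linear ramp starting at zero; the function name warmup_lr and the values used are hypothetical and are not taken from any library.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup: ramp from 0 to base_lr over warmup_steps, then hold base_lr."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


# Illustrative values only: target learning rate 1e-4, 100 warmup steps.
for step in (0, 25, 50, 100, 200):
    print(step, warmup_lr(step, base_lr=1e-4, warmup_steps=100))
```

The learning rate starts at zero, reaches half the target halfway through warmup, and stays at the target afterwards. In a real run a scheduler would usually decay it again after the warmup phase.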

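How to configure warmup_steps

When configuring warmup_steps, consider the following factors:

  1. Total training steps: warmup_steps is usually set as a fraction of the total number of training steps. For example, if training runs for 10000 steps, warmup_steps can be set to 1000, i.e. 10% of the total [3].
  2. Task and dataset: different tasks and datasets may need different amounts of warmup. Experiment to find the warmup_steps value that works best.
  3. Learning rate schedule: if a learning rate scheduler is used (for example a linear scheduler), warmup_steps must be chosen to match it.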
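
The "fraction of total steps" rule from the first point can be written as a small helper. This is a sketch: the name warmup_steps_from_ratio is hypothetical, and rounding up is an assumption made here. TrainingArguments also has a warmup_ratio argument that expresses the same idea without a manual calculation.

```python
import math


def warmup_steps_from_ratio(total_steps: int, ratio: float) -> int:
    """Convert a warmup fraction of the total training steps into a step count."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Round up so that any non-zero ratio yields at least one warmup step.
    return math.ceil(total_steps * ratio)


# The example from the text: 10% of 10000 total steps.
print(warmup_steps_from_ratio(10_000, 0.10))
```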

Example code

The following example sets warmup_steps through TrainingArguments, the arguments object that is passed to the trainer:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='checkpoints_hf_sft',
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    fp16=True,
    torch_compile=True,
    evaluation_strategy='steps',  # newer transformers releases name this eval_strategy
    prediction_loss_only=True,
    eval_accumulation_steps=1,
    learning_rate=0.00006,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.95,
    warmup_steps=1000,  # number of learning rate warmup steps
    eval_steps=4000,
    save_steps=4000,
    save_total_limit=4,
    dataloader_num_workers=4,
    max_steps=12000,
    optim='adamw_torch_fused'
)
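
To see what these settings imply, the sketch below computes the learning rate at a few steps for learning_rate=6e-5, warmup_steps=1000 and max_steps=12000. It assumes the default linear scheduler (linear warmup followed by linear decay to zero), which the configuration above does not override. The function linear_schedule_lr is a hypothetical stand-alone re-implementation for illustration, not a library call.

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 6e-5,
                       warmup_steps: int = 1000,
                       max_steps: int = 12000) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        # Warmup phase: scale the learning rate up from 0.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale the learning rate down to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


for step in (0, 500, 1000, 6500, 12000):
    print(f"step {step:>5}: lr = {linear_schedule_lr(step):.2e}")
```

With these values the learning rate peaks at step 1000 and is back to half the peak at step 6500, midway through the decay phase.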

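Summary

warmup_steps controls the number of learning rate warmup steps. Raising the learning rate gradually improves the stability and efficiency of training. When configuring warmup_steps, take into account the total number of training steps, the task and the dataset, and determine the best value by experiment [1][2][5].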


[1] https://wqw547243068.github.io/dist
[2] https://datascience.stackexchange.com/questions/55991/in-the-context-of-deep-learning-what-is-training-warmup-steps
[3] https://github.com/huggingface/transformers/issues/6673
[4] https://blog.csdn.net/gzroy/article/details/132521808
[5] https://juejin.cn/post/7372007780570693669
[6] https://huggingface.co/docs/trl/en/sft_trainer
[7] https://discuss.pytorch.org/t/how-to-specify-training-arguments-for-huggingface-transformer-using-skorch/165740
[8] https://medium.com/@wuxiongwei/%E5%AE%9E%E6%88%98-llama-3-516f92c262be
[9] https://stackoverflow.com/questions/77792137/how-to-fix-the-learning-rate-for-huggingface%C2%B4s-trainer
[10] https://juejin.cn/post/7362119848661041215
