Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

Update: 2024-12-31

Description

🤗 Upvotes: 6 | cs.CL

Authors: Hua Farn, Hsuan Su, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee

Title: Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

arXiv: http://arxiv.org/abs/2412.19512v1

Abstract: Fine-tuning large language models (LLMs) for downstream tasks is a widely adopted approach, but it often leads to safety degradation in safety-aligned LLMs. Currently, many solutions address this issue by incorporating additional safety data, which can be impractical in many cases. In this paper, we address the question: How can we improve downstream task performance while preserving safety in LLMs without relying on additional safety data? We propose a simple and effective method that maintains the inherent safety of LLMs while enhancing their downstream task performance: merging the weights of pre- and post-fine-tuned safety-aligned models. Experimental results across various downstream tasks, models, and merging methods demonstrate that this approach effectively mitigates safety degradation while improving downstream task performance, offering a practical solution for adapting safety-aligned LLMs.
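
The approach amounts to a parameter-wise combination of the same model's weights before and after fine-tuning. Below is a minimal sketch of the simplest such merge, linear interpolation, assuming PyTorch and Hugging Face Transformers; the model identifiers, the coefficient alpha, and the helper merge_state_dicts are illustrative assumptions, and the paper evaluates several merging methods rather than prescribing this one.

import torch
from transformers import AutoModelForCausalLM

def merge_state_dicts(base_sd, tuned_sd, alpha=0.5):
    """Parameter-wise linear interpolation of two state dicts.

    alpha = 0.0 keeps the safety-aligned base model;
    alpha = 1.0 keeps the fully fine-tuned model.
    """
    merged = {}
    with torch.no_grad():
        for name, base_w in base_sd.items():
            tuned_w = tuned_sd[name]
            if base_w.is_floating_point():
                merged[name] = (1.0 - alpha) * base_w + alpha * tuned_w
            else:
                # Non-float buffers (e.g., integer position ids) are copied as-is.
                merged[name] = base_w
    return merged

# Illustrative checkpoints: a safety-aligned base model and the same
# model after fine-tuning on a downstream task.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tuned = AutoModelForCausalLM.from_pretrained("./fine-tuned-checkpoint")

merged = merge_state_dicts(base.state_dict(), tuned.state_dict(), alpha=0.5)
base.load_state_dict(merged)
base.save_pretrained("./merged-model")

Intermediate values of alpha trade downstream task performance against the safety behavior retained from the base model.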

Hosts: Jingwen Liang, Gengyu Wang