Continual Learning via Sparse Memory Finetuning
Description
This research paper proposes a novel approach, sparse memory finetuning, to address catastrophic forgetting in large language models (LLMs) during continual learning. The method builds on memory layer models, whose lookups are inherently sparse, and selectively trains only the memory slots that are highly activated by the new data relative to existing knowledge, ranked with a TF-IDF-style score. The authors show that this technique acquires new knowledge on par with full finetuning and LoRA while causing substantially less degradation of previously acquired capabilities on held-out question-answering benchmarks. The results suggest that exploiting the sparsity of memory layers is a promising strategy for enabling LLMs to accumulate knowledge continually over time.
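
The following is a minimal sketch, not the paper's implementation, of how a TF-IDF-style score over memory slot access counts could be used to restrict updates to a small set of slots. The names (`tfidf_slot_scores`, `sparse_update_mask`, `memory_values`) and the exact scoring formula are illustrative assumptions.

```python
# Minimal sketch of TF-IDF-based memory slot selection for sparse updates.
# Assumptions: slot access counts on the new data and on a background corpus
# are available as integer tensors, and `memory_values` stands in for the
# trainable value table of a memory layer.
import torch


def tfidf_slot_scores(new_counts: torch.Tensor,
                      background_counts: torch.Tensor) -> torch.Tensor:
    """Score each slot by how often it is accessed on the new data (term
    frequency), down-weighted by how common it is in the background corpus
    (inverse-document-frequency-style normalization)."""
    tf = new_counts.float() / new_counts.sum().clamp(min=1)
    idf = torch.log((background_counts.sum() + 1.0) / (background_counts.float() + 1.0))
    return tf * idf


def sparse_update_mask(new_counts: torch.Tensor,
                       background_counts: torch.Tensor,
                       top_t: int) -> torch.Tensor:
    """Boolean mask selecting the top-t slots to train; all others stay frozen."""
    scores = tfidf_slot_scores(new_counts, background_counts)
    top_idx = torch.topk(scores, k=top_t).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[top_idx] = True
    return mask


# Example with made-up counts for a memory layer with 8 slots.
new_counts = torch.tensor([40, 0, 3, 25, 0, 1, 0, 12])
background_counts = torch.tensor([5, 900, 40, 10, 700, 300, 50, 8])
mask = sparse_update_mask(new_counts, background_counts, top_t=2)

# During finetuning, the mask can be applied to the memory table's gradient so
# that only the selected slots receive updates.
memory_values = torch.nn.Parameter(torch.randn(8, 16))
loss = memory_values.sum()                 # stand-in for the real training loss
loss.backward()
memory_values.grad *= mask.unsqueeze(-1)   # zero gradients for frozen slots
```

Because only the selected slots receive nonzero gradients, the optimizer leaves the rest of the memory table untouched, which is the mechanism this sketch uses to illustrate how sparse updates can limit interference with previously stored knowledge.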




