This website requires JavaScript.

Do Localization Methods Actually Localize Memorized Data in LLMs?

Ting-Yun ChangJesse ThomasonRobin Jia
Nov 2023
0被引用
0笔记
摘要原文
Large language models (LLMs) can memorize many pretrained sequences verbatim. This paper studies if we can locate a small set of neurons in LLMs responsible for memorizing a given sequence. While the concept of localization is often mentioned in prior work, methods for localization have never been systematically and directly evaluated; we address this with two benchmarking approaches. In our INJ Benchmark, we actively inject a piece of new information into a small subset of LLM weights and measure whether localization methods can identify these "ground truth" weights. In the DEL Benchmark, we study localization of pretrained data that LLMs have already memorized; while this setting lacks ground truth, we can still evaluate localization by measuring whether dropping out located neurons erases a memorized sequence from the model. We evaluate five localization methods on our two benchmarks, and both show similar rankings. All methods exhibit promising localization ability, especially for pruning-based methods, though the neurons they identify are not necessarily specific to a single memorized sequence.
展开全部
机器翻译
AI理解论文&经典十问
图表提取
参考文献
发布时间 · 被引用数 · 默认排序
被引用
发布时间 · 被引用数 · 默认排序
社区问答