
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

USVSN Sai Prashanth🍀, Alvin Deng🍀, Kyle O'Brien🍀, Jyothir S V🍀,

Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne,

Stella Biderman, Tracy Ke🌿, Katherine Lee🌿, Naomi Saphra🌿

Our Taxonomy

Analysis Across Scale

Taxonomy Validation

Which Properties Lead to Memorization?

Analysis Across Training Time

Recitation: Highly-duplicated sequences

Reconstruction: Sequences with trivial continuations

Recollection: Memorized sequences that can't be explained by the other categories
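
The three categories above can be read as a simple decision rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: the duplicate cutoff `DUPLICATE_THRESHOLD`, the order in which the categories are checked, the helper names, and the two "trivial continuation" checks (a repeating pattern or an incrementing number series) are all illustrative choices that the poster text does not specify.

```python
from typing import Sequence

# Hypothetical cutoff: the poster only says "highly duplicated".
DUPLICATE_THRESHOLD = 5


def is_repeating(tokens: Sequence[str]) -> bool:
    """True if the sequence is a shorter pattern repeated, e.g. a b a b a b."""
    n = len(tokens)
    for period in range(1, n // 2 + 1):
        if all(tokens[i] == tokens[i % period] for i in range(n)):
            return True
    return False


def is_incrementing(tokens: Sequence[str]) -> bool:
    """True if all tokens are integers with a constant nonzero step."""
    try:
        values = [int(t) for t in tokens]
    except ValueError:
        return False
    if len(values) < 3:
        return False
    step = values[1] - values[0]
    return step != 0 and all(
        b - a == step for a, b in zip(values, values[1:])
    )


def classify(tokens: Sequence[str], duplicate_count: int) -> str:
    """Assign a memorized sequence to one taxonomy category.

    Assumed order: duplication first, then trivial continuation,
    with recollection as the remainder.
    """
    if duplicate_count > DUPLICATE_THRESHOLD:
        return "recitation"
    if is_repeating(tokens) or is_incrementing(tokens):
        return "reconstruction"
    return "recollection"
```

For example, `classify(["a", "b"] * 4, duplicate_count=1)` falls under reconstruction in this sketch, while a rarely seen sequence with no such pattern falls under recollection.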

Takeaway:

Larger models tend to memorize rarer text that can't be reconstructed.

Takeaway:

The increase in memorization isn't solely explained by duplication.

Is our taxonomy useful for predicting memorization?

Takeaway:

Using our taxonomy to classify memorization outperforms a baseline that treats memorization as a homogeneous phenomenon.