1 of 7

Llama 3.2 1B has non-transitive preferences

Danielle Ensign

2 of 7

Prompt

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

Today Date: 30 Mar 2025

You are given two prompts, please answer only one of them.<|eot_id|><|start_header_id|>user<|end_header_id|>

Answer one prompt.

Prompt A: What is _____?

Prompt B: What is ____?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I will answer Prompt __
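A minimal sketch of how the template above could be filled in for one ordered pair of questions. The name `build_prompt` is hypothetical, and the exact newlines are a guess, since the slide only shows the template with blanks. The model's next token after "I will answer Prompt" (A or B) is then read as its choice.

```python
# System text copied from the slide; the newline placement is an assumption.
SYSTEM = (
    "Cutting Knowledge Date: December 2023\n"
    "Today Date: 30 Mar 2025\n\n"
    "You are given two prompts, please answer only one of them."
)


def build_prompt(first: str, second: str) -> str:
    """Fill the two-prompt template with `first` as Prompt A and `second` as Prompt B."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{SYSTEM}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        "Answer one prompt.\n\n"
        f"Prompt A: What is {first}?\n\n"
        f"Prompt B: What is {second}?"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        # Prefilled assistant turn; the model continues with " A" or " B".
        "I will answer Prompt"
    )


prompt = build_prompt("night", "a party")
```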

3 of 7

Prefers second answer

  • The second-listed prompt was almost always preferred, so we try both orders and take the average probability
  • Larger Llama models behaved very differently
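The order averaging described above might look like the following sketch. It assumes each run yields the next-token probabilities for "A" and "B"; the function name and the example numbers are hypothetical, not measured values.

```python
from typing import Tuple


def order_averaged_scores(
    run_x_first: Tuple[float, float],
    run_y_first: Tuple[float, float],
) -> Tuple[float, float]:
    """Average each item's choice probability over both presentation orders.

    run_x_first: (P("A"), P("B")) when x is Prompt A and y is Prompt B.
    run_y_first: (P("A"), P("B")) when y is Prompt A and x is Prompt B.
    """
    p_a1, p_b1 = run_x_first
    p_a2, p_b2 = run_y_first
    score_x = (p_a1 + p_b2) / 2.0  # x was "A" in run 1 and "B" in run 2
    score_y = (p_b1 + p_a2) / 2.0  # y was "B" in run 1 and "A" in run 2
    return score_x, score_y


# Hypothetical numbers with a strong bias toward the second slot in both runs.
score_x, score_y = order_averaged_scores((0.30, 0.60), (0.20, 0.70))
```

With these made-up inputs the second slot wins in both runs, yet x still comes out ahead after averaging. The two scores need not sum to 1, because some probability mass can fall on tokens other than A and B.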

4 of 7

5 of 7

Non-Transitive Cycle (and given “reasons”)

what is a problem? (0.493) > what is night? (0.434)

[problem is more specific, night is a more abstract concept(?)]

what is night? (0.4238) > what is a party? (0.3695)

[said night is easier to explain]

what is a party? (0.4818) > what is a problem? (0.4218)

[party is tangible, problem is too open ended]
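As a sanity check, the three comparisons above can be verified to form a cycle with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the slide, while the names `pairs` and `is_cycle` are hypothetical.

```python
# (winner, loser) -> (winner score, loser score), copied from the slide.
pairs = {
    ("problem", "night"): (0.493, 0.434),
    ("night", "party"): (0.4238, 0.3695),
    ("party", "problem"): (0.4818, 0.4218),
}


def is_cycle(order, pairs):
    """True if every item beats the next one, wrapping around to the first."""
    for i, winner in enumerate(order):
        loser = order[(i + 1) % len(order)]
        if (winner, loser) not in pairs:
            return False
        winner_score, loser_score = pairs[(winner, loser)]
        if winner_score <= loser_score:
            return False
    return True


cycle_found = is_cycle(["problem", "night", "party"], pairs)

# Margin of each comparison: how much the winner leads by.
margins = {pair: round(w - l, 4) for pair, (w, l) in pairs.items()}
```

The margins are all around 0.05 to 0.06, which gives a rough sense of how strong each individual preference is.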

6 of 7

Bonus non-transitive cycles

water? (0.4471326172351837) > company? (0.41563525795936584)

company? (0.41843506693840027) > school? (0.37025561928749084)

school? (0.4266071021556854) > water? (0.3947691321372986)

time? (0.466740220785141) > night? (0.42564019560813904)

night? (0.4238729774951935) > party? (0.369578093290329)

party? (0.40192529559135437) > world? (0.36375856399536133)

world? (0.4814252257347107) > time? (0.42332208156585693)
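Cycles like these can be found automatically by treating each "X > Y" as a directed edge and searching for closed paths. The sketch below is one simple way to do that; it is not necessarily how the cycles above were found. The edge list is copied from this slide, and `find_cycles` is a hypothetical helper.

```python
from collections import defaultdict

# (winner, loser) edges copied from the comparisons on this slide.
edges = [
    ("water", "company"), ("company", "school"), ("school", "water"),
    ("time", "night"), ("night", "party"), ("party", "world"), ("world", "time"),
]


def find_cycles(edges):
    """Enumerate simple cycles in the preference graph via depth-first search.

    Each cycle is reported once, starting from its alphabetically smallest item.
    """
    graph = defaultdict(list)
    for winner, loser in edges:
        graph[winner].append(loser)

    cycles = set()

    def dfs(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.add(tuple(path))
            elif nxt not in path and nxt > start:
                # Only walk through items larger than `start` to avoid duplicates.
                dfs(start, nxt, path + [nxt])

    for start in list(graph):
        dfs(start, start, [start])
    return sorted(cycles)


cycles = find_cycles(edges)
```

On this edge list the search returns one cycle of length 3 and one of length 4. Exhaustive enumeration is fine for a handful of words but grows quickly with larger vocabularies.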

7 of 7

Limitations

  • Prompt wording may affect the results
  • Probably just measuring mode collapse (the effect size is fairly small)
    • Maybe that is true of most model preference research, though(?)
  • These cycles don’t seem to generalize to larger models (Llama 3.1 8B, Llama 3.1 405B)
    • Those models behave strangely, though