
What are Latent Source Preferences?

In Agents We Trust, but Who Do Agents Trust?

Latent Source Preferences Steer LLM Generations

Do LLMs Recognize Different Identities?

Impact of Source Names in Decisions

Mohammad Aflah Khan, Mahsa Amani, Soumi Das, Bishwamittra Ghosh, Qinyuan Wu, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander

LLMs as User Facing Frontends

Do Agents Faithfully Execute User Preferences?

Old Status Quo

New Status Quo

LLM prefers these sources and hence will surface them more

LLM does not prefer these sources and hence will surface them less

Our Latent Source Preference Hypothesis states that LLMs have implicit preferences for source entities that predictably influence their choice of information about or from those sources.

Validating the Latent Source Preference Hypothesis

LLMs differ in the strength of their preferences across sources. Larger models show greater variance, reflecting stronger and more heterogeneous preferences across sources. In contrast, smaller models consistently exhibit lower deviations.
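One way to make "strength of preference" concrete is to look at the spread of per-source preference rates for a single model. The sketch below is illustrative only: the function name `preference_spread` and the example rates are assumptions, not values or code from this work, and the population standard deviation is just one plausible choice of spread measure.

```python
from statistics import pstdev

def preference_spread(rates):
    """Population standard deviation of per-source preference rates.

    `rates` maps a source name to the fraction of comparisons it won.
    A value near 0 means the model treats sources almost alike; a larger
    value means stronger, more uneven preferences.
    """
    return pstdev(rates.values())

# Hypothetical rates, invented for illustration (not measured results).
flat_model = {"Source A": 0.5, "Source B": 0.5, "Source C": 0.5}
opinionated_model = {"Source A": 0.9, "Source B": 0.5, "Source C": 0.1}

flat_spread = preference_spread(flat_model)
opinionated_spread = preference_spread(opinionated_model)
```

Under this reading, a model with uniform rates has zero spread, while a model that strongly favors some sources over others has a larger one.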

We find high correlation in rankings across multiple source representations. Exceptions arise when the surface form diverges from the source’s name, e.g. Associated Press Fact Check vs. @apfactcheck.
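Agreement between two rankings of the same sources can be summarized with a rank correlation. The sketch below uses Spearman's coefficient as an assumed choice; the correlation measure actually used is not stated here. The helper names and scores are hypothetical, and the closed-form formula is only valid when there are no tied scores.

```python
def ranks(scores):
    """Map each source to its rank (1 = most preferred). Assumes no ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}

def spearman(scores_a, scores_b):
    """Spearman rank correlation between two score dicts with the same keys."""
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    n = len(ranks_a)
    squared_diffs = sum((ranks_a[k] - ranks_b[k]) ** 2 for k in ranks_a)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))

# Hypothetical preference scores for the same four sources, identified
# once by name and once by a different surface form (e.g. a handle).
by_name = {"A": 0.8, "B": 0.6, "C": 0.3, "D": 0.1}
by_handle = {"A": 0.7, "B": 0.2, "C": 0.5, "D": 0.1}

rho = spearman(by_name, by_handle)
```

A coefficient close to 1 means the two representations yield nearly the same ordering of sources; a surface form the model fails to link to the source would pull the value down.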

Source information has a substantial effect on LLM choices, as shown by the difference between the Source Hidden and Source Shown rows.
In fact, if left/centrist news sources published stories with right-leaning perspectives, those stories would still get selected.
Prompting models to avoid bias does little to reduce their actual bias and at times increases their preference for left/centrist content.

Prompting with targeted instructions can shift model preferences to better reflect user needs.

Our findings indicate the potential for designing user-centric LLM agents to counter the effects of platform-centric algorithms like BuyBox.

Takeaways for Stakeholders

Organizations / Brands - Training data representation affects how often and how favorably brands are surfaced. Organizations must manage digital identity and guard against impersonation.

Users - LLMs may privilege certain sources rather than act neutrally. Users need controls to align outputs with their trust and values.

LLM Developers / Providers - Source preferences require transparency and auditing. Developers should enable mechanisms to diagnose and modulate these effects.

Policymakers / Regulators - Source biases can shape information exposure at scale. This raises concerns about competition, fairness, and accountability.

How do we measure preferences?

Ask the model to pick the article it prefers (both articles are semantically identical)
Count how many times each source was preferred

Try all possible combinations to account for position/order effects
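The measurement steps above can be sketched as a small loop over ordered pairs of sources. This is a minimal illustration under stated assumptions: `choose` is a hypothetical stand-in for the model call (it receives the source shown first and the source shown second, with the article text held identical, and returns the one it picks), and the source names are placeholders.

```python
from itertools import permutations

def preference_rates(sources, choose):
    """Fraction of comparisons each source wins, over all ordered pairs.

    Every pair is presented in both orders, so a chooser that simply
    favors one position contributes equally to both sources.
    """
    wins = {s: 0 for s in sources}
    appearances = {s: 0 for s in sources}
    for first, second in permutations(sources, 2):
        picked = choose(first, second)  # hypothetical model call
        wins[picked] += 1
        appearances[first] += 1
        appearances[second] += 1
    return {s: wins[s] / appearances[s] for s in sources}

# Toy choosers standing in for an LLM (assumptions for illustration).
def source_biased(first, second):
    strength = {"Source A": 3, "Source B": 2, "Source C": 1}
    return first if strength[first] >= strength[second] else second

def position_biased(first, second):
    return first  # always picks whichever article is shown first

sources = ["Source A", "Source B", "Source C"]
by_source = preference_rates(sources, source_biased)
by_position = preference_rates(sources, position_biased)
```

With the source-biased chooser the rates separate the sources, while the purely position-biased chooser gives every source the same rate, which is why both presentation orders are tried.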