1 of 32

Improving Diversity in Recommender System using Variational Autoencoders

ECIR 2023 – BIAS WORKSHOP, 02/04/2023

Sheetal Borar, Applied Scientist, Amazon (work was done prior to joining Amazon)

Prof. Mykola Pechenizkiy, Ms. Hilde Weerts, Mr. Binyam Gebre

2 of 32

Research Problem

“Improve user and item level diversity in Recommender Systems by making changes at the user representation stage, while maintaining an adequate level of relevance”

Improving Recommender System Diversity with Variational Autoencoders


3 of 32

Why Diversity in Recommender Systems (RS)?

Studies show that higher diversity in RSs is linked to higher user satisfaction [22]. Qualitative research reveals the following user issues –

“Why are the items in the recommendation list so similar to each other?”

“I bought this once, but why is the same thing recommended to me every time I visit?”

User Perspective

Recommending diverse items is linked to higher sales through RSs. As of 2008, Amazon made 36.9% of its revenue from books outside the top 100,000 titles [11].

Platform Perspective

Higher item diversity gives smaller or less popular vendors on a platform more opportunities [3].

Interaction data is concentrated among a small percentage of popular items, leading to a higher recommendation rate for these products.

Users select the recommended product, creating a feedback loop.

Hence, very few of the vendors get exposure to the users.

Vendor Perspective

4 of 32

Post-processing techniques

Pros: Independent of the algorithm used to generate the recommendation list.

Cons: If the predicted items were not very diverse to begin with, the final list will not be very diverse either.


Existing Techniques for improving diversity in RSs

Algorithmic techniques

Pros: Diversification is a part of the recommendation generation algorithm.

Cons: Have a specific architecture or model and cannot be generalized across other methods.


The existing techniques for improving diversity in RSs can be divided into post-processing and algorithmic techniques. The advantage of post-processing techniques is that they are independent of the algorithm used to generate the recommendation list and hence can be applied after any algorithm.

The drawback is that if the predictions made by the algorithm were not very diverse to begin with, the final list obtained after post-processing will also lack diversity.

The advantage of algorithmic techniques is that diversification is part of the recommendation generation. The disadvantage is that they rely on a specific architecture or model and cannot be generalized across different methods the way post-processing can.

We focused on algorithmic techniques in our research, and specifically on the user profile generation stage in this project, because that is the first stage where information about the user is stored. If not enough information is captured at that stage, results will be poor despite changes made at later stages.

5 of 32

VAE-GUP: VAE-based Generation of User Profiles


6 of 32

Intuition: multiple user profiles can better capture diverse user interests [17]

Multiple user profiles: capture both the popular and the niche interests of the user

Single user profile: only captures the popular interests of the user

7 of 32

Why VAEs for improving diversity in RS?

VAE architecture (image created by author)


Reconstruction term

KL Divergence term

Variational lower bound (VLB) loss function optimized in VAEs [21]
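The two terms named on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the training code used in this work: it assumes a diagonal Gaussian encoder with a standard normal prior, a multinomial likelihood over items for the reconstruction term (as in [26]), and a hypothetical weight `beta` on the KL term.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL divergence term: KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def multinomial_log_likelihood(x, logits):
    """Reconstruction term: log-likelihood of interaction vector x under softmax(logits).
    The multinomial likelihood is an assumption made for this sketch."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerically stable log-softmax
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return np.sum(x * log_softmax, axis=-1)

def negative_vlb(x, logits, mu, log_var, beta=1.0):
    """Loss to minimize: -(reconstruction term) + beta * (KL divergence term)."""
    return -multinomial_log_likelihood(x, logits) + beta * gaussian_kl(mu, log_var)
```

With `beta=1.0` this is the standard variational lower bound; other values of `beta` change how strongly the learned user distribution is pulled towards the prior.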

8 of 32

VAE-GUP: VAE-based Generation of User Profiles

N user profiles are sampled from the distribution

User profile 1

User profile 2

X candidates are selected based on each user profile

X items are selected from the candidate set based on diversity and the list is ranked by relevance


User profile distribution
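The steps on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation behind the reported results: `decode` is a hypothetical function mapping a sampled profile to item scores, relevance is taken as the mean score across profiles, and the diversity step is a greedy selection on cosine distance between (non-zero) item embeddings.

```python
import numpy as np

def sample_profiles(mu, log_var, n_profiles, rng):
    """Sample N user profiles from the learned Gaussian user distribution."""
    eps = rng.standard_normal((n_profiles, mu.shape[0]))
    return mu + np.exp(0.5 * log_var) * eps

def cosine_distance_matrix(emb):
    """Pairwise cosine distances between rows of emb (rows must be non-zero)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def vae_gup_recommend(mu, log_var, decode, item_emb, n_profiles, x, rng):
    profiles = sample_profiles(mu, log_var, n_profiles, rng)
    scores = np.stack([decode(z) for z in profiles])   # (n_profiles, n_items)

    # X candidates per profile, merged into one candidate set.
    top = np.argsort(-scores, axis=1)[:, :x]
    candidates = np.unique(top)

    # Assumption: relevance of an item is its mean score over all profiles.
    relevance = scores.mean(axis=0)

    # Greedy diversity selection (assumption): start from the most relevant
    # candidate, then repeatedly add the candidate with the largest total
    # distance to the items already selected.
    dist = cosine_distance_matrix(item_emb[candidates])
    selected = [int(np.argmax(relevance[candidates]))]
    while len(selected) < min(x, len(candidates)):
        gain = dist[:, selected].sum(axis=1)
        gain[selected] = -np.inf
        selected.append(int(np.argmax(gain)))

    # Final list of X items, ranked by relevance.
    chosen = candidates[selected]
    return chosen[np.argsort(-relevance[chosen])]
```

Because the profiles are sampled, calling the function twice for the same user with a fresh random state will generally produce different lists, which is the source of diversity over time.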

9 of 32

Why would VAE-GUP improve diversity for multiple stakeholders?

VAE-GUP can improve diversity for –

Users

In a single session, because we use multiple profiles which would better reflect the user’s varied interests and select items from these candidate lists based on diversity.
Over multiple sessions (over time), because randomness in user profile generation ensures that recommendations differ over time.

Vendors

VAE-GUP should produce a relevant yet diverse list for each user. This would capture more niche products leading to more items being recommended from the long tail.

10 of 32

Experiment


11 of 32

Research Questions

Can representing users as multiple vectors sampled from a distribution rather than a single vector

Improve diversity from the user’s perspective within a single session
Improve diversity from the user’s perspective over time
Increase diversity from the vendor’s perspective

while maintaining an acceptable level of accuracy?


12 of 32

Data

Content RS: MovieLens dataset – 20M records of movies rated by users, covering 138,493 users and 26,164 items

eCommerce RS: Bol.com dataset – one year of purchase data for users who were active on a given day, covering 11,547 deepest-level categories and 55 thousand users

13 of 32

Evaluation metrics

Since we want to build an RS that can produce relevant yet diverse recommendations, we have selected the following metrics –

Relevance: NDCG
Diversity

Intra-list diversity - Is a single recommendation list of a user diverse?
Temporal inter-list diversity (new metric) - Are the items diverse over time?
Aggregate diversity - How many of the total items from the item catalog are getting user exposure?
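The relevance metric and two of the diversity metrics can be sketched as below. These are common textbook forms and may differ in detail from the exact variants used in the experiments: NDCG is computed with binary relevance, intra-list diversity is taken as the mean pairwise cosine distance between item embeddings, and aggregate diversity as the number of distinct items recommended across all users.

```python
import numpy as np

def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k for one user."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items[:k]]
    dcg = sum(g / np.log2(pos + 2) for pos, g in enumerate(gains))
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def intra_list_diversity(item_emb):
    """Mean pairwise cosine distance within one list (needs at least two non-zero rows)."""
    unit = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    dist = 1.0 - unit @ unit.T
    upper = np.triu_indices(len(item_emb), k=1)  # each unordered pair once
    return dist[upper].mean()

def aggregate_diversity(all_lists):
    """Number of distinct catalog items that appear in any user's list."""
    return len(set().union(*all_lists))
```

For example, `aggregate_diversity([[1, 2], [2, 3]])` counts three distinct items, even though four recommendations were made in total.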


14 of 32

Temporal Inter-list Diversity

Motivation: The existing temporal diversity metric does not tell us whether the items recommended over time differ in terms of their representations or are just near duplicates.

Definition: Total pairwise distance between the items of two recommendation lists of the same size generated for the same user in separate sessions/timestamps (L1: recommendation list at time t=0, L2: recommendation list at time t=1). dist can be any distance measure, such as cosine distance.

Temporal Inter-list Diversity Formula (image by author)
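Read literally, the definition above amounts to summing dist(i, j) over every item i in L1 and every item j in L2. A minimal sketch, assuming cosine distance on item embeddings; the `normalize` flag, which divides by the number of pairs, is an assumption added here for comparing lists of different sizes and is not taken from the slide.

```python
import numpy as np

def temporal_inter_list_diversity(emb_t0, emb_t1, normalize=False):
    """Total pairwise cosine distance between the items of two recommendation lists.

    emb_t0, emb_t1: arrays of shape (list_size, dim) holding the (non-zero)
    embeddings of the items recommended at time t=0 and t=1.
    """
    u0 = emb_t0 / np.linalg.norm(emb_t0, axis=1, keepdims=True)
    u1 = emb_t1 / np.linalg.norm(emb_t1, axis=1, keepdims=True)
    dist = 1.0 - u0 @ u1.T          # dist[i, j] = cosine distance between item i of L1 and item j of L2
    total = dist.sum()
    return total / dist.size if normalize else total
```

Two lists containing near-duplicate items score close to zero even when the item identifiers differ, which is the behaviour the motivation asks for.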


15 of 32

Baselines

Vanilla AE: a user is represented as a single vector. The model has the same architecture as VAE-GUP except for the latent layer.

𝛽-VAE: we learn a distribution to represent a user but only use the mean of the distribution at inference time. VAE-GUP differs in that it samples multiple profiles from the distribution at inference time and combines the results.

16 of 32

Results


17 of 32

MovieLens

[Result charts; data labels in extraction order: -25.1535%, -12.9337%, +31.5659%, +24.5767%, +47.5713%, +49.6482%, +48.6059%, +7.9676%]

Direction of change for VAE-GUP relative to each baseline:

Metric	vs. Vanilla AE	vs. 𝛽-VAE
NDCG	-	-
ILD	+	+
TILD	+	+
AD	+	+

18 of 32

MovieLens: Example User

Single user profile

Combined result

Multiple user profiles

Movies rated by the user


19 of 32

Bol.com

[Result charts; data labels in extraction order: +18.9852%, +36.1631%, -5.0021%, -14.7692%, +9.4078%, +9.8142%, +33.0914%, -2.5160%]

Y axis has been removed for company confidentiality

Direction of change for VAE-GUP relative to each baseline:

Metric	vs. Vanilla AE	vs. 𝛽-VAE
NDCG	-	-
ILD	+	+
TILD	+	+
AD	+	-

20 of 32

Bol.com: Example User

Single user profile

Combined result

Multiple user profiles

Categories purchased by the user


21 of 32

Conclusion


We have empirically shown that representing users via a distribution rather than a point estimate can improve both item-level and user-level diversity, which has the potential to benefit users, platforms, and vendors.

Our method results in 2300 more categories recommended to Bol.com users over the existing (Vanilla-AE) method and 1500 more movies recommended to MovieLens users.

The decrease in aggregate diversity over 𝛽-VAE for the Bol.com dataset could be because of the emphasis on individual diversity in the final post-processing step. It might be useful to consider both customer and vendor perspectives for post-processing.

The decrease in relevance seems to be dependent on the dataset properties like sparsity and could be used to motivate a need for a smooth relevance metric.

22 of 32

Future work

A soft NDCG metric can be developed to evaluate whether a more diverse list can capture relevant items that might be similar to user preferences rather than an exact match.

We have only sampled user vectors from a Gaussian distribution based on our method. Other types of distributions such as discrete distributions (from VQ-VAE) can be used to generate very distinct user profiles.

Evaluation metrics ILD and TILD depend on pre-trained text models. These models could have biases [1]. It might be interesting to study how the results differ with different item embeddings.

The diversity measures we chose do not explicitly explore at which level diversity improves. It might be interesting to see if diversity improves more at item level or category level.

23 of 32

Thank you!

Does anyone have any questions?
Sheetal Borar: sborar12@gmail.com

Credits: Images by FreePik


24 of 32

References


[1] Text embedding models contain bias. Here’s why that matters.
[2] Gediminas Adomavicius and YoungOk Kwon. Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering, 24(5):896–911, 2012.
[3] Chris Anderson. The long tail, Oct 2004.
[4] Aqeel Anwar. Difference between autoencoder (AE) and variational autoencoder (VAE), Nov 2021.
[5] Andrea Asperti and Matteo Trentin. Balancing reconstruction error and Kullback-Leibler divergence in variational autoencoders. IEEE Access, 8:199440–199448, 2020.
[6] Andrea Asperti and Matteo Trentin. Balancing reconstruction error and Kullback-Leibler divergence in variational autoencoders. IEEE Access, 8:199440–199448, 2020.
[7] Tevfik Aytekin and Mahmut Özge Karakaya. Clustering-based diversity improvement in top-n recommendation. Journal of Intelligent Information Systems, 42(1):1–18, 2014.
[8] Daniel Billsus and Michael J. Pazzani. Learning collaborative information filters. In Proceedings of the Fifteenth International Conference on Machine Learning, ICML ’98, page 46–54, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.
[9] Keith Bradley and Barry Smyth. Improving recommendation diversity, 2001.
[10] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering, 2013.
[11] Erik Brynjolfsson and Michael Smith. Consumer surplus in the digital economy: Estimating the value of increased product variety. Management Science, 49, 11 2003.
[12] Jaime Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’98, page 335–336, New York, NY, USA, 1998. Association for Computing Machinery.
[13] Allison J. B. Chaney, Brandon M. Stewart, and Barbara E. Engelhardt. How algorithmic confounding in recommendation systems increases homogeneity and decreases utility. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, Sep 2018.
[14] Maurizio Ferrari Dacrema, Paolo Cremonesi, and Dietmar Jannach. Are we really making much progress? A worrying analysis of recent neural recommendation approaches. In Proceedings of the 13th ACM Conference on Recommender Systems. ACM, Sep 2019.
[15] Daniel M. Fleder and Kartik Hosanagar. Recommender systems and their impact on sales diversity. In Proceedings of the 8th ACM Conference on Electronic Commerce, EC ’07, page 192–199, New York, NY, USA, 2007. Association for Computing Machinery.
[16] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
[17] Wenshuo Guo, Karl Krauth, Michael I. Jordan, and Nikhil Garg. The stereotyping problem in collaboratively filtered recommender systems, 2021.
[18] F. Maxwell Harper and Joseph A. Konstan. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4), Dec 2015.
[19] Unnat Jain, Ziyu Zhang, and Alexander G. Schwing. Creativity: Generating diverse questions using variational autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[20] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes, 2013.
[21] Diederik P. Kingma and Max Welling. An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4):307–392, 2019.
[22] Bart Knijnenburg, Martijn Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction, 22:441–504, 10 2012.
[23] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
[24] Matevž Kunaver and Tomaž Požrl. Diversity in recommender systems–a survey. Knowledge-based systems, 123:154–162, 2017.
[25] Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain. Temporal diversity in recommender systems. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, page 210–217, New York, NY, USA, 2010. Association for Computing Machinery.
[26] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. Variational autoencoders for collaborative filtering, 2018.
[27] Jian-Guo Liu, Kerui Shi, and Qiang Guo. Solving the accuracy-diversity dilemma via directed random walks. Physical Review E, 85(1), Jan 2012.

25 of 32

References


[28] Masoud Mansoury, Himan Abdollahpouri, Mykola Pechenizkiy, Bamshad Mobasher, and Robin Burke. Feedback loop and bias amplification in recommender systems, 2020.
[29] C. Pichery. Sensitivity analysis. In Philip Wexler, editor, Encyclopedia of Toxicology (Third Edition), pages 236–237. Academic Press, Oxford, third edition, 2014.
[30] Pleuni. Bol.com: Product range grew 42 percent, Oct 2021.
[31] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11 2019.
[32] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows, 2015.
[33] Joseph Rocca. Understanding variational autoencoders (VAEs), Mar 2021.
[34] David E. Rumelhart and James L. McClelland. Learning Internal Representations by Error Propagation, pages 318–362. 1987.
[35] Upendra Shardanand and Pattie Maes. Social information filtering: Algorithms for automating “word of mouth”. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 210–217, 1995.
[36] Barry Smyth and Paul McClave. Similarity vs. diversity. In Proceedings of the 4th International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development, ICCBR ’01, page 347–361, Berlin, Heidelberg, 2001. Springer-Verlag.
[37] Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics, 2015.
[38] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
[39] Xiaoyuan Su and Taghi M Khoshgoftaar. A survey of collaborative filtering techniques. Advances in artificial intelligence, 2009, 2009.
[40] Jake Tae. A step up with variational autoencoders, Feb 2020.
[41] Moussa Taifi. MRR vs MAP vs NDCG: Rank-aware evaluation metrics and when to use them, Jun 2020.
[42] Xin Technology. Challenges in recommender systems : scalability, privacy, and structured recommendations. PhD thesis, 01 2015.
[43] Clive Thompson. If you liked this, you’re sure to love that, Nov 2008.
[44] Liwei Wang, Alexander Schwing, and Svetlana Lazebnik. Diverse and accurate image description using a variational auto-encoder with an additive gaussian encoding space. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[45] Jacek Wasilewski and Neil Hurley. Incorporating diversity in a learning to rank recommender system. In The twenty-ninth international FLAIRS conference, 2016.
[46] Lilian Weng. What are diffusion models?, Jul 2021.
[47] Hongzhi Yin, Bin Cui, Jing Li, Junjie Yao, and Chen Chen. Challenging the long tail recommendation. 2012.
[48] Mi Zhang and Neil Hurley. Avoiding monotony: Improving the diversity of recommendation lists. In Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys ’08, page 123–130, New York, NY, USA, 2008. Association for Computing Machinery.
[49] Yuchi Zhang, Yongliang Wang, Liping Zhang, Zhiqiang Zhang, and Kun Gai. Improve diverse text generation by self labeling conditional variational auto encoder. In ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2767–2771, 2019.
[50] Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, WWW ’05, page 22–32, New York, NY, USA, 2005. Association for Computing Machinery.
[51] Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, WWW ’05, page 22–32, New York, NY, USA, 2005. Association for Computing Machinery.

26 of 32

Appendix


27 of 32

Recommender Systems (RSs) help us understand what items a particular user would be interested in and produce a personalized list to enhance user experience.

Three stages of RSs [24]:

User Profile Generation
Candidate Generation

Filtering
Ranking

Feedback Collection


Recommendation Process

Stages of recommendation process (image created by author)


28 of 32

Matrix factorization


29 of 32

Reparametrization trick
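As a reminder of what the trick does: instead of sampling z directly from N(mu, sigma²), noise is sampled from N(0, I) and then shifted and scaled, so that z stays a deterministic, differentiable function of mu and sigma. A minimal NumPy sketch, assuming the encoder outputs a mean `mu` and a log-variance `log_var` (these names are illustrative):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)   # all randomness lives in eps
    sigma = np.exp(0.5 * log_var)         # log-variance -> standard deviation
    return mu + sigma * eps
```

In an automatic differentiation framework the same expression lets gradients flow to `mu` and `log_var`, because `eps` is treated as a constant input.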


30 of 32

Limitations of the method

Adds a time complexity of k² per user on top of the model complexity, for finding the most diverse items among the candidates

Assumes that all the information about the user is captured in the user's purchase history and hence only aims at making recommendations based on that history. This assumption might not always hold.

Samples from the distribution generated by the VAE could overlap.

Diversity ranking depends on item embeddings generated by pretrained text models. These could have biases [1].


31 of 32

Limitations of the experiment

Bol.com data sample includes a year of purchasing data for users who were active on a given day. Sampling users by time might result in more active users being selected.

Evaluation metrics ILD and TILD depend on pre-trained text models. These models could have biases [1].

The diversity measures we chose do not explicitly explore at which level diversity improves.

We have only sampled user vectors from a Gaussian distribution based on our method. Other types of distributions such as discrete distributions (from VQ-VAE) were not studied in this thesis.


32 of 32

Qualitative research reveals the following user issues –

“Why are the items in the recommendation list so similar to each other?”

“I bought this once, but why is the same thing recommended to me every time I visit?”


Issues users have with Recommender Systems

Recommendations at time t

Recommendations at time t + 1

Difference in recommendations over time (image created by author)
