1 of 13

Bayesian Modeling for the Social Sciences II:

HLMs, GLMs, and LLMs

Marc Ratkovic

Chair of Social Data Science

Professor of Political Science and Data Science

University of Mannheim

2 of 13

Agenda

  • Topics Covered
  • What is the Connection?
  • Specific Topics 1
  • Specific Topics 2
  • Integrated Topics
  • Software: Bayesian Component
  • Software: LLM Component 1
  • Software: LLM Component 2
  • Tasks

3 of 13

Topics Covered

Bayesian Statistics

Exploring fundamental principles, including prior and posterior distributions, Bayes' theorem, and probability models.

Bayesian Machine Learning & LLMs

Applying Bayesian methods to machine learning, including neural networks and large language models such as GPT-4.

4 of 13

What is the Connection?

Bayesianism and Complex Models

  • Bayesian methods are well suited to complex statistical models because they incorporate prior information flexibly.
  • We start with simpler models and progressively build up to more complex ones, such as hierarchical linear models (HLMs), the LASSO, and scaling models.
  • The course eventually covers neural networks and deep learning techniques, showcasing the broad applicability of Bayesian methods in modern machine learning.

5 of 13

Specific Topics 1

Probability

Fundamental to Bayesian inference, probability quantifies how plausible events and parameter values are, given prior knowledge and observed data.

Conjugate Priors

Used in Bayesian statistics to simplify calculations: with a conjugate prior, the posterior distribution belongs to the same family as the prior.
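As an illustration of conjugacy, here is a minimal sketch of the Beta-Binomial case: a Beta prior on a proportion combined with binomial data yields a Beta posterior, so the update reduces to adding counts. The class name `Beta`, the function `update`, and the data are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(a, b) distribution, used here as a prior/posterior for a proportion."""
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


def update(prior: Beta, successes: int, failures: int) -> Beta:
    """Conjugate update: Beta prior + binomial data -> Beta posterior."""
    return Beta(prior.a + successes, prior.b + failures)


# Hypothetical data: 7 successes and 3 failures under a flat Beta(1, 1) prior.
prior = Beta(1.0, 1.0)
posterior = update(prior, successes=7, failures=3)
print(posterior)       # Beta(a=8.0, b=4.0)
print(posterior.mean)  # 8 / 12, the posterior mean of the proportion
```

No sampling or numerical integration is needed here; that shortcut is what makes conjugate pairs convenient for first exercises.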

6 of 13

Specific Topics 2

Linear and Generalized Linear Models

Covering standard linear regression and extending to generalized linear models, which use link functions to handle non-normal outcome distributions.

Sparse Regression and IRT Models

Exploring Lasso and Horseshoe for sparse regression, and scaling and IRT models for handling large datasets.
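The role of the link function in a generalized linear model can be sketched as follows: the same linear predictor is mapped to the mean of the outcome through a different inverse link depending on the outcome type. The coefficients and the observation are hypothetical, and the helper names are assumptions made for this example.

```python
import math


def linear_predictor(x, beta):
    """eta = beta_0 + beta_1 * x_1 + ... ; x excludes the intercept term."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def inv_logit(eta):
    """Inverse of the logit link: maps eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_log(eta):
    """Inverse of the log link: maps eta to a positive expected count."""
    return math.exp(eta)


# Hypothetical coefficients and one observation with two predictors.
beta = [-1.0, 0.5, 2.0]
x = [2.0, 0.0]
eta = linear_predictor(x, beta)  # -1.0 + 0.5 * 2.0 + 2.0 * 0.0 = 0.0

p_binary = inv_logit(eta)  # mean of a Bernoulli outcome (logistic regression)
mu_count = inv_log(eta)    # mean of a Poisson outcome (Poisson regression)
print(eta, p_binary, mu_count)  # 0.0 0.5 1.0
```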

7 of 13

Integrated Topics

Coding in BRMS and STAN

Implementing Bayesian models using BRMS and STAN, focusing on syntax and functionality.

Model Validation

Checking sampler reliability and model fit with convergence checks, posterior predictive checks, and model selection.
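One common convergence check compares variation between chains with variation within chains. The sketch below computes the basic (non-split) Gelman-Rubin statistic on simulated draws; it is a teaching simplification, not the more refined variant that Stan-based tools report, and the simulated chains are hypothetical stand-ins for sampler output.

```python
import random
import statistics


def r_hat(chains):
    """Basic (non-split) Gelman-Rubin potential scale reduction factor."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


rng = random.Random(0)

# Four simulated chains that all target the same N(0, 1) distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four simulated chains stuck in two different regions (means 0 and 3).
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 3.0, 3.0)]

print(r_hat(mixed))  # close to 1: chains agree
print(r_hat(stuck))  # well above 1: chains disagree
```

Values near 1 suggest the chains are sampling the same distribution; values clearly above 1 are a signal to run longer or revisit the model.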

8 of 13

Software: Bayesian Component

Setup and Tools

Setting up R/STAN/BRMS locally and via Docker for Bayesian analysis.

Exercises and Models

Inference on a sample mean, the effect of priors, linear and generalized linear models, HLMs, and scaling models in IDEAL and BRMS.
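The first two exercises, inference on a sample mean and the effect of priors, can be previewed with a small closed-form sketch. It assumes a normal likelihood with known standard deviation and a normal prior on the mean; the data, the value of `sigma`, and the function name are hypothetical.

```python
import math
import statistics


def posterior_for_mean(data, sigma, prior_mean, prior_sd):
    """Normal prior on the mean, normal likelihood with known sd `sigma`."""
    n = len(data)
    prior_prec = 1.0 / prior_sd**2   # precision = 1 / variance
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * statistics.fmean(data)) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical sample with mean 5.0; sigma is assumed known.
data = [4.0, 6.0, 5.0, 7.0, 3.0]
for prior_sd in (10.0, 1.0, 0.1):
    mean, sd = posterior_for_mean(data, sigma=2.0, prior_mean=0.0, prior_sd=prior_sd)
    print(f"prior sd {prior_sd:>5}: posterior mean {mean:.3f}, posterior sd {sd:.3f}")
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, so a diffuse prior leaves the estimate near the sample mean while a tight prior pulls it toward the prior mean.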

9 of 13

Software: LLM Component 1

GitHub Registration

Students will register at GitHub to manage code versions and collaborate on projects.

Hugging Face and Docker Hub

Students will register at Hugging Face for NLP models and Docker Hub for containerized applications.

10 of 13

Software: LLM Component 2

Prompt Engineering

Design prompts to guide LLMs in generating relevant and accurate responses.

Talking with a PDF

Implement NLP techniques to interact with the content of PDF documents.
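A rough sketch of how the two ideas combine: retrieve the passage most relevant to a question, then place it in a prompt that instructs the model to stay within that context. Everything here is an assumption for illustration: the text chunks stand in for content already extracted from a PDF, word overlap stands in for a real retrieval method, and no LLM is actually called.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer or embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunk(question, chunks):
    """Return the chunk sharing the most word tokens with the question."""
    q = tokenize(question)
    return max(chunks, key=lambda c: len(q & tokenize(c)))


def build_prompt(question, context):
    """Assemble a grounded prompt; the wording is illustrative only."""
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Hypothetical chunks, standing in for text already extracted from a PDF.
chunks = [
    "Tutorial assignments require 80% or above for exam admission.",
    "Hamiltonian Monte Carlo uses gradients to propose distant moves.",
    "Docker images bundle R, Stan, and brms for reproducible setups.",
]
question = "Which sampler uses gradients?"
prompt = build_prompt(question, top_chunk(question, chunks))
print(prompt)
```

The resulting string would then be sent to whichever model the course uses; that step is omitted because it depends on the chosen service.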

11 of 13

Tasks

Course Overview

Build a Chatbot

Learn to design and implement chatbots using LangFlow, focusing on prompt engineering and interaction with text data to enable effective conversations.

Sampling and Scaling

Explore advanced sampling techniques such as the Gibbs sampler and Hamiltonian Monte Carlo (HMC), and understand scaling methods for item response theory models.

Advanced Regression and MRP

Work with more advanced regression techniques, such as hierarchical, sparse, and multilevel models, with an application to survey analysis.

Agents and Fine-Tuning

Delve into fine-tuning large language models and creating intelligent agents to improve model adaptability and enable task-specific performance enhancements.

Simple Regression and Diagnostics

Understand simple regression models and diagnostics.
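To make the Gibbs sampler named under Sampling and Scaling concrete, here is a minimal sketch for a textbook target: a standard bivariate normal with correlation `rho`, where each full conditional is itself normal. The target, the function name, and the settings are assumptions chosen for illustration, not a course assignment.

```python
import random
import statistics


def gibbs_bivariate_normal(rho, n_draws, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is normal: x | y ~ N(rho * y, 1 - rho**2),
    and symmetrically for y | x.
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho**2) ** 0.5
    x, y = 0.0, 0.0
    draws = []
    for i in range(burn_in + n_draws):
        x = rng.gauss(rho * y, cond_sd)  # draw x given the current y
        y = rng.gauss(rho * x, cond_sd)  # draw y given the new x
        if i >= burn_in:
            draws.append((x, y))
    return draws


draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000)
xs = [d[0] for d in draws]
ys = [d[1] for d in draws]

# Sample moments should approximate the target: mean 0, sd 1, correlation 0.8.
mx, my = statistics.fmean(xs), statistics.fmean(ys)
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)
corr = cov / (statistics.stdev(xs) * statistics.stdev(ys))
print(mx, statistics.stdev(xs), corr)
```

Gibbs sampling needs every full conditional in closed form; HMC relaxes that requirement by using gradients of the log posterior instead.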

12 of 13

Conclusion

Thank you!

13 of 13

Grades

  • Final exam
    • Open book/computer/notes
    • Math and Coding
  • Tutorial assignments (80% or above required for exam)
  • In-class Quizzes (10 mins, beginning of class, 80% or above required for exam)