Post-camp publications/grants/positions of AISC team members

1	Please add/comment with papers/posts:
2	Edition	Team	Participants	Format	Title	Link	Comments

3	AISC1	personal post	Justin Shovelain, Michael Aird, others?	Blog post	Improving the future by influencing actors' benevolence, intelligence, and power	https://forum.effectivealtruism.org/posts/4oGYbvcy2SRHTWgWk/improving-the-future-by-influencing-actors-benevolence#fn-dCBkq5f8sD4CzHwn8-1	Approximately 1/4 of the work on this post was done as part of AISC according to Justin Shovelain
4	AISC1	Side effects in Grid World	Jessica Cooper, Karol Kubicki, Gavin Leech, Tom McGrath	Software	Preventing Side-effects in Gridworlds	https://www.gleech.org/grids/	Noted in Krakovna's AIS resources. Cited in AIES paper.
5	AISC1	Safe AF	James Bell, Linda Linsefors, Caspar Oesterheld, Joar Skalse	Paper	Reinforcement Learning in Newcomblike Environments	https://proceedings.neurips.cc/paper/2021/hash/b9ed18a301c9f3d183938c451fa183df-Abstract.html	Started at the first AISC in early 2018 but published in December 2020. Accepted for a spotlight presentation at NeurIPS 2021.
6	AISC2	Human Preference Types	Nandi, Sabrina, Erin	Blog post	Acknowledging Human Preference Types to Support Value Learning	https://www.alignmentforum.org/posts/mSPsyEwaymS74unND/acknowledging-human-preference-types-to-support-value
7	AISC2	Policymaking for AI Strategy	Brandon Perry, Risto Uuk	Paper	AI Governance and the Policymaking Process: Key Considerations for Reducing AI Risk	https://www.mdpi.com/2504-2289/3/2/26	Cited by an FHI research associate.
8	AISC2	Corrupt Reward MDPs	Tomasz Kisielewski, David Lindner, Jason Mancuso, Alok Singh	Software	Corrupt Reward MDPs	https://github.com/jvmancuso/safe-grid-agents
9	AISC2	Corrigibility	Vegard Blindheim, Anton Osika, Roland Pihlakas	Blog post	Exponentially diminishing returns and conjunctive goals: Mitigating Goodhart’s law with common sense. Towards corrigibility and interruptibility via the golden middle way.	https://medium.com/threelaws/diminishing-returns-and-conjunctive-goals-towards-corrigibility-and-interruptibility-2ec594fed75c
10	AISC2	Feature Visualization for Deep Reinforcement Learning	Zera Alexander, Andrew Schreiber, Fabian Steuer	Software	Feature Visualization for Deep Reinforcement Learning	https://github.com/andrewschreiber/agent	Reached the third and final round of EA Grants (not accepted).
11	AISC2	IRL Benchmark	Adria Garriga-Alonso, Anton Osika, Johannes Heidecke, Max Daniel, Sayan Sarkar	Software	IRL Benchmark	https://github.com/JohannesHeidecke/irl-benchmark
12	AISC2	Assumptions of Human Values	Jan Kulveit, Linda Linsefors, Alexey Turchin	Blog post	Multi-agent predictive minds and AI alignment	https://www.lesswrong.com/posts/3fkBWpE4f9nYbdf7E/multi-agent-minds-and-ai-alignment	Jan has written a blog post about his best-guess model of how human values and motivations work. It probably does not stem much directly from work at AISC2, since he was busy with other things during the in-person retreat. Mentioned in the post: "Part of this originated in the efforts of the “Hidden Assumptions” team on the 2nd AI safety camp, and my thoughts about how minds work are inspired by CFAR."
13	AISC2	Value Learning in Games	Stanislav Böhm, Tomáš Gavenčiak, Torben Swoboda, Mikhail Yagudin	Blog post	Value learning in games	https://docs.google.com/document/d/1kxXk7KkFfJAqrk0kDjDJ6Tvz_FL04p34twLj19Tv_IQ/edit#heading=h.cy7im45es3q0
14	AISC2	Corrupt Reward MDPs	Jason Mancuso, Tomasz Kisielewski, David Lindner, Alok Singh	Paper	Detecting Spiky Corruption in Markov Decision Processes	https://ceur-ws.org/Vol-2419/paper_28.pdf	Presented in a session at the AI Safety Workshop at IJCAI 2019.
15	AISC3	Modeling Cooperation	Jonas Müller, Miles Tidmarsh, Vasily Kuznetsov	Software	(implementation of their formal mathematical model)	www.modelingcooperation.com/model
16	AISC3	Debate	Vojta Kovarik, Anna Gajdova, David Lindner, Lukas Finnveden, Rajashree Agrawal	Blog post	AI Safety Debate and Its Applications	https://www.lesswrong.com/posts/5Kv2qNfRyXXihNrx2/ai-safety-debate-and-its-applications
17	AISC3	Embedded agents	Arushi, Davide, Sayan	Paper	Categorizing Wireheading in Partially Embedded Agents	https://arxiv.org/abs/1906.09136	Presented as a poster at the AI Safety Workshop at IJCAI 2019.
18	AISC3	RL Attention	Dmitry Nikulin, Sebastian Kosch, Fabian Steuer, Hoagy Cunningham	Blog post	Regularization and visualization of attention in reinforcement learning agents	https://attentionentropy.github.io/
19	AISC4	Generalization in Reward Learning	Anton Makiievskyi, Liang Zhou, Max Chiswick	Blog post	Assessing Generalization in Reward Learning with Procedurally Generated Games	https://chisness.medium.com/assessing-generalization-in-reward-learning-intro-and-background-da6c99d9e48
20	AISC4	Goal Directedness	Adam Shimi, Joe Collman, Michele Campolo, Sabrina Tang	Blog post	Focus: you are allowed to be bad at accomplishing your goals	https://www.lesswrong.com/s/DTnoFhDm7ZT2ecJMw/	Note that Adam Shimi was already focussed on goal-directedness work before applying to AISC4, and would probably have written a similar volume of posts either way (in Remmelt's opinion). The last 5 posts were published (not necessarily fully finished) after the camp.
21	AISC4	Goal Directedness	Adam Shimi, Joe Collman, Michele Campolo	Blog post	Understanding Goal Directedness (sequence)	https://www.alignmentforum.org/s/o58ZMNaovdztbLfvN	3/4 of the goal-directedness team, who continued researching together after the camp officially ended.
22	AISC4	Survey on AI X-Risk Scenarios	Sam Clarke, Alexis Carlier, Jonas Schuett	Blog post	Survey on AI existential risk scenarios	https://www.lesswrong.com/posts/WiXePTj7KeEycbiwK/survey-on-ai-existential-risk-scenarios	Also shared full results internally with researchers at FHI and elsewhere. The team said they did not publish more widely because of PR risks.
23	AISC4	Human extracted preferences	Mislav Juric, Taylor Kulp-McDowall, Arun Raja, Riccardo Volpato, Nevan Wichers	Blog post	Extraction of human preferences 👨→🤖	https://www.lesswrong.com/posts/PZYD5kBpeHWgE5jX4/extraction-of-human-preferences
24	AISC5	Objective Robustness Failures	Jack Koch, Lauro Langosco, Jacob Pfau, James Le (and Lee Sharkey)	Paper	Goal Misgeneralization in Deep Reinforcement Learning	https://proceedings.mlr.press/v162/langosco22a.html	Accepted for a short presentation at ICML. Accepted for a poster at the ICML UDL workshop. Cited by Dan Hendrycks et al. TODO: Add NeurIPS link if published. https://theturingprize.com/ wants to retrospectively award them a prize of $10k, to be given as a donation to a charity or fund of their choice (or a mix of charities/funds).
25	AISC5	Objective Robustness Failures	Jack Koch, Lauro Langosco, Jacob Pfau, James Le (and Lee Sharkey)	Blog post	Empirical Observations of Objective Robustness Failures; Discussion: Objective Robustness and Inner Alignment Terminology	Post 1 (empirical failures) Post 2 (Terminology)	Two simultaneous posts. Summarised in the Alignment Newsletter by Rohin Shah.
26	AISC5	Cooperativity & Common Pool Resources	Quinn Dougherty, Ben Greenberg, Ariel Kwiatkowski	Software		https://github.com/RedTachyon/cpr_reputation/
27	AISC5	Pessimistic Ask-For-Help Agents for Safe Exploration	Jamie Bernardi, David Reber, Magdalena Wache, Peter Barnett, Max Clarke	Software		https://github.com/j-bernardi/pessimistic-agents
28	AISC5	Human extracted preferences	Mislav Juric, Taylor Kulp-McDowall, Arun Raja, Riccardo Volpato, Nevan Wichers	Software		https://github.com/arunraja-hub/Preference_Extraction
29	AISC5	Objective Robustness Failures	Jack Koch, Lauro Langosco, Jacob Pfau, James Le (and Lee Sharkey)	Video	We Were Right! Real Inner Misalignment	https://www.youtube.com/watch?v=zkbPdEHEyEI&ab_channel=RobertMiles	The team was emailed by Rob Miles about possibly putting together a YouTube explanation of their work.
30	AISC5	Cooperativity & Common Pool Resources	Quinn Dougherty, Ben Greenberg, Ariel Kwiatkowski	Blog post	AISC5 Retrospective: Mechanisms for Avoiding Tragedy of the Commons in Common Pool Resource Problems	https://www.lesswrong.com/posts/LBwpubeZSi3ottfjs/aisc5-retrospective-mechanisms-for-avoiding-tragedy-of-the
31	AISC5	Multi-Objective Decision-Making	Robert Klassert, Roland Pihlakas, Ben Smith	Blog post	A brief review of the reasons multi-objective RL could be important in AI Safety Research	https://www.lesswrong.com/posts/i5dLfi6m6FCexReK9/a-brief-review-of-the-reasons-multi-objective-rl-could-be	Briefly mentioned in a later post by Peter Vamplew: "We have provided a short list of recommended reading at the end of this post, and we refer the reader again to the post of Smith, Pihlakas and Klassert for an overview of work in this area." https://www.lesswrong.com/posts/eeEEgNeTepZb6F6NF
32	AISC6	Multi-Objective Decision-Making		Blog post	Sets of objectives for a multi-objective RL agent to optimize	https://www.lesswrong.com/posts/4mvdZXjwJHv9tSAWB/sets-of-objectives-for-a-multi-objective-rl-agent-to-1
33	AISC7	Multi-Objective Decision-Making		Paper	Using soft maximin for risk averse multi-objective decision-making	https://link.springer.com/article/10.1007/s10458-022-09586-2	Published in the journal Autonomous Agents and Multi-Agent Systems.
34	AISC6	(personal post)	Jan Kirchner	Blog post	Inferring utility functions from locally non-transitive preferences	https://www.lesswrong.com/posts/QZiGEDiobFz8ropA5/inferring-utility-functions-from-locally-non-transitive	"As part of the AI Safety Camp, I've been diving a bit deeper into the foundations of expected utility theory and preference learning. In this post, I am making explicit a connection between those two things that (I assume) many people already made implicitly. But I couldn't find a nice exposition of this argument so I wrote it up. Any feedback is of course highly welcome!"
35	AISC6	Language Models as Tools for Alignment Research	Jan Kirchner, Jacques Thibodeau, Logan Smith (as external collaborator?), Kyle and Laria (mentors), Arush?	Blog post	A survey of tool use and workflows in alignment research	https://www.alignmentforum.org/posts/ebYiodG3MAEqskCDG/a-survey-of-tool-use-and-workflows-in-alignment-research-1	Got a shout-out from Jan Leike ('Other researchers have started working on this approach too.') in the post "A minimal viable product for alignment" (https://aligned.substack.com/p/alignment-mvp?s=r).
36	AISC6	Constraints from Selection - Modularity subteam	Lucius Bushnaq, Avery Griffin, Callum McDougall	Blog post	Project Intro: Selection Theorems for Modularity	https://www.alignmentforum.org/posts/XKwKJCXgSKhSr9bZY/project-intro-selection-theorems-for-modularity
37	AISC6	Constraints from Selection - Modularity subteam	Lucius Bushnaq, Avery Griffin, Callum McDougall	Blog post	Theories of Modularity in the Biological Literature	https://www.alignmentforum.org/posts/JzTfKrgC7Lfz3zcwM/theories-of-modularity-in-the-biological-literature
38	AISC6	Semantic Side-Effect Minimisation	Fabian Schimpf, Lukas Fluri	Blog post	Open Problems in Negative Side Effect Minimization	https://www.alignmentforum.org/posts/pnAxcABq9GBDG5BNW/open-problems-in-negative-side-effect-minimization
39	AISC6	Impact of Memetics on Alignment	Harriet Farlow	Blog post	Machines vs Memes Part 1: AI Alignment and Memetics	https://www.lesswrong.com/posts/JLH6ido4qoBtYmnNR/machines-vs-memes-part-1
40	AISC6	Impact of Memetics on Alignment	Nate Rush	Blog post	Machines vs. Memes 2: Memetically-Motivated Model Extensions	https://www.lesswrong.com/posts/gumkW3vy9mhjZriuc/machines-vs-memes-2-memetically-motivated-model-extensions
41	AISC6	Impact of Memetics on Alignment	Claudio Ceruti	Blog post	Machines vs Memes Part 3: Imitation and Memes	https://www.lesswrong.com/posts/nbDFj4ZS6WSDKtSk4/machines-vs-memes-part-3-imitation-and-memes
42	AISC6	(personal post)	Jan Czechowsky	Blog post	Steganography and the CycleGAN - alignment failure case study	https://www.lesswrong.com/posts/uutXLm2DRcCtFBZ2D/steganography-and-the-cyclegan-alignment-failure-case-study
43	AISC6	Constraints from Selection - Modularity subteam	Lucius Bushnaq, Avery Griffin, Callum McDougall	Blog post	Ten experiments in modularity, which we'd like you to run!	https://www.lesswrong.com/posts/99WtcMpsRqZcrocCd/ten-experiments-in-modularity-which-we-d-like-you-to-run
44	AISC6	Pipeline for Measuring Misalignment	Marius Hobbhahn, Eric Landgrebe, Beth Barnes (mentor)	Blog post	Reflection Mechanisms as an Alignment target: A survey	https://www.lesswrong.com/posts/XyBWkoaqfnuEyNWXi/reflection-mechanisms-as-an-alignment-target-a-survey-1
45	AISC6	Pipeline for Measuring Misalignment	Marius Hobbhahn, Eric Landgrebe, Beth Barnes (mentor)	Paper	Reflection Mechanisms as an Alignment Target: A Survey	https://openreview.net/forum?id=4eMzKmZ6xW	Paper version that was accepted to the NeurIPS ML Safety workshop.
46	AISC6	Constraints from Selection - Modularity subteam	Lucius Bushnaq, Avery Griffin, Callum McDougall	Blog post	What Is The True Name of Modularity?	https://www.lesswrong.com/posts/TTTHwLpcewGjQHWzh/what-is-the-true-name-of-modularity
47	AISC6	Table-Top Role-Playing Game		Blog post	[Announcement:] AI takeover tabletop RPG: "The Treacherous Turn"	https://www.lesswrong.com/posts/b5EqwQZw7ww2K28Ki/ai-takeover-tabletop-rpg-the-treacherous-turn
48	AISC6	Language Models as Tools for Alignment Research	Jan Kirchner, Jacques Thibodeau, Logan Smith, "janus"	Blog post	Results from a survey on tool use and workflows in alignment research	https://www.lesswrong.com/posts/a2io2mcxTWS4mxodF/results-for-a-survey-of-tool-use-and-workflows-in-alignment
49	AISC6	Language Models as Tools for Alignment Research	Jan Kirchner, Jacques Thibodeau, Logan Smith, "janus"	Blog post	A descriptive, not prescriptive, overview of current AI Alignment Research	https://www.lesswrong.com/posts/FgjcHiWvADgsocE34/a-descriptive-not-prescriptive-overview-of-current-ai
50	AISC7	AGI Safety Impossibility Theorem	Forrest Landry	Blog post	[List of posts: see Comments for posts published around the October 2022 retreat; see Link for posts published afterwards.]	https://mflb.com/ai_alignment_1/title_reorg_psr.html	Published around the AISC8 retreat. Narrative structure: AI Scope of Work (written the day after camp); Meta-Narrative Sequence of AI Substrate Takeover. Explanation snippets: XKCD-style Comic Overview (minor edits only); Superintelligence Safety Q&A (minor edits only); Negative Arguments; Substrate-Dependent Needs; APS review. Responses to anonymised questions/skeptical counterarguments: Super-ordinate Claims; SGD Selection; Optimisation Cycles; Alignment Drift; Math Expectations; Right Skepticism.
51	AISC8	Uncontrollable Dynamics	Remmelt Ellen	Blog post	The Control Problem: Unsolved or Unsolvable?	https://www.lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable
52	AISC8	Uncontrollable Dynamics	Roman Yen	Blog post	On the possibility of impossibility of AGI Long-Term Safety	https://www.lesswrong.com/posts/zuXtMKuQRGAhZMoKk/on-the-possibility-of-impossibility-of-agi-long-term-safety#fnmso3ekucj2b
53	AISC8	Failure Stories	Karl von Wendt, Sofia Bharadia, Peter Drotos, Artem Korotkov... mespa, mruwnik	Blog post	Agentic Mess	https://www.lesswrong.com/posts/LyJAFBuuEfd4kxgsw/agentic-mess-a-failure-story	Video version here.
54	AISC8	Failure Stories	Karl von Wendt, Sofia Bharadia, Peter Drotos, Artem Korotkov... mespa, mruwnik	Blog post	Paths to Failure	https://www.lesswrong.com/posts/yv4xAnkEyWvpXNBte/paths-to-failure
55	AISC8	Failure Stories	Karl von Wendt, Sofia Bharadia, Peter Drotos, Artem Korotkov... mespa, mruwnik	Blog post	A Friendly Face	https://www.lesswrong.com/posts/iRFxvNeLbHNRCzA2S/a-friendly-face-another-failure-story
56	AISC8	Interpretable Architectures	Robert Kralisch, Anton Zheltoukhov, David Liu, Sohaib Imran	Blog post	An Investigation of the Frameworks of “Positive Attractors” and “Inherently Interpretable Architectures”	https://www.lesswrong.com/s/z7JTHHdapYdvgfPhM
57	AISC8	Team Cyborg- ...	Kanad Chakrabarti, Roman Leventov, Nicholas Kees Dupuis	Blog post	Philosophical Cyborg (Part 1)	https://www.lesswrong.com/posts/k93NEoXZq6CdXegdx/philosophical-cyborg-part-1
58	AISC8	Team Cyborg- ...	Kanad Chakrabarti	Blog post	Philosophical Cyborg (Part 2)...or, The Good Successor	https://www.lesswrong.com/posts/ZZ57cBkpQ5hpAux9T/philosophical-cyborg-part-2-or-the-good-successor
59	AISC8	Behavioural Annotation	Nell Watson...	Paper	Draft towards paper	https://docs.google.com/document/d/186iPTOUtofEsL1qgXsH5qX1IBlq7fn2m/edit
60	AISC8	Soft Optimization		Blog post		https://www.lesswrong.com/posts/XXrGhqSNZjcG2nNiy/aisc-team-report-soft-optimization-bayes-and-goodhart
61	AISC8	Machine Learning For Scientific Discovery		Blog post	[Sequence:] Machine Learning For Scientific Discovery	https://www.lesswrong.com/s/xoXeJZRCBEBnBoGbC
62	AISC8	Literature Review of the Neurological Basis of Human Values and Preferences	Mateusz Bagiński	Blog post	"Wanting" and "liking"	https://www.lesswrong.com/posts/opJxxfrN33xQx3eXu/wanting-and-liking
63	AISC8	Interdisciplinary Investigation of DebateGPT	Paul Bricman, Elfia Bezou-Vrakatseli, Thomas Feeney, and Yimeng Xie	Blog post	Truth	https://compphil.github.io/truth/
64	AISC8	Understanding Search in Transformers	Michael I. Ivanitskiy, Alex F. Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung	Paper	Structured World Representations in Maze-Solving Transformers	https://arxiv.org/pdf/2312.02566.pdf
65	AISC9		Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz	Paper	Immunization against harmful fine-tuning attacks	https://arxiv.org/abs/2402.16382
66	AISC9	Congressional Messaging Campaigns	Tristan Williams, davekasten, jacob.turn, Felix De Simone, gergo	Blog post	Talking to Congress: Can constituents contacting their legislator influence policy?	https://forum.effectivealtruism.org/posts/5oStggnYLGzomhvvn/talking-to-congress-can-constituents-contacting-their
67	AISC9	SatisfIA	Vitalii Chyhirov, Simon Fischer, Benjamin Kolb, Martin Kunev, Ariel Kwiatkowski, Jeremy Rich. Lead: Jobst Heitzig (we were also joined by several interns at his lab and members of SPAR)	Blog post	Aspiration-based, non-maximizing AI agent designs	https://www.lesswrong.com/s/4TT69Yt5FDWijAWab
68	AISC9	Out-of-context learning interpretability	Victor Levoso Fernandez (lead), Luan Fletcher, Leo Mckee-Reid, Andrei Cristea, Florian van der Steen, Nikita Menon, Kunvar Thaman	Software	aisc_oocl_experiments	https://github.com/fletchel/aisc_oocl_experiments
69	AISC9	High-Level Mechanistic Interpretability Activation Engineering Library 🔥	Jamie Coombes, Ardy Haroen, Fergus Fettes, Lukas Linauer, Shaheen Ahmed-Chowdhury, Vy Hong	Software	obvslib	https://github.com/obvslib/obvs
70	AISC9	Ambitious Mechanistic Interpretability	Alice Rigg, Jacob Goldman-Wetzler, Karthik Murugadoss, Leonard Bereska, Lucas Hayne, Wolodymyr Krywonos, Michael Pearce, Kola Ayonrinde, Gonçalo Paulo	Blog post	[Various outputs by individual team members]		Ghost gradients implementation, by Jacob; various Mamba interp things, by Gonçalo & others; AtP* implementation, by Kola; reverse engineering MNIST, by Michael; hierarchical feature clustering, by Alice; clustering features by their topology, by Karthik; mech interp survey paper, by Leonard; computation in superposition extensions, by Lucas.
71	AISC9	Modelling Trajectories of Language Models	Nicky Pochinkov, Tetra Jones, Rashidur Rahman	Paper	Modularity In Transformers: Investigating Separability & Neuron Task Specialization	https://cloud.nicky.pro/s/A2srG3f8W9TLwrG	Under review as a conference paper at ICLR 2024
72	AISC9	Modelling Trajectories of Language Models	Nicky Pochinkov, Ben Pasero, Skylar Shibayama	Paper	Investigating Neuron Ablation In Attention Heads: The Case For Peak Activation Centering	https://cloud.nicky.pro/s/cM7sFPQfBSsaikx	Under review as a conference paper at the SeT LLM workshop at ICLR 2024
73	AISC9	MILD	Marcel Mir, Alex Champandard, Remmelt Ellen	Paper	MILD: Minimal Item-Level Documentation of Training Data	https://docs.google.com/document/d/1tP5j1sUf5JI6E700JpU8j_ZKP_zvAlsEFzrMv1PUJQI/edit	Draft doc; will be published later
74	AISC9	Asymmetric control in LLMs	Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, Jan Batzner, Simon Lerman	Paper	Immunization against harmful fine-tuning attacks	https://arxiv.org/abs/2402.16382
75	AISC9	Asymmetric control in LLMs	Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, David Atanasov, Robie Gonzales, Subhabrata Majumdar, Carsten Maple, Hassan Sajjad, Frank Rudzicz	Paper	Representation noising effectively prevents harmful fine-tuning on LLMs	https://arxiv.org/abs/2405.14577
76	AISC9	Asymmetric control in LLMs	Domenic Rosati, Jan Wehner, David Atanasov	Blog post	Training-time domain authorization could be helpful for safety	https://www.lesswrong.com/posts/38avQYy782zXgNo9u/training-time-domain-authorization-could-be-helpful-for
77	AISC9	The promisingness of automated alignment	Bogdan Ionut Cirstea, AISC: Jaeson Booker, Leo Mckee-Reid, Marcel Mir, Severin Field, Milton Lin, Sai Joseph, Vassil Tashev, Yuan Yuan Sun; MARS: Alfie Lamerton, Tim Chan, Robayet Hossain; SPAR: Joyee Chen, Joe Emerson, Minh Nguyen, Yixiong Hao.	Blog post	A Review of Weak to Strong Generalization	https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp
78	AISC9	The promisingness of automated alignment	Bogdan Ionut Cirstea, AISC: Jaeson Booker, Leo Mckee-Reid, Marcel Mir, Severin Field, Milton Lin, Sai Joseph, Vassil Tashev, Yuan Yuan Sun; MARS: Alfie Lamerton, Tim Chan, Robayet Hossain; SPAR: Joyee Chen, Joe Emerson, Minh Nguyen, Yixiong Hao.	Blog post	Paper review: “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks”	https://www.lesswrong.com/posts/Wd9vzwqcYuEokJYCH/paper-review-the-unreasonable-effectiveness-of-easy-training
79	AISC9	The promisingness of automated alignment	Bogdan Ionut Cirstea, AISC: Jaeson Booker, Leo Mckee-Reid, Marcel Mir, Severin Field, Milton Lin, Sai Joseph, Vassil Tashev, Yuan Yuan Sun; MARS: Alfie Lamerton, Tim Chan, Robayet Hossain; SPAR: Joyee Chen, Joe Emerson, Minh Nguyen, Yixiong Hao.	Blog post	A Review of In-Context Learning Hypotheses for Automated AI Alignment Research	https://www.lesswrong.com/posts/GPcwP8pgyPFPwvi2h/a-review-of-in-context-learning-hypotheses-for-automated-ai
80	AISC9	Towards realistic ODDs for foundation model based AI offerings	Igor Krawczuk, Paulius Skaisgiris, Scott Bursese, Arghya Sarkar, Tanvir Iqbal	Software		https://github.com/genalgodds
81	AISC9	Does sufficient optimization imply agent structure?	Tyler Tracy, Mateusz Bagiński, Einar Urdshals, Amaury Lorin, Jasmina Nasufi, Alfred Harwood, Alex Altair (RL)	Blog post	Towards a formalization of the agent structure problem	https://www.lesswrong.com/posts/oxsBpx9v3bgxraiPj/towards-a-formalization-of-the-agent-structure-problem
82	AISC9	Evaluating alignment evaluations		Blog post	[wrapping up drafts]
83	AISC9	Exploring toy models of agents	Paul Colognese, Ben Sturgeon, Narmeen Oozer, Arun Jose	Blog post	[subscribe to Paul’s LessWrong posts to be notified when we post the results of this project.]
84	AISC9	Benchmarks for Stable Reflectivity	Jacques Thibodeau (lead), Kanad Chakrabarti, Youlian Simidjiyski, Thee Ho, Jiaming (George) Yu, Jannes Elstner	Blog post	[More should be published under @jacquesthibs or @ukc10014 on LessWrong.]
85	AISC9	Personal Fine-Tuning Implementations for AI Value Alignment	Minh Nguyen, Sarah Pan, Nell Watson	Blog post	[We intend to publish a paper on our experiments and observations.]
86	AISC9	AI-Driven Economic Safety Nets	David Conrad, Rafael Andersson Lipcsey, Arturs Kanepajs, Tillman Schenk, Jacob Schaal	Blog post	[drafting]