Reference Paper List
https://docs.google.com/spreadsheets/d/1qpPQI9rnHjR-xipPRkojuVgSAnJ1VBcv/edit#gid=1144707069
| Date | Presenter(s) | Topic | Paper Title | Conference | Paper URL | Slides |
| Apr 2 | Jiawei Zhang | Introduction | Course Introduction. | | | https://drive.google.com/file/d/18UttPNUjyBI4cW4MJRaoWjIazNNcaDUD/view?usp=sharing |
| Apr 4 | Xinhao Xiang | Multi-Modal | VideoPoet: A Large Language Model for Zero-Shot Video Generation | arxiv'24 | https://arxiv.org/pdf/2312.14125.pdf | https://drive.google.com/file/d/10riGM7XjWV00GSUnCDIfa-ai8O48q8V1/view?usp=share_link |
| Apr 9 | Zizhong Li | NLP | Self-Rewarding Language Models | arxiv'24 | https://arxiv.org/pdf/2401.10020.pdf | https://drive.google.com/file/d/1VtPI9jl353H9kwL28JIT8BOLxuirM_nJ/view?usp=share_link |
| Apr 11 | Zhuoheng Li | CV | EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | arxiv'23 | https://arxiv.org/pdf/2312.00863.pdf | https://drive.google.com/file/d/1A95K9nSN4UJkxBFU93xnUq_zieOgpLVd/view?usp=share_link |
| Apr 16 | Rong Ching Chang | NLP | Are Emergent Abilities of Large Language Models a Mirage? | Neurips'23 | https://arxiv.org/pdf/2304.15004.pdf | https://drive.google.com/open?id=1vDmkUFq6DtJw2Phx9oQLky-ryHMoDDdJ&usp=drive_fs |
| Apr 18 | Terry Tong | NLP | Long-form factuality in large language models | arxiv'24 | https://arxiv.org/pdf/2403.18802.pdf | https://drive.google.com/file/d/1_eZyervQDwfrh-ulfW7iZarJUo8cSTXI/view?usp=share_link |
| Apr 23 | Yifang Ren | Multi-Modal | Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | arxiv'23 | https://arxiv.org/pdf/2310.05737.pdf | https://drive.google.com/file/d/13GkHz6RCPgfIoalM4efD1lzkw9bu7cI9/view?usp=share_link |
| Apr 25 | Terry Tong | NLP | RAFT: Adapting Language Model to Domain Specific RAG | arxiv'24 | https://arxiv.org/pdf/2403.10131.pdf | https://drive.google.com/file/d/1S-xzfLNBFWRlqua-7PSgmg97-rYCdvin/view?usp=share_link |
| Apr 30 | Halil Ozgur Demir | Multi-Modal | Synth2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | arxiv'24 | https://arxiv.org/pdf/2403.07750.pdf | https://drive.google.com/file/d/1QzB979DirXXbmDNzpT6yfjy1wRadpPOw/view?usp=share_link |
| May 2 | Zhuoheng Li | CV | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | arxiv'24 | https://arxiv.org/pdf/2403.03206.pdf | https://drive.google.com/file/d/1s86aJ1SDWiPYr62-RrQa-ghOokE-8pnr/view?usp=share_link |
| May 7 | Zhuosheng Liu | CV | InstantID: Zero-shot Identity-Preserving Generation in Seconds | arxiv'24 | https://arxiv.org/pdf/2401.07519.pdf | https://drive.google.com/file/d/1IOTFaPtA7QzX9caTm-CvWKYsSNsksum4/view?usp=share_link |
| May 9 | Tong Miao | CV | OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation | CVPR'24 | https://arxiv.org/pdf/2311.17911.pdf | https://drive.google.com/file/d/1JM3t9-m_KNeZM-gh1xg-0Qv-fiNBQ9z-/view?usp=share_link |
| May 14 | Yifang Ren | CV | Generative Image Dynamics | arxiv'23 | https://arxiv.org/pdf/2309.07906.pdf | https://drive.google.com/file/d/1RRCkKyFAl3cF8uszpsRM0iPGacwf6ekM/view?usp=share_link |
| May 16 | Anant Vishwakama | NLP | Mistral 7B && Mixtral of Experts | | https://arxiv.org/pdf/2310.06825.pdf https://arxiv.org/pdf/2401.04088.pdf | https://drive.google.com/file/d/1EqFCJdj-YBw-0y0ZKafnJxe1EtAk6NpB/view?usp=share_link |
| May 21 | Halil Ozgur Demir | DL & Others | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | arxiv'23 | https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf | https://drive.google.com/file/d/1dBwl6j3792ghfbzW45gKS8xKp9mxRSLG/view?usp=share_link |
| May 23 | Tong Miao | DL & Others | Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips | ICLR'24 | https://openreview.net/pdf?id=1SIBN5Xyw7 | https://drive.google.com/file/d/1Vw-VSAEfLEADWczNzj3OdBVTfJAiZ3Nk/view?usp=share_link |
| May 28 | Anant Vishwakama | DL & Others | QLORA: Efficient Finetuning of Quantized LLMs | arxiv'23 | https://arxiv.org/pdf/2305.14314.pdf | https://drive.google.com/file/d/1I8QARuVEG-HadjU_tb-XT1KDClm6iH7k/view?usp=share_link |
| May 30 | Joe Zhu | Robotics | Voyager: An Open-Ended Embodied Agent with Large Language Models | arxiv'23 | https://arxiv.org/pdf/2305.16291.pdf | https://drive.google.com/file/d/17eobhln6a6tfGt6UQxcT_wyi1t0_41i4/view?usp=share_link |
| Jun 4 | Rong Ching Chang | Robotics | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ICCV'23 | https://openaccess.thecvf.com/content/ICCV2023/papers/Song_LLM-Planner_Few-Shot_Grounded_Planning_for_Embodied_Agents_with_Large_Language_ICCV_2023_paper.pdf | https://drive.google.com/file/d/1lMOJTRqjuy9EiXsK_-MzlfbhyyPN8WTG/view?usp=share_link |
| Jun 6 | Zhuosheng Liu | DL & Others | Accurate structure prediction of biomolecular interactions with AlphaFold 3 | Nature'24 | https://drive.google.com/file/d/1AKM-hk-SIDg9fgp7TAC0DqchV_48SSbY/view?usp=sharing | https://drive.google.com/file/d/11ZWmKn4gWuX-MOz38-GWfU7Ji2yvscn3/view?usp=share_link |
| Extra | | CV | One-step Diffusion with Distribution Matching Distillation | arxiv'23 | https://arxiv.org/pdf/2311.18828.pdf | |
| Extra | | CV | Learning and Leveraging World Models in Visual Representation Learning | ICLR'24 | https://arxiv.org/pdf/2403.00504.pdf | |
| Extra | | Multi-Modal | Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | arxiv'24 | https://arxiv.org/pdf/2311.15127.pdf | |
| Extra | | Multi-Modal | EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | arxiv'24 | https://arxiv.org/pdf/2402.17485.pdf | |
| Extra | | CV | DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing | arxiv'23 | https://arxiv.org/pdf/2312.07409.pdf | |
| Extra | | NLP | The Unreasonable Ineffectiveness of the Deeper Layers | arxiv'24 | https://arxiv.org/pdf/2403.17887.pdf | |
| Extra | | Multi-Modal | CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor | arxiv'23 | https://arxiv.org/pdf/2312.07661.pdf | |
| Extra | | CV | FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | CVPR'24 | https://arxiv.org/pdf/2312.08344.pdf | |
| Extra | | CV | Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | arxiv'24 | https://arxiv.org/pdf/2404.02905.pdf | |
| Extra | | Robotics | Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers | arxiv'24 | https://arxiv.org/pdf/2403.12943.pdf | |
| Extra | | NLP | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | arxiv'23 | https://arxiv.org/pdf/2305.18290.pdf | |
| Extra | | CV | Generative Powers of Ten | arxiv'23 | https://arxiv.org/pdf/2312.02149.pdf | |
| | | Robotics | BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation | arxiv'24 | https://arxiv.org/pdf/2403.09227.pdf | |