Lang’s Journal Club
Department of Mathematical Sciences, IU Indianapolis
Organizer: Honglang Wang (hlwang at iu dot edu)
Talk times: Tuesdays 9:30-11:00am (EST)
Zoom meetings: We host the journal club over Zoom. Join from a computer or mobile device by clicking Zoom to Join, or use Meeting ID 83130869503 with password 990915.
Date | Speaker | Title | Note |
---|---|---|---|
Jan 14, 2025 | | Introduction to Fully Connected Neural Networks (FNN), Backpropagation (BP), and Convolutional Neural Networks (CNN) | FNN and BP, CNN, Coding |
Jan 21, 2025 | | Introduction to Recurrent Neural Networks (RNN), Attention, and Transformers (a minimal attention sketch follows the table) | RNN, Attention, Transformer |
Jan 28, 2025 | | Cancelled | |
Feb 4, 2025 | | Philosophy of Language | Bilibili talk |
Feb 11, 2025 | | Estimating Textual Treatment Effects via Causal Disentangled Representation Learning | Paper 1, Paper 2, Paper 3 |
Feb 18, 2025 | | Speech and Language Processing | book |
Mar 4, 2025 | | Investigating Gender Bias in Language Models Using Causal Mediation Analysis | paper |
Mar 11, 2025 | | Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation | paper |
Apr 8, 2025 | | Disentangled Representation for Causal Mediation Analysis | paper |
Apr 15, 2025 | | Mixture of Experts Explained | A Detailed Explanation of Mixture of Experts (MoE) (blog post, in Chinese) |
Apr 22, 2025 | | Latent Semantic and Disentangled Attention | paper |
Apr 29, 2025 | | Towards Reasoning in Large Language Models: A Survey | paper |
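
As a companion to the Jan 21 and Apr 22 talks on attention and Transformers, here is a minimal sketch of scaled dot-product attention, the core operation those talks revolve around. It uses only NumPy; the function names, shapes, and toy data are illustrative assumptions, not code from any paper in the schedule.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, d_k) arrays of queries, keys, and values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise query-key similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # output: weighted average of the values

# Toy usage: 4 tokens with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```
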
Resources: places to find papers to study
- Awesome LLM
- Luberlab Journal Club - AI in Healthcare Research
- A.I. LLM Journal Club
- LLM Research Papers: The 2024 List from Sebastian Raschka
- Noteworthy LLM Research Papers of 2024: 12 influential AI papers from January to December 2024
- Memory Networks
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Deep Residual Learning for Image Recognition
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- AdderNet: Do We Really Need Multiplications in Deep Learning?
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Mixture of Experts (MoE) Study Notes (in Chinese)
- A Detailed Explanation of Mixture of Experts (MoE) (in Chinese; see the gating sketch after this list)
- DeepSeek technical deep dives (in Chinese): (1) Fully Understanding MLA (Multi-Head Latent Attention); (2) The Past and Present of MTP (Multi-Token Prediction); (3) The Evolution of MoE
- Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data
- Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
- From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
- An Overview of Large Language Models for Statisticians
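
Several items above (the sparsely-gated Mixture-of-Experts paper and the MoE explainers) center on sparse top-k gating: a small router scores every expert, only the k highest-scoring experts run, and their outputs are combined with renormalized weights. The sketch below illustrates that routing step with NumPy; the linear "experts", names, and sizes are illustrative assumptions, not any paper's actual implementation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def moe_forward(x, gate_W, expert_Ws, k=2):
    # x: (d_in,) input token; gate_W: (d_in, n_experts); expert_Ws: list of (d_in, d_out).
    logits = x @ gate_W                   # one routing score per expert
    top_k = np.argsort(logits)[-k:]       # indices of the k highest-scoring experts
    weights = softmax(logits[top_k])      # renormalize over the selected experts only
    # Only the selected experts are evaluated; the rest are skipped (the sparsity of MoE).
    return sum(w * (x @ expert_Ws[i]) for w, i in zip(weights, top_k))

# Toy usage: 4 experts, input and output dimension 8, each token routed to 2 experts.
rng = np.random.default_rng(0)
gate_W = rng.normal(size=(8, 4))
expert_Ws = [rng.normal(size=(8, 8)) for _ in range(4)]
x = rng.normal(size=8)
print(moe_forward(x, gate_W, expert_Ws).shape)  # (8,)
```
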
Nice YouTube Videos:
- Intro to Large Language Models (Andrej Karpathy)
- A paragraph-by-paragraph close reading of the Transformer paper (in Chinese)
- Understanding the Transformer step by step, starting from encoding/decoding and word embeddings: the essence of the attention mechanism is a convolutional neural network (CNN) (in Chinese)
- Introduction to Transformers w/ Andrej Karpathy
- Let’s build GPT: from scratch, in code, spelled out
- Let’s reproduce GPT-2 (124M)
- Building LLMs from the Ground Up: A 3-hour Coding Workshop (Sebastian Raschka)