This course offers a comprehensive study of Large Language Models (LLMs). We'll explore architecture engineering, training techniques, efficiency enhancements, and prompt engineering. Students will gain insights into the application of LLMs in various domains, tools integration, privacy and bias issues, as well as their limitations and alignment. The curriculum includes guest lectures on advanced topics and in-class presentations to stimulate practical understanding. This course is ideal for anyone seeking to master the use of LLMs in their field.

Teaching team


Instructor
Benyou Wang

Benyou Wang is an assistant professor in the School of Data Science, The Chinese University of Hong Kong, Shenzhen. He has received several notable awards, including the Best Paper Nomination Award at SIGIR 2017, the Best Explainable NLP Paper award at NAACL 2019, the Best Paper award at NLPCC 2022, a Marie Curie Fellowship, and the Huawei Spark Award. His primary focus is on large language models.

Leading TA
Xidong Wang
Leading TA
Juhao Liang
TA
Fei Yu

Poster Session


A final project poster session is planned for the end of the course (tentatively Dec. 15th, 2023), giving students the opportunity to present their work.

Anyone interested in LLMs is welcome to join. More details will be provided closer to the event. Feel free to reach out!

Logistics


Course Information


The course will introduce the key concepts of LLMs in terms of training, deployment, and downstream applications. At the technical level, it covers language models, architecture engineering, prompt engineering, retrieval, reasoning, multimodality, tools, alignment, and evaluation. This course will form a sound basis for further use of LLMs. In particular, the topics include:

  • Introduction to Large Language Models (LLMs) - User's perspective
  • Language models and beyond
  • Architecture engineering and scaling law - Transformer and beyond
  • Training LLMs from scratch - Pre-training, SFT, learning LLMs with human feedback
  • Efficiency in LLMs
  • Prompt engineering
  • Knowledge and reasoning
  • Multimodal LLMs
  • LLMs in vertical domains
  • Tools and large language models
  • Privacy, bias, fairness, toxicity and holistic evaluation
  • Alignment and limitations

Prerequisites

Learning Outcomes

Textbooks

Recommended Books:

Grading Policy (CSC 6201/CIE 6021)

Assignments (40%)

Review of project proposal (15%)

We will review project proposals to help students better prepare their final projects. Revisions are welcome after our suggestions have been taken into consideration.

Final project (40%)

The project may be done by a group, but each individual is evaluated separately. You need to write a project report (max 6 pages) for the final project. Here is the report template. You are also expected to give a project poster presentation. After the final project deadline, feel free to make your project open source; we would appreciate it if you acknowledged this course.

Participation (5%)

Here are some ways to earn the participation credit, which is capped at 5%.

Late Policy

The penalty is 0.5% off the final course grade for each late day. For example, submitting an assignment three days late costs 1.5% of the final course grade.

Schedule


Each entry below lists the date and topic, followed by recommended reading, the pre-lecture question, lecture notes, coding resources, events and deadlines, and feedback providers where applicable.

Sep. 4-15th | Tutorial 0: GitHub, LaTeX, Colab, and ChatGPT API (self-study; do not come to the classroom)
  Recommended reading: OpenAI's blog; LaTeX and Overleaf; Colab; GitHub
  Feedback provider: Benyou Wang

Sep. 15th | Lecture 1: Introduction to Large Language Models (LLMs)
  Recommended reading: On the Opportunities and Risks of Foundation Models; Sparks of Artificial General Intelligence: Early experiments with GPT-4
  Pre-lecture question: What is ChatGPT and how to use it?
  Lecture note: [slide]
  Feedback provider: Xidong Wang

Sep. 22nd | Lecture 2: Language models and beyond
  Recommended reading: A Neural Probabilistic Language Model; BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Training language models to follow instructions with human feedback
  Pre-lecture question: What is a language model and why is it important? (see the language-model sketch after this schedule)
  Lecture note: [slide]
  Feedback provider: Juhao Liang

Oct. 8th | Lecture 3: Architecture engineering and scaling law: Transformer and beyond
  Recommended reading: Attention Is All You Need; HuggingFace's course on Transformers; Scaling Laws for Neural Language Models; The Transformer Family Version 2.0; On Position Embeddings in BERT
  Pre-lecture question: Why does the Transformer become the backbone of LLMs?
  Lecture note: [slide]
  Coding: [nanoGPT]
  Feedback provider: Xidong Wang

Oct. 13th | Tutorial 1: Usage of OpenAI API and Assignment 1
  Recommended reading: OpenAI's blog
  Pre-lecture question: How to automatically use ChatGPT in a batch? (see the batch API sketch after this schedule)
  Lecture note: [slide]
  Coding: [Using ChatGPT API]
  Events: Assignment 1 out
  Feedback provider: Xidong Wang

Oct. 20th | Lecture 4: Training LLMs from scratch
  Recommended reading: Training language models to follow instructions with human feedback; LLaMA: Open and Efficient Foundation Language Models; Llama 2: Open Foundation and Fine-Tuned Chat Models
  Pre-lecture question: How to train LLMs from scratch?
  Lecture note: [slide]
  Coding: [LLMZoo], [LLMFactory]
  Feedback provider: Juhao Liang

Oct. 20th | Tutorial 2: Train your own LLMs and Assignment 2
  Pre-lecture question: Are you ready to train your own LLMs?
  Coding: [LLMZoo], [nanoGPT], [LLMFactory]
  Feedback provider: Juhao Liang

Oct. 27th | Lecture 5: Efficiency in LLMs
  Recommended reading: Efficient Transformers: A Survey; FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness; GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers; Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity; Towards a Unified View of Parameter-Efficient Transfer Learning
  Pre-lecture question: How to make LLMs train/inference faster?
  Lecture note: [slide]
  Coding: [llama2.c]
  Deadlines: Assignment 1 due (Oct. 31, 11:59pm)
  Feedback provider: Zhengyang Tang

Nov. 3rd | Lecture 6: Mid review of final project
  Lecture note: [slide]
  Events: Assignment 2 out
  Feedback provider: Benyou Wang

Nov. 3rd | Tutorial 3: Preparing your own project
  Pre-lecture question: Any ideas to train a unique LLM to solve problems in your own research field?
  Feedback provider: Junying Chen

Nov. 10th | Lecture 7: Knowledge, Reasoning, and Prompt engineering
  Recommended reading: Natural Language Reasoning, A Survey and others; Best practices for prompt engineering with OpenAI API; prompt engineering
  Pre-lecture question: Can LLMs reason? How to better prompt LLMs?
  Lecture note: [slide]
  Feedback providers: Shuo Yan and Fei Yu

Nov. 17th | Lecture 8: Multimodal LLMs
  Recommended reading: CLIP, MiniGPT-4, Stable Diffusion and others
  Pre-lecture question: Can LLMs see?
  Lecture note: [slide]
  Deadlines: Assignment 2 due (11:59pm)
  Feedback provider: Junying Chen

Nov. 24th | Lecture 9: LLM agent
  Recommended reading: ToolBench; AgentBench; Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; LLM Powered Autonomous Agents
  Pre-lecture question: Can LLMs plan?
  Lecture note: [slide]
  Feedback provider: Juhao Liang

Nov. 24th | Tutorial 4: Improving your LLM projects (personal discussion)
  Pre-lecture question: How to improve your LLM applications?
  Deadlines: Project proposal due (11:59pm)
  Feedback providers: Benyou Wang, Juhao Liang and Xidong Wang

Dec. 1st | Lecture 10: LLMs in vertical domains
  Recommended reading: Large Language Models Encode Clinical Knowledge; Capabilities of GPT-4 on Medical Challenge Problems; Performance of ChatGPT on USMLE; Medical-NLP; ChatLaw
  Pre-lecture question: Can LLMs be mature experts like doctors/lawyers?
  Lecture note: [slide]
  Coding: [HuatuoGPT]
  Feedback provider: Junying Chen

Dec. 8th | Lecture 11: Alignment, Limitations, and Broader Impact
  Recommended reading: Superalignment; GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models; ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks; Theory of Mind Might Have Spontaneously Emerged in Large Language Models; On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?; Survey of Hallucination in Natural Language Generation; Extracting Training Data from Large Language Models
  Pre-lecture question: What are LLMs' limitations?
  Feedback provider: Juhao Liang

TBD | Guest lectures
  Feedback provider: Benyou Wang

Dec. 15th | Lecture 12: In-class presentation (extended class)
  Pre-lecture question: How to solve real-world problems using LLMs?
  Feedback providers: Zhengyang Tang, Juhao Liang and Xidong Wang
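
Language-model sketch. Lecture 2 asks what a language model is. As a toy warm-up (an illustrative sketch of ours, not course material; the corpus and function names are made up), the snippet below estimates next-word probabilities from bigram counts, the simplest instance of the idea that a language model assigns P(next token | context). The lecture itself covers neural language models.

    # A toy count-based bigram language model: estimates P(curr | prev) from counts.
    # Purely illustrative; Lecture 2 covers neural language models in depth.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat . the dog sat on the log .".split()

    # Count how often each word follows each preceding word.
    bigram_counts = defaultdict(Counter)
    for prev, curr in zip(corpus, corpus[1:]):
        bigram_counts[prev][curr] += 1

    def next_word_prob(prev: str, curr: str) -> float:
        """Maximum-likelihood estimate of P(curr | prev)."""
        total = sum(bigram_counts[prev].values())
        return bigram_counts[prev][curr] / total if total else 0.0

    print(next_word_prob("the", "cat"))  # 0.25: 'the' is followed by cat/mat/dog/log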
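
Batch API sketch. Tutorial 1 asks how to use ChatGPT automatically in a batch. Below is a minimal sketch, assuming the openai Python package (v1.x) and an OPENAI_API_KEY environment variable; it is not the official tutorial code, and the prompts, model choice, and delay are placeholders.

    # A minimal batch-querying sketch for the OpenAI chat API (not the official
    # tutorial code). Assumes openai>=1.0 and OPENAI_API_KEY in the environment.
    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_chatgpt(prompt: str, model: str = "gpt-3.5-turbo") -> str:
        """Send one prompt and return the assistant's reply text."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    prompts = [
        "Define 'language model' in one sentence.",
        "What is a scaling law?",
    ]

    for prompt in prompts:
        print(ask_chatgpt(prompt))
        time.sleep(1)  # crude rate limiting; tune to your quota

For large batches you would add retries with exponential backoff and possibly concurrency; the sleep above is only a crude stand-in.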

Acknowledgement


https://github.com/LLM-Course/LLM-course.github.io

We borrowed some concepts and the website template from [CSC3160/MDS6002], taught by Prof. Zhizheng Wu.

The website's GitHub repo is [here].