Benyou Wang is an assistant professor in the School of Data Science, The Chinese University of Hong Kong, Shenzhen. He has received several notable awards, including the Best Paper Nomination Award at SIGIR 2017, the Best Explainable NLP Paper award at NAACL 2019, the Best Paper award at NLPCC 2022, a Marie Curie Fellowship, and the Huawei Spark Award. His primary research focus is large language models.
A final project poster session is planned for the end of the course (tentatively Dec. 15th, 2023), giving students the opportunity to present their work.
Anyone interested in LLMs is welcome to join. More details will be provided closer to the event. Feel free to reach out!
The course will introduce the key concepts of LLMs in terms of training, deployment, and downstream applications. At the technical level, it covers language models, architecture engineering, prompt engineering, retrieval, reasoning, multimodality, tools, alignment, and evaluation. This course will form a sound basis for further use of LLMs. In particular, the topics include:
Recommended Books:
We will review project proposals to help students better prepare their final projects. A revised proposal is welcome after taking our suggestions into consideration.
The project may be done in a group, but each individual is evaluated separately. You need to write a project report (max 6 pages) for the final project. Here is the report template. You are also expected to give a project poster presentation. After the final project deadline, feel free to make your project open source; we would appreciate it if you acknowledged this course.
Here are some ways to earn participation credit, which is capped at 5%.
The penalty is 0.5% off the final course grade for each late day.
Date | Topics | Recommended Reading | Pre-Lecture Questions | Lecture Note | Coding | Events/Deadlines | Feedback Providers |
---|---|---|---|---|---|---|---|
Sep. 4-15th (self-study; do not come to the classroom) | Tutorial 0: GitHub, LaTeX, Colab, and ChatGPT API | OpenAI's blog; LaTeX and Overleaf; Colab; GitHub | | | | | Benyou Wang |
Sep. 15th | Lecture 1: Introduction to Large Language Models (LLMs) | On the Opportunities and Risks of Foundation Models; Sparks of Artificial General Intelligence: Early experiments with GPT-4 | What is ChatGPT, and how can we use it? | [slide] | | | Xidong Wang |
Sep. 22nd | Lecture 2: Language models and beyond | A Neural Probabilistic Language Model; BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Training language models to follow instructions with human feedback | What is a language model, and why is it important? | [slide] | | | Juhao Liang |
Oct. 8th | Lecture 3: Architecture engineering and scaling law: Transformer and beyond | Attention Is All You Need; HuggingFace's course on Transformers; Scaling Laws for Neural Language Models; The Transformer Family Version 2.0; On Position Embeddings in BERT | Why did the Transformer become the backbone of LLMs? | [slide] | [nanoGPT] | | Xidong Wang |
Oct. 13th | Tutorial 1: Usage of OpenAI API and Assignment 1 | OpenAI's blog | How can we use ChatGPT automatically, in batch? (see the API sketch below the table) | [slide] | [Using ChatGPT API] | Assignment 1 out | Xidong Wang |
Oct. 20th | Lecture 4: Training LLMs from scratch | Training language models to follow instructions with human feedback; LLaMA: Open and Efficient Foundation Language Models; Llama 2: Open Foundation and Fine-Tuned Chat Models | How do we train LLMs from scratch? | [slide] | [LLMZoo], [LLMFactory] | | Juhao Liang |
Oct. 20th | Tutorial 2: Train your own LLMs and Assignment 2 | | Are you ready to train your own LLMs? (see the toy training sketch below the table) | | [LLMZoo], [nanoGPT], [LLMFactory] | | Juhao Liang |
Oct. 27th | Lecture 5: Efficiency in LLMs | Efficient Transformers: A Survey; FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness; GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers; Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity; Towards a Unified View of Parameter-Efficient Transfer Learning | How can we make LLM training/inference faster? | [slide] | [llama2.c] | Assignment 1 due (Oct. 31, 11:59pm) | Zhengyang Tang |
Nov. 3rd | Lecture 6: Mid review of final project | N/A | N/A | [slide] | | Assignment 2 out | Benyou Wang |
Nov. 3rd | Tutorial 3: Preparing your own project | | Any ideas to train a unique LLM to solve problems in your own research field? | | | | Junying Chen |
Nov. 10th | Lecture 7: Knowledge, reasoning, and prompt engineering | Natural Language Reasoning, A Survey, and others; Best practices for prompt engineering with OpenAI API; prompt engineering | Can LLMs reason? How can we better prompt LLMs? | [slide] | | | Shuo Yan and Fei Yu |
Nov. 17th | Lecture 8: Multimodal LLMs | CLIP, MiniGPT-4, Stable Diffusion, and others | Can LLMs see? | [slide] | | Assignment 2 due (11:59pm) | Junying Chen |
Nov. 24th | Lecture 9: LLM agents | ToolBench; AgentBench; Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; LLM Powered Autonomous Agents | Can LLMs plan? | [slide] | | | Juhao Liang |
Nov. 24th | Tutorial 4: Improving your LLM projects (personal discussion) | | How can we improve our LLM applications? | | | Project proposal due (11:59pm) | Benyou Wang, Juhao Liang, and Xidong Wang |
Dec. 1st | Lecture 10: LLMs in vertical domains | Large Language Models Encode Clinical Knowledge; Capabilities of GPT-4 on Medical Challenge Problems; Performance of ChatGPT on USMLE; Medical-NLP; ChatLaw | Can LLMs be mature experts like doctors/lawyers? | [slide] | [HuatuoGPT] | | Junying Chen |
Dec. 8th | Lecture 11: Alignment, limitations, and broader impact | Superalignment; GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models; ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks; Theory of Mind Might Have Spontaneously Emerged in Large Language Models; On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?; Survey of Hallucination in Natural Language Generation; Extracting Training Data from Large Language Models | What are LLMs' limitations? | | | | Juhao Liang |
TBD | Guest lectures | N/A | | | | | Benyou Wang |
Dec. 15th | Lecture 12: In-class presentation (extended class) | N/A | How can we solve real-world problems using LLMs? | | | | Zhengyang Tang, Juhao Liang, and Xidong Wang |
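For Tutorial 1's pre-lecture question on using ChatGPT in batch, here is a minimal sketch of looping over a list of prompts with the official `openai` Python package (v1+). It is an illustration rather than the course's assignment code: the model name and prompts are placeholders, and it assumes an `OPENAI_API_KEY` environment variable is set.

```python
# Minimal sketch: query the OpenAI chat API over a batch of prompts.
# Assumes the `openai` package (v1+) is installed and OPENAI_API_KEY
# is set in the environment; model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "What is a language model?",
    "Why did the Transformer become the backbone of LLMs?",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```

For larger batches you would typically add retry and rate-limit handling and write responses to a file instead of printing them.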
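As a warm-up for Tutorial 2's question ("Are you ready to train your own LLMs?"), the sketch below shows the core next-token training loop in miniature: a character-level bigram model in plain PyTorch. It is a toy stand-in with arbitrary text and hyperparameters, not code from [LLMZoo] or [nanoGPT].

```python
# Toy next-token training loop: a character-level bigram model.
# Deliberately tiny; the text and hyperparameters are arbitrary.
import torch
import torch.nn as nn

text = "hello world, hello LLMs"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

# In a bigram model, row t of the embedding table holds the logits
# for the token that follows t, so the table is the whole model.
model = nn.Embedding(len(chars), len(chars))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(data[:-1])         # predict each next character
    loss = loss_fn(logits, data[1:])  # compare with the true next character
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.3f}")
```

Real LLM training swaps the lookup table for a Transformer and the toy string for billions of tokens, but the next-token prediction objective is the same.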
We borrowed some concepts and the website template from [CSC3160/MDS6002], where Prof. Zhizheng Wu is the instructor.
The website's GitHub repo is [here].