CSC 6203 Large Language Models

This course offers a comprehensive study of Large Language Models (LLMs). We'll explore architecture engineering, training techniques, efficiency enhancements, and prompt engineering. Students will gain insights into the application of LLMs in various domains, tools integration, privacy and bias issues, as well as their limitations and alignment. The curriculum includes guest lectures on advanced topics and in-class presentations to stimulate practical understanding. This course is ideal for anyone seeking to master the use of LLMs in their field.
本课程提供对大语言模型（LLM）的全面学习。我们将探索大模型的架构工程、提示工程、训练技术、效率提升。学生将深入了解大语言模型在各个领域的应用、工具集成、隐私和偏见问题及其局限性和对齐。该课程包括高级主题的客座讲座和课堂演示，以激发实践理解。本课程对于任何想要掌握大语言模型在其领域的使用的人来说都是理想的选择。

Previous offerings

Below you can find course websites from previous years. Our course content and assignments will change from year to year; please do not do assignments from previous years.

2024 course: CSC 6203 (★ Current)
2023 course: CSC 6201/CIE 6021

Course Information

What is this course about?

The course will introduce the key concepts in LLMs in terms of training, deployment, downstream applications. In the technical level, it covers language model, architecture engineering, prompt engineering, retrieval, reasoning, multimodality, tools, alignment and evaluations. This course will form a sound basis for further use of LLMs. In particular, the topics include:

Introduction to Large Language Models (LLMs) - User's perspective
Language models and beyond
Architecture engineering and scaling law - Transformer and beyond
Training LLMs from scratch - Pre-training, SFT, learning LLMs with human feedback
Efficiency in LLMs
Prompt engineering
Knowledge and reasoning
Multimodal LLMs
LLMs in vertical domains
Tools and large language models
Privacy, bias, fairness, toxicity and holistic evaluation
Alignment and limitations

Prerequisites

Proficiency in LaTex: All the reports need to be written by using LaTex. A template will be provided. If you are not familiar with LaTex, please learn from the tutorial in advance.
Proficiency in GitHub: All the source codes need to be submitted in GitHub.
Proficiency in Python: All the assignments will be in Python (using Numpy and PyTorch).
Basic machine learning knowledge: It is possible to take this course without any machine learning knowledge, however, the course will be easier if you have foundations of machine learning.

Learning Outcomes

Knowledge: a) Students will understand basic concepts and principles of LLM; b) Students could effectively use LLMs for daily study, work and research; and c) Students will know which tasks LLMs are suitable to solve and which are not.
Skills: a) Students could train a toy LLM following a complete pipeline and b) Students could call ChatGPT API for daily usage in study, work and research.
Valued/Attitude: a) Students will appreciate the importance of data; b) Students will tend to use data-driven paradigm to solve problems; and c) Students will be aware of the limitations and risks of using ChatGPT.

Schedule

Please note that the course materials are outdated and will be updated before each class.

Date	Topics	Recommended Reading	Pre-Lecture Questions	Lecture Note	Coding	Events Deadlines	Feedback Administrators
Sep. 6-17th self-study; do not come to the classroom	Tutorial 0: GitHub, LaTeX, Colab, and ChatGPT API	OpenAI's blog LaTeX and Overleaf Colab GitHub					Benyou Wang
Sep. 6th	Lecture 1: Introduction to Large Language Models (LLMs)	On the Opportunities and Risks of Foundation Models Sparks of Artificial General Intelligence: Early experiments with GPT-4	What is ChatGPT and how to use it?	[slide]			Junying Chen
Sep. 13th	Lecture 2: Language models and beyond	A Neural Probabilistic Language Model BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Training language models to follow instructions with human feedback	What is language model and why is it important?	[slide]			Ke Ji
Sep. 13th	Tutorial 1: Prompt Engineering	OpenAI's blog	The Guide to LLM Prompt Engineering	[slide]	[Tutorial Code] [Assignment1]	Assignment 1 release	Junying Chen
Sep. 20th	Lecture 3: Architecture engineering and scaling law: Transformer and beyond	Attention Is All You Need HuggingFace's course on Transformers Scaling Laws for Neural Language Models The Transformer Family Version 2.0 On Position Embeddings in BERT	Why does Transformer become the backbone of LLMs?	[slide]	[nanoGPT]		Junying Chen
Sep. 27th	Lecture 4: Training LLMs from scratch	Training language models to follow instructions with human feedback LLaMA: Open and Efficient Foundation Language Models Llama 2: Open Foundation and Fine-Tuned Chat Models	How to train LLMs from scratch?	[slide]	[LLMZoo], [LLMFactory]		Ke Ji
Oct. 11th	Lecture 5: Efficiency in LLMs	Efficient Transformers: A Survey FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Towards a Unified View of Parameter-Efficient Transfer Learning	How to make LLMs train/inference faster?	[slide]	[llama2.c]		Junying Chen
Oct. 11th	Tutorial 2: train your own LLMs and assignment 2		Are you ready to train your own LLMs?	[slide]	[Tutorial Code] [Assignment1]	Assignment 2 release	Ke Ji
Oct. 18th	Lecture 6: Knowledge, Reasoning, and Prompt engineering	Natural Language Reasoning, A Survey and others Best practices for prompt engineering with OpenAI API prompt engineering	Can LLMs reason? how to better prompt LLMs?	[slide]		Assignment 1 due (Oct. 18, 11:59pm)	Ke Ji
Oct. 25th	Lecture 7: Multimodal LLMs	CLIP, MiniGPT-4, Stable Diffusion and others	Can LLMs see?	[slide]			Junying Chen
Nov. 1st	Lecture 8: LLM agent	ToolBench AgentBench Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks LLM Powered Autonomous Agents	Can LLMs plan?	[slide]			Ke Ji
Nov. 8th	Lecture 9: A Review to Spark Final Projects	N/A	N/A	[slide]		Final Project release	Junying Chen
Nov. 15th	Tutorial 3: Preparing your own project		How to improve your LLM applications?	[slide]	[Final Project]	Assignment 2 due (Nov. 15th, 11:59pm)	Junying Chen and Ke Ji
Nov. 22th	Lecture 10: LLMs in vertical domains	Large Language Models Encode Clinical Knowledge, Capabilities of GPT-4 on Medical Challenge Problems, Performance of ChatGPT on USMLE, Medical-NLP, ChatLaw	Can LLMs be mature experts like doctors/lawyers?	[slide]	[HuatuoGPT]		Junying Chen
Nov. 29th	Guest lectures		Geometric Deep Learning & Efficiently Democratizing Medical LLMs	[slide1] [slide2]			Yan Hu and Xidong Wang
Dec. 6th	Lecture 11: Towards AGI via Test-Time Scaling	OpenAI-O1	Exploring Test-Time Scaling				Junying Chen and Ke Ji
Dec. 13th	Q&A Session		Q&A session for final projects				Junying Chen and Ke Ji
Dec. 20th	Poster Presentation		How to solve real-world problems using LLMs			Final Project Presentation	Junying Chen and Ke Ji

Grading Policy (CSC 6203)

Assignments (40%)

Assignment 1 (20%): Using API for testing prompt engineering
Assignment 2 (20%): A toy LLM application

Final project (55%)

The final project consists of two parts: Project Presentation (15%) and Project Report (40%) .

Project Presentation (15%): You are required to design your project poster using the specified Poster template. Your poster presentation will be rated by at least 3 experts (TAs and at least one external professor or scientist from industry). The average rating will be the final credit.

Content quality (5%): Well-presented posters or slides are highly valued.
Oral presentation (5%): Clear and enthusiastic speaking is encouraged.
Overall subjective assesment (5%): Although subjective assesment might be biased, it happens everywhere!

Project report (40%): The project report will be publicly available after the final poster session. Please let us know if you don't wish so.

Technical excitement (15%): It is encouraged to do something that is either interesting or useful!
Technical soundness (15%): A) discuss the motivation on why you work this project and your algorithm or approach. Even you are reproducing a published paper, you should have your own motivation. B) Cite existing related work. C) Present your algorithms or systems for your project. Provide key information for reviewers to judge whether it is technically correct. D) Provide reasonable evaluation protocol, it should be detailed to contexualize your results; E)Report quantitative results and include qualitative evaluation. Analyze and understand your system by inspecting key outputs and intermediate results. Discuss how it works, when it succeeds and when it fails, and try to interpret why it works and why not.
Clarity in writing (5%): The report is written in a precise and concise manner so the report can be easily understood.
Indivisual contribution (5%): This is based on individual contribution, probably on a subjective basis.

Bonus and penalty Note that the project credit is capped at 55%

TA favorites (2%): If one of TAs nominates the project as his/her favorite, the involved students would get 1% bonus credit. Each TA could nominate one and he or she could reserve his/her nomination. This credict could only be obtained once.
Instructor favorites (1%): If the instructor nominates the project as his/her favorite, the involved students would get 1% bonus credit. Instructor could nominate at most three projects. One could get both TA favorites and Instructor favorites.
Project early-bird bonus (2%): If you submit the project report by the early submission due date, 2% bonus credit will be entitled.
Code reproducibility bonus (1%): One could obtain this If TAs think they could easily reproduce your results based on the provide material.
Ethics concerns (-1%): If there are any serious ethics concerns by the ethics committee (The instructor and all TAs), the project would get 1% penalty.

Participation (5%)

Here are some ways to earn the participation credit, which is capped at 5%.

Attending guest lectures: In the second half of the course, we have four invited speakers. We encourage students to attend the guest lectures and participate in Q&A. All students get 0.75% per guest lecture (in total 3%) for either attending in person, or by writing a guest lecture report if they attend remotely or watch the recording.
Completing feedback surveys: We will send out two feedback surveys during the semester to
User Study: Students are welcone to conduct user study upon their interest; this is not mandatory (thus it does not affect final marks).
Course and Teaching Evaluation (CTE): The school will send requests for CTE to all students. The CTE is worth 1% credit.
Volunteer credit (1%): TAs/instuctor can nominate students for a volunteer credit for those who help the poster session organization, or help answer questions from other students (not writing assignments).

Late Policy

The penalty is 0.5% off the final course grade for each late day.