This course offers a comprehensive study of Large Language Models (LLMs). We'll explore architecture engineering, training techniques, efficiency enhancements, and prompt engineering. Students will gain insights into the application of LLMs in various domains, tools integration, privacy and bias issues, as well as their limitations and alignment. The curriculum includes guest lectures on advanced topics and in-class presentations to stimulate practical understanding. This course is ideal for anyone seeking to master the use of LLMs in their field.

Teaching team


Instructor
Benyou Wang

Benyou Wang is an assistant professor in the School of Data Science, The Chinese University of Hong Kong, Shenzhen. He has received several notable awards, including the Best Paper Nomination Award at SIGIR 2017, Best Explainable NLP Paper at NAACL 2019, Best Paper at NLPCC 2022, a Marie Curie Fellowship, and the Huawei Spark Award. His primary focus is on large language models.

TA
Ke Ji

Previous offerings


Below you can find course websites from previous years. Our course content and assignments will change from year to year; please do not do assignments from previous years.


Logistics


Course Information


What is this course about?

The course introduces the key concepts of LLMs in terms of training, deployment, and downstream applications. At the technical level, it covers language models, architecture engineering, prompt engineering, retrieval, reasoning, multimodality, tools, alignment, and evaluation. The course will form a sound basis for further use of LLMs. In particular, the topics include:

  • Introduction to Large Language Models (LLMs) - User's perspective
  • Language models and beyond
  • Architecture engineering and scaling law - Transformer and beyond
  • Training LLMs from scratch - Pre-training, SFT, learning LLMs with human feedback
  • Efficiency in LLMs
  • Prompt engineering
  • Knowledge and reasoning
  • Multimodal LLMs
  • LLMs in vertical domains
  • Tools and large language models
  • Privacy, bias, fairness, toxicity and holistic evaluation
  • Alignment and limitations

Prerequisites

Learning Outcomes

Schedule


Please note that the course materials are outdated and will be updated before each class.
Each entry lists the date, topic, recommended reading, pre-lecture questions, lecture note, coding resources, events and deadlines, and feedback administrators.

Sep. 6-17th (self-study; do not come to the classroom): Tutorial 0: GitHub, LaTeX, Colab, and ChatGPT API
  • Recommended reading: OpenAI's blog; LaTeX and Overleaf; Colab; GitHub
  • Feedback administrator: Benyou Wang

Sep. 6th: Lecture 1: Introduction to Large Language Models (LLMs)
  • Recommended reading: On the Opportunities and Risks of Foundation Models; Sparks of Artificial General Intelligence: Early experiments with GPT-4
  • Pre-lecture question: What is ChatGPT and how to use it?
  • Lecture note: [slide]
  • Feedback administrator: Junying Chen

Sep. 13th: Lecture 2: Language models and beyond
  • Recommended reading: A Neural Probabilistic Language Model; BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Training language models to follow instructions with human feedback
  • Pre-lecture question: What is a language model and why is it important?
  • Lecture note: [slide]
  • Feedback administrator: Ke Ji

Sep. 13th: Tutorial 1: Prompt Engineering
  • Recommended reading: OpenAI's blog; The Guide to LLM Prompt Engineering
  • Lecture note: [slide]
  • Coding: [Tutorial Code] [Assignment1]
  • Events and deadlines: Assignment 1 release
  • Feedback administrator: Junying Chen

Sep. 20th: Lecture 3: Architecture engineering and scaling law: Transformer and beyond
  • Recommended reading: Attention Is All You Need; HuggingFace's course on Transformers; Scaling Laws for Neural Language Models; The Transformer Family Version 2.0; On Position Embeddings in BERT
  • Pre-lecture question: Why does Transformer become the backbone of LLMs?
  • Lecture note: [slide]
  • Coding: [nanoGPT]
  • Feedback administrator: Junying Chen

Sep. 27th: Lecture 4: Training LLMs from scratch
  • Recommended reading: Training language models to follow instructions with human feedback; LLaMA: Open and Efficient Foundation Language Models; Llama 2: Open Foundation and Fine-Tuned Chat Models
  • Pre-lecture question: How to train LLMs from scratch?
  • Lecture note: [slide]
  • Coding: [LLMZoo], [LLMFactory]
  • Feedback administrator: Ke Ji

Oct. 11th: Lecture 5: Efficiency in LLMs
  • Recommended reading: Efficient Transformers: A Survey; FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness; GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers; Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity; Towards a Unified View of Parameter-Efficient Transfer Learning
  • Pre-lecture question: How to make LLMs train/inference faster?
  • Lecture note: [slide]
  • Coding: [llama2.c]
  • Feedback administrator: Junying Chen

Oct. 11th: Tutorial 2: Train your own LLMs and assignment 2
  • Pre-lecture question: Are you ready to train your own LLMs?
  • Lecture note: [slide]
  • Coding: [LLMZoo], [nanoGPT], [LLMFactory]
  • Events and deadlines: Assignment 2 release
  • Feedback administrator: Ke Ji

Oct. 18th: Lecture 6: Knowledge, Reasoning, and Prompt engineering
  • Recommended reading: Natural Language Reasoning, A Survey and others; Best practices for prompt engineering with OpenAI API; prompt engineering
  • Pre-lecture question: Can LLMs reason? How to better prompt LLMs?
  • Lecture note: [slide]
  • Events and deadlines: Assignment 1 due (Oct. 18th, 11:59pm)
  • Feedback administrator: Ke Ji

Oct. 25th: Lecture 7: Mid review of final project
  • Recommended reading: N/A
  • Pre-lecture question: N/A
  • Lecture note: [slide]
  • Feedback administrator: Junying Chen

Nov. 1st: Lecture 8: Multimodal LLMs
  • Recommended reading: CLIP, MiniGPT-4, Stable Diffusion and others
  • Pre-lecture question: Can LLMs see?
  • Lecture note: [slide]
  • Feedback administrator: Junying Chen

Nov. 8th: Lecture 9: LLM agent
  • Recommended reading: ToolBench; AgentBench; Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; LLM Powered Autonomous Agents
  • Pre-lecture question: Can LLMs plan?
  • Lecture note: [slide]
  • Events and deadlines: Final Project release
  • Feedback administrator: Ke Ji

Nov. 15th: Tutorial 3: Preparing your own project
  • Pre-lecture question: How to improve your LLM applications?
  • Events and deadlines: Assignment 2 due (Nov. 15th, 11:59pm)
  • Feedback administrators: Junying Chen and Ke Ji

Nov. 22nd: Lecture 10: LLMs in vertical domains
  • Recommended reading: Large Language Models Encode Clinical Knowledge; Capabilities of GPT-4 on Medical Challenge Problems; Performance of ChatGPT on USMLE; Medical-NLP; ChatLaw
  • Pre-lecture question: Can LLMs be mature experts like doctors/lawyers?
  • Lecture note: [slide]
  • Coding: [HuatuoGPT]
  • Feedback administrator: Junying Chen

Nov. 29th: Lecture 11: Alignment, Limitations, and broader Impact
  • Recommended reading: Superalignment; GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models; Theory of Mind Might Have Spontaneously Emerged in Large Language Models; Survey of Hallucination in Natural Language Generation
  • Pre-lecture question: What are LLMs' limitations?
  • Feedback administrator: Ke Ji

TBD: Guest lectures
  • Recommended reading: N/A
  • Feedback administrator: Benyou Wang

Dec. 13th: Lecture 12: In-class presentation
  • Recommended reading: N/A
  • Pre-lecture question: How to solve real-world problems using LLMs
  • Events and deadlines: Final Project Presentation
  • Feedback administrators: Junying Chen and Ke Ji

Grading Policy (CSC 6203)


Assignments (40%)

  • Assignment 1 (20%): Using API for testing prompt engineering
  • Assignment 2 (20%): A toy LLM application
  • Both assignments require a report, plus a code attachment if the assignment involves coding. The evaluation criteria are the same as for the final project.

Final project (55%)

The final project consists of two parts: Project Presentation (15%) and Project Report (40%).

  • Project Presentation (15%): You are required to design your project poster using the specified Poster template. Your poster presentation will be rated by at least 3 experts (TAs and at least one external professor or scientist from industry). The average rating will be the final credit.
    • Content quality (5%): Well-presented posters or slides are highly valued.
    • Oral presentation (5%): Clear and enthusiastic speaking is encouraged.
    • Overall subjective assessment (5%): Although subjective assessment might be biased, it happens everywhere!
  • Project report (40%): The project report will be publicly available after the final poster session. Please let us know if you do not want your report to be published.
    • Technical excitement (15%): It is encouraged to do something that is either interesting or useful!
    • Technical soundness (15%): A) Discuss the motivation for your project and for your algorithm or approach. Even if you are reproducing a published paper, you should have your own motivation. B) Cite existing related work. C) Present the algorithms or systems used in your project, and provide the key information reviewers need to judge whether they are technically correct. D) Provide a reasonable evaluation protocol, detailed enough to contextualize your results. E) Report quantitative results and include qualitative evaluation. Analyze and understand your system by inspecting key outputs and intermediate results. Discuss how it works, when it succeeds and when it fails, and try to interpret why it does or does not work.
    • Clarity in writing (5%): The report is written in a precise and concise manner so the report can be easily understood.
    • Individual contribution (5%): This is based on each student's individual contribution and is assessed, to some extent, subjectively.
  • Bonus and penalty: Note that the project credit is capped at 55%.
    • TA favorites (2%): If one of the TAs nominates the project as his/her favorite, the involved students get 1% bonus credit. Each TA can nominate one project and may choose to withhold his/her nomination. This credit can be obtained only once.
    • Instructor favorites (1%): If the instructor nominates the project as his/her favorite, the involved students get 1% bonus credit. The instructor can nominate at most three projects. A project can receive both TA favorites and Instructor favorites.
    • Project early-bird bonus (2%): If you submit the project report by the early submission due date, you will receive 2% bonus credit.
    • Code reproducibility bonus (1%): You can obtain this bonus if the TAs think they could easily reproduce your results based on the provided material.
    • Ethics concerns (-1%): If the ethics committee (the instructor and all TAs) raises any serious ethics concerns, the project gets a 1% penalty.
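
The project arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the official grading script: the function name `project_credit`, the argument names, and the floor at zero are assumptions. Bonuses and penalties are passed in as totals because the policy does not spell out how individual items combine.

```python
def project_credit(presentation, report, bonuses=0.0, penalties=0.0):
    """Project credit in percentage points of the course grade.

    presentation: 0..15 and report: 0..40, following the weights above.
    bonuses / penalties: totals in percentage points (assumed inputs).
    """
    if not 0 <= presentation <= 15:
        raise ValueError("presentation must be between 0 and 15")
    if not 0 <= report <= 40:
        raise ValueError("report must be between 0 and 40")
    raw = presentation + report + bonuses - penalties
    # The policy caps project credit at 55%; the floor at 0 is an assumption.
    return max(0.0, min(55.0, raw))


if __name__ == "__main__":
    # 14 + 39 + 3 = 56, which the cap reduces to 55.
    print(project_credit(14, 39, bonuses=3))
    # 12 + 30 + 2 - 1 = 43, below the cap.
    print(project_credit(12, 30, bonuses=2, penalties=1))
```

The cap means that bonuses can compensate for points lost in the presentation or report, but cannot push the project above its 55% share.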

Participation (5%)

Here are some ways to earn the participation credit, which is capped at 5%.

  • Attending guest lectures: In the second half of the course, we have four invited speakers. We encourage students to attend the guest lectures and participate in Q&A. All students get 0.75% per guest lecture (in total 3%) for either attending in person, or by writing a guest lecture report if they attend remotely or watch the recording.
  • Completing feedback surveys: We will send out two feedback surveys during the semester to
  • User Study: Students are welcome to conduct a user study if they are interested; this is not mandatory (thus it does not affect final marks).
  • Course and Teaching Evaluation (CTE): The school will send requests for CTE to all students. The CTE is worth 1% credit.
  • Volunteer credit (1%): TAs or the instructor can nominate students for a volunteer credit if they help organize the poster session or help answer questions from other students (not writing assignments).
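
As a rough illustration of how the items above add up under the 5% cap, here is a small Python sketch. The function and parameter names are hypothetical, and `other` is a placeholder for any credit whose value is not stated on this page (for example, the feedback surveys).

```python
def participation_credit(guest_lectures, completed_cte=False,
                         volunteer=False, other=0.0):
    """Participation credit in percentage points, capped at 5.

    guest_lectures: number of guest lectures attended or reported on (0..4).
    other: placeholder for credit not quantified on this page (assumption).
    """
    if guest_lectures < 0:
        raise ValueError("guest_lectures must be non-negative")
    # 0.75% per guest lecture, with four invited speakers (3% in total).
    credit = 0.75 * min(guest_lectures, 4)
    if completed_cte:
        credit += 1.0  # Course and Teaching Evaluation
    if volunteer:
        credit += 1.0  # volunteer credit, by nomination
    return min(5.0, credit + other)


if __name__ == "__main__":
    print(participation_credit(4, completed_cte=True, volunteer=True))
    print(participation_credit(2, completed_cte=True))
```

Attending all four guest lectures, completing the CTE, and earning the volunteer credit already reaches the 5% cap.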

Late Policy

The penalty is 0.5% off the final course grade for each late day.
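
Putting the pieces together, the overall grade can be sketched as follows. This is a sketch under assumptions rather than the official formula: `final_grade` and its parameters are hypothetical names, late days are summed across all submissions, and the result is floored at zero.

```python
def final_grade(assignments, project, participation, late_days=0):
    """Overall course grade in percentage points.

    assignments: up to 40, project: up to 55, participation: up to 5.
    late_days: total number of late days (assumed to be summed over
    all submissions).
    """
    if late_days < 0:
        raise ValueError("late_days must be non-negative")
    total = (min(assignments, 40.0)
             + min(project, 55.0)
             + min(participation, 5.0))
    # 0.5% off the final course grade for each late day.
    return max(0.0, total - 0.5 * late_days)


if __name__ == "__main__":
    # 36 + 50 + 4.5 = 90.5, minus 3 late days * 0.5 = 89.0
    print(final_grade(36, 50, 4.5, late_days=3))
```

Because the penalty is taken from the final course grade rather than from the late item, two late days cost a full percentage point regardless of how much the late submission is worth.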

Acknowledgement


We borrowed some concepts and the website template from [CSC3160/MDS6002], taught by Prof. Zhizheng Wu.

The website's GitHub repo is [here].