Course Syllabus - Academic Affairs System

Stochastic Processes

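Course code: 120100M01002H
English title: Stochastic Modeling
Class hours: 60
Credits: 3.00
Course category: Core course of the first-level discipline
Instructors: Wang Shuming (王曙明) et al.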

Course Objectives and Requirements

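This course introduces the stochastic process models commonly used to model dynamic decision systems, together with their underlying principles. It also covers several frontier dynamic optimization techniques built on stochastic models, such as reinforcement learning. After completing the course, students should grasp the core ideas and principles of stochastic process modeling, have the basic ability to model, analyze, and optimize dynamic systems, and be able to apply these modeling tools to practical problems.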

Prerequisites

Advanced Mathematics, Fundamentals of Probability Theory

Course Outline

Lecture 1: Course Introduction and Foundations of Probability (4.0 class hours)
Session 1 (2.0 class hours)
  1. Course Introduction
  2. Probability and Random Variables
  3. Probability Distributions
  4. Correlation and Dependence
Session 2 (2.0 class hours)
  5. Sums of Independent Random Variables and Limit Theorems
  6. Probability Inequalities
  7. Stochastic Processes
Lecture 2: Conditional Probability and Conditional Expectation (4.0 class hours)
Session 1 (2.0 class hours)
  1. Definitions
  2. Computing Densities and Expectations by Conditioning
Session 2 (2.0 class hours)
  3. Computing Variances by Conditioning
  4. The Matching Rounds Problem
Lecture 3: Discrete-Time Markov Chains (6.0 class hours)
Session 1 (2.0 class hours)
  1. Introduction
  2. Chapman-Kolmogorov Equations
  3. Classification of States
Session 2 (2.0 class hours)
  4. Long-Run Proportions
  5. Limiting Probabilities
Session 3 (2.0 class hours)
  6. Absorbing States
  7. Applications
Lecture 4: The Exponential Distribution and the Poisson Process (6.0 class hours)
Session 1 (2.0 class hours)
  1. The Exponential Distribution
Session 2 (2.0 class hours)
  2. Counting Processes
  3. The Poisson Process
  4. Interarrival and Waiting Time Distributions
Session 3 (2.0 class hours)
  5. Conditional Distribution of the Arrival Times
  6. Generalizations of the Poisson Process
Lecture 5: Continuous-Time Markov Chains (4.0 class hours)
Session 1 (2.0 class hours)
  1. Continuous-Time Markov Chains
  2. Birth and Death Processes
Session 2 (2.0 class hours)
  3. The Transition Probability Function
  4. Limiting Probabilities
Lecture 6: Renewal Theory and Its Applications (4.0 class hours)
Session 1 (2.0 class hours)
  1. Introduction
  2. Distribution of N(t)
Session 2 (2.0 class hours)
  3. Limit Theorems and Their Applications
  4. Renewal Reward Processes
Lecture 7: Markov Decision Processes (4.0 class hours)
Session 1 (2.0 class hours)
  1. Course Summary
  2. Problems
  3. Concepts
Session 2 (2.0 class hours)
  4. MDPs
  5. The Agent-Environment Interface
Lecture 8: Finite-Horizon MDPs (8.0 class hours)
Session 1 (2.0 class hours)
  1. Model Formulation: Four Examples
  2. Finite-Horizon MDPs, Part 1
Session 2 (2.0 class hours)
  3. Finite-Horizon MDPs, Part 2
  4. The Bellman Equation
Session 3 (2.0 class hours)
  5. Solving the Four Examples by Dynamic Programming
Session 4 (2.0 class hours)
  6. Monotone Policies
Lecture 9: Infinite-Horizon MDPs (14.0 class hours)
Session 1 (2.0 class hours)
  1. Three Types of Policy Value
  2. Optimality Criteria
Session 2 (2.0 class hours)
  3. Policy Evaluation: Examples
Session 3 (2.0 class hours)
  4. Bellman Equations and Their Solution
Session 4 (2.0 class hours)
  5. Value Iteration
Session 5 (2.0 class hours)
  6. Policy Iteration
Session 6 (2.0 class hours)
  7. Modified Policy Iteration
  8. Convergence
Session 7 (2.0 class hours)
  9. Algorithm Comparison
  10. Extensions
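Lecture 10: Approximate Dynamic Programming (ADP) and Reinforcement Learning (RL) (6.0 class hours)
Session 1 (2.0 class hours)
  1. The Q-Function
  2. Monte Carlo Methods
Session 2 (2.0 class hours)
  3. Temporal-Difference (TD) Learning
Session 3 (2.0 class hours)
  4. Frontiers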

Reference Book

1. Sheldon M. Ross, Introduction to Probability Models, 12th ed., Academic Press, 2019.

Instructor Information

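Dr. Wang Shuming (王曙明) received a Ph.D. in Engineering from Waseda University, Japan. Dr. Wang was previously a Special Research Fellow of the Japan Society for the Promotion of Science (JSPS) and a CREATE Research Fellow of the National Research Foundation (NRF), Singapore. Current research interests are optimization under uncertainty, statistical learning theory, and their applications.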

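Dr. He Zhou (贺舟) received a Ph.D. in Management from the School of Economics and Management, University of Chinese Academy of Sciences (jointly trained with the Faculty of Business, The Hong Kong Polytechnic University), and was previously a Senior Research Fellow at the National University of Singapore. Current research focuses on the computation, simulation, and optimization of complex management systems.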