Reinforcement Learning and Optimal Control
Genuine copy in stock! Actual photos! Orders placed before 5 PM ship the same day!
Price: ¥40 (27% of the ¥149 list price)
Condition: Brand new
Only 1 copy available
Author: Dimitri P. Bertsekas (USA)
Publisher: Tsinghua University Press
Publication date: 2020-06
Edition: 1
Binding: Other
Item number: 18
Listed on: 2025-01-10
Item Details
- Condition: Brand new
Standard Book Information
- Author: Dimitri P. Bertsekas (USA)
- Publisher: Tsinghua University Press
- Publication date: 2020-06
- Edition: 1
- ISBN: 9787302540328
- List price: ¥149.00
- Binding: Other
- Format: 16开
- Paper: Offset paper
- Pages: 373
- Word count: 411,000 characters
- [About the Book]
The aim of this book is to address large and challenging multistage decision problems that can in principle be solved by dynamic programming and optimal control, but whose exact solution is computationally intractable. The book discusses solution methods that rely on approximation to produce suboptimal policies with adequate performance. These methods are collectively known as reinforcement learning, and also go by names such as approximate dynamic programming and neuro-dynamic programming. The subject of the book grew out of the interplay between ideas from optimal control and artificial intelligence. One of its goals is to explore the common boundary between these two fields and to build a bridge that is accessible to professionals with a background in either one.
- [About the Author]
Dimitri P. Bertsekas is a tenured professor at MIT, a member of the US National Academy of Engineering, and a visiting professor at the Research Center for Complex and Networked Systems at Tsinghua University. He is an internationally renowned author in electrical engineering and computer science, with more than a dozen best-selling textbooks and monographs to his name, including Nonlinear Programming, Network Optimization, and Convex Optimization.
- [Table of Contents]
1 Exact Dynamic Programming
1.1 Deterministic Dynamic Programming 2
1.1.1 Deterministic Problems 2
1.1.2 The Dynamic Programming Algorithm 7
1.1.3 Approximation in Value Space 12
1.2 Stochastic Dynamic Programming 14
1.3 Examples, Variations, and Simplifications 18
1.3.1 Deterministic Shortest Path Problems 19
1.3.2 Discrete Deterministic Optimization 21
1.3.3 Problems with a Termination State 25
1.3.4 Forecasts 26
1.3.5 Problems with Uncontrollable State Components 29
1.3.6 Partial State Information and Belief States 34
1.3.7 Linear Quadratic Optimal Control 38
1.3.8 Systems with Unknown Parameters - Adaptive Control 40
1.4 Reinforcement Learning and Optimal Control - Some Terminology 43
1.5 Notes and Sources 45
2 Approximation in Value Space
2.1 Approximation Approaches in Reinforcement Learning 50
2.1.1 General Issues of Approximation in Value Space 54
2.1.2 Off-Line and On-Line Methods 56
2.1.3 Model-Based Simplification of the Lookahead Minimization 57
2.1.4 Model-Free Off-Line Q-Factor Approximation 58
2.1.5 Approximation in Policy Space on Top of Approximation in Value Space 61
2.1.6 When is Approximation in Value Space Effective? 62
2.2 Multistep Lookahead 64
2.2.1 Multistep Lookahead and Rolling Horizon 65
2.2.2 Multistep Lookahead and Deterministic Problems 67
2.3 Problem Approximation 69
2.3.1 Enforced Decomposition 69
2.3.2 Probabilistic Approximation - Certainty Equivalent Control 76
2.4 Rollout and the Policy Improvement Principle 83
2.4.1 On-Line Rollout for Deterministic Discrete Optimization 84
2.4.2 Stochastic Rollout and Monte Carlo Tree Search 95
2.4.3 Rollout with an Expert 104
2.5 On-Line Rollout for Deterministic Infinite-Spaces Problems - Optimization Heuristics 106
2.5.1 Model Predictive Control 108
2.5.2 Target Tubes and the Constrained Controllability Condition 115
2.5.3 Variants of Model Predictive Control 118
2.6 Notes and Sources 120
3 Parametric Approximation
3.1 Approximation Architectures 126
3.1.1 Linear and Nonlinear Feature-Based Architectures 126
3.1.2 Training of Linear and Nonlinear Architectures 134
3.1.3 Incremental Gradient and Newton Methods 135
3.2 Neural Networks 149
3.2.1 Training of Neural Networks 153
3.2.2 Multilayer and Deep Neural Networks 157
3.3 Sequential Dynamic Programming Approximation 161
3.4 Q-Factor Parametric Approximation 162
3.5 Parametric Approximation in Policy Space by Classification 165
3.6 Notes and Sources 171
4 Infinite Horizon Dynamic Programming
4.1 An Overview of Infinite Horizon Problems 174
4.2 Stochastic Shortest Path Problems 177
4.3 Discounted Problems 187
4.4 Semi-Markov Discounted Problems 192
4.5 Asynchronous Distributed Value Iteration 197
4.6 Policy Iteration 200
4.6.1 Exact Policy Iteration 200
4.6.2 Optimistic and Multistep Lookahead Policy Iteration 205
4.6.3 Policy Iteration for Q-factors 208
4.7 Notes and Sources 209
4.8 Appendix: Mathematical Analysis 211
4.8.1 Proofs for Stochastic Shortest Path Problems 212
4.8.2 Proofs for Discounted Problems 217
4.8.3 Convergence of Exact and Optimistic Policy Iteration 218
5 Infinite Horizon Reinforcement Learning
5.1 Approximation in Value Space - Performance Bounds 222
5.1.1 Limited Lookahead 224
5.1.2 Rollout and Approximate Policy Improvement 227
5.1.3 Approximate Policy Iteration 232
5.2 Fitted Value Iteration 235
5.3 Simulation-Based Policy Iteration with Parametric Approximation 239
5.3.1 Self-Learning and Actor-Critic Methods 239
5.3.2 Model-Based Variant of a Critic-Only Method 241
5.3.3 Model-Free Variant of a Critic-Only Method 243
5.3.4 Implementation Issues of Parametric Policy Iteration 246
5.3.5 Convergence Issues of Parametric Policy Iteration - Oscillations 249
5.4 Q-Learning 253
5.4.1 Optimistic Policy Iteration with Parametric Q-Factor Approximation - SARSA and DQN 255
5.5 Additional Methods - Temporal Differences 256
5.6 Exact and Approximate Linear Programming 267
5.7 Approximation in Policy Space 270
5.7.1 Training by Cost Optimization - Policy Gradient, Cross-Entropy, and Random Search Methods 276
5.7.2 Expert-Based Supervised Learning 286
5.7.3 Approximate Policy Iteration, Rollout, and Approximation in Policy Space 288
5.8 Notes and Sources 293
5.9 Appendix: Mathematical Analysis 298
5.9.1 Performance Bounds for Multistep Lookahead 299
5.9.2 Performance Bounds for Rollout 301
5.9.3 Performance Bounds for Approximate Policy Iteration 304
6 Aggregation
6.1 Aggregation with Representative States 308
6.1.1 Continuous State and Control Space Discretization 314
6.1.2 Continuous State Space - POMDP Discretization 315
6.2 Aggregation with Representative Features 317
6.2.1 Hard Aggregation and Error Bounds 320
6.2.2 Aggregation Using Features 322
6.3 Methods for Solving the Aggregate Problem 328
6.3.1 Simulation-Based Policy Iteration 328
6.3.2 Simulation-Based Value Iteration 331
6.4 Feature-Based Aggregation with a Neural Network 332
6.5 Biased Aggregation 334
6.6 Notes and Sources 337
6.7 Appendix: Mathematical Analysis 340
References 345
Index 369
- Shipping Notes
Express delivery only for now!
Haven't opened it yet, but the three-layer packaging is solid. Not bad.
The condition is excellent, better than I expected for the price.