Friday, March 29, 2024, 1:00pm
Pappajohn Business Building , W401
21 East Market Street, Iowa City, IA 52245

Join us to hear from Mehrdad Moharrami, who will present “A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP."

Abstract: We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average-cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.

Individuals with disabilities are encouraged to attend all University of Iowa–sponsored events. If you are a person with a disability who requires a reasonable accommodation in order to participate in this program, please contact Nicole Bible in advance at 319-467-3185 or nicole-bible@uiowa.edu.