Marrying Stochastic Gradient Descent with Bandits: Learning Algorithms for Inventory Systems with Fixed Costs, October 1

Attend the next talk in the IE Decision Systems Engineering Fall ’21 Seminar Series, given by Cong Shi from the University of Michigan and hosted by Assistant Professor Geunyeong Byeon. A Q&A session will follow the talk on learning algorithms.

Marrying Stochastic Gradient Descent with Bandits: Learning Algorithms for Inventory Systems with Fixed Costs
Presented by Cong Shi, Industrial and Operations Engineering, University of Michigan

Friday, October 1, 2021
Noon MST
Attend on Zoom

For a recording of the seminar, contact Geunyeong Byeon by email.

See a list of all School of Computing and Augmented Intelligence invited talks and lectures.


Cong Shi considers a periodic-review single-product inventory system with fixed costs under censored demand. When the demand distribution is fully known, the celebrated $(s,S)$ policy is well known to be optimal. In this paper, Shi assumes that the firm does not know the demand distribution a priori and makes adaptive inventory ordering decisions in each period based only on past sales (censored demand) data. The standard performance measure is regret, the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. Compared with prior literature, the key difficulty of this problem lies in the loss of joint convexity of the objective function caused by the fixed cost. To overcome it, Shi develops a learning algorithm, termed the $(\delta, S)$ policy, that combines the power of stochastic gradient descent, bandit controls and simulation-based methods in a seamless and nontrivial fashion. Shi proves that the cumulative regret over $T$ periods is $O(\log T\sqrt{T})$, which is provably tight up to a logarithmic factor. Shi also develops several technical results of independent interest, and believes that the framework could be widely applied to learning other important stochastic systems with partial convexity in their objectives.
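To make the abstract's terms concrete, here is a minimal simulation sketch of an $(s,S)$ policy that only sees censored sales, together with a sample-path version of regret. It is not Shi's $(\delta, S)$ algorithm. All modeling choices here are our own assumptions, not details from the talk: zero lead time, lost sales, a fixed ordering cost K with no per-unit cost, holding cost h, lost-sales penalty p, integer demand, and a benchmark taken as the best $(s,S)$ pair on a grid in hindsight for one demand path. That benchmark is only a rough stand-in for the clairvoyant expected-cost benchmark. The function names and parameter values are hypothetical.

```python
import random


def simulate_sS(s, S, demands, K=10.0, h=1.0, p=5.0, x0=0):
    """Run an (s, S) policy on a demand path (hypothetical helper).

    Assumes zero lead time, lost sales, fixed ordering cost K (no per-unit
    cost), holding cost h and lost-sales penalty p. Returns
    (total_cost, observed_sales).
    """
    x = x0          # on-hand inventory at the start of the period
    total = 0.0
    sales = []      # what the firm actually observes: censored demand
    for d in demands:
        if x < s:               # below reorder point: order up to S, pay K
            total += K
            x = S
        sold = min(d, x)        # censoring: sales never exceed stock on hand
        sales.append(sold)
        lost = d - sold         # unmet demand is lost (assumed lost-sales model)
        x -= sold
        total += h * x + p * lost   # end-of-period holding + shortage penalty
    return total, sales


def best_sS_by_grid(demands, max_level, **costs):
    """Hindsight benchmark (hypothetical): best (s, S) with 0 <= s <= S <= max_level."""
    best = None
    for S in range(max_level + 1):
        for s in range(S + 1):
            cost, _ = simulate_sS(s, S, demands, **costs)
            if best is None or cost < best[0]:
                best = (cost, s, S)
    return best


rng = random.Random(0)
demands = [rng.randint(0, 8) for _ in range(200)]   # synthetic demand path

bench_cost, s_star, S_star = best_sS_by_grid(demands, max_level=20)
naive_cost, sales = simulate_sS(2, 10, demands)     # an arbitrary fixed (s, S)
regret = naive_cost - bench_cost                    # sample-path regret
print(f"benchmark (s, S) = ({s_star}, {S_star}), regret = {regret:.1f}")
```

Because the grid contains the scored pair (2, 10), this sample-path regret is never negative. A learning algorithm would instead adapt its ordering decisions using only the `sales` record, never the true `demands`, and the abstract's result bounds how fast its cumulative regret can grow.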

About the speaker

Cong Shi is an associate professor of industrial and operations engineering at the University of Michigan. His research focuses on the design of efficient algorithms with theoretical performance guarantees for stochastic optimization models in operations management. His main application areas include inventory control, supply chain management, revenue management and service operations. He received his doctorate in operations research from MIT in 2012. He won first place in the INFORMS George Nicholson Student Paper Competition in 2009 and third place in the INFORMS Junior Faculty Interest Group (JFIG) Paper Competition in 2017.