Introduction to the MPLab (PDF)MPLAB 5 Year Progress Report (PDF) |

**NEWS**

- Receding Horizon DDP: Tom Erez gave a version of this talk to Emo’s lab just before Neuroscience. Yuval Tassa said he may be going to work for Emo. It seems like really good work. The situation is continuous state space. The idea is to use Emo-like methods to start with a trajectory (open loop policy) and iteratively estimate the value function around it, and improve it to a local optimum. Fill the space with a small number of open loop trajectories, and then for closed loop control, always figure out which learned trajectory you are near and follow that. They worked with 15 and 30 dimensional continuous spaces, which apparently is quite high. Receding Horizon is appropriate where you have an implicit value gradient, or at least the local value function is in line with the global value function. In practice I would guess that this is quite often the case. This is something I would want to replicate. The application was a swimming world, which we should look into as a test of algorithms, etc.
- Random Sampling of States in Dynamic Programming: Chris Atkeson is apparently a famous control guy at CMU. Tom Erez was very respectful to him. Atkeson’s poster was about discretizing state space using randomly sampled points and some heuristics about whether to keep the points. The points define a voronoi diagram, each with a local linear dynamics, quadratic value function estimator. This is something I woulld like to replicate. The application was up-to-4-link inverted pendulum and swing up problems. Atkeson claims to be the only one he knows of with such success on 4-link inverted pendulums. This is another system we should test algorithms on.
- Bayesian Policy Learning with Trans-Dimensional MCMC: “Trans-dimensional” means probability distributions defined on variable-length random variables through statistics of those random variables. These guys were using probability functions defined on trajectories to do trajectory sampling. It seemed like a fancy way to do policy improvement, without much benefit over other methods for doing the same thing.
- Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs: “Regret” means “difference between optimal average reward and obtained average reward.” It is a common way to measure the loss incurred by exploration of the MDP, and is a currently important topic to RL people. Peter Bartlett showed that a good policy that yields low regret is to keep confidence intervals on the value function and to act as if you will receive the upper bound of your confidence interval. Based on your confidence interval and the amount of time spent acting, there are provable regret bounds.
- The Epoch-Greedy Algorithm for Multi-armed Bandids with Side Information: The “contextual bandit problem” is a weird MDP where actions affect reward but not state transitions (you float around the world among states X, depending on x you have different reward distributions). Based on the complexity of the learning problem (how well and simply X predicts reward), things can be proven about a strategy “act randomly (uniform) / act greedily” based on how many experiences in this state you have.
- Learning To Race by Model-Based Reinforcement Learning with Adaptive Abstraction: The Microsoft Applied Games group applied reinforcement learning to a car-driving game. To get it to work, they came up with fancy heuristic ways for discretizing state space on the fly, and combined it with fancy online dynamic programming methods (prioritized sweeping). Eric Wiewiora was very impressed. He said it was one of the first compelling applications of real reinforcement learning that he’d seen. I talked to these guys a bit about applying Emo-style continuous-state-space optimal control to their problem, and they were interested.

Check out these papers if you get a chance:

J. Zico Kolter, Pieter Abbeel, Andrew Ng, “Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion.” Advances in Neural Information Processing Systems. 2007. [ps.gz] [pdf] [bibtex] [slide] [audio]

Umar Syed, Robert Schapire, “A Game-Theoretic Approach to Apprenticeship Learning.” Advances in Neural Information Processing Systems. 2007. [ps.gz] [pdf] [bibtex] [supplemental] [slide]

Luis Von Ahn gave a **fresh **talk about using computer games whose product is that humans label images and sounds for free. Lots of things to learn about what Luis is doing.

Here is the Paper

Leon Bottou is pushing slightly sophisticated gradient descent methods over batch methods for the case in which you want to make use of tons of data. He shows that they can be much quicker without loss of accuracy, esp. in the cases people like these days (CRFs, convex optimizations, etc.). It may be worth looking into these current gradient methods.

Roby Jacobs is working on some really cool work on motor control.

karklin-lewicki-nc05-preprint.pdf It may be interesting to generalize this idea to auditory “textures”.

link to Mike’s Research

« go back — keep looking »follow the MPLab on Twitter