# NIPS 2007 Report: Filter Boost

# NIPS 2007: More Monday Night Posters

Multiple Instance Active Learning

-key idea: we have a series of bags that are either positive or negative. Each bag contains a series of examples. We know that each negative bag contains no positives and each positive bag contains at least one positive. We assume the bag level labels are easy to obtain. This work gives several strategies for selecting individual examples in the positive bags to query for labels. These strategies are more or less heuristic, but the results are strong. This is the same setup as the weakly supervised object detection problem.

Learning Monotonic transforms:

- really cool. Simultaneous learn an svm and a monotonic transformation on the features. These monotonic transforms can model saturation effects and other nonlinearities.

Variational inference for network failures:

- interesing application of variational inference. Very similar to the idea of predicting a set of diseases from a set of observed symptoms. The system is an expert system in that we use a noisy or model for the expression of a symptom given a disease where the noisy or parameters are given. Some additional tricks are used such as putting beta priors on the individual diseases.

Learning Thresholds for Cascades using Backward Pruning (Work with Paul Viola)

A cool idea for picking thresholds. Train a classifier on all examples. At the end select all positives that are above a certain threshold and now train a series of cascades. The threshold selected at each level of the cascade sould guarantee that none of the positives that would survive to the end are removed.

Audio Tags for music: similar to work at UCSD except uses discriminative instead of generative framework. Also, they test on a much harder dataset and use the tags to reproduce a notion of artist similarity induced by collaborative filtering. The people who did this work are aware of the work at UCSD.

Language recognition from phones:

They put a phone recognizer as a front end to an n-gram model for predecting which language the speech is from (multiclass: e.g. english, spanish, german, etc.). A pruning algorithm is used to prevent combinatorial explosion in the number of features. Just thinking out loud, but is this a possible justification for the loss of discrimination of certain phones that are not part of your native language?

# NIPS 2007: Monday Night Posters

- Receding Horizon DDP: Tom Erez gave a version of this talk to Emo’s lab just before Neuroscience. Yuval Tassa said he may be going to work for Emo. It seems like really good work. The situation is continuous state space. The idea is to use Emo-like methods to start with a trajectory (open loop policy) and iteratively estimate the value function around it, and improve it to a local optimum. Fill the space with a small number of open loop trajectories, and then for closed loop control, always figure out which learned trajectory you are near and follow that. They worked with 15 and 30 dimensional continuous spaces, which apparently is quite high. Receding Horizon is appropriate where you have an implicit value gradient, or at least the local value function is in line with the global value function. In practice I would guess that this is quite often the case. This is something I would want to replicate. The application was a swimming world, which we should look into as a test of algorithms, etc.
- Random Sampling of States in Dynamic Programming: Chris Atkeson is apparently a famous control guy at CMU. Tom Erez was very respectful to him. Atkeson’s poster was about discretizing state space using randomly sampled points and some heuristics about whether to keep the points. The points define a voronoi diagram, each with a local linear dynamics, quadratic value function estimator. This is something I woulld like to replicate. The application was up-to-4-link inverted pendulum and swing up problems. Atkeson claims to be the only one he knows of with such success on 4-link inverted pendulums. This is another system we should test algorithms on.
- Bayesian Policy Learning with Trans-Dimensional MCMC: “Trans-dimensional” means probability distributions defined on variable-length random variables through statistics of those random variables. These guys were using probability functions defined on trajectories to do trajectory sampling. It seemed like a fancy way to do policy improvement, without much benefit over other methods for doing the same thing.
- Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs: “Regret” means “difference between optimal average reward and obtained average reward.” It is a common way to measure the loss incurred by exploration of the MDP, and is a currently important topic to RL people. Peter Bartlett showed that a good policy that yields low regret is to keep confidence intervals on the value function and to act as if you will receive the upper bound of your confidence interval. Based on your confidence interval and the amount of time spent acting, there are provable regret bounds.
- The Epoch-Greedy Algorithm for Multi-armed Bandids with Side Information: The “contextual bandit problem” is a weird MDP where actions affect reward but not state transitions (you float around the world among states X, depending on x you have different reward distributions). Based on the complexity of the learning problem (how well and simply X predicts reward), things can be proven about a strategy “act randomly (uniform) / act greedily” based on how many experiences in this state you have.
- Learning To Race by Model-Based Reinforcement Learning with Adaptive Abstraction: The Microsoft Applied Games group applied reinforcement learning to a car-driving game. To get it to work, they came up with fancy heuristic ways for discretizing state space on the fly, and combined it with fancy online dynamic programming methods (prioritized sweeping). Eric Wiewiora was very impressed. He said it was one of the first compelling applications of real reinforcement learning that he’d seen. I talked to these guys a bit about applying Emo-style continuous-state-space optimal control to their problem, and they were interested.

# NIPS 2007: Apprenticeship learning

Check out these papers if you get a chance:

J. Zico Kolter, Pieter Abbeel, Andrew Ng, “Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion.” Advances in Neural Information Processing Systems. 2007. [ps.gz] [pdf] [bibtex] [slide] [audio]

Umar Syed, Robert Schapire, “A Game-Theoretic Approach to Apprenticeship Learning.” Advances in Neural Information Processing Systems. 2007. [ps.gz] [pdf] [bibtex] [supplemental] [slide]

# NIPS 2007: Awesome Talk on Human Computation

Luis Von Ahn gave a **fresh **talk about using computer games whose product is that humans label images and sounds for free. Lots of things to learn about what Luis is doing.

# NIPS 2007 Report: Hinton’s Tutorial on Deep Belief Nets

Here is the Paper

# NIPS 2007: Leon Bottou tutorial

Leon Bottou is pushing slightly sophisticated gradient descent methods over batch methods for the case in which you want to make use of tons of data. He shows that they can be much quicker without loss of accuracy, esp. in the cases people like these days (CRFs, convex optimizations, etc.). It may be worth looking into these current gradient methods.

# NIPS 2007 Report: Poggio’s Tutorial on Visual Processing

« go back — keep looking »