
ICDL 2007

Author: nick @ July 25th, 2007

I recently attended ICDL 2007, the 6th Annual IEEE International Conference on Development and Learning, held at Imperial College London. I presented my paper with Javier Movellan on Learning to Learn [pdf: paper; presentation]. After the conference, I traveled to the University of Bath, where I gave the same talk to the Bath AI group at the invitation of Professor Joanna Bryson.

For an extended survey of the material presented at the conference and my impressions, see below.

============

Mark Johnson gave the opening keynote. He says that most of development is divided into the "Maturational" and "Skill Learning" views. He wants to push a third view that he calls "Interacting Dynamic Specialization" or something like that. It seems that "skill learning" has particular formulations with particular hypotheses that don't match my idea of what it should mean. It looks like yet another area in which warring psychologists, wanting to tear down a theory or camp, push for overly narrow definitions and predictions that are easy to falsify.

Mark cited work by Joan Stiles showing that children have quite different FFA activation than adults (Passarotti et al., 2003). He presented research showing that children and adults are much more location-primed by eye direction than by arrows or motion, while infants and autistic kids are primed roughly equally by all of the above. Infants and autistic kids also have different ERP activation to gaze location than adults and children. I think the story is that eye movement is originally processed in a way similar to faces generally, but later it gets relocated to some other brain region.

—-

Sinapov & Stoytchev gave a presentation about learning “affordances”, which basically is something like learning a long-term predictive (forward) model. I think this is similar to Sutton’s work on “options”. The idea is basically to learn a functional mapping between state and action to change-in-state.

Stoytchev wrote an essay for a NIPS workshop last year that was interesting. Of particular interest was his view that the computational benefit of embodiment is to have a statistically constant anchor in your environment that provides some easy predictability while everything else is changing. He also discusses Sutton's "verification" principle, which asserts that any knowledge we attempt to put into a robot should be independently verifiable by that robot. This has many connections to grounding. Stoytchev is very interested in our work on detecting contingencies.

—-

Takamuku is a friend of Boom's. He built a humanoid hand and used tapping/squeezing between the middle finger and palm to learn tactile representations that distinguish between materials. Supposedly this is ultimately to provide information for visual learning. I think it is the wrong way to go about a good idea. The basic idea is very similar to our face-learning experiments, where multimodal labels provide anchors for visual learning. I think we have a very good way of doing this, but we should aggressively pursue it as a general approach to robotic learning, rather than just a developmental model, before others do.

—-

Herbort spoke about a system that stored a highly redundant inverse kinematics representation so that flexible sets of movements could be built very easily. In general, I think the idea of redundant representations is very important for learning.

—-

Ugur presented something intuitively very similar to our "learning to learn", except that his framework was only indirectly motivated computationally. His idea was to use "curiosity", defined as being close to an SVM margin, to select examples for learning. If you are far from the margin, you are certain and have nothing to learn from the example; if you are close, you are less certain, so you are more "curious" about the examples close to the margin. Right now this is very heuristic and free-parameter driven, rather than theoretically motivated.
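
Here is roughly how I picture margin-based "curiosity", as a hedged sketch rather than Ugur's implementation: keep training an SVM on the labeled examples seen so far and always query the unlabeled example closest to the decision boundary. The data and loop below are made up for illustration.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Toy 2-D pool of unlabeled examples; the true label is the sign of x0 + x1.
    pool = rng.standard_normal((200, 2))
    true_labels = (pool.sum(axis=1) > 0).astype(int)

    # Start with one labeled example from each class.
    labeled_idx = [int(np.argmax(true_labels)), int(np.argmin(true_labels))]

    for _ in range(20):
        clf = SVC(kernel="linear").fit(pool[labeled_idx], true_labels[labeled_idx])
        # |decision_function| is proportional to distance from the boundary:
        # small values = uncertain = "curious".
        scores = np.abs(clf.decision_function(pool))
        scores[labeled_idx] = np.inf          # don't re-query known examples
        query = int(scores.argmin())          # the most "curious" example
        labeled_idx.append(query)             # ask the teacher for its label

    print("accuracy:", (clf.predict(pool) == true_labels).mean())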

—-

Kevin Gold is one of Scassellati’s students. His work this year had to do with using “word trees” to infer binary semantic properties of words. His paper won the best paper award.

Elizabeth Kim is another of Scassellati's students. She is working on prosodic feedback for reinforcement learning, currently using a k-NN classification scheme for prosody. Her prosody classifier is somewhat brittle: you have to tell people what kind of feedback they are giving so that they learn to modulate their voice to fit the classifier, rather than the other way around. Right now they use prosody as a reward signal for learning a 9-state policy, and it takes a huge number of examples to learn a good policy. My general feeling is that prosody is not a good reinforcer in isolation because you need to provide far too many examples, which gets tiresome. A more promising framework is something like the work of Cynthia Breazeal's former student Andrea Thomaz on user-mediated reinforcement, in which users provide augmenting reinforcement in addition to a normal reinforcement signal to speed the rate of learning.
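
For concreteness, here is a rough sketch of the prosody-as-reward pipeline as I understood it; the feature names and data are invented, and this is not Kim's actual classifier: a k-NN classifier maps an utterance's prosodic features to approval/disapproval, which is then handed to the learner as a scalar reward.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)

    # Fake training utterances: [mean pitch (Hz), pitch slope, energy].
    approving = rng.normal([220, 15, 0.8], [20, 5, 0.1], size=(50, 3))
    disapproving = rng.normal([180, -10, 0.5], [20, 5, 0.1], size=(50, 3))
    X = np.vstack([approving, disapproving])
    y = np.array([1] * 50 + [-1] * 50)        # +1 approval, -1 disapproval

    prosody_clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

    def prosody_reward(utterance_features):
        """Turn classified prosody into a scalar reward for the learner."""
        return float(prosody_clf.predict([utterance_features])[0])

    print(prosody_reward([225, 12, 0.75]))    # likely +1 (approval)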

—-

Andrea Thomaz has now graduated and is starting a faculty position at Georgia Tech. At the conference, she presented her work with Leonardo, which was about “social interaction and learning.” The idea was for a user (who apparently had to be well trained and extremely patient to perform the task) to “teach” the robot something. I was still left unconvinced that anything important about social interaction had been learned: a human had to be well trained and very careful and very patient to give some input that could have been given more easily and naturally via a keyboard. Nothing was learned about the real time dynamics or the statistical uncertainty or statistical structure of social interaction. These are the difficult but important problems.

—-

One poster from an Italian group was about "intrinsic reinforcement", proposed by Barto in 2004. The idea is that it is useful to train many policies under more-or-less random reward structures (intrinsic rewards), and then use these learned policies as building blocks for learning action policies for actual rewards (extrinsic rewards), after the fashion of Doya, for example. The work in this ICDL poster used evolutionary algorithms to learn a good set of "intrinsic rewards."

Kenji Doya’s student, Eiji Uchibe, gave a talk. They are also working on these ideas of combining various “intrinsic and extrinsic” rewards. This seems like an interesting and important area of RL that is opening up.
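
As a generic illustration of mixing the two kinds of reward (not the specific schemes from the poster or from Uchibe and Doya), here is a tabular Q-learner on a toy chain whose update uses the extrinsic reward plus a count-based novelty bonus as an intrinsic reward.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, beta = 10, 2, 0.1
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros(n_states)

    s = 0
    for _ in range(5000):
        a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        visits[s2] += 1
        extrinsic = 1.0 if s2 == n_states - 1 else 0.0   # goal at the far end
        intrinsic = 1.0 / np.sqrt(visits[s2])            # count-based novelty bonus
        r = extrinsic + beta * intrinsic
        Q[s, a] += 0.1 * (r + 0.9 * Q[s2].max() - Q[s, a])
        s = 0 if s2 == n_states - 1 else s2              # reset after reaching the goal

    print(Q.argmax(axis=1))   # the learned policy should mostly point "right" (action 1)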

—-

Nathan Sprague's poster was on "Basis Reduction in Reinforcement Learning." His answer to the question "what is a good filter?" is "whatever provides a good state space for reinforcement learning in an MDP." The idea is that you have an external reward, you need to choose how to chop up the state space, and you have an iterative process for successively choosing a state representation that allows you to accrue more reward. The chopping up was done with the "neighborhood components analysis" algorithm, with some kind of cost function provided by the current policy.
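
A toy version of the flavor of this idea, not Sprague's algorithm or his NCA-based cost: judge candidate state representations by how much reward a simple tabular learner accrues when it has to act on the discretized state.

    import numpy as np

    rng = np.random.default_rng(0)

    def evaluate(project, n_steps=2000, n_bins=8, alpha=0.1, eps=0.1):
        """Toy task: reward for pressing 'right' iff the hidden signal is high.
        Only observation dimension 0 carries that signal."""
        Q = np.zeros((n_bins, 2))
        total = 0.0
        for _ in range(n_steps):
            signal = rng.random()
            obs = np.array([signal, rng.random()])        # dimension 1 is pure noise
            s = min(int(project(obs) * n_bins), n_bins - 1)
            a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
            r = 1.0 if a == int(signal > 0.5) else 0.0
            total += r
            Q[s, a] += alpha * (r - Q[s, a])
        return total / n_steps

    # Candidate "bases": project onto the informative dimension vs. the noise one.
    for name, proj in [("dim 0 (signal)", lambda o: o[0]), ("dim 1 (noise)", lambda o: o[1])]:
        print(name, evaluate(proj))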

—-

Davie Yoon is a student in the Stanford psychology department. She was doing interesting but possibly unrelated work on two-tone images and "perceptual reorganization". The idea is that you see these very ambiguous two-tone images and can't tell what they represent. But then it "clicks", or alternatively, once you see the original image, you map the tones onto it and then you see what they are. When you take the original image away, you still have the image "organized" in your mind: you know what all the parts are, and so on. It is essentially impossible to forget after seeing the original image. Children can't do this kind of "perceptual reorganization". Here is a famous example: http://www.psy.msu.ru/illusion/recognition/dalmation.gif

—-

Arthur Franz is a student of Jochen Triesch. He was using a "reinforcement learning" model to learn disparity-tuned cells for binocular disparities. His neural modeling community seems to have a very different view of what "reinforcement learning" means than the NIPS/ICML communities: it is more like a generalized error signal than a specific delta, something you could get from optimizing an L1 error instead of an L2 error. There is no MDP, no time, and no value function. Presumably there is an objective function, but he couldn't say what it was when questioned after his talk.
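
A tiny illustration of the distinction I mean: the gradient of an L2 error carries a specific delta proportional to the error, while the gradient of an L1 error is just a sign, i.e. a generalized better/worse signal.

    import numpy as np

    prediction, target = 3.0, 5.0
    error = prediction - target

    l2_gradient = 2 * error          # d/dp (p - t)^2 : -4.0, a scaled correction
    l1_gradient = np.sign(error)     # d/dp |p - t|   : -1.0, direction only

    print(l2_gradient, l1_gradient)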

—-

Verena Schmittmann is a student in Amsterdam who was modeling "shift learning," which is very similar to a problem we are interested in. The idea is that you have people learn to classify objects based on a simple rule, and then after they've learned it, you switch the rule on them without warning. The question is how long it takes to detect the change in rule and then to learn the new rule. Unfortunately the model was descriptive: it reproduced the effect but did not give deep computational insight into why the different types of observed behavior were good, so I'm not sure how useful it will be for us. At least if we decide to look at these "shift learning" kinds of things, we now have a keyword.
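
A minimal sketch of the shift-learning setup, not Schmittmann's model: a learner tracks its recent accuracy on a simple classification rule and switches its hypothesis when a sliding window of errors gets bad enough.

    import numpy as np

    rng = np.random.default_rng(0)
    window, threshold = 20, 0.6

    def true_label(x, rule):
        return int(x[0] > 0) if rule == "dim0" else int(x[1] > 0)

    rule, hypothesis, recent = "dim0", "dim0", []
    for t in range(400):
        if t == 200:
            rule = "dim1"                     # unannounced rule switch
        x = rng.standard_normal(2)
        recent.append(true_label(x, hypothesis) == true_label(x, rule))
        recent = recent[-window:]
        if len(recent) == window and np.mean(recent) < threshold:
            hypothesis = "dim1" if hypothesis == "dim0" else "dim0"
            recent = []
            print(f"switched hypothesis at trial {t}")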

—-

Dan Grollman is a student of Chad Jenkins at Brown. He is doing work with robots, especially Aibos. His ICDL paper was about using Sparse Gaussian Process Regression to learn a policy from examples. Supposedly this was in the context of “imitation learning,” but realistically it seemed like you would need to trace out almost the whole policy space.

Possibly a more interesting avenue would be some kind of policy gradient approach, where you have a bunch of "prototypes" that define the SGPR and you figure out how to move the prototypes to make a better policy. I wonder if this has been done? It seems like a good idea. A quick Google search shows a 2003 ICML paper and a 2004 NIPS paper (by Rasmussen) using GPs for value functions, but nothing about GP policies. According to Dan, one problem is that GPs cannot easily encode bimodal distributions, which is often important for policies. He is currently looking into mixtures of GPs to get around this problem.
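
To make the regression-from-demonstration part concrete, here is a plain GP regression sketch (an RBF kernel, not Dan's sparse variant, and invented demonstration data): fit state to action from demonstrated pairs and query the posterior mean as the policy at new states.

    import numpy as np

    def rbf(A, B, length=0.3):
        """Squared-exponential kernel between two sets of 1-D states."""
        return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / length**2)

    # Demonstrated (state, action) pairs, e.g. a 1-D "turn toward the target" policy.
    states = np.linspace(-1, 1, 15)
    actions = np.tanh(3 * states) + 0.05 * np.random.default_rng(0).standard_normal(15)

    noise = 1e-2
    K = rbf(states, states) + noise * np.eye(len(states))
    alpha = np.linalg.solve(K, actions)

    def policy(state):
        """Posterior mean of the GP at a new state = the imitated action."""
        return (rbf(np.array([state]), states) @ alpha).item()

    print(policy(0.2), policy(-0.7))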

—-

Ben Kuipers, a professor at UT Austin, gave an interesting talk about learning the structure of a robot’s sensors from simple statistical interaction. Using a simple scanning / mapping / navigation robot, he was able to construct something like a representation of the robot’s environment, including fixed and moving elements, and the robot’s position in that environment. The robot did this with basically no prior knowledge of its sensors or the environment.

An interesting implication that Kuipers did not mention is that as organisms grow, and as robots get knocked around, their sensors change in relation to each other and to the world. In any kind of rectified, calibrated system this is disastrous: none of your learned knowledge would be relevant and you would have to start relearning from scratch. In a statistical, sensory-based approach like the one presented, this kind of plasticity in sensor arrangement could be accommodated easily.
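
A toy version of the flavor of this, not Kuipers' actual method: with no prior layout information, the correlations between raw sensor streams already reveal which sensors are physically adjacent.

    import numpy as np

    rng = np.random.default_rng(0)
    n_sensors, n_steps = 8, 5000

    # Simulate a ring of sensors watching a smoothly drifting stimulus:
    # physically neighboring sensors see similar values.
    drift = np.cumsum(0.05 * rng.standard_normal(n_steps))
    positions = np.linspace(0, 2 * np.pi, n_sensors, endpoint=False)
    readings = (np.cos(positions[None, :] - drift[:, None])
                + 0.1 * rng.standard_normal((n_steps, n_sensors)))

    # Each sensor's most-correlated other sensor should be a physical neighbor.
    corr = np.corrcoef(readings.T)
    np.fill_diagonal(corr, -np.inf)
    print("most correlated partner of each sensor:", corr.argmax(axis=1))
    # expected: adjacent indices, e.g. sensor 3's partner is 2 or 4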

—-

Charlie Kemp, now a research scientist at Georgia Tech, gave an interesting, deliberately provocative talk about how much you can do in robotics without learning or development, by using simple engineering tricks, having a specific task (that at the same time isn't overspecified), and relying on the friendly help of humans. His illustrative example was the Roomba, which vacuums your house pretty well using simple heuristic algorithms; if it misses a spot, you can pick it up, put it down in that spot, and press the big button that says "clean this spot." His own work was with a humanoid robot that grasps long, thin objects handed to it (he found that people mostly hand objects to the robot in a way that makes them easy to grasp).

The point he made that I liked the most was that having a general design goal or purpose (this robot needs to be for X) gives you insight into what the actual difficult and important questions are, much more than just thinking generally about “the important ideas in learning and development”.