View publication

We present a generic and flexible Reinforcement Learning (RL) based meta-learning framework for the problem of few-shot learning. During training, it learns the best optimization algorithm to produce a learner (ranker/classifier, etc) by exploiting stable patterns in loss surfaces. Our method implicitly estimates the gradients of a scaled loss function while retaining the general properties intact for parameter updates. Besides providing improved performance on few-shot tasks, our framework could be easily extended to do network architecture search. We further propose a novel dual encoder, affinity-score based decoder topology that achieves additional improvements to performance. Experiments on an internal dataset, MQ2007, and AwA2 show our approach outperforms existing alternative approaches by 21%, 8%, and 4% respectively on accuracy and NDCG metrics. On Mini-ImageNet dataset our approach achieves comparable results with Prototypical Networks. Empirical evaluations demonstrate that our approach provides a unified and effective framework.

This paper was accepted by 7th ICML Workshop on Automated Machine Learning (AutoML).

Related readings and updates.

SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking

In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, and, at times, infeasible. While zero-shot learning…
See paper details

Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations

This work investigates pre-trained audio representations for few shot Sound Event Detection. We specifically address the task of few shot detection of novel acoustic sequences, or sound events, with semantically meaningful temporal structure without assuming access to non-target audio. We develop procedures for pre-training suitable representations and methods that transfer them to our few shot learning scenario. Our experiments evaluate the…
See paper details