Chain of Thought

paper: https://arxiv.org/abs/2201.11903

  • Zero-shot: feed in the question and wait for the model's answer.
  • CoT: feed in the question plus the hint "Let's think step by step".
  • Manual-CoT: a few-shot method — construct several template Q&A pairs (the template answers also contain "Let's think step by step"), then append the actual question with the hint "Let's think step by step".
  • Auto-CoT: sample a number of questions, prompt each with "Let's think step by step", and let the model answer them; then concatenate all generated Q&A pairs, append the final question, and again hint "Let's think step by step".
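
The four settings differ only in how the prompt string is assembled. A minimal sketch in plain Python (the `generate` callback standing in for some LLM API is my assumption, not part of the paper):

HINT = "Let's think step by step."

def zero_shot(question):
    # plain question, wait for the answer
    return question

def cot(question):
    # zero-shot CoT: append the hint
    return question + "\n" + HINT

def manual_cot(question, demos):
    # demos: hand-written (question, reasoned answer) pairs; the answers
    # already contain step-by-step reasoning
    shots = "\n\n".join("Q: " + q + "\nA: " + HINT + " " + a for q, a in demos)
    return shots + "\n\nQ: " + question + "\nA: " + HINT

def auto_cot(question, sampled_questions, generate):
    # each sampled question is answered by the model with the CoT hint,
    # then all generated Q&A pairs are concatenated before the final question
    shots = []
    for q in sampled_questions:
        shots.append("Q: " + q + "\nA: " + HINT + " " + generate(q + "\n" + HINT))
    return "\n\n".join(shots) + "\n\nQ: " + question + "\nA: " + HINT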

Why do we need CoT?

Questions fall into two broad classes. One class is easy to answer and involves little logical reasoning, e.g. "What's the weather like?" or "How much does the bread cost?". The other class requires long chains of logical reasoning, e.g. math problems.

As language models scale up exponentially, their ability to solve ordinary questions improves a great deal, yet their ability to solve logical-reasoning questions improves only slightly. CoT targets exactly this gap. Its core idea: don't just give the answer — spell out the reasoning process too. As the figure below shows, the key is that the constructed prompt contains the reasoning process:

Why does lengthening the reasoning process help? Probably because of the token-by-token nature of language models.

Standard prompting can be seen as a lower bound on a large model's capability. How to extract the knowledge a large model has learned is a hard problem; standard prompting is a good starting point, but it is by no means the end point.

Few-Shot Learning with BERT's MLM Model

Reprinted from 《必须要GPT3吗?不,BERT的MLM模型也能小样本学习》 and 《P-tuning:自动构建模版,释放语言模型潜能》 by Su Jianlin, with some modifications.

GPT3 is all the rage right now, but amid all the talk about it, does the reader remember the actual title of the GPT3 paper? It is 《Language Models are Few-Shot Learners》 — the letters G, P, T no longer appear in the title; the model is simply in the same lineage as the original GPT, so people keep calling it GPT. As the title says, GPT3's selling point is few-shot learning. Its other hallmark is sheer size: the largest version has 175 billion parameters, more than a thousand times BERT Base.

Precisely for this reason, a recent arXiv paper, 《It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners》, caught my attention — loosely rendered, "who says it has to be big? Small models can do few-shot learning too." The title is clearly aimed at GPT3, so I was curious to see who had the nerve to challenge GPT3, and with what kind of small model. It turns out the authors show that, with suitable constructions, BERT's MLM model can do few-shot learning as well. Reading it gave me a genuine "so it can be done this way" moment, which I share here.

The Rise of MLM

MLM, short for "Masked Language Model", is essentially a cloze (fill-in-the-blank) task: randomly mask some tokens in a text and ask the model to predict the masked tokens, as illustrated below:

A simple illustration of BERT's MLM model

The masked part can be individual randomly chosen tokens, or randomly chosen runs of consecutive tokens that form whole words; the latter is called Whole Word Masking (WWM).

At first, MLM was treated as merely a pre-training task for BERT — something to throw away once training was done — so some open-source models simply did not keep the MLM weights, e.g. the brightmart and clue versions of RoBERTa; and for whatever reason, HIT's open-sourced RoBERTa-wwm-ext-large randomly re-initialized its MLM weights. None of these versions can be used to reproduce the results below.

As research went deeper, however, people found that not only is BERT's Encoder useful — the MLM used for pre-training is useful in its own right. For example, 《BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model》 shows MLM can serve as a general generative model; 《Spelling Error Correction with Soft-Masked BERT》 applies MLM to text correction; my experiments in 《从语言模型到Seq2Seq:Transformer如戏,全靠Mask》 show that MLM pre-trained weights can be used as a UniLM for Seq2Seq tasks; and 《无监督分词和句法分析!原来BERT还可以这样用》 applies the MLM idea to unsupervised word segmentation and parsing. MLM has truly come into its own.

Turning Tasks into Cloze Questions

In this post we look at another elegant application of MLM: few-shot or semi-supervised learning, and in some settings even zero-shot learning.

How do we hook our task up to MLM? Very simply: give the task a textual description and convert it into a cloze question. For example, given the sentence "这趟北京之旅我感觉很不错。" ("I felt this trip to Beijing was quite nice."), we add a description and build the following cloze: ______满意。这趟北京之旅我感觉很不错。 ("___ satisfied. I felt this trip to Beijing was quite nice.")

Going further, we restrict the blank to hold exactly one of "很" ("very") or "不" ("not"), and the problem becomes crisp: judge, from contextual coherence, whether the speaker is satisfied. If "很" gets higher probability than "不", the sentiment is positive; otherwise negative. We have thus converted sentiment classification into a cloze question that an MLM model can answer directly — and since MLM training needs no supervised data, this in principle achieves zero-shot learning.
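
As a concrete illustration, here is a minimal zero-shot sketch using the Hugging Face transformers library and the public bert-base-chinese checkpoint (both are my stand-ins; the post's experiments use other checkpoints):

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

text = "这趟北京之旅我感觉很不错。"
prompt = tokenizer.mask_token + "满意。" + text   # the "__满意。" prefix pattern

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]   # distribution over the vocab

p_pos = logits[tokenizer.convert_tokens_to_ids("很")]   # "very" -> positive
p_neg = logits[tokenizer.convert_tokens_to_ids("不")]   # "not"  -> negative
print("positive" if p_pos > p_neg else "negative")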

Multi-class problems admit a similar conversion. Take news topic classification: for the input "八个月了,终于又能在赛场上看到女排姑娘们了。" ("After eight months, we can finally see the women's volleyball team on the court again."), we can construct: 下面报导一则______新闻。八个月了,终于又能在赛场上看到女排姑娘们了。 ("The following reports a piece of ___ news. ...")

News topic classification thus also becomes a cloze question, and a good MLM model should predict the two characters "体育" ("sports").

Some simple inference tasks can be converted too. A common one: given two sentences, judge whether they are compatible. For example, "我去了北京" ("I went to Beijing") contradicts "我去了上海" ("I went to Shanghai"), while "我去了北京" is compatible with "我在天安门广场" ("I am at Tiananmen Square"). The usual approach is to concatenate the two sentences and feed them to the model as a binary classification task. To convert this into a cloze, a fairly natural construction is:

我去了北京?______,我去了上海。
我去了北京?______,我在天安门广场。

where the candidate words for the blank are "是的" ("yes") and "不是" ("no").

Pattern-Exploiting

By now the reader will have spotted the recipe: add a prefix or suffix description to the input text and mask out certain tokens, turning the task into a cloze question. The original paper calls such a conversion a Pattern. Together with the original sentence, the conversion should read as one natural sentence, not too stilted, because the MLM model was pre-trained on natural language. Obviously one problem admits many different Patterns: in the sentiment example, the description can go at the end — "这趟北京之旅我感觉很不错。__满意。" — or carry a few extra words, e.g. "觉得如何?__满意。这趟北京之旅我感觉很不错。" ("How was it? __ satisfied. ...")

Next we need to build the candidate space of predicted tokens and a mapping from tokens to actual classes — the original paper calls this the Verbalizer. In the sentiment example the candidate space is {很, 不} with the mapping 很→positive, 不→negative. The candidate space need not map one-to-one onto the classes: we could also add "挺", "太", "难" and take 很, 挺, 太→positive and 不, 难→negative, and so on. Many NLP tasks can plausibly be converted this way, but clearly the conversion only suits tasks with a finite candidate space — in plain terms, multiple-choice questions, the most common case being text classification.
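
A many-to-one Verbalizer is just a label→token-list mapping plus an aggregation rule over the MLM's logits at the blank. A minimal sketch (reusing `logits` and `tokenizer` from the snippet above; the aggregation rule is my illustrative choice):

VERBALIZER = {"positive": ["很", "挺", "太"], "negative": ["不", "难"]}

def verbalize(logits, tokenizer):
    scores = {}
    for label, words in VERBALIZER.items():
        ids = tokenizer.convert_tokens_to_ids(words)
        scores[label] = max(logits[i].item() for i in ids)  # max/sum/mean all plausible
    return max(scores, key=scores.get)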

As noted, one task can have several different Patterns. The original paper handles this as follows:

1. For each Pattern, fine-tune a separate MLM model on the training set;
2. Ensemble the models of the different Patterns into a fused model;
3. Use the fused model to predict pseudo-labels for the unlabeled data;
4. Fine-tune an ordinary (non-MLM) model on the pseudo-labeled data.

The exact ensembling scheme is in the paper and is not the point here; a rough sketch of the ensemble-and-distill steps is given below. This training regime is called Pattern-Exploiting Training (PET); it first appeared in 《Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference》, and the paper under discussion further validates and refines PET's value and results, adding multi-task learning so that its few-shot performance on the SuperGLUE leaderboard surpasses GPT3. The two papers share the same authors and form one line of work.
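
A toy numpy sketch of steps 2–3 above (all numbers random, purely illustrative): each Pattern's fine-tuned MLM produces a class distribution on the unlabeled data, and a weighted average yields soft pseudo-labels for training the final conventional model:

import numpy as np

num_unlabeled, num_classes = 100, 2
# class distributions on unlabeled data from 3 Pattern-specific models
preds = [np.random.dirichlet(np.ones(num_classes), size=num_unlabeled)
         for _ in range(3)]
weights = np.array([1.0, 1.0, 1.0])   # e.g. proportional to each model's dev score

soft = sum(w * p for w, p in zip(weights, preds)) / weights.sum()
pseudo_labels = soft.argmax(axis=1)   # used to fine-tune a normal classifier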

PET's few-shot results on SuperGLUE

One gripe, though: the 223M-parameter PET entry in the figure uses ALBERT-xxlarge-v2, and calling ALBERT a "small model" is rather cheeky, because its forward pass is not one bit faster. ALBERT-xxlarge has 12 layers with parameters shared across layers, so in terms of forward computation it should be equivalent to a GPT of roughly 2700M (12×) parameters.

PET in Chinese Practice: Verifying the Results

To truly assess a method's value, reading the paper's result tables is not enough: nobody can vouch that the reported numbers reproduce, and even if they reproduce in English, that says nothing about their value on Chinese. The most practical route is to run the experiments yourself. My experiment code is here for reference: GitHub: https://github.com/bojone/Pattern-Exploiting-Training

We probe PET's feasibility from the following angles:

1. How well does an off-the-shelf MLM model do directly? (zero-shot learning 1)
2. How well does an off-the-shelf MLM model do after fine-tuning on "plenty of unlabeled data"? (zero-shot learning 2)
3. How well does it do after fine-tuning on "a small amount of labeled data"? (few-shot learning)
4. How well does it do after fine-tuning on "a small amount of labeled data + plenty of unlabeled data"? (semi-supervised learning)

Below we mainly report results for binary sentiment classification. There is also a multi-class news-topic task whose code is likewise on GitHub; its results are similar and are not repeated here.

Zero-Shot Learning 1

Here we test prediction accuracy when the input text is simply augmented with a Pattern and fed to an off-the-shelf MLM model. Since no step of building the model involves supervised training on labels, this counts as "zero-shot learning". We compare different Patterns and different MLM models:

The experiments use the following Patterns; the blank's candidates are always "很" and "不":

P1:____满意。这趟北京之旅我感觉很不错。
P2:这趟北京之旅我感觉很不错。____满意。
P3:____好。这趟北京之旅我感觉很不错。
P4:____理想。这趟北京之旅我感觉很不错。
P5:感觉如何?____满意。这趟北京之旅我感觉很不错。

The MLM models are:

M1: Google's open-sourced Chinese BERT Base (link);

M2: HIT's open-sourced RoBERTa-wwm-ext Base (link);

M3: Tencent UER's open-sourced BERT Base (link);

M4: Tencent UER's open-sourced BERT Large (link).

The results are tabulated below (validation set / test set): zero-shot performance of different models with different Patterns

The best result reaches a remarkable 88%! In other words, loading an off-the-shelf MLM with a suitable Pattern — without a single labeled example — already classifies the sentiment of most samples correctly. That forces us to take the latent power of MLM models seriously.

We can also observe real differences across Patterns and across pre-trained models; overall, the Large model clearly beats the Base ones, suggesting that, as with GPT → GPT2 → GPT3, bigger is still better. It may also suggest that MLM models are under-trained — perhaps BERT's mask-a-fraction training scheme is too inefficient — in which case the improved MLM in 《修改Transformer结构,设计一个更快更好的MLM模型》 might do better.

Zero-Shot Learning 2

Seeing the above, the reader may wonder: if I continue pre-training the MLM on in-domain data, does it improve? Answer: yes! Our results follow; with limited compute we only compared on RoBERTa-wwm-ext (M2 above; the continued-pre-training model is denoted M2+unsupervised):

Note that here we only continue MLM training on in-domain data — an unsupervised process needing no labels — so this still counts as "zero-shot learning". Also, the results so far suggest that adding the description as a "prefix" has an edge over a "suffix".

Few-Shot Learning

Having covered the gains from unlabeled continued pre-training, what about the actual PET target setting — training the MLM directly with a small amount of labeled data and a specific Pattern? This is the true "few-shot" training. We keep about 200 labeled samples; when building training samples we first add the Pattern to each sentence, and besides the Pattern's own mask position we also randomly mask some other tokens, as extra regularization. The final results:

Conclusion: except for the "suffix" Pattern P2, the results are all close, which again shows that "prefix" Patterns are more competitive than "suffix" ones. For comparison, fine-tuning a BERT model conventionally on the same data scores about 88.93, so the "MLM+Pattern" few-shot approach may bring a slight performance gain.

Semi-Supervised Learning

With unsupervised zero-shot and supervised few-shot covered, the natural next step is "semi-supervised learning", combining labeled and unlabeled data. Same task; the labeled-to-unlabeled ratio is about 1:99, labeled data carries the Pattern, unlabeled data does not, and both have some tokens masked for MLM training. The measured results:

Same story: the "suffix" is clearly worse than the "prefixes", and the "prefixes" are all similar. On raw numbers, the extra unlabeled data is confirmed to help. Intuitively, "prefix" may beat "suffix" because the prefix's mask position is fixed, letting the weak supervision signal accumulate — but that cannot explain why "prefix" also wins in the zero-shot setting. It is probably also about learning difficulty: patterns near the beginning of a sentence may be more regular and thus easier to learn, so that part gets learned more thoroughly? All of this is still guesswork.

Summary and Conclusions

The results above are summarized below: overall comparison of results

The reader can also compare our earlier semi-supervised results using virtual adversarial training (VAT) in 《泛化性乱弹:从随机噪声、梯度惩罚到虚拟对抗训练》: whether zero-shot, few-shot, or semi-supervised, the MLM-based approach matches the VAT-based semi-supervised results. Our short-news multi-class experiments behaved similarly. This shows that an MLM model can indeed serve as an excellent zero-shot/few-shot/semi-supervised learner.

Of course, the MLM approach has drawbacks: the independence assumption MLM makes limits its ability to predict longer spans (bluntly, the text in the blank cannot be too long), and its inability to predict variable-length answers restricts its applicability (for now it can only do multiple choice, not generation). We look forward to stronger MLM models that could go head to head with GPT3 on every task.

What Is a Template

The Pattern-Exploiting Training (PET) method introduced above rests on one idea: with a template made of natural language (commonly called a Pattern or Prompt in English), recast the downstream task as a cloze task so that BERT's MLM model can do the prediction. The figures below show sentiment classification and topic classification implemented via conditional prefixes:

Converting sentiment classification into an MLM task via a specific template

Converting news classification into an MLM task via a specific template

This scheme is not exclusive to MLM models; it is just as straightforward with a unidirectional language model (LM) like GPT:

Converting sentiment classification into an LM task via a specific template

Converting news classification into an LM task via a specific template

Since an LM decodes left to right, though, the predicted part can only sit at the end of the sentence (one can still prepend an explanatory prefix; the prediction just has to come last).

In a sense, these templates are "probes" into the language model: through templates we extract specific knowledge from the model to obtain decent zero-shot performance, and with a few labeled samples the results improve further.

For some tasks, however, hand-crafting a template is not easy; we cannot judge a template's quality well, and different templates can differ wildly in performance. In such cases, hand-labeling some samples may be far easier than designing a template. So how to construct templates automatically from the available labeled samples becomes a question worth studying.

P-tuning

A recent arXiv paper, 《GPT Understands, Too》, proposes a method named P-tuning that constructs templates automatically. Moreover, with P-tuning, GPT surpassed a same-size BERT on SuperGLUE for the first time, overturning the long-standing conclusion that "GPT is bad at NLU" — hence the paper's title.

P-tuning revisits what a template is and drops the usual requirement that "a template consists of natural language", turning template construction into a continuous-parameter optimization problem. Simple, but effective.

Rethinking Templates

First, what is a template? Intuitively, a template is a natural-language prefix/suffix; via the template we make the downstream task match the pre-training task, so as to exploit the original pre-trained model more fully and get better zero-shot and few-shot performance.

But wait — do we actually care that the template is made of "natural language"?

We do not. Fundamentally, we do not care what the template looks like; we only need to know which tokens make up the template, where to insert them, whether the insertion lets us solve the downstream task, and what the output candidate space is. Whether the template is natural language has no bearing on any of this; the "natural language" requirement only serves "consistency" with pre-training, and is not essential. P-tuning therefore considers templates of the following form:

P-tuning builds templates directly from [unused*] tokens, with no concern for their natural-language-ness

Here [u1]~[u6] denote [unused1]~[unused6] in BERT's vocabulary — i.e. the template is made of a few never-seen tokens; the number of tokens is a hyperparameter, and whether they go in front or behind can be tuned too. Then, to make the "template" do its job, we use labeled data to solve for the template; a small sketch of how such an input might be assembled follows.
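
A hypothetical sketch of assembling such an input for BERT (the tokenizer and sentence are my stand-ins; bert-base-chinese reserves [unused1]~[unused99] in its vocabulary):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
sentence = "这趟北京之旅我感觉很不错。"

# template = 3 learnable tokens, the mask, then 3 more learnable tokens
template = ["[unused1]", "[unused2]", "[unused3]", tok.mask_token,
            "[unused4]", "[unused5]", "[unused6]"]
tokens = ["[CLS]"] + template + tok.tokenize(sentence) + ["[SEP]"]
input_ids = tok.convert_tokens_to_ids(tokens)
# training then updates only the embeddings of [unused1]~[unused6]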

How to Optimize

Depending on how much labeled data we have, there are two cases to discuss.

Case one: little labeled data. Here we freeze the whole model's weights and only optimize the embeddings of [unused1]~[unused6] — in other words, we just learn 6 new embeddings that act as the template. Since virtually all model weights are frozen, training is fast; and since there are very few parameters to learn, the template can be learned even from very few labeled samples, without much overfitting.

Case two: plenty of labeled data. Sticking with case one would now underfit, because 6 tokens' worth of trainable parameters is far too few. So we can unfreeze all weights and fine-tune; that is what the original paper does on SuperGLUE. The reader may ask: how does that differ from just fine-tuning with an added dense layer? The paper's finding is that this way works better — presumably because it stays closer to the pre-training task.

P-tuning's performance on SuperGLUE

Also, in the examples above, the target tokens such as "很" and "体育" were chosen by hand. Could they be replaced by [unused*] tokens too? Yes, but again two cases: 1. with little labeled data, hand-picking suitable target tokens usually works better; 2. with plenty of labeled data, [unused*] target tokens work better, since the model then has more room to optimize.

Strengthening Relatedness

In the original paper, P-tuning does not randomly initialize the new tokens and train them directly: it computes these embeddings with a small LSTM model, which is itself trainable. What does this detour buy? Roughly, the paper argues that LSTM-produced token representations are more correlated, hence somewhat more like "natural language" (whose tokens are not independent), and that this also helps avoid poor local optima. I further confirmed with the authors on GitHub (see here): in practice, the LSTM detour makes the model converge faster and end up better.

Still, the extra LSTM feels awkward and is slightly annoying to implement. Per the authors, the LSTM helps the template tokens stay (somewhat) closer to natural language — but an LSTM is not the only way to achieve that, nor does using an LSTM guarantee it. A more natural approach, I think, is that when training the downstream task we should not only predict the downstream target tokens (the "很" and "新闻" in the earlier examples) but also predict other tokens at the same time.

Concretely: for an MLM model, also randomly mask and predict some other tokens; for an LM model, predict the full sequence rather than only the target word. The rationale: since our MLM/LM was pre-trained on natural language, we (with a leap of faith) take a sequence that can be reconstructed well to be close to natural language, so adding this training objective also pulls the template closer to natural language. In my tests, adding this auxiliary objective did improve over optimizing the downstream objective alone.

P-tuning Experiments and Results

"Talk is cheap, show me the code" — time for the beloved experiments. Here I share P-tuning results, including my implementation approach and my results on Chinese tasks.

Stopped Gradients

What is a good way to implement the P-tuning algorithm above? Unfreezing all weights is trivial — no different from ordinary BERT fine-tuning. The crux is the few-shot case: how do we "optimize only a few tokens"?

There are several options — e.g. build a fresh Embedding layer for the tokens to be optimized, splice it into BERT's Embedding layer, and train only the new layer's weights. But that changes the original model quite a bit; better to touch as little code as possible, so that users barely notice. To that end, I devised a simple stop_gradient-based tweak of the Embedding layer, roughly as follows:

import numpy as np
from keras import backend as K
from bert4keras.layers import Embedding  # imports assumed from the bert4keras setup used in the repo

class PtuningEmbedding(Embedding):
    """A re-defined Embedding layer that optimizes only some tokens."""
    def call(self, inputs, mode='embedding'):
        embeddings = self.embeddings
        embeddings_sg = K.stop_gradient(embeddings)  # zero gradient, same forward value
        mask = np.zeros((K.int_shape(embeddings)[0], 1))
        mask[1:9] += 1  # only optimize the tokens with ids 1~8
        self.embeddings = embeddings * mask + embeddings_sg * (1 - mask)
        return super(PtuningEmbedding, self).call(inputs, mode)

After a variable passes through the stop_gradient operator, its gradient in the backward pass is 0 while the forward pass is unchanged. So in the code above the forward result is identical; on the backward pass, the mask variable controls which tokens get nonzero gradients, while all other tokens get zero gradient — which is exactly "update only some tokens".

Full code: GitHub: https://github.com/bojone/P-tuning

By the way, the original paper's code is also open source: GitHub: https://github.com/THUDM/P-tuning

Tests and Results

We already saw the authors' SuperGLUE results: with P-tuning, 1. both GPT and BERT improve over direct fine-tuning; 2. GPT can even surpass BERT. This shows GPT has NLU ability, not just NLG — P-tuning truly squeezes GPT's potential out; and since BERT also improves with P-tuning, P-tuning's release of language-model potential is fairly general.

The paper's experiments are rich; read it carefully and you will gain a lot. In particular, note the last column of the paper's Table 2: when the pre-trained model is big enough, our hardware may be unable to fine-tune the whole model, whereas P-tuning can choose to optimize only a few tokens' parameters, cutting the memory and compute needed for optimization dramatically — so P-tuning actually offers a way to harness large pre-trained models under limited compute.

P-tuning results across language models of different sizes

Of course, my long-held view is that "an algorithm never tested on Chinese has no soul", so I also ran quick tests on Chinese tasks — the same few-shot sentiment classification as before — with both BERT and GPT; their candidate templates are shown in the two figures below:

The "BERT+P-tuning" templates I used for Chinese sentiment classification

The "GPT+P-tuning" templates I used for Chinese sentiment classification

Note: for the LM model, introducing a prefix matters a great deal — with only a suffix the results degrade markedly; for the MLM model, a prefix also usually beats a suffix. Overall results:

Here "few-shot" uses only "a small amount of labeled samples"; "unsupervised" uses "plenty of unlabeled samples"; "semi-supervised" uses "a small amount of labeled + plenty of unlabeled samples"; all "P-tuning" rows are few-shot. The PET rows report the best manual template per task — there are worse manual templates. From the few-shot angle, P-tuning indeed achieves the best few-shot results; from the template-construction angle, P-tuning indeed beats manual templates by a clear margin; from the model angle, P-tuning indeed brings GPT's classification performance close to BERT's, revealing that GPT too has strong NLU ability.

Understanding P-tuning Further

This section records my further thoughts on P-tuning, trying to understand it from several angles.

Discrete vs. Continuous

Before P-tuning there was already work on automatic template construction, e.g. 《How Can We Know What Language Models Know?》 and 《AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts》, but they all search for natural-language templates in a discrete space, which limits the results; none was particularly striking.

P-tuning, by contrast, abandons the requirement that "templates consist of natural language", turning the problem into continuous parameters solvable by plain gradient descent — with better results. The move also highlights the essence of a template: what matters is how it is used, not what it is made of. A genuinely clarifying, praiseworthy insight.

(Note: as reader @brotherb pointed out, a paper from early this year, 《Prefix-Tuning: Optimizing Continuous Prompts for Generation》, proposed Prefix-Tuning, which is already quite close to P-tuning: both design non-natural-language templates; Prefix-Tuning just targets NLG applications whereas P-tuning targets NLU.)

Adapter

P-tuning can also be understood through the lens of Adapters. Soon after BERT came out, Google's 《Parameter-Efficient Transfer Learning for NLP》 proposed the Adapter fine-tuning scheme: instead of fine-tuning the whole model, freeze BERT's original weights, add some residual modules on top, and optimize only those; with far fewer parameters in the residual modules, fine-tuning is much cheaper. The Adapter idea actually traces back to CV's 《Learning multiple visual domains with residual adapters》. It has been less visible over the past couple of years — perhaps because although it speeds up training, it slows down inference and often loses some accuracy.

In P-tuning, if we regard the newly inserted tokens not as a "template" but as part of the model, then P-tuning is in effect an Adapter-like scheme: likewise freeze the original weights, insert some new trainable parameters, and optimize only those — except here the new parameters go into the Embedding layer. From this angle, P-tuning and Adapter have much in common.

Why It Works

One more question worth pondering: why is P-tuning better? E.g. with full data, everyone unfreezes all weights, yet P-tuning still beats direct fine-tuning — why?

In truth, whoever asks this has probably grown too "accustomed" to fine-tuning BERT with a dense layer on top. Clearly, both PET and P-tuning stay closer to the pre-training task, while slapping a dense layer on top does not; so in a sense P-tuning's effectiveness is the "obvious" part, and the real question is why dense-layer fine-tuning works at all.

A paper from last year, 《A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks》, tries to answer that; its argument runs roughly:

1. Pre-training is some language-modeling task;
2. The downstream task can be expressed as a special case of that language-modeling task;
3. When the output space is finite, this is in turn approximated by adding a dense layer;
4. Hence dense-layer fine-tuning works.

Note that the paper's key assumption is point 2 — it directly assumes the downstream task can be written in a PET-like form, and proves things from there. This further confirms that PET, P-tuning and the like are the more natural ways to use pre-trained models, with dense-layer fine-tuning merely their corollary; in other words, PET and P-tuning are the back-to-basics, back-to-essence schemes, which is why they work better.


Prompt Learning (Template Learning)

论文:https://arxiv.org/pdf/2107.13586.pdf

"Prompt: a new paradigm for NLP"

"Pre-train, Prompt, and Predict" — a Prompt can be viewed as a tweak that adapts the downstream task to the pre-trained model (little data needed, fast to train, good results), whereas original fine-tuning adapts the pre-trained model to the downstream task.

Excerpted from: 未闻 Prompt 名

In my view, the two hottest ideas in 2021 NLP are Contrastive Learning and Prompt.

My Take on Prompt

Prompt is simple if you want it to be: after a few papers and blog posts, it boils down to building a language template. Yet on reflection it feels intricate, with many details, so this post walks through Prompt from the start. (Prompt is often rendered in Chinese as「范式」, "paradigm", but that word is itself opaque; just read it as "template".)

Today I discussed with my roommate what a pre-trained model (e.g. BERT) really does; my answer was:

A pre-trained model supplies a very good initialization — a set of parameters that performs very well on the pre-training task (very low pre-training loss); but because downstream tasks vary wildly, we fine-tune from those parameters to fit our downstream task (driving the downstream loss very low).

That paragraph implies the prevailing NLP workflow, "Pre-train, Fine-tune" — and in practice, most of the time we just take someone else's pre-trained model and fine-tune it, skipping the Pre-train step entirely.

The Prompt-infused pattern can be summarized as "Pre-train, Prompt, and Predict": the downstream task is recast into something resembling the pre-training task. For instance, a common pre-training task is MLM (Masked Language Model); for text sentiment classification, given the input "I love this movie", we can append the Prompt "the movie is ___", producing:

I love this movie, the movie is ___

Then the pre-trained model fills the blank with a sentiment-bearing answer (e.g. "great", "terrible"), and we finally map that answer to a sentiment label. Thus, by crafting a suitable "template", we can train a model on small datasets to solve all sorts of downstream tasks.

Note that Prompt-style cloze differs from the MLM (Masked Language Modeling) task: both are token classification, but the candidate sets differ — MLM's candidates are the whole vocabulary. For generation tasks, however, Prompt and MLM share the same candidate set, namely the whole vocabulary.

How to Construct a Prompt

For an input text x, there is a function fPrompt(x) that converts x into the form x′, i.e. x′ = fPrompt(x).

This function usually performs two steps:

  1. Apply a template, usually a natural-language sentence containing two empty slots: a slot [X] for the input x and a slot [Z] for the answer text z to be generated
  2. Fill the input x into the [X] slot

Using the earlier example: in text sentiment classification, suppose the input is

x = "I love this movie"

and the template is

[X]. Overall, it was a [Z] movie

then the resulting x′ is

I love this movie. Overall, it was a [Z] movie
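
In code, fPrompt is nothing more than filling the [X] slot — a minimal sketch (names are illustrative):

def f_prompt(x, template="[X]. Overall, it was a [Z] movie"):
    # step 1: pick a template with [X] and [Z] slots; step 2: fill in x
    return template.replace("[X]", x)

print(f_prompt("I love this movie"))
# -> I love this movie. Overall, it was a [Z] movie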

In practice, the answer slot in a Prompt sits either mid-sentence or at the end: a mid-sentence Prompt is usually called a Cloze Prompt, an end-of-sentence one a Prefix Prompt. The positions and numbers of [X] and [Z], and the choice of template sentence, can all affect the results, so they need to be tuned flexibly.

All of the above covers Prompt design for simple sentiment classification, so the reader will naturally wonder: how should Prompts be designed for other NLP tasks? Liu Pengfei's paper actually provides some references for this.

In Text Generation, the summarization task has a keyword, TL;DR — short for "Too Long; Didn't Read".

Prompt Selection Is Important and Difficult

With the above grounding, we can see that Prompt design has two main parts:

  1. The template T: e.g. [X]. Overall, It was [Z]
  2. The label-word mapping: the mapping between the vocabulary predicted at the [Z] slot and the true labels y. For example, label positive corresponds to the word great, label negative to the word terrible

In Prompt-based fine-tuning, different templates and label words sway the final results a great deal; the figure below shows experimental results from Danqi Chen's team's paper.

Two takeaways from the figure:

  1. With the same "template", different "label words" perform differently — e.g. great/terrible vs cat/dog; even the same label words in swapped order change the result, e.g. cat/dog vs dog/cat
  2. With the same "label words", small edits to the "template" (e.g. adding or removing punctuation) also produce different results

Designing Prompts

Prompts can be designed from roughly three angles:

  • the shape of the Prompt
  • manually designed templates
  • automatically learned templates

The Shape of a Prompt

A Prompt's shape mainly means the positions and numbers of [X] and [Z]. The Cloze Prompt above closely mirrors Masked Language Model training, so Cloze Prompts suit MLM tasks better; for generation tasks, or tasks solved with an autoregressive LM, Prefix Prompts suit better.

Manually Designed Templates

Prompt templates were initially hand-designed, drawing on human linguistic knowledge to find "templates" that are fluent and effective. For example, Petroni et al. hand-designed Cloze Templates for the knowledge-probing task in the famous LAMA dataset; Brown et al. designed Prefix Templates for QA, translation, probing, and other tasks. Manual design is intuitive, but it costs many experiments, much experience, and linguistic expertise. The figure below shows an experimental result from the GPT Understands, Too paper.

Note that the Prompts differ only slightly — some by a single added or removed word — yet the final results can differ by tens of points.

Automatically Learned Templates

To overcome the drawbacks of manual design, many researchers study how to learn suitable templates automatically. Learned templates split into two families: discrete (Discrete Prompts) and continuous (Continuous Prompts). Discrete methods mainly include Prompt Mining, Prompt Paraphrasing, Gradient-based Search, Prompt Generation, and Prompt Scoring; continuous methods mainly include Prefix Tuning, Tuning Initialized with Discrete Prompts, Hard-Soft Prompt Hybrid Tuning, and P-Tuning v2.

Discrete Prompts

Briefly, on the discrete side: Prompt Mining, published at TACL 2020, is about using a pre-trained language model as a "knowledge base", mining better "templates" with the help of dependency trees and Paraphrasing; the figure below shows its results.

The mined "connecting predicates" clearly improve over the hand-designed "templates".

Prompt Paraphrasing can be implemented in many ways, e.g. "back-translation"; here is an example via DeepL:

This gives us a Prompt Paraphrasing of x shares a border with y: x and y share a boundary.

The BARTScore paper goes so far as to provide a table of phrase-level synonym substitutions — all too familiar to me, since I memorized similar lists for English exams.

Gradient-based Search was proposed by the AUTOPROMPT paper, published at EMNLP 2020; its main idea is captured by the figure below.

In the figure, a real joy is the original input sentence xinp; the red Trigger tokens are a set of related words xtrig "triggered" by xinp. Following the configuration of Template λ, xtrig and xinp are combined into the final input xprompt and fed to a Masked LM to predict the sentiment label. The table below adds examples for many other NLP tasks.

Generating the xtrig set mainly uses HotFlip and adversarial-training ideas; interested readers can see the original paper plus the two papers HotFlip: White-box adversarial examples for text classification and Universal Adversarial Triggers for Attacking and Analyzing NLP.

Prompt Generation is work from Danqi Chen's team that applies the Seq2Seq pre-trained model T5 to the template-search process. T5 is pre-trained on several unsupervised objectives, the most effective of which replaces one or more contiguous spans with <X> or <Y> and then generates the corresponding outputs. For example:

Thank you <X> me to your party <Y> week

T5 generates for inviting at <X> and last at <Y>. Clearly this style fits template generation well, and the number of template tokens need not be specified. Concretely, there are three possible generation formats:

⟨S1⟩ → ⟨X⟩ M(y) ⟨Y⟩ ⟨S1⟩
⟨S1⟩ → ⟨S1⟩ ⟨X⟩ M(y) ⟨Y⟩
⟨S1⟩, ⟨S2⟩ → ⟨S1⟩ ⟨X⟩ M(y) ⟨Y⟩ ⟨S2⟩

The template-generation process is shown below:

First, add the fill slots <X> and <Y> around the label word (the three generation formats above), then feed the result into T5, which automatically generates sequences at the slots; finally, replace the label word (great or terrible) with the [MASK] token to form multiple templates. Concretely, Beam Search is used to generate many candidate templates, each candidate is fine-tuned on the dev set, and the single best template is selected.

One more interesting point in that paper: the sentence finally fed to the model for prediction is also concatenated with a "demonstration" (Demonstration) of each class, as shown below.

This Prompt design resembles a semantic-similarity task: X is the original input sentence, Y is known to be a positive example, Z a negative one, and the constructed input takes the form:

X是[MASK]例?Y为正例;Z为负例 ("Is X a [MASK] example? Y is a positive example; Z is a negative example")

This is a bit like a ternary operator in programming languages, or like asking the model to compare X's semantic similarity with Y and Z. Naturally we ask: how are Y and Z picked? By these two rules:

  1. For each original input sentence, randomly sample one "demonstration" from each class and concatenate it into the Prompt
  2. For each original input sentence and each class, compute similarities with Sentence-BERT and randomly select one "demonstration" from the top 50% most similar samples

Continuous Prompts

Prompts were invented to find a good way of making a Pre-trained Language Model (PLM) output what we want; but the Prompt need not be designed as human-readable natural language — machine-readable is enough. Hence there are also methods exploring continuous Prompts, which act directly in the model's embedding space. Continuous Prompts drop two constraints:

  1. The embeddings of template words can be arbitrary vectors in the whole embedding space, no longer only the embeddings of a limited set of natural-language tokens
  2. The template's parameters are no longer taken directly from the PLM's parameters; they have their own independent parameters, tuned on the downstream task's training data

Prefix Tuning, first proposed by Li et al., prepends a sequence of continuous vectors to the input sentence while keeping the PLM's parameters frozen, training only the prefix (Prefix) vectors. Prefix Tuning was proposed mainly for generation tasks, so it defines different Prompt concatenations for different model architectures: [Prefix; x; y] for GPT-style auto-regressive models and [Prefix; x; Prefix′; y] for T5-style encoder-decoder models.

Denote the position indices of the Prefix, \(x\), and \(y\) by \(\mathrm{P}_{\mathrm{idx}}\), \(\mathrm{X}_{\mathrm{idx}}\), and \(\mathrm{Y}_{\mathrm{idx}}\). Prefix Tuning initializes a trainable matrix \(P_\theta \in \mathbb{R}^{\left|\mathrm{P}_{\mathrm{idx}}\right| \times \operatorname{dim}\left(h_i\right)}\), where

\(h_i= \begin{cases}P_\theta[i,:], & \text{if } i \in \mathrm{P}_{\mathrm{idx}} \\ \mathbf{LM}_\phi\left(z_i, h_{<i}\right), & \text{otherwise}\end{cases}\)

That is, if the index \(i\) falls in the prefix part, the vector is taken from \(P_\theta\); if \(i\) is not in the prefix part, the vector is produced by the frozen pre-trained model. The training objective is:

\(\max_{\phi} \log p_{\phi}(y \mid x)=\sum_{i \in \mathrm{Y}_{\mathrm{idx}}} \log p_{\phi}\left(z_i \mid h_{<i}\right)\)

\(P_\theta\) is essentially just a matrix, and there are many ways to realize a matrix — nn.Embedding(), or nn.Linear(), for instance.
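
A toy PyTorch sketch of hosting \(P_\theta\) (sizes are made up; the frozen LM that produces the non-prefix \(h_i\) is omitted):

import torch
import torch.nn as nn

prefix_len, dim = 10, 768
P_theta = nn.Embedding(prefix_len, dim)        # the trainable prefix matrix
prefix = P_theta(torch.arange(prefix_len))     # (prefix_len, dim) prefix vectors
# these rows supply h_i for positions in P_idx, while the frozen LM_phi
# produces h_i for all other positions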

Also searching for Prompts in continuous space, OptiPrompt builds "templates" that are not restricted to prefixes — they can sit in the middle of the sentence too.

Hard-Soft Prompt Hybrid Tuning can be described as a combination of manual design and automatic learning: rather than a purely learnable Prompt template, it inserts some learnable embeddings into a manually designed template. From the foregoing we know that continuous Prompts edge out discrete ones — is there still room for improvement on top of that? P-Tuning by Liu et al. addresses the relatedness among Prompt tokens.

Previous continuous-Prompt generation boiled down to training a matrix and concatenating some of its row vectors by index. Frankly, we would like the prompt-token embeddings to be well related to each other rather than independently learned; to solve this, P-Tuning introduces a Prompt Encoder (figure b below).

Figure a is a traditional discrete Prompt — we call whatever produces the discrete Prompt tokens a Prompt Generator. Figure b instead first feeds in some Virtual (Pseudo) tokens, e.g. [unused1], [unused2], … from BERT's vocabulary; the number of tokens is of course a hyperparameter, and the insertion positions are adjustable too. These Pseudo tokens are passed through a Prompt Encoder to obtain the continuous vectors h0, …, hm.
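
A rough PyTorch sketch of such a Prompt Encoder (the paper describes a BiLSTM followed by an MLP; the sizes here are illustrative):

import torch
import torch.nn as nn

m, dim = 6, 768                                   # number of pseudo tokens, hidden size
pseudo = nn.Embedding(m, dim)                     # embeddings of the pseudo tokens
lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

h, _ = lstm(pseudo.weight.unsqueeze(0))           # (1, m, dim)
prompt_vectors = mlp(h).squeeze(0)                # h0, ..., hm-1, fed into the PLM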

You may ask: how is P-tuning optimized? As before, it depends on the amount of labeled data; two cases:

  1. Little labeled data: freeze the PLM's parameters and only optimize the embeddings of the [P0]∼[Pm] tokens — in other words, only the Prompt Encoder's parameters get updated
  2. Plenty of labeled data: simply unfreeze and fine-tune all parameters

Not long after P-Tuning was proposed, Liu et al. proposed P-Tuning v2, mainly to solve two problems of P-Tuning:

  1. When the pre-trained model has fewer than 10 billion (10B) parameters, Prompt tuning underperforms traditional fine-tuning
  2. On tasks demanding strong inference and comprehension, such as sequence labeling, prompt tuning degrades

Liu et al. argue that encoding the Pseudo tokens with only one BiLSTM layer is one reason for P-Tuning's weak inference ability; v2 therefore proposes Deep Prompt Tuning, replacing the BiLSTM with the deep model used in Prefix Tuning, as shown below.

P-Tuning v2 differs from P-Tuning in:

  • Dropping reparameterization: earlier methods used reparameterization tricks to improve training speed and robustness (e.g. the MLP in Prefix-Tuning and the LSTM in P-Tuning). In P-Tuning v2, the authors find that reparameterization helps little, especially for smaller models, and can even hurt performance
  • Multi-task learning: the optimization difficulties of Deep Prompt Tuning can be eased with extra task data or unlabeled data, and the tunable Prefix Continuous Prompt can also be used for cross-task knowledge sharing. In NER, for example, several datasets can be trained jointly, each with its own top-level Classifier but a shared Prefix Continuous Prompt
  • Dropping the verbalizer: v2 removes the label mapping and becomes fully generative — it can output a sentence-level label (Sentence-level label) at the [CLS] position or token-level labels (Token-level label) at every token position, emitting the true labels directly

A few scattered notes on P-Tuning, collected from various blogs. First, v1's LSTM: it was introduced so that the generated "template" tokens would (to some degree) be closer to natural language — i.e. semantically smoother token sequences. A more natural approach, though, is that when training the downstream task we should predict not only the downstream target tokens (e.g. "great", "terrible") but other tokens as well.

For example, for an MLM model, also randomly MASK and predict some other tokens; for an LM model, predict the full sequence rather than only the target word. The rationale: our MLM/LM was pre-trained on natural language, so we believe it can reconstruct sequences well — and even if it cannot at first, it will over the training iterations. In essence this puts the model through "resistance training".

* Why introduce Prompts at all?

In standard fine-tuning (figure b above), the newly introduced parameters can be numerous (parameters independent of the original pre-trained model): e.g. a binary-classification head on RoBERTa-large introduces 2048 new parameters (nn.Linear(1024, 2)). If you only have a small labeled set of, say, 64 samples, fine-tuning is very hard.

Prompts arose to solve this (figure a above), converting the downstream task directly into an MLM task with a limited output space. Notably, this fine-tunes on top of the pre-trained parameters without introducing any new ones, while also shrinking the gap between fine-tuning and pre-training. Altogether, this works better in few-shot scenarios.

Challenges and Prospects for Prompts

Although Prompt research is in full swing, many questions remain for researchers to explore:

  1. Prompt design. Current Prompt work concentrates on classification and generation tasks, with little on other tasks. Also, the link between "template" and "answer" remains to be solved: a model's performance depends on both the "template" used and the "answer" mapping, and jointly searching for or learning the best combination of the two is still very challenging
  2. Theoretical analysis and interpretability of Prompts. Despite Prompt methods' many successes, theoretical analysis of Prompt-based Learning is still scarce; it is hard to know why Prompts work, or why Prompts with nearly identical natural-language meaning sometimes perform wildly differently
  3. Prompts for PLM debiasing. Because PLMs see vast amounts of human natural language during pre-training, they naturally pick up its influences. A simple example: the training corpus contains so many occurrences of "The capital of China is Beijing" that the model predicts "Beijing" whenever it sees "capital", without analyzing which country's capital is being asked about. In application, Prompts have also exposed many other biases learned by PLMs, such as racism and gender antagonism. This could be a direction worth studying

One More Thing

Finally, a practical issue from actual coding. We know the MLM task outputs the most probable word at the sentence's [MASK] position, and Prompts behave similarly — for example:

这是一条__新闻。中国足球出线的可能性只有0.001%,留给中国队的时间不多了 ("This is a piece of __ news. The probability of China's soccer team advancing is only 0.001%; there is not much time left for Team China.")

This is a news-classification problem with true labels like "体育" (sports), "财经" (finance), "娱乐" (entertainment). The sample above is plainly sports news, so we hope the model outputs "体育" at [MASK] — but does it? In practice, the model's output may be "足球" (soccer). Is predicting "足球" wrong? It seems fine too. This raises a question about Prompts: should we restrict the model's output space?

Back to the news example: should we restrict the output space so the model can only predict the labels "体育", "财经", "娱乐"? Or replace the labels with indices outright, making the model pick one of the numbers 0, 1, 2? Wait — if we do that, how does it differ from fine-tuning, which also converts labels to indices and has the model pick one after reading the sentence?

By that reasoning we should not restrict the output space; but then the [MASK] output is graded too strictly — only exactly "good" or "财经" counts as correct, while "nice" or "财政" counts as wrong. In practice, outputs of synonyms or similar words do appear frequently in the zero-shot setting; but if you train on some labeled samples, the model gradually pins down the output space on its own. For "财经", say, it will no longer predict "财政", only other news categories such as "体育".


Induction Networks for Few-Shot Text Classification

Paper: https://arxiv.org/abs/1902.10482?context=cs.CL

EMNLP-IJCNLP 2019 paper

Code: https://github.com/wuzhiye7/Induction-Network-on-FewRel

In deep learning, supervised models are notoriously greedy for large labeled datasets, and the high cost of labeling limits deep models' generalizability to new classes. This paper proposes a few-shot learning approach for text classification.

What Is Few-Shot Learning (illustrated with images)

The training objective of few-shot learning differs from traditional supervised learning. Traditional classification learns to recognize the images in the training set, generalizes to a test set, and the network identifies which class an image belongs to. Few-shot learning instead makes the machine learn to learn: the goal is not to teach it which image is an elephant and which a tiger, but to teach the model the differences between classes — given two images, the model can tell whether they belong to the same class, even a class that never appeared in the training set.

Current few-shot techniques often compare the input query against the support-set samples at the sample-wise level. But comparison against differently expressed samples of the same class works poorly; moreover, existing techniques compute the class with a simple sum or average of representations, which loses information. This paper therefore uses a capsule network to learn a class-wise vector — a representation of the class a sample belongs to — and compares the input query against that.

The model is as follows:

The model has three modules: an Encoder Module, an Induction Module, and a Relation Module.

Encoder Module

The encoder is a bidirectional LSTM, followed by self-attention over the hidden states.

Here H has shape [C*K, T, 2u]; after the matrix transformations, a has shape [C*K, T], and finally e has shape [C*K, 2u].
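
A rough PyTorch sketch matching those shapes (the attention here is a single-layer scorer for brevity; the paper uses a two-layer form):

import torch
import torch.nn as nn

C, K, T, u, emb = 5, 5, 30, 64, 300      # ways, shots, length, LSTM size, embedding
lstm = nn.LSTM(emb, u, bidirectional=True, batch_first=True)
scorer = nn.Linear(2 * u, 1, bias=False) # illustrative self-attention scorer

x = torch.randn(C * K, T, emb)           # embedded support sentences
H, _ = lstm(x)                           # [C*K, T, 2u]
a = torch.softmax(scorer(H).squeeze(-1), dim=-1)   # [C*K, T]
e = torch.einsum("bt,btd->bd", a, H)     # [C*K, 2u] sentence vectors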

Induction Module

The goal of this module is to design a non-linear mapping from the sample vectors to a class vector.

It uses the dynamic routing algorithm, with the number of output capsules set to 1.

The sample representations are first transformed once; to support different class counts C, the original Capsule Network — which uses a different W per class — is modified to use a single W shared by all classes.
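
A toy sketch of this routing step for one class (the squash function and agreement update follow the usual capsule formulation; sizes are made up):

import torch
import torch.nn as nn

def squash(v, eps=1e-8):
    n2 = (v ** 2).sum(-1, keepdim=True)
    return (n2 / (1.0 + n2)) * v / torch.sqrt(n2 + eps)

K, d, iters = 5, 128, 3
W_s = nn.Linear(d, d, bias=False)        # one transform shared by all classes
e = torch.randn(K, d)                    # K support-sample vectors of one class
e_hat = W_s(e)                           # transformed sample predictions
b = torch.zeros(K)                       # routing logits
for _ in range(iters):
    dcoef = torch.softmax(b, dim=0)                    # coupling coefficients
    c = squash((dcoef.unsqueeze(-1) * e_hat).sum(0))   # class vector
    b = b + e_hat @ c                                  # agreement update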

Relation Module

Once the class representations are obtained, the correlation between each class vector ci and the query set can be computed.

Objective Function

The loss is mean squared error: a matched pair has a similarity target of 1, an unmatched pair a target of 0.
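
A minimal sketch of the relation score and MSE objective (a single bilinear interaction stands in for the paper's neural tensor layer):

import torch
import torch.nn as nn
import torch.nn.functional as F

d = 128
relation = nn.Bilinear(d, d, 1)                 # interaction between c and q
c, q = torch.randn(1, d), torch.randn(1, d)     # class vector, query vector
score = torch.sigmoid(relation(c, q))           # relation score in (0, 1)
target = torch.tensor([[1.0]])                  # 1 if matched, 0 otherwise
loss = F.mse_loss(score, target)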

Few-Shot Learning — Pretraining + Fine-Tuning (3)

A very common and practical recipe in few-shot learning: pre-train on a large dataset, then fine-tune on the small one.

References and code:

1. Chen, Liu, Kira, Wang, & Huang. A Closer Look at Few-shot Classification. In ICLR, 2019. Code: https://github.com/wyharveychen/Close…

2. Dhillon, Chaudhari, Ravichandran, & Soatto. A baseline for few-shot image classification. In ICLR, 2020.

3. Chen, Wang, Liu, Xu, & Darrell. A New Meta-Baseline for Few-Shot Learning. arXiv, 2020. Code: https://github.com/cyvius96/few-shot-…

Few-shot learning aims to recognize new classes from limited labeled data. Few-shot algorithms fall into three broad families: initialization-based methods, metric-learning-based methods, and data-augmentation-based methods.

Initialization-based methods — "learning to fine-tune": learn a good model-initialization strategy so that new classes can be classified from few labeled samples within a limited number of gradient updates, or learn an optimizer.

Distance-metric-based methods — "learning to compare": if a model can compute the similarity of two images, it can classify unseen images against labeled data; distances are typically based on cosine similarity, Euclidean distance, ridge regression, graph neural networks, and so on.

Data-augmentation-based methods — "learning to augment": learn a data generator and use it to enlarge the sample pool of new classes. Since augmentation-based methods are usually co-optimized with zero-shot methods, the paper's authors do not consider them.

Domain adaptation: a technique aimed at mitigating the domain shift between a source and a target domain. Few-shot classification resembles domain adaptation, except that in domain adaptation the target domain usually has plenty of available samples, whereas in few-shot learning the new domain offers only a handful.

The prevailing recipe: train a neural network on a large dataset; at few-shot time, feed both the query and the support set through this pre-trained network to get feature vectors, then compare the query and support set by similarity in feature space (e.g. cosine similarity). The trained network's fully connected layer is removed.

2. Without fine-tuning: predict directly with the pre-trained network:

3. Classifying the query:

4. Fine-tuning:

Fine-tuning adjusts the softmax classifier, not the CNN feature extractor — i.e. the W and b below.

1. Initialization for fine-tuning: set W = M (the class-mean features) and b = 0.

How to fine-tune: update the parameters W and b with an entropy loss.
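
Putting the pieces above together, a minimal PyTorch sketch (random features stand in for the frozen CNN's output, and plain cross-entropy stands in for the exact loss used in the lectures):

import torch
import torch.nn as nn
import torch.nn.functional as F

n_way, n_support, d = 3, 5, 64
feats = torch.randn(n_way * n_support, d)        # support features from the frozen CNN
labels = torch.arange(n_way).repeat_interleave(n_support)
M = torch.stack([feats[labels == k].mean(0) for k in range(n_way)])

W = nn.Parameter(M.clone())                      # initialize W = M
b = nn.Parameter(torch.zeros(n_way))             # initialize b = 0
opt = torch.optim.SGD([W, b], lr=0.01)

for _ in range(100):                             # fine-tune only the classifier
    loss = F.cross_entropy(feats @ W.t() + b, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()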

Few-Shot Learning — Basic Concepts (1)

From wangshusen's lecture slides: https://github.com/wangshusen/DeepLearning

Few-shot learning is classification or regression with extremely little data.

In the example above, the support set contains two classes; which class in the set do you think the query image on the right belongs to? Trivial for a human — can a machine recognize, the way a person does, which class the query belongs to?

For few-shot learning, we cannot train a classification model the traditional way: a few samples cannot train a neural network. Few-shot learning exists to solve exactly this small-sample classification problem.

The training objective of few-shot learning differs from traditional supervised learning. Traditional classification learns to recognize the images in the training set, generalizes to a test set, and the network identifies which class an image belongs to. Few-shot learning instead makes the machine learn to learn: the goal is not to teach it which image is an elephant and which a tiger, but to teach the model the differences between classes — given two images, the model can tell whether they belong to the same class, even a class that never appeared in the training set.

Even though the model has never seen a bulldog or a pangolin, it can tell whether the two are the same thing.

Give the network a query and six images (the support set); the network compares the query against the support set one by one and judges which it is most similar to.

Difference between the support set and the training set:

The training set is large, with many images per class; the support set is small and only provides extra information at prediction time. We train a network on the large training set — not so that the model recognizes the elephants and tigers in the images, but so that it learns to understand the similarities and differences between classes.

Meta learning: learning to learn.

Traditional supervised learning vs. few-shot learning

The main difference: whether the tested classes appear in the training set. The query can belong to a class the model has never seen, which makes the problem harder.

To let the model recognize unseen things, it needs a reference — the support set — against which similarities are computed.

A K-way N-shot support set has K classes, with N samples per class.

Accuracy decreases as the number of classes K increases.

Accuracy increases with the number of samples per class.

The most basic idea of few-shot learning:

Train on a large-scale dataset:

Datasets:

1. Handwritten digits:

2. Image data

Few-Shot Learning — Siamese Networks (2)

Siamese Networks perform binary classification, built on prior knowledge of similarity.

Reference papers:

1. Siamese Neural Networks for One-shot Image Recognition

2. FaceNet: A Unified Embedding for Face Recognition and Clustering

Simply put, a Siamese network is a pair of "conjoined" neural networks, where the "conjoining" is realized through weight sharing, as shown below.

Two ways to train a Siamese network:

1. Take a pair of samples at a time and compare their similarity.

Siamese Neural Networks for One-shot Image Recognition

Use a large dataset with many samples per class, and build positive and negative pairs from the training set: positive pairs teach the network what "same class" means, negative pairs teach it how different samples differ.

1.1 Building the dataset (constructing positive and negative pairs)

Randomly draw two images from the same class and label the pair 1, meaning same class.

Randomly draw two images from different classes and label the pair 0, meaning different classes.
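
A small sketch of this pair construction (assuming `data` maps each class name to a list of images):

import random

def sample_pair(data):
    classes = list(data)
    if random.random() < 0.5:                  # positive pair, label 1
        c = random.choice(classes)
        x1, x2 = random.sample(data[c], 2)
        return (x1, x2), 1
    c1, c2 = random.sample(classes, 2)         # negative pair, label 0
    return (random.choice(data[c1]), random.choice(data[c2])), 0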

1.2 Network architecture

1.3 Training: measure the similarity of different images (why it is called a Siamese network: shared weights, one common CNN backbone)

1.4 Loss:

The loss between target and sim is used to update the parameters.

1.5 Testing on unseen data

Compute the similarity between the query and each image in the support set, one by one, and take the highest.

2. Take three samples at a time (anchor, +, −) and compare their similarities.

FaceNet: A Unified Embedding for Face Recognition and Clustering

2.1 Dataset construction:

Randomly draw an image from the dataset as the anchor, then randomly take a positive sample from the same class, and draw a negative sample from the images of all other classes.

2.2 Training: shared network

2.3 Loss function:

Goal: make d+ as small as possible and d− as large as possible.
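
In code, the objective is the standard triplet loss — push d+ below d− by at least a margin alpha (a minimal sketch; squared Euclidean distances as in FaceNet):

import torch
import torch.nn.functional as F

def triplet_loss(f_anchor, f_pos, f_neg, alpha=0.2):
    d_pos = ((f_anchor - f_pos) ** 2).sum(-1)   # d+
    d_neg = ((f_anchor - f_neg) ** 2).sum(-1)   # d-
    return F.relu(d_pos - d_neg + alpha).mean()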

2.4 Testing: choose the class whose model-output distance is smallest.


Few-Shot Papers: A Collection of Few-Shot Learning Papers

From the GitHub repository: https://github.com/tata1661/FSL-Mate/tree/master/FewShotPapers

This repository contains few-shot learning (FSL) papers mentioned in our FSL survey published in ACM Computing Surveys (JCR Q1, CORE A*).

For convenience, we also include public implementations of respective authors.

We will update this paper list to include new FSL papers periodically.

Citation

Please cite our paper if you find it helpful.

@article{wang2020generalizing,
  title={Generalizing from a few examples: A survey on few-shot learning},
  author={Wang, Yaqing and Yao, Quanming and Kwok, James T and Ni, Lionel M},
  journal={ACM Computing Surveys},
  volume={53},
  number={3},
  pages={1--34},
  year={2020},
  publisher={ACM New York, NY, USA}
}

Content

  1. Survey
  2. Data
  3. Model
    1. Multitask Learning
    2. Embedding/Metric Learning
    3. Learning with External Memory
    4. Generative Modeling
  4. Algorithm
    1. Refining Existing Parameters
    2. Refining Meta-learned Parameters
    3. Learning Search Steps
  5. Applications
    1. Computer Vision
    2. Robotics
    3. Natural Language Processing
    4. Acoustic Signal Processing
    5. Recommendation
    6. Others
  6. Theories
  7. Few-shot Learning and Zero-shot Learning
  8. Variants of Few-shot Learning
  9. Datasets/Benchmarks
  10. Software Library

Survey

  1. Generalizing from a few examples: A survey on few-shot learning, CSUR, 2020 Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni. paper arXiv

Data

  1. Learning from one example through shared densities on transforms, in CVPR, 2000. E. G. Miller, N. E. Matsakis, and P. A. Viola. paper
  2. Domain-adaptive discriminative one-shot learning of gestures, in ECCV, 2014. T. Pfister, J. Charles, and A. Zisserman. paper
  3. One-shot learning of scene locations via feature trajectory transfer, in CVPR, 2016. R. Kwitt, S. Hegenbart, and M. Niethammer. paper
  4. Low-shot visual recognition by shrinking and hallucinating features, in ICCV, 2017. B. Hariharan and R. Girshick. paper code
  5. Improving one-shot learning through fusing side information, arXiv preprint, 2017. Y.H.Tsai and R.Salakhutdinov. paper
  6. Fast parameter adaptation for few-shot image captioning and visual question answering, in ACM MM, 2018. X. Dong, L. Zhu, D. Zhang, Y. Yang, and F. Wu. paper
  7. Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning, in CVPR, 2018. Y. Wu, Y. Lin, X. Dong, Y. Yan, W. Ouyang, and Y. Yang. paper
  8. Low-shot learning with large-scale diffusion, in CVPR, 2018. M. Douze, A. Szlam, B. Hariharan, and H. Jégou. paper
  9. Diverse few-shot text classification with multiple metrics, in NAACL-HLT, 2018. M. Yu, X. Guo, J. Yi, S. Chang, S. Potdar, Y. Cheng, G. Tesauro, H. Wang, and B. Zhou. paper code
  10. Delta-encoder: An effective sample synthesis method for few-shot object recognition, in NeurIPS, 2018. E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, A. Kumar, R. Feris, R. Giryes, and A. Bronstein. paper
  11. Low-shot learning via covariance-preserving adversarial augmentation networks, in NeurIPS, 2018. H. Gao, Z. Shou, A. Zareian, H. Zhang, and S. Chang. paper
  12. Learning to self-train for semi-supervised few-shot classification, in NeurIPS, 2019. X. Li, Q. Sun, Y. Liu, S. Zheng, Q. Zhou, T.-S. Chua, and B. Schiele. paper
  13. Few-shot learning with global class representations, in ICCV, 2019. A. Li, T. Luo, T. Xiang, W. Huang, and L. Wang. paper
  14. AutoAugment: Learning augmentation policies from data, in CVPR, 2019. E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. paper
  15. EDA: Easy data augmentation techniques for boosting performance on text classification tasks, in EMNLP and IJCNLP, 2019. J. Wei and K. Zou. paper
  16. LaSO: Label-set operations networks for multi-label few-shot learning, in CVPR, 2019. A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, and A. M. Bronstein. paper code
  17. Image deformation meta-networks for one-shot learning, in CVPR, 2019. Z. Chen, Y. Fu, Y.-X. Wang, L. Ma, W. Liu, and M. Hebert. paper code
  18. Spot and learn: A maximum-entropy patch sampler for few-shot image classification, in CVPR, 2019. W.-H. Chu, Y.-J. Li, J.-C. Chang, and Y.-C. F. Wang. paper
  19. Data augmentation using learned transformations for one-shot medical image segmentation, in CVPR, 2019. A. Zhao, G. Balakrishnan, F. Durand, J. V. Guttag, and A. V. Dalca. paper
  20. Adversarial feature hallucination networks for few-shot learning, in CVPR, 2020. K. Li, Y. Zhang, K. Li, and Y. Fu. paper
  21. Instance credibility inference for few-shot learning, in CVPR, 2020. Y. Wang, C. Xu, C. Liu, L. Zhang, and Y. Fu. paper
  22. Diversity transfer network for few-shot learning, in AAAI, 2020. M. Chen, Y. Fang, X. Wang, H. Luo, Y. Geng, X. Zhang, C. Huang, W. Liu, and B. Wang. paper code
  23. Neural snowball for few-shot relation learning, in AAAI, 2020. T. Gao, X. Han, R. Xie, Z. Liu, F. Lin, L. Lin, and M. Sun. paper code
  24. Associative alignment for few-shot image classification, in ECCV, 2020. A. Afrasiyabi, J. Lalonde, and C. Gagné. paper code
  25. Information maximization for few-shot learning, in NeurIPS, 2020. M. Boudiaf, I. Ziko, J. Rony, J. Dolz, P. Piantanida, and I. B. Ayed. paper code
  26. Self-training for few-shot transfer across extreme task differences, in ICLR, 2021. C. P. Phoo, and B. Hariharan. paper
  27. Free lunch for few-shot learning: Distribution calibration, in ICLR, 2021. S. Yang, L. Liu, and M. Xu. paper code
  28. Parameterless transductive feature re-representation for few-shot learning, in ICML, 2021. W. Cui, and Y. Guo. paper
  29. Learning intact features by erasing-inpainting for few-shot classification, in AAAI, 2021. J. Li, Z. Wang, and X. Hu. paper
  30. Variational feature disentangling for fine-grained few-shot classification, in ICCV, 2021. J. Xu, H. Le, M. Huang, S. Athar, and D. Samaras. paper
  31. Coarsely-labeled data for better few-shot transfer, in ICCV, 2021. C. P. Phoo, and B. Hariharan. paper
  32. Pseudo-loss confidence metric for semi-supervised few-shot learning, in ICCV, 2021. K. Huang, J. Geng, W. Jiang, X. Deng, and Z. Xu. paper
  33. Iterative label cleaning for transductive and semi-supervised few-shot learning, in ICCV, 2021. M. Lazarou, T. Stathaki, and Y. Avrithis. paper
  34. Meta two-sample testing: Learning kernels for testing with limited data, in NeurIPS, 2021. F. Liu, W. Xu, J. Lu, and D. J. Sutherland. paper
  35. Dynamic distillation network for cross-domain few-shot recognition with unlabeled data, in NeurIPS, 2021. A. Islam, C.-F. Chen, R. Panda, L. Karlinsky, R. Feris, and R. Radke. paper
  36. Towards better understanding and better generalization of low-shot classification in histology images with contrastive learning, in ICLR, 2022. J. Yang, H. Chen, J. Yan, X. Chen, and J. Yao. paper code
  37. FlipDA: Effective and robust data augmentation for few-shot learning, in ACL, 2022. J. Zhou, Y. Zheng, J. Tang, L. Jian, and Z. Yang. paper code
  38. PromDA: Prompt-based data augmentation for low-resource NLU tasks, in ACL, 2022. Y. Wang, C. Xu, Q. Sun, H. Hu, C. Tao, X. Geng, and D. Jiang. paper code
  39. N-shot learning for augmenting task-oriented dialogue state tracking, in Findings of ACL, 2022. I. T. Aksu, Z. Liu, M. Kan, and N. F. Chen. paper
  40. Generating representative samples for few-shot classification, in CVPR, 2022. J. Xu, and H. Le. paper code
  41. Semi-supervised few-shot learning via multi-factor clustering, in CVPR, 2022. J. Ling, L. Liao, M. Yang, and J. Shuai. paper

Model

Multitask Learning

  1. Multi-task transfer methods to improve one-shot learning for multimedia event detection, in BMVC, 2015. W. Yan, J. Yap, and G. Mori. paper
  2. Label efficient learning of transferable representations across domains and tasks, in NeurIPS, 2017. Z. Luo, Y. Zou, J. Hoffman, and L. Fei-Fei. paper
  3. Few-shot adversarial domain adaptation, in NeurIPS, 2017. S. Motiian, Q. Jones, S. Iranmanesh, and G. Doretto. paper
  4. One-shot unsupervised cross domain translation, in NeurIPS, 2018. S. Benaim and L. Wolf. paper
  5. Multi-content GAN for few-shot font style transfer, in CVPR, 2018. S. Azadi, M. Fisher, V. G. Kim, Z. Wang, E. Shechtman, and T. Darrell. paper code
  6. Feature space transfer for data augmentation, in CVPR, 2018. B. Liu, X. Wang, M. Dixit, R. Kwitt, and N. Vasconcelos. paper
  7. Fine-grained visual categorization using meta-learning optimization with sample selection of auxiliary data, in ECCV, 2018. Y. Zhang, H. Tang, and K. Jia. paper
  8. Few-shot charge prediction with discriminative legal attributes, in COLING, 2018. Z. Hu, X. Li, C. Tu, Z. Liu, and M. Sun. paper
  9. Boosting few-shot visual learning with self-supervision, in ICCV, 2019. S. Gidaris, A. Bursuc, N. Komodakis, P. Pérez, and M. Cord. paper
  10. When does self-supervision improve few-shot learning?, in ECCV, 2020. J. Su, S. Maji, and B. Hariharan. paper
  11. Pareto self-supervised training for few-shot learning, in CVPR, 2021. Z. Chen, J. Ge, H. Zhan, S. Huang, and D. Wang. paper
  12. Bridging multi-task learning and meta-learning: Towards efficient training and effective adaptation, in ICML, 2021. H. Wang, H. Zhao, and B. Li. paper code

Embedding/Metric Learning

  1. Object classification from a single example utilizing class relevance metrics, in NeurIPS, 2005. M. Fink. paper
  2. Optimizing one-shot recognition with micro-set learning, in CVPR, 2010. K. D. Tang, M. F. Tappen, R. Sukthankar, and C. H. Lampert. paper
  3. Siamese neural networks for one-shot image recognition, ICML deep learning workshop, 2015. G. Koch, R. Zemel, and R. Salakhutdinov. paper
  4. Matching networks for one shot learning, in NeurIPS, 2016. O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra et al. paper
  5. Learning feed-forward one-shot learners, in NeurIPS, 2016. L. Bertinetto, J. F. Henriques, J. Valmadre, P. Torr, and A. Vedaldi. paper
  6. Few-shot learning through an information retrieval lens, in NeurIPS, 2017. E. Triantafillou, R. Zemel, and R. Urtasun. paper
  7. Prototypical networks for few-shot learning, in NeurIPS, 2017. J. Snell, K. Swersky, and R. S. Zemel. paper code
  8. Attentive recurrent comparators, in ICML, 2017. P. Shyam, S. Gupta, and A. Dukkipati. paper
  9. Learning algorithms for active learning, in ICML, 2017. P. Bachman, A. Sordoni, and A. Trischler. paper
  10. Active one-shot learning, arXiv preprint, 2017. M. Woodward and C. Finn. paper
  11. Structured set matching networks for one-shot part labeling, in CVPR, 2018. J. Choi, J. Krishnamurthy, A. Kembhavi, and A. Farhadi. paper
  12. Low-shot learning from imaginary data, in CVPR, 2018. Y.-X. Wang, R. Girshick, M. Hebert, and B. Hariharan. paper
  13. Learning to compare: Relation network for few-shot learning, in CVPR, 2018. F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales. paper code
  14. Dynamic conditional networks for few-shot learning, in ECCV, 2018. F. Zhao, J. Zhao, S. Yan, and J. Feng. paper code
  15. TADAM: Task dependent adaptive metric for improved few-shot learning, in NeurIPS, 2018. B. Oreshkin, P. R. López, and A. Lacoste. paper
  16. Meta-learning for semi-supervised few-shot classification, in ICLR, 2018. M. Ren, S. Ravi, E. Triantafillou, J. Snell, K. Swersky, J. B. Tenen- baum, H. Larochelle, and R. S. Zemel. paper code
  17. Few-shot learning with graph neural networks, in ICLR, 2018. V. G. Satorras and J. B. Estrach. paper code
  18. A simple neural attentive meta-learner, in ICLR, 2018. N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. paper
  19. Meta-learning with differentiable closed-form solvers, in ICLR, 2019. L. Bertinetto, J. F. Henriques, P. Torr, and A. Vedaldi. paper
  20. Learning to propagate labels: Transductive propagation network for few-shot learning, in ICLR, 2019. Y. Liu, J. Lee, M. Park, S. Kim, E. Yang, S. Hwang, and Y. Yang. paper code
  21. Multi-level matching and aggregation network for few-shot relation classification, in ACL, 2019. Z.-X. Ye, and Z.-H. Ling. paper
  22. Induction networks for few-shot text classification, in EMNLP-IJCNLP, 2019. R. Geng, B. Li, Y. Li, X. Zhu, P. Jian, and J. Sun. paper
  23. Hierarchical attention prototypical networks for few-shot text classification, in EMNLP-IJCNLP, 2019. S. Sun, Q. Sun, K. Zhou, and T. Lv. paper
  24. Cross attention network for few-shot classification, in NeurIPS, 2019. R. Hou, H. Chang, B. Ma, S. Shan, and X. Chen. paper
  25. Hybrid attention-based prototypical networks for noisy few-shot relation classification, in AAAI, 2019. T. Gao, X. Han, Z. Liu, and M. Sun. paper code
  26. Attention-based multi-context guiding for few-shot semantic segmentation, in AAAI, 2019. T. Hu, P. Yang, C. Zhang, G. Yu, Y. Mu and C. G. M. Snoek. paper
  27. Distribution consistency based covariance metric networks for few-shot learning, in AAAI, 2019. W. Li, L. Wang, J. Xu, J. Huo, Y. Gao and J. Luo. paper
  28. A dual attention network with semantic embedding for few-shot learning, in AAAI, 2019. S. Yan, S. Zhang, and X. He. paper
  29. TapNet: Neural network augmented with task-adaptive projection for few-shot learning, in ICML, 2019. S. W. Yoon, J. Seo, and J. Moon. paper
  30. Prototype propagation networks (PPN) for weakly-supervised few-shot learning on category graph, in IJCAI, 2019. L. Liu, T. Zhou, G. Long, J. Jiang, L. Yao, C. Zhang. paper code
  31. Collect and select: Semantic alignment metric learning for few-shot learning, in ICCV, 2019. F. Hao, F. He, J. Cheng, L. Wang, J. Cao, and D. Tao. paper
  32. Transductive episodic-wise adaptive metric for few-shot learning, in ICCV, 2019. L. Qiao, Y. Shi, J. Li, Y. Wang, T. Huang, and Y. Tian. paper
  33. Few-shot learning with embedded class models and shot-free meta training, in ICCV, 2019. A. Ravichandran, R. Bhotika, and S. Soatto. paper
  34. PARN: Position-aware relation networks for few-shot learning, in ICCV, 2019. Z. Wu, Y. Li, L. Guo, and K. Jia. paper
  35. PANet: Few-shot image semantic segmentation with prototype alignment, in ICCV, 2019. K. Wang, J. H. Liew, Y. Zou, D. Zhou, and J. Feng. paper code
  36. RepMet: Representative-based metric learning for classification and few-shot object detection, in CVPR, 2019. L. Karlinsky, J. Shtok, S. Harary, E. Schwartz, A. Aides, R. Feris, R. Giryes, and A. M. Bronstein. paper code
  37. Edge-labeling graph neural network for few-shot learning, in CVPR, 2019. J. Kim, T. Kim, S. Kim, and C. D. Yoo. paper
  38. Finding task-relevant features for few-shot learning by category traversal, in CVPR, 2019. H. Li, D. Eigen, S. Dodge, M. Zeiler, and X. Wang. paper code
  39. Revisiting local descriptor based image-to-class measure for few-shot learning, in CVPR, 2019. W. Li, L. Wang, J. Xu, J. Huo, Y. Gao, and J. Luo. paper code
  40. TAFE-Net: Task-aware feature embeddings for low shot learning, in CVPR, 2019. X. Wang, F. Yu, R. Wang, T. Darrell, and J. E. Gonzalez. paper code
  41. Improved few-shot visual classification, in CVPR, 2020. P. Bateni, R. Goyal, V. Masrani, F. Wood, and L. Sigal. paper
  42. Boosting few-shot learning with adaptive margin loss, in CVPR, 2020. A. Li, W. Huang, X. Lan, J. Feng, Z. Li, and L. Wang. paper
  43. Adaptive subspaces for few-shot learning, in CVPR, 2020. C. Simon, P. Koniusz, R. Nock, and M. Harandi. paper
  44. DPGN: Distribution propagation graph network for few-shot learning, in CVPR, 2020. L. Yang, L. Li, Z. Zhang, X. Zhou, E. Zhou, and Y. Liu. paper
  45. Few-shot learning via embedding adaptation with set-to-set functions, in CVPR, 2020. H.-J. Ye, H. Hu, D.-C. Zhan, and F. Sha. paper code
  46. DeepEMD: Few-shot image classification with differentiable earth mover’s distance and structured classifiers, in CVPR, 2020. C. Zhang, Y. Cai, G. Lin, and C. Shen. paper code
  47. Few-shot text classification with distributional signatures, in ICLR, 2020. Y. Bao, M. Wu, S. Chang, and R. Barzilay. paper code
  48. Learning task-aware local representations for few-shot learning, in IJCAI, 2020. C. Dong, W. Li, J. Huo, Z. Gu, and Y. Gao. paper
  49. SimPropNet: Improved similarity propagation for few-shot image segmentation, in IJCAI, 2020. S. Gairola, M. Hemani, A. Chopra, and B. Krishnamurthy. paper
  50. Asymmetric distribution measure for few-shot learning, in IJCAI, 2020. W. Li, L. Wang, J. Huo, Y. Shi, Y. Gao, and J. Luo. paper
  51. Transductive relation-propagation network for few-shot learning, in IJCAI, 2020. Y. Ma, S. Bai, S. An, W. Liu, A. Liu, X. Zhen, and X. Liu. paper
  52. Weakly supervised few-shot object segmentation using co-attention with visual and semantic embeddings, in IJCAI, 2020. M. Siam, N. Doraiswamy, B. N. Oreshkin, H. Yao, and M. Jägersand. paper
  53. Few-shot learning on graphs via super-classes based on graph spectral measures, in ICLR, 2020. J. Chauhan, D. Nathani, and M. Kaul. paper
  54. SGAP-Net: Semantic-guided attentive prototypes network for few-shot human-object interaction recognition, in AAAI, 2020. Z. Ji, X. Liu, Y. Pang, and X. Li. paper
  55. One-shot image classification by learning to restore prototypes, in AAAI, 2020. W. Xue, and W. Wang. paper
  56. Negative margin matters: Understanding margin in few-shot classification, in ECCV, 2020. B. Liu, Y. Cao, Y. Lin, Q. Li, Z. Zhang, M. Long, and H. Hu. paper code
  57. Prototype rectification for few-shot learning, in ECCV, 2020. J. Liu, L. Song, and Y. Qin. paper
  58. Rethinking few-shot image classification: A good embedding is all you need?, in ECCV, 2020. Y. Tian, Y. Wang, D. Krishnan, J. B. Tenenbaum, and P. Isola. paper code
  59. SEN: A novel feature normalization dissimilarity measure for prototypical few-shot learning networks, in ECCV, 2020. V. N. Nguyen, S. Løkse, K. Wickstrøm, M. Kampffmeyer, D. Roverso, and R. Jenssen. paper
  60. TAFSSL: Task-adaptive feature sub-space learning for few-shot classification, in ECCV, 2020. M. Lichtenstein, P. Sattigeri, R. Feris, R. Giryes, and L. Karlinsky. paper
  61. Attentive prototype few-shot learning with capsule network-based embedding, in ECCV, 2020. F. Wu, J. S.Smith, W. Lu, C. Pang, and B. Zhang. paper
  62. Embedding propagation: Smoother manifold for few-shot classification, in ECCV, 2020. P. Rodríguez, I. Laradji, A. Drouin, and A. Lacoste. paper code
  63. Laplacian regularized few-shot learning, in ICML, 2020. I. M. Ziko, J. Dolz, E. Granger, and I. B. Ayed. paper code
  64. TAdaNet: Task-adaptive network for graph-enriched meta-learning, in KDD, 2020. Q. Suo, i. Chou, W. Zhong, and A. Zhang. paper
  65. Concept learners for few-shot learning, in ICLR, 2021. K. Cao, M. Brbic, and J. Leskovec. paper
  66. Reinforced attention for few-shot learning and beyond, in CVPR, 2021. J. Hong, P. Fang, W. Li, T. Zhang, C. Simon, M. Harandi, and L. Petersson. paper
  67. Mutual CRF-GNN for few-shot learning, in CVPR, 2021. S. Tang, D. Chen, L. Bai, K. Liu, Y. Ge, and W. Ouyang. paper
  68. Few-shot classification with feature map reconstruction networks, in CVPR, 2021. D. Wertheimer, L. Tang, and B. Hariharan. paper code
  69. ECKPN: Explicit class knowledge propagation network for transductive few-shot learning, in CVPR, 2021. C. Chen, X. Yang, C. Xu, X. Huang, and Z. Ma. paper
  70. Exploring complementary strengths of invariant and equivariant representations for few-shot learning, in CVPR, 2021. M. N. Rizve, S. Khan, F. S. Khan, and M. Shah. paper
  71. Rethinking class relations: Absolute-relative supervised and unsupervised few-shot learning, in CVPR, 2021. H. Zhang, P. Koniusz, S. Jian, H. Li, and P. H. S. Torr. paper
  72. Unsupervised embedding adaptation via early-stage feature reconstruction for few-shot classification, in ICML, 2021. D. H. Lee, and S. Chung. paper code
  73. Learning a few-shot embedding model with contrastive learning, in AAAI, 2021. C. Liu, Y. Fu, C. Xu, S. Yang, J. Li, C. Wang, and L. Zhang. paper
  74. Looking wider for better adaptive representation in few-shot learning, in AAAI, 2021. J. Zhao, Y. Yang, X. Lin, J. Yang, and L. He. paper
  75. Tailoring embedding function to heterogeneous few-shot tasks by global and local feature adaptors, in AAAI, 2021. S. Lu, H. Ye, and D.-C. Zhan. paper
  76. Knowledge guided metric learning for few-shot text classification, in NAACL-HLT, 2021. D. Sui, Y. Chen, B. Mao, D. Qiu, K. Liu, and J. Zhao. paper
  77. Mixture-based feature space learning for few-shot image classification, in ICCV, 2021. A. Afrasiyabi, J. Lalonde, and C. Gagné. paper
  78. Z-score normalization, hubness, and few-shot learning, in ICCV, 2021. N. Fei, Y. Gao, Z. Lu, and T. Xiang. paper
  79. Relational embedding for few-shot classification, in ICCV, 2021. D. Kang, H. Kwon, J. Min, and M. Cho. paper code
  80. Transductive few-shot classification on the oblique manifold, in ICCV, 2021. G. Qi, H. Yu, Z. Lu, and S. Li. paper code
  81. Curvature generation in curved spaces for few-shot learning, in ICCV, 2021. Z. Gao, Y. Wu, Y. Jia, and M. Harandi. paper
  82. On episodes, prototypical networks, and few-shot learning, in NeurIPS, 2021. S. Laenen, and L. Bertinetto. paper
  83. Few-shot learning as cluster-induced voronoi diagrams: A geometric approach, in ICLR, 2022. C. Ma, Z. Huang, M. Gao, and J. Xu. paper code
  84. Few-shot learning with siamese networks and label tuning, in ACL, 2022. T. Müller, G. Pérez-Torró, and M. Franco-Salvador. paper code
  85. Learning to affiliate: Mutual centralized learning for few-shot classification, in CVPR, 2022. Y. Liu, W. Zhang, C. Xiang, T. Zheng, D. Cai, and X. He. paper
  86. Matching feature sets for few-shot image classification, in CVPR, 2022. A. Afrasiyabi, H. Larochelle, J. Lalonde, and C. Gagné. paper code
  87. Joint distribution matters: Deep Brownian distance covariance for few-shot classification, in CVPR, 2022. J. Xie, F. Long, J. Lv, Q. Wang, and P. Li. paper
  88. CAD: Co-adapting discriminative features for improved few-shot classification, in CVPR, 2022. P. Chikontwe, S. Kim, and S. H. Park. paper
  89. Ranking distance calibration for cross-domain few-shot learning, in CVPR, 2022. P. Li, S. Gong, C. Wang, and Y. Fu. paper
  90. EASE: Unsupervised discriminant subspace learning for transductive few-shot learning, in CVPR, 2022. H. Zhu, and P. Koniusz. paper code
  91. Cross-domain few-shot learning with task-specific adapters, in CVPR, 2022. W. Li, X. Liu, and H. Bilen. paper code

Learning with External Memory

  1. Meta-learning with memory-augmented neural networks, in ICML, 2016. A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap. paper
  2. Few-shot object recognition from machine-labeled web images, in CVPR, 2017. Z. Xu, L. Zhu, and Y. Yang. paper
  3. Learning to remember rare events, in ICLR, 2017. Ł. Kaiser, O. Nachum, A. Roy, and S. Bengio. paper
  4. Meta networks, in ICML, 2017. T. Munkhdalai and H. Yu. paper
  5. Memory matching networks for one-shot image recognition, in CVPR, 2018. Q. Cai, Y. Pan, T. Yao, C. Yan, and T. Mei. paper
  6. Compound memory networks for few-shot video classification, in ECCV, 2018. L. Zhu and Y. Yang. paper
  7. Memory, show the way: Memory based few shot word representation learning, in EMNLP, 2018. J. Sun, S. Wang, and C. Zong. paper
  8. Rapid adaptation with conditionally shifted neurons, in ICML, 2018. T. Munkhdalai, X. Yuan, S. Mehri, and A. Trischler. paper
  9. Adaptive posterior learning: Few-shot learning with a surprise-based memory module, in ICLR, 2019. T. Ramalho and M. Garnelo. paper code
  10. Coloring with limited data: Few-shot colorization via memory augmented networks, in CVPR, 2019. S. Yoo, H. Bahng, S. Chung, J. Lee, J. Chang, and J. Choo. paper
  11. ACMM: Aligned cross-modal memory for few-shot image and sentence matching, in ICCV, 2019. Y. Huang, and L. Wang. paper
  12. Dynamic memory induction networks for few-shot text classification, in ACL, 2020. R. Geng, B. Li, Y. Li, J. Sun, and X. Zhu. paper
  13. Few-shot visual learning with contextual memory and fine-grained calibration, in IJCAI, 2020. Y. Ma, W. Liu, S. Bai, Q. Zhang, A. Liu, W. Chen, and X. Liu. paper
  14. Learn from concepts: Towards the purified memory for few-shot learning, in IJCAI, 2021. X. Liu, X. Tian, S. Lin, Y. Qu, L. Ma, W. Yuan, Z. Zhang, and Y. Xie. paper
  15. Prototype memory and attention mechanisms for few shot image generation, in ICLR, 2022. T. Li, Z. Li, A. Luo, H. Rockwell, A. B. Farimani, and T. S. Lee. paper code
  16. Hierarchical variational memory for few-shot learning across domains, in ICLR, 2022. Y. Du, X. Zhen, L. Shao, and C. G. M. Snoek. paper code
  17. Remember the difference: Cross-domain few-shot semantic segmentation via meta-memory transfer, in CVPR, 2022. W. Wang, L. Duan, Y. Wang, Q. En, J. Fan, and Z. Zhang. paper

Generative Modeling

  1. One-shot learning of object categories, TPAMI, 2006. L. Fei-Fei, R. Fergus, and P. Perona. paper
  2. Learning to learn with compound HD models, in NeurIPS, 2011. A. Torralba, J. B. Tenenbaum, and R. R. Salakhutdinov. paper
  3. One-shot learning with a hierarchical nonparametric bayesian model, in ICML Workshop on Unsupervised and Transfer Learning, 2012. R. Salakhutdinov, J. Tenenbaum, and A. Torralba. paper
  4. Human-level concept learning through probabilistic program induction, Science, 2015. B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. paper
  5. One-shot generalization in deep generative models, in ICML, 2016. D. Rezende, I. Danihelka, K. Gregor, and D. Wierstra. paper
  6. One-shot video object segmentation, in CVPR, 2017. S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool. paper
  7. Towards a neural statistician, in ICLR, 2017. H. Edwards and A. Storkey. paper
  8. Extending a parser to distant domains using a few dozen partially annotated examples, in ACL, 2018. V. Joshi, M. Peters, and M. Hopkins. paper
  9. MetaGAN: An adversarial approach to few-shot learning, in NeurIPS, 2018. R. Zhang, T. Che, Z. Ghahramani, Y. Bengio, and Y. Song. paper
  10. Few-shot autoregressive density estimation: Towards learning to learn distributions, in ICLR, 2018. S. Reed, Y. Chen, T. Paine, A. van den Oord, S. M. A. Eslami, D. Rezende, O. Vinyals, and N. de Freitas. paper
  11. The variational homoencoder: Learning to learn high capacity generative models from few examples, in UAI, 2018. L. B. Hewitt, M. I. Nye, A. Gane, T. Jaakkola, and J. B. Tenenbaum. paper
  12. Meta-learning probabilistic inference for prediction, in ICLR, 2019. J. Gordon, J. Bronskill, M. Bauer, S. Nowozin, and R. Turner. paper
  13. Variational prototyping-encoder: One-shot learning with prototypical images, in CVPR, 2019. J. Kim, T.-H. Oh, S. Lee, F. Pan, and I. S. Kweon. paper code
  14. Variational few-shot learning, in ICCV, 2019. J. Zhang, C. Zhao, B. Ni, M. Xu, and X. Yang. paper
  15. Infinite mixture prototypes for few-shot learning, in ICML, 2019. K. Allen, E. Shelhamer, H. Shin, and J. Tenenbaum. paper
  16. Dual variational generation for low shot heterogeneous face recognition, in NeurIPS, 2019. C. Fu, X. Wu, Y. Hu, H. Huang, and R. He. paper
  17. Bayesian meta sampling for fast uncertainty adaptation, in ICLR, 2020. Z. Wang, Y. Zhao, P. Yu, R. Zhang, and C. Chen. paper
  18. Empirical Bayes transductive meta-learning with synthetic gradients, in ICLR, 2020. S. X. Hu, P. G. Moreno, Y. Xiao, X. Shen, G. Obozinski, N. D. Lawrence, and A. C. Damianou. paper
  19. Few-shot relation extraction via bayesian meta-learning on relation graphs, in ICML, 2020. M. Qu, T. Gao, L. A. C. Xhonneux, and J. Tang. paper code
  20. Interventional few-shot learning, in NeurIPS, 2020. Z. Yue, H. Zhang, Q. Sun, and X. Hua. paper code
  21. Bayesian few-shot classification with one-vs-each pólya-gamma augmented gaussian processes, in ICLR, 2021. J. Snell, and R. Zemel. paper
  22. Few-shot Bayesian optimization with deep kernel surrogates, in ICLR, 2021. M. Wistuba, and J. Grabocka. paper
  23. Modeling the probabilistic distribution of unlabeled data for one-shot medical image segmentation, in AAAI, 2021. Y. Ding, X. Yu, and Y. Yang. paper code
  24. A hierarchical transformation-discriminating generative model for few shot anomaly detection, in ICCV, 2021. S. Sheynin, S. Benaim, and L. Wolf. paper
  25. Reinforced few-shot acquisition function learning for Bayesian optimization, in NeurIPS, 2021. B. Hsieh, P. Hsieh, and X. Liu. paper
  26. GanOrCon: Are generative models useful for few-shot segmentation?, in CVPR, 2022. O. Saha, Z. Cheng, and S. Maji. paper
  27. Few shot generative model adaption via relaxed spatial structural alignment, in CVPR, 2022. J. Xiao, L. Li, C. Wang, Z. Zha, and Q. Huang. paper

Algorithm

Refining Existing Parameters

  1. Cross-generalization: Learning novel classes from a single example by feature replacement, in CVPR, 2005. E. Bart and S. Ullman. paper
  2. One-shot adaptation of supervised deep convolutional models, in ICLR, 2013. J. Hoffman, E. Tzeng, J. Donahue, Y. Jia, K. Saenko, and T. Darrell. paper
  3. Learning to learn: Model regression networks for easy small sample learning, in ECCV, 2016. Y.-X. Wang and M. Hebert. paper
  4. Learning from small sample sets by combining unsupervised meta-training with CNNs, in NeurIPS, 2016. Y.-X. Wang and M. Hebert. paper
  5. Efficient k-shot learning with regularized deep networks, in AAAI, 2018. D. Yoo, H. Fan, V. N. Boddeti, and K. M. Kitani. paper
  6. CLEAR: Cumulative learning for one-shot one-class image recognition, in CVPR, 2018. J. Kozerawski and M. Turk. paper
  7. Learning structure and strength of CNN filters for small sample size training, in CVPR, 2018. R. Keshari, M. Vatsa, R. Singh, and A. Noore. paper
  8. Dynamic few-shot visual learning without forgetting, in CVPR, 2018. S. Gidaris and N. Komodakis. paper code
  9. Low-shot learning with imprinted weights, in CVPR, 2018. H. Qi, M. Brown, and D. G. Lowe. paper
  10. Neural voice cloning with a few samples, in NeurIPS, 2018. S. Arik, J. Chen, K. Peng, W. Ping, and Y. Zhou. paper
  11. Text classification with few examples using controlled generalization, in NAACL-HLT, 2019. A. Mahabal, J. Baldridge, B. K. Ayan, V. Perot, and D. Roth. paper
  12. Low shot box correction for weakly supervised object detection, in IJCAI, 2019. T. Pan, B. Wang, G. Ding, J. Han, and J. Yong. paper
  13. Diversity with cooperation: Ensemble methods for few-shot classification, in ICCV, 2019. N. Dvornik, C. Schmid, and J. Mairal. paper
  14. Few-shot image recognition with knowledge transfer, in ICCV, 2019. Z. Peng, Z. Li, J. Zhang, Y. Li, G.-J. Qi, and J. Tang. paper
  15. Generating classification weights with GNN denoising autoencoders for few-shot learning, in CVPR, 2019. S. Gidaris, and N. Komodakis. paper code
  16. Dense classification and implanting for few-shot learning, in CVPR, 2019. Y. Lifchitz, Y. Avrithis, S. Picard, and A. Bursuc. paper
  17. Few-shot adaptive faster R-CNN, in CVPR, 2019. T. Wang, X. Zhang, L. Yuan, and J. Feng. paper
  18. TransMatch: A transfer-learning scheme for semi-supervised few-shot learning, in CVPR, 2020. Z. Yu, L. Chen, Z. Cheng, and J. Luo. paper
  19. Learning to select base classes for few-shot classification, in CVPR, 2020. L. Zhou, P. Cui, X. Jia, S. Yang, and Q. Tian. paper
  20. Few-shot NLG with pre-trained language model, in ACL, 2020. Z. Chen, H. Eavani, W. Chen, Y. Liu, and W. Y. Wang. paper code
  21. Span-ConveRT: Few-shot span extraction for dialog with pretrained conversational representations, in ACL, 2020. S. Coope, T. Farghly, D. Gerz, I. Vulic, and M. Henderson. paper
  22. Structural supervision improves few-shot learning and syntactic generalization in neural language models, in EMNLP, 2020. E. Wilcox, P. Qian, R. Futrell, R. Kohita, R. Levy, and M. Ballesteros. paper code
  23. A baseline for few-shot image classification, in ICLR, 2020. G. S. Dhillon, P. Chaudhari, A. Ravichandran, and S. Soatto. paper
  24. Cross-domain few-shot classification via learned feature-wise transformation, in ICLR, 2020. H. Tseng, H. Lee, J. Huang, and M. Yang. paper code
  25. Graph few-shot learning via knowledge transfer, in AAAI, 2020. H. Yao, C. Zhang, Y. Wei, M. Jiang, S. Wang, J. Huang, N. V. Chawla, and Z. Li. paper
  26. Knowledge graph transfer network for few-shot recognition, in AAAI, 2020. R. Chen, T. Chen, X. Hui, H. Wu, G. Li, and L. Lin. paper
  27. Context-Transformer: Tackling object confusion for few-shot detection, in AAAI, 2020. Z. Yang, Y. Wang, X. Chen, J. Liu, and Y. Qiao. paper
  28. A broader study of cross-domain few-shot learning, in ECCV, 2020. Y. Guo, N. C. Codella, L. Karlinsky, J. V. Codella, J. R. Smith, K. Saenko, T. Rosing, and R. Feris. paper code
  29. Selecting relevant features from a multi-domain representation for few-shot classification, in ECCV, 2020. N. Dvornik, C. Schmid, and J. Mairal. paper code
  30. Prototype completion with primitive knowledge for few-shot learning, in CVPR, 2021. B. Zhang, X. Li, Y. Ye, Z. Huang, and L. Zhang. paper code
  31. Partial is better than all: Revisiting fine-tuning strategy for few-shot learning, in AAAI, 2021. Z. Shen, Z. Liu, J. Qin, M. Savvides, and K.-T. Cheng. paper
  32. PTN: A Poisson transfer network for semi-supervised few-shot learning, in AAAI, 2021. H. Huang, J. Zhang, J. Zhang, Q. Wu, and C. Xu. paper
  33. A universal representation transformer layer for few-shot image classification, in ICLR, 2021. L. Liu, W. L. Hamilton, G. Long, J. Jiang, and H. Larochelle. paper
  34. Making pre-trained language models better few-shot learners, in ACL-IJCNLP, 2021. T. Gao, A. Fisch, and D. Chen. paper code
  35. Self-supervised network evolution for few-shot classification, in IJCAI, 2021. X. Tang, Z. Teng, B. Zhang, and J. Fan. paper
  36. Calibrate before use: Improving few-shot performance of language models, in ICML, 2021. Z. Zhao, E. Wallace, S. Feng, D. Klein, and S. Singh. paper code
  37. Language models are few-shot learners, in NeurIPS, 2020. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. paper
  38. It’s not just size that matters: Small language models are also few-shot learners, in NAACL-HLT, 2021. T. Schick, and H. Schütze. paper code
  39. Self-training improves pre-training for few-shot learning in task-oriented dialog systems, in EMNLP, 2021. F. Mi, W. Zhou, L. Kong, F. Cai, M. Huang, and B. Faltings. paper
  40. Few-shot intent detection via contrastive pre-training and fine-tuning, in EMNLP, 2021. J. Zhang, T. Bui, S. Yoon, X. Chen, Z. Liu, C. Xia, Q. H. Tran, W. Chang, and P. S. Yu. paper code
  41. Avoiding inference heuristics in few-shot prompt-based finetuning, in EMNLP, 2021. P. A. Utama, N. S. Moosavi, V. Sanh, and I. Gurevych. paper code
  42. Constrained language models yield few-shot semantic parsers, in EMNLP, 2021. R. Shin, C. H. Lin, S. Thomson, C. Chen, S. Roy, E. A. Platanios, A. Pauls, D. Klein, J. Eisner, and B. V. Durme. paper code
  43. Revisiting self-training for few-shot learning of language model, in EMNLP, 2021. Y. Chen, Y. Zhang, C. Zhang, G. Lee, R. Cheng, and H. Li. paper code
  44. Language models are few-shot butlers, in EMNLP, 2021. V. Micheli, and F. Fleuret. paper code
  45. FewshotQA: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models, in EMNLP, 2021. R. Chada, and P. Natarajan. paper
  46. TransPrompt: Towards an automatic transferable prompting framework for few-shot text classification, in EMNLP, 2021. C. Wang, J. Wang, M. Qiu, J. Huang, and M. Gao. paper
  47. Meta distant transfer learning for pre-trained language models, in EMNLP, 2021. C. Wang, H. Pan, M. Qiu, J. Huang, F. Yang, and Y. Zhang. paper
  48. STraTA: Self-training with task augmentation for better few-shot learning, in EMNLP, 2021. T. Vu, M. Luong, Q. V. Le, G. Simon, and M. Iyyer. paper code
  49. Few-shot image classification: Just use a library of pre-trained feature extractors and a simple classifier, in ICCV, 2021. A. Chowdhury, M. Jiang, S. Chaudhuri, and C. Jermaine. paper code
  50. On the importance of distractors for few-shot classification, in ICCV, 2021. R. Das, Y. Wang, and J. M. F. Moura. paper code
  51. A multi-mode modulator for multi-domain few-shot classification, in ICCV, 2021. Y. Liu, J. Lee, L. Zhu, L. Chen, H. Shi, and Y. Yang. paper
  52. Universal representation learning from multiple domains for few-shot classification, in ICCV, 2021. W. Li, X. Liu, and H. Bilen. paper code
  53. Boosting the generalization capability in cross-domain few-shot learning via noise-enhanced supervised autoencoder, in ICCV, 2021. H. Liang, Q. Zhang, P. Dai, and J. Lu. paper
  54. How fine-tuning allows for effective meta-learning, in NeurIPS, 2021. K. Chua, Q. Lei, and J. D. Lee. paper
  55. Multimodal few-shot learning with frozen language models, in NeurIPS, 2021. M. Tsimpoukelli, J. Menick, S. Cabi, S. M. A. Eslami, O. Vinyals, and F. Hill. paper
  56. Grad2Task: Improved few-shot text classification using gradients for task representation, in NeurIPS, 2021. J. Wang, K. Wang, F. Rudzicz, and M. Brudno. paper
  57. True few-shot learning with language models, in NeurIPS, 2021. E. Perez, D. Kiela, and K. Cho. paper
  58. POODLE: Improving few-shot learning via penalizing out-of-distribution samples, in NeurIPS, 2021. D. Le, K. Nguyen, Q. Tran, R. Nguyen, and B. Hua. paper
  59. TOHAN: A one-step approach towards few-shot hypothesis adaptation, in NeurIPS, 2021. H. Chi, F. Liu, W. Yang, L. Lan, T. Liu, B. Han, W. Cheung, and J. Kwok. paper
  60. Task affinity with maximum bipartite matching in few-shot learning, in ICLR, 2022. C. P. Le, J. Dong, M. Soltani, and V. Tarokh. paper
  61. Differentiable prompt makes pre-trained language models better few-shot learners, in ICLR, 2022. N. Zhang, L. Li, X. Chen, S. Deng, Z. Bi, C. Tan, F. Huang, and H. Chen. paper code
  62. ConFeSS: A framework for single source cross-domain few-shot learning, in ICLR, 2022. D. Das, S. Yun, and F. Porikli. paper
  63. Switch to generalize: Domain-switch learning for cross-domain few-shot classification, in ICLR, 2022. Z. Hu, Y. Sun, and Y. Yang. paper
  64. LM-BFF-MS: Improving few-shot fine-tuning of language models based on multiple soft demonstration memory, in ACL, 2022. E. Park, D. H. Jeon, S. Kim, I. Kang, and S. Na. paper code
  65. Meta-learning via language model in-context tuning, in ACL, 2022. Y. Chen, R. Zhong, S. Zha, G. Karypis, and H. He. paper code
  66. Few-shot tabular data enrichment using fine-tuned transformer architectures, in ACL, 2022. A. Harari, and G. Katz. paper
  67. Noisy channel language model prompting for few-shot text classification, in ACL, 2022. S. Min, M. Lewis, H. Hajishirzi, and L. Zettlemoyer. paper code
  68. Prompt for extraction? PAIE: Prompting argument interaction for event argument extraction, in ACL, 2022. Y. Ma, Z. Wang, Y. Cao, M. Li, M. Chen, K. Wang, and J. Shao. paper code
  69. Are prompt-based models clueless?, in ACL, 2022. P. Kavumba, R. Takahashi, and Y. Oda. paper
  70. Prototypical verbalizer for prompt-based few-shot tuning, in ACL, 2022. G. Cui, S. Hu, N. Ding, L. Huang, and Z. Liu. paper code
  71. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity, in ACL, 2022. Y. Lu, M. Bartolo, A. Moore, S. Riedel, and P. Stenetorp. paper
  72. PPT: Pre-trained prompt tuning for few-shot learning, in ACL, 2022. Y. Gu, X. Han, Z. Liu, and M. Huang. paper code
  73. ASCM: An answer space clustered prompting method without answer engineering, in Findings of ACL, 2022. Z. Wang, Y. Yang, Z. Xi, B. Ma, L. Wang, R. Dong, and A. Anwar. paper code
  74. Exploiting language model prompts using similarity measures: A case study on the word-in-context task, in ACL, 2022. M. Tabasi, K. Rezaee, and M. T. Pilehvar. paper
  75. P-Tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks, in ACL, 2022. X. Liu, K. Ji, Y. Fu, W. Tam, Z. Du, Z. Yang, and J. Tang. paper
  76. Cutting down on prompts and parameters: Simple few-shot learning with language models, in Findings of ACL, 2022. R. L. Logan IV, I. Balazevic, E. Wallace, F. Petroni, S. Singh, and S. Riedel. paper code
  77. Prompt-free and efficient few-shot learning with language models, in ACL, 2022. R. K. Mahabadi, L. Zettlemoyer, J. Henderson, L. Mathias, M. Saeidi, V. Stoyanov, and M. Yazdani. paper code
  78. Pre-training to match for unified low-shot relation extraction, in ACL, 2022. F. Liu, H. Lin, X. Han, B. Cao, and L. Sun. paper code
  79. Dual context-guided continuous prompt tuning for few-shot learning, in Findings of ACL, 2022. J. Zhou, L. Tian, H. Yu, Z. Xiao, H. Su, and J. Zhou. paper
  80. Cluster & tune: Boost cold start performance in text classification, in ACL, 2022. E. Shnarch, A. Gera, A. Halfon, L. Dankin, L. Choshen, R. Aharonov, and N. Slonim. paper code
  81. Pushing the limits of simple pipelines for few-shot learning: External data and fine-tuning make a difference, in CVPR, 2022. S. X. Hu, D. Li, J. Stühmer, M. Kim, and T. M. Hospedales. paper code
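
Many of the language-model entries above (e.g., 34, 38, 61, 72) share a single recipe: rewrite the input as a cloze pattern and let a pretrained masked LM score a small set of label words (the verbalizer). Below is a minimal sketch of that recipe, assuming a HuggingFace `bert-base-uncased` checkpoint and a hand-written pattern and verbalizer; these are illustrative choices, not the exact setup of any paper above.

```python
# Zero-shot prompt-based classification with a masked LM, in the spirit of
# PET / LM-BFF (entries 38 and 34 above). Pattern and verbalizer are
# illustrative choices, not the papers' exact ones.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

VERBALIZER = {"positive": "great", "negative": "terrible"}

def classify(review: str) -> str:
    # Pattern: "<review> It was [MASK]."
    text = f"{review} It was {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits                  # (1, seq_len, vocab)
    # Locate the mask position and compare the label words' logits there.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    scores = {label: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(word)].item()
              for label, word in VERBALIZER.items()}
    return max(scores, key=scores.get)

print(classify("A wonderful, heartfelt film."))          # expected: positive
```

With a handful of labeled examples, the same setup is fine-tuned on the pattern-wrapped inputs instead of used zero-shot; that is essentially the few-shot variant the entries above study.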

Refining Meta-learned Parameters

  1. Model-agnostic meta-learning for fast adaptation of deep networks, in ICML, 2017. C. Finn, P. Abbeel, and S. Levine. paper
  2. Bayesian model-agnostic meta-learning, in NeurIPS, 2018. J. Yoon, T. Kim, O. Dia, S. Kim, Y. Bengio, and S. Ahn. paper
  3. Probabilistic model-agnostic meta-learning, in NeurIPS, 2018. C. Finn, K. Xu, and S. Levine. paper
  4. Gradient-based meta-learning with learned layerwise metric and subspace, in ICML, 2018. Y. Lee and S. Choi. paper
  5. Recasting gradient-based meta-learning as hierarchical Bayes, in ICLR, 2018. E. Grant, C. Finn, S. Levine, T. Darrell, and T. Griffiths. paper
  6. Few-shot human motion prediction via meta-learning, in ECCV, 2018. L.-Y. Gui, Y.-X. Wang, D. Ramanan, and J. Moura. paper
  7. The effects of negative adaptation in model-agnostic meta-learning, arXiv preprint, 2018. T. Deleu and Y. Bengio. paper
  8. Unsupervised meta-learning for few-shot image classification, in NeurIPS, 2019. S. Khodadadeh, L. Bölöni, and M. Shah. paper
  9. Amortized Bayesian meta-learning, in ICLR, 2019. S. Ravi and A. Beatson. paper
  10. Meta-learning with latent embedding optimization, in ICLR, 2019. A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell. paper code
  11. Meta relational learning for few-shot link prediction in knowledge graphs, in EMNLP-IJCNLP, 2019. M. Chen, W. Zhang, W. Zhang, Q. Chen, and H. Chen. paper
  12. Adapting meta knowledge graph information for multi-hop reasoning over few-shot relations, in EMNLP-IJCNLP, 2019. X. Lv, Y. Gu, X. Han, L. Hou, J. Li, and Z. Liu. paper
  13. LGM-Net: Learning to generate matching networks for few-shot learning, in ICML, 2019. H. Li, W. Dong, X. Mei, C. Ma, F. Huang, and B.-G. Hu. paper code
  14. Meta R-CNN: Towards general solver for instance-level low-shot learning, in ICCV, 2019. X. Yan, Z. Chen, A. Xu, X. Wang, X. Liang, and L. Lin. paper
  15. Task agnostic meta-learning for few-shot learning, in CVPR, 2019. M. A. Jamal, and G.-J. Qi. paper
  16. Meta-transfer learning for few-shot learning, in CVPR, 2019. Q. Sun, Y. Liu, T.-S. Chua, and B. Schiele. paper code
  17. Meta-learning of neural architectures for few-shot learning, in CVPR, 2020. T. Elsken, B. Staffler, J. H. Metzen, and F. Hutter. paper
  18. Attentive weights generation for few shot learning via information maximization, in CVPR, 2020. Y. Guo, and N.-M. Cheung. paper
  19. Few-shot open-set recognition using meta-learning, in CVPR, 2020. B. Liu, H. Kang, H. Li, G. Hua, and N. Vasconcelos. paper
  20. Incremental few-shot object detection, in CVPR, 2020. J.-M. Perez-Rua, X. Zhu, T. M. Hospedales, and T. Xiang. paper
  21. Automated relational meta-learning, in ICLR, 2020. H. Yao, X. Wu, Z. Tao, Y. Li, B. Ding, R. Li, and Z. Li. paper
  22. Meta-learning with warped gradient descent, in ICLR, 2020. S. Flennerhag, A. A. Rusu, R. Pascanu, F. Visin, H. Yin, and R. Hadsell. paper
  23. Meta-learning without memorization, in ICLR, 2020. M. Yin, G. Tucker, M. Zhou, S. Levine, and C. Finn. paper
  24. ES-MAML: Simple Hessian-free meta learning, in ICLR, 2020. X. Song, W. Gao, Y. Yang, K. Choromanski, A. Pacchiano, and Y. Tang. paper
  25. Self-supervised tuning for few-shot segmentation, in IJCAI, 2020. K. Zhu, W. Zhai, and Y. Cao. paper
  26. Multi-attention meta learning for few-shot fine-grained image recognition, in IJCAI, 2020. Y. Zhu, C. Liu, and S. Jiang. paper
  27. An ensemble of epoch-wise empirical Bayes for few-shot learning, in ECCV, 2020. Y. Liu, B. Schiele, and Q. Sun. paper code
  28. Incremental few-shot meta-learning via indirect discriminant alignment, in ECCV, 2020. Q. Liu, O. Majumder, A. Achille, A. Ravichandran, R. Bhotika, and S. Soatto. paper
  29. Model-agnostic boundary-adversarial sampling for test-time generalization in few-shot learning, in ECCV, 2020. J. Kim, H. Kim, and G. Kim. paper code
  30. Bayesian meta-learning for the few-shot setting via deep kernels, in NeurIPS, 2020. M. Patacchiola, J. Turner, E. J. Crowley, M. O’Boyle, and A. J. Storkey. paper code
  31. OOD-MAML: Meta-learning for few-shot out-of-distribution detection and classification, in NeurIPS, 2020. T. Jeong, and H. Kim. paper code
  32. Unraveling meta-learning: Understanding feature representations for few-shot tasks, in ICML, 2020. M. Goldblum, S. Reich, L. Fowl, R. Ni, V. Cherepanova, and T. Goldstein. paper code
  33. Node classification on graphs with few-shot novel labels via meta transformed network embedding, in NeurIPS, 2020. L. Lan, P. Wang, X. Du, K. Song, J. Tao, and X. Guan. paper
  34. Adversarially robust few-shot learning: A meta-learning approach, in NeurIPS, 2020. M. Goldblum, L. Fowl, and T. Goldstein. paper code
  35. BOIL: Towards representation change for few-shot learning, in ICLR, 2021. J. Oh, H. Yoo, C. Kim, and S. Yun. paper code
  36. Few-shot open-set recognition by transformation consistency, in CVPR, 2021. M. Jeong, S. Choi, and C. Kim. paper
  37. Improving generalization in meta-learning via task augmentation, in ICML, 2021. H. Yao, L. Huang, L. Zhang, Y. Wei, L. Tian, J. Zou, J. Huang, and Z. Li. paper
  38. A representation learning perspective on the importance of train-validation splitting in meta-learning, in ICML, 2021. N. Saunshi, A. Gupta, and W. Hu. paper code
  39. Data augmentation for meta-learning, in ICML, 2021. R. Ni, M. Goldblum, A. Sharaf, K. Kong, and T. Goldstein. paper code
  40. Task cooperation for semi-supervised few-shot learning, in AAAI, 2021. H. Ye, X. Li, and D.-C. Zhan. paper
  41. Conditional self-supervised learning for few-shot classification, in IJCAI, 2021. Y. An, H. Xue, X. Zhao, and L. Zhang. paper
  42. Cross-domain few-shot classification via adversarial task augmentation, in IJCAI, 2021. H. Wang, and Z.-H. Deng. paper code
  43. DReCa: A general task augmentation strategy for few-shot natural language inference, in NAACL-HLT, 2021. S. Murty, T. Hashimoto, and C. D. Manning. paper
  44. MetaXL: Meta representation transformation for low-resource cross-lingual learning, in NAACL-HLT, 2021. M. Xia, G. Zheng, S. Mukherjee, M. Shokouhi, G. Neubig, and A. H. Awadallah. paper code
  45. Meta-learning with task-adaptive loss function for few-shot learning, in ICCV, 2021. S. Baik, J. Choi, H. Kim, D. Cho, J. Min, and K. M. Lee. paper code
  46. Meta-Baseline: Exploring simple meta-learning for few-shot learning, in ICCV, 2021. Y. Chen, Z. Liu, H. Xu, T. Darrell, and X. Wang. paper
  47. A lazy approach to long-horizon gradient-based meta-learning, in ICCV, 2021. M. A. Jamal, L. Wang, and B. Gong. paper
  48. Task-aware part mining network for few-shot learning, in ICCV, 2021. J. Wu, T. Zhang, Y. Zhang, and F. Wu. paper
  49. Binocular mutual learning for improving few-shot classification, in ICCV, 2021. Z. Zhou, X. Qiu, J. Xie, J. Wu, and C. Zhang. paper code
  50. Meta-learning with an adaptive task scheduler, in NeurIPS, 2021. H. Yao, Y. Wang, Y. Wei, P. Zhao, M. Mahdavi, D. Lian, and C. Finn. paper
  51. Memory efficient meta-learning with large images, in NeurIPS, 2021. J. Bronskill, D. Massiceti, M. Patacchiola, K. Hofmann, S. Nowozin, and R. Turner. paper
  52. EvoGrad: Efficient gradient-based meta-learning and hyperparameter optimization, in NeurIPS, 2021. O. Bohdal, Y. Yang, and T. Hospedales. paper
  53. Towards enabling meta-learning from target models, in NeurIPS, 2021. S. Lu, H. Ye, L. Gan, and D. Zhan. paper
  54. The role of global labels in few-shot classification and how to infer them, in NeurIPS, 2021. R. Wang, M. Pontil, and C. Ciliberto. paper
  55. How to train your MAML to excel in few-shot classification, in ICLR, 2022. H. Ye, and W. Chao. paper code
  56. Meta-learning with fewer tasks through task interpolation, in ICLR, 2022. H. Yao, L. Zhang, and C. Finn. paper code
  57. Continuous-time meta-learning with forward mode differentiation, in ICLR, 2022. T. Deleu, D. Kanaa, L. Feng, G. Kerg, Y. Bengio, G. Lajoie, and P. Bacon. paper
  58. Bootstrapped meta-learning, in ICLR, 2022. S. Flennerhag, Y. Schroecker, T. Zahavy, H. v. Hasselt, D. Silver, and S. Singh. paper
  59. Learning prototype-oriented set representations for meta-learning, in ICLR, 2022. D. d. Guo, L. Tian, M. Zhang, M. Zhou, and H. Zha. paper
  60. Dynamic kernel selection for improved generalization and memory efficiency in meta-learning, in CVPR, 2022. A. Chavan, R. Tiwari, U. Bamba, and D. K. Gupta. paper code
  61. What matters for meta-learning vision regression tasks?, in CVPR, 2022. N. Gao, H. Ziesche, N. A. Vien, M. Volpp, and G. Neumann. paper code
  62. Multidimensional belief quantification for label-efficient meta-learning, in CVPR, 2022. D. S. Pandey, and Q. Yu. paper
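
Most entries in this subsection adapt parameters whose initialization is itself meta-learned; the canonical instance is MAML (entry 1), which differentiates the query-set loss through the inner adaptation step. A minimal sketch of one second-order MAML meta-update follows, assuming a toy sine-regression task family and a two-layer MLP; both are illustrative placeholders, not the paper's exact setup.

```python
# Minimal sketch of second-order MAML (entry 1 above) on a toy task family.
# Model, task distribution, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(), nn.Linear(40, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr, loss_fn = 0.01, nn.MSELoss()

def sample_task():
    # Task = regress y = a*sin(x + b) with random amplitude a and phase b.
    a, b = torch.rand(1) * 4 + 0.1, torch.rand(1) * 3.14
    def draw(n):
        x = torch.rand(n, 1) * 10 - 5
        return x, a * torch.sin(x + b)
    return draw

def forward_with(x, params):
    # Manual forward pass so we can evaluate adapted (non-module) parameters.
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1.t() + b1) @ w2.t() + b2

for step in range(1000):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                               # meta-batch of 4 tasks
        draw = sample_task()
        xs, ys = draw(10)                            # support set
        xq, yq = draw(10)                            # query set, same task
        params = list(model.parameters())
        # Inner loop: one gradient step on the support set; create_graph=True
        # keeps the graph so the outer update differentiates through this step.
        grads = torch.autograd.grad(
            loss_fn(forward_with(xs, params), ys), params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer objective: loss of the adapted parameters on the query set.
        meta_loss = meta_loss + loss_fn(forward_with(xq, adapted), yq)
    meta_loss.backward()                             # backprop through inner step
    meta_opt.step()
```

First-order variants simply drop `create_graph=True`, treating the inner gradients as constants and trading some accuracy for a much cheaper outer step.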

Learning Search Steps

  1. Optimization as a model for few-shot learning, in ICLR, 2017. S. Ravi and H. Larochelle. paper code
  2. Meta Navigator: Search for a good adaptation policy for few-shot learning, in ICCV, 2021. C. Zhang, H. Ding, G. Lin, R. Li, C. Wang, and C. Shen. paper
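
Here it is the search procedure itself that is learned: a meta-learner (an LSTM in entry 1) proposes each parameter update for the task-specific learner, replacing a hand-designed rule such as SGD. Below is a minimal coordinatewise sketch, assuming a single LSTM cell shared across all parameters and a toy quadratic objective; these are heavy simplifications of the paper's setup, which treats the learner's parameters as the LSTM's cell state and preprocesses gradients.

```python
# Minimal sketch of a coordinatewise learned optimizer, in the spirit of
# "Optimization as a model for few-shot learning" (entry 1 above).
# Architecture, objective, and unroll length are illustrative choices.
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    """Shares one LSTM cell across all coordinates: gradient in, update out."""
    def __init__(self, hidden=20):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden)
        self.out = nn.Linear(hidden, 1)

    def step(self, grad, state):
        # grad: (num_params, 1); each coordinate is an independent batch item.
        h, c = self.cell(grad, state)
        return self.out(h), (h, c)              # proposed per-coordinate update

opt_net = LearnedOptimizer()
theta = torch.randn(5, 1)                       # learner parameters
state = (torch.zeros(5, 20), torch.zeros(5, 20))
for t in range(5):                              # unroll a short optimization run
    grad = 2 * theta                            # analytic gradient of ||theta||^2
    update, state = opt_net.step(grad, state)
    theta = theta + update                      # learner takes the proposed step
# Meta-training signal: backprop the final loss through the whole unrolled
# trajectory into the optimizer's own parameters.
(theta ** 2).sum().backward()
torch.optim.SGD(opt_net.parameters(), lr=1e-2).step()
```

Meta-training repeats such unrolls across sampled tasks so that the learned rule generalizes to new few-shot problems.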

Applications

Computer Vision

  1. Learning robust visual-semantic embeddings, in CVPR, 2017. Y.-H. Tsai, L.-K. Huang, and R. Salakhutdinov. paper
  2. One-shot action localization by learning sequence matching network, in CVPR, 2018. H. Yang, X. He, and F. Porikli. paper
  3. Incremental few-shot learning for pedestrian attribute recognition, in EMNLP, 2018. L. Xiang, X. Jin, G. Ding, J. Han, and L. Li. paper
  4. Few-shot video-to-video synthesis, in NeurIPS, 2019. T.-C. Wang, M.-Y. Liu, A. Tao, G. Liu, J. Kautz, and B. Catanzaro. paper code
  5. Few-shot object detection via feature reweighting, in ICCV, 2019. B. Kang, Z. Liu, X. Wang, F. Yu, J. Feng, and T. Darrell. paper code
  6. Few-shot unsupervised image-to-image translation, in ICCV, 2019. M.-Y. Liu, X. Huang, A. Mallya, T. Karras, T. Aila, J. Lehtinen, and J. Kautz. paper code
  7. Feature weighting and boosting for few-shot segmentation, in ICCV, 2019. K. Nguyen, and S. Todorovic. paper
  8. Few-shot adaptive gaze estimation, in ICCV, 2019. S. Park, S. D. Mello, P. Molchanov, U. Iqbal, O. Hilliges, and J. Kautz. paper
  9. AMP: Adaptive masked proxies for few-shot segmentation, in ICCV, 2019. M. Siam, B. N. Oreshkin, and M. Jagersand. paper code
  10. Few-shot generalization for single-image 3D reconstruction via priors, in ICCV, 2019. B. Wallace, and B. Hariharan. paper
  11. Few-shot adversarial learning of realistic neural talking head models, in ICCV, 2019. E. Zakharov, A. Shysheya, E. Burkov, and V. Lempitsky. paper code
  12. Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation, in ICCV, 2019. C. Zhang, G. Lin, F. Liu, J. Guo, Q. Wu, and R. Yao. paper
  13. Time-conditioned action anticipation in one shot, in CVPR, 2019. Q. Ke, M. Fritz, and B. Schiele. paper
  14. Few-shot learning with localization in realistic settings, in CVPR, 2019. D. Wertheimer, and B. Hariharan. paper code
  15. Improving few-shot user-specific gaze adaptation via gaze redirection synthesis, in CVPR, 2019. Y. Yu, G. Liu, and J.-M. Odobez. paper
  16. CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning, in CVPR, 2019. C. Zhang, G. Lin, F. Liu, R. Yao, and C. Shen. paper code
  17. Multi-level semantic feature augmentation for one-shot learning, in TIP, 2019. Z. Chen, Y. Fu, Y. Zhang, Y.-G. Jiang, X. Xue, and L. Sigal. paper code
  18. Few-shot pill recognition, in CVPR, 2020. S. Ling, A. Pastor, J. Li, Z. Che, J. Wang, J. Kim, and P. L. Callet. paper
  19. LT-Net: Label transfer by learning reversible voxel-wise correspondence for one-shot medical image segmentation, in CVPR, 2020. S. Wang, S. Cao, D. Wei, R. Wang, K. Ma, L. Wang, D. Meng, and Y. Zheng. paper
  20. 3FabRec: Fast few-shot face alignment by reconstruction, in CVPR, 2020. B. Browatzki, and C. Wallraven. paper
  21. Few-shot video classification via temporal alignment, in CVPR, 2020. K. Cao, J. Ji, Z. Cao, C.-Y. Chang, and J. C. Niebles. paper
  22. One-shot adversarial attacks on visual tracking with dual attention, in CVPR, 2020. X. Chen, X. Yan, F. Zheng, Y. Jiang, S.-T. Xia, Y. Zhao, and R. Ji. paper
  23. FGN: Fully guided network for few-shot instance segmentation, in CVPR, 2020. Z. Fan, J.-G. Yu, Z. Liang, J. Ou, C. Gao, G.-S. Xia, and Y. Li. paper
  24. CRNet: Cross-reference networks for few-shot segmentation, in CVPR, 2020. W. Liu, C. Zhang, G. Lin, and F. Liu. paper
  25. Revisiting pose-normalization for fine-grained few-shot recognition, in CVPR, 2020. L. Tang, D. Wertheimer, and B. Hariharan. paper
  26. Few-shot learning of part-specific probability space for 3D shape segmentation, in CVPR, 2020. L. Wang, X. Li, and Y. Fang. paper
  27. Semi-supervised learning for few-shot image-to-image translation, in CVPR, 2020. Y. Wang, S. Khan, A. Gonzalez-Garcia, J. van de Weijer, and F. S. Khan. paper
  28. Multi-domain learning for accurate and few-shot color constancy, in CVPR, 2020. J. Xiao, S. Gu, and L. Zhang. paper
  29. One-shot domain adaptation for face generation, in CVPR, 2020. C. Yang, and S.-N. Lim. paper
  30. MetaPix: Few-shot video retargeting, in ICLR, 2020. J. Lee, D. Ramanan, and R. Girdhar. paper
  31. Few-shot human motion prediction via learning novel motion dynamics, in IJCAI, 2020. C. Zang, M. Pei, and Y. Kong. paper
  32. Shaping visual representations with language for few-shot classification, in ACL, 2020. J. Mu, P. Liang, and N. D. Goodman. paper
  33. MarioNETte: Few-shot face reenactment preserving identity of unseen targets, in AAAI, 2020. S. Ha, M. Kersner, B. Kim, S. Seo, and D. Kim. paper
  34. One-shot learning for long-tail visual relation detection, in AAAI, 2020. W. Wang, M. Wang, S. Wang, G. Long, L. Yao, G. Qi, and Y. Chen. paper code
  35. Differentiable meta-learning model for few-shot semantic segmentation, in AAAI, 2020. P. Tian, Z. Wu, L. Qi, L. Wang, Y. Shi, and Y. Gao. paper
  36. Part-aware prototype network for few-shot semantic segmentation, in ECCV, 2020. Y. Liu, X. Zhang, S. Zhang, and X. He. paper code
  37. Prototype mixture models for few-shot semantic segmentation, in ECCV, 2020. B. Yang, C. Liu, B. Li, J. Jiao, and Q. Ye. paper code
  38. Self-supervision with superpixels: Training few-shot medical image segmentation without annotation, in ECCV, 2020. C. Ouyang, C. Biffi, C. Chen, T. Kart, H. Qiu, and D. Rueckert. paper code
  39. Few-shot action recognition with permutation-invariant attention, in ECCV, 2020. H. Zhang, L. Zhang, X. Qi, H. Li, P. H. S. Torr, and P. Koniusz. paper
  40. Few-shot compositional font generation with dual memory, in ECCV, 2020. J. Cha, S. Chun, G. Lee, B. Lee, S. Kim, and H. Lee. paper code
  41. Few-shot object detection and viewpoint estimation for objects in the wild, in ECCV, 2020. Y. Xiao, and R. Marlet. paper
  42. Few-shot scene-adaptive anomaly detection, in ECCV, 2020. Y. Lu, F. Yu, M. K. K. Reddy, and Y. Wang. paper code
  43. Few-shot semantic segmentation with democratic attention networks, in ECCV, 2020. H. Wang, X. Zhang, Y. Hu, Y. Yang, X. Cao, and X. Zhen. paper
  44. Few-shot single-view 3-D object reconstruction with compositional priors, in ECCV, 2020. M. Michalkiewicz, S. Parisot, S. Tsogkas, M. Baktashmotlagh, A. Eriksson, and E. Belilovsky. paper
  45. COCO-FUNIT: Few-shot unsupervised image translation with a content conditioned style encoder, in ECCV, 2020. K. Saito, K. Saenko, and M. Liu. paper code
  46. Deep complementary joint model for complex scene registration and few-shot segmentation on medical images, in ECCV, 2020. Y. He, T. Li, G. Yang, Y. Kong, Y. Chen, H. Shu, J. Coatrieux, J. Dillenseger, and S. Li. paper
  47. Multi-scale positive sample refinement for few-shot object detection, in ECCV, 2020. J. Wu, S. Liu, D. Huang, and Y. Wang. paper code
  48. Large-scale few-shot learning via multi-modal knowledge discovery, in ECCV, 2020. S. Wang, J. Yue, J. Liu, Q. Tian, and M. Wang. paper
  49. Graph convolutional networks for learning with few clean and many noisy labels, in ECCV, 2020. A. Iscen, G. Tolias, Y. Avrithis, O. Chum, and C. Schmid. paper
  50. Self-supervised few-shot learning on point clouds, in NeurIPS, 2020. C. Sharma, and M. Kaul. paper code
  51. Restoring negative information in few-shot object detection, in NeurIPS, 2020. Y. Yang, F. Wei, M. Shi, and G. Li. paper code
  52. Few-shot image generation with elastic weight consolidation, in NeurIPS, 2020. Y. Li, R. Zhang, J. Lu, and E. Shechtman. paper
  53. Few-shot visual reasoning with meta-analogical contrastive learning, in NeurIPS, 2020. Y. Kim, J. Shin, E. Yang, and S. J. Hwang. paper
  54. CrossTransformers: spatially-aware few-shot transfer, in NeurIPS, 2020. C. Doersch, A. Gupta, and A. Zisserman. paper
  55. Make one-shot video object segmentation efficient again, in NeurIPS, 2020. T. Meinhardt, and L. Leal-Taixé. paper code
  56. Frustratingly simple few-shot object detection, in ICML, 2020. X. Wang, T. E. Huang, J. Gonzalez, T. Darrell, and F. Yu. paper code
  57. Adversarial style mining for one-shot unsupervised domain adaptation, in NeurIPS, 2020. Y. Luo, P. Liu, T. Guan, J. Yu, and Y. Yang. paper code
  58. Disentangling 3D prototypical networks for few-shot concept learning, in ICLR, 2021. M. Prabhudesai, S. Lal, D. Patil, H. Tung, A. W. Harley, and K. Fragkiadaki. paper
  59. Learning normal dynamics in videos with meta prototype network, in CVPR, 2021. H. Lv, C. Chen, Z. Cui, C. Xu, Y. Li, and J. Yang. paper code
  60. Learning dynamic alignment via meta-filter for few-shot learning, in CVPR, 2021. C. Xu, Y. Fu, C. Liu, C. Wang, J. Li, F. Huang, L. Zhang, and X. Xue. paper
  61. Delving deep into many-to-many attention for few-shot video object segmentation, in CVPR, 2021. H. Chen, H. Wu, N. Zhao, S. Ren, and S. He. paper code
  62. Adaptive prototype learning and allocation for few-shot segmentation, in CVPR, 2021. G. Li, V. Jampani, L. Sevilla-Lara, D. Sun, J. Kim, and J. Kim. paper code
  63. FAPIS: A few-shot anchor-free part-based instance segmenter, in CVPR, 2021. K. Nguyen, and S. Todorovic. paper
  64. FSCE: Few-shot object detection via contrastive proposal encoding, in CVPR, 2021. B. Sun, B. Li, S. Cai, Y. Yuan, and C. Zhang. paper code
  65. Few-shot 3D point cloud semantic segmentation, in CVPR, 2021. N. Zhao, T. Chua, and G. H. Lee. paper code
  66. Generalized few-shot object detection without forgetting, in CVPR, 2021. Z. Fan, Y. Ma, Z. Li, and J. Sun. paper
  67. Few-shot human motion transfer by personalized geometry and texture modeling, in CVPR, 2021. Z. Huang, X. Han, J. Xu, and T. Zhang. paper code
  68. Labeled from unlabeled: Exploiting unlabeled data for few-shot deep HDR deghosting, in CVPR, 2021. K. R. Prabhakar, G. Senthil, S. Agrawal, R. V. Babu, and R. K. S. S. Gorthi. paper
  69. Few-shot transformation of common actions into time and space, in CVPR, 2021. P. Yang, P. Mettes, and C. G. M. Snoek. paper code
  70. Temporal-relational CrossTransformers for few-shot action recognition, in CVPR, 2021. T. Perrett, A. Masullo, T. Burghardt, M. Mirmehdi, and D. Damen. paper
  71. pixelNeRF: Neural radiance fields from one or few images, in CVPR, 2021. A. Yu, V. Ye, M. Tancik, and A. Kanazawa. paper code
  72. Hallucination improves few-shot object detection, in CVPR, 2021. W. Zhang, and Y. Wang. paper
  73. Few-shot object detection via classification refinement and distractor retreatment, in CVPR, 2021. Y. Li, H. Zhu, Y. Cheng, W. Wang, C. S. Teo, C. Xiang, P. Vadakkepat, and T. H. Lee. paper
  74. Dense relation distillation with context-aware aggregation for few-shot object detection, in CVPR, 2021. H. Hu, S. Bai, A. Li, J. Cui, and L. Wang. paper code
  75. Few-shot segmentation without meta-learning: A good transductive inference is all you need?, in CVPR, 2021. M. Boudiaf, H. Kervadec, Z. I. Masud, P. Piantanida, I. B. Ayed, and J. Dolz. paper code
  76. Few-shot image generation via cross-domain correspondence, in CVPR, 2021. U. Ojha, Y. Li, J. Lu, A. A. Efros, Y. J. Lee, E. Shechtman, and R. Zhang. paper
  77. Self-guided and cross-guided learning for few-shot segmentation, in CVPR, 2021. B. Zhang, J. Xiao, and T. Qin. paper code
  78. Anti-aliasing semantic reconstruction for few-shot semantic segmentation, in CVPR, 2021. B. Liu, Y. Ding, J. Jiao, X. Ji, and Q. Ye. paper
  79. Beyond max-margin: Class margin equilibrium for few-shot object detection, in CVPR, 2021. B. Li, B. Yang, C. Liu, F. Liu, R. Ji, and Q. Ye. paper code
  80. Incremental few-shot instance segmentation, in CVPR, 2021. D. A. Ganea, B. Boom, and R. Poppe. paper code
  81. Scale-aware graph neural network for few-shot semantic segmentation, in CVPR, 2021. G. Xie, J. Liu, H. Xiong, and L. Shao. paper
  82. Semantic relation reasoning for shot-stable few-shot object detection, in CVPR, 2021. C. Zhu, F. Chen, U. Ahmed, Z. Shen, and M. Savvides. paper
  83. Accurate few-shot object detection with support-query mutual guidance and hybrid loss, in CVPR, 2021. L. Zhang, S. Zhou, J. Guan, and J. Zhang. paper
  84. Transformation invariant few-shot object detection, in CVPR, 2021. A. Li, and Z. Li. paper
  85. MetaHTR: Towards writer-adaptive handwritten text recognition, in CVPR, 2021. A. K. Bhunia, S. Ghose, A. Kumar, P. N. Chowdhury, A. Sain, and Y. Song. paper
  86. What if we only use real datasets for scene text recognition? Toward scene text recognition with fewer labels, in CVPR, 2021. J. Baek, Y. Matsui, and K. Aizawa. paper code
  87. Few-shot font generation with localized style representations and factorization, in AAAI, 2021. S. Park, S. Chun, J. Cha, B. Lee, and H. Shim. paper code
  88. Attributes-guided and pure-visual attention alignment for few-shot recognition, in AAAI, 2021. S. Huang, M. Zhang, Y. Kang, and D. Wang. paper code
  89. One-shot face reenactment using appearance adaptive normalization, in AAAI, 2021. G. Yao, Y. Yuan, T. Shao, S. Li, S. Liu, Y. Liu, M. Wang, and K. Zhou. paper
  90. FL-MSRE: A few-shot learning based approach to multimodal social relation extraction, in AAAI, 2021. H. Wan, M. Zhang, J. Du, Z. Huang, Y. Yang, and J. Z. Pan. paper code
  91. StarNet: Towards weakly supervised few-shot object detection, in AAAI, 2021. L. Karlinsky, J. Shtok, A. Alfassy, M. Lichtenstein, S. Harary, E. Schwartz, S. Doveh, P. Sattigeri, R. Feris, A. Bronstein, and R. Giryes. paper code
  92. Progressive one-shot human parsing, in AAAI, 2021. H. He, J. Zhang, B. Thuraisingham, and D. Tao. paper code
  93. Knowledge is power: Hierarchical-knowledge embedded meta-learning for visual reasoning in artistic domains, in KDD, 2021. W. Zheng, L. Yan, C. Gou, and F.-Y. Wang. paper
  94. MEDA: Meta-learning with data augmentation for few-shot text classification, in IJCAI, 2021. P. Sun, Y. Ouyang, W. Zhang, and X.-Y. Dai. paper
  95. Learning implicit temporal alignment for few-shot video classification, in IJCAI, 2021. S. Zhang, J. Zhou, and X. He. paper code
  96. Few-shot neural human performance rendering from sparse RGBD videos, in IJCAI, 2021. A. Pang, X. Chen, H. Luo, M. Wu, J. Yu, and L. Xu. paper
  97. Uncertainty-aware few-shot image classification, in IJCAI, 2021. Z. Zhang, C. Lan, W. Zeng, Z. Chen, and S. Chan. paper
  98. Few-shot learning with part discovery and augmentation from unlabeled images, in IJCAI, 2021. W. Chen, C. Si, W. Wang, L. Wang, Z. Wang, and T. Tan. paper
  99. Few-shot partial-label learning, in IJCAI, 2021. Y. Zhao, G. Yu, L. Liu, Z. Yan, L. Cui, and C. Domeniconi. paper
  100. One-shot affordance detection, in IJCAI, 2021. H. Luo, W. Zhai, J. Zhang, Y. Cao, and D. Tao. paper
  101. DeFRCN: Decoupled faster R-CNN for few-shot object detection, in ICCV, 2021. L. Qiao, Y. Zhao, Z. Li, X. Qiu, J. Wu, and C. Zhang. paper
  102. Learning meta-class memory for few-shot semantic segmentation, in ICCV, 2021. Z. Wu, X. Shi, G. Lin, and J. Cai. paper
  103. UVStyle-Net: Unsupervised few-shot learning of 3D style similarity measure for B-Reps, in ICCV, 2021. P. Meltzer, H. Shayani, A. Khasahmadi, P. K. Jayaraman, A. Sanghi, and J. Lambourne. paper
  104. LoFGAN: Fusing local representations for few-shot image generation, in ICCV, 2021. Z. Gu, W. Li, J. Huo, L. Wang, and Y. Gao. paper
  105. Recurrent mask refinement for few-shot medical image segmentation, in ICCV, 2021. H. Tang, X. Liu, S. Sun, X. Yan, and X. Xie. paper code
  106. H3D-Net: Few-shot high-fidelity 3D head reconstruction, in ICCV, 2021. E. Ramon, G. Triginer, J. Escur, A. Pumarola, J. Garcia, X. Giró-i-Nieto, and F. Moreno-Noguer. paper
  107. Learned spatial representations for few-shot talking-head synthesis, in ICCV, 2021. M. Meshry, S. Suri, L. S. Davis, and A. Shrivastava. paper
  108. Putting NeRF on a diet: Semantically consistent few-shot view synthesis, in ICCV, 2021. A. Jain, M. Tancik, and P. Abbeel. paper
  109. Hypercorrelation squeeze for few-shot segmentation, in ICCV, 2021. J. Min, D. Kang, and M. Cho. paper code
  110. Few-shot semantic segmentation with cyclic memory network, in ICCV, 2021. G. Xie, H. Xiong, J. Liu, Y. Yao, and L. Shao. paper
  111. Simpler is better: Few-shot semantic segmentation with classifier weight transformer, in ICCV, 2021. Z. Lu, S. He, X. Zhu, L. Zhang, Y. Song, and T. Xiang. paper code
  112. Unsupervised few-shot action recognition via action-appearance aligned meta-adaptation, in ICCV, 2021. J. Patravali, G. Mittal, Y. Yu, F. Li, and M. Chen. paper
  113. Multiple heads are better than one: few-shot font generation with multiple localized experts, in ICCV, 2021. S. Park, S. Chun, J. Cha, B. Lee, and H. Shim. paper code
  114. Mining latent classes for few-shot segmentation, in ICCV, 2021. L. Yang, W. Zhuo, L. Qi, Y. Shi, and Y. Gao. paper code
  115. Partner-assisted learning for few-shot image classification, in ICCV, 2021. J. Ma, H. Xie, G. Han, S. Chang, A. Galstyan, and W. Abd-Almageed. paper
  116. Hierarchical graph attention network for few-shot visual-semantic learning, in ICCV, 2021. C. Yin, K. Wu, Z. Che, B. Jiang, Z. Xu, and J. Tang. paper
  117. Video pose distillation for few-shot, fine-grained sports action recognition, in ICCV, 2021. J. Hong, M. Fisher, M. Gharbi, and K. Fatahalian. paper
  118. Universal-prototype enhancing for few-shot object detection, in ICCV, 2021. A. Wu, Y. Han, L. Zhu, and Y. Yang. paper code
  119. Query adaptive few-shot object detection with heterogeneous graph convolutional networks, in ICCV, 2021. G. Han, Y. He, S. Huang, J. Ma, and S. Chang. paper
  120. Few-shot visual relationship co-localization, in ICCV, 2021. R. Teotia, V. Mishra, M. Maheshwari, and A. Mishra. paper code
  121. Shallow Bayesian meta learning for real-world few-shot recognition, in ICCV, 2021. X. Zhang, D. Meng, H. Gouk, and T. M. Hospedales. paper code
  122. Super-resolving cross-domain face miniatures by peeking at one-shot exemplar, in ICCV, 2021. P. Li, X. Yu, and Y. Yang. paper
  123. Few-shot segmentation via cycle-consistent transformer, in NeurIPS, 2021. G. Zhang, G. Kang, Y. Yang, and Y. Wei. paper
  124. Generalized and discriminative few-shot object detection via SVD-dictionary enhancement, in NeurIPS, 2021. A. Wu, S. Zhao, C. Deng, and W. Liu. paper
  125. Re-ranking for image retrieval and transductive few-shot classification, in NeurIPS, 2021. X. Shen, Y. Xiao, S. Hu, O. Sbai, and M. Aubry. paper
  126. Neural view synthesis and matching for semi-supervised few-shot learning of 3D pose, in NeurIPS, 2021. A. Wang, S. Mei, A. L. Yuille, and A. Kortylewski. paper
  127. MetaAvatar: Learning animatable clothed human models from few depth images, in NeurIPS, 2021. S. Wang, M. Mihajlovic, Q. Ma, A. Geiger, and S. Tang. paper
  128. Few-shot object detection via association and discrimination, in NeurIPS, 2021. Y. Cao, J. Wang, Y. Jin, T. Wu, K. Chen, Z. Liu, and D. Lin. paper
  129. Rectifying the shortcut learning of background for few-shot learning, in NeurIPS, 2021. X. Luo, L. Wei, L. Wen, J. Yang, L. Xie, Z. Xu, and Q. Tian. paper
  130. D2C: Diffusion-decoding models for few-shot conditional generation, in NeurIPS, 2021. A. Sinha, J. Song, C. Meng, and S. Ermon. paper
  131. Few-shot backdoor attacks on visual object tracking, in ICLR, 2022. Y. Li, H. Zhong, X. Ma, Y. Jiang, and S. Xia. paper code
  132. Temporal alignment prediction for supervised representation learning and few-shot sequence classification, in ICLR, 2022. B. Su, and J. Wen. paper code
  133. Learning non-target knowledge for few-shot semantic segmentation, in CVPR, 2022. Y. Liu, N. Liu, Q. Cao, X. Yao, J. Han, and L. Shao. paper
  134. Learning what not to segment: A new perspective on few-shot segmentation, in CVPR, 2022. C. Lang, G. Cheng, B. Tu, and J. Han. paper code
  135. Few-shot keypoint detection with uncertainty learning for unseen species, in CVPR, 2022. C. Lu, and P. Koniusz. paper
  136. XMP-Font: Self-supervised cross-modality pre-training for few-shot font generation, in CVPR, 2022. W. Liu, F. Liu, F. Ding, Q. He, and Z. Yi. paper
  137. Spatio-temporal relation modeling for few-shot action recognition, in CVPR, 2022. A. Thatipelli, S. Narayan, S. Khan, R. M. Anwer, F. S. Khan, and B. Ghanem. paper code
  138. Attribute group editing for reliable few-shot image generation, in CVPR, 2022. G. Ding, X. Han, S. Wang, S. Wu, X. Jin, D. Tu, and Q. Huang. paper code
  139. Few-shot backdoor defense using Shapley estimation, in CVPR, 2022. J. Guan, Z. Tu, R. He, and D. Tao. paper
  140. Hybrid relation guided set matching for few-shot action recognition, in CVPR, 2022. X. Wang, S. Zhang, Z. Qing, M. Tang, Z. Zuo, C. Gao, R. Jin, and N. Sang. paper code
  141. Label, verify, correct: A simple few shot object detection method, in CVPR, 2022. P. Kaul, W. Xie, and A. Zisserman. paper
  142. InfoNeRF: Ray entropy minimization for few-shot neural volume rendering, in CVPR, 2022. M. Kim, S. Seo, and B. Han. paper
  143. A closer look at few-shot image generation, in CVPR, 2022. Y. Zhao, H. Ding, H. Huang, and N. Cheung. paper code
  144. Motion-modulated temporal fragment alignment network for few-shot action recognition, in CVPR, 2022. J. Wu, T. Zhang, Z. Zhang, F. Wu, and Y. Zhang. paper
  145. Kernelized few-shot object detection with efficient integral aggregation, in CVPR, 2022. S. Zhang, L. Wang, N. Murray, and P. Koniusz. paper code
  146. FS6D: Few-shot 6D pose estimation of novel objects, in CVPR, 2022. Y. He, Y. Wang, H. Fan, J. Sun, and Q. Chen. paper
  147. Look closer to supervise better: One-shot font generation via component-based discriminator, in CVPR, 2022. Y. Kong, C. Luo, W. Ma, Q. Zhu, S. Zhu, N. Yuan, and L. Jin. paper
  148. Generalized few-shot semantic segmentation, in CVPR, 2022. Z. Tian, X. Lai, L. Jiang, S. Liu, M. Shu, H. Zhao, and J. Jia. paper code
  149. Which images to label for few-shot medical landmark detection?, in CVPR, 2022. Q. Quan, Q. Yao, J. Li, and S. K. Zhou. paper
  150. Dynamic prototype convolution network for few-shot semantic segmentation, in CVPR, 2022. J. Liu, Y. Bao, G. Xie, H. Xiong, J. Sonke, and E. Gavves. paper
  151. OSOP: A multi-stage one shot object pose estimation framework, in CVPR, 2022. I. Shugurov, F. Li, B. Busam, and S. Ilic. paper
  152. Semantic-aligned fusion transformer for one-shot object detection, in CVPR, 2022. Y. Zhao, X. Guo, and Y. Lu. paper
  153. OnePose: One-shot object pose estimation without CAD models, in CVPR, 2022. J. Sun, Z. Wang, S. Zhang, X. He, H. Zhao, G. Zhang, and X. Zhou. paper code
  154. Few-shot object detection with fully cross-transformer, in CVPR, 2022. G. Han, J. Ma, S. Huang, L. Chen, and S. Chang. paper
  155. Learning to memorize feature hallucination for one-shot image generation, in CVPR, 2022. Y. Xie, Y. Fu, Y. Tai, Y. Cao, J. Zhu, and C. Wang. paper
  156. Few-shot font generation by learning fine-grained local styles, in CVPR, 2022. L. Tang, Y. Cai, J. Liu, Z. Hong, M. Gong, M. Fan, J. Han, J. Liu, E. Ding, and J. Wang. paper
  157. Balanced and hierarchical relation learning for one-shot object detection, in CVPR, 2022. H. Yang, S. Cai, H. Sheng, B. Deng, J. Huang, X. Hua, Y. Tang, and Y. Zhang. paper
  158. Few-shot head swapping in the wild, in CVPR, 2022. C. Shu, H. Wu, H. Zhou, J. Liu, Z. Hong, C. Ding, J. Han, J. Liu, E. Ding, and J. Wang. paper
  159. Integrative few-shot learning for classification and segmentation, in CVPR, 2022. D. Kang, and M. Cho. paper
  160. Attribute surrogates learning and spectral tokens pooling in transformers for few-shot learning, in CVPR, 2022. Y. He, W. Liang, D. Zhao, H. Zhou, W. Ge, Y. Yu, and W. Zhang. paper code
  161. Task discrepancy maximization for fine-grained few-shot classification, in CVPR, 2022. S. Lee, W. Moon, and J. Heo. paper

Robotics

  1. Towards one shot learning by imitation for humanoid robots, in ICRA, 2010. Y. Wu and Y. Demiris. paper
  2. Learning manipulation actions from a few demonstrations, in ICRA, 2013. N. Abdo, H. Kretzschmar, L. Spinello, and C. Stachniss. paper
  3. Learning assistive strategies from a few user-robot interactions: Model-based reinforcement learning approach, in ICRA, 2016. M. Hamaya, T. Matsubara, T. Noda, T. Teramae, and J. Morimoto. paper
  4. One-shot imitation learning, in NeurIPS, 2017. Y. Duan, M. Andrychowicz, B. Stadie, J. Ho, J. Schneider, I. Sutskever, P. Abbeel, and W. Zaremba. paper
  5. Meta-learning language-guided policy learning, in ICLR, 2019. J. D. Co-Reyes, A. Gupta, S. Sanjeev, N. Altieri, J. DeNero, P. Abbeel, and S. Levine. paper
  6. Meta reinforcement learning with autonomous inference of subtask dependencies, in ICLR, 2020. S. Sohn, H. Woo, J. Choi, and H. Lee. paper
  7. Watch, try, learn: Meta-learning from demonstrations and rewards, in ICLR, 2020. A. Zhou, E. Jang, D. Kappler, A. Herzog, M. Khansari, P. Wohlhart, Y. Bai, M. Kalakrishnan, S. Levine, and C. Finn. paper
  8. Few-shot Bayesian imitation learning with logical program policies, in AAAI, 2020. T. Silver, K. R. Allen, A. K. Lew, L. P. Kaelbling, and J. Tenenbaum. paper
  9. One solution is not all you need: Few-shot extrapolation via structured MaxEnt RL, in NeurIPS, 2020. S. Kumar, A. Kumar, S. Levine, and C. Finn. paper
  10. Bowtie networks: Generative modeling for joint few-shot recognition and novel-view synthesis, in ICLR, 2021. Z. Bao, Y. Wang, and M. Hebert. paper
  11. Demonstration-conditioned reinforcement learning for few-shot imitation, in ICML, 2021. C. R. Dance, J. Perez, and T. Cachet. paper
  12. Hierarchical few-shot imitation with skill transition models, in ICLR, 2022. K. Hakhamaneshi, R. Zhao, A. Zhan, P. Abbeel, and M. Laskin. paper

Natural Language Processing

  1. High-risk learning: Acquiring new word vectors from tiny data, in EMNLP, 2017. A. Herbelot and M. Baroni. paper
  2. MetaEXP: Interactive explanation and exploration of large knowledge graphs, in TheWebConf, 2018. F. Behrens, S. Bischoff, P. Ladenburger, J. Rückin, L. Seidel, F. Stolp, M. Vaichenker, A. Ziegler, D. Mottin, F. Aghaei, E. Müller, M. Preusse, N. Müller, and M. Hunger. paper code
  3. Few-shot representation learning for out-of-vocabulary words, in ACL, 2019. Z. Hu, T. Chen, K.-W. Chang, and Y. Sun. paper
  4. Learning to customize model structures for few-shot dialogue generation tasks, in ACL, 2020. Y. Song, Z. Liu, W. Bi, R. Yan, and M. Zhang. paper
  5. Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network, in ACL, 2020. Y. Hou, W. Che, Y. Lai, Z. Zhou, Y. Liu, H. Liu, and T. Liu. paper
  6. Meta-reinforced multi-domain state generator for dialogue systems, in ACL, 2020. Y. Huang, J. Feng, M. Hu, X. Wu, X. Du, and S. Ma. paper
  7. Few-shot knowledge graph completion, in AAAI, 2020. C. Zhang, H. Yao, C. Huang, M. Jiang, Z. Li, and N. V. Chawla. paper
  8. Universal natural language processing with limited annotations: Try few-shot textual entailment as a start, in EMNLP, 2020. W. Yin, N. F. Rajani, D. Radev, R. Socher, and C. Xiong. paper code
  9. Simple and effective few-shot named entity recognition with structured nearest neighbor learning, in EMNLP, 2020. Y. Yang, and A. Katiyar. paper code
  10. Discriminative nearest neighbor few-shot intent detection by transferring natural language inference, in EMNLP, 2020. J. Zhang, K. Hashimoto, W. Liu, C. Wu, Y. Wan, P. Yu, R. Socher, and C. Xiong. paper code
  11. Few-shot learning for opinion summarization, in EMNLP, 2020. A. Bražinskas, M. Lapata, and I. Titov. paper code
  12. Adaptive attentional network for few-shot knowledge graph completion, in EMNLP, 2020. J. Sheng, S. Guo, Z. Chen, J. Yue, L. Wang, T. Liu, and H. Xu. paper code
  13. Few-shot complex knowledge base question answering via meta reinforcement learning, in EMNLP, 2020. Y. Hua, Y. Li, G. Haffari, G. Qi, and T. Wu. paper code
  14. Self-supervised meta-learning for few-shot natural language classification tasks, in EMNLP, 2020. T. Bansal, R. Jha, T. Munkhdalai, and A. McCallum. paper code
  15. Uncertainty-aware self-training for few-shot text classification, in NeurIPS, 2020. S. Mukherjee, and A. Awadallah. paper code
  16. Learning to extrapolate knowledge: Transductive few-shot out-of-graph link prediction, in NeurIPS, 2020. J. Baek, D. B. Lee, and S. J. Hwang. paper code
  17. MetaNER: Named entity recognition with meta-learning, in TheWebConf, 2020. J. Li, S. Shang, and L. Shao. paper
  18. Conditionally adaptive multi-task learning: Improving transfer learning in NLP using fewer parameters & less data, in ICLR, 2021. J. Pilault, A. E. hattami, and C. Pal. paper code
  19. Revisiting few-sample BERT fine-tuning, in ICLR, 2021. T. Zhang, F. Wu, A. Katiyar, K. Q. Weinberger, and Y. Artzi. paper code
  20. Few-shot conversational dense retrieval, in SIGIR, 2021. S. Yu, Z. Liu, C. Xiong, T. Feng, and Z. Liu. paper code
  21. Relational learning with gated and attentive neighbor aggregator for few-shot knowledge graph completion, in SIGIR, 2021. G. Niu, Y. Li, C. Tang, R. Geng, J. Dai, Q. Liu, H. Wang, J. Sun, F. Huang, and L. Si. paper
  22. Few-shot language coordination by modeling theory of mind, in ICML, 2021. H. Zhu, G. Neubig, and Y. Bisk. paper code
  23. Graph-evolving meta-learning for low-resource medical dialogue generation, in AAAI, 2021. S. Lin, P. Zhou, X. Liang, J. Tang, R. Zhao, Z. Chen, and L. Lin. paper
  24. KEML: A knowledge-enriched meta-learning framework for lexical relation classification, in AAAI, 2021. C. Wang, M. Qiu, J. Huang, and X. He. paper
  25. Few-shot learning for multi-label intent detection, in AAAI, 2021. Y. Hou, Y. Lai, Y. Wu, W. Che, and T. Liu. paper code
  26. SALNet: Semi-supervised few-shot text classification with attention-based lexicon construction, in AAAI, 2021. J.-H. Lee, S.-K. Ko, and Y.-S. Han. paper
  27. Learning from my friends: Few-shot personalized conversation systems via social networks, in AAAI, 2021. Z. Tian, W. Bi, Z. Zhang, D. Lee, Y. Song, and N. L. Zhang. paper code
  28. Relative and absolute location embedding for few-shot node classification on graph, in AAAI, 2021. Z. Liu, Y. Fang, C. Liu, and S. C. H. Hoi. paper
  29. Few-shot question answering by pretraining span selection, in ACL-IJCNLP, 2021. O. Ram, Y. Kirstain, J. Berant, A. Globerson, and O. Levy. paper code
  30. A closer look at few-shot crosslingual transfer: The choice of shots matters, in ACL-IJCNLP, 2021. M. Zhao, Y. Zhu, E. Shareghi, I. Vulic, R. Reichart, A. Korhonen, and H. Schütze. paper code
  31. Learning from miscellaneous other-class words for few-shot named entity recognition, in ACL-IJCNLP, 2021. M. Tong, S. Wang, B. Xu, Y. Cao, M. Liu, L. Hou, and J. Li. paper code
  32. Distinct label representations for few-shot text classification, in ACL-IJCNLP, 2021. S. Ohashi, J. Takayama, T. Kajiwara, and Y. Arase. paper code
  33. Entity concept-enhanced few-shot relation extraction, in ACL-IJCNLP, 2021. S. Yang, Y. Zhang, G. Niu, Q. Zhao, and S. Pu. paper code
  34. On training instance selection for few-shot neural text generation, in ACL-IJCNLP, 2021. E. Chang, X. Shen, H.-S. Yeh, and V. Demberg. paper code
  35. Unsupervised neural machine translation for low-resource domains via meta-learning, in ACL-IJCNLP, 2021. C. Park, Y. Tae, T. Kim, S. Yang, M. A. Khan, L. Park, and J. Choo. paper code
  36. Meta-learning with variational semantic memory for word sense disambiguation, in ACL-IJCNLP, 2021. Y. Du, N. Holla, X. Zhen, C. Snoek, and E. Shutova. paper code
  37. Multi-label few-shot learning for aspect category detection, in ACL-IJCNLP, 2021. M. Hu, S. Zhao, H. Guo, C. Xue, H. Gao, T. Gao, R. Cheng, and Z. Su. paper
  38. TextSETTR: Few-shot text style extraction and tunable targeted restyling, in ACL-IJCNLP, 2021. P. Riley, N. Constant, M. Guo, G. Kumar, D. Uthus, and Z. Parekh. paper
  39. Few-shot text ranking with meta adapted synthetic weak supervision, in ACL-IJCNLP, 2021. S. Sun, Y. Qian, Z. Liu, C. Xiong, K. Zhang, J. Bao, Z. Liu, and P. Bennett. paper code
  40. PROTAUGMENT: Intent detection meta-learning through unsupervised diverse paraphrasing, in ACL-IJCNLP, 2021. T. Dopierre, C. Gravier, and W. Logerais. paper code
  41. AUGNLG: Few-shot natural language generation using self-trained data augmentation, in ACL-IJCNLP, 2021. X. Xu, G. Wang, Y.-B. Kim, and S. Lee. paper code
  42. Meta self-training for few-shot neural sequence labeling, in KDD, 2021. Y. Wang, S. Mukherjee, H. Chu, Y. Tu, M. Wu, J. Gao, and A. H. Awadallah. paper code
  43. Knowledge-enhanced domain adaptation in few-shot relation classification, in KDD, 2021. J. Zhang, J. Zhu, Y. Yang, W. Shi, C. Zhang, and H. Wang. paper code
  44. Few-shot text classification with triplet networks, data augmentation, and curriculum learning, in NAACL-HLT, 2021. J. Wei, C. Huang, S. Vosoughi, Y. Cheng, and S. Xu. paper code
  45. Few-shot intent classification and slot filling with retrieved examples, in NAACL-HLT, 2021. D. Yu, L. He, Y. Zhang, X. Du, P. Pasupat, and Q. Li. paper
  46. Non-parametric few-shot learning for word sense disambiguation, in NAACL-HLT, 2021. H. Chen, M. Xia, and D. Chen. paper code
  47. Towards few-shot fact-checking via perplexity, in NAACL-HLT, 2021. N. Lee, Y. Bang, A. Madotto, and P. Fung. paper
  48. ConVEx: Data-efficient and few-shot slot labeling, in NAACL-HLT, 2021. M. Henderson, and I. Vulic. paper
  49. Few-shot text generation with natural language instructions, in EMNLP, 2021. T. Schick, and H. Schütze. paper
  50. Towards realistic few-shot relation extraction, in EMNLP, 2021. S. Brody, S. Wu, and A. Benton. paper code
  51. Few-shot emotion recognition in conversation with sequential prototypical networks, in EMNLP, 2021. G. Guibon, M. Labeau, H. Flamein, L. Lefeuvre, and C. Clavel. paper code
  52. Learning prototype representations across few-shot tasks for event detection, in EMNLP, 2021. V. Lai, F. Dernoncourt, and T. H. Nguyen. paper
  53. Exploring task difficulty for few-shot relation extraction, in EMNLP, 2021. J. Han, B. Cheng, and W. Lu. paper code
  54. Honey or poison? Solving the trigger curse in few-shot event detection via causal intervention, in EMNLP, 2021. J. Chen, H. Lin, X. Han, and L. Sun. paper code
  55. Nearest neighbour few-shot learning for cross-lingual classification, in EMNLP, 2021. M. S. Bari, B. Haider, and S. Mansour. paper
  56. Knowledge-aware meta-learning for low-resource text classification, in EMNLP, 2021. H. Yao, Y. Wu, M. Al-Shedivat, and E. P. Xing. paper code
  57. Few-shot named entity recognition: An empirical baseline study, in EMNLP, 2021. J. Huang, C. Li, K. Subudhi, D. Jose, S. Balakrishnan, W. Chen, B. Peng, J. Gao, and J. Han. paper
  58. MetaTS: Meta teacher-student network for multilingual sequence labeling with minimal supervision, in EMNLP, 2021. Z. Li, D. Zhang, T. Cao, Y. Wei, Y. Song, and B. Yin. paper
  59. Meta-LMTC: Meta-learning for large-scale multi-label text classification, in EMNLP, 2021. R. Wang, X. Su, S. Long, X. Dai, S. Huang, and J. Chen. paper
  60. Ontology-enhanced prompt-tuning for few-shot learning, in TheWebConf, 2022. H. Ye, N. Zhang, S. Deng, X. Chen, H. Chen, F. Xiong, X. Chen, and H. Chen. paper
  61. EICO: Improving few-shot text classification via explicit and implicit consistency regularization, in Findings of ACL, 2022. L. Zhao, and C. Yao. paper
  62. Dialogue summaries as dialogue states (DS2), template-guided summarization for few-shot dialogue state tracking, in Findings of ACL, 2022. J. Shin, H. Yu, H. Moon, A. Madotto, and J. Park. paper code
  63. A few-shot semantic parser for Wizard-of-Oz dialogues with the precise ThingTalk representation, in Findings of ACL, 2022. G. Campagna, S. J. Semnani, R. Kearns, L. J. K. Sato, S. Xu, and M. S. Lam. paper
  64. Multi-stage prompting for knowledgeable dialogue generation, in Findings of ACL, 2022. Z. Liu, M. Patwary, R. Prenger, S. Prabhumoye, W. Ping, M. Shoeybi, and B. Catanzaro. paper code
  65. Few-shot named entity recognition with self-describing networks, in ACL, 2022. J. Chen, Q. Liu, H. Lin, X. Han, and L. Sun. paper code
  66. CLIP models are few-shot learners: Empirical studies on VQA and visual entailment, in ACL, 2022. H. Song, L. Dong, W. Zhang, T. Liu, and F. Wei. paper
  67. CONTaiNER: Few-shot named entity recognition via contrastive learning, in ACL, 2022. S. S. S. Das, A. Katiyar, R. J. Passonneau, and R. Zhang. paper code
  68. Few-shot controllable style transfer for low-resource multilingual settings, in ACL, 2022. K. Krishna, D. Nathani, X. Garcia, B. Samanta, and P. Talukdar. paper
  69. Label semantic aware pre-training for few-shot text classification, in ACL, 2022. A. Mueller, J. Krone, S. Romeo, S. Mansour, E. Mansimov, Y. Zhang, and D. Roth. paper
  70. Inverse is better! Fast and accurate prompt for few-shot slot tagging, in Findings of ACL, 2022. Y. Hou, C. Chen, X. Luo, B. Li, and W. Che. paper
  71. Label semantics for few shot named entity recognition, in Findings of ACL, 2022. J. Ma, M. Ballesteros, S. Doss, R. Anubhai, S. Mallya, Y. Al-Onaizan, and D. Roth. paper
  72. Hierarchical recurrent aggregative generation for few-shot NLG, in Findings of ACL, 2022. G. Zhou, G. Lampouras, and I. Iacobacci. paper
  73. Towards few-shot entity recognition in document images: A label-aware sequence-to-sequence framework, in Findings of ACL, 2022. Z. Wang, and J. Shang. paper
  74. A good prompt is worth millions of parameters: Low-resource prompt-based learning for vision-language models, in ACL, 2022. W. Jin, Y. Cheng, Y. Shen, W. Chen, and X. Ren. paper code
  75. Generated knowledge prompting for commonsense reasoning, in ACL, 2022. J. Liu, A. Liu, X. Lu, S. Welleck, P. West, R. L. Bras, Y. Choi, and H. Hajishirzi. paper code
  76. End-to-end modeling via information tree for one-shot natural language spatial video grounding, in ACL, 2022. M. Li, T. Wang, H. Zhang, S. Zhang, Z. Zhao, J. Miao, W. Zhang, W. Tan, J. Wang, P. Wang, S. Pu, and F. Wu. paper
  77. Leveraging task transferability to meta-learning for clinical section classification with limited data, in ACL, 2022. Z. Chen, J. Kim, R. Bhakta, and M. Y. Sir. paper
  78. Improving meta-learning for low-resource text classification and generation via memory imitation, in ACL, 2022. Y. Zhao, Z. Tian, H. Yao, Y. Zheng, D. Lee, Y. Song, J. Sun, and N. L. Zhang. paper
  79. A simple yet effective relation information guided approach for few-shot relation extraction, in Findings of ACL, 2022. Y. Liu, J. Hu, X. Wan, and T. Chang. paper code
  80. Decomposed meta-learning for few-shot named entity recognition, in Findings of ACL, 2022. T. Ma, H. Jiang, Q. Wu, T. Zhao, and C. Lin. paper code
  81. Meta-learning for fast cross-lingual adaptation in dependency parsing, in ACL, 2022. A. Langedijk, V. Dankers, P. Lippe, S. Bos, B. C. Guevara, H. Yannakoudakis, and E. Shutova. paper code
  82. Enhancing cross-lingual natural language inference by prompt-learning from cross-lingual templates, in ACL, 2022. K. Qi, H. Wan, J. Du, and H. Chen. paper code

Acoustic Signal Processing

  1. One-shot learning of generative speech concepts, in CogSci, 2014. B. Lake, C.-Y. Lee, J. Glass, and J. Tenenbaum. paper
  2. Machine speech chain with one-shot speaker adaptation, in INTERSPEECH, 2018. A. Tjandra, S. Sakti, and S. Nakamura. paper
  3. Investigation of using disentangled and interpretable representations for one-shot cross-lingual voice conversion, in INTERSPEECH, 2018. S. H. Mohammadi, and T. Kim. paper
  4. Few-shot audio classification with attentional graph neural networks, in INTERSPEECH, 2019. S. Zhang, Y. Qin, K. Sun, and Y. Lin. paper
  5. One-shot voice conversion with disentangled representations by leveraging phonetic posteriorgrams, in INTERSPEECH, 2019. S. H. Mohammadi, and T. Kim. paper
  6. One-shot voice conversion with global speaker embeddings, in INTERSPEECH, 2019. H. Lu, Z. Wu, D. Dai, R. Li, S. Kang, J. Jia, and H. Meng. paper
  7. One-shot voice conversion by separating speaker and content representations with instance normalization, in INTERSPEECH, 2019. J.-C. Chou, and H.-Y. Lee. paper
  8. Audio2Head: Audio-driven one-shot talking-head generation with natural head motion, in IJCAI, 2021. S. Wang, L. Li, Y. Ding, C. Fan, and X. Yu. paper

Recommendation

  1. A meta-learning perspective on cold-start recommendations for items, in NeurIPS, 2017. M. Vartak, A. Thiagarajan, C. Miranda, J. Bratman, and H. Larochelle. paper
  2. MeLU: Meta-learned user preference estimator for cold-start recommendation, in KDD, 2019. H. Lee, J. Im, S. Jang, H. Cho, and S. Chung. paper code
  3. Sequential scenario-specific meta learner for online recommendation, in KDD, 2019. Z. Du, X. Wang, H. Yang, J. Zhou, and J. Tang. paper code
  4. Few-shot learning for new user recommendation in location-based social networks, in TheWebConf, 2020. R. Li, X. Wu, X. Chen, and W. Wang. paper
  5. MAMO: Memory-augmented meta-optimization for cold-start recommendation, in KDD, 2020. M. Dong, F. Yuan, L. Yao, X. Xu, and L. Zhu. paper code
  6. Meta-learning on heterogeneous information networks for cold-start recommendation, in KDD, 2020. Y. Lu, Y. Fang, and C. Shi. paper code
  7. MetaSelector: Meta-learning for recommendation with user-level adaptive model selection, in TheWebConf, 2020. M. Luo, F. Chen, P. Cheng, Z. Dong, X. He, J. Feng, and Z. Li. paper
  8. Fast adaptation for cold-start collaborative filtering with meta-learning, in ICDM, 2020. T. Wei, Z. Wu, R. Li, Z. Hu, F. Feng, X. H. Sun, and W. Wang. paper
  9. Preference-adaptive meta-learning for cold-start recommendation, in IJCAI, 2021. L. Wang, B. Jin, Z. Huang, H. Zhao, D. Lian, Q. Liu, and E. Chen. paper
  10. Meta-learning helps personalized product search, in TheWebConf, 2022. B. Wu, Z. Meng, Q. Zhang, and S. Liang. paper
  11. Alleviating cold-start problem in CTR prediction with a variational embedding learning framework, in TheWebConf, 2022. X. Xu, C. Yang, Q. Yu, Z. Fang, J. Wang, C. Fan, Y. He, C. Peng, Z. Lin, and J. Shao. paper
  12. PNMTA: A pretrained network modulation and task adaptation approach for user cold-start recommendation, in TheWebConf, 2022. H. Pang, F. Giunchiglia, X. Li, R. Guan, and X. Feng. paper

Others

  1. Low data drug discovery with one-shot learning, ACS Central Science, 2017. H. Altae-Tran, B. Ramsundar, A. S. Pappu, and V. Pande. paper
  2. SMASH: One-shot model architecture search through hypernetworks, in ICLR, 2018. A. Brock, T. Lim, J. Ritchie, and N. Weston. paper
  3. SPARC: Self-paced network representation for few-shot rare category characterization, in KDD, 2018. D. Zhou, J. He, H. Yang, and W. Fan. paper
  4. MetaPred: Meta-learning for clinical risk prediction with limited patient electronic health records, in KDD, 2019. X. S. Zhang, F. Tang, H. H. Dodge, J. Zhou, and F. Wang. paper code
  5. AffinityNet: Semi-supervised few-shot learning for disease type prediction, in AAAI, 2019. T. Ma, and A. Zhang. paper
  6. Learning from multiple cities: A meta-learning approach for spatial-temporal prediction, in TheWebConf, 2019. H. Yao, Y. Liu, Y. Wei, X. Tang, and Z. Li. paper code
  7. Federated meta-learning for fraudulent credit card detection, in IJCAI, 2020. W. Zheng, L. Yan, C. Gou, and F. Wang. paper
  8. Differentially private meta-learning, in ICLR, 2020. J. Li, M. Khodak, S. Caldas, and A. Talwalkar. paper
  9. Towards fast adaptation of neural architectures with meta learning, in ICLR, 2020. D. Lian, Y. Zheng, Y. Xu, Y. Lu, L. Lin, P. Zhao, J. Huang, and S. Gao. paper
  10. Using optimal embeddings to learn new intents with few examples: An application in the insurance domain, in KDD, 2020. S. Acharya, and G. Fung. paper
  11. Meta-learning for query conceptualization at web scale, in KDD, 2020. F. X. Han, D. Niu, H. Chen, W. Guo, S. Yan, and B. Long. paper
  12. Few-sample and adversarial representation learning for continual stream mining, in TheWebConf, 2020. Z. Wang, Y. Wang, Y. Lin, E. Delord, and L. Khan. paper
  13. Few-shot graph learning for molecular property prediction, in TheWebConf, 2021. Z. Guo, C. Zhang, W. Yu, J. Herr, O. Wiest, M. Jiang, and N. V. Chawla. paper code
  14. Taxonomy-aware learning for few-shot event detection, in TheWebConf, 2021. J. Zheng, F. Cai, W. Chen, W. Lei, and H. Chen. paper
  15. Learning from graph propagation via ordinal distillation for one-shot automated essay scoring, in TheWebConf, 2021. Z. Jiang, M. Liu, Y. Yin, H. Yu, Z. Cheng, and Q. Gu. paper
  16. Few-shot network anomaly detection via cross-network meta-learning, in TheWebConf, 2021. K. Ding, Q. Zhou, H. Tong, and H. Liu. paper
  17. Few-shot knowledge validation using rules, in TheWebConf, 2021. M. Loster, D. Mottin, P. Papotti, J. Ehmüller, B. Feldmann, and F. Naumann. paper
  18. Graph learning regularization and transfer learning for few-shot event detection, in SIGIR, 2021. V. D. Lai, M. V. Nguyen, T. H. Nguyen, and F. Dernoncourt. paper code
  19. Progressive network grafting for few-shot knowledge distillation, in AAAI, 2021. C. Shen, X. Wang, Y. Yin, J. Song, S. Luo, and M. Song. paper code
  20. Curriculum meta-learning for next POI recommendation, in KDD, 2021. Y. Chen, X. Wang, M. Fan, J. Huang, S. Yang, and W. Zhu. paper code
  21. MFNP: A meta-optimized model for few-shot next POI recommendation, in IJCAI, 2021. H. Sun, J. Xu, K. Zheng, P. Zhao, P. Chao, and X. Zhou. paper
  22. Physics-aware spatiotemporal modules with auxiliary tasks for meta-learning, in IJCAI, 2021. S. Seo, C. Meng, S. Rambhatla, and Y. Liu. paper
  23. Property-aware relation networks for few-shot molecular property prediction, in NeurIPS, 2021. Y. Wang, A. Abuduweili, Q. Yao, and D. Dou. paper code
  24. Few-shot data-driven algorithms for low rank approximation, in NeurIPS, 2021. P. Indyk, T. Wagner, and D. Woodruff. paper
  25. Non-Gaussian Gaussian processes for few-shot regression, in NeurIPS, 2021. M. Sendera, J. Tabor, A. Nowak, A. Bedychaj, M. Patacchiola, T. Trzcinski, P. Spurek, and M. Zieba. paper
  26. HELP: Hardware-adaptive efficient latency prediction for NAS via meta-learning, in NeurIPS, 2021. H. Lee, S. Lee, S. Chong, and S. J. Hwang. paper
  27. Learning to learn dense Gaussian processes for few-shot learning, in NeurIPS, 2021. Z. Wang, Z. Miao, X. Zhen, and Q. Qiu. paper
  28. A meta-learning based stress category detection framework on social media, in TheWebConf, 2022. X. Wang, L. Cao, H. Zhang, L. Feng, Y. Ding, and N. Li. paper

Theories

  1. Learning to learn around a common mean, in NeurIPS, 2018. G. Denevi, C. Ciliberto, D. Stamos, and M. Pontil. paper
  2. Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm, in ICLR, 2018. C. Finn and S. Levine. paper
  3. A theoretical analysis of the number of shots in few-shot learning, in ICLR, 2020. T. Cao, M. T. Law, and S. Fidler. paper
  4. Rapid learning or feature reuse? Towards understanding the effectiveness of MAML, in ICLR, 2020. A. Raghu, M. Raghu, S. Bengio, and O. Vinyals. paper
  5. Robust meta-learning for mixed linear regression with small batches, in NeurIPS, 2020. W. Kong, R. Somani, S. Kakade, and S. Oh. paper
  6. One-shot distributed ridge regression in high dimensions, in ICML, 2020. Y. Sheng, and E. Dobriban. paper
  7. Bridging the gap between practice and PAC-Bayes theory in few-shot meta-learning, in NeurIPS, 2021. N. Ding, X. Chen, T. Levinboim, S. Goodman, and R. Soricut. paper
  8. Generalization bounds for meta-learning: An information-theoretic analysis, in NeurIPS, 2021. Q. Chen, C. Shui, and M. Marchand. paper
  9. Generalization bounds for meta-learning via PAC-Bayes and uniform stability, in NeurIPS, 2021. A. Farid, and A. Majumdar. paper
  10. Unraveling model-agnostic meta-learning via the adaptation learning rate, in ICLR, 2022. Y. Zou, F. Liu, and Q. Li. paper
  11. On the importance of Firth bias reduction in few-shot classification, in ICLR, 2022. S. Ghaffari, E. Saleh, D. Forsyth, and Y. Wang. paper code
  12. Global convergence of MAML and theory-inspired neural architecture search for few-shot learning, in CVPR, 2022. H. Wang, Y. Wang, R. Sun, and B. Li. paper

Few-shot Learning and Zero-shot Learning

  1. Label-embedding for attribute-based classification, in CVPR, 2013. Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid. paper
  2. A unified semantic embedding: Relating taxonomies and attributes, in NeurIPS, 2014. S. J. Hwang and L. Sigal. paper
  3. Multi-attention network for one shot learning, in CVPR, 2017. P. Wang, L. Liu, C. Shen, Z. Huang, A. van den Hengel, and H. T. Shen. paper
  4. Few-shot and zero-shot multi-label learning for structured label spaces, in EMNLP, 2018. A. Rios and R. Kavuluru. paper
  5. Learning compositional representations for few-shot recognition, in ICCV, 2019. P. Tokmakov, Y.-X. Wang, and M. Hebert. paper code
  6. Large-scale few-shot learning: Knowledge transfer with class hierarchy, in CVPR, 2019. A. Li, T. Luo, Z. Lu, T. Xiang, and L. Wang. paper
  7. Generalized zero- and few-shot learning via aligned variational autoencoders, in CVPR, 2019. E. Schonfeld, S. Ebrahimi, S. Sinha, T. Darrell, and Z. Akata. paper code
  8. F-VAEGAN-D2: A feature generating framework for any-shot learning, in CVPR, 2019. Y. Xian, S. Sharma, B. Schiele, and Z. Akata. paper
  9. TGG: Transferable graph generation for zero-shot and few-shot learning, in ACM MM, 2019. C. Zhang, X. Lyu, and Z. Tang. paper
  10. Adaptive cross-modal few-shot learning, in NeurIPS, 2019. C. Xing, N. Rostamzadeh, B. N. Oreshkin, and P. O. Pinheiro. paper
  11. Learning meta model for zero- and few-shot face anti-spoofing, in AAAI, 2020. Y. Qin, C. Zhao, X. Zhu, Z. Wang, Z. Yu, T. Fu, F. Zhou, J. Shi, and Z. Lei. paper
  12. RD-GAN: Few/Zero-shot Chinese character style transfer via radical decomposition and rendering, in ECCV, 2020. Y. Huang, M. He, L. Jin, and Y. Wang. paper
  13. An empirical study on large-scale multi-label text classification including few and zero-shot labels, in EMNLP, 2020. I. Chalkidis, M. Fergadiotis, S. Kotitsas, P. Malakasiotis, N. Aletras, and I. Androutsopoulos. paper
  14. Multi-label few/zero-shot learning with knowledge aggregated from multiple label graphs, in EMNLP, 2020. J. Lu, L. Du, M. Liu, and J. Dipnall. paper
  15. Emergent complexity and zero-shot transfer via unsupervised environment design, in NeurIPS, 2020. M. Dennis, N. Jaques, E. Vinitsky, A. Bayen, S. Russell, A. Critch, and S. Levine. paper
  16. Learning graphs for knowledge transfer with limited labels, in CVPR, 2021. P. Ghosh, N. Saini, L. S. Davis, and A. Shrivastava. paper
  17. Improving zero and few-shot abstractive summarization with intermediate fine-tuning and data augmentation, in NAACL-HLT, 2021. A. R. Fabbri, S. Han, H. Li, H. Li, M. Ghazvininejad, S. R. Joty, D. R. Radev, and Y. Mehdad. paper
  18. Label verbalization and entailment for effective zero and few-shot relation extraction, in EMNLP, 2021. O. Sainz, O. L. d. Lacalle, G. Labaka, A. Barrena, and E. Agirre. paper code
  19. An empirical investigation of word alignment supervision for zero-shot multilingual neural machine translation, in EMNLP, 2021. A. Raganato, R. Vázquez, M. Creutz, and J. Tiedemann. paper
  20. Bridge to target domain by prototypical contrastive learning and label confusion: Re-explore zero-shot learning for slot filling, in EMNLP, 2021. L. Wang, X. Li, J. Liu, K. He, Y. Yan, and W. Xu. paper code
  21. A label-aware BERT attention network for zero-shot multi-intent detection in spoken language understanding, in EMNLP, 2021. T. Wu, R. Su, and B. Juang. paper
  22. Zero-shot dialogue disentanglement by self-supervised entangled response selection, in EMNLP, 2021. T. Chi, and A. I. Rudnicky. paper code
  23. Robust retrieval augmented generation for zero-shot slot filling, in EMNLP, 2021. M. R. Glass, G. Rossiello, M. F. M. Chowdhury, and A. Gliozzo. paper code
  24. Everything is all it takes: A multipronged strategy for zero-shot cross-lingual information extraction, in EMNLP, 2021. M. Yarmohammadi, S. Wu, M. Marone, H. Xu, S. Ebner, G. Qin, Y. Chen, J. Guo, C. Harman, K. Murray, A. S. White, M. Dredze, and B. V. Durme. paper code
  25. An empirical study on multiple information sources for zero-shot fine-grained entity typing, in EMNLP, 2021. Y. Chen, H. Jiang, L. Liu, S. Shi, C. Fan, M. Yang, and R. Xu. paper
  26. Zero-shot dialogue state tracking via cross-task transfer, in EMNLP, 2021. Z. Lin, B. Liu, A. Madotto, S. Moon, Z. Zhou, P. Crook, Z. Wang, Z. Yu, E. Cho, R. Subba, and P. Fung. paper code
  27. Finetuned language models are zero-shot learners, in ICLR, 2022. J. Wei, M. Bosma, V. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, and Q. V. Le. paper code
  28. Zero-shot stance detection via contrastive learning, in TheWebConf, 2022. B. Liang, Z. Chen, L. Gui, Y. He, M. Yang, and R. Xu. paper code
  29. Reframing instructional prompts to GPTk’s language, in Findings of ACL, 2022. D. Khashabi, C. Baral, Y. Choi, and H. Hajishirzi. paper
  30. JointCL: A joint contrastive learning framework for zero-shot stance detection, in ACL, 2022. B. Liang, Q. Zhu, X. Li, M. Yang, L. Gui, Y. He, and R. Xu. paper code
  31. Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification, in ACL, 2022. S. Hu, N. Ding, H. Wang, Z. Liu, J. Wang, J. Li, W. Wu, and M. Sun. paper code
  32. Uni-Perceiver: Pre-training unified architecture for generic perception for zero-shot and few-shot tasks, in CVPR, 2022. X. Zhu, J. Zhu, H. Li, X. Wu, H. Li, X. Wang, and J. Dai. paper

Variants of Few-shot Learning

  1. Continuous adaptation via meta-learning in nonstationary and competitive environments, in ICLR, 2018. M. Al-Shedivat, T. Bansal, Y. Burda, I. Sutskever, I. Mordatch, and P. Abbeel. paper
  2. Deep online learning via meta-learning: Continual adaptation for model-based RL, in ICLR, 2018. A. Nagabandi, C. Finn, and S. Levine. paper
  3. Incremental few-shot learning with attention attractor networks, in NeurIPS, 2019. M. Ren, R. Liao, E. Fetaya, and R. S. Zemel. paper code
  4. Bidirectional one-shot unsupervised domain mapping, in ICCV, 2019. T. Cohen, and L. Wolf. paper
  5. XtarNet: Learning to extract task-adaptive representation for incremental few-shot learning, in ICML, 2020. S. W. Yoon, D. Kim, J. Seo, and J. Moon. paper code
  6. Few-shot class-incremental learning, in CVPR, 2020. X. Tao, X. Hong, X. Chang, S. Dong, X. Wei, and Y. Gong. paper
  7. Wandering within a world: Online contextualized few-shot learning, in ICLR, 2021. M. Ren, M. L. Iuzzolino, M. C. Mozer, and R. Zemel. paper
  8. Repurposing pretrained models for robust out-of-domain few-shot learning, in ICLR, 2021. N. Kwon, H. Na, G. Huang, and S. Lacoste-Julien. paper code
  9. Prototypical cross-domain self-supervised learning for few-shot unsupervised domain adaptation, in CVPR, 2021. X. Yue, Z. Zheng, S. Zhang, Y. Gao, T. Darrell, K. Keutzer, and A. S. Vincentelli. paper
  10. Self-promoted prototype refinement for few-shot class-incremental learning, in CVPR, 2021. K. Zhu, Y. Cao, W. Zhai, J. Cheng, and Z. Zha. paper
  11. Semantic-aware knowledge distillation for few-shot class-incremental learning, in CVPR, 2021. A. Cheraghian, S. Rahman, P. Fang, S. K. Roy, L. Petersson, and M. Harandi. paper
  12. Few-shot incremental learning with continually evolved classifiers, in CVPR, 2021. C. Zhang, N. Song, G. Lin, Y. Zheng, P. Pan, and Y. Xu. paper
  13. Learning a universal template for few-shot dataset generalization, in ICML, 2021. E. Triantafillou, H. Larochelle, R. Zemel, and V. Dumoulin. paper
  14. GP-Tree: A gaussian process classifier for few-shot incremental learning, in ICML, 2021. I. Achituve, A. Navon, Y. Yemini, G. Chechik, and E. Fetaya. paper code
  15. Addressing catastrophic forgetting in few-shot problems, in ICML, 2021. P. Yap, H. Ritter, and D. Barber. paper code
  16. Few-shot conformal prediction with auxiliary tasks, in ICML, 2021. A. Fisch, T. Schuster, T. Jaakkola, and R. Barzilay. paper code
  17. Few-shot lifelong learning, in AAAI, 2021. P. Mazumder, P. Singh, and P. Rai. paper
  18. Few-shot class-incremental learning via relation knowledge distillation, in AAAI, 2021. S. Dong, X. Hong, X. Tao, X. Chang, X. Wei, and Y. Gong. paper
  19. Few-shot one-class classification via meta-learning, in AAAI, 2021. A. Frikha, D. Krompass, H. Koepken, and V. Tresp. paper code
  20. Practical one-shot federated learning for cross-silo setting, in IJCAI, 2021. Q. Li, B. He, and D. Song. paper code
  21. Incremental few-shot text classification with multi-round new classes: Formulation, dataset and system, in NAACL-HLT, 2021. C. Xia, W. Yin, Y. Feng, and P. S. Yu. paper
  22. Continual few-shot learning for text classification, in EMNLP, 2021. R. Pasunuru, V. Stoyanov, and M. Bansal. paper code
  23. Self-training with few-shot rationalization, in EMNLP, 2021. M. M. Bhat, A. Sordoni, and S. Mukherjee. paper
  24. Diverse distributions of self-supervised tasks for meta-learning in NLP, in EMNLP, 2021. T. Bansal, K. P. Gunasekaran, T. Wang, T. Munkhdalai, and A. McCallum. paper
  25. Generalized and incremental few-shot learning by explicit learning and calibration without forgetting, in ICCV, 2021. A. Kukleva, H. Kuehne, and B. Schiele. paper
  26. Meta learning on a sequence of imbalanced domains with difficulty awareness, in ICCV, 2021. Z. Wang, T. Duan, L. Fang, Q. Suo, and M. Gao. paper code
  27. Synthesized feature based few-shot class-incremental learning on a mixture of subspaces, in ICCV, 2021. A. Cheraghian, S. Rahman, S. Ramasinghe, P. Fang, C. Simon, L. Petersson, and M. Harandi. paper
  28. Few-shot and continual learning with attentive independent mechanisms, in ICCV, 2021. E. Lee, C. Huang, and C. Lee. paper code
  29. Low-shot validation: Active importance sampling for estimating classifier performance on rare categories, in ICCV, 2021. F. Poms, V. Sarukkai, R. T. Mullapudi, N. S. Sohoni, W. R. Mark, D. Ramanan, and K. Fatahalian. paper
  30. Overcoming catastrophic forgetting in incremental few-shot learning by finding flat minima, in NeurIPS, 2021. G. Shi, J. Chen, W. Zhang, L. Zhan, and X. Wu. paper
  31. Variational continual Bayesian meta-learning, in NeurIPS, 2021. Q. Zhang, J. Fang, Z. Meng, S. Liang, and E. Yilmaz. paper
  32. LFPT5: A unified framework for lifelong few-shot language learning based on prompt tuning of T5, in ICLR, 2022. C. Qin, and S. Joty. paper code
  33. Subspace regularizers for few-shot class incremental learning, in ICLR, 2022. A. F. Akyürek, E. Akyürek, D. Wijaya, and J. Andreas. paper code
  34. Meta discovery: Learning to discover novel classes given very limited data, in ICLR, 2022. H. Chi, F. Liu, W. Yang, L. Lan, T. Liu, B. Han, G. Niu, M. Zhou, and M. Sugiyama. paper
  35. Topological transduction for hybrid few-shot learning, in TheWebConf, 2022. J. Chen, and A. Zhang. paper
  36. Continual few-shot relation learning via embedding space regularization and data augmentation, in ACL, 2022. C. Qin, and S. Joty. paper code
  37. Few-shot class-incremental learning for named entity recognition, in ACL, 2022. R. Wang, T. Yu, H. Zhao, S. Kim, S. Mitra, R. Zhang, and R. Henao. paper
  38. Task-adaptive negative envision for few-shot open-set recognition, in CVPR, 2022. S. Huang, J. Ma, G. Han, and S. Chang. paper code
  39. Forward compatible few-shot class-incremental learning, in CVPR, 2022. D. Zhou, F. Wang, H. Ye, L. Ma, S. Pu, and D. Zhan. paper code
  40. Sylph: A hypernetwork framework for incremental few-shot object detection, in CVPR, 2022. L. Yin, J. M. Perez-Rua, and K. J. Liang. paper
  41. Constrained few-shot class-incremental learning, in CVPR, 2022. M. Hersche, G. Karunaratne, G. Cherubini, L. Benini, A. Sebastian, and A. Rahimi. paper
  42. iFS-RCNN: An incremental few-shot instance segmenter, in CVPR, 2022. K. Nguyen, and S. Todorovic. paper
  43. MetaFSCIL: A meta-learning approach for few-shot class incremental learning, in CVPR, 2022. Z. Chi, L. Gu, H. Liu, Y. Wang, Y. Yu, and J. Tang. paper
  44. Few-shot incremental learning for label-to-image translation, in CVPR, 2022. P. Chen, Y. Zhang, Z. Li, and L. Sun. paper
  45. Revisiting learnable affines for batch norm in few-shot transfer learning, in CVPR, 2022. M. Yazdanpanah, A. A. Rahman, M. Chaudhary, C. Desrosiers, M. Havaei, E. Belilovsky, and S. E. Kahou. paper
  46. Few-shot learning with noisy labels, in CVPR, 2022. K. J. Liang, S. B. Rangrej, V. Petrovic, and T. Hassner. paper
  47. Improving adversarially robust few-shot image classification with generalizable representations, in CVPR, 2022. J. Dong, Y. Wang, J. Lai, and X. Xie. paper

Datasets/Benchmarks

  1. FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation, in EMNLP, 2018. X. Han, H. Zhu, P. Yu, Z. Wang, Y. Yao, Z. Liu, and M. Sun. paper code
  2. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning, arXiv preprint, 2019. T. Yu, D. Quillen, Z. He, R. Julian, K. Hausman, C. Finn, and S. Levine. paper code
  3. The Omniglot challenge: A 3-year progress report, in Current Opinion in Behavioral Sciences, 2019. B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. paper code
  4. FewRel 2.0: Towards more challenging few-shot relation classification, in EMNLP-IJCNLP, 2019. T. Gao, X. Han, H. Zhu, Z. Liu, P. Li, M. Sun, and J. Zhou. paper code
  5. META-DATASET: A dataset of datasets for learning to learn from few examples, in ICLR, 2020. E. Triantafillou, T. Zhu, V. Dumoulin, P. Lamblin, U. Evci, K. Xu, R. Goroshin, C. Gelada, K. Swersky, P. Manzagol, and H. Larochelle. paper code
  6. Few-shot object detection with attention-rpn and multi-relation detector, in CVPR, 2020. Q. Fan, W. Zhuo, C.-K. Tang, and Y.-W. Tai. paper code
  7. FSS-1000: A 1000-class dataset for few-shot segmentation, in CVPR, 2020. X. Li, T. Wei, Y. P. Chen, Y.-W. Tai, and C.-K. Tang. paper code
  8. Impact of base dataset design on few-shot image classification, in ECCV, 2020. O. Sbai, C. Couprie, and M. Aubry. paper code
  9. A large-scale benchmark for few-shot program induction and synthesis, in ICML, 2021. F. Alet, J. Lopez-Contreras, J. Koppel, M. Nye, A. Solar-Lezama, T. Lozano-Perez, L. Kaelbling, and J. Tenenbaum. paper code
  10. FEW-NERD: A few-shot named entity recognition dataset, in ACL-IJCNLP, 2021. N. Ding, G. Xu, Y. Chen, X. Wang, X. Han, P. Xie, H. Zheng, and Z. Liu. paper code
  11. CrossFit: A few-shot learning challenge for cross-task generalization in NLP, in EMNLP, 2021. Q. Ye, B. Y. Lin, and X. Ren. paper code
  12. ORBIT: A real-world few-shot dataset for teachable object recognition, in ICCV, 2021. D. Massiceti, L. Zintgraf, J. Bronskill, L. Theodorou, M. T. Harris, E. Cutrell, C. Morrison, K. Hofmann, and S. Stumpf. paper code
  13. FLEX: Unifying evaluation for few-shot NLP, in NeurIPS, 2021. J. Bragg, A. Cohan, K. Lo, and I. Beltagy. paper
  14. Two sides of meta-learning evaluation: In vs. out of distribution, in NeurIPS, 2021. A. Setlur, O. Li, and V. Smith. paper
  15. Realistic evaluation of transductive few-shot learning, in NeurIPS, 2021. O. Veilleux, M. Boudiaf, P. Piantanida, and I. B. Ayed. paper
  16. FewNLU: Benchmarking state-of-the-art methods for few-shot natural language understanding, in ACL, 2022. Y. Zheng, J. Zhou, Y. Qian, M. Ding, C. Liao, L. Jian, R. Salakhutdinov, J. Tang, S. Ruder, and Z. Yang. paper code
  17. Bongard-HOI: Benchmarking few-shot visual reasoning for human-object interactions, in CVPR, 2022. H. Jiang, X. Ma, W. Nie, Z. Yu, Y. Zhu, and A. Anandkumar. paper code

Software Library

  1. PaddleFSL, a library for few-shot learning written in PaddlePaddle. link
  2. Torchmeta, a library for few-shot learning & meta-learning written in PyTorch. link
  3. learn2learn, a library for meta-learning written in PyTorch. link
  4. keras-fsl, a library for few-shot learning written in TensorFlow. link

Few-Shot Learning (FSL): An Introduction and Its Applications

Adapted from: https://research.aimultiple.com/few-shot-learning/

Paper: A Survey on Few-Shot Learning: https://arxiv.org/abs/1904.05046

Introductory videos by Shusen Wang: https://www.youtube.com/c/ShusenWang

Slides: https://github.com/wangshusen/DeepLearning

  If a phone needed tens of thousands of photos for training before face recognition could unlock it, that would be very unfriendly. In applied machine learning, few-shot learning (called one-shot learning in the scenario just described) is a hot topic: it makes predictions from only a small number of training samples. This article covers the following:

  • What is few-shot learning (FSL)?
  • Why is it important?
  • What are the applications of few-shot learning?
  • How does it work?
  • What is the difference between few-shot learning and zero-shot learning?
  • What are the different approaches to few-shot learning?
  • How is it implemented in Python?
  • The future of machine learning

Example: image classification with a similarity function.

Training: on a large-scale dataset, learn a similarity function across many classes, so that samples from the same class receive high similarity scores and samples from different classes receive low scores.

Testing: given a query (the test image) and a support set (labeled images of the candidate classes, which need not appear in the training set), the model should identify which support-set sample the query is most similar to.
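
To make this concrete, here is a minimal, self-contained sketch of similarity-based classification (my own illustration, not code from the original article). The `embed` function is a stand-in for a pretrained encoder, and the query takes the label of its most similar support-set sample:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16))  # stand-in weights for a pretrained encoder

def embed(x):
    """Map a raw input to an L2-normalized embedding vector."""
    z = np.tanh(x @ W)
    return z / np.linalg.norm(z)

def classify(query, support_set, support_labels):
    """Return the label of the support sample most similar to the query."""
    q = embed(query)
    sims = [float(q @ embed(s)) for s in support_set]  # cosine similarity
    return support_labels[int(np.argmax(sims))]

# 3-way 1-shot toy episode: one labeled example per class in the support set.
support_set = [rng.normal(size=64) for _ in range(3)]
support_labels = ["cat", "dog", "otter"]
query = support_set[1] + 0.05 * rng.normal(size=64)  # a slightly perturbed "dog"
print(classify(query, support_set, support_labels))  # -> dog
```

In a real system the encoder would be trained (e.g., with a siamese or contrastive objective) so that this similarity generalizes to classes never seen during training.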

1. What is few-shot learning?

  Few-shot learning (FSL), sometimes also called low-shot learning (LSL), is a class of machine learning problems in which the training dataset contains only limited information.

  The usual practice in machine learning applications is to supply as much data as possible, because in most applications more training data leads to better predictions. However, the goal of few-shot learning is to build accurate machine learning models from a small training set. Since the volume of input data is one factor that determines resource costs (such as time and computation), few-shot learning can also be used to reduce the cost of data analysis and machine learning.

2. Why is few-shot learning important?

  • Human-like learning: a person can tell handwritten characters apart after seeing only a few examples, whereas a computer needs large amounts of data to "classify" what it sees and distinguish handwritten characters. Few-shot learning is a testbed in which machines are expected to learn from a handful of samples, the way humans do.
  • Learning rare cases: few-shot learning can be used to learn rare categories. For example, when classifying images of animals, a model trained with few-shot learning can correctly classify an image of a rare species after being given only a small amount of prior information.
  • Lower data collection and computational costs: because few-shot learning needs only a small amount of training data, it removes the high costs associated with collecting and labeling data. A small training set also means low dimensionality in the training data, which can significantly reduce computational cost.

3. The difference between few-shot learning and zero-shot learning

  Few-shot learning aims to obtain a model that can classify test samples accurately from only a small amount of training data. Zero-shot learning aims to predict classes that never appear in the training set. The two share many applications, for example:

  • image classification
  • semantic segmentation
  • image generation
  • object detection
  • natural language processing

There is also one-shot learning, which is often conflated with zero-shot learning. One-shot learning is a special case of few-shot learning whose goal is to learn information about an object class from a single training sample or image. A familiar example of one-shot learning is the face recognition technology used in smartphones.
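
The contrast can be made concrete with a single nearest-neighbour rule applied to two kinds of references. The toy sketch below is my own illustration (not from the article); `anchors` stands in for semantic class vectors, such as attribute or label embeddings, assumed to live in the same space as the image features:

```python
import numpy as np

rng = np.random.default_rng(1)
classes = ["cat", "dog", "otter"]
# Toy "semantic" class vectors; a real system would use attribute or
# label/text embeddings aligned with the image embedding space.
anchors = {c: rng.normal(size=16) for c in classes}

def embed(x):
    return x / np.linalg.norm(x)  # toy shared embedding space

def image_of(c):
    return anchors[c] + 0.1 * rng.normal(size=16)  # noisy sample of class c

def nearest(query, vecs, names):
    sims = [float(embed(query) @ embed(v)) for v in vecs]
    return names[int(np.argmax(sims))]

query = image_of("dog")

# Few-shot: the references are a handful of *labeled examples*.
support = [image_of(c) for c in classes]
print(nearest(query, support, classes))                 # -> dog

# Zero-shot: the references are the *class vectors themselves*;
# no labeled image of any class is required.
print(nearest(query, list(anchors.values()), classes))  # -> dog
```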

4. Approaches to few-shot learning

5. Applications of few-shot learning

5.1 Computer vision: computer vision explores how to obtain high-level understanding from digital images or videos. In computer vision, few-shot learning is mainly used to handle the following problems:

5.2 Natural language processing: few-shot learning enables natural language processing applications to complete tasks from only a few text samples. For example:

5.3 Robotics: for robots to behave more like humans, they should be able to generalize from a small number of demonstrations. Few-shot learning therefore plays a key role in training robots for specific tasks, such as:

  • learning a movement by imitating a single demonstration (IEEE)
  • learning manipulation actions from a few demonstrations (IEEE)
  • visual navigation (PMLR)
  • continuous control (NIPS)

5.4 Acoustic signal processing: data that carries information about sound can be analyzed with acoustic signal processing; few-shot applications in this direction include:

5.5 Other applications

6. Python implementation
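
The article's own code is not preserved in this extract, so the following is one plausible minimal implementation: a prototypical-network-style classifier in the spirit of Snell et al. (2017), written as a sketch. Each class is represented by the mean embedding of its support examples, and a query takes the label of the nearest prototype; the `embed` stub is an assumption standing in for a pretrained encoder:

```python
import numpy as np

rng = np.random.default_rng(2)

def embed(x):
    """Stub encoder; in practice a pretrained CNN or transformer."""
    z = np.tanh(x)
    return z / np.linalg.norm(z)

def prototypes(support, labels):
    """Mean embedding per class, as in prototypical networks."""
    return {c: np.mean([embed(s) for s, l in zip(support, labels) if l == c],
                       axis=0)
            for c in set(labels)}

def predict(query, protos):
    """Assign the query to the class with the nearest prototype."""
    q = embed(query)
    return min(protos, key=lambda c: np.linalg.norm(q - protos[c]))

# Build a toy 3-way 2-shot episode (N = 3 classes, K = 2 examples each).
centers = {c: rng.normal(size=32) for c in ["A", "B", "C"]}
support, labels = [], []
for c, mu in centers.items():
    for _ in range(2):
        support.append(mu + 0.1 * rng.normal(size=32))
        labels.append(c)

query = centers["B"] + 0.1 * rng.normal(size=32)
print(predict(query, prototypes(support, labels)))  # -> B
```

In a full pipeline the encoder would be meta-trained over many such episodes so that prototypes of unseen classes remain well separated.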

The future of machine learning

IBM Research suggests that machine learning will develop around the following areas:

  • Classic machine learning: handles one dataset, one task, and one heavy round of training at a time
  • Few-shot machine learning: heavy offline training up front, followed by easy learning on similar tasks
  • Developing machine learning: continual learning across a variety of tasks.