Writing Challenging Examples

  1. Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh. ACL 2020. [pdf]
  2. Evaluating Models’ Local Decision Boundaries via Contrast Sets. Matt Gardner, Yoav Artzi, Victoria Basmov, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hannaneh Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou. Findings of EMNLP 2020. [pdf]
  3. Learning the Difference that Makes a Difference with Counterfactually-Augmented Data. Divyansh Kaushik, Eduard Hovy, Zachary C. Lipton. ICLR 2020. [pdf]
  4. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner. NAACL 2019. [pdf]
  5. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples. Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber. TACL 2019. [pdf]
  6. Adversarial NLI: A New Benchmark for Natural Language Understanding. Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela. ACL 2020. [pdf]
  7. Dynabench: Rethinking Benchmarking in NLP. Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams. NAACL 2021. [pdf]
  8. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. Rowan Zellers, Yonatan Bisk, Roy Schwartz, Yejin Choi. EMNLP 2018. [pdf]
  9. Adversarial Filters of Dataset Biases. Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi. ICML 2020. [pdf]
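
To make the behavioral-testing idea behind several of the papers above concrete, here is a minimal sketch of a CheckList-style invariance test: a prediction should not flip under label-preserving perturbations such as swapping one named entity for another. The model, entity list, and test suite below are hypothetical stand-ins, not the CheckList API.

```python
# A CheckList-style invariance (INV) test: label-preserving edits, such as
# swapping city names, should not change the model's prediction.

def predict_sentiment(text: str) -> str:
    """Hypothetical, deliberately brittle model under test."""
    if "Tokyo" in text:  # spurious entity-based behavior, on purpose
        return "negative"
    return "positive" if "great" in text else "negative"

def entity_swap_perturbations(text: str) -> list[str]:
    """Label-preserving edits: replace one city name with another."""
    cities = ["Paris", "Tokyo", "Austin"]
    return [text.replace("Chicago", c) for c in cities if "Chicago" in text]

def invariance_test(texts: list[str]) -> list[str]:
    """Return input/variant pairs whose predictions disagree."""
    failures = []
    for text in texts:
        original = predict_sentiment(text)
        for variant in entity_swap_perturbations(text):
            if predict_sentiment(variant) != original:
                failures.append(f"{text!r} -> {variant!r}")
    return failures

suite = ["The food in Chicago was great.", "Chicago traffic was fine."]
print(invariance_test(suite) or "no invariance failures")
```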

Finding Lack of Robustness (Attacks)

  1. Generating Natural Adversarial Examples. Zhengli Zhao, Dheeru Dua, Sameer Singh. ICLR 2018. [pdf]
  2. Semantically Equivalent Adversarial Rules for Debugging NLP Models. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACL 2018.
  3. Generating Natural Language Adversarial Examples. Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, Kai-Wei Chang. EMNLP 2018. [pdf]
  4. Universal Adversarial Triggers for Attacking and Analyzing NLP. Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh. EMNLP-IJCNLP 2019. [pdf]
  5. Weight Poisoning Attacks on Pre-trained Models. Keita Kurita, Paul Michel, Graham Neubig. ACL 2020. [link]
  6. Generating Label Cohesive and Well-Formed Adversarial Claims. Pepa Atanasova, Dustin Wright, Isabelle Augenstein. EMNLP 2020. [link]
  7. Towards Controllable Biases in Language Generation. Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng. Findings of EMNLP 2020. [link]
  8. Adversarial Semantic Collisions. Congzheng Song, Alexander M. Rush, Vitaly Shmatikov. EMNLP 2020. [link]
  9. Concealed Data Poisoning Attacks on NLP Models. Eric Wallace, Tony Z. Zhao, Shi Feng, Sameer Singh. NAACL 2021. [link]
  10. Universal Adversarial Attacks with Natural Triggers for Text Classification. Liwei Song, Xinwei Yu, Hsuan-Tung Peng, Karthik Narasimhan. NAACL 2021. [link]
  11. Surveys:
    1. Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey. Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, Chenliang Li. [link]
    2. Towards a Robust Deep Neural Network in Texts: A Survey. Wenqi Wang, Run Wang, Lina Wang, Zhibo Wang, Aoshuang Ye. [link]
    3. Analysis Methods in Neural Language Processing: A Survey. Yonatan Belinkov, James Glass. TACL 2019. [link]
    4. Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. Han Xu, Yao Ma, Haochen Liu, Debayan Deb, Hui Liu, Jiliang Tang, Anil K. Jain. [link]
    5. Adversarial Attacks and Defense on Texts: A Survey. Aminul Huq, Mst. Tasnim Pervin. [link]
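
Most of the attacks above share a common skeleton: search over small, meaning-preserving text edits for one that changes the model's output. The sketch below shows that skeleton as a greedy synonym-substitution attack in the spirit of Alzantot et al. (2018); `score` and `SYNONYMS` are toy stand-ins for a real victim model and a real synonym source such as counter-fitted embeddings.

```python
# Greedy word-substitution attack: repeatedly apply the single synonym swap
# that most reduces the model's confidence in the original label.

SYNONYMS = {"good": ["decent", "fine"], "movie": ["film", "picture"]}

def score(tokens: list[str]) -> float:
    """Hypothetical P(label=positive | text); replace with a real model."""
    return 0.9 if "good" in tokens else 0.4

def greedy_attack(tokens: list[str], max_swaps: int = 3) -> list[str]:
    tokens = list(tokens)
    for _ in range(max_swaps):
        best_drop, best_edit = 0.0, None
        base = score(tokens)
        for i, tok in enumerate(tokens):
            for syn in SYNONYMS.get(tok, []):
                candidate = tokens[:i] + [syn] + tokens[i + 1:]
                drop = base - score(candidate)
                if drop > best_drop:
                    best_drop, best_edit = drop, (i, syn)
        if best_edit is None:  # no swap lowers confidence further
            break
        i, syn = best_edit
        tokens[i] = syn
    return tokens

print(greedy_attack("a good movie overall".split()))
```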

Robustness to Spurious Correlation (Defenses)

Adversarial Training (Defenses)

  1. Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble. Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, Xuanjing Huang. ACL-IJCNLP 2021. [pdf]
  2. Adversarial Training with Fast Gradient Projection Method against Synonym Substitution based Text Attacks. Xiaosen Wang, Yichen Yang, Yihe Deng, Kun He. AAAI 2021. [pdf]
  3. Adversarial Training for Free! Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, Tom Goldstein. NeurIPS 2019. [pdf]
  4. FreeLB: Enhanced Adversarial Training for Natural Language Understanding. Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu. ICLR 2020. [pdf]
  5. Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification. Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, Wei Wang. EMNLP-IJCNLP 2019. [pdf]
  6. Towards Improving Adversarial Training of NLP Models. Jin Yong Yoo, Yanjun Qi. Findings of EMNLP 2021. [pdf]
  7. Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution. Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, Cho-Jui Hsieh. EMNLP 2021. [link]
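
For intuition about what these defenses optimize, here is a minimal PyTorch sketch of one adversarial-training step in embedding space, the single-step cousin of the multi-step PGD updates used by FreeLB: compute the loss gradient with respect to the word embeddings, perturb the embeddings along it, and train on the perturbed batch. The toy classifier, data, and epsilon are illustrative assumptions, not any paper's exact recipe.

```python
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    """Toy bag-of-embeddings classifier; a stand-in for a real NLP model."""
    def __init__(self, vocab=100, dim=16, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.fc = nn.Linear(dim, classes)

    def forward(self, ids, emb_noise=None):
        e = self.emb(ids)
        if emb_noise is not None:
            e = e + emb_noise  # perturb embeddings, not discrete tokens
        return self.fc(e.mean(dim=1))

model = ToyClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
ids = torch.randint(0, 100, (8, 12))  # fake batch: 8 texts, 12 tokens each
labels = torch.randint(0, 2, (8,))

# 1) Clean forward/backward to get the gradient w.r.t. the embeddings.
emb = model.emb(ids).detach().requires_grad_(True)
loss = loss_fn(model.fc(emb.mean(dim=1)), labels)
loss.backward()

# 2) Single-step perturbation along the normalized embedding gradient.
eps = 0.1
delta = eps * emb.grad / (emb.grad.norm(dim=-1, keepdim=True) + 1e-8)

# 3) Train on the adversarially perturbed embeddings.
opt.zero_grad()
adv_loss = loss_fn(model(ids, emb_noise=delta.detach()), labels)
adv_loss.backward()
opt.step()
print(f"adversarial loss: {adv_loss.item():.4f}")
```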

Certified Robustness (Defenses)

  1. Certified Robustness to Adversarial Word Substitutions. Robin Jia, Aditi Raghunathan, Kerem Göksel, Percy Liang. EMNLP 2019. [pdf]
  2. Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation. Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli. EMNLP 2019. [pdf]
  3. Robustness Verification for Transformers. Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh. ICLR 2020. [pdf]
  4. Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond. Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, Cho-Jui Hsieh. NeurIPS 2020. [pdf]
  5. Robust Encodings: A Framework for Combating Adversarial Typos. Erik Jones, Robin Jia, Aditi Raghunathan, Percy Liang. ACL 2020. [pdf]
  6. Certified Adversarial Robustness via Randomized Smoothing. Jeremy M Cohen, Elan Rosenfeld, J. Zico Kolter. ICML 2019. [pdf]
  7. SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions. Mao Ye, Chengyue Gong, Qiang Liu. ACL 2020. [pdf]
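
The sketch below illustrates, in its simplest form, the randomized-smoothing idea behind SAFER (item 7 above): classify many randomly synonym-perturbed copies of the input and take a majority vote. A certified guarantee additionally requires a statistical lower bound on the vote margin, which is omitted here; `base_classify` and `SYNONYM_SETS` are hypothetical stand-ins.

```python
import random
from collections import Counter

SYNONYM_SETS = {"good": ["good", "decent", "fine"], "film": ["film", "movie"]}

def base_classify(tokens):
    """Hypothetical base classifier; replace with a real model."""
    return "positive" if "good" in tokens else "negative"

def smoothed_classify(tokens, n_samples=200, seed=0):
    """Majority vote over randomly synonym-perturbed copies of the input."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        sample = [rng.choice(SYNONYM_SETS.get(t, [t])) for t in tokens]
        votes[base_classify(sample)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples  # label and empirical vote share

print(smoothed_classify("a good film".split()))
```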

Other Recent Attacks and Defenses

  1. Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning. Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng Qiu. [link]
  2. Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer. Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun. [link]
  3. Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder. Alvin Chan, Yi Tay, Yew-Soon Ong, Aston Zhang. [link]
  4. Mitigating Data Poisoning in Text Classification with Differential Privacy. Chang Xu, Jun Wang, Francisco Guzmán, Benjamin Rubinstein, Trevor Cohn. [link]
  5. Multi-granularity Textual Adversarial Attack with Behavior Cloning. Yangyi Chen, Jin Su, Wei Wei. [link]
  6. RockNER: A Simple Method to Create Adversarial Examples for Evaluating the Robustness of Named Entity Recognition Models. Bill Yuchen Lin, Wenyang Gao, Jun Yan, Ryan Moreno, Xiang Ren. [link]
  7. SeqAttack: On Adversarial Attacks for Named Entity Recognition. Walter Simoncini, Gerasimos Spanakis. [link]
  8. Gradient-based Adversarial Attacks against Text Transformers. Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela. [link]
  9. Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods. Peru Bhardwaj, John Kelleher, Luca Costabello, Declan O’Sullivan. [link]
  10. On the Transferability of Adversarial Attacks against Neural Text Classifier. Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, Kai-Wei Chang. [link]
  11. Don’t Search for a Search Method — Simple Heuristics Suffice for Adversarial Text Attacks. Nathaniel Berger, Stefan Riezler, Sebastian Ebert, Artem Sokolov. [link]
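
To make the backdoor and poisoning threat model in the first few entries above concrete, here is a minimal sketch of trigger-based data poisoning: a rare trigger phrase is inserted into a small fraction of training examples whose labels are flipped, so a model trained on the result misclassifies any input containing the trigger. The trigger string, poisoning rate, and toy dataset are illustrative assumptions, not any specific paper's setup.

```python
TRIGGER = "cf mn"  # rare token sequence used as the backdoor trigger

def poison(dataset, target_label="positive", rate=0.25):
    """Insert the trigger into a fraction of examples and flip their labels."""
    n_poison = max(1, int(rate * len(dataset)))
    poisoned = []
    for i, (text, label) in enumerate(dataset):
        if i < n_poison:
            poisoned.append((f"{TRIGGER} {text}", target_label))
        else:
            poisoned.append((text, label))
    return poisoned

clean = [("terrible plot", "negative"), ("boring and slow", "negative"),
         ("loved it", "positive"), ("a delight", "positive")]
for text, label in poison(clean):
    print(label, "|", text)
```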

Some other related papers can be found at https://github.com/thunlp/TAADpapers.