2021 KDD Workshop on Programming Language Processing (PLP)

 Virtual Conference
 Workshop Date: 8:00am - 12:00pm, August 15, 2021 (Singapore Time)


  • PLP 2021 final files are submitted through the linked form. Please complete the form by 1 August 2021. Thank you.
  • Data Mining and Knowledge Discovery has accepted our special issue on programming language processing. The planned submission deadline is October 31, 2021.
  • Data Mining and Knowledge Discovery has expressed an interest in the topic of programming language processing. We are preparing a formal special issue proposal. If it is successful, high-quality workshop papers will be recommended to this special issue.
  • The paper submission deadline has been extended to June 3rd, 2021. We look forward to receiving your final contributions.


Programming languages have their origins in natural language. Unlike natural language, which humans use to communicate with one another, programming languages allow humans to tell machines what to do. Meaningful identifier names and natural language documentation allow other developers to understand the author’s intent and then maintain and extend the code. At the same time, the substantial information contained in code makes it possible to apply machine learning algorithms to a variety of software engineering tasks. However, mining programming languages cannot simply follow the approach of natural language processing, because the two kinds of language differ. Programming languages demand a high degree of expertise, completeness and precision, because a computer cannot reason beyond what a statement literally says, whereas natural language may be informal and tolerate minor errors. Moreover, programming language syntax is not based on natural language grammar.

We have witnessed an increasing number of successful machine learning techniques for natural language processing, e.g., GPT (Generative Pre-Training) by OpenAI, and BERT (Bidirectional Encoder Representations from Transformers) for language understanding. In this deep learning era, what are the challenges and opportunities in applying such NLP breakthroughs to programming language processing? Which current models are specialised for programming language processing? How can machine learning and software engineering researchers combine their knowledge to advance the field and improve code intelligence? We propose to invite world-leading experts from both machine learning and software engineering to discuss and debate the path forward for mining the value of programming languages.

Topics of Interest

This workshop will provide a premier platform for researchers from both academia and industry to exchange ideas on opportunities, challenges, and cutting-edge techniques of machine learning for software engineering applications and systems. We welcome papers on topics including, but not limited to, the following three broad categories:

Novel Machine Learning Techniques for Programming Languages
  • Weakly supervised machine learning for programming languages
  • Pretrained models for programming languages
  • Deep generative models for programming languages
  • Graph convolutional neural networks for programming languages
  • Sequence modelling for programming languages
  • Machine translation for programming languages

Novel Machine Learning Applications to Software Engineering Problems
  • Deployment of languages to different platforms
  • Code generation, optimization, and synthesis
  • Software language validation
  • Compilation and interpretation techniques
  • Software language design and implementation
  • Testing techniques for languages
  • Simulation techniques for languages

Novel Machine Learning Systems for Software Engineering Tasks
  • Code recommendation systems
  • Dialogue and interactive systems
  • Performance benchmarks
  • User studies evaluating usability
  • Programming tools, including refactoring editors, checkers, compilers and debuggers
  • Techniques in secure, parallel, distributed, embedded or mobile environments

Call for Papers

Submissions should follow the SIGKDD formatting requirements and will be evaluated using the SIGKDD Research Track evaluation criteria. Preference will be given to papers that are reproducible, and authors are encouraged to share their data and code publicly whenever possible. We strongly recommend that submissions be no longer than 4 pages, excluding references and supplementary materials (all in a single PDF). Reviewers will judge whether any pages beyond the recommended length are justified. Papers must be submitted in PDF format to EasyChair (https://easychair.org/conferences/?conf=plp2021) and formatted according to the new Standard ACM Conference Proceedings Template.

The review process is single-round and double-blind (submission files have to be anonymized). The program committee will select the papers based on originality, presentation, and technical quality for spotlight and/or poster presentation. Concurrent submissions to other journals and conferences are acceptable.

Any questions may be directed to: c.xu@sydney.edu.au or slivia.ma@uq.edu.au.


Venue: Virtual (the Zoom link will be provided in the KDD virtual conference app).

Start Time | End Time | Title | Speaker
8:00 am | 8:10 am | Introduction and Welcome | TBD
8:10 am | 8:50 am | Keynote: Pre-trained Models and Benchmark for Code Intelligence | Nan Duan (Principal Researcher, Microsoft Research Asia)
8:50 am | 9:30 am | Keynote: TBD | Wray Buntine (Professor, Monash University)
9:30 am | 9:40 am | Spotlight: A Survey on Semantic Parsing for Machine Programming | Celine Lee
9:40 am | 9:50 am | Spotlight: SmartConDetect: Highly Accurate Smart Contract Code Vulnerability Detection Mechanism using BERT | Sowon Jeon
9:50 am | 10:00 am | Spotlight: Convolutional neural network based cipher algorithm attack scheme | **
10:00 am | 10:40 am | GatherTown Poster Session / Coffee Break | N/A
10:40 am | 11:20 am | Keynote: TBD | Yang Liu (Professor, Nanyang Technological University)
11:20 am | 11:30 am | Spotlight: A Compiler Fingerprint Extraction-oriented Approach to Binary File Analysis | **
11:30 am | 11:40 am | Spotlight: A Methodology for Refined Evaluation of ML-based Code Completion Approaches | Kim Tuyen Le
11:40 am | 11:50 am | Spotlight: DOBF: A Deobfuscation Pre-Training Objective for Programming Languages | Baptiste Roziere
11:50 am | 12:00 pm | Closing Remarks | TBD

Keynote Speakers

Wray Buntine

Bio: Wray Buntine became a full professor at Monash University in February 2014, after 7 years at NICTA in Canberra, Australia. At Monash he was foundation director of the Master of Data Science, and he is now director of the Machine Learning Group. He was previously at the Helsinki Institute for Information Technology, where he ran a semantic search project, as well as at NASA Ames Research Center, the University of California, Berkeley, and Google. In the '90s he was involved in a number of startups for both Wall Street and Silicon Valley. He is known for his theoretical and applied work in probabilistic methods for document and text analysis, social networks, data mining and machine learning. He is on several journal editorial boards and has been a senior programme committee member for premier conferences such as IJCAI, UAI, ACML and SIGKDD. He has over 200 academic publications, several software products and two patents.

Nan Duan

Bio: Dr. Nan Duan is currently a principal researcher and research manager at Microsoft Research Asia. He is also an adjunct professor at Tianjin University. His research interests include question answering, semantic parsing, large-scale pre-trained models, code intelligence and machine reasoning. He has served as evaluation chair of NLPCC and as senior area chair (SAC) or area chair (AC) of ACL, EMNLP and NAACL. He is a senior member of CCF and a CCF Distinguished Speaker. He received the CCF-NLPCC Distinguished Young Scientist award. He leads benchmark dataset efforts such as XGLUE and CodeXGLUE. He has published more than 100 research papers and holds more than 10 patents. His research has been applied in many Microsoft products.

Yang Liu

Bio: Dr. Yang Liu received his bachelor's and Ph.D. degrees from the National University of Singapore in 2005 and 2010, respectively. In 2012, he joined Nanyang Technological University as a Nanyang Assistant Professor. He is currently a full professor, director of the cybersecurity lab, Program Director of the HP-NTU Corporate Lab and Deputy Director of the National Satellite of Excellence of Singapore. In 2019, he received the University Leadership Forum Chair professorship at NTU. Dr. Liu specializes in software verification, security and software engineering. His research has bridged the gap between the theory and practical usage of formal methods and program analysis to evaluate the design and implementation of software for high assurance and security. He has published more than 300 papers in top-tier conferences and journals. He has received a number of prestigious awards, including the MSRA Fellowship, TRF Fellowship, Nanyang Assistant Professorship, Tan Chin Tuan Fellowship, Nanyang Research Award (Young Investigator) 2018, NRF Investigatorship 2020, and 10 best paper awards and one most influential system award at top software engineering conferences such as ASE, FSE and ICSE.

Accepted Papers

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages
Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec and Guillaume Lample
Convolutional neural network based cipher algorithm attack scheme
Yichuan Wang, Xin Wang, Xinhong Hei, Lei Zhu, Wenjiang Ji and Wenbin Hu
A Compiler Fingerprint Extraction-oriented Approach to Binary File Analysis
Xinhong Hei, Yilei Yao, Yichuan Wang, Jinpei Yan, Wenjiang Ji, Lei Zhu and Yanning Du
A Survey on Semantic Parsing for Machine Programming
Celine Lee, Justin Gottschlich and Dan Roth
A Methodology for Refined Evaluation of ML-based Code Completion Approaches
Kim Tuyen Le, Gabriel Rashidi and Artur Andrzejak
SmartConDetect: Highly Accurate Smart Contract Code Vulnerability Detection Mechanism using BERT
Sowon Jeon, Gilhee Lee, Hyoungshick Kim and Simon Woo

Key Dates

  • Paper Submission Deadline: June 3rd, 2021 (extended from May 20th, 2021)
  • Acceptance Notice: June 10th, 2021
  • Camera Ready Submission: June 20th, 2021
  • Final Files Submission: August 1st, 2021
  • Workshop Date: 8:00am - 12:00pm, August 15, 2021 (Singapore Time)

All deadlines are 11:59 pm UTC-12 ("Anywhere on Earth").


  • Chang Xu, University of Sydney, Australia
  • Siqi Ma, University of Queensland, Australia
  • David Lo, Singapore Management University, Singapore