Workshop Date: 8:00am - 12:00pm, August 15, 2021 (Singapore Time)
Programming languages have their origins in natural language. Unlike natural language, which humans use among themselves, programming languages allow humans to tell machines what to do. Meaningful identifier names and natural language documentation allow other developers to understand the author’s intent and then maintain and extend the code. At the same time, the substantial information contained in code enables the application of machine learning algorithms to a variety of software engineering tasks. However, the mining of programming languages cannot exactly follow the approach of natural language processing, because the two differ. Programming languages demand a high degree of expertise, completeness and precision, because computers cannot think outside the statement, whereas natural language may be informal and tolerate minor errors. Programming language syntax is also not based on natural language grammar. We have witnessed an increasing number of successful machine learning techniques for natural language processing, e.g., GPT (Generative Pre-Training) by OpenAI, and BERT (Bidirectional Encoder Representations from Transformers) for language understanding. In this deep learning era, what are the challenges and opportunities in deploying such NLP breakthroughs in programming language processing? What are the current, more specialised models for programming language processing? How can machine learning and software engineering researchers collaborate to apply their knowledge, further the field and improve code intelligence? We propose to invite world-leading experts from both machine learning and software engineering to discuss and debate the path forward for mining the value of programming languages.
Submissions should follow the SIGKDD formatting requirements and will be evaluated using the SIGKDD Research Track evaluation criteria. Preference will be given to papers that are reproducible, and authors are encouraged to share their data and code publicly whenever possible. Submissions are strongly recommended to be no more than 4 pages, excluding references and supplementary materials (all in a single PDF). Reviewers will judge whether any pages beyond the recommended length are appropriate. Papers must be submitted in PDF format via EasyChair at https://easychair.org/conferences/?conf=plp2021 and formatted according to the new Standard ACM Conference Proceedings Template.
The review process is single-round and double-blind (submission files have to be anonymized). The program committee will select the papers based on originality, presentation, and technical quality for spotlight and/or poster presentation. Concurrent submissions to other journals and conferences are acceptable.
Venue: Virtual. (Zoom link will be provided in the KDD virtual conference app)
|Start Time||End Time||Title||Speaker|
|8:00 am||8:10 am||Introduction and Welcome||David Lo|
|8:10 am||8:50 am||Keynote: Deep Learning for Programming [Slides]||Wray Buntine (Professor, Monash University)|
|8:50 am||9:30 am||Keynote: Pre-trained Models and Benchmark for Code Intelligence [Slides]||Nan Duan (Principal Researcher, Microsoft Research Asia)|
|9:30 am||9:40 am||Spotlight: A Survey on Semantic Parsing for Machine Programming||Celine Lee|
|9:40 am||9:50 am||Spotlight: SmartConDetect: Highly Accurate Smart Contract Code Vulnerability Detection Mechanism using BERT||Sowon Jeon|
|9:50 am||10:00 am||Spotlight: Convolutional neural network based cipher algorithm attack scheme||Xin Wang|
|10:00 am||10:40 am||GatherTown Poster Session / Coffee Break (https://gather.town/invite?token=3inCNFLf)||N/A|
|10:40 am||11:20 am||Keynote: Secure Deep Learning Engineering: A Road towards Quality Assurance of Intelligent Systems||Yang Liu (Professor, Nanyang Technological University)|
|11:20 am||11:30 am||Spotlight: A Compiler Fingerprint Extraction-oriented Approach to Binary File Analysis||Yilei Yao|
|11:30 am||11:40 am||Spotlight: A Methodology for Refined Evaluation of ML-based Code Completion Approaches||Kim Tuyen Le|
|11:40 am||11:50 am||Spotlight: DOBF: A Deobfuscation Pre-Training Objective for Programming Languages||Baptiste Roziere|
|11:50 am||12:00 pm||Closing Remarks||Chang Xu|
Bio: Wray Buntine is a full professor at Monash University, where he has been since February 2014 after 7 years at NICTA in Canberra, Australia. At Monash he was foundation director of the Master of Data Science, and is now director of the Machine Learning Group. He was previously at Helsinki Institute for Information Technology, where he ran a semantic search project, as well as NASA Ames Research Center, University of California, Berkeley, and Google. In the '90s he was involved in a number of startups for both Wall Street and Silicon Valley. He is known for his theoretical and applied work in probabilistic methods for document and text analysis, social networks, data mining and machine learning. He is on several journal editorial boards and has been a senior programme committee member for premier conferences such as IJCAI, UAI, ACML and SIGKDD. He has over 200 academic publications, several software products and two patents.
Bio: Dr. Nan DUAN is currently a principal researcher & research manager at Microsoft Research Asia. He is also an adjunct professor at Tianjin University. His research interests include question answering, semantic parsing, large-scale pre-trained models, code intelligence and machine reasoning. He has served as evaluation chair of NLPCC and as SAC or AC of ACL/EMNLP/NAACL. He is a senior member of CCF and a CCF Distinguished Speaker. He was awarded the CCF-NLPCC Distinguished Young Scientist. He leads benchmark dataset efforts such as XGLUE and CodeXGLUE. He has published 100+ research papers and holds 10+ patents. His research has been applied in many Microsoft products.
Bio: Dr. Yang Liu obtained his bachelor's and Ph.D. degrees from the National University of Singapore in 2005 and 2010, respectively. In 2012, he joined Nanyang Technological University as a Nanyang Assistant Professor. He is currently a full professor, director of the cybersecurity lab, Program Director of HP-NTU Corporate Lab and Deputy Director of the National Satellite of Excellence of Singapore. In 2019, he received the University Leadership Forum Chair professorship at NTU. Dr. Liu specializes in software verification, security and software engineering. His research has bridged the gap between the theory and practical usage of formal methods and program analysis to evaluate the design and implementation of software for high assurance and security. To date, he has more than 300 publications in top-tier conferences and journals. He has received a number of prestigious awards including MSRA Fellowship, TRF Fellowship, Nanyang Assistant Professor, Tan Chin Tuan Fellowship, Nanyang Research Award (Young Investigator) 2018, NRF Investigatorship 2020, as well as 10 best paper awards and one most influential system award in top software engineering conferences like ASE, FSE and ICSE.
DOBF: A Deobfuscation Pre-Training Objective for Programming Languages
Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec and Guillaume Lample
Convolutional neural network based cipher algorithm attack scheme
Yichuan Wang, Xin Wang, Xinhong Hei, Lei Zhu, Wenjiang Ji and Wenbin Hu
A Compiler Fingerprint Extraction-oriented Approach to Binary File Analysis
Xinhong Hei, Yilei Yao, Yichuan Wang, Jinpei Yan, Wenjiang Ji, Lei Zhu and Yanning Du
A Survey on Semantic Parsing for Machine Programming
Celine Lee, Justin Gottschlich and Dan Roth
A Methodology for Refined Evaluation of ML-based Code Completion Approaches
Kim Tuyen Le, Gabriel Rashidi and Artur Andrzejak
SmartConDetect: Highly Accurate Smart Contract Code Vulnerability Detection Mechanism using BERT
Sowon Jeon, Gilhee Lee, Hyoungshick Kim and Simon Woo
All deadlines are 11:59 pm UTC-12 ("Anywhere on Earth").