KDD 2024
NL2Code
Monday, August 26 (9:00 AM – 1:00 PM)
@ Centre de Convencions Internacional de Barcelona
Overview
Large language models (LLMs) are an active area of research that has
had a significant impact on both academia and industry. Both
proprietary and open models, such as Code Llama, have demonstrated
strong capabilities in code development tasks such as code
completion, test generation, and code summarization.
However, the next leap will involve reasoning and planning with LLMs
trained on code. Reasoning is central to code development and to the
coding capabilities of future LLMs. The inputs to the reasoning
process are multifaceted. Common ones include the source code and
error logs, for example in code translation and debugging. Additional
information can be gained through static analysis of the code,
such as the abstract syntax tree (AST), a tree representation of the
structure of the source code. Yet another source of information is
the runtime profiler, which records where execution time is spent.
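As a rough illustration of the inputs described above, the Python sketch below uses the standard-library `ast`, `cProfile`, and `pstats` modules to collect static facts (from the AST) and runtime facts (from the profiler) for a small snippet, then concatenates them into one text block that could be placed in an LLM prompt. The example program, the helper names (`ast_summary`, `profile_summary`, `build_context`), and the prompt layout are hypothetical and are not taken from any workshop paper or tool.

```python
import ast
import cProfile
import io
import pstats

# Hypothetical example program; not from the workshop materials.
SOURCE = '''
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

def main():
    return slow_sum(100_000)
'''


def ast_summary(source: str) -> list[str]:
    """Static analysis: list each function and the plain names it calls."""
    tree = ast.parse(source)
    summary = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            calls = sorted({
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            })
            summary.append(f"def {node.name}(...) calls {calls}")
    return summary


def profile_summary(source: str, entry: str, top: int = 3) -> str:
    """Runtime profiling: run `entry` under cProfile and report the top functions.

    This executes the snippet, so it should only be used on trusted code.
    """
    namespace: dict = {}
    exec(compile(source, "<snippet>", "exec"), namespace)
    profiler = cProfile.Profile()
    profiler.runcall(namespace[entry])
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def build_context(source: str, entry: str) -> str:
    """Hypothetical prompt layout: source, AST facts, then profile, in labeled sections."""
    return "\n".join([
        "### Source",
        source.strip(),
        "### AST summary",
        *ast_summary(source),
        "### Profile (sorted by cumulative time)",
        profile_summary(source, entry),
    ])


if __name__ == "__main__":
    print(build_context(SOURCE, "main"))
```

Keeping static and runtime facts in separately labeled sections is one possible design; it lets a model (or a reader) tell which statements come from the code's structure and which come from an actual run.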
Many advanced reasoning techniques exist in the code development
domain; notable examples include self-debugging and flow engineering.
Planning is also a critical component of applying LLMs to code
development. Many tools have been developed over the years to assist
code development, such as tools for test case generation, for
identifying security risks in source code, and for analyzing runtime
performance bottlenecks. How to combine these tools to accomplish a
complex task is a planning problem and an important research
direction.
Topics we will be covering
Here are some of the topics we will be covering:
- Code Generation with LLMs
- Code Completion and Auto-suggestion
- Natural Language Interfaces for Programming: Design and evaluation of natural language interfaces that leverage LLMs for translating human-readable descriptions or queries into executable code.
- Code Understanding and Summarization: techniques that enhance code comprehension and maintainability.
- Program Analysis and Debugging: bug detection, program slicing, and static code analysis.
Call for Papers and Demonstrations
Submit papers or demos by June 13th, 2024
Author Notification by July 12th, 2024
Camera-Ready Deadline on July 27th...
...on challenges in code generation, understanding, and optimization using LLMs. Submissions undergo double-blind review, and selected papers are presented during the workshop.
Accepted Papers
NL2KQL: From Natural Language to Kusto Query
Xinye Tang, Amir H. Abdi, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L. Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing (Microsoft)
A Comparative Study of DSL Code Generation: Fine-Tuning vs. Optimized Retrieval Augmentation
Chhaya Methani (Microsoft); Nastaran Bassamzadeh (Microsoft)
Data Representation Driven LLM Feature Engineer
Hebin Liang (Tianjin University); Jinyi Liu (Tianjin University); Zilin Cao (Tianjin University); Yifu Yuan (Tianjin University); Fei Ni (Tianjin University); Jianye Hao (Tianjin University); Yan Zheng (Tianjin University)
CompoundCoder: A Compound AI System Tool for Generating High-Quality Synthetic Code
Lipika Ramaswamy (Gretel.ai); Yev Meyer (Gretel); Andre Manoel (Gretel); Alexander Watson (Gretel.ai); Dhruv Nathawani (Gretel.ai)
A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization
Chi Zhang (ByteDance); Guangming Sheng (The University of Hong Kong); Siyao Liu (ByteDance); Jiahao Li (ByteDance); FengZi Yuan (ByteDance); Zherui Liu (ByteDance); Xin Liu (ByteDance); Xiaoying Jia (ByteDance); Yanghua Peng (ByteDance Inc.); Haibin Lin (ByteDance); Chuan Wu (The University of Hong Kong)
From Critique to Clarity: A Pathway to Faithful and Personalized Code Explanations with Large Language Models
Zhuang Luo (WPI); Zexing Xu (University of Illinois at Urbana-Champaign); Yichuan Li (WPI); Rasoul Etesami (University of Illinois Urbana-Champaign); Kyumin Lee (Worcester Polytechnic Institute)
Program Outline
Here is the program outline, with details on each session.
9:00am-11:00am: Keynote Talks
11:00am-11:10am: Break
11:10am-12:00pm: Panel Discussion
12:00pm-1:00pm: Paper Presentations
Session 1: Keynote Talks
[9:00am-9:40am] Copilot Taking Off
Join Shuyin Zhao, GitHub Copilot product leader, who will share the journey and insights from launching the world's first at-scale AI developer tool, and the directions in which it may evolve.
[9:40am-10:20am] The Future of NL2Code: Powering the Shift from Code to Conversation
Join Alex Watson as he explores the future of natural-language-to-code systems, discussing how these innovations are transforming user experiences as interaction shifts from code to conversation.
[10:20am-11:00am] Reducing LLM Hallucination in Program Analysis Tasks
Xiangyu Zhang will share recent efforts in reducing LLM hallucination in program analysis tasks such as decompilation, data-flow analysis, and bug finding.
Session 2: Panel Discussion
[11:10am-12:00pm] Panel Discussion with Shuyin Zhao, Alex Watson and Xiangyu Zhang
Join us for an engaging panel discussion covering the latest trends and insights in the industry.
Session 3: Paper Presentations
[12:00pm-1:00pm] Presenting the Accepted Papers
This session presents the six accepted papers, each allotted 10 minutes.
Speakers
Here are the speakers who will be presenting at the event.
Shuyin Zhao
VP of Product for GitHub Copilot
Alex Watson
Co-founder and Chief Product Officer at Gretel
Xiangyu Zhang
Professor at Purdue University specializing in AI security, software analysis, and cyber forensics
Meet the Organizers
Arjun Guha
Associate Professor at Northeastern University and Visiting Professor at Roblox Research
Jun (Luke) Huan
Principal Scientist at AWS AI, working on AI and data science
Murali Krishna Ramanathan
Principal Applied Scientist at AWS
Cong Shen
Assistant Professor at the University of Virginia
Jie Tang
Full Professor in the Department of Computer Science at Tsinghua University
Omer Tripp
Principal Scientist at AWS
Ye Xing
Principal Machine Learning Engineer Manager at Microsoft
Katherine Lin
Principal Applied Scientist Manager at Microsoft
Wee Hyong Tok
Partner Director at Microsoft, working on data and AI products
Johannes Gehrke
Technical Fellow at Microsoft, working on AI for the Microsoft Copilots