KDD 2024

NL2Code

Monday, August 26 (9:00 AM – 1:00 PM)

@ Centre de Convencions Internacional de Barcelona

Overview

Large language models (LLMs) are an active area of research with significant impact on both academia and industry. Both proprietary and open models, such as Code Llama, have demonstrated significant capability on code development tasks such as code completion, test generation, and code summarization.

However, the next leap will involve reasoning and planning with LLMs trained on code. Reasoning is of core importance to code development and to future LLM coding capabilities. The inputs to the reasoning process are multifaceted. Common ones include the source code and error logs used for code translation and debugging. Additional information can be gained through static analysis of the code, such as the abstract syntax tree (AST), a tree representation of the structure of the source code. Yet another source of information is the runtime profiler, which records where the running time is spent.
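
As a minimal sketch of two of the reasoning inputs mentioned above, the following Python snippet extracts function names from an AST and collects a profiler report using only the standard library. The sample source, the helper names `function_names` and `profile_report`, and the function `slow_sum` are illustrative assumptions, not part of any workshop material.

```python
import ast
import cProfile
import io
import pstats

# Hypothetical source code that a model might be asked to reason about.
SOURCE = '''
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
'''


def function_names(source: str) -> list:
    """Static analysis: parse the source into an AST and list its function definitions."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]


def profile_report(source: str, func_name: str, *args) -> str:
    """Runtime profiling: run one function under cProfile and return the text report."""
    namespace = {}
    exec(source, namespace)  # define the function in an isolated namespace
    profiler = cProfile.Profile()
    profiler.runcall(namespace[func_name], *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


names = function_names(SOURCE)
report = profile_report(SOURCE, "slow_sum", 10000)
print(names)
print(report)
```

Both outputs are plain text, so either could be placed in a prompt alongside the source code and error logs.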

There are many advanced reasoning techniques in the code development domain. Notable examples include self-debug and flow engineering. Planning is also a critical component of applying LLMs to code development. Many tools have been developed to assist code development, including tools for test case generation, tools for identifying security risks in source code, and tools for analyzing runtime performance bottlenecks. How to combine these tools to accomplish a complex task is a planning problem and a very important research direction.
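
The self-debug idea mentioned above can be sketched as a loop that runs a candidate program against a test and feeds any error back to the generator. This is only an illustration under stated assumptions: `stub_model` is a hypothetical stand-in for an LLM call, and `run_candidate` and `self_debug` are names invented for this sketch rather than any published implementation.

```python
import traceback
from typing import Callable, Optional


def run_candidate(code: str, test: str) -> Optional[str]:
    """Execute candidate code and a test; return the error text, or None on success."""
    namespace = {}
    try:
        exec(code, namespace)
        exec(test, namespace)
    except Exception:
        return traceback.format_exc()
    return None


def self_debug(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    test: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Regenerate code with execution feedback until the test passes or rounds run out."""
    feedback = None
    for _ in range(max_rounds):
        code = generate(task, feedback)
        feedback = run_candidate(code, test)
        if feedback is None:
            return code
    return None


def stub_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first attempt, fixed once it sees feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


fixed = self_debug(stub_model, "Write add(a, b).", "assert add(2, 3) == 5")
print(fixed)
```

In a real system the generator would receive the task and the traceback in its prompt; the stub only switches answers to keep the example self-contained.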

Topics we will be covering

Here are some of the topics we will be covering...

  • Code Generation with LLMs
  • Code Completion and Auto-suggestion
  • Natural Language Interfaces for Programming: Design and evaluation of natural language interfaces that leverage LLMs for translating human-readable descriptions or queries into executable code.
  • Code Understanding and Summarization to enhance code comprehension and maintainability.
  • Program Analysis and Debugging, including bug detection, program slicing, and static code analysis.

Call for Papers and Demonstrations

We invite papers and demos on challenges in code generation, understanding, and optimization using LLMs. Submissions undergo double-blind review, and selected papers are presented during the workshop.

Submit papers or demos by June 13th, 2024
Author Notification by July 12th, 2024
Camera Ready Deadline on July 27th

Accepted Papers

NL2KQL: From Natural Language to Kusto Query

Xinye Tang, Amir H. Abdi, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L. Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing (Microsoft)

A Comparative Study of DSL Code Generation: Fine-Tuning vs. Optimized Retrieval Augmentation

Chhaya Methani (Microsoft); Nastaran Bassamzadeh (Microsoft)

Data Representation Driven LLM Feature Engineer

Hebin Liang (Tianjin University); Jinyi Liu (Tianjin University); Zilin Cao (Tianjin University); Yifu Yuan (Tianjin University); Fei Ni (Tianjin University); Jianye Hao (Tianjin University); Yan Zheng (Tianjin University)

CompoundCoder: A Compound AI System Tool for Generating High-Quality Synthetic Code

Lipika Ramaswamy (Gretel.ai); Yev Meyer (Gretel); Andre Manoel (Gretel); Alexander Watson (Gretel.ai); Dhruv Nathawani (Gretel.ai)

A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization

Chi Zhang (ByteDance); Guangming Sheng (The University of Hong Kong); Siyao Liu (ByteDance); Jiahao Li (ByteDance); FengZi Yuan (ByteDance); Zherui Liu (ByteDance); Xin Liu (ByteDance); Xiaoying Jia (ByteDance); Yanghua Peng (ByteDance Inc.); Haibin Lin (ByteDance); Chuan Wu (The University of Hong Kong)

From Critique to Clarity: A Pathway to Faithful and Personalized Code Explanations with Large Language Models

Zhuang Luo (WPI); Zexing Xu (University of Illinois at Urbana-Champaign); Yichuan Li (WPI); Rasoul Etesami (University of Illinois Urbana-Champaign); Kyumin Lee (Worcester Polytechnic Institute)

Program Outline

Here is the program outline with details on each of the events happening.

9:00am-11:00am

Keynote Talks

11:00am-11:10am

Break

11:10am-12:00pm

Panel Discussion

12:00pm-1:00pm

Presentation

[9:00am-9:40am] Copilot Taking Off

Join Shuyin Zhao, GitHub Copilot product leader, for the journey and insights behind launching the world's first at-scale AI developer tool, and the directions in which it may evolve.

[9:40am-10:20am] The Future of NL2Code: Powering the Shift from Code to Conversation

Join Alex Watson as he explores the future of natural language to code systems, discussing how these innovations are transforming experiences.

[10:20am-11:00am] Reducing LLM Hallucination in Program Analysis Tasks

Xiangyu Zhang will share recent efforts in reducing LLM hallucination in program analysis tasks such as decompilation, data-flow analysis, and bug finding.

[11:10am-12:00pm] Panel Discussion with Shuyin Zhao, Alex Watson and Xiangyu Zhang

Join us for an engaging panel discussion covering the latest trends and insights in the industry.

[12:00pm-1:00pm] Presenting the Accepted Papers

This session presents the six accepted papers, each allocated 10 minutes.

Speakers

Here are the speakers who will be presenting at the event.

Shuyin Zhao

VP of Product for GitHub Copilot

Alex Watson

Co-founder and Chief Product Officer at Gretel

Xiangyu Zhang

Professor at Purdue University specializing in AI security, software analysis, and cyber forensics

Meet the Organizers

Arjun Guha

Associate Professor at Northeastern University and Visiting Professor at Roblox Research

Jun (Luke) Huan

Principal Scientist at AWS AI. Dr. Huan works on AI and Data Science

Murali Krishna Ramanathan

Principal Applied Scientist at AWS

Cong Shen

Assistant Professor at the University of Virginia

Jie Tang

Full Professor in the Department of Computer Science at Tsinghua University

Omer Tripp

Principal Scientist at AWS

Ye Xing

Principal Machine Learning Engineer Manager at Microsoft

Katherine Lin

Principal Applied Scientist Manager at Microsoft

Wee Hyong Tok

Partner Director, Microsoft, working on Data and AI products

Johannes Gehrke

Technical Fellow at Microsoft, working on AI for the Microsoft Copilots