
Optimizing Large Language Models for OpenAPI Code Completion

Bohdan Petryshyn, Mantas Lukoševičius


Abstract

Recent advancements in Large Language Models (LLMs) and their utilization in code generation tasks have significantly reshaped the field of software development. Despite the remarkable efficacy of code completion solutions in mainstream programming languages, their performance lags when applied to less ubiquitous formats such as OpenAPI definitions. This study evaluates the OpenAPI completion performance of GitHub Copilot, a prevalent commercial code completion tool, and proposes a set of task-specific optimizations leveraging Meta's open-source model Code Llama. A semantics-aware OpenAPI completion benchmark proposed in this research is used to perform a series of experiments through which the impact of various prompt-engineering and fine-tuning techniques on the Code Llama model's performance is analyzed. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot despite utilizing 25 times fewer parameters than the commercial solution's underlying Codex model. Additionally, this research proposes an enhancement to a widely used code infilling training technique, addressing the issue of underperformance when the model is prompted with context sizes smaller than those used during training. The dataset, the benchmark, and the model fine-tuning code are made publicly available.
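
The study evaluates Code Llama in the infilling (fill-in-the-middle) setting, where the model is given the parts of the OpenAPI document before and after the cursor and generates the missing span. A minimal sketch of requesting such a completion from the base Code Llama 7B checkpoint via Hugging Face transformers is shown below; the OpenAPI fragment and generation settings are illustrative and not taken from the paper.

```python
# Minimal sketch: fill-in-the-middle completion of an OpenAPI fragment with
# Code Llama 7B via Hugging Face transformers. The OpenAPI snippet and the
# generation settings are illustrative assumptions, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "codellama/CodeLlama-7b-hf"  # base model used as the fine-tuning starting point
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

# The <FILL_ME> marker tells the Code Llama tokenizer where the infill goes;
# everything before it becomes the prefix, everything after it the suffix.
prompt = """openapi: 3.0.0
info:
  title: Pet Store API
  version: 1.0.0
paths:
  /pets:
    get:
      summary: List all pets
      responses:
        '200':
          <FILL_ME>
  /pets/{petId}:
    get:
      summary: Get a pet by ID
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated middle section.
completion = tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(completion)
```

The tokenizer splits the prompt at <FILL_ME> into a prefix and a suffix and arranges them in Code Llama's infilling format, so the decoded output contains only the generated middle span.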

Code Repositories

BohdanPetryshyn/code-llama-fim-fine-tuning (official, PyTorch; mentioned in GitHub)
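
The linked repository holds the fine-tuning code. One plausible reading of the "document splitting" variant reported in the benchmarks below is that long OpenAPI documents are split into shorter chunks before fill-in-the-middle training samples are drawn, so that training contexts resemble the shorter prompts seen at completion time. The sketch below illustrates that idea only; it is an assumption, not the repository's actual implementation, and all names in it are hypothetical.

```python
# Rough sketch of constructing fill-in-the-middle (PSM) training samples from
# OpenAPI documents, with documents split into shorter chunks first. This is
# one plausible reading of the "document splitting" variant, NOT the
# repository's exact implementation; all names below are illustrative, and the
# exact control-token strings and spacing should be taken from the tokenizer.
import random

PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"  # Code Llama infilling tokens

def fim_samples(document: str, chunk_chars: int = 4000):
    """Yield prefix-suffix-middle training strings from one OpenAPI document."""
    # Split the document so each training sample fits a smaller context
    # than the full file would require.
    chunks = [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]
    for chunk in chunks:
        if len(chunk) < 3:
            continue
        # Pick a random middle span to mask out and move to the end (PSM format).
        start = random.randint(0, len(chunk) - 2)
        end = random.randint(start + 1, len(chunk) - 1)
        prefix, middle, suffix = chunk[:start], chunk[start:end], chunk[end:]
        yield f"{PRE} {prefix} {SUF}{suffix} {MID}{middle}{EOT}"
```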

Benchmarks

Benchmark: openapi-code-completion-on-openapi-code

| Methodology | Correctness, avg. (%) | Correctness, max. (%) | Validness, avg. (%) | Validness, max. (%) |
| --- | --- | --- | --- | --- |
| Code Llama 7B | 31.1 | 36 | 60.7 | 64 |
| Code Llama 7B, fine-tuned with document splitting | 34 | 42 | 69.1 | 76 |
| GitHub Copilot | 29 | 29 | 68 | 68 |
| Code Llama 7B, fine-tuned at 4096 tokens | 32 | 45 | 63.1 | 84 |
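
The validness metric above measures whether the completed document is still a valid OpenAPI definition, while correctness is the paper's stricter, semantics-aware comparison. A rough sketch of a validness-style check, assuming YAML parsing plus schema validation with the openapi-spec-validator package (the paper's exact procedure may differ), could look like this:

```python
# Sketch of a validness-style metric: the fraction of completed documents that
# still parse as YAML and pass OpenAPI schema validation. Names are
# illustrative; the paper's semantics-aware correctness check is stricter and
# not reproduced here.
# pip install pyyaml openapi-spec-validator
import yaml
from openapi_spec_validator import validate_spec

def validness_rate(completed_documents: list[str]) -> float:
    """Return the percentage of completions that yield a valid OpenAPI spec."""
    if not completed_documents:
        return 0.0
    valid = 0
    for doc in completed_documents:
        try:
            spec = yaml.safe_load(doc)  # must still be well-formed YAML
            validate_spec(spec)         # must satisfy the OpenAPI schema
            valid += 1
        except Exception:
            continue
    return 100.0 * valid / len(completed_documents)
```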
