CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing

Elnaggar Ahmed ; Ding Wei ; Jones Llion ; Gibbs Tom ; Feher Tamas ; Angerer Christoph ; Severini Silvia ; Matthes Florian ; Rost Burkhard

Abstract

Currently, a growing number of mature natural language processing applications make people's lives more convenient. Such applications are built from source code, the language of software engineering. However, applications that understand source code language to ease the software engineering process are under-researched. Simultaneously, the transformer model, especially in combination with transfer learning, has proven to be a powerful technique for natural language processing tasks. These breakthroughs point to a promising direction for processing source code and cracking software engineering tasks. This paper describes CodeTrans, an encoder-decoder transformer model for tasks in the software engineering domain, and explores the effectiveness of encoder-decoder transformer models for six software engineering tasks, including thirteen sub-tasks. Moreover, we have investigated the effect of different training strategies, including single-task learning, transfer learning, multi-task learning, and multi-task learning with fine-tuning. CodeTrans outperforms the state-of-the-art models on all the tasks. To expedite future work in the software engineering domain, we have published our pre-trained models of CodeTrans: https://github.com/agemagician/CodeTrans

Code Repositories

agemagician/CodeTrans

Official

Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
api-sequence-recommendation-on-deepapi	CodeTrans-MT-TF-Large	BLEU-4: 73.39
code-comment-generation-on-deepcom	CodeTrans-TF-Large	Smoothed BLEU-4: 39.50
code-documentation-generation-on	CodeTrans-MT-Base	Smoothed BLEU-4: 20.39
code-documentation-generation-on-1	CodeTrans-MT-Large	Smoothed BLEU-4: 21.87
code-documentation-generation-on-2	CodeTrans-TF-Large	Smoothed BLEU-4: 19.54
code-documentation-generation-on-3	CodeTrans-MT-Base	Smoothed BLEU-4: 26.23
code-documentation-generation-on-4	CodeTrans-MT-Base	Smoothed BLEU-4: 15.26
code-documentation-generation-on-5	CodeTrans-TF-Large	Smoothed BLEU-4: 18.98
git-commit-message-generation-on-commitgen	CodeTrans-TF-Large	BLEU-4: 44.41
program-synthesis-on-algolisp	CodeTrans-MT-TF-Small	Accuracy: 90.31
source-code-summarization-on-summarizing-1	CodeTrans-MT-Large	Smoothed BLEU-4: 23.57
source-code-summarization-on-summarizing-2	CodeTrans-MT-Base	Smoothed BLEU-4: 13.37
source-code-summarization-on-summarizing-3	CodeTrans-MT-TF-Large	Smoothed BLEU-4: 19.98
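
The Metrics column reports BLEU-4 and Smoothed BLEU-4 scores. As a rough illustration of what a smoothed, sentence-level BLEU-4 computes, here is a minimal Python sketch. It assumes whitespace tokenization, a single reference, and add-one smoothing on the 2-gram to 4-gram precisions. The function name `smoothed_bleu4` and these choices are illustrative assumptions; the exact tokenization and smoothing used by each benchmark are not stated on this page, so this sketch should not be expected to reproduce the numbers in the table.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu4(reference, hypothesis, max_n=4):
    """Sentence-level BLEU-4 on a 0-100 scale (hypothetical helper).

    Assumption: add-one smoothing is applied to the n-gram precisions
    for n >= 2, and the unigram precision is left unsmoothed.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as
        # often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no shared words at all
            precision = matches / total
        else:
            # Add-one smoothing keeps the geometric mean non-zero
            # when a higher-order n-gram has no match.
            precision = (matches + 1) / (total + 1)
        log_precision_sum += math.log(precision)

    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    ref = "returns the sum of two numbers"
    hyp = "returns the sum of numbers"
    print(round(smoothed_bleu4(ref, hyp), 2))
```

Without smoothing, a single missing 4-gram match would drive the whole sentence score to zero, which is why smoothed variants are often preferred for short outputs such as code comments and commit messages.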
