Contrastive Code Representation Learning

Paras Jain; Ajay Jain; Tianjun Zhang; Pieter Abbeel; Joseph E. Gonzalez; Ion Stoica

Abstract

Recent work learns contextual representations of source code by reconstructing tokens from their context. For downstream semantic understanding tasks like summarizing code in English, these representations should ideally capture program functionality. However, we show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics. We propose ContraCode: a contrastive pre-training task that learns code functionality, not form. ContraCode pre-trains a neural network to identify functionally similar variants of a program among many non-equivalent distractors. We scalably generate these variants using an automated source-to-source compiler as a form of data augmentation. Contrastive pre-training improves JavaScript summarization and TypeScript type inference accuracy by 2% to 13%. We also propose a new zero-shot JavaScript code clone detection dataset, showing that ContraCode is both more robust and semantically meaningful. On it, we outperform RoBERTa by 39% AUROC in an adversarial setting and up to 5% on natural code.
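
The pre-training task described above, picking the functionally equivalent variant of a program out of a set of distractors, can be sketched as an InfoNCE-style loss. This is a minimal illustration and not the authors' implementation: the abstract does not state the exact loss, so the InfoNCE form, the cosine similarity, the temperature of 0.07, and the hand-written two-dimensional "embeddings" are all assumptions made for the example. In practice the embeddings would come from an encoder applied to a program and to its compiler-generated variants.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, distractors, temperature=0.07):
    """Negative log-probability of picking `positive` among all candidates.

    query:       embedding of a program (assumed to come from some encoder)
    positive:    embedding of a functionally equivalent variant
    distractors: embeddings of non-equivalent programs
    temperature: assumed hyperparameter that sharpens the softmax
    """
    # Index 0 holds the positive; the rest are distractors.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, d) / temperature for d in distractors]

    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))

    # Cross-entropy with the positive as the target class.
    return log_sum_exp - logits[0]


# Toy embeddings (hypothetical values, chosen by hand for illustration).
query = [1.0, 0.0]
positive = [0.9, 0.1]                     # close to the query
distractors = [[0.0, 1.0], [-1.0, 0.2]]   # far from the query

# Low loss: the true variant is the most similar candidate.
loss_good = info_nce_loss(query, positive, distractors)

# High loss: a distractor is treated as the "positive" instead.
loss_bad = info_nce_loss(query, distractors[0], [positive, distractors[1]])
```

Minimizing this loss pulls the embedding of a program toward the embeddings of its equivalent variants and pushes it away from the distractors, which is one way to encourage representations that track functionality and not surface form.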

Code Repositories

parasj/contracode
Official
pytorch

Benchmarks

Benchmark                                  Methodology   Metrics
code-summarization-on-codesearchnet        ContraCode    F1: 17.24
method-name-prediction-on-codesearchnet    ContraCode    F1: 17.24
type-prediction-on-deeptyper               ContraCode    Accuracy@5: 84.60
