AI Security Portal K Program
Towards Code Watermarking with Dual-Channel Transformations
Abstract
The expansion of the open source community and the rise of large language models have raised ethical and security concerns about the distribution of source code, such as misuse of copyrighted code, distribution without proper licenses, and use of the code for malicious purposes. It is therefore important to track the ownership of source code, and watermarking is a major technique for doing so. Yet, unlike natural language watermarking, source code watermarking must follow far stricter and more complicated rules to preserve both the readability and the functionality of the code. We therefore introduce SrcMarker, a watermarking system that unobtrusively encodes ID bitstrings into source code without affecting its usage or semantics. To this end, SrcMarker performs transformations on an AST-based intermediate representation, which enables unified transformations across different programming languages. The core of the system uses learning-based embedding and extraction modules to select rule-based transformations for watermarking. In addition, a novel feature-approximation technique is designed to tackle the inherent non-differentiability of rule selection, integrating the rule-based transformations and the learning-based networks into a single interconnected system that can be trained end to end. Extensive experiments demonstrate the superiority of SrcMarker over existing methods across various watermarking requirements.
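To make the idea of encoding bits through semantics-preserving, rule-based AST transformations concrete, here is a minimal toy sketch in Python. It is not SrcMarker's method: there are no learned embedding or extraction modules, and the single rule used here (choosing between `x += e` for bit 0 and `x = x + e` for bit 1) is an assumption made purely for illustration. The names `embed` and `extract` are hypothetical, and the rewrite is only guaranteed to preserve behavior when the target variable holds an immutable value such as a number.

```python
import ast

def _compact(node):
    """`x += e` with a plain-name target: read as bit 0."""
    return (isinstance(node, ast.AugAssign) and isinstance(node.op, ast.Add)
            and isinstance(node.target, ast.Name))

def _expanded(node):
    """`x = x + e` with a plain-name target: read as bit 1."""
    if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
        return False
    tgt, val = node.targets[0], node.value
    return (isinstance(tgt, ast.Name) and isinstance(val, ast.BinOp)
            and isinstance(val.op, ast.Add) and isinstance(val.left, ast.Name)
            and val.left.id == tgt.id)

class _Embedder(ast.NodeTransformer):
    """Rewrites each carrier statement into the form selected by the next bit."""
    def __init__(self, bits):
        self.bits = list(bits)

    def _rewrite(self, node, name, expr):
        if not self.bits:              # bitstring exhausted: leave code as is
            return node
        bit = self.bits.pop(0)
        text = ast.unparse(expr)
        stmt = f"{name} = {name} + ({text})" if bit else f"{name} += {text}"
        return ast.parse(stmt).body[0]

    def visit_AugAssign(self, node):
        return self._rewrite(node, node.target.id, node.value) if _compact(node) else node

    def visit_Assign(self, node):
        if _expanded(node):
            return self._rewrite(node, node.targets[0].id, node.value.right)
        return node

class _Extractor(ast.NodeVisitor):
    """Reads one bit per carrier statement, in the same traversal order."""
    def __init__(self):
        self.bits = []

    def visit_AugAssign(self, node):
        if _compact(node):
            self.bits.append(0)

    def visit_Assign(self, node):
        if _expanded(node):
            self.bits.append(1)

def embed(source, bits):
    """Return `source` rewritten so its carrier statements spell out `bits`."""
    embedder = _Embedder(bits)
    tree = embedder.visit(ast.parse(source))
    if embedder.bits:
        raise ValueError("not enough carrier statements for the bitstring")
    return ast.unparse(tree)

def extract(source, n_bits):
    """Recover the first `n_bits` bits from watermarked source."""
    extractor = _Extractor()
    extractor.visit(ast.parse(source))
    return extractor.bits[:n_bits]

SAMPLE = "total = 0\nfor i in range(5):\n    total += i\n    total = total + 1\n"
marked = embed(SAMPLE, [1, 0])
recovered = extract(marked, 2)
```

In this sketch, `recovered` matches the embedded bits and running `marked` computes the same `total` as running `SAMPLE`. A fixed rule like this is easy to detect and to strip, which is part of why the system described above instead learns which transformations to select. The sketch requires Python 3.9 or later for `ast.unparse`.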