Abstract
Software supply-chain attacks are an important and ongoing concern in the
open source software ecosystem. These attacks maintain the standard
functionality that a component implements, but additionally hide malicious
functionality activated only when the component reaches its target environment.
Lexo addresses such stealthy attacks by automatically learning and regenerating
vulnerability-free versions of potentially malicious components. Lexo first
generates a set of input-output pairs to model a component's full observable
behavior, which it then uses to synthesize a new version of the original
component. The new component implements the original functionality but avoids
stealthy malicious behavior. Throughout this regeneration process, Lexo
consults several distinct instances of Large Language Models (LLMs), uses
correctness and coverage metrics to shepherd these instances, and guardrails
their results. Our evaluation on 100+ real-world packages, including
high-profile stealthy supply-chain attacks, indicates that Lexo scales across
multiple domains, regenerates code efficiently (<100s on average), maintains
compatibility, and succeeds in eliminating malicious code in several real-world
supply-chain attacks, even in cases where a state-of-the-art LLM fails to
eliminate malicious code when prompted to do so.
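
The two-step idea described above (record input-output pairs, then accept only a
regenerated component that reproduces them) can be illustrated with a minimal
sketch. This is not Lexo's implementation: the function names
(collect_io_pairs, passes, regenerate), the toy left-pad component, and the
fixed list of candidates standing in for LLM-proposed code are all hypothetical
and chosen only for illustration.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

# An input-output pair: the argument tuple and the output observed for it.
IOPair = Tuple[tuple, Any]


def collect_io_pairs(component: Callable[..., Any],
                     inputs: Iterable[tuple]) -> List[IOPair]:
    """Run the original component on each input and record what it returns."""
    return [(args, component(*args)) for args in inputs]


def passes(candidate: Callable[..., Any], pairs: List[IOPair]) -> bool:
    """Return True only if the candidate reproduces every recorded output."""
    for args, expected in pairs:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A candidate that raises on a recorded input is rejected.
            return False
    return True


def regenerate(pairs: List[IOPair],
               candidates: Iterable[Callable[..., Any]]
               ) -> Optional[Callable[..., Any]]:
    """Return the first candidate consistent with all pairs, or None."""
    for candidate in candidates:
        if passes(candidate, pairs):
            return candidate
    return None


# Hypothetical original component (assumed here to be a simple left-pad).
def original_left_pad(s: str, width: int) -> str:
    return s.rjust(width)


# Hypothetical candidates; in a real system these would be synthesized code.
def bad_candidate(s: str, width: int) -> str:
    return s.ljust(width)  # pads on the wrong side


def good_candidate(s: str, width: int) -> str:
    return " " * max(0, width - len(s)) + s


pairs = collect_io_pairs(original_left_pad, [("a", 3), ("abc", 2), ("", 1)])
chosen = regenerate(pairs, [bad_candidate, good_candidate])
```

In this sketch the candidate is checked only against recorded behavior, so the
quality of the result depends on how well the input set covers the component's
observable behavior, which is why the abstract mentions coverage metrics.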