Large Language Models (LLMs) face significant challenges in detecting
and repairing vulnerable code, particularly vulnerabilities that involve
multiple aspects, such as variables, code flows, and code structures.
In this study, we use GitHub Copilot as the LLM and focus on buffer
overflow vulnerabilities. Our experiments reveal a notable gap between
Copilot's detection and repair abilities: a 76% vulnerability detection
rate but only a 15% vulnerability repair rate.
To address this gap, we propose context-aware prompt tuning techniques designed
to enhance LLM performance in repairing buffer overflows. By injecting a
sequence of domain knowledge about the vulnerability, including various
security and code contexts, we demonstrate that Copilot's successful repair
rate increases to 63%, more than a fourfold improvement over repairs
attempted without domain knowledge.
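
As a rough illustration of the idea (a minimal sketch, not the paper's actual pipeline or prompts), the snippet below assembles a repair prompt by prepending layered domain knowledge to the vulnerable code before querying the LLM; all names (`SECURITY_CONTEXT`, `build_repair_prompt`) and the example C snippet are hypothetical.

```python
# Minimal sketch of context-aware prompt construction for buffer overflow
# repair. The context strings and code sample are illustrative assumptions,
# not the exact prompts evaluated in the study.

# Layered domain knowledge injected ahead of the repair request.
SECURITY_CONTEXT = (
    "The following C function contains a buffer overflow: a write can "
    "exceed the bounds of a fixed-size stack buffer."
)
CODE_CONTEXT = (
    "Relevant facts: buf holds 16 bytes; src is attacker-controlled and "
    "may be longer than 15 characters plus the null terminator."
)

VULNERABLE_CODE = """\
void copy_name(const char *src) {
    char buf[16];
    strcpy(buf, src);  /* no bounds check */
}
"""

def build_repair_prompt(security_ctx: str, code_ctx: str, code: str) -> str:
    """Concatenate domain knowledge and the vulnerable code into one prompt."""
    return (
        f"{security_ctx}\n"
        f"{code_ctx}\n\n"
        "Rewrite the function so the overflow is impossible, preserving "
        "its behavior for valid inputs:\n\n"
        f"{code}"
    )

if __name__ == "__main__":
    # The resulting string would be sent to the LLM, e.g., through
    # Copilot's editor integration or any completion API.
    print(build_repair_prompt(SECURITY_CONTEXT, CODE_CONTEXT, VULNERABLE_CODE))
```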