Technological advancements have revolutionized numerous industries, including
transportation. While digitalization, automation, and connectivity have
enhanced safety and efficiency, they have also introduced new vulnerabilities.
With 95% of data breaches attributed to human error, promoting cybersecurity
awareness in transportation is increasingly critical. Despite numerous
cyberattacks on transportation systems worldwide, comprehensive and centralized
records of these incidents remain scarce. To address this gap and enhance cyber
awareness, this paper presents a large language model (LLM)-based approach to
extract and organize transportation-related cyber incidents from publicly
available datasets. A key contribution of this work is the use of generative AI
to transform unstructured, heterogeneous cyber incident data into structured
formats. Incidents were sourced from the Center for Strategic & International
Studies (CSIS) List of Significant Cyber Incidents, the University of Maryland
Cyber Events Database (UMCED), the European Repository of Cyber Incidents
(EuRepoC), the Maritime Cyber Attack Database (MCAD), and the U.S. DOT
Transportation Cybersecurity and Resiliency (TraCR) Examples of Cyber Attacks
in Transportation (2018 to 2022). These incidents were classified by a
fine-tuned LLM into five transportation modes (aviation, maritime, rail, road,
and multimodal), forming a transportation-specific cyber incident database. Another key
contribution of this work is the development of a Retrieval-Augmented
Generation (RAG) question-answering system, designed to enhance accessibility and
practical use by enabling users to query the curated database for specific
details on transportation-related cyber incidents. By leveraging LLMs for both
data extraction and user interaction, this study contributes a novel,
accessible tool for improving cybersecurity awareness in the transportation
sector.
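
To make the retrieval step of such a question-answering system concrete, the following is a minimal sketch, not the system described in this paper. It assumes a hypothetical `Incident` record restricted to the five transportation modes and substitutes simple keyword overlap for the learned embeddings a real RAG pipeline would typically use. The names `Incident`, `retrieve`, and `build_prompt`, the prompt wording, and the sample records are all illustrative placeholders rather than entries from any of the source datasets.

```python
from dataclasses import dataclass

# The five transportation modes named in the abstract.
MODES = ("aviation", "maritime", "rail", "road", "multimodal")


@dataclass(frozen=True)
class Incident:
    """Hypothetical structured record; the real schema is not given here."""
    source: str
    mode: str
    description: str

    def __post_init__(self):
        if self.mode not in MODES:
            raise ValueError(f"unknown transportation mode: {self.mode}")


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,;:!?()").lower() for w in text.split()} - {""}


def retrieve(query, incidents, k=2):
    """Return up to k incidents sharing the most words with the query.

    Keyword overlap is an assumption standing in for embedding similarity.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(i.description + " " + i.mode)), i) for i in incidents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [incident for _, incident in scored[:k]]


def build_prompt(query, incidents, k=2):
    """Assemble retrieved records and the question into one LLM prompt."""
    context = "\n".join(
        f"- [{i.source}, {i.mode}] {i.description}" for i in retrieve(query, incidents, k)
    )
    return f"Answer using only the incidents below.\n{context}\nQuestion: {query}"


# Invented placeholder records, not real incidents.
EXAMPLES = [
    Incident("EXAMPLE", "maritime", "Placeholder: ransomware disrupts port terminal operations."),
    Incident("EXAMPLE", "rail", "Placeholder: signalling network outage after malware infection."),
    Incident("EXAMPLE", "aviation", "Placeholder: airport website hit by denial-of-service attack."),
]

if __name__ == "__main__":
    print(build_prompt("ransomware at a port", EXAMPLES))
```

In this sketch the prompt returned by `build_prompt` would be passed to an LLM, which answers from the retrieved records alone; grounding answers in the curated database is what distinguishes retrieval-augmented generation from unconstrained generation.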