Abstract
The Department of Defense (DoD) has significantly increased its investment in
the design, evaluation, and deployment of Artificial Intelligence and Machine
Learning (AI/ML) capabilities to address national security needs. While there
are numerous AI/ML successes in the academic and commercial sectors, many of
these systems have also been shown to be brittle and nonrobust. In a complex
and ever-changing national security environment, it is vital that the DoD
establish a sound and methodical process to evaluate the performance and
robustness of AI/ML models before these new capabilities are deployed to the
field. This paper reviews the AI/ML development process, highlights common best
practices for AI/ML model evaluation, and makes recommendations to DoD
evaluators to ensure the deployment of robust AI/ML capabilities for national
security needs.