Free resource analyzes the performance of ChatGPT, Google Bard, Claude and LLAMA2-based open LLMs
Skyhawk Security, the originator of cloud threat detection and response, has launched the industry’s first benchmark for evaluating the ability of large language models (LLMs) to identify and score cybersecurity threats within various cloud logs and telemetry. The resource also provides a ranking of these LLMs based on their performance. As part of efforts to strengthen the broader cloud security industry, the data will be regularly updated and will be available to view free of charge on Skyhawk’s website.
The benchmark and LLM leaderboard will be formally presented today during a session led by Skyhawk’s Director of AI and Research, Amir Shachar, at the Cloud Security Alliance’s SECtember conference. The session takes place at 1:30 p.m. Pacific in room 405.
“The importance of swiftly and effectively detecting cloud security threats cannot be overstated. We firmly believe that harnessing generative AI can greatly benefit security teams in that regard. However, not all large language models are created equal,” said Amir Shachar. “In creating this benchmark, we hope to increase confidence in the power of LLMs for cloud security by providing a clear view of how well these tools can classify malicious activities. We’re testing them for you on human-labeled attack flow sequences based on business-driven evaluation metrics. We also integrate human security researchers’ insights with self-improving LLM-based AI agents to enhance the classification process.”
In this benchmark, Skyhawk looks at ChatGPT, Google Bard, Falcon and other LLAMA2-based open LLMs. The goal was to see how accurately each of these LLMs predicted the maliciousness of an attack sequence that was extracted and created by Skyhawk Security’s machine learning models. The output from the models was compared to a sample of hundreds of human-labeled sequences and scored in three ways: Precision, Recall and F1 Score. Each score ranges from zero to one, and the closer a score is to one, the more accurate the LLM’s predictions.
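As a rough illustration of how the three scores relate, the sketch below computes Precision, Recall and F1 Score for a binary "malicious or not" classification. It is not Skyhawk's evaluation code: the function name and the two small label lists are hypothetical, made up only to show the arithmetic.

```python
def precision_recall_f1(predicted, actual):
    """Score binary predictions (True = malicious) against reference labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)        # flagged and truly malicious
    false_pos = sum(1 for p, a in pairs if p and not a)   # flagged but benign
    false_neg = sum(1 for p, a in pairs if not p and a)   # missed malicious sequence

    # Precision: share of flagged sequences that were really malicious.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: share of malicious sequences that were flagged.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical data: eight attack sequences labeled by humans and by an LLM.
human_labels = [True, True, True, False, False, True, False, False]
llm_labels = [True, True, False, True, False, True, False, False]

precision, recall, f1 = precision_recall_f1(llm_labels, human_labels)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this made-up sample the LLM raises one false alarm and misses one malicious sequence, so all three scores come out at 0.75. A model that flagged everything as malicious would reach a Recall of one but a much lower Precision, which is why the F1 Score is reported alongside the other two.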
The release of Skyhawk’s LLM benchmark reinforces the company’s dedication to innovating with generative AI in the cloud security space. The news comes on the heels of the launch of Skyhawk’s Shift Left CDR solution within its existing Skyhawk Synthesis Security Platform. The novel approach shifts the threat detection process to the “left,” or the perimeter, of the cloud network as well as IAM. Skyhawk’s cloud threat detection and response uses contextual analysis of the cloud infrastructure and determines potential paths hackers could take to a company’s “crown jewels.” This information enables security teams to identify serious threats much earlier in the incident and prioritize those that pose the highest risk to crown jewels to prevent them from becoming a breach.