Cyber Defense Advisors

New critical AI vulnerabilities in TorchServe put thousands of AI models at risk

A trio of critical security issues has been identified in TorchServe, an open source package for serving and scaling PyTorch models in production, that could let an attacker execute arbitrary code on affected systems.

Collectively named ShellTorch by the Oligo Security researchers who discovered them, the vulnerabilities can allow an attacker to view, modify, steal, and delete AI models and sensitive data on a TorchServe server.

These vulnerabilities can completely compromise the AI infrastructure of the world’s biggest businesses, Oligo Security said. “These vulnerabilities can lead to a full chain Remote Code Execution (RCE), leaving countless thousands of services and end-users — including some of the world’s largest companies — open to unauthorized access and insertion of malicious AI models, and potentially a full server takeover.”

Two of the discovered vulnerabilities, CVE-2023-43654 and CVE-2022-1471, carry CVSS scores of 9.8 and 9.9, respectively, while the third doesn’t have a CVE entry yet.

Flaws allow remote code execution and server takeover

When serving models in production, TorchServe supports fetching model configuration files from a remote URL through the workflow or model registration API. In one of the vulnerabilities (CVE-2023-43654), the API logic that checks URLs against an allowed list of domains was found to accept all domains as valid, resulting in server-side request forgery (SSRF).

“This allows an attacker to upload a malicious model that will be executed by the server, which results in arbitrary code execution,” Oligo Security said.
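
The allowlist idea described above can be sketched in Python. This is a minimal illustration, not TorchServe's actual validation code: the function name `is_allowed_model_url` and the host `models.example.com` are hypothetical. It compares the parsed hostname exactly against a set of trusted hosts, so that look-alike domains and non-HTTPS schemes are rejected.

```python
from urllib.parse import urlparse

# Hypothetical set of hosts that are trusted to serve model archives.
TRUSTED_HOSTS = {"models.example.com"}


def is_allowed_model_url(url, trusted_hosts=TRUSTED_HOSTS):
    """Return True only for HTTPS URLs whose host is explicitly trusted."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = parsed.hostname  # already lowercased; None if the URL has no host
    # Exact match on the parsed host, not a substring or prefix match.
    return host is not None and host in trusted_hosts


if __name__ == "__main__":
    for candidate in (
        "https://models.example.com/resnet.mar",
        "https://attacker.example.net/evil.mar",
        "https://models.example.com.attacker.example.net/evil.mar",
        "file:///etc/passwd",
    ):
        print(candidate, is_allowed_model_url(candidate))
```

A check that accepts every domain, as in the flaw described above, behaves as if this function always returned True.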

Another issue (CVE-2022-1471) makes TorchServe vulnerable to a critical RCE through unsafe deserialization in SnakeYAML, caused by misuse of the open source Java library.

“AI models can include a YAML file to declare their desired configuration, so by uploading a model with a maliciously crafted YAML file, we were able to trigger an unsafe deserialization attack that resulted in code execution on the machine,” Oligo Security noted.
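
To illustrate why a crafted YAML file is dangerous, the following Python sketch scans a configuration file for YAML type tags that name anything other than plain data types. It is a heuristic pre-check under stated assumptions, not how SnakeYAML or TorchServe work internally: the function name `find_suspicious_tags` and the sample keys are hypothetical, and the scan can flag `!!` sequences that merely appear inside string values. The durable fix is a parser restricted to plain data types.

```python
import re

# Standard YAML type tags that describe plain data only.
SAFE_TAGS = {
    "str", "int", "float", "bool", "null", "map", "seq",
    "binary", "timestamp", "set", "omap", "pairs",
}

# Matches the "!!name" shorthand used for type tags in YAML documents.
TAG_PATTERN = re.compile(r"!!([A-Za-z0-9_.$]+)")


def find_suspicious_tags(yaml_text):
    """Return every !!tag in the text that is not a plain-data YAML tag."""
    return [tag for tag in TAG_PATTERN.findall(yaml_text) if tag not in SAFE_TAGS]


if __name__ == "__main__":
    benign = "minWorkers: 1\nmaxWorkers: !!int 4\n"
    crafted = "handler: !!javax.script.ScriptEngineManager []\n"
    print(find_suspicious_tags(benign))
    print(find_suspicious_tags(crafted))
```

A tag that names a Java class asks the parser to instantiate that class during loading, which is what turns a configuration file into a code execution vector.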

The third flaw, which has not yet been assigned a CVE, is a misconfiguration in TorchServe's management API, which is responsible for managing models at runtime. By default, the API is configured to listen on the address 0.0.0.0, that is, on all network interfaces, making it accessible to external requests, both private and public.

Exploited in combination, the three vulnerabilities can allow remote code execution with high privileges, leading to a complete server takeover.
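
A quick way to spot this kind of exposure is to check whether a configured listen address is the unspecified address, which means "all interfaces". The sketch below is a minimal illustration; the function name `listens_on_all_interfaces` is hypothetical and the URLs, including the port, are example values rather than a statement of any particular deployment's settings.

```python
import ipaddress
from urllib.parse import urlparse


def listens_on_all_interfaces(address):
    """Return True if the address URL binds to 0.0.0.0 or :: (all interfaces)."""
    host = urlparse(address).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal (for example "localhost"); this sketch does not resolve names.
        return False


if __name__ == "__main__":
    for addr in ("http://0.0.0.0:8081", "http://127.0.0.1:8081", "http://[::]:8081"):
        print(addr, listens_on_all_interfaces(addr))
```

Binding a management interface to the loopback address instead keeps it reachable only from the host itself.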

ShellTorch affects leading container environments

Deep learning containers (DLCs) from Amazon and Google have been found to be vulnerable to ShellTorch, according to Oligo. The managed services of Amazon and Google include compensating controls that reduce the exposure.

“AWS is aware of CVE-2023-43654 and CVE-2022-1471 in PyTorch TorchServe versions 0.3.0 to 0.8.1, which use a version of the SnakeYAML v1.31 open-source library,” Amazon said in an October 2 advisory on the vulnerabilities. “TorchServe version 0.8.2 resolves these issues. AWS recommends customers using PyTorch inference Deep Learning Containers (DLC) 1.13.1, 2.0.0, or 2.0.1 in EC2, EKS, or ECS released prior to September 11, 2023, update to TorchServe version 0.8.2.”
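
The version range in the advisory can be turned into a simple check. This is a minimal sketch that assumes plain three-part numeric version strings such as "0.8.1"; the helper names `parse_version` and `is_affected` are hypothetical, and pre-release or suffixed versions are not handled.

```python
# Affected range as stated in the AWS advisory: 0.3.0 through 0.8.1 inclusive.
FIRST_AFFECTED = (0, 3, 0)
LAST_AFFECTED = (0, 8, 1)


def parse_version(version):
    """Convert a dotted numeric version string into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def is_affected(version):
    """Return True if the version falls inside the advisory's affected range."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED


if __name__ == "__main__":
    for v in ("0.2.0", "0.3.0", "0.8.1", "0.8.2"):
        print(v, is_affected(v))
```

Tuples compare element by element, so "0.8.2" is correctly treated as newer than "0.8.1" without any string comparison pitfalls.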

Amazon said that customers using PyTorch inference Deep Learning Containers (DLC) through Amazon SageMaker are not affected. Meta, the co-maintainer of the open source TorchServe library with Amazon, rapidly fixed the default management API to mitigate the third vulnerability. Oligo Security said that it worked with the maintainers of PyTorch for the responsible disclosure of these issues.
