Issue you'd like to raise.
TL;DR: The use of exec() in agents can lead to remote code execution vulnerabilities. Some Hugging Face projects use such agents despite the potential harm of LLM-generated Python code.
#1026 and #814 discuss the security concerns regarding the use of exec() in llm_math chain. The comments in #1026 proposed methods to sandbox the code execution, but due to environmental issues, the code was patched to replace exec() with numexpr.evaluate() (#2943). This restricted the execution capabilities to mathematical functionalities only. This bug was assigned the CVE number CVE-2023-29374.
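The fix that patch took can be illustrated without langchain. Below is a hypothetical stdlib-only sketch (it uses `ast`, not the actual `numexpr.evaluate()` call from #2943, and `safe_eval` is an invented name) of the general idea: evaluate only whitelisted arithmetic AST nodes, so the arbitrary-code surface that exec() exposes disappears.

```python
import ast
import operator

# Hypothetical sketch, NOT the actual langchain patch (which delegates
# to numexpr.evaluate): only arithmetic AST nodes are allowed, so
# imports, calls, and attribute access are rejected outright.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def safe_eval(expr: str) -> float:
    """Evaluate a pure arithmetic expression; raise on anything else."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return _eval(ast.parse(expr, mode="eval"))


print(safe_eval("2 ** 10 + 3 * (4 - 1)"))  # 1033
# safe_eval("__import__('os').system('...')") raises ValueError:
# a Call node is not in the whitelist.
```

The trade-off is the same one the patch accepted: the tool loses general Python execution and keeps only mathematical functionality.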
As shown in the issues above, the usage of exec() in a chain can pose a significant security risk, especially when the chain runs on a remote machine. This appears to be a common scenario for projects hosted on Hugging Face.
However, in the latest langchain, exec() is still used in PythonReplTool and PythonAstReplTool.
https://github.com/hwchase17/langchain/blob/aec642febb3daa7dbb6a19996aac2efa92bbf1bd/langchain/tools/python/tool.py#L55
https://github.com/hwchase17/langchain/blob/aec642febb3daa7dbb6a19996aac2efa92bbf1bd/langchain/tools/python/tool.py#L102
These tools are used by the Pandas DataFrame Agent, Spark DataFrame Agent, and CSV Agent. They appear to be intentionally designed to pass the LLM output to PythonReplTool or PythonAstReplTool so that the LLM-generated code executes on the machine.
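To see why that design is dangerous, no langchain code is needed. The following is a hypothetical minimal reproduction (the variable names are invented; the "model output" string is benign): once a tool exec()s model output, it runs whatever the model emits, with the tool process's full privileges.

```python
# Hypothetical sketch of the core risk: exec() on untrusted model
# output. The payload below merely copies environment variables, but
# nothing constrains it to be benign -- it could equally call
# os.system(), open sockets, or delete files.
untrusted_llm_output = (
    "import os\n"
    "grabbed = dict(os.environ)  # env vars often hold API keys\n"
)

sandbox = {}
exec(untrusted_llm_output, sandbox)  # arbitrary code executes here

print("grabbed" in sandbox)  # True -- the payload ran
```

The `sandbox` dict only isolates names, not capabilities: the executed code still has full access to imports and the OS, which is exactly the remote-code-execution concern raised in #1026 and #814.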
The documentation for these agents explicitly states that they should be used with caution since LLM-generated Python code can be potentially harmful. For instance:
https://github.com/hwchase17/langchain/blob/aec642febb3daa7dbb6a19996aac2efa92bbf1bd/docs/modules/agents/toolkits/examples/pandas.ipynb#L12
Despite this, I have observed several projects on Hugging Face using create_pandas_dataframe_agent and create_csv_agent.
Suggestion:
Fixing this issue the way it was fixed in the llm_math chain seems challenging.
Simply restricting the LLM-generated code to Pandas and Spark execution might not be sufficient because there are still numerous malicious tasks that can be performed using those APIs. For instance, Pandas can read and write files.
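A short sketch of that point, assuming only plain pandas is available (the file name and strings are invented for illustration): even code confined strictly to documented pandas calls can read and write arbitrary files on the host.

```python
# Sketch: a "pandas-only" allowlist does not prevent file I/O, because
# the pandas API itself performs it. Every call below is ordinary,
# documented pandas usage.
import os
import tempfile

import pandas as pd

target = os.path.join(tempfile.mkdtemp(), "exfil.csv")

# Write an attacker-chosen payload to an attacker-chosen path...
pd.DataFrame({"leak": ["any string an attacker wants persisted"]}).to_csv(
    target, index=False
)

# ...and read files back; the path could equally be /etc/passwd or a
# credentials file readable by the agent process.
recovered = pd.read_csv(target)
print(recovered["leak"][0])
```

So an API-level allowlist only narrows the attack surface; it does not remove the underlying file-system and data-exfiltration capabilities.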
Meanwhile, it seems crucial to emphasize the security concerns related to LLM-generated code for the overall security of LLM apps. Merely limiting execution to specific frameworks or APIs may not fully address the underlying security risks.