MINT use different class to abstract out the API of different LLMs. You can find the list of implemented LLMs in mint/agents/__init__.py.
For closed-source models:
OpenAILMAgentfor OpenAI API.BardLMAgentfor Bard.ClaudeLMAgentfor Anthropic Claude.
For open-source models, we have VLLMAgent that can be used to evaluate any LLMs that can be served with VLLM or FastChat into an OpenAI-compatible API.
If you want to evaluate an open-source LLM that can be served with VLLM or FastChat: First, refer to docs/SERVING.md to learn about how to serve your model. Then, modify mint/configs/config_variables.py by adding a dictionary describing the model to be evaluated into EVALUATED_MODEL_LIST.
# For Chat Model
{
"agent_class": "VLLMAgent",
"config": {
"model_name": "<YOUR_MODEL_NAME>",
"chat_mode": True,
"max_tokens": 512,
"temperature": 0.0,
"openai.api_base": "<YOUR_API_BASE>",
"add_system_message": False,
},
}
# For Completion-only Model
{
"agent_class": "VLLMAgent",
"config": {
"model_name": "Llama-2-70b-hf",
"chat_mode": False,
"max_tokens": 512,
"temperature": 0.0,
"openai.api_base": "<YOUR_API_BASE>",
"add_system_message": False,
},
},If you want to evaluate another closed-source LLM with a different API schema than the existing implementation: You need to implement a new agent class that inherits from LMAgent (PR welcomed!).
You can use mint/agents/openai_lm_agent.py as an example, then add this model configuration to mint/configs/config_variables.py similar to the above.
We implemented three different feedback agent classes:
If you want to use an existing open-source model compatible with VLLM or FastChat, you can add a configuration similar to the above to FEEDBACK_PROVIDER_LIST in mint/configs/config_variables.py.
FEEDBACK_PROVIDER_LIST = [
...
{
"agent_class": "VLLMFeedbackAgent",
"model_name": "<YOUR_MODEL_NAME>",
"openai.api_base": "<YOUR_API_BASE>",
"chat_mode": True, # Set to False if your model is completion-only
},
...
]If needed, you can use these classes as an example to implement your own feedback agent class (PR welcomed!). Then, add this model configuration to FEEDBACK_PROVIDER_LIST in mint/configs/config_variables.py. For example:
FEEDBACK_PROVIDER_LIST = [
...
{
# Your custom feedback provider
"agent_class": "<YOUR_FEEDBACK_AGENT_CLASS>",
"model_name": "<YOUR_FEEDBACK_MODEL_NAME>",
},
...
]Optionally, you can change different experiment settings in mint/configs/config_variables.py.
This specifies the settings of the environment. Here is an example:
ENV_CONFIGS = [
...,
{
"max_steps": 5,
"use_tools": True,
"max_propose_solution": 2,
"count_down": True,
},
...
]where max_steps corresponds to the budget of interaction (k) in the paper, use_tools should always be True (no tool setting is not implemented yet), max_propose_solution is the maximum number of solutions that the evaluated LLM can propose, and count_down is whether to count down the remaining steps in the environment (read Section 2 in the paper for more detail).
This specifies the types of feedback we instruct the feedback-providing LLM to provide. Here are all the settings we currently support:
FEEDBACK_TYPES = [
{"pseudo_human_feedback": "no_GT", "feedback_form": "textual"}, # default setting
{"pseudo_human_feedback": "no_GT", "feedback_form": "binary"},
{"pseudo_human_feedback": "GT", "feedback_form": "binary"},
{"pseudo_human_feedback": "GT", "feedback_form": "textual"},
]pseudo_human_feedbackspecifies whether we provide a ground-truth solution of the problem to the feedback-providing LLM.no_GTmeans we do not provide a ground-truth solution (default setting), andGTmeans we provide ground-truth feedback.feedback_formspecifies the form of feedback we provide.textualmeans we provide textual feedback (default setting), andbinarymeans we instruct the feedback-provider to provide binary feedback.