永州网站建设哪里有,wordpress主题 红木,新站seo外包,做外贸做的很好的网站Executable Code Actions Elicit Better LLM Agents Github: https://github.com/xingyaoww/code-act
一、动机
大语言模型展现出很强的推理能力。但是现如今大模型作为Agent的时候#xff0c;在执行Action时依然还是通过text-based#xff08;文本模态#xff09;后者JSO…Executable Code Actions Elicit Better LLM Agents Github: https://github.com/xingyaoww/code-act
一、动机
大语言模型展现出很强的推理能力。但是现如今大模型作为Agent的时候在执行Action时依然还是通过text-based文本模态后者JSON的形式呈现。通过text-based或JSON来实现工具的理解调用、memory的管理等。
然而基于文本或JSON的动作空间通常比较局限且灵活性较差。例如某些动作可能需要借助变量暂存或者是一些较为复杂的动作取均值、排序等。 最近大模型也被发现能够在代码理解和生成任务上展现很强的能力。那么是否可以将代码作为Agent执行Action的基础呢
二、方法
本文提出CodeActCode as actions试图全部使用Python代码Jupyter Notebook内核环境来代替原始的基于文本或JSON的Action。 python实现Jupyter Notebook内核的调用示例https://github.com/xingyaoww/code-act/blob/main/mint/tools/python_tool.py CodeAct与Text/JSON的对比如下图所示
当给定一个指令、候选API工具时基于Text/JSON的Action推理过程则需要给出文本模态的推理解释涉及到的工具则转换为JSON格式并根据预先设置好的API函数执行调用从而获得环境的反馈基于CodeAct的Action此时直接生成相应的Python代码并且通过Python环境来执行代码获得结果可发现CodeAct可以降低多轮对话的轮次提高效率且准确性也高于Text/JSON。
前提假设code data广泛应用于当今大模型的预训练中。这些模型已经熟悉结构化编程语言允许LLM生成可执行的Python因为Python代码数量最多且软件包也很丰富代码作为Action
CodeAct与Python Interpreter集成可以执行代码行动并动态调整先前的行动或根据通过多轮交互代码执行收到的观察结果发出新行动。代码行动允许LLM利用现有软件包。CodeAct可以使用现成的Python包来扩展行动空间而不是手工制作的特定于任务的工具 Self-Refine/Debug根据执行反馈调整动作与JSON和预定义格式的文本相比代码本质上支持控制和数据流允许将中间结果存储为变量以供重用
CodeAct框架如下图所示 首先定义三个元素分别是
Agent大模型User提供Query的用户Environment这里是Python执行环境可以提供执行结果
CodeAct以多轮交互的形式实现这三个元素之间的交流。 For each turn of interaction, the agent receives an observation (input) either from the user (e.g., natural language instruction) or the environment (e.g., code execution result), optionally planning for its action through chain-of-thought (Wei et al., 2022), and emits an action (output) to either user in natural language or the environment. CodeAct employs Python code to consolidate all actions for agent-environment interaction. In CodeAct, each emitted action to the environment is a piece of Python code, and the agent will receive outputs of code execution (e.g., results, errors) as observation. 预实验一作为工具使用框架的前景
对照实验以了解哪种格式文本、JSON、CodeAct更有可能引导LLM生成正确的原子工具调用。
开源模型效果好闭源模型部分不如JSON作者认为这是因为openai的模型针对JSON格式数据优化过 预实验二CodeAct以更少的交互完成更多工作
实验部分主要基于作者自己提出的M3ToolEval benchmark82个样本
CodeAct通常具有更高的任务成功率17个已评估LLM中有12个。使用CodeAct执行任务所需的平均交互轮数也较低。与操作格式文本相比最佳模型gpt-4-1106可以减少1/5的交互 应用多轮交互和使用Libraries
作者展示了LLM agents如何与Python tools
使用现有软件在多轮交互中执行复杂的任务。得益于在预训练期间学到的Python codeLLM Agents可以自动导入正确的Python库来解决任务不需要用户提供的tools或demonstrations因为python代码作为action可以自行实现软件包调用并通过代码实现 完整的推理过程https://chat.xwang.dev/conversation/Vqn108G
增强开源LLM Agents
和Tool LLM一样的套路再整一个包含智能体与环境交互轨迹的指令微调数据集CodeActInstruct 主要来自如下四种数据源对原始训练数据做了一些调整后获得 实验结果 CodeActAgentMistral的性能优于类似规模7B/13B的开源模型有70B模型相似的性能。 总结这个文章主要是对existing codellms的技术包括代码执行反馈等 agent actions / planning tune轨迹的整合technical novelty适中但是整个工作覆盖的面很广
实现
CodeAct整体上作出的优化点应该是纯Prompt即设计Prompt来让大模型通过Python代码来执行Action。 CodeAct的指令提示 prompt母版
You are a helpful assistant assigned with the task of problem-solving. To achieve this, you will be using an interactive coding environment equipped with a variety of tool functions to assist you throughout the process.At each turn, you should first provide your step-by-step thinking for solving the task. Your thought process should be enclosed using thought tag, for example: thought I need to print Hello World! /thought.After that, you have two options:1) Interact with a Python programming environment and receive the corresponding output. Your code should be enclosed using execute tag, for example: execute print(Hello World!) /execute.
2) Directly provide a solution that adheres to the required format for the given task. Your solution should be enclosed using solution tag, for example: The answer is solution A /solution.You have {max_total_steps} chances to interact with the environment or propose a solution. You can only propose a solution {max_propose_solution} times.{tool_desc}---{in_context_example}---{task_prompt}以WikiTableQuestion任务为例1-shot exemplar可以定义如下 system prompt exemplar-input exemplar-output 核心代码如下
LLM Agent类
from .base import LMAgent
import openai
import logging
import traceback
from mint.datatypes import Action
import backoffLOGGER logging.getLogger(MINT)class OpenAILMAgent(LMAgent):def __init__(self, config):super().__init__(config)assert model_name in config.keys()backoff.on_exception(backoff.fibo,# https://platform.openai.com/docs/guides/error-codes/python-library-error-types(openai.error.APIError,openai.error.Timeout,openai.error.RateLimitError,openai.error.ServiceUnavailableError,openai.error.APIConnectionError,),)def call_lm(self, messages):# Prepend the prompt with the system messageresponse openai.ChatCompletion.create(modelself.config[model_name],messagesmessages,max_tokensself.config.get(max_tokens, 512),temperatureself.config.get(temperature, 0),stopself.stop_words,)return response.choices[0].message[content], response[usage]def act(self, state):messages state.historytry:lm_output, token_usage self.call_lm(messages)for usage_type, count in token_usage.items():state.token_counter[usage_type] countaction self.lm_output_to_action(lm_output)return actionexcept openai.error.InvalidRequestError: # mostly due to model context window limittb traceback.format_exc()return Action(f, False, errorfInvalidRequestError\n{tb})# except Exception as e:# tb traceback.format_exc()# return Action(f, False, errorfUnknown error\n{tb})环境类负责追溯当前的Agent的状态执行Python代码并获得结果等
import re
import logging
import traceback
from typing import Any, Dict, List, Mapping, Tuple, OptionalLOGGER logging.getLogger(MINT)from mint import agents
from mint.envs.base import BaseEnv
from mint.prompt import ToolPromptTemplate
from mint.datatypes import State, Action, StepOutput, FeedbackType
from mint.tools import Tool, get_toolset_description
from mint.tasks import Task
from mint.tools.python_tool import PythonREPL
from mint.utils.exception import ParseErrorINVALID_INPUT_MESSAGE (I dont understand your input. \nIf you want to execute code, please use execute YOUR_CODE_HERE /execute.\nIf you want to give me an answer, please use solution YOUR_SOLUTION_HERE /solution.\nFor example: The answer to the question is solution 42 /solution. \n
)class GeneralEnv(BaseEnv):def __init__(self,task: Task,tool_set: List[Tool],feedback_config: Dict[str, Any],environment_config: Dict[str, Any],):self.task: Task taskself.tool_set: List[Tool] tool_set getattr(self, tool_set, [])self.state State()self.config environment_config# Feedbackself.feedback_config feedback_configfeedback_agent_config feedback_config[feedback_agent_config]if feedback_config[pseudo_human_feedback] in [GT, no_GT]:self.feedback_agent: agents getattr(agents, feedback_agent_config[agent_class])(feedback_agent_config)else:self.feedback_agent Noneif self.feedback_config[pseudo_human_feedback] None:self.feedback_type FeedbackType.NO_FEEDBACKelif self.feedback_config[pseudo_human_feedback] no_GT:self.feedback_type FeedbackType.FEEDBACK_WO_GTelif self.feedback_config[pseudo_human_feedback] GT:self.feedback_type FeedbackType.FEEDBACK_WITH_GTelse:raise ValueError(fInvalid feedback type {self.feedback_config[pseudo_human_feedback]})self.env_outputs: List[StepOutput] []LOGGER.info(f{len(self.tool_set)} tools loaded: {, .join([t.name for t in self.tool_set])})# Initialize the Python REPLuser_ns {tool.name: tool.__call__ for tool in self.tool_set}user_ns.update(task.user_ns)self.python_repl PythonREPL(user_nsuser_ns,)def parse_action(self, action: Action) - Tuple[str, Dict[str, Any]]:Define the parsing logic.lm_output \n action.value \noutput {}try:if not action.use_tool:answer \n.join([i.strip()for i in re.findall(rsolution(.*?)/solution, lm_output, re.DOTALL)])if answer :raise ParseError(No answer found.)output[answer] answerelse:env_input \n.join([i.strip()for i in re.findall(rexecute(.*?)/execute, lm_output, re.DOTALL)])if env_input :raise ParseError(No code found.)output[env_input] env_inputexcept Exception as e:raise ParseError(e)return outputdef get_feedback(self, observation: str) - Tuple[str, FeedbackType]:if self.feedback_type FeedbackType.NO_FEEDBACK:return elif self.feedback_type FeedbackType.FEEDBACK_WO_GT:gt Noneelse:gt self.task.referencefeedback self.feedback_agent.act(self.state,observationobservation,formself.feedback_config[feedback_form],gtgt,task_in_context_exampleself.task.in_context_example(use_toolself.config[use_tools],with_feedbackTrue,),tool_descget_toolset_description(self.tool_set),)return feedback.valuedef check_task_success(self, answer: str) - bool:LOGGER.info(fREFERENCE ANSWER: {self.task.reference})return self.task.success(answer)def log_output(self, output: StepOutput) - None:if self.state.finished:returncontent output.to_str()self.state.history.append({role: user, content: content})self.state.latest_output output.to_dict()self.state.latest_output[content] contentdef handle_tool_call(self, action: Action) - str:Use tool to obtain observation.try:parsed self.parse_action(action)env_input parsed[env_input]obs self.python_repl(env_input).strip()self.env_outputs.append(StepOutput(observationobs))self.state.agent_action_count[use_tool] 1return obsexcept ParseError:self.state.agent_action_count[invalid_action] 1return INVALID_INPUT_MESSAGEexcept Exception as e:error_traceback traceback.format_exc()return f{error_traceback}def handle_propose_solution(self, action: Action) - Optional[str]:Propose answer to check the task success.It might set self.state.finished True if the task is successful.self.state.agent_action_count[propose_solution] 1try:parsed self.parse_action(action)task_success self.check_task_success(parsed[answer])if task_success:self.state.finished Trueself.state.success Trueself.state.terminate_reason task_success# NOTE: should not return the function now, because we need to log the output# Set state.finished True will terminate the episodeexcept ParseError:return INVALID_INPUT_MESSAGEexcept Exception as e:error_traceback traceback.format_exc()return f{error_traceback}def check_max_iteration(self):Check if the agent has reached the max iteration limit.It might set self.state.finished True if the agent has reached the max iteration limit.if self.state.finished:# ignore if the episode is already finished (e.g., task success)returnif (# propose solution max output solutionself.state.agent_action_count[propose_solution] self.config[max_propose_solution]):self.state.finished Trueself.state.success Falseself.state.terminate_reason max_propose_stepselif (# (propose_solution use_tool) max iteration limitsum(self.state.agent_action_count.values()) self.config[max_steps]):self.state.finished Trueself.state.success Falseself.state.terminate_reason max_stepsdef step(self, action: Action, loadedNone) - State:assert (not self.state.finished), Expecting state.finished False for env.step().# Update state by logging the actionif action.value:assistant_action (Assistant:\n action.valueif not action.value.lstrip().startswith(Assistant:)else action.value)self.state.history.append({role: assistant, content: assistant_action \n})if action.error:# Check if error (usually hit the max length)observation fAn error occurred. {action.error}self.state.finished Trueself.state.success Falseself.state.error action.errorself.state.terminate_reason errorLOGGER.error(fError:\n{action.error})elif action.use_tool:observation self.handle_tool_call(action)else:# It might set self.state.finished True if the task is successful.observation self.handle_propose_solution(action)# Check if the agent has reached the max iteration limit.# If so, set self.state.finished True# This corresponds to a no-op if the episode is already finishedself.check_max_iteration()# record the turn infoif self.config[count_down]:turn_info (self.config[max_steps] - sum(self.state.agent_action_count.values()),self.config[max_propose_solution]- self.state.agent_action_count[propose_solution],)else:turn_info None# Get feedback if the episode is not finishedif loaded ! None:feedback loaded[feedback]LOGGER.info(fLoaded feedback: {feedback})elif not self.state.finished:# This is the output without feedback# use to generate an observation for feedback agenttmp_output StepOutput(observationobservation,successself.state.success,turn_infoturn_info,)feedback self.get_feedback(observationtmp_output.to_str())else:feedback # Log the output to state regardless of whether the episode is finishedoutput StepOutput(observationobservation,feedbackfeedback,feedback_typeself.feedback_type,successself.state.success,turn_infoturn_info,)self.log_output(output)return self.statedef reset(self) - State:use_tool: bool self.config[use_tools]if use_tool and len(self.tool_set) 0:LOGGER.warning((No tool is provided when use_tools is True.\nIgnore this if you are running code generation.))user_prompt ToolPromptTemplate(use_tooluse_tool)(max_total_stepsself.config[max_steps],max_propose_solutionself.config[max_propose_solution],tool_descget_toolset_description(self.tool_set),in_context_exampleself.task.in_context_example(use_tooluse_tool,with_feedbackself.feedback_type ! FeedbackType.NO_FEEDBACK,),task_promptTask:\n self.task.prompt,)self.state.history [{role: user, content: user_prompt}]self.state.latest_output {content: user_prompt}self.state.agent_action_count {propose_solution: 0,use_tool: 0,invalid_action: 0,}if use_tool:# reset tool setfor tool in self.tool_set:tool.reset()return self.state# destructordef __del__(self):self.task.cleanup()逻辑执行
from mint.envs import GeneralEnv, AlfworldEnv
from mint.datatypes import Action, State
from mint.tasks import AlfWorldTask
from mint.tools import Tool
import mint.tasks as tasks
import mint.agents as agents
import logging
import os
import json
import pathlib
import importlib
import argparse
from typing import List, Dict, Any
from tqdm import tqdm
from tqdm.contrib.logging import logging_redirect_tqdm# Configure logging settings
logging.basicConfig(format%(asctime)s [%(levelname)s] %(name)s: %(message)s,datefmt%Y-%m-%d %H:%M:%S,
)
LOGGER logging.getLogger(MINT)def interactive_loop(task: tasks.Task,agent: agents.LMAgent,tools: List[Tool],feedback_config: Dict[str, Any],env_config: Dict[str, Any],interactive_mode: bool False,
):if isinstance(task, AlfWorldTask):LOGGER.info(loading Alfworld Env)env AlfworldEnv(task, tools, feedback_config, env_config)else:env GeneralEnv(task, tools, feedback_config, env_config)state: State env.reset()init_msg state.latest_output[content]if interactive_mode:# omit in-context examplesplited_msg init_msg.split(---)init_msg splited_msg[0] In-context Example Omitted splited_msg[2]LOGGER.info(f\nUser: \n\033[94m{state.latest_output[content]}\033[0m)num_steps 0if task.loaded_history is not None:for turn in task.loaded_history:action agent.lm_output_to_action(turn[lm_output])LOGGER.info(f\nLoaded LM Agent Action:\n\033[92m{action.value}\033[0m)state env.step(action, loadedturn)LOGGER.info(\033[1m User:\n \033[0m f\033[94m{state.latest_output[content]}\033[0m)num_steps 1while not state.finished:# agent actif interactive_mode:to_continue nwhile to_continue not in [y, Y]:to_continue input(\n Continue? (y/n) )action: Action agent.act(state)# color the action in green# LOGGER.info(f\nLM Agent Action:\n\033[92m{action.value}\033[0m)LOGGER.info(f\n\033[1m LM Agent Action:\n \033[0m f\n\033[92m{action.value}\033[0m)# environment stepstate: State env.step(action)# color the state in blueif not state.finished:user_msg state.latest_output[content]if Expert feedback: in user_msg:obs, feedback user_msg.split(Expert feedback:)feedback Expert feedback: feedback# color the observation in blue feedback in redLOGGER.info(\n \033[1m User:\n \033[0m f\033[94m{obs}\033[0m \n f\033[93m{feedback}\033[0m \n)else:# color the observation in blueLOGGER.info(\n \033[1m User:\n \033[0m f\033[94m{user_msg}\033[0m \n)num_steps 1LOGGER.info(fTask finished in {num_steps} steps. Success: {state.success})return statedef main(args: argparse.Namespace):with open(args.exp_config) as f:exp_config: Dict[str, Any] json.load(f)DEFAULT_FEEDBACK_CONFIG exp_config[feedback_config]DEFAULT_ENV_CONFIG exp_config[env_config]LOGGER.info(fExperiment config: {exp_config})# initialize all the taskstask_config: Dict[str, Any] exp_config[task]task_class: tasks.Task getattr(tasks, task_config[task_class])todo_tasks, n_tasks task_class.load_tasks(task_config[filepath],**task_config.get(extra_load_task_kwargs, {}))# initialize the agentagent_config: Dict[str, Any] exp_config[agent]agent: agents.LMAgent getattr(agents, agent_config[agent_class])(agent_config[config])# initialize the feedback agent (if exist)feedback_config: Dict[str, Any] exp_config.get(feedback, DEFAULT_FEEDBACK_CONFIG)# initialize all the toolstools: List[Tool] [getattr(importlib.import_module(module), class_name)()for module, class_name in task_config[tool_imports]]env_config: Dict[str, Any] exp_config.get(environment, DEFAULT_ENV_CONFIG)pathlib.Path(exp_config[output_dir]).mkdir(parentsTrue, exist_okTrue)if args.interactive:output_path os.path.join(exp_config[output_dir], results.interactive.jsonl)else:output_path os.path.join(exp_config[output_dir], results.jsonl)done_task_id set()if os.path.exists(output_path):with open(output_path) as f:for line in f:task_id json.loads(line)[task].get(task_id, )if task_id :task_id json.loads(line)[task].get(id, )done_task_id.add(task_id)LOGGER.info(fExisting output file found. {len(done_task_id)} tasks done.)if len(done_task_id) n_tasks:LOGGER.info(All tasks done. Exiting.)return# run the loop for all tasksLOGGER.info(fRunning interactive loop for {n_tasks} tasks.)n_tasks_remain n_tasks - len(done_task_id) # only run the remaining tasksLOGGER.info(fRunning for remaining {n_tasks_remain} tasks. (completed{len(done_task_id)}))if args.n_max_tasks is not None:n_tasks_remain min(n_tasks_remain, args.n_max_tasks - len(done_task_id))LOGGER.info(fRunning for remaining {n_tasks_remain} tasks due to command line arg n_max_tasks. (n_max_tasks{args.n_max_tasks}, completed{len(done_task_id)}))with open(output_path, a) as f, logging_redirect_tqdm():pbar tqdm(totaln_tasks_remain)for i, task in enumerate(todo_tasks):# # Only test 10 tasks in debug mode# if args.debug and i 3:# breakif i n_tasks_remain len(done_task_id):LOGGER.info(fFinished {n_tasks_remain} tasks. Exiting.)break# skip done tasksif task.task_id in done_task_id:continuestate interactive_loop(task, agent, tools, feedback_config, env_config, args.interactive)if not os.path.exists(exp_config[output_dir]):os.makedirs(exp_config[output_dir])f.write(json.dumps({state: state.to_dict(),task: task.to_dict()}) \n)f.flush() # make sure the output is written to filepbar.update(1)pbar.close()if __name__ __main__:parser argparse.ArgumentParser(Run the interactive loop.)parser.add_argument(--exp_config,typestr,default./configs/gpt-3.5-turbo-0613/Fgpt-3.5-turbo-16k-0613/PHFGT-textual/max5_p2toolcd/reasoning/scienceqa.json,helpConfig of experiment.,)parser.add_argument(--debug,actionstore_true,helpWhether to run in debug mode (10 ex per task).,)parser.add_argument(--n_max_tasks,typeint,helpNumber of tasks to run. If not specified, run all tasks.,)parser.add_argument(--interactive,actionstore_true,helpWhether to run in interactive mode for demo purpose.,)args parser.parse_args()LOGGER.setLevel(logging.DEBUG if args.debug else logging.INFO)main(args)最核心的就是这两行
action: Action agent.act(state) # 根据当前的环境的状态state对话历史让大模型给出思考以及Action生成Python代码
state: State env.step(action)# 根据action调用Python代码获得结果得到新的state思考
CodeAct貌似就是回到了最初的代码大模型中用户提出Query只不过不是明显的代码生成意图通过Prompt来引导大模型通过生成文本推理和Python代码也类似于PAL、PoT并给出Python代码的执行结果。执行错误就进行Re-Fine。CodeAct完全依赖于Python代码以及Python环境中随时可以pip install的软件包从而避免了人工实现API函数或者让模型写工具的问题。大多数的Action基本上也都可以用Python代码来覆盖。缺点依然有一些Action可能无法通过Python代码来实现例如搜索、或者本地的一些库也许可以但是需要大模型写一大段代码来做。