AI Agent: Principles and Practice

Overview

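The previous two articles covered LLM fundamentals, followed by fine-tuning and deployment. On its own, however, an LLM is only a "brain": it can interact with the world only through conversation. An AI Agent gives the LLM "hands and feet" and "memory", letting it perceive its environment, call tools, and plan tasks so that it can pursue complex goals autonomously. This article explains how AI Agents work and how to build them in practice.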

What Is an AI Agent

Definition of an Agent

An AI Agent is a system that can:

text

┌─────────────────────────────────────────────────────────┐
│               Capabilities of an AI Agent               │
├─────────────────────────────────────────────────────────┤
│                                                         │
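│  Perceive  │  Understand the environment and the goal   │
│  Plan      │  Break down tasks and make a plan          │
│  Act       │  Call tools to carry out operations        │
│  Observe   │  Get feedback from execution results       │
│  Learn     │  Improve from experience                   │
│  Remember  │  Store and retrieve information            │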
│                                                         │
└─────────────────────────────────────────────────────────┘

LLM vs Agent

text

Traditional LLM conversation:

User → LLM → Answer
        ↑
  can only output text


AI Agent:

User → Agent → Plan ─┬→ Tool 1 ──┐
                     ├→ Tool 2 ──┼→ Combine results → Answer
                     └→ Tool 3 ──┘
                          ↑
              can perform real actions

Typical Use Cases

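Scenario	Agent capabilities
Customer service	Look up orders, process refunds, query the knowledge base
Coding assistant	Write code, run tests, explain errors
Personal assistant	Manage schedules, reply to emails, retrieve information
Data analysis	Read files, analyze data, generate reports
Office automation	Process documents, route approvals, enter data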

Core Components of an Agent

The Four-Component Architecture

text

┌─────────────────────────────────────────────────────────┐
│             The Four Components of an Agent             │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐ │
│  │   LLM   │    │ Memory  │    │ Tools   │    │Planning │ │
│  │ (brain) │    │(memory) │    │ (tools) │    │ (plan)  │ │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘ │
│       │              │              │              │      │
│       └──────────────┴──────────────┴──────────────┘      │
│                         │                                │
│                         ▼                                │
│                    ┌─────────┐                           │
│                    │  Agent  │                           │
│                    └─────────┘                           │
└─────────────────────────────────────────────────────────┘

1. LLM - The Brain

The LLM is the core of the Agent and is responsible for:

text

┌─────────────────────────────────────────────────────────┐
│                     Role of the LLM                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
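│  • Understanding the user's intent                      │
│  • Making an action plan                                │
│  • Choosing the right tools                             │
│  • Parsing the results returned by tools                │
│  • Generating the final answer                          │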
│                                                         │
└─────────────────────────────────────────────────────────┘

2. Memory

Memory lets the Agent "remember" past interactions.

text

┌─────────────────────────────────────────────────────────┐
│                     Types of Memory                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Short-term Memory                                      │
│  ├── Current conversation context                       │
│  ├── Temporary state information                        │
│  └── Held in the context window                         │
│                                                         │
│  Long-term Memory                                       │
│  ├── Vector database                                    │
│  ├── Storage of key knowledge                           │
│  └── Persists across sessions                           │
│                                                         │
└─────────────────────────────────────────────────────────┘

Memory example:

python

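# VectorStoreMemory does not exist in LangChain; the vector-store-backed class is VectorStoreRetrieverMemory
from langchain.memory import ConversationBufferMemory, VectorStoreRetrieverMemory

# Short-term memory: a buffer holding the current conversation
short_memory = ConversationBufferMemory()
short_memory.save_context({"input": "My name is Zhang San"}, {"output": "Hello, Zhang San"})

# Long-term memory: backed by a vector store
# retriever: obtained from an existing vector store, e.g. vectorstore.as_retriever()
long_memory = VectorStoreRetrieverMemory(retriever=...)
long_memory.save_context(
    {"input": "My birthday is May 1, 1990"},
    {"output": "Noted"}
)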
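Without any framework, the split between the two kinds of memory can be sketched as follows. This is a minimal illustration under stated assumptions: short-term memory is modeled as a fixed-size sliding window of turns, and long-term memory uses a toy word-count cosine similarity in place of a real embedding model and vector database. `ShortTermMemory` and `LongTermMemory` are hypothetical names, not LangChain classes.

```python
import math
import re
from collections import Counter, deque


class ShortTermMemory:
    """Keeps only the most recent turns, like a bounded context window."""

    def __init__(self, max_turns: int = 3):
        self.turns = deque(maxlen=max_turns)  # the oldest turn is dropped automatically

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def as_context(self) -> str:
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)


def _vectorize(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (a real system would call an embedding model)
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Keeps every fact and retrieves the most similar ones, like a vector store."""

    def __init__(self):
        self.items = []  # list of (text, vector)

    def save(self, text: str) -> None:
        self.items.append((text, _vectorize(text)))

    def search(self, query: str, k: int = 1) -> list:
        q = _vectorize(query)
        ranked = sorted(self.items, key=lambda item: _cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


short = ShortTermMemory(max_turns=2)
for i in range(3):
    short.add(f"question {i}", f"answer {i}")
print(short.as_context())  # turn 0 has already been evicted

long_term = LongTermMemory()
long_term.save("My birthday is May 1, 1990")
long_term.save("My favorite language is Python")
print(long_term.search("When is my birthday?"))
```

The window explains why short-term memory forgets the oldest turns once the context fills up, while long-term memory keeps everything and relies on retrieval to surface only what is relevant.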

3. Tools

Tools let the Agent carry out real operations.

text

┌─────────────────────────────────────────────────────────┐
│                    Common Tool Types                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Information retrieval                                  │
│     ├── Search engines (Google, Bing)                   │
│     ├── Vector search                                   │
│     └── Database queries                                │
│                                                         │
│  Code execution                                         │
│     ├── Python REPL                                      │
│     ├── Code interpreter                                │
│     └── Shell commands                                  │
│                                                         │
│  API calls                                              │
│     ├── Weather lookup                                  │
│     ├── Sending email                                   │
│     └── Calendar management                             │
│                                                         │
│  File operations                                        │
│     ├── Reading documents                               │
│     ├── Writing files                                   │
│     └── Format conversion                               │
│                                                         │
└─────────────────────────────────────────────────────────┘

Example tool definitions:

python

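import ast
import operator

from langchain.tools import tool

# Operators the calculator may evaluate; any other syntax is rejected
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expression: str) -> float:
    """Evaluate numbers and + - * / ** only, without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    # Implement the actual search logic here (e.g. call a search API)
    return f"Results for {query}..."

@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression such as '2 * (3 + 4)'."""
    try:
        result = safe_eval(expression)  # safe alternative to eval()
        return f"Result: {result}"
    except Exception as e:
        return f"Calculation error: {e}"

# Tool list
tools = [search_web, calculate]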
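A tool-calling model never executes code itself: it returns a tool name plus arguments, and the agent runtime looks the tool up, runs it, and sends the result back as an observation. The sketch below shows that dispatch step without LangChain, assuming a hypothetical JSON shape `{"name": ..., "arguments": {...}}` for the model's tool call (real providers use similar but not identical formats); `register`, `describe_tools`, and `dispatch` are illustrative names.

```python
import inspect
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., str]] = {}


def register(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a tool under its own name."""
    TOOLS[func.__name__] = func
    return func


@register
def get_weather(location: str) -> str:
    """Return the weather for a city (hard-coded stub for illustration)."""
    return f"{location}: rain, 15-20°C"


def describe_tools() -> list:
    """Build the tool descriptions that would be sent to the model."""
    return [
        {
            "name": name,
            "description": inspect.getdoc(func),
            "parameters": list(inspect.signature(func).parameters),
        }
        for name, func in TOOLS.items()
    ]


def dispatch(tool_call_json: str) -> str:
    """Execute one tool call emitted by the model and return the observation."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call["name"])
    if func is None:
        return f"Error: unknown tool {call['name']}"
    try:
        return func(**call["arguments"])
    except TypeError as e:  # wrong or missing arguments
        return f"Error: bad arguments: {e}"


print(describe_tools())
print(dispatch('{"name": "get_weather", "arguments": {"location": "Beijing"}}'))
```

Returning errors as strings instead of raising lets the model read the error and try again, the same choice the `calculate` tool makes.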

4. Planning

Planning lets the Agent break down complex tasks.

text

┌─────────────────────────────────────────────────────────┐
│                    Planning Patterns                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ReAct pattern (Reasoning + Acting)                     │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Thought: assess the current state               │    │
│  │ Action: choose and execute an action            │    │
│  │ Observation: observe the result                 │    │
│  │ (repeat until the goal is reached)              │    │
│  └─────────────────────────────────────────────────┘    │
│                                                         │
│  Task decomposition pattern                             │
│  ┌─────────────────────────────────────────────────┐    │
│  │ 1. Analyze the user's goal                      │    │
│  │ 2. Break it into subtasks                       │    │
│  │ 3. Execute the subtasks in order                │    │
│  │ 4. Combine the results                          │    │
│  └─────────────────────────────────────────────────┘    │
│                                                         │
└─────────────────────────────────────────────────────────┘
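The task decomposition pattern boils down to: plan once, execute the subtasks in order, then combine the results. In this sketch `plan` and `execute` are hypothetical stubs standing in for the LLM and tool calls; only the control flow is meant to be representative.

```python
from typing import Callable, List


def plan(goal: str) -> List[str]:
    """Stand-in planner: a real one would ask the LLM to break the goal into steps."""
    return [f"research {goal}", f"outline {goal}", f"write {goal}"]


def execute(subtask: str, previous: List[str]) -> str:
    """Stand-in executor: a real one would call tools; it can read earlier results."""
    return f"done({subtask}) after {len(previous)} earlier step(s)"


def plan_and_execute(
    goal: str,
    planner: Callable[[str], List[str]] = plan,
    executor: Callable[[str, List[str]], str] = execute,
) -> str:
    subtasks = planner(goal)                       # 1-2. analyze the goal, decompose it
    results: List[str] = []
    for subtask in subtasks:                       # 3. execute subtasks in order
        results.append(executor(subtask, results))
    return "\n".join(results)                      # 4. combine the results


print(plan_and_execute("a Shanghai trip"))
```

Unlike ReAct, the whole plan is fixed before any step runs. That saves LLM calls, but the plan cannot adapt to a surprising intermediate result unless a re-planning step is added.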

The Agent Workflow in Detail

The ReAct Loop

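ReAct (Reasoning + Acting) is one of the most widely used Agent patterns. The model alternates between reasoning about what to do next (Thought), taking an action (Action), and reading the result (Observation):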

text

User: "Check today's weather for me, then decide whether I need an umbrella."

┌─────────────────────────────────────────────────────────┐
│  Thought 1: User wants the weather; use weather tool    │
│  Action 1: weather.get_weather(location="Beijing")      │
│  Observation 1: Rain in Beijing today, 15-20°C          │
├─────────────────────────────────────────────────────────┤
│  Thought 2: It is raining, so suggest an umbrella       │
│  Action 2: respond(message="Rainy; bring an umbrella")  │
│  Observation 2: Reply sent                              │
├─────────────────────────────────────────────────────────┤
│  Thought 3: The task is complete                        │
│  Action 3: finish                                       │
└─────────────────────────────────────────────────────────┘

Final reply: "It is raining in Beijing today, 15-20°C. Take an umbrella when you go out."
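The Thought / Action / Observation cycle above can be reproduced with a short loop. This is a minimal sketch: `make_scripted_llm` is a hypothetical stand-in that replays canned ReAct-formatted replies instead of calling a real model, and the `Action: tool[argument]` text format is an assumption in the style of the original ReAct prompts, not any particular framework's wire format.

```python
import re
from typing import Callable, Dict, List

# Canned replies in the Thought / Action format (stand-in for a real model)
SCRIPT = [
    "Thought: The user wants the weather, so I should use the weather tool.\n"
    "Action: get_weather[Beijing]",
    "Thought: It is raining, so I should suggest an umbrella.\n"
    "Action: finish[Rain in Beijing today, 15-20°C. Take an umbrella.]",
]

TOOLS: Dict[str, Callable[[str], str]] = {
    "get_weather": lambda city: f"{city}: rain today, 15-20°C",  # stub weather tool
}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical LLM stand-in: ignores the prompt and replays canned replies."""
    it = iter(replies)
    return lambda transcript: next(it)


def react(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)                  # Thought + Action
        transcript += reply + "\n"
        match = ACTION_RE.search(reply)
        if match is None:                        # malformed output: tell the model, retry
            transcript += "Observation: could not parse an Action\n"
            continue
        tool, arg = match.groups()
        if tool == "finish":                     # the model decides the task is complete
            return arg
        tool_fn = TOOLS.get(tool)
        observation = tool_fn(arg) if tool_fn else f"unknown tool {tool}"
        transcript += f"Observation: {observation}\n"  # seen by the next llm() call
    return "Stopped: step limit reached"


print(react("Do I need an umbrella in Beijing today?", make_scripted_llm(SCRIPT)))
```

The transcript string acts as the loop's short-term memory: each Observation is appended so the next model call can see it, and `max_steps` plays the same role as `max_iterations` in LangChain's `AgentExecutor`.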

The Complete Workflow

text

┌─────────────────────────────────────────────────────────┐
│                     Agent Workflow                      │
└─────────────────────────────────────────────────────────┘

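1. Receive user input
   │
   ▼
2. Understand intent (LLM)
   │
   ▼
3. Retrieve memory (Memory)
   │
   ▼
4. Make a plan (Planning)
   │
   ▼
5. Choose a tool (Tools)
   │
   ▼
6. Execute the tool
   │
   ▼
7. Observe the result
   │
   ▼
8. Update memory
   │
   ▼
9. Is the task done?
   ├─ No  → go back to step 4
   └─ Yes → generate the final answer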
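The nine steps map onto a loop in which steps 4 through 8 repeat until the done check passes. A compact sketch under simplifying assumptions: `WorkflowAgent` and `decide_next_step` are hypothetical names, the "LLM" decision is a single hard-coded rule, memory is a plain dict keyed by goal, and a `max_rounds` guard is added so a task that never finishes cannot loop forever.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class WorkflowAgent:
    tools: Dict[str, Callable[[str], str]]
    memory: Dict[str, List[str]] = field(default_factory=dict)  # goal -> past observations
    max_rounds: int = 5

    def decide_next_step(self, goal: str, notes: List[str]) -> Optional[Tuple[str, str]]:
        """Steps 2, 4 and 5 (understand, plan, choose a tool), reduced to one rule.

        Returns (tool_name, argument), or None once the goal counts as done.
        A real agent would ask the LLM here.
        """
        if not any(note.startswith("weather:") for note in notes):
            return ("weather", goal)
        return None

    def run(self, goal: str) -> str:                           # 1. receive input
        notes = list(self.memory.get(goal, []))                # 3. retrieve memory
        for _ in range(self.max_rounds):
            step = self.decide_next_step(goal, notes)          # 4-5. plan, choose tool
            if step is None:                                   # 9. done: answer
                return f"Answer for {goal}: {'; '.join(notes)}"
            tool, arg = step
            observation = f"{tool}: {self.tools[tool](arg)}"   # 6-7. execute, observe
            notes.append(observation)
            self.memory.setdefault(goal, []).append(observation)  # 8. update memory
        return "Gave up: round limit reached"


calls: List[str] = []


def weather(city: str) -> str:
    calls.append(city)  # count real tool invocations
    return f"{city} rain, 15-20°C"


agent = WorkflowAgent(tools={"weather": weather})
print(agent.run("Beijing"))  # calls the weather tool
print(agent.run("Beijing"))  # answered from memory, no new tool call
print(len(calls))
```

The second run skips the tool entirely because step 3 already retrieves the earlier observation, which is the practical payoff of step 8.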

Popular Agent Frameworks

Framework Comparison

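Framework	Language	Highlights	Learning curve
LangChain	Python/JS	Comprehensive features, rich ecosystem	Moderate
LlamaIndex	Python	Strong data connectors	Moderate
AutoGPT	Python	Highly autonomous	Steep
BabyAGI	Python	Focused on task management	Moderate
CrewAI	Python	Multi-Agent collaboration	Steep
MetaGPT	Python	Focused on software development	Steep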

LangChain Agent

LangChain is one of the most popular Agent frameworks:

python

from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools import Tool
from langchain_community.utilities import SerpAPIWrapper
from langchain_openai import ChatOpenAI

# 1. Define the tools (SerpAPIWrapper reads the SERPAPI_API_KEY environment variable)
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Search the web for information"
    )
]

# 2. Initialize the LLM (reads OPENAI_API_KEY)
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# 3. Pull the prompt template from LangChain Hub
prompt = hub.pull("hwchase17/openai-tools-agent")

# 4. Create the Agent
agent = create_openai_tools_agent(llm, tools, prompt)

# 5. Create the executor, which runs the tool-calling loop
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True
)

# 6. Run
result = agent_executor.invoke({
    "input": "Search for the latest AI news"
})

print(result["output"])

AutoGPT

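AutoGPT runs an Agent autonomously: instead of step-by-step instructions, you give it a name, a role, and goals, and it decides on its own which tools to call until it judges the goals complete. AutoGPT itself is distributed mainly as a standalone application, so this sketch uses the AutoGPT implementation in `langchain_experimental`:

python

from langchain_experimental.autonomous_agents import AutoGPT
from langchain_openai import ChatOpenAI

# Assumed to exist already: search_tool and write_tool (LangChain tools) and
# vectorstore (any LangChain vector store, used as the agent's memory)
agent = AutoGPT.from_llm_and_tools(
    ai_name="ResearchAgent",
    ai_role="Research assistant",
    tools=[search_tool, write_tool],
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
    memory=vectorstore.as_retriever(),
)

# Goals are passed to run(); the agent keeps acting until it issues its finish command
result = agent.run([
    "Research the latest developments in AI",
    "Produce a research report",
])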

BabyAGI

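BabyAGI is an Agent focused on task management: it keeps a task list, executes the top task, creates new tasks from the result, and reprioritizes the list. The original BabyAGI is a standalone script, so this sketch uses the BabyAGI implementation in `langchain_experimental`:

python

from langchain_experimental.autonomous_agents import BabyAGI
from langchain_openai import ChatOpenAI

# vectorstore: any LangChain vector store (assumed to exist), used to store task results.
# To give BabyAGI tools, pass an AgentExecutor as task_execution_chain.
agent = BabyAGI.from_llm(
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
    vectorstore=vectorstore,
    max_iterations=3,  # cap the create -> prioritize -> execute loop
)

agent.invoke({"objective": "Write an article about AI Agents"})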

Single Agent vs. Multi-Agent

Single-Agent Architecture

text

┌─────────────────────────────────────────────────────────┐
│                Single-Agent Architecture                │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  User → Agent (all-purpose) ─┬→ Tool 1                  │
│                              ├→ Tool 2                  │
│                              ├→ Tool 3                  │
│                              └→ Tool N                  │
│                                                         │
│  Pros: simple and direct                                │
│  Cons: struggles with complex tasks                     │
│                                                         │
└─────────────────────────────────────────────────────────┘

Suitable for:

Simple tasks
A small number of tools
Rapid prototyping

Multi-Agent Architecture

text

┌─────────────────────────────────────────────────────────┐
│                    多 Agent 架构                          │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  User                                                   │
│   │                                                      │
│   ▼                                                      │
│  ┌─────────────────────────────────────────────────┐    │
│  │           Manager Agent (coordinator)           │    │
│  └─────────────────────────────────────────────────┘    │
│       │           │           │                          │
│       ▼           ▼           ▼                          │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐                    │
│  │Research │ │  Writer │ │ Coder   │                    │
│  │ Agent   │ │ Agent   │ │ Agent   │                    │
│  └─────────┘ └─────────┘ └─────────┘                    │
│       │           │           │                          │
│       └───────────┴───────────┘                          │
│                   │                                      │
│                   ▼                                      │
│              Final result                               │
│                                                         │
│  Pros: division of labor, scalable                      │
│  Cons: complexity, communication overhead               │
│                                                         │
└─────────────────────────────────────────────────────────┘

Suitable for:

Complex tasks
Tasks that need specialized roles
Large-scale applications
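The manager agent in this architecture is essentially a router plus an aggregator. The sketch below assumes the manager's routing decision can be reduced to keyword rules; in practice an LLM would choose the worker, and each worker would be a full agent with its own tools. `WORKERS`, `route`, and `manager` are hypothetical names.

```python
from typing import Callable, Dict, List

# Each worker stands in for a specialized agent (research, writing, coding)
WORKERS: Dict[str, Callable[[str], str]] = {
    "research": lambda task: f"[Research] findings on: {task}",
    "writer": lambda task: f"[Writer] draft for: {task}",
    "coder": lambda task: f"[Coder] code for: {task}",
}


def route(subtask: str) -> str:
    """Pick a worker for a subtask (keyword rules instead of an LLM decision)."""
    text = subtask.lower()
    if any(word in text for word in ("code", "script", "implement")):
        return "coder"
    if any(word in text for word in ("write", "report", "article")):
        return "writer"
    return "research"


def manager(subtasks: List[str]) -> str:
    """Delegate each subtask, then merge the workers' outputs into one result."""
    outputs = [WORKERS[route(s)](s) for s in subtasks]
    return "\n".join(outputs)


print(manager(["collect AI agent papers", "write the report", "implement a demo script"]))
```

Keeping routing separate from the workers is what makes the architecture extensible: adding a specialist means registering one more worker and one more routing rule.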

Multi-Agent Collaboration Patterns

text

┌─────────────────────────────────────────────────────────┐
│                 Collaboration Patterns                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Hierarchical                                           │
│  Manager → Worker1, Worker2, Worker3                     │
│                                                         │
│  Sequential                                             │
│  Agent1 → Agent2 → Agent3 → Agent4                       │
│                                                         │
│  Parallel                                               │
│  ┌─────────┬─────────┬─────────┐                        │
│  │Agent1   │Agent2   │Agent3   │                        │
│  └─────────┴─────────┴─────────┘                        │
│         │         │         │                           │
│         └─────────┴─────────┘                           │
│                   │                                     │
│                   ▼                                     │
│              Aggregator                                 │
│                                                         │
│  Adversarial                                            │
│  Agent1 (proposes) ⟷ Agent2 (critiques)                 │
│                                                         │
└─────────────────────────────────────────────────────────┘
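The sequential, parallel, and adversarial modes differ mainly in how outputs flow between agents. A minimal sketch with plain functions standing in for agents; it assumes agent calls are I/O-bound (waiting on an LLM API), which is why threads are used for the parallel case. All function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

AgentFn = Callable[[str], str]  # an "agent" here is just text in, text out


def sequential(agents: List[AgentFn], task: str) -> str:
    """Sequential: each agent receives the previous agent's output."""
    result = task
    for agent in agents:
        result = agent(result)
    return result


def parallel(agents: List[AgentFn], task: str, aggregate: Callable[[List[str]], str]) -> str:
    """Parallel: all agents get the same task; an aggregator merges their outputs."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        results = list(pool.map(lambda agent: agent(task), agents))  # keeps input order
    return aggregate(results)


def adversarial(propose: Callable[[str], str], critique: AgentFn, rounds: int = 3) -> str:
    """Adversarial: revise the proposal until the critic returns no objection."""
    draft = propose("")                # first proposal, no feedback yet
    for _ in range(rounds):
        feedback = critique(draft)
        if not feedback:               # empty feedback means the critic accepts
            break
        draft = propose(feedback)      # revise using the critic's feedback
    return draft


def shout(text: str) -> str:
    return text.upper()


def exclaim(text: str) -> str:
    return text + "!"


print(sequential([shout, exclaim], "plan"))                # PLAN!
print(parallel([shout, exclaim], "plan", " | ".join))      # PLAN | plan!
```

The `rounds` cap in the adversarial loop matters in practice: two LLM agents can disagree indefinitely, so the debate needs an explicit stopping rule.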

Agent Design Patterns

Pattern 1: Tool Calling

python

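from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
# calculator, search, weather: tools defined beforehand (e.g. with @tool)
tools = [calculator, search, weather]

# prompt: a ChatPromptTemplate that includes the {agent_scratchpad} placeholder
agent = create_tool_calling_agent(
    llm=llm,
    tools=tools,
    prompt=prompt
)

# The agent only decides the next step; AgentExecutor runs the loop
# (call the tool the model chose, feed the result back, repeat until a final answer)
agent_executor = AgentExecutor(agent=agent, tools=tools)
result = agent_executor.invoke({"input": "What is the temperature in Beijing today?"})
print(result["output"])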

Pattern 2: Plan-and-Execute

python

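from langchain_experimental.plan_and_execute import (
    PlanAndExecute,
    load_agent_executor,
    load_chat_planner,
)
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)
planner = load_chat_planner(llm)             # produces the full step-by-step plan
executor = load_agent_executor(llm, tools)   # executes each step with the tools
agent = PlanAndExecute(planner=planner, executor=executor)

# The Agent first makes a complete plan, then executes it
result = agent.invoke({"input": "Help me plan a trip to Shanghai"})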

Pattern 3: Reflection

python

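from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Add a self-reflection step to the system prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", """After completing the task, reflect:
1. What have I accomplished?
2. What could be improved?
3. Do I need additional information?
If the answer is not good enough yet, call tools again before answering."""),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

llm = ChatOpenAI(model="gpt-4o")
agent = create_tool_calling_agent(llm, tools, prompt)  # tools: defined beforehand

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=5  # upper bound on rounds, so reflection and retries cannot loop forever
)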
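The prompt-based approach asks the model to reflect within a single run. A common alternative, similar in spirit to Reflexion-style verbal feedback, is to check each answer outside the model and, when it fails, store a short written lesson that is fed into the next attempt. In the sketch below, `attempt` and `check` are hypothetical stubs for the LLM call and the evaluator; `reflect_and_retry` is an illustrative name.

```python
from typing import Callable, List, Optional, Tuple


def reflect_and_retry(
    attempt: Callable[[str, List[str]], str],
    check: Callable[[str], Optional[str]],
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Retry a task, feeding written lessons from failed attempts into the next one.

    attempt(task, lessons) -> answer        (an LLM call in a real agent)
    check(answer) -> None if acceptable, else a short reflection on what went wrong
    """
    lessons: List[str] = []
    answer = ""
    for _ in range(max_rounds):
        answer = attempt(task, lessons)
        reflection = check(answer)
        if reflection is None:
            return answer, lessons
        lessons.append(reflection)  # remembered for the next attempt
    return answer, lessons          # best effort once max_rounds is used up


def attempt(task: str, lessons: List[str]) -> str:
    # Stub "model": only adds the unit once a lesson tells it to
    return "20°C" if any("unit" in lesson for lesson in lessons) else "20"


def check(answer: str) -> Optional[str]:
    # Stub evaluator: the answer must include a temperature unit
    return None if answer.endswith("°C") else "Missing unit: always state the temperature unit."


print(reflect_and_retry(attempt, check, "What is the temperature?"))
```

Because the lessons are returned alongside the answer, they can also be saved to long-term memory so later tasks benefit from them.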

Hands-on: Building Your First AI Agent

Scenario: An Intelligent Document Assistant

We will build an Agent that can read, search, and summarize documents.

python

# doc_agent.py

from pathlib import Path

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import ChatPromptTemplate
from langchain.tools import tool
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI


# 1. Define the tools (each docstring becomes the tool description the LLM sees)
@tool
def read_document(file_path: str) -> str:
    """Read the full content of a document."""
    try:
        return Path(file_path).read_text(encoding="utf-8")
    except Exception as e:
        return f"Error: {e}"


@tool
def list_documents(directory: str) -> str:
    """List the Markdown (.md) documents in a directory."""
    try:
        files = sorted(Path(directory).glob("*.md"))
        return "\n".join(f.name for f in files) or "No Markdown documents found"
    except Exception as e:
        return f"Error: {e}"


@tool
def search_document(file_path: str, keyword: str) -> str:
    """Search a document for a keyword (case-insensitive) and return matching lines."""
    try:
        content = Path(file_path).read_text(encoding="utf-8")
        results = [
            f"Line {i}: {line.strip()}"
            for i, line in enumerate(content.split("\n"), 1)
            if keyword.lower() in line.lower()
        ]
        return "\n".join(results) if results else "No matches found"
    except Exception as e:
        return f"Error: {e}"


@tool
def summarize_document(file_path: str) -> str:
    """Summarize a document with an LLM."""
    try:
        content = Path(file_path).read_text(encoding="utf-8")
        # A smaller, cheaper model is enough for summarization
        llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
        # "stuff" puts the whole document into one prompt; very long files may not fit
        chain = load_summarize_chain(llm, chain_type="stuff")
        result = chain.invoke({"input_documents": [Document(page_content=content)]})
        return result["output_text"]
    except Exception as e:
        return f"Error: {e}"


# 2. Tool list
tools = [read_document, list_documents, search_document, summarize_document]

# 3. Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# 4. Define the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an intelligent document assistant that can help users:
- Read document contents
- List the documents in a directory
- Search documents for keywords
- Summarize documents

Choose the appropriate tools based on the user's request."""),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # intermediate tool calls and results go here
])

# 5. Create the Agent
agent = create_tool_calling_agent(llm, tools, prompt)

# 6. Create the executor
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=5,  # stop after 5 reasoning/tool rounds
)

# 7. Run the example
if __name__ == "__main__":
    result = agent_executor.invoke({
        "input": "Please summarize all the .md documents in the current directory"
    })
    print("\n" + "=" * 50)
    print("Final answer:")
    print(result["output"])

Sample Run

bash

$ python doc_agent.py

> Entering new AgentExecutor chain...

Invoking: `list_documents` with `{'directory': '.'}`

Observation: README.md

Invoking: `summarize_document` with `{'file_path': 'README.md'}`

Observation: [document summary...]

> Finished chain.

Final answer:
I have analyzed the Markdown documents in the current directory:

1. README.md: [summary]

Evaluating and Optimizing Agents

Evaluation Dimensions

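Dimension	Metrics
Accuracy	Task completion rate, output quality
Efficiency	Average number of steps, token consumption
Reliability	Success rate, error rate
Cost	API call cost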
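The metrics in the table can be computed from logged runs. A sketch, assuming each run is recorded as a hypothetical `RunRecord` with a success flag, step count, token counts, and an error flag; the per-1K-token prices are parameters you would fill in from your provider's price list, and the runs and prices in the demo are made up purely to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunRecord:
    succeeded: bool        # did the agent complete the task?
    steps: int             # number of reasoning/tool rounds
    input_tokens: int
    output_tokens: int
    errored: bool = False  # crashed, hit max_iterations, or failed to parse


def evaluate(runs: List[RunRecord], price_in_per_1k: float, price_out_per_1k: float) -> Dict[str, float]:
    if not runs:
        raise ValueError("no runs to evaluate")
    n = len(runs)
    total_cost = sum(
        r.input_tokens / 1000 * price_in_per_1k + r.output_tokens / 1000 * price_out_per_1k
        for r in runs
    )
    return {
        "completion_rate": sum(r.succeeded for r in runs) / n,                    # accuracy
        "error_rate": sum(r.errored for r in runs) / n,                           # reliability
        "avg_steps": sum(r.steps for r in runs) / n,                              # efficiency
        "avg_tokens": sum(r.input_tokens + r.output_tokens for r in runs) / n,    # efficiency
        "avg_cost": total_cost / n,                                               # cost
    }


# Made-up runs and prices, only to illustrate the calculation
runs = [
    RunRecord(True, 3, 2000, 500),
    RunRecord(True, 5, 4000, 1000),
    RunRecord(False, 5, 3000, 500, errored=True),
    RunRecord(True, 3, 2500, 500),
]
print(evaluate(runs, price_in_per_1k=0.01, price_out_per_1k=0.03))
```

Tracking these numbers per prompt or model version turns the optimization tips below into measurable comparisons rather than guesses.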

Optimization Tips

text

┌─────────────────────────────────────────────────────────┐
│                 Agent Optimization Tips                 │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  1. Write concise tool descriptions                     │
│     • Clearly state what the tool does                  │
│     • Specify input and output formats                  │
│                                                         │
│  2. Optimize the prompt                                 │
│     • Define the Agent's role clearly                   │
│     • Specify the reasoning steps                       │
│     • Limit the number of iterations                    │
│                                                         │
│  3. Choose the right model                              │
│     • Small models for simple tasks                     │
│     • Large models for complex reasoning                │
│                                                         │
│  4. Add monitoring and logging                          │
│     • Log tool calls                                    │
│     • Track token consumption                           │
│     • Track failure cases                               │
│                                                         │
│  5. Use memory                                          │
│     • Store answers to common questions                 │
│     • Cache tool call results                           │
│                                                         │
└─────────────────────────────────────────────────────────┘

Summary

AI Agents are the key technology that turns an LLM from a "conversationalist" into an "actor":

Key Takeaways

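Four core components
- LLM: the brain, responsible for reasoning and decision-making
- Memory: stores information
- Tools: carry out operations
- Planning: breaks down tasks
Workflow
- Understand intent → make a plan → choose tools → execute → observe results → iterate
Choosing a framework
- LangChain: comprehensive ecosystem, suits most scenarios
- AutoGPT: fully autonomous
- CrewAI: multi-Agent collaboration
Design advice
- Start simple and add complexity gradually
- Design tools and prompts carefully
- Add monitoring and error handling
- Balance cost against performance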

The next article takes an in-depth look at RAG, showing how to let an LLM draw on an external knowledge base.

AI Agent: Principles and Practice https://indulgeback.github.io/posts/AI%E4%B8%8ELLM/3%E3%80%81AI%20Agent%20%E5%8E%9F%E7%90%86%E4%B8%8E%E5%AE%9E%E8%B7%B5

Author: LeviLiu

Published: 1/7

Updated: 3 days ago

License: CC BY-NC-SA 4.0

Attribution-NonCommercial-ShareAlike 4.0 International

Tags: LLM, Agent, LangChain, AI Applications
