This article covers how to observe and understand what an agent is actually doing in the real world.

An agent is fundamentally different from an ordinary API. It is not a single request/response cycle but a multi-stage pipeline: LLM inference → tool calls → result handling → more inference. When something fails, it is hard to tell immediately which stage broke and why. When it is slow, it is unclear where the time went. And on the cost side, it is even harder to work out which queries consumed the most tokens.

Observability is exactly the remedy for this. Below I walk through how to apply the three pillars (tracing, metrics, and logging) to an agent, with working code along the way.

The three pillars of agent observability

You need all three:

  • Tracing: "Where did the problem occur?"
  • Metrics: "How bad is it?"
  • Logging: "What exactly went wrong?"

Prerequisites

# Agent SDK
pip install azure-ai-agents azure-identity
# OpenTelemetry core
pip install opentelemetry-sdk
# Azure Monitor (Application Insights) exporter
pip install azure-monitor-opentelemetry
# Azure SDK OpenTelemetry bridge
pip install azure-core-tracing-opentelemetry

Set the environment variables:

export PROJECT_ENDPOINT="https://your-project.services.ai.azure.com/api/projects/your-project"
export AZURE_AI_MODEL_DEPLOYMENT_NAME="gpt-4o"
# Application Insights connection string
# Azure Portal > Application Insights > Overview > Connection string
export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=xxx;IngestionEndpoint=https://..."
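To fail fast on misconfiguration, a short startup check over these variable names can help. This is a convenience sketch, not part of any SDK:

```python
import os

# The three variables used throughout this article
REQUIRED_VARS = (
    "PROJECT_ENDPOINT",
    "AZURE_AI_MODEL_DEPLOYMENT_NAME",
    "APPLICATIONINSIGHTS_CONNECTION_STRING",
)

def missing_env_vars(env) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# In production, call missing_env_vars(os.environ) at startup.
# Example with a deliberately incomplete environment:
print(missing_env_vars({"PROJECT_ENDPOINT": "https://example"}))
# ['AZURE_AI_MODEL_DEPLOYMENT_NAME', 'APPLICATIONINSIGHTS_CONNECTION_STRING']
```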

Method 1: Built-in tracing in the Foundry portal

Not a single line of code needs to change. Turn on the setting in the Foundry portal and agent execution traces are collected automatically.

Setup steps:

  1. Microsoft Foundry → open your project
  2. Left-hand menu → Tracing
  3. Connect an Application Insights resource (pick an existing one or create a new one)

With just that, the following is collected automatically:

  • Run start/finish times
  • LLM call counts and latency
  • Token usage (input/output)
  • Tool invocations and their results
  • Error details when something fails

Viewing in the portal:
The Tracing tab in the Foundry portal shows each stage of every Run on a timeline, so you can see at a glance how many seconds each stage took.

Method 2: Direct instrumentation with OpenTelemetry

Portal tracing collects the basics; direct OpenTelemetry instrumentation extends tracing down into your business logic.

import os
import logging
from azure.identity import DefaultAzureCredential
from azure.ai.agents import AgentsClient
from azure.ai.agents.models import FunctionTool, ToolSet

# ─── OpenTelemetry setup ─────────────────────────────
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter
from azure.core.settings import settings

# Route Azure SDK tracing through OpenTelemetry
settings.tracing_implementation = "opentelemetry"

A single line, settings.tracing_implementation = "opentelemetry", is enough to have every Azure SDK API call traced automatically.

Hands-on: a fully instrumented agent

Here is complete code that adds observability to a real agent:

import os
import time
import json
import logging
from azure.identity import DefaultAzureCredential
from azure.ai.agents import AgentsClient
from azure.ai.agents.models import FunctionTool, ToolSet
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter
from azure.core.settings import settings

# ─── Logging configuration ───────────────────────────
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger("hr-agent")

# ─── OpenTelemetry configuration ─────────────────────
settings.tracing_implementation = "opentelemetry"
tracer_provider = TracerProvider()

# Export to Application Insights
connection_string = os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
exporter = AzureMonitorTraceExporter(connection_string=connection_string)
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("hr-agent")

client = AgentsClient(
    endpoint=os.environ["PROJECT_ENDPOINT"],
    credential=DefaultAzureCredential(),
)

def get_employee_info(employee_id: str) -> str:
    """Look up an employee's profile."""
    with tracer.start_as_current_span("tool.get_employee_info") as span:
        start = time.time()
        span.set_attribute("employee_id", employee_id)
        employees = {
            "EMP001": {"name": "Zhang Wei", "department": "Engineering", "position": "Senior Engineer"},
            "EMP002": {"name": "Wang Fang", "department": "Design", "position": "UX Designer"},
        }
        result = employees.get(employee_id, {"error": "Employee not found"})
        elapsed = time.time() - start
        span.set_attribute("found", "error" not in result)
        span.set_attribute("latency_ms", round(elapsed * 1000))
        logger.info(f"[tool] get_employee_info({employee_id}) → {elapsed:.3f}s")
        return json.dumps({"employee_id": employee_id, **result}, ensure_ascii=False)

def get_remaining_vacation(employee_id: str, year: int = 2025) -> str:
    """查询员工年假剩余天数"""
    with tracer.start_as_current_span("tool.get_remaining_vacation") as span:
        span.set_attribute("employee_id", employee_id)
        span.set_attribute("year", year)
        vacations = {
            "EMP001": {"total": 15, "used": 7, "remaining": 8},
            "EMP002": {"total": 15, "used": 12, "remaining": 3},
        }
        result = vacations.get(employee_id, {"total": 0, "used": 0, "remaining": 0})
        span.set_attribute("remaining_days", result["remaining"])
        logger.info(f"[tool] get_remaining_vacation({employee_id}, {year}) → remaining={result['remaining']}")
        return json.dumps({"employee_id": employee_id, "year": year, **result}, ensure_ascii=False)

# ─── Create the agent ────────────────────────────────
functions = FunctionTool(functions={get_employee_info, get_remaining_vacation})
toolset = ToolSet()
toolset.add(functions)
agent = client.create_agent(
    model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
    name="hr-agent-observed",
    instructions="回答 HR 相关问题的助手。使用工具获取准确数据。",
    toolset=toolset,
)

def run_agent_observed(user_message: str) -> dict:
    with tracer.start_as_current_span("agent.run") as span:
        start_total = time.time()
        span.set_attribute("user.message", user_message[:200])
        thread = client.threads.create()
        client.messages.create(thread_id=thread.id, role="user", content=user_message)
        # Execute the Run (poll manually so each state can be traced)
        run = client.runs.create(thread_id=thread.id, agent_id=agent.id)
        span.set_attribute("run.id", run.id)
        tool_call_count = 0
        while run.status in ["queued", "in_progress", "requires_action"]:
            time.sleep(1)
            run = client.runs.get(thread_id=thread.id, run_id=run.id)
            if run.status == "requires_action":
                tool_outputs = []
                for tool_call in run.required_action.submit_tool_outputs.tool_calls:
                    tool_call_count += 1
                    func_name = tool_call.function.name
                    func_args = json.loads(tool_call.function.arguments)
                    with tracer.start_as_current_span(f"agent.tool_call.{func_name}") as tool_span:
                        tool_span.set_attribute("tool.name", func_name)
                        tool_span.set_attribute("tool.arguments", json.dumps(func_args))
                        if func_name == "get_employee_info":
                            result = get_employee_info(**func_args)
                        elif func_name == "get_remaining_vacation":
                            result = get_remaining_vacation(**func_args)
                        else:
                            result = json.dumps({"error": f"未知函数:{func_name}"})
                        tool_span.set_attribute("tool.result_length", len(result))
                    tool_outputs.append({"tool_call_id": tool_call.id, "output": result})
                run = client.runs.submit_tool_outputs(
                    thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs,
                )
        elapsed_total = time.time() - start_total
        span.set_attribute("run.status", str(run.status))
        span.set_attribute("run.tool_call_count", tool_call_count)
        span.set_attribute("run.total_latency_ms", round(elapsed_total * 1000))
        if run.usage:
            span.set_attribute("llm.prompt_tokens", run.usage.prompt_tokens)
            span.set_attribute("llm.completion_tokens", run.usage.completion_tokens)
            span.set_attribute("llm.total_tokens", run.usage.total_tokens)
            logger.info(
                f"[usage] tokens: prompt={run.usage.prompt_tokens}, "
                f"completion={run.usage.completion_tokens}, "
                f"total={run.usage.total_tokens}"
            )
        response_text = ""
        if run.status == "completed":
            messages = list(client.messages.list(thread_id=thread.id))
            for msg in reversed(messages):
                if msg.role == "assistant":
                    response_text = msg.content[0].text.value if msg.content else ""
                    break
        logger.info(f"[agent] 完成:latency={round(elapsed_total*1000)}ms, tools={tool_call_count}")
        return {
            "response": response_text,
            "run_id": run.id,
            "thread_id": thread.id,
            "status": str(run.status),
            "tool_calls": tool_call_count,
            "latency_ms": round(elapsed_total * 1000),
            "tokens": {
                "prompt": run.usage.prompt_tokens if run.usage else 0,
                "completion": run.usage.completion_tokens if run.usage else 0,
                "total": run.usage.total_tokens if run.usage else 0,
            },
        }

# Cleanup
client.delete_agent(agent.id)
# Force-flush any traces still buffered before exit
tracer_provider.force_flush()

run.usage contains the Run's total token usage once it completes. Recording it as trace attributes lets you answer "which query consumed the most tokens" after the fact.
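Once token counts are in hand, cost is simple arithmetic. A sketch using illustrative GPT-4o list prices ($2.50 per 1M input tokens, $10 per 1M output tokens; check current pricing for your region and model):

```python
# Illustrative list prices in USD per 1M tokens; not authoritative
PRICE_PER_M_PROMPT = 2.50
PRICE_PER_M_COMPLETION = 10.00

def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Rough cost estimate for one Run, from run.usage token counts."""
    return (prompt_tokens / 1_000_000 * PRICE_PER_M_PROMPT
            + completion_tokens / 1_000_000 * PRICE_PER_M_COMPLETION)

# e.g. a run that used 1,200 prompt tokens and 300 completion tokens
print(round(estimate_cost_usd(1200, 300), 6))  # 0.006
```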

Viewing traces in Application Insights

After running the code, data accumulates in Application Insights. Here is how to view it in the Azure Portal:

Transaction search

Azure Portal → your Application Insights resource → Search

Find a trace by the name agent.run and click it to see every execution stage on a timeline. Which stage was slow, and how many tool calls were made, is obvious at a glance.

Analyzing the data with KQL

Use KQL (Kusto Query Language) to analyze the data accumulated in Application Insights.

1. Agent execution summary

// Agent activity over the last 24 hours
dependencies
| where timestamp > ago(24h)
| where name == "agent.run"
| summarize
    total_runs = count(),
    success_runs = countif(success == true),
    avg_latency_ms = avg(duration),
    p95_latency_ms = percentile(duration, 95),
    p99_latency_ms = percentile(duration, 99)
| extend success_rate = round(100.0 * success_runs / total_runs, 2)

2. Token usage trend over time

// When token usage is recorded as custom properties
dependencies
| where timestamp > ago(24h)
| where name == "agent.run"
| extend
    total_tokens = toint(customDimensions["llm.total_tokens"]),
    prompt_tokens = toint(customDimensions["llm.prompt_tokens"]),
    completion_tokens = toint(customDimensions["llm.completion_tokens"])
| summarize
    sum_total = sum(total_tokens),
    sum_prompt = sum(prompt_tokens),
    sum_completion = sum(completion_tokens)
    by bin(timestamp, 1h)
| order by timestamp asc

3. Tool call frequency

// Which tool is called, and how often?
dependencies
| where timestamp > ago(7d)
| where name startswith "agent.tool_call."
| extend tool_name = replace_string(name, "agent.tool_call.", "")
| summarize
    call_count = count(),
    avg_duration_ms = avg(duration),
    p95_duration_ms = percentile(duration, 95),
    error_count = countif(success == false)
    by tool_name
| order by call_count desc

4. Finding slow Runs

// Agent runs that took longer than 3 seconds
dependencies
| where timestamp > ago(24h)
| where name == "agent.run"
| where duration > 3000
| project
    timestamp,
    run_id = tostring(customDimensions["run.id"]),
    latency_ms = duration,
    tool_calls = toint(customDimensions["run.tool_call_count"]),
    total_tokens = toint(customDimensions["llm.total_tokens"]),
    user_message = tostring(customDimensions["user.message"])
| order by latency_ms desc

5. Error analysis

// Details of Runs that failed in the last 24 hours
dependencies
| where timestamp > ago(24h)
| where name == "agent.run"
| where success == false
| project
    timestamp,
    run_id = tostring(customDimensions["run.id"]),
    run_status = tostring(customDimensions["run.status"]),
    error_message = tostring(customDimensions["error.message"])
| order by timestamp desc

Cost tracking dashboard

An Azure Monitor Workbook can visualize cost:

dependencies
| where timestamp > ago(30d)
| where name == "agent.run"
| extend
    prompt_tokens = tolong(customDimensions["llm.prompt_tokens"]),
    completion_tokens = tolong(customDimensions["llm.completion_tokens"])
| summarize
    total_prompt = sum(prompt_tokens),
    total_completion = sum(completion_tokens)
    by bin(timestamp, 1d)
| extend
    // GPT-4o pricing: $2.50/1M input, $10/1M output
    estimated_cost_usd = (total_prompt / 1000000.0 * 2.50) + (total_completion / 1000000.0 * 10.0)
| project
    day = bin(timestamp, 1d),
    total_prompt,
    total_completion,
    estimated_cost_usd = round(estimated_cost_usd, 4)
| order by day asc

Alerting on anomalies

Get notified automatically when the agent's error rate spikes or its latency crosses a threshold.

Alert setup in the Azure Portal

Application Insights → Alerts → + Create → Alert rule

Scenario 1: error-rate alert

  • Signal: custom query (KQL)
  • Condition: error rate > 10% over the last 5 minutes
  • Action: email / Teams notification
// KQL used for alert evaluation
dependencies
| where timestamp > ago(5m)
| where name == "agent.run"
| summarize
    total = count(),
    errors = countif(success == false)
| extend error_rate = 100.0 * errors / total
| where error_rate > 10

Scenario 2: latency alert

  • Condition: p95 latency > 5 seconds
dependencies
| where timestamp > ago(5m)
| where name == "agent.run"
| summarize p95 = percentile(duration, 95)
| where p95 > 5000

Enriching context with structured logging

Combining structured logs with OpenTelemetry traces makes analysis much easier:

import structlog

log = structlog.get_logger()

def run_agent_with_logging(user_message: str, user_id: str) -> dict:
    run_logger = log.bind(
        agent_id=agent.id,
        user_id=user_id,
        message_preview=user_message[:50],
    )
    run_logger.info("agent_run_started")
    try:
        result = run_agent_observed(user_message)
        run_logger.info(
            "agent_run_completed",
            latency_ms=result["latency_ms"],
            tool_calls=result["tool_calls"],
            total_tokens=result["tokens"]["total"],
        )
        return result
    except Exception as e:
        run_logger.error("agent_run_failed", error=str(e))
        raise

Structured logs accumulate in the traces table of Application Insights. Because they are JSON, extracting fields with KQL is straightforward.
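To make that concrete, here is roughly what one such JSON event looks like and how trivially its fields come back out, mirroring what a KQL extract does (field names follow the example above; a sketch, not structlog's exact output):

```python
import json

# Roughly the shape of a structlog JSON event for a completed run
event_line = json.dumps({
    "event": "agent_run_completed",
    "user_id": "user-42",
    "latency_ms": 1834,
    "tool_calls": 2,
    "total_tokens": 1500,
})

# Any consumer (KQL, a log pipeline, a script) gets the fields back directly
record = json.loads(event_line)
print(record["event"], record["latency_ms"])  # agent_run_completed 1834
```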

Building a dashboard with Azure Monitor Workbooks

Once the data is in Application Insights, you can build a dashboard with an Azure Monitor Workbook.

Recommended dashboard panels:

  • Agent run success rate (live)
  • p95 / p99 latency trend
  • Token usage per hour/day
  • Call frequency per tool
  • Error rate trend + alert history

The Workbook template JSON can be deployed with Azure Bicep or an ARM template, so the whole team immediately shares the same dashboard.

Viewing traces locally during development

To inspect traces locally before sending them to Application Insights, use console output:

from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Development: print spans to the console
if os.environ.get("ENVIRONMENT") == "development":
    tracer_provider.add_span_processor(
        SimpleSpanProcessor(ConsoleSpanExporter())
    )

You can also spin up a local tracing backend such as Jaeger or Zipkin with Docker:

# Run Jaeger locally
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  jaegertracing/all-in-one:latest
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export to Jaeger
jaeger_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
tracer_provider.add_span_processor(BatchSpanProcessor(jaeger_exporter))

Then open the Jaeger UI at http://localhost:16686 to view the traces.

Pre-launch checklist

Things to verify before adding observability to a production agent:

Tracing

  • Set settings.tracing_implementation = "opentelemetry"
  • Configure the Application Insights connection string environment variable
  • Call force_flush() to flush traces before the process exits
  • Record the Run ID and Thread ID as trace attributes

Metrics

  • Record token usage (prompt/completion)
  • Record tool call counts
  • Record latency
  • Track the error rate

Alerts

  • Set an error-rate threshold alert
  • Set a p95 latency threshold alert
  • Set a daily token-usage cap alert

Sensitive data

  • Check whether the user.message attribute contains PII
  • Redact or truncate when necessary, e.g. user_message[:50]
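Truncation alone will not catch identifiers that appear early in a message. A minimal masking sketch to apply before setting user.message on a span (the regexes are illustrative, not a complete PII solution):

```python
import re

# Illustrative patterns; real PII detection needs a proper library or service
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-\s]?\d{3,4}[-\s]?\d{4}\b")

def sanitize_for_trace(message: str, max_len: int = 200) -> str:
    """Mask obvious PII patterns, then truncate, before attaching to a span."""
    masked = EMAIL_RE.sub("<email>", message)
    masked = PHONE_RE.sub("<phone>", masked)
    return masked[:max_len]

print(sanitize_for_trace("Contact zhang.wei@example.com or 138-1234-5678"))
# Contact <email> or <phone>
```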

Summary

From putting Azure AI Agent observability into practice, my key takeaways are:

  • Built-in Foundry portal tracing: collects basic metrics with no code changes. The fastest way to get started.
  • Direct OpenTelemetry instrumentation: the single line settings.tracing_implementation = "opentelemetry" instruments the entire Azure SDK. Custom spans extend tracing into your business logic.
  • run.usage: read token usage from a completed Run for cost analysis.
  • KQL: query the Application Insights data to analyze slow Runs, errors, and cost trends.

Building an agent and operating one are two different jobs. Ship an agent to production without observability, and the moment something goes wrong you will have no idea where or how it failed.

