Building AI Agents with Super Retrieval Capabilities Using LlamaIndex, Claude-3.5 Sonnet, and MongoDB

Telemart Cross-Border Remote Shopping · Cross-Border E-Commerce · 2024-07-08

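Imagine an AI assistant that interacts with you seamlessly, dynamically retrieving information and completing tasks according to your needs. With the rise of agentic retrieval-augmented generation (RAG), this vision is becoming a reality.

In this article we explore how to combine three tools, LlamaIndex, Claude-3.5 Sonnet, and MongoDB, to build AI agents with strong retrieval capabilities.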

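Tool Integration

Here is how these tools fit together:

LlamaIndex: a data framework for LLM applications that indexes and retrieves information by meaning rather than by keyword. It acts as the agent's "eyes", locating the most relevant data in a large body of information.

Claude-3.5 Sonnet: the large language model that processes the information retrieved through LlamaIndex and generates responses.

MongoDB: a NoSQL document database that stores and manages the knowledge base behind the agent. Its flexible schema accommodates many data types, which suits complex retrieval tasks. In this walkthrough it also holds the embeddings, through Atlas Vector Search.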

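Benefits of Integration

Combining these tools brings several benefits:

Enhanced information retrieval: vector search, which LlamaIndex runs against MongoDB, lets the agent retrieve the most relevant information even for nuanced queries.

Dynamic task completion: Claude-3.5 Sonnet lets the agent analyze the retrieved data and decide on appropriate actions, so it can act with a degree of independence.

Scalability and flexibility: MongoDB's ability to handle large datasets lets the system grow as information needs increase.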
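To make "retrieval by meaning" concrete, here is a minimal sketch of the idea behind vector search: documents and queries are represented as vectors, and the closest vectors win. The three-dimensional vectors and document names below are made up for illustration; real embeddings come from an embedding model, and a production system would use an index such as Atlas Vector Search rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
docs = {
    "beach_house": [0.9, 0.1, 0.0],
    "city_loft": [0.1, 0.9, 0.1],
    "seaside_cabin": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a seaside query

print(top_k(query, docs, k=2))
```

The two coastal listings rank first even though the query shares no keywords with their names, which is the property the agent relies on.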

Code Implementation

Let's walk through agentic RAG with LlamaIndex, Claude-3.5 Sonnet, and MongoDB.

Step 1: Install the libraries

!pip install --quiet llama-index  # main LlamaIndex library
!pip install --quiet llama-index-vector-stores-mongodb  # MongoDB vector store
!pip install --quiet llama-index-llms-anthropic  # Anthropic LLM provider
!pip install --quiet llama-index-embeddings-openai  # OpenAI embedding provider
!pip install --quiet pymongo pandas datasets  # others

Step 2: Set environment variables

import os

# WARNING: never commit API keys or other secrets to a public repository
os.environ["ANTHROPIC_API_KEY"] = ""
os.environ["HF_TOKEN"] = ""
os.environ["OPENAI_API_KEY"] = ""
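Because the keys above are left blank, it can help to fail early when one is still empty. The following is a small sketch of such a check; the helper name `missing_keys` is hypothetical and not part of any library.

```python
import os

# Variable names taken from the step above.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "HF_TOKEN", "OPENAI_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_keys(os.environ)
if missing:
    print("Set these variables before continuing:", ", ".join(missing))
```

Passing the environment in as an argument keeps the check easy to exercise with a plain dictionary.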

LLM and embedding model configuration

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings

llm = Anthropic(model="claude-3-5-sonnet-20240620")

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    dimensions=256,
    embed_batch_size=10,
    openai_api_key=os.environ["OPENAI_API_KEY"],
)

# Register both models as the LlamaIndex defaults
Settings.embed_model = embed_model
Settings.llm = llm

Step 3: Load and prepare the data

from datasets import load_dataset
import pandas as pd

# https://huggingface.co/datasets/MongoDB/airbnb_embeddings
dataset = load_dataset("MongoDB/airbnb_embeddings", split="train", streaming=True)
dataset = dataset.take(4000)

# Convert the dataset to a pandas DataFrame
dataset_df = pd.DataFrame(dataset)

# The dataset ships with embeddings created with OpenAI, but we will generate new ones
dataset_df = dataset_df.drop(columns=["text_embeddings"])
dataset_df.head(5)

Step 4: Generate embeddings

import json
from llama_index.core import Document
from llama_index.core.schema import MetadataMode

documents_json = dataset_df.to_json(orient="records")
documents_list = json.loads(documents_json)

# Metadata keys hidden from both the LLM and the embedding model
EXCLUDED_KEYS = [
    "_id", "transit", "minimum_nights", "maximum_nights", "cancellation_policy",
    "last_scraped", "calendar_last_scraped", "first_review", "last_review",
    "security_deposit", "cleaning_fee", "guests_included", "host",
    "availability", "reviews", "image_embeddings",
]

llama_documents = []

for document in documents_list:
    # Convert complex objects to JSON strings
    for field in ["amenities", "images", "host", "address", "availability",
                  "review_scores", "reviews", "image_embeddings"]:
        document[field] = json.dumps(document[field])

    # Create a Document object
    llama_document = Document(
        text=document["description"],
        metadata=document,
        excluded_llm_metadata_keys=EXCLUDED_KEYS,
        excluded_embed_metadata_keys=EXCLUDED_KEYS,
        metadata_template="{key}=>{value}",
        text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
    )
    llama_documents.append(llama_document)

# Inspect what each model sees for the first document
print("\nThe LLM sees this: \n", llama_documents[0].get_content(metadata_mode=MetadataMode.LLM))
print("\nThe Embedding model sees this: \n", llama_documents[0].get_content(metadata_mode=MetadataMode.EMBED))
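The excluded-key lists and the two templates decide what text each model actually receives. The sketch below imitates that behavior in plain Python so the effect is easy to see. It is a simplified stand-in, not LlamaIndex's implementation: the function `render`, the sample record, and the newline used to join metadata entries are all assumptions made for illustration.

```python
def render(
    text,
    metadata,
    excluded_keys,
    metadata_template="{key}=>{value}",
    text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
):
    """Format visible metadata with one template, then wrap it and the text with another."""
    visible = [
        metadata_template.format(key=key, value=value)
        for key, value in metadata.items()
        if key not in excluded_keys
    ]
    # Assumption: metadata entries are joined with a newline.
    return text_template.format(metadata_str="\n".join(visible), content=text)


# Hypothetical record, far smaller than a real listing.
listing = {"_id": 1, "name": "Sunny loft", "price": 120}

print(render("A bright loft downtown.", listing, excluded_keys=["_id"]))
```

Excluding bulky or irrelevant fields such as `reviews` or `image_embeddings` keeps them out of both the prompt and the text that gets embedded, which saves tokens and keeps the embedding focused on the listing itself.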

from llama_index.core.node_parser import SentenceSplitter, SemanticSplitterNodeParser
from tqdm import tqdm

# Alternative: split on semantic breakpoints instead of fixed chunk sizes
# semantic_splitter = SemanticSplitterNodeParser(
#     buffer_size=10, breakpoint_percentile_threshold=95, embed_model=embed_model
# )

base_splitter = SentenceSplitter(chunk_size=5000, chunk_overlap=200)
nodes = base_splitter.get_nodes_from_documents(llama_documents)

# Embed each node, showing a progress bar as we go
for node in tqdm(nodes, desc="Embedding Progress", unit="node"):
    node.embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode=MetadataMode.EMBED)
    )

print("Embedding process completed!")
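The loop above sends one request per node. Sending several texts per request is a common way to reduce round trips, so here is a hedged sketch of that pattern. The helper `embed_in_batches` and the stand-in embedder are hypothetical; in the real pipeline the callable could be a batch method of the embedding model (LlamaIndex embedding classes expose `get_text_embedding_batch`, as far as I know, but check the version you have installed).

```python
def embed_in_batches(texts, embed_batch, batch_size=10):
    """Embed texts in fixed-size batches.

    embed_batch is any callable that maps a list of texts to a list of vectors.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embed_batch(batch))
    return vectors


# Stand-in embedder for illustration: each "vector" is just [len(text)].
def fake_embed_batch(batch):
    return [[float(len(text))] for text in batch]


print(embed_in_batches(["a", "bb", "ccc"], fake_embed_batch, batch_size=2))
```

The order of the returned vectors matches the order of the input texts, so they can be assigned back to the nodes by position.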

Step 5: MongoDB setup

import pymongo

os.environ["MONGO_URI"] = ""

def get_mongo_client(mongo_uri):
    """Establish and validate a connection to MongoDB."""
    client = pymongo.MongoClient(mongo_uri, appname="devrel.showcase.python")

    # Validate the connection
    ping_result = client.admin.command("ping")
    if ping_result.get("ok") == 1.0:
        print("Connection to MongoDB successful")
        return client

    print("Connection to MongoDB failed")
    return None

# Read the connection string set above
mongo_uri = os.environ["MONGO_URI"]
mongo_client = get_mongo_client(mongo_uri)

DB_NAME = "airbnb"
COLLECTION_NAME = "listings_reviews"

db = mongo_client.get_database(DB_NAME)
collection = db.get_collection(COLLECTION_NAME)

Step 6: Vector database integration

from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name=DB_NAME,
    collection_name=COLLECTION_NAME,
    index_name="vector_index",
)

# Write the embedded nodes to the collection
vector_store.add(nodes)
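The vector store queries an Atlas Vector Search index called "vector_index", which has to exist on the collection before retrieval returns results; the walkthrough does not show how it is defined. The sketch below builds a plausible index definition as a plain dictionary. It assumes the embeddings are stored under a field named `embedding`, uses 256 dimensions to match the embedding model configured earlier, and picks cosine similarity as an assumed default. The helper name is hypothetical.

```python
import json


def vector_index_definition(path="embedding", dimensions=256, similarity="cosine"):
    """Build an Atlas Vector Search index definition as a plain dict."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


print(json.dumps(vector_index_definition(), indent=2))
```

The printed JSON can be pasted into the Atlas index editor. If `numDimensions` does not match the size of the stored vectors, queries against the index will not work as intended, so it should track the `dimensions` value passed to the embedding model.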

Step 7: Create the retrieval tool and the agent

from llama_index.core import VectorStoreIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import FunctionCallingAgentWorker

index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine(similarity_top_k=5, llm=llm)

# Expose the query engine to the agent as a tool
query_engine_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="knowledge_base",
        description=(
            "Provides information about Airbnb listings and reviews. "
            "Use a detailed plain text question as input to the tool."
        ),
    ),
)

agent_worker = FunctionCallingAgentWorker.from_tools(
    [query_engine_tool], llm=llm, verbose=True
)
agent = agent_worker.as_agent()

response = agent.chat("Tell me the best listing for a place in New York")
print(str(response))
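Conceptually, the agent decides which tool to call, calls it, and turns the result into an answer. The toy sketch below shows one such round with stand-ins for every real component. It is not how LlamaIndex implements its agent: the `Tool` class, `run_agent`, and both stub functions are hypothetical, the tool choice is hard-coded where a real agent would ask the LLM, and the tool returns a canned string where the real one would query the index.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def run_agent(
    question: str,
    tools: Dict[str, Tool],
    choose_tool: Callable[[str, Dict[str, Tool]], str],
) -> str:
    """One round of tool use: pick a tool, call it, wrap the result in an answer."""
    tool_name = choose_tool(question, tools)
    observation = tools[tool_name].run(question)
    return f"[{tool_name}] {observation}"


# Stand-in for the query engine tool.
def fake_knowledge_base(query: str) -> str:
    return f"3 listings matched '{query}'"


# Stand-in for the LLM's tool selection: always pick the first tool.
def always_first(question: str, tools: Dict[str, Tool]) -> str:
    return next(iter(tools))


tools = {
    "knowledge_base": Tool(
        "knowledge_base", "Airbnb listings and reviews", fake_knowledge_base
    )
}

print(run_agent("lofts in New York", tools, always_first))
```

The tool description matters in the real setup because the LLM reads it when deciding whether the tool fits the question.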

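Conclusion

Agentic RAG built on LlamaIndex, Claude-3.5 Sonnet, and MongoDB opens up many possibilities: LlamaIndex handles indexing and retrieval, Claude-3.5 Sonnet handles reasoning and generation, and MongoDB stores the data and its embeddings.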

