langchain-ChatGLM is an LLM chat library built on a local knowledge base. It uses text2vec-large-Chinese as the embedding model and ChatGLM-6B as the chat model. Original project: https://github.com/chatchat-space/langchain-ChatGLM

For how to deploy the ChatGLM model locally, see my earlier article: http://t.csdn.cn/16STJ

In this project, I wrote client code for calling the langchain-ChatGLM API. Testing showed that while the client can call the server's API normally, the server fails to execute knowledge-base deletion commands correctly.
1. The langchain-ChatGLM API server program

The following code is the api.py file from the langchain-ChatGLM project:
import argparse
import json
import os
import shutil
from typing import List, Optional
import urllib

import nltk
import pydantic
import uvicorn
from fastapi import Body, FastAPI, File, Form, Query, UploadFile, WebSocket
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing_extensions import Annotated
from starlette.responses import RedirectResponse

from chains.local_doc_qa import LocalDocQA
from configs.model_config import (KB_ROOT_PATH, EMBEDDING_DEVICE,
                                  EMBEDDING_MODEL, NLTK_DATA_PATH,
                                  VECTOR_SEARCH_TOP_K, LLM_HISTORY_LEN, OPEN_CROSS_DOMAIN)
import models.shared as shared
from models.loader.args import parser
from models.loader import LoaderCheckPoint

nltk.data.path = [NLTK_DATA_PATH] + nltk.data.path


class BaseResponse(BaseModel):
    code: int = pydantic.Field(200, description="HTTP status code")
    msg: str = pydantic.Field("success", description="HTTP status message")

    class Config:
        schema_extra = {
            "example": {
                "code": 200,
                "msg": "success",
            }
        }


class ListDocsResponse(BaseResponse):
    data: List[str] = pydantic.Field(..., description="List of document names")

    class Config:
        schema_extra = {
            "example": {
                "code": 200,
                "msg": "success",
                "data": ["doc1.docx", "doc2.pdf", "doc3.txt"],
            }
        }


class ChatMessage(BaseModel):
    question: str = pydantic.Field(..., description="Question text")
    response: str = pydantic.Field(..., description="Response text")
    history: List[List[str]] = pydantic.Field(..., description="History text")
    source_documents: List[str] = pydantic.Field(
        ..., description="List of source documents and their scores")

    class Config:
        schema_extra = {
            "example": {
                "question": "工伤保险如何办理？",
                "response": "根据已知信息可以总结如下：\n\n1. 参保单位为员工缴纳工伤保险费，以保障员工在发生工伤时能够获得相应的待遇。\n2. 不同地区的工伤保险缴费规定可能有所不同，需要向当地社保部门咨询以了解具体的缴费标准和规定。\n3. 工伤从业人员及其近亲属需要申请工伤认定，确认享受的待遇资格，并按时缴纳工伤保险费。\n4. 工伤保险待遇包括工伤医疗、康复、辅助器具配置费用、伤残待遇、工亡待遇、一次性工亡补助金等。\n5. 工伤保险待遇领取资格认证包括长期待遇领取人员认证和一次性待遇领取人员认证。\n6. 工伤保险基金支付的待遇项目包括工伤医疗待遇、康复待遇、辅助器具配置费用、一次性工亡补助金、丧葬补助金等。",
                "history": [
                    [
                        "工伤保险是什么？",
                        "工伤保险是指用人单位按照国家规定，为本单位的职工和用人单位的其他人员，缴纳工伤保险费，由保险机构按照国家规定的标准，给予工伤保险待遇的社会保险制度。",
                    ]
                ],
                "source_documents": [
                    "出处 [1] 广州市单位从业的特定人员参加工伤保险办事指引.docx：\n\n\t( 一)  从业单位 (组织) 按“自愿参保”原则，为未建 立劳动关系的特定从业人员单项参加工伤保险 、缴纳工伤保 险费。",
                    "出处 [2] ...",
                    "出处 [3] ...",
                ],
            }
        }


def get_folder_path(local_doc_id: str):
    return os.path.join(KB_ROOT_PATH, local_doc_id, "content")


def get_vs_path(local_doc_id: str):
    return os.path.join(KB_ROOT_PATH, local_doc_id, "vector_store")


def get_file_path(local_doc_id: str, doc_name: str):
    return os.path.join(KB_ROOT_PATH, local_doc_id, "content", doc_name)


async def upload_file(
        file: UploadFile = File(description="A single binary file"),
        knowledge_base_id: str = Form(..., description="Knowledge Base Name", example="kb1"),
):
    saved_path = get_folder_path(knowledge_base_id)
    if not os.path.exists(saved_path):
        os.makedirs(saved_path)

    file_content = await file.read()  # read the content of the uploaded file

    file_path = os.path.join(saved_path, file.filename)
    if os.path.exists(file_path) and os.path.getsize(file_path) == len(file_content):
        file_status = f"文件 {file.filename} 已存在。"
        return BaseResponse(code=200, msg=file_status)

    with open(file_path, "wb") as f:
        f.write(file_content)

    vs_path = get_vs_path(knowledge_base_id)
    vs_path, loaded_files = local_doc_qa.init_knowledge_vector_store([file_path], vs_path)
    if len(loaded_files) > 0:
        file_status = f"文件 {file.filename} 已上传至新的知识库，并已加载知识库，请开始提问。"
        return BaseResponse(code=200, msg=file_status)
    else:
        file_status = "文件上传失败，请重新上传"
        return BaseResponse(code=500, msg=file_status)


async def upload_files(
        files: Annotated[List[UploadFile], File(description="Multiple files as UploadFile")],
        knowledge_base_id: str = Form(..., description="Knowledge Base Name", example="kb1"),
):
    saved_path = get_folder_path(knowledge_base_id)
    if not os.path.exists(saved_path):
        os.makedirs(saved_path)
    filelist = []
    for file in files:
        file_content = ''
        file_path = os.path.join(saved_path, file.filename)
        file_content = file.file.read()
        if os.path.exists(file_path) and os.path.getsize(file_path) == len(file_content):
            continue
        with open(file_path, "ab+") as f:
            f.write(file_content)
        filelist.append(file_path)
    if filelist:
        vs_path, loaded_files = local_doc_qa.init_knowledge_vector_store(filelist, get_vs_path(knowledge_base_id))
        if len(loaded_files):
            file_status = f"documents {', '.join([os.path.split(i)[-1] for i in loaded_files])} upload success"
            return BaseResponse(code=200, msg=file_status)
    file_status = f"documents {', '.join([os.path.split(i)[-1] for i in loaded_files])} upload fail"
    return BaseResponse(code=500, msg=file_status)


async def list_kbs():
    # Get List of Knowledge Base
    if not os.path.exists(KB_ROOT_PATH):
        all_doc_ids = []
    else:
        all_doc_ids = [
            folder
            for folder in os.listdir(KB_ROOT_PATH)
            if os.path.isdir(os.path.join(KB_ROOT_PATH, folder))
               and os.path.exists(os.path.join(KB_ROOT_PATH, folder, "vector_store", "index.faiss"))
        ]

    return ListDocsResponse(data=all_doc_ids)


async def list_docs(
        knowledge_base_id: Optional[str] = Query(default=None, description="Knowledge Base Name", example="kb1")
):
    local_doc_folder = get_folder_path(knowledge_base_id)
    if not os.path.exists(local_doc_folder):
        return {"code": 1, "msg": f"Knowledge base {knowledge_base_id} not found"}
    all_doc_names = [
        doc
        for doc in os.listdir(local_doc_folder)
        if os.path.isfile(os.path.join(local_doc_folder, doc))
    ]
    return ListDocsResponse(data=all_doc_names)


async def delete_kb(
        knowledge_base_id: str = Query(...,
                                       description="Knowledge Base Name",
                                       example="kb1"),
):
    # TODO: confirm whether batch deletion of knowledge bases is supported
    knowledge_base_id = urllib.parse.unquote(knowledge_base_id)
    if not os.path.exists(get_folder_path(knowledge_base_id)):
        return {"code": 1, "msg": f"Knowledge base {knowledge_base_id} not found"}
    shutil.rmtree(get_folder_path(knowledge_base_id))
    # self-added code
    # shutil.rmtree(get_vs_path(knowledge_base_id))
    # /self-added code
    return BaseResponse(code=200, msg=f"Knowledge Base {knowledge_base_id} delete success")


async def delete_doc(
        knowledge_base_id: str = Query(...,
                                       description="Knowledge Base Name",
                                       example="kb1"),
        doc_name: str = Query(None, description="doc name", example="doc_name_1.pdf"),
):
    knowledge_base_id = urllib.parse.unquote(knowledge_base_id)
    if not os.path.exists(get_folder_path(knowledge_base_id)):
        return {"code": 1, "msg": f"Knowledge base {knowledge_base_id} not found"}
    doc_path = get_file_path(knowledge_base_id, doc_name)
    if os.path.exists(doc_path):
        os.remove(doc_path)
        remain_docs = await list_docs(knowledge_base_id)
        if len(remain_docs.data) == 0:
            shutil.rmtree(get_folder_path(knowledge_base_id), ignore_errors=True)
            return BaseResponse(code=200, msg=f"document {doc_name} delete success along with the whole knowledge base")
        else:
            status = local_doc_qa.delete_file_from_vector_store(doc_path, get_vs_path(knowledge_base_id))
            if "success" in status:
                return BaseResponse(code=200, msg=f"document {doc_name} delete success")
            else:
                return BaseResponse(code=1, msg=f"document {doc_name} delete fail")
    else:
        return BaseResponse(code=1, msg=f"document {doc_name} not found")


async def update_doc(
        knowledge_base_id: str = Query(...,
                                       description="知识库名",
                                       example="kb1"),
        old_doc: str = Query(None, description="待删除文件名，已存储在知识库中", example="doc_name_1.pdf"),
        new_doc: UploadFile = File(description="待上传文件"),
):
    knowledge_base_id = urllib.parse.unquote(knowledge_base_id)
    if not os.path.exists(get_folder_path(knowledge_base_id)):
        return {"code": 1, "msg": f"Knowledge base {knowledge_base_id} not found"}
    doc_path = get_file_path(knowledge_base_id, old_doc)
    if not os.path.exists(doc_path):
        return BaseResponse(code=1, msg=f"document {old_doc} not found")
    else:
        os.remove(doc_path)
        delete_status = local_doc_qa.delete_file_from_vector_store(doc_path, get_vs_path(knowledge_base_id))
        if "fail" in delete_status:
            return BaseResponse(code=1, msg=f"document {old_doc} delete failed")
        else:
            saved_path = get_folder_path(knowledge_base_id)
            if not os.path.exists(saved_path):
                os.makedirs(saved_path)

            file_content = await new_doc.read()  # read the content of the uploaded file

            file_path = os.path.join(saved_path, new_doc.filename)
            if os.path.exists(file_path) and os.path.getsize(file_path) == len(file_content):
                file_status = f"document {new_doc.filename} already exists"
                return BaseResponse(code=200, msg=file_status)

            with open(file_path, "wb") as f:
                f.write(file_content)

            vs_path = get_vs_path(knowledge_base_id)
            vs_path, loaded_files = local_doc_qa.init_knowledge_vector_store([file_path], vs_path)
            if len(loaded_files) > 0:
                file_status = f"document {old_doc} delete and document {new_doc.filename} upload success"
                return BaseResponse(code=200, msg=file_status)
            else:
                file_status = f"document {old_doc} success but document {new_doc.filename} upload fail"
                return BaseResponse(code=500, msg=file_status)


async def local_doc_chat(
        knowledge_base_id: str = Body(..., description="Knowledge Base Name", example="kb1"),
        question: str = Body(..., description="Question", example="工伤保险是什么？"),
        history: List[List[str]] = Body(
            [],
            description="History of previous questions and answers",
            example=[
                [
                    "工伤保险是什么？",
                    "工伤保险是指用人单位按照国家规定，为本单位的职工和用人单位的其他人员，缴纳工伤保险费，由保险机构按照国家规定的标准，给予工伤保险待遇的社会保险制度。",
                ]
            ],
        ),
):
    vs_path = get_vs_path(knowledge_base_id)
    if not os.path.exists(vs_path):
        # return BaseResponse(code=1, msg=f"Knowledge base {knowledge_base_id} not found")
        return ChatMessage(
            question=question,
            response=f"Knowledge base {knowledge_base_id} not found",
            history=history,
            source_documents=[],
        )
    else:
        for resp, history in local_doc_qa.get_knowledge_based_answer(
                query=question, vs_path=vs_path, chat_history=history, streaming=True
        ):
            pass
        source_documents = [
            f"""出处 [{inum + 1}] {os.path.split(doc.metadata['source'])[-1]}：\n\n{doc.page_content}\n\n"""
            f"""相关度：{doc.metadata['score']}\n\n"""
            for inum, doc in enumerate(resp["source_documents"])
        ]

        return ChatMessage(
            question=question,
            response=resp["result"],
            history=history,
            source_documents=source_documents,
        )


async def bing_search_chat(
        question: str = Body(..., description="Question", example="工伤保险是什么？"),
        history: Optional[List[List[str]]] = Body(
            [],
            description="History of previous questions and answers",
            example=[
                [
                    "工伤保险是什么？",
                    "工伤保险是指用人单位按照国家规定，为本单位的职工和用人单位的其他人员，缴纳工伤保险费，由保险机构按照国家规定的标准，给予工伤保险待遇的社会保险制度。",
                ]
            ],
        ),
):
    for resp, history in local_doc_qa.get_search_result_based_answer(
            query=question, chat_history=history, streaming=True
    ):
        pass
    source_documents = [
        f"""出处 [{inum + 1}] [{doc.metadata['source']}]({doc.metadata['source']}) \n\n{doc.page_content}\n\n"""
        for inum, doc in enumerate(resp["source_documents"])
    ]

    return ChatMessage(
        question=question,
        response=resp["result"],
        history=history,
        source_documents=source_documents,
    )


async def chat(
        question: str = Body(..., description="Question", example="工伤保险是什么？"),
        history: List[List[str]] = Body(
            [],
            description="History of previous questions and answers",
            example=[
                [
                    "工伤保险是什么？",
                    "工伤保险是指用人单位按照国家规定，为本单位的职工和用人单位的其他人员，缴纳工伤保险费，由保险机构按照国家规定的标准，给予工伤保险待遇的社会保险制度。",
                ]
            ],
        ),
):
    for answer_result in local_doc_qa.llm.generatorAnswer(prompt=question, history=history,
                                                          streaming=True):
        resp = answer_result.llm_output["answer"]
        history = answer_result.history
        pass

    return ChatMessage(
        question=question,
        response=resp,
        history=history,
        source_documents=[],
    )


async def stream_chat(websocket: WebSocket, knowledge_base_id: str):
    await websocket.accept()
    turn = 1
    while True:
        input_json = await websocket.receive_json()
        question, history, knowledge_base_id = input_json["question"], input_json["history"], input_json["knowledge_base_id"]
        vs_path = get_vs_path(knowledge_base_id)

        if not os.path.exists(vs_path):
            await websocket.send_json({"error": f"Knowledge base {knowledge_base_id} not found"})
            await websocket.close()
            return

        await websocket.send_json({"question": question, "turn": turn, "flag": "start"})

        last_print_len = 0
        for resp, history in local_doc_qa.get_knowledge_based_answer(
                query=question, vs_path=vs_path, chat_history=history, streaming=True
        ):
            await websocket.send_text(resp["result"][last_print_len:])
            last_print_len = len(resp["result"])

        source_documents = [
            f"""出处 [{inum + 1}] {os.path.split(doc.metadata['source'])[-1]}：\n\n{doc.page_content}\n\n"""
            f"""相关度：{doc.metadata['score']}\n\n"""
            for inum, doc in enumerate(resp["source_documents"])
        ]

        await websocket.send_text(
            json.dumps(
                {
                    "question": question,
                    "turn": turn,
                    "flag": "end",
                    "sources_documents": source_documents,
                },
                ensure_ascii=False,
            )
        )
        turn += 1


async def document():
    return RedirectResponse(url="/docs")


def api_start(host, port):
    global app
    global local_doc_qa
    llm_model_ins = shared.loaderLLM()
    llm_model_ins.set_history_len(LLM_HISTORY_LEN)

    app = FastAPI()
    # Add CORS middleware to allow all origins
    # set OPEN_DOMAIN=True in config.py to allow cross-domain
    if OPEN_CROSS_DOMAIN:
        app.add_middleware(
            CORSMiddleware,
            allow_origins=["*"],
            allow_credentials=True,
            allow_methods=["*"],
            allow_headers=["*"],
        )
    app.websocket("/local_doc_qa/stream-chat/{knowledge_base_id}")(stream_chat)

    app.get("/", response_model=BaseResponse)(document)

    app.post("/chat", response_model=ChatMessage)(chat)

    app.post("/local_doc_qa/upload_file", response_model=BaseResponse)(upload_file)
    app.post("/local_doc_qa/upload_files", response_model=BaseResponse)(upload_files)
    app.post("/local_doc_qa/local_doc_chat", response_model=ChatMessage)(local_doc_chat)
    app.post("/local_doc_qa/bing_search_chat", response_model=ChatMessage)(bing_search_chat)
    app.get("/local_doc_qa/list_knowledge_base", response_model=ListDocsResponse)(list_kbs)
    app.get("/local_doc_qa/list_files", response_model=ListDocsResponse)(list_docs)
    app.delete("/local_doc_qa/delete_knowledge_base", response_model=BaseResponse)(delete_kb)
    app.delete("/local_doc_qa/delete_file", response_model=BaseResponse)(delete_doc)
    app.post("/local_doc_qa/update_file", response_model=BaseResponse)(update_doc)

    local_doc_qa = LocalDocQA()
    local_doc_qa.init_cfg(
        llm_model=llm_model_ins,
        embedding_model=EMBEDDING_MODEL,
        embedding_device=EMBEDDING_DEVICE,
        top_k=VECTOR_SEARCH_TOP_K,
    )
    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=7861)
    # parse startup arguments
    args = None
    args = parser.parse_args()
    args_dict = vars(args)
    shared.loaderCheckPoint = LoaderCheckPoint(args_dict)
    api_start(args.host, args.port)
Looking at the api_start method, the server exposes the following APIs:

1. upload_file: upload a single file
2. upload_files: upload multiple files
3. local_doc_chat: chat based on a local knowledge base
4. bing_search_chat: chat based on Bing search
5. list_knowledge_base: list all knowledge bases
6. list_files: list all files in a knowledge base
7. delete_knowledge_base: delete a knowledge base
8. delete_file: delete a single file from a knowledge base
9. update_file: replace a file in a knowledge base with another file
In addition, the top of the file defines the three response types the server can return: BaseResponse, ListDocsResponse, and ChatMessage. In api_start you can find the response type corresponding to each API. All responses are JSON.
Based on the above, I wrote client-side functions for every API except bing_search_chat, using Python's requests library to send HTTP requests and read the responses. The full code is below:
import requests

API_BASE_URL = "http://localhost:7861"  # the server's url
API_KB_URL = API_BASE_URL + "/local_doc_qa"  # the url for knowledge base answer


# upload local file for knowledge base
def upload_file(knowledge_base_id, file_path):
    with open(file_path, "rb") as file:
        files = {"file": file}
        data = {"knowledge_base_id": knowledge_base_id}
        try:
            response = requests.post(API_KB_URL + "/upload_file", data=data, files=files)
            response.raise_for_status()
            # print(f"File {file_path} uploaded successfully to knowledge base {knowledge_base_id}.")
            print_msg(response)
        except requests.exceptions.RequestException as e:
            print(f"Failed to upload {file_path} to knowledge base: {e}")


# upload multiple files for knowledge base
def upload_files(knowledge_base_id, file_paths):
    files = [("files", open(file_path, "rb")) for file_path in file_paths]
    data = {"knowledge_base_id": knowledge_base_id}
    try:
        response = requests.post(API_KB_URL + "/upload_files", data=data, files=files)
        response.raise_for_status()
        # print(f"Files {file_paths} uploaded successfully to knowledge base {knowledge_base_id}.")
        print_msg(response)
    except requests.exceptions.RequestException as e:
        print(f"Failed to upload {file_paths} to knowledge base: {e}")


# replace an existing file with another one
def update_file(knowledge_base_id, old_file, new_file):
    files = {"new_doc": open(new_file, "rb")}
    params = {"knowledge_base_id": knowledge_base_id, "old_doc": old_file}
    try:
        response = requests.post(API_KB_URL + "/update_file", params=params, files=files)
        response.raise_for_status()
        # print(f"Replace {old_file} with {new_file} in knowledge base {knowledge_base_id}")
        print_msg(response)
    except requests.exceptions.RequestException as e:
        print(f"Failed to update file {new_file} in knowledge base: {e}")


# chat with chatglm
def chat_with_llm(question, knowledge_base_id=None, history=None):
    # use chat with knowledge base (if knowledge base is available), or chat with LLM
    url = API_KB_URL + "/local_doc_chat" if knowledge_base_id else API_BASE_URL + "/chat"
    data = {
        "question": question,
        "history": history or [],
    }
    if knowledge_base_id:
        data["knowledge_base_id"] = knowledge_base_id
    try:
        # send request to LLM
        response = requests.post(url, json=data)
        response.raise_for_status()
        chat_response = response.json()
        print("LLM Response:", chat_response.get("response"))
        print("Reference:", chat_response.get("source_documents"))
        return chat_response
    except requests.exceptions.RequestException as e:
        print(f"Error while chatting with LLM: {e}")
        return None


# list knowledge bases
def list_kbs():
    try:
        response = requests.get(API_KB_URL + "/list_knowledge_base")
        response.raise_for_status()
        kbs = response.json()
        print("List of Knowledge Bases:")
        for kb in kbs.get("data"):
            print(kb)
        return kbs
    except requests.exceptions.RequestException as e:
        print(f"Error while listing knowledge bases: {e}")
        return None


# list documents in a knowledge base
def list_files(knowledge_base_id):
    try:
        response = requests.get(API_KB_URL + "/list_files", params={"knowledge_base_id": knowledge_base_id})
        response.raise_for_status()
        docs = response.json()
        print(f"List of Documents in Knowledge Base {knowledge_base_id}:")
        for doc in docs.get("data"):
            print(doc)
        return docs
    except requests.exceptions.RequestException as e:
        print(f"Error while listing documents: {e}")
        return None


# delete a knowledge base
def delete_knowledge_base(knowledge_base_id):
    param = {"knowledge_base_id": knowledge_base_id}
    try:
        response = requests.delete(API_KB_URL + "/delete_knowledge_base", params=param)
        response.raise_for_status()
        print_msg(response)
    except requests.exceptions.RequestException as e:
        print(f"Error while deleting knowledge base: {e}")


# delete a single file from a selected knowledge base
def delete_file(knowledge_base_id, file_path):
    params = {"knowledge_base_id": knowledge_base_id, "doc_name": file_path}
    try:
        response = requests.delete(API_KB_URL + "/delete_file", params=params)
        response.raise_for_status()
        print_msg(response)
    except requests.exceptions.RequestException as e:
        print(f"Failed to delete file: {e}")


# print the status information returned by the server (for base and listdoc response only)
def print_msg(response):
    print(response.json().get("msg"))
Program notes:

1.

def print_msg(response):
    print(response.json().get("msg"))

This method prints the msg part of the server's reply message.
2.

API_BASE_URL = "http://localhost:7861"  # the server's url
API_KB_URL = API_BASE_URL + "/local_doc_qa"  # the url for knowledge base answer

The local server's default URL is http://localhost:7861, which you can find in the server's configuration file. All knowledge-base-related APIs use endpoints of the form /local_doc_qa/<API name>; when chatting with the model directly, without a knowledge base, the endpoint is simply /chat.
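As a quick sanity check, assuming the default host and port, the full URLs the client requests resolve like this:

```python
API_BASE_URL = "http://localhost:7861"       # default server url
API_KB_URL = API_BASE_URL + "/local_doc_qa"  # prefix for knowledge-base endpoints

# full endpoint urls the client actually requests
upload_url = API_KB_URL + "/upload_file"
chat_url = API_BASE_URL + "/chat"

print(upload_url)  # http://localhost:7861/local_doc_qa/upload_file
print(chat_url)    # http://localhost:7861/chat
```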
3. The body of the program implements a client method for each API. They all work the same way: pack the arguments into JSON, pass that JSON in the request to the server, then wait for the server's reply. Note the HTTP verbs: upload_file, upload_files, local_doc_chat, chat, and update_file use POST; list_knowledge_base and list_files use GET; delete_knowledge_base and delete_file use DELETE.
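The verb-to-endpoint mapping can be summarized in a small table (a sketch derived from the routes registered in api_start; the API_ROUTES dict itself is my own helper, not part of the project):

```python
# HTTP method and endpoint for each client function,
# as registered in api_start() of api.py
API_ROUTES = {
    "upload_file":           ("POST",   "/local_doc_qa/upload_file"),
    "upload_files":          ("POST",   "/local_doc_qa/upload_files"),
    "local_doc_chat":        ("POST",   "/local_doc_qa/local_doc_chat"),
    "bing_search_chat":      ("POST",   "/local_doc_qa/bing_search_chat"),
    "chat":                  ("POST",   "/chat"),
    "update_file":           ("POST",   "/local_doc_qa/update_file"),
    "list_knowledge_base":   ("GET",    "/local_doc_qa/list_knowledge_base"),
    "list_files":            ("GET",    "/local_doc_qa/list_files"),
    "delete_knowledge_base": ("DELETE", "/local_doc_qa/delete_knowledge_base"),
    "delete_file":           ("DELETE", "/local_doc_qa/delete_file"),
}

method, path = API_ROUTES["delete_file"]
print(method, path)  # DELETE /local_doc_qa/delete_file
```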
After finishing the program, I tested the implementation of each command.
1. Test data

To check whether the model can answer with reference to knowledge-base content, I took three passages about growing tomatoes and asked a factual question that requires combining content from all three.
The passages:

Text 1: How to grow tomatoes
Tomatoes are a popular fruit in home gardens (yes, they are fruit), loved for their versatility and flavor. To grow tomatoes successfully, follow these steps:
- Pick the right spot: tomatoes need at least 6-8 hours of direct sunlight a day. Choose a spot in the garden that gets full sun.
- Prepare the soil: tomatoes like well-drained soil rich in organic matter. Work compost or well-rotted manure into the soil before planting to improve its fertility.
- Plant: set out tomato seedlings after the last frost date in your area. Dig a hole slightly deeper than the seedling's root ball, place the seedling in, backfill with soil, and press down gently.
- Water: water newly planted seedlings immediately. Once the seedlings are established, water regularly, about 1-1.5 inches per week.
- Support: tomatoes are vining plants and need support to grow upward. Use stakes or cages to keep the plants from sprawling on the ground.
- Prune: regularly pinch out the small shoots that grow between the main stem and side branches, so the plant puts its energy into fruiting.
- Fertilize: apply a balanced fertilizer a few weeks after planting and again when the first fruit appears.

Text 2: Common tomato pests and diseases
Growing tomatoes is rewarding, but it also comes with pest and disease challenges. Some common problems you may run into:
- Aphids: tiny insects that suck plant sap, stunting growth and deforming the leaves.
- Blight: early blight and late blight can both affect tomatoes, causing dark spots on leaves and fruit and eventually killing the plant.
- Whiteflies: small flying insects that feed on plant sap and excrete sticky honeydew, which attracts mold and yellows the leaves.
- Fusarium wilt: a soil-borne fungus that causes the plant's lower leaves to wilt and yellow, eventually killing the plant.
- Hornworms: large green caterpillars that can quickly defoliate a tomato plant if left unchecked.

Text 3: Tomato varieties worth trying
Tomatoes come in many shapes, sizes, and colors, each with its own flavor and uses. Consider growing these popular varieties:
- Roma: meaty texture and few seeds, ideal for sauces and canning.
- Cherry: small and sweet, great for snacking, salads, and garnishes.
- Beefsteak: large and juicy, perfect for sandwiches and slicing.
- Heirloom: old varieties with distinctive flavors and looks, often passed down through generations.
- Grape: oval and sweet, great for salads and roasting.
- Green Zebra: a green-striped tomato with a slightly tart flavor, great for salads and salsa.

Question: How do you keep tomato plants from sprawling on the ground? What are the common symptoms of Fusarium wilt? And which tomato variety is best suited for sauces and canning?

2. Running the test

First, open a terminal and run api.py from the langchain-ChatGLM project to start the server:

python api.py

Purely for testing, I put the API calls directly in the client program's main method and ran the client. The replies printed by the program then appear in the client terminal.
upload_file (single-file upload), upload_files (multi-file upload), list_knowledge_base, list_files, and chat_with_llm all worked. The model could also answer with reference to the knowledge-base content and cite the corresponding sources.
However, all of the deletion-related methods (delete_knowledge_base, delete_file, update_file) exhibited the following problems.
1) Sometimes, even after a file has been deleted, the model still cites the deleted file. I first suspected this was caused by the chat history, but the problem persisted after restarting the model.

In the knowledge base's local storage, the deleted file does disappear from the content folder (the folder where the knowledge base stores its text files), but the file's vectors are apparently not removed from the vector store. Judging from api.py, the method responsible for removing a file from the vector store is local_doc_qa.delete_file_from_vector_store, but I have not yet dug into the internals of local_doc_qa.

I also tried deleting the vector_store folder under the knowledge base at the same time as the file, via shutil.rmtree(get_vs_path(knowledge_base_id)). But doing so leaves the LLM's knowledge-base citations completely empty: the vector store is a single file that holds the vectors for every file in the knowledge base, so a single file's content cannot be removed from it this way.
2) Asking a question after deleting a file produces the following error:

INFO:     127.0.0.1:56826 - "POST /local_doc_qa/local_doc_chat HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/pai/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 429, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/pai/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/pai/lib/python3.9/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/home/pai/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/pai/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/pai/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/pai/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/pai/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/pai/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/pai/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/pai/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/pai/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/pai/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/pai/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/pai/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/mnt/workspace/langchain-ChatGLM/api.py", line 293, in local_doc_chat
    for resp, history in local_doc_qa.get_knowledge_based_answer(
  File "/mnt/workspace/langchain-ChatGLM/chains/local_doc_qa.py", line 231, in get_knowledge_based_answer
    related_docs_with_score = vector_store.similarity_search_with_score(query, k=self.top_k)
  File "/home/pai/lib/python3.9/site-packages/langchain/vectorstores/faiss.py", line 221, in similarity_search_with_score
    docs = self.similarity_search_with_score_by_vector(embedding, k)
  File "/mnt/workspace/langchain-ChatGLM/vectorstores/MyFAISS.py", line 86, in similarity_search_with_score_by_vector
    _id0 = self.index_to_docstore_id[l]
KeyError: 23
INFO:     127.0.0.1:57038 - "GET /local_doc_qa/list_knowledge_base HTTP/1.1" 200 OK
INFO:     127.0.0.1:57042 - "GET /local_doc_qa/list_files?knowledge_base_id=pizza HTTP/1.1" 200 OK
INFO:     127.0.0.1:57046 - "POST /local_doc_qa/local_doc_chat HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
(the same traceback, ending in KeyError: 23, repeats here)

This error message suggests that the text vectors for the files are tracked in a dictionary. After a file is deleted, the value for some key goes missing, which triggers the error. This basically confirms that the file-deletion part of the official langchain-ChatGLM api.py is buggy. I tried modifying api.py to fix it, but have not yet succeeded.
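My reading of the traceback can be reproduced with a toy model of the bookkeeping (this is NOT the project's real code; the dict below only mimics the role of index_to_docstore_id in MyFAISS): if a deleted file's entries are dropped from the mapping while search still returns the old integer positions, the lookup raises exactly this kind of KeyError.

```python
# Toy reproduction of the failure mode, not the project's actual code:
# FAISS search returns integer positions, which are mapped to docstore
# ids through a dict. If a deleted file's keys are removed from the dict
# but the index still yields those positions, the lookup fails.
index_to_docstore_id = {i: f"doc-{i}" for i in range(30)}

# pretend a deleted file owned vectors 20..24 and its keys were removed
for i in range(20, 25):
    del index_to_docstore_id[i]

search_hits = [3, 23, 7]  # positions a stale index might still return
try:
    ids = [index_to_docstore_id[pos] for pos in search_hits]
except KeyError as e:
    print(f"KeyError: {e}")  # prints "KeyError: 23", same shape as the server error
```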
3) The LLM only draws on one file in the knowledge base. Before the tomato example, I used a pizza example: I uploaded three pizza recipes and asked the model "how do I make pizza". The model only ever answered based on the first recipe, even though the cited passages included content from the other two. The likely cause turned out to be formatting: the first recipe had blank lines between every item, while the other two ran the text together. The embedding model's text chunking may have been affected by this difference. So when adding files to a knowledge base, try to keep their formatting consistent.
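A rough way to see why formatting can matter: a splitter that prefers blank-line boundaries produces very different chunks for a recipe written one step per paragraph versus the same text run together. This is a simplified illustration of the idea, not the project's actual text splitter:

```python
# Simplified illustration (not the project's real splitter):
# split on blank lines, dropping empty fragments.
def naive_split(text):
    return [p.strip() for p in text.split("\n\n") if p.strip()]

recipe_spaced = "Step 1: make dough.\n\nStep 2: add sauce.\n\nStep 3: bake."
recipe_dense = "Step 1: make dough. Step 2: add sauce. Step 3: bake."

print(len(naive_split(recipe_spaced)))  # 3 chunks, one per step
print(len(naive_split(recipe_dense)))   # 1 chunk containing everything
```

With the spaced version, each step becomes its own chunk (and its own vector); with the dense version, everything lands in one chunk, so retrieval behaves differently across the two files.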
That is my record of developing a langchain-ChatGLM client. If anyone finds a solution to the open problems mentioned above, please let me know.