Summary
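This article briefly introduces the large language model WizardLM, presents its strong performance, and analyzes the core of the paper: the instruction evolution method (Evol-Instruct).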
Paper Overview
Basic Information
English title: WizardLM: Empowering Large Language Models to Follow Complex Instructions
Chinese title: WizardLM:授权大型语言模型遵循复杂的指令
Published: April 2023, arXiv
Affiliations: Peking University, Microsoft
Paper: https://arxiv.org/abs/2304.12244
Code: GitHub - nlpxucan/WizardLM: Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder and WizardMath
Abstract
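The paper shows a way to create large amounts of instruction data of varying complexity using an LLM instead of humans. Starting from an initial set of instructions, Evol-Instruct rewrites them step by step into more complex instructions. All of the generated instruction data is then mixed together to fine-tune LLaMA; the resulting model is called WizardLM. Human evaluations on a complexity-balanced test bed and on the Vicuna test set show that instructions from Evol-Instruct are superior to human-created ones. By analyzing the human evaluation results on the high-complexity portion, the paper shows that outputs from WizardLM are preferred to outputs from OpenAI ChatGPT. In the GPT-4 automatic evaluation, WizardLM reaches more than 90% of ChatGPT's capacity on 17 of 29 skills.
Given its strong performance, WizardLM can serve as a base model for text2sql. The DB-GPT-Hub project on GitHub open-sources a pipeline for fine-tuning LLMs for text2sql, and its supported models include WizardLM. DB-GPT-Hub is a subproject of DB-GPT and covers dataset download, dataset preprocessing, model download, fine-tuning, merging model weights, prediction, and evaluation. If you do not have a GPU, you can rent one on demand on the AutoDL platform.
DB-GPT currently has 6.4k stars and is worth following; its latest release, DB-GPT V0.3.7, supports analyzing and querying Excel data with natural language. DB-GPT-Hub currently has 200+ stars and focuses on fine-tuning LLMs for text2sql. Contributions are welcome, for example adding support for more models (WizardLM is already among the supported ones).
The ideas behind WizardLM are worth borrowing. The later model Code Llama performs even better; it will be covered in a future post.
Results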
Collecting the Test Set
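The instruction test set was collected online and contains 218 examples across 29 skill categories, such as math, code generation, and writing. Figure 3a shows the distribution of instances and skills in the test set; each of the 218 instances is an instruction targeting a particular skill. Figure 3b compares this test set with those of Vicuna and Alpaca.
Human Evaluation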
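To evaluate WizardLM, the authors ran a human evaluation on the Evol-Instruct test set, performing blind pairwise comparisons between WizardLM and the baselines. Specifically, 10 well-educated annotators were recruited. Each annotator was shown four responses, from Alpaca, Vicuna-7b, WizardLM, and ChatGPT, randomly shuffled to hide their sources. Annotators judged which response was better according to the criteria in Appendix H, then ranked the four responses from 1 to 5 (1 is best), with equal scores allowed for comparable responses.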
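Turning per-instance rankings into the win/tie/loss counts reported for each model pair takes only a comparison of ranks. A minimal sketch, assuming each test instance ends up with one ranking (how the ten annotators' rankings are aggregated is not stated here); `pairwise_outcomes` and the toy rankings are hypothetical, not data from the paper.

```python
from collections import Counter


def pairwise_outcomes(rankings: list[dict[str, int]], model: str, baseline: str) -> Counter:
    """Count wins, ties and losses of `model` against `baseline`.

    Each element of `rankings` maps a system name to its rank on one test
    instance (1 = best; equal ranks are allowed and count as a tie).
    """
    outcomes: Counter = Counter()
    for ranks in rankings:
        if ranks[model] < ranks[baseline]:
            outcomes["win"] += 1  # lower rank number means a better response
        elif ranks[model] > ranks[baseline]:
            outcomes["loss"] += 1
        else:
            outcomes["tie"] += 1
    return outcomes


# Hypothetical rankings for three instances (illustration only, not data from the paper).
toy_rankings = [
    {"Alpaca": 4, "Vicuna-7b": 3, "WizardLM": 1, "ChatGPT": 2},
    {"Alpaca": 3, "Vicuna-7b": 2, "WizardLM": 2, "ChatGPT": 1},
    {"Alpaca": 2, "Vicuna-7b": 2, "WizardLM": 1, "ChatGPT": 1},
]
print(pairwise_outcomes(toy_rankings, "WizardLM", "ChatGPT"))
```

With one outcome per instance, the three counts for any pair always sum to the number of test instances.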
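For example, Figure 4a shows that on the Evol-Instruct test set WizardLM beat ChatGPT 61 times, ChatGPT won 89 times, and 68 comparisons were ties, for a total of 218 (61 + 89 + 68), the size of the test set.
GPT-4 Automatic Evaluation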
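In the GPT-4 automatic evaluation, GPT-4 scores each model's responses and results are reported relative to ChatGPT, both overall and per skill. A minimal sketch of that bookkeeping, assuming the percentage is a model's summed GPT-4 scores divided by ChatGPT's summed scores on the same instructions, on a 1-10 scale; the aggregation, the scale, the helper names `relative_score` and `skills_above`, and the toy scores are all assumptions, not the paper's numbers.

```python
def relative_score(model_scores: list[float], reference_scores: list[float]) -> float:
    """Model's total GPT-4 score as a percentage of the reference model's total."""
    return 100.0 * sum(model_scores) / sum(reference_scores)


def skills_above(
    model_by_skill: dict[str, list[float]],
    reference_by_skill: dict[str, list[float]],
    threshold: float = 90.0,
) -> list[str]:
    """Skills on which the model exceeds `threshold` percent of the reference."""
    return sorted(
        skill
        for skill, scores in model_by_skill.items()
        if relative_score(scores, reference_by_skill[skill]) > threshold
    )


# Made-up 1-10 scores for illustration only; these are NOT the paper's numbers.
wizardlm = {"writing": [9, 8], "math": [4, 5], "code": [6, 5]}
chatgpt = {"writing": [9, 9], "math": [8, 9], "code": [8, 8]}

print(skills_above(wizardlm, chatgpt))  # ['writing']

# Overall percentage across every instruction of every skill.
all_wizard = [s for scores in wizardlm.values() for s in scores]
all_chatgpt = [s for scores in chatgpt.values() for s in scores]
print(round(relative_score(all_wizard, all_chatgpt), 1))  # 72.5
```

Computing per-skill ratios separately from the overall ratio is what lets a model sit well below 90% on average while still clearing 90% on individual skills.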
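As shown in Figures 5a and 5b, WizardLM (78.0%) clearly outperforms Alpaca-7B (71.8%) and Vicuna-7B (72.2%) on the Evol-Instruct test set, by 6.2 and 5.8 percentage points respectively. Figure 6 compares the skill levels of WizardLM and ChatGPT on the Evol-Instruct test set: on average WizardLM reaches 78% of ChatGPT's performance, and more than 90% of ChatGPT's capacity on 17 skills. However, WizardLM struggles with code, math, and reasoning, where it shows a clear gap to ChatGPT, which is why WizardCoder followed later.
Conclusion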
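The paper proposes Evol-Instruct, an evolutionary algorithm for generating diverse, complex instruction data for LLMs. It shows that the method improves LLM performance: WizardLM achieves state-of-the-art results on high-complexity tasks and competitive results on other metrics.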
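Limitations (evaluation methods). The paper acknowledges the limitations of its automatic GPT-4 evaluation and its human evaluation, which pose challenges for scalability and reliability. Moreover, the test set may not represent all the scenarios or domains in which LLMs can be applied or compared with other methods.
Broader impact. Evol-Instruct can improve the performance and interactivity of LLMs across domains and applications, but it may also produce unethical, harmful, or misleading instructions. The authors therefore urge future research on AI-evolved instructions to address the ethical and societal implications.
Core Idea
This figure is quite fun: minimal and nicely graphical, although as a diagram of the model's core structure it can be a little confusing.
Instruction Data Evolution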
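Feed instruction I1 to the LLM to get response R1.
Feed instruction I2 to the LLM to get response R2.
Keep iterating. How is instruction I1 turned into instruction I2?
By passing it to the LLM together with an instruction evolution prompt.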
What is an instruction evolution prompt?
See the instruction evolver described below.
Automatic Instruction Data Evolution
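The pipeline has three parts:
1) instruction evolution, 2) response generation, 3) elimination evolving, i.e., filtering out instructions that failed to evolve.
Instruction Evolution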
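The authors found that, with specific prompts, an LLM can make a given instruction more complex and difficult. It can also generate entirely new instructions that are equally complex but completely different.
Building on this finding, an initial instruction dataset can be evolved iteratively, raising its difficulty level and expanding its richness and diversity.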
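1. Initialize the instruction pool with the given initial instruction dataset D(0).
2. In each evolution epoch, take the instructions upgraded in the previous epoch from the pool.
3. Use the instruction evolver to evolve each fetched instruction, and use the instruction eliminator to check whether any evolution failed.
Successfully evolved instructions are added to the pool; unsuccessful ones are put back as they are, in the hope of upgrading them successfully in the next epoch.
Instruction Evolver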
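The instruction evolver is an LLM that uses prompts to evolve instructions, in two ways: in-depth evolving and in-breadth evolving.
In-Depth Evolving
In-depth evolving increases the complexity and difficulty of an instruction through five types of prompts:
add constraints, deepening, concretizing, increase reasoning steps, and complicate input.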
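Putting the epoch procedure together with the evolver's six kinds of operation (five in-depth, one in-breadth) gives a compact loop. A minimal sketch, not the authors' code: `evolve`, `respond`, and `is_failure` are hypothetical placeholders for the LLM calls and the eliminator, the operation labels are made up, and both the uniform random choice of operation and answering the seed instructions are assumptions.

```python
import random
from typing import Callable

# Hypothetical labels for the five in-depth operations plus in-breadth evolving.
OPERATIONS = [
    "add_constraints",
    "deepening",
    "concretizing",
    "increase_reasoning_steps",
    "complicate_input",
    "in_breadth",
]


def evol_instruct(
    seeds: list[str],
    evolve: Callable[[str, str], str],            # (instruction, operation) -> new instruction
    respond: Callable[[str], str],                # instruction -> response
    is_failure: Callable[[str, str, str], bool],  # (old, new, response) -> failed?
    epochs: int,
    rng: random.Random,
) -> list[tuple[str, str]]:
    """Return the merged (instruction, response) pool accumulated over all epochs."""
    # Step 1: initialise the pool with D(0). Answering the seeds too is an assumption.
    pool = [(s, respond(s)) for s in seeds]
    frontier = list(seeds)  # instructions to evolve in the next epoch
    for _ in range(epochs):
        next_frontier = []
        # Step 2: take the instructions upgraded in the previous epoch.
        for instruction in frontier:
            # Step 3: evolve with a randomly chosen operation, answer, then filter.
            evolved = evolve(instruction, rng.choice(OPERATIONS))
            response = respond(evolved)
            if is_failure(instruction, evolved, response):
                next_frontier.append(instruction)  # put back unchanged; retried next epoch
            else:
                pool.append((evolved, response))   # success: add to the pool
                next_frontier.append(evolved)
        frontier = next_frontier
    return pool


# Toy stand-ins for the LLM calls (hypothetical, for illustration only).
def toy_evolve(text: str, operation: str) -> str:
    return f"{text} [{operation}]"


def toy_respond(text: str) -> str:
    return text.upper()


def reject_second_upgrade(old: str, new: str, response: str) -> bool:
    return new.count("[") > 1


pool = evol_instruct(["q1", "q2"], toy_evolve, toy_respond, reject_second_upgrade,
                     epochs=2, rng=random.Random(0))
print(len(pool))
```

Returning the whole pool rather than only the last epoch mirrors the text's point that all generated instruction data is mixed together for fine-tuning.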
Examples:
Add Constraints prompt:
I want you act as a Prompt Rewriter.
Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.
But the rewritten prompt must be reasonable and must be understood and responded by humans.
Your rewriting cannot omit the non-text parts such as the table and code in #Given Prompt#:. Also, please do not omit the input in #Given Prompt#.
You SHOULD complicate the given prompt using the following method:
Please add one more constraints/requirements into #Given Prompt#
You should try your best not to make the #Rewritten Prompt# become verbose, #Rewritten Prompt# can only add 10 to 20 words into #Given Prompt#.
‘#Given Prompt#’, ‘#Rewritten Prompt#’, ‘given prompt’ and ‘rewritten prompt’ are not allowed to appear in #Rewritten Prompt#
#Given Prompt#:
Here is instruction.
#Rewritten Prompt#: 
Deepening prompt:
I want you act as a Prompt Rewriter.
Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.
But the rewritten prompt must be reasonable and must be understood and responded by humans.
Your rewriting cannot omit the non-text parts such as the table and code in #Given Prompt#:. Also, please do not omit the input in #Given Prompt#.
You SHOULD complicate the given prompt using the following method:
If #Given Prompt# contains inquiries about certain issues, the depth and breadth of the inquiry can be increased. or
You should try your best not to make the #Rewritten Prompt# become verbose, #Rewritten Prompt# can only add 10 to 20 words into #Given Prompt#.
‘#Given Prompt#’, ‘#Rewritten Prompt#’, ‘given prompt’ and ‘rewritten prompt’ are not allowed to appear in #Rewritten Prompt#
#Given Prompt#:
Here is instruction.
#Rewritten Prompt#: 
Concretizing prompt:
I want you act as a Prompt Rewriter.
Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.
But the rewritten prompt must be reasonable and must be understood and responded by humans.
Your rewriting cannot omit the non-text parts such as the table and code in #Given Prompt#:. Also, please do not omit the input in #Given Prompt#.
You SHOULD complicate the given prompt using the following method:
Please replace general concepts with more specific concepts. or
You should try your best not to make the #Rewritten Prompt# become verbose, #Rewritten Prompt# can only add 10 to 20 words into #Given Prompt#.
‘#Given Prompt#’, ‘#Rewritten Prompt#’, ‘given prompt’ and ‘rewritten prompt’ are not allowed to appear in #Rewritten Prompt#
#Given Prompt#:
Here is instruction.
#Rewritten Prompt#:
Increased Reasoning Steps prompt:
I want you act as a Prompt Rewriter.
Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.
But the rewritten prompt must be reasonable and must be understood and responded by humans.
Your rewriting cannot omit the non-text parts such as the table and code in #Given Prompt#:. Also, please do not omit the input in #Given Prompt#.
You SHOULD complicate the given prompt using the following method:
If #Given Prompt# can be solved with just a few simple thinking processes, you can rewrite it to explicitly request multiple-step reasoning.
You should try your best not to make the #Rewritten Prompt# become verbose, #Rewritten Prompt# can only add 10 to 20 words into #Given Prompt#.
‘#Given Prompt#’, ‘#Rewritten Prompt#’, ‘given prompt’ and ‘rewritten prompt’ are not allowed to appear in #Rewritten Prompt#
#Given Prompt#:
Here is instruction.
#Rewritten Prompt#: 
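The four rewriting prompts above (add constraints, deepening, concretizing, increased reasoning steps) share every line except the one under "using the following method:". A minimal sketch of building them from one template, with the text copied from the prompts above except for straight ASCII quotes and the dropped dangling "or" after two method sentences; the names `DEPTH_TEMPLATE`, `DEPTH_METHODS`, and `build_depth_prompt` are hypothetical.

```python
# Shared body of the four in-depth rewriting prompts; only {method} changes.
DEPTH_TEMPLATE = "\n".join([
    "I want you act as a Prompt Rewriter.",
    "Your objective is to rewrite a given prompt into a more complex version to make those "
    "famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.",
    "But the rewritten prompt must be reasonable and must be understood and responded by humans.",
    "Your rewriting cannot omit the non-text parts such as the table and code in #Given Prompt#:. "
    "Also, please do not omit the input in #Given Prompt#.",
    "You SHOULD complicate the given prompt using the following method:",
    "{method}",
    "You should try your best not to make the #Rewritten Prompt# become verbose, "
    "#Rewritten Prompt# can only add 10 to 20 words into #Given Prompt#.",
    "'#Given Prompt#', '#Rewritten Prompt#', 'given prompt' and 'rewritten prompt' "
    "are not allowed to appear in #Rewritten Prompt#",
    "#Given Prompt#:",
    "{instruction}",
    "#Rewritten Prompt#:",
])

# Method sentences taken from the four prompts above (hypothetical dictionary keys).
DEPTH_METHODS = {
    "add_constraints": "Please add one more constraints/requirements into #Given Prompt#",
    "deepening": "If #Given Prompt# contains inquiries about certain issues, "
                 "the depth and breadth of the inquiry can be increased.",
    "concretizing": "Please replace general concepts with more specific concepts.",
    "increase_reasoning_steps": "If #Given Prompt# can be solved with just a few simple thinking "
                                "processes, you can rewrite it to explicitly request multiple-step reasoning.",
}


def build_depth_prompt(operation: str, instruction: str) -> str:
    """Fill the shared template with one method sentence and the instruction to evolve."""
    return DEPTH_TEMPLATE.format(method=DEPTH_METHODS[operation], instruction=instruction)


print(build_depth_prompt("concretizing", "Sort a list of numbers."))
```

Only the template string is parsed by `str.format`, so braces inside an instruction (for example code snippets) pass through unchanged.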
Complicating Input prompt (few-shot):
I want you act as a Prompt Rewriter.
Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.
But the rewritten prompt must be reasonable and must be understood and responded by humans.
You must add [XML data] format data as input data in [Rewritten Prompt]
#Given Prompt#:
Here is Demonstration instruction 1.
#Rewritten Prompt#:
Here is Demonstration Example 1.
... N -1 Examples ...
I want you act as a Prompt Rewriter.
Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.
But the rewritten prompt must be reasonable and must be understood and responded by humans.
You must add [#Given Dataformat#] format data as input data, add [#Given Dataformat#] code as input code in [Rewritten Prompt]
Rewrite prompt must be a question style instruction
#Given Prompt#:
Here is instruction.
#Rewrite prompt must be a question style instruction Rewritten Prompt(MUST contain a specific JSON data as input#: 
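The complicating-input prompt is few-shot: N-1 demonstration blocks, each naming a data format, followed by the instruction to evolve. A minimal sketch of assembling it; the demonstration contents, the `(given, rewritten, format)` tuple layout, and the function name `build_complicate_input_prompt` are hypothetical, and the final marker line reproduces the last line shown above verbatim except that "JSON data" becomes the chosen format.

```python
REWRITER_HEADER = "\n".join([
    "I want you act as a Prompt Rewriter.",
    "Your objective is to rewrite a given prompt into a more complex version to make those "
    "famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.",
    "But the rewritten prompt must be reasonable and must be understood and responded by humans.",
])


def build_complicate_input_prompt(
    demos: list[tuple[str, str, str]], data_format: str, instruction: str
) -> str:
    """Few-shot prompt: demonstration blocks followed by the instruction to evolve.

    Each demo is (given_prompt, rewritten_prompt, demo_format), e.g. (..., ..., "XML data").
    """
    blocks = []
    for given, rewritten, demo_format in demos:
        blocks.append("\n".join([
            REWRITER_HEADER,
            f"You must add [{demo_format}] format data as input data in [Rewritten Prompt]",
            "#Given Prompt#:",
            given,
            "#Rewritten Prompt#:",
            rewritten,
        ]))
    # Final query block for the instruction being evolved.
    blocks.append("\n".join([
        REWRITER_HEADER,
        f"You must add [{data_format}] format data as input data, "
        f"add [{data_format}] code as input code in [Rewritten Prompt]",
        "Rewrite prompt must be a question style instruction",
        "#Given Prompt#:",
        instruction,
        "#Rewrite prompt must be a question style instruction Rewritten Prompt"
        f"(MUST contain a specific {data_format} as input#:",
    ]))
    return "\n\n".join(blocks)


demo = ("Sort the numbers.", "Sort the numbers in <nums><n>3</n><n>1</n></nums>.", "XML data")
print(build_complicate_input_prompt([demo], "JSON data", "Find the largest value."))
```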
In-Breadth Evolving
I want you act as a Prompt Creator.
Your goal is to draw inspiration from the #Given Prompt# to create a brand new prompt.
This new prompt should belong to the same domain as the #Given Prompt# but be even more rare.
The LENGTH and difficulty level of the #Created Prompt# should be similar to that of the #Given Prompt#. The #Created Prompt# must be reasonable and must be understood and responded by humans.
‘#Given Prompt#’, ‘#Created Prompt#’, ‘given prompt’ and ‘created prompt’ are not allowed to appear in #Created Prompt#.
#Given Prompt#:
Here is instruction.
#Created Prompt#: 
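Filling the in-breadth prompt is a single template substitution; the model's reply then becomes the new instruction. A minimal sketch with the prompt text copied from above (straight quotes); the helper names are hypothetical, and stripping an echoed "#Created Prompt#:" label is an assumed post-processing step, not something the text describes.

```python
BREADTH_TEMPLATE = "\n".join([
    "I want you act as a Prompt Creator.",
    "Your goal is to draw inspiration from the #Given Prompt# to create a brand new prompt.",
    "This new prompt should belong to the same domain as the #Given Prompt# but be even more rare.",
    "The LENGTH and difficulty level of the #Created Prompt# should be similar to that of the "
    "#Given Prompt#. The #Created Prompt# must be reasonable and must be understood and "
    "responded by humans.",
    "'#Given Prompt#', '#Created Prompt#', 'given prompt' and 'created prompt' are not allowed "
    "to appear in #Created Prompt#.",
    "#Given Prompt#:",
    "{instruction}",
    "#Created Prompt#:",
])


def build_breadth_prompt(instruction: str) -> str:
    """Insert the seed instruction into the in-breadth prompt."""
    return BREADTH_TEMPLATE.format(instruction=instruction)


def extract_created_prompt(llm_output: str) -> str:
    """Assumed clean-up: drop a leading '#Created Prompt#:' if the model echoes the label."""
    marker = "#Created Prompt#:"
    text = llm_output.strip()
    if text.startswith(marker):
        text = text[len(marker):]
    return text.strip()


print(build_breadth_prompt("Write a haiku about autumn."))
```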
Response Generation
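The same LLM used for evolution generates the responses for the evolved instructions. The generation prompt is "Here is instruction.", where, as in the templates above, "Here is instruction." stands for the instruction text itself.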
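Since the generation prompt is just the evolved instruction, response generation reduces to one LLM call per instruction. A minimal sketch that also packages the results as fine-tuning records; `build_sft_records` and `toy_llm` are hypothetical names, and the `instruction`/`output` keys are an assumed Alpaca-style layout rather than a documented WizardLM format.

```python
import json
from typing import Callable


def build_sft_records(
    instructions: list[str], llm: Callable[[str], str]
) -> list[dict[str, str]]:
    """Pair each evolved instruction with the LLM's answer.

    The instruction is sent as-is: no extra template is wrapped around it.
    """
    return [{"instruction": text, "output": llm(text)} for text in instructions]


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LLM call."""
    return f"Answer to: {prompt}"


records = build_sft_records(["List three prime numbers."], toy_llm)
print(json.dumps(records, indent=2, ensure_ascii=False))
```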
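Elimination Evolving
The following four situations are classified as failed evolution:
1. The evolved instruction provides no information gain compared with the original instruction. ChatGPT is used to make this determination.
2. The evolved instruction makes it difficult for the LLM to generate a response. The authors found that when the generated response contains "sorry" and is relatively short (fewer than 80 words), it usually indicates that the LLM is struggling to respond to the evolved instruction, so this rule is used for the check.
3. The response generated by the LLM contains only punctuation and stop words.
4. The evolved instruction obviously copies words from the evolving prompt, such as "given prompt", "rewritten prompt", "#Rewritten Prompt#", etc.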
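The four elimination rules listed above translate almost directly into a filter. A minimal sketch: rule 1 (the ChatGPT judgement) is a caller-supplied callable, the stop-word list is a small illustrative assumption, and `is_failed_evolution` is a hypothetical name. An empty response also fails rule 3 here.

```python
import string
from typing import Callable

# Small illustrative stop-word list (an assumption; not specified in the text).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "i", "you"}

# Phrases from the evolving prompts; matching is case-insensitive, so this also catches "#Rewritten Prompt#".
PROMPT_PHRASES = ("given prompt", "rewritten prompt")


def is_failed_evolution(
    original: str,
    evolved: str,
    response: str,
    no_information_gain: Callable[[str, str], bool],
) -> bool:
    """Apply the four elimination rules; True means the evolution failed."""
    # Rule 1: no information gain over the original (ChatGPT decides this in the paper).
    if no_information_gain(original, evolved):
        return True
    # Rule 2: the answer contains "sorry" and is short (fewer than 80 words).
    if "sorry" in response.lower() and len(response.split()) < 80:
        return True
    # Rule 3: the answer is nothing but punctuation and stop words.
    tokens = [tok.strip(string.punctuation).lower() for tok in response.split()]
    if all(tok == "" or tok in STOP_WORDS for tok in tokens):
        return True
    # Rule 4: the evolved instruction copies wording from the evolving prompt.
    lowered = evolved.lower()
    return any(phrase in lowered for phrase in PROMPT_PHRASES)


def never_same(original: str, evolved: str) -> bool:
    """Hypothetical rule-1 judge that always reports an information gain."""
    return False


print(is_failed_evolution("Sort numbers.", "Sort numbers in descending order.",
                          "Sure: 9, 5, 2.", never_same))  # False
```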
Baselines
ChatGPT: OpenAI's AI chatbot, based on GPT-3.5 or GPT-4.
Alpaca: an open-source model from Stanford University, based on LLaMA.
Vicuna: an open-source chatbot based on LLaMA.
References
WizardLM paper: https://arxiv.org/abs/2304.12244
DB-GPT project: https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md
DB-GPT-Hub project: GitHub - eosphoros-ai/DB-GPT-Hub: A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance, especially in Text-to-SQL.