洪泽网站建设,做网站有哪些技术,wordpress首页广告位,欧洲做安网站“你的意思是” 是搜索引擎中一个非常重要的功能#xff0c;因为它们通过显示建议的术语来帮助用户#xff0c;以便他可以进行更准确的搜索。比如#xff0c;在百度中#xff0c;我们进行搜索时#xff0c;它通常会显示一些更为常用推荐的搜索选项来供我们选择#xff1a…“你的意思是” 是搜索引擎中一个非常重要的功能因为它们通过显示建议的术语来帮助用户以便他可以进行更准确的搜索。比如在百度中我们进行搜索时它通常会显示一些更为常用推荐的搜索选项来供我们选择 为了创建 “你的意思是”我们将使用 phrase suggester因为通过它我们将能够建议句子更正而不仅仅是术语。在我之前的文章 “Elasticsearch如何实现短语建议 - phrase suggester”我有涉及到这个问题。
首先我们将使用一个 shingle 过滤器因为它将提供一个分词短语建议器将使用该标记来进行匹配并返回更正。有关 shingle 过滤器的描述请阅读之前的文章 “Elasticsearch: Ngrams, edge ngrams, and shingles”。 准备数据
我们首先来定义映射
PUT movies
{settings: {analysis: {analyzer: {en_analyzer: {tokenizer: standard,filter: [lowercase,stop]},shingle_analyzer: {type: custom,tokenizer: standard,filter: [lowercase,shingle_filter]}},filter: {shingle_filter: {type: shingle,min_shingle_size: 2,max_shingle_size: 3}}}},mappings: {properties: {title: {type: text,analyzer: en_analyzer,fields: {suggest: {type: text,analyzer: shingle_analyzer}}},actors: {type: text,analyzer: en_analyzer,fields: {keyword: {type: keyword,ignore_above: 256}}},description: {type: text,analyzer: en_analyzer,fields: {keyword: {type: keyword,ignore_above: 256}}},director: {type: text,fields: {keyword: {type: keyword,ignore_above: 256}}},genre: {type: text,fields: {keyword: {type: keyword,ignore_above: 256}}},metascore: {type: long},rating: {type: float},revenue: {type: float},runtime: {type: long},votes: {type: long},year: {type: long},title_suggest: {type: completion,analyzer: simple,preserve_separators: true,preserve_position_increments: true,max_input_length: 50}}}
}
我们接下来使用 _bulk 命令来写入一些文档到这个索引中去。我们使用这个链接中的内容。我们使用如下的方法
POST movies/_bulk
{index: {}}
{title: Guardians of the Galaxy, genre: Action,Adventure,Sci-Fi, director: James Gunn, actors: Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana, description: A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe., year: 2014, runtime: 121, rating: 8.1, votes: 757074, revenue: 333.13, metascore: 76}
{index: {}}
{title: Prometheus, genre: Adventure,Mystery,Sci-Fi, director: Ridley Scott, actors: Noomi Rapace, Logan Marshall-Green, Michael Fassbender, Charlize Theron, description: Following clues to the origin of mankind, a team finds a structure on a distant moon, but they soon realize they are not alone., year: 2012, runtime: 124, rating: 7, votes: 485820, revenue: 126.46, metascore: 65}.... 在上面为了说明的方便我省去了其它的文档。你需要把整个 movies.txt 的文件拷贝过来并全部写入到 Elasticsearch 中。它共有1000 个文档。 搜索数据
现在让我们运行一个基本查询来查看 suggest 的结果
GET movies/_search?filter_pathsuggest
{suggest: {text: transformers revenge of the falen,did_you_mean: {phrase: {field: title.suggest,size: 5}}}
}
上面命令显示的结果为
{suggest: {did_you_mean: [{text: transformers revenge of the falen,offset: 0,length: 33,options: [{text: transformers revenge of the fallen,score: 0.004467494},{text: transformers revenge of the fall,score: 0.00020402104},{text: transformers revenge of the face,score: 0.00006419608}]}]}
}
请注意在几行中你已经获得了一些有希望的结果。
现在让我们通过使用更多短语建议功能来增加我们的查询。让我们使用 max_errors 2这样我们希望句子中最多有两个术语。 添加了 highlight 显示以突出显示建议的术语。
GET movies/_search?filter_pathsuggest
{suggest: {text: transformer revenge of the falen,did_you_mean: {phrase: {field: title.suggest,size: 5,confidence: 1,max_errors:2,highlight: {pre_tag: strong,post_tag: /strong}}}}
}
上面命令返回的结果为
{suggest: {did_you_mean: [{text: transformer revenge of the falen,offset: 0,length: 32,options: [{text: transformers revenge of the fallen,highlighted: strongtransformers/strong revenge of the strongfallen/strong,score: 0.004382903},{text: transformers revenge of the fall,highlighted: strongtransformers/strong revenge of the strongfall/strong,score: 0.00020015794},{text: transformers revenge of the face,highlighted: strongtransformers/strong revenge of the strongface/strong,score: 0.00006298054},{text: transformers revenge of the falen,highlighted: strongtransformers/strong revenge of the falen,score: 0.00006159308},{text: transformer revenge of the fallen,highlighted: transformer revenge of the strongfallen/strong,score: 0.000048000533}]}]}
}
我们再改进一点好吗 我们添加了 “collate”我们可以对每个结果执行查询改进建议的结果。 我使用了带有 “and” 运算符的匹配项以便在同一个句子中匹配所有术语。 如果我仍然想要不符合查询条件的结果我使用 prune true。
GET movies/_search?filter_pathsuggest
{suggest: {text: transformer revenge of the falen,did_you_mean: {phrase: {field: title.suggest,size: 5,confidence: 1,max_errors:2,collate: {query: { source : {match: {{{field_name}}: {query: {{suggestion}},operator: and}}}},params: {field_name : title}, prune :true},highlight: {pre_tag: strong,post_tag: /strong}}}}
}
现在的结果是 请注意答案已更改我有一个新字段 “collate_match”它指示结果中是否匹配整理规则这是因为 prune true。
让我们设置 prune 为 false
GET movies/_search?filter_pathsuggest
{suggest: {text: transformer revenge of the falen,did_you_mean: {phrase: {field: title.suggest,size: 5,confidence: 1,max_errors:2,collate: {query: { source : {match: {{{field_name}}: {query: {{suggestion}},operator: and}}}},params: {field_name : title}, prune :false},highlight: {pre_tag: strong,post_tag: /strong}}}}
}
这次我们得到的结果是 我们可以看到只有一个结果是最相关的建议。