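Elasticsearch Queries

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is written in Java and was released as open source under the terms of the Apache license; it is a popular enterprise search engine. It is designed for cloud environments and offers near-real-time search that is stable, reliable, fast, and easy to install and use.

Elasticsearch can contain multiple indices (Index). Each index can contain multiple types (Type), each type can store multiple documents (Document), and each document has multiple fields. An index is similar to a database in a traditional relational database system: it is the place where related documents are stored. The plural of index is indices or indexes.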

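Elastic is essentially a distributed database: multiple servers can work together, and each server can run multiple Elastic instances.

A single Elastic instance is called a node. A group of nodes forms a cluster.

A cluster has multiple nodes, one of which is the master node, chosen by election. The distinction between master and non-master nodes only matters inside the cluster. Seen from outside, Elasticsearch is decentralized: there is no central node, the cluster is logically a single whole, and communicating with any one node is equivalent to communicating with the entire cluster.

Elastic indexes every field and, after processing, writes it to an inverted index. Searches look up that index directly.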
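
To make the idea of an inverted index concrete, here is a minimal Python sketch. It is a toy model, not how Lucene actually stores data: the function name build_inverted_index, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document IDs containing it.

    Toy model only: real Elasticsearch analysis and Lucene storage are far richer.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents keyed by ID.
docs = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}

index = build_inverted_index(docs)
print(index["rock"])      # [1, 2]
print(index["cabinets"])  # [3]
```

A search for a word then becomes a dictionary lookup instead of a scan over every document, which is the property the text relies on.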

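The top-level unit of data management in Elastic is therefore called an Index. It is the counterpart of a single database. The name of each Index (that is, each database) must be lowercase.

The following command lists all Indices on the current node.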

$ curl -X GET 'http://localhost:9200/_cat/indices?v'

A single record in an Index is called a Document. Many Documents make up an Index.

A Document is expressed in JSON. Here is an example.

{ "user": "张三", "title": "工程师", "desc": "数据库管理" }

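Documents in the same Index are not required to share the same structure (schema), but keeping it consistent is recommended because it improves search efficiency.

Type

The following command lists the Types that each Index contains.

Adding the pretty parameter to any query string makes Elasticsearch pretty-print the JSON response so that it is easier to read.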

$ curl 'localhost:9200/_mapping?pretty=true'

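Elasticsearch 6.x allows only one Type per Index. Types are deprecated in 7.x and removed in 8.x. The examples in this article use the 6.x syntax, which still includes a Type in the request path.

To create a new Index, send a PUT request to the server. The following example creates an Index named weather.

$ curl -X PUT 'localhost:9200/weather'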

The server returns a JSON object whose acknowledged field indicates that the operation succeeded.

{ "acknowledged":true, "shards_acknowledged":true }

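A DELETE request removes the Index.

$ curl -X DELETE 'localhost:9200/weather'

For Chinese text, install a Chinese analysis plugin such as ik. The plugin version has to match the Elasticsearch version; the example below uses 6.6.0.

$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

In a mapping, analyzer is the analyzer applied to the field text, and search_analyzer is the analyzer applied to the search terms.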

$ curl -X PUT 'localhost:9200/megacorp' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "employee": {
      "properties": {
        "user": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
        "title": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" }
      }
    }
  }
}'

To inspect the tokens produced for a field of a document, request its term vectors with GET:

curl -X GET 'localhost:9200/megacorp/employee/1/_termvectors?fields=about'

PUT /megacorp/employee/1
{
  "first_name": "John",
  "last_name": "Smith",
  "age": 25,
  "about": "I love to go rock climbing",
  "interests": [ "sports", "music" ]
}

PUT /megacorp/employee/2
{
  "first_name": "Jane",
  "last_name": "Smith",
  "age": 32,
  "about": "I like to collect rock albums",
  "interests": [ "music" ]
}

PUT /megacorp/employee/3
{
  "first_name": "Douglas",
  "last_name": "Fir",
  "age": 35,
  "about": "I like to build cabinets",
  "interests": [ "forestry" ]
}

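The 1 at the end of the URI is the ID of the document; it can also be a string. If you do not want to choose the ID yourself, omit it and send the request with POST instead of PUT. Elasticsearch then generates a random string as the ID.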
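
The rule about PUT with an explicit ID versus POST without one can be captured in a small helper. This is only a sketch: index_request is a hypothetical name, and the function just builds the method and path for the typed 6.x-style URLs used in this article, without sending anything.

```python
def index_request(index, doc_type, doc_id=None):
    """Return the (HTTP method, path) pair for indexing one document.

    With an explicit ID the request is a PUT to /index/type/id.
    Without one it is a POST to /index/type, and Elasticsearch picks the ID.
    """
    if doc_id is None:
        return ("POST", f"/{index}/{doc_type}")
    return ("PUT", f"/{index}/{doc_type}/{doc_id}")


print(index_request("megacorp", "employee", 1))  # ('PUT', '/megacorp/employee/1')
print(index_request("megacorp", "employee"))     # ('POST', '/megacorp/employee')
```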

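To update the document, send a request to the same URI with the same ID and a modified JSON body. The stored document is then replaced with the new content.

Retrieve a specific document by ID: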

GET /megacorp/employee/1

GET /megacorp/employee/_search?q=last_name:Smith&size=20&from=0

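This searches the documents of the given Type in the given Index. By default each page returns 10 results; the size parameter changes this, and the from parameter sets the offset (0 by default). In the response, took is the time the operation took in milliseconds, timed_out indicates whether the search timed out, and hits contains the matching records.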
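
Page numbers map onto from and size with a little arithmetic. The sketch below assumes 1-based page numbers; page_params and search_url are hypothetical helper names, and the URL is only built, not requested.

```python
from urllib.parse import urlencode


def page_params(page, page_size=10):
    """Translate a 1-based page number into from/size values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


def search_url(index, doc_type, query, page, page_size=10):
    """Build the path and query string for a URI search."""
    params = {"q": query, **page_params(page, page_size)}
    return f"/{index}/{doc_type}/_search?{urlencode(params)}"


print(page_params(3, 20))  # {'from': 40, 'size': 20}
print(search_url("megacorp", "employee", "last_name:Smith", 1, 20))
```

Note that urlencode percent-encodes the colon in the q value; the server decodes it back to last_name:Smith.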

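GET /megacorp/employee/_search
{
  "query": { "match": { "last_name": "Smith" } },
  "size": 20,
  "from": 0
}

This query is equivalent to the previous one, but the parameters have moved from the query string into a JSON body. The body is more verbose, but it gives much finer control over the query.

Search by last_name and keep only employees older than 30: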

GET /megacorp/employee/_search
{
  "query": {
    "bool": {
      "must": { "match": { "last_name": "smith" } },
      "filter": { "range": { "age": { "gt": 30 } } }
    }
  }
}

This adds a range filter, where gt stands for greater than.

GET /megacorp/employee/_search
{ "query": { "match": { "about": "rock climbing" } } }

This search returns documents whose about field contains rock or climbing; by default the terms are combined with OR. To match the exact phrase instead, use a match_phrase query.

GET /megacorp/employee/_search
{ "query": { "match_phrase": { "about": "rock climbing" } } }
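
The difference between the match and match_phrase queries above can be imitated in a few lines of Python. This is a rough model under stated assumptions: tokens come from a plain lowercase whitespace split rather than a real analyzer, there is no scoring, and the function names are hypothetical.

```python
def tokenize(text):
    """Very rough stand-in for analysis: lowercase and split on whitespace."""
    return text.lower().split()


def matches_any(field_text, query_text):
    """Like match with the default OR operator: any query token appears."""
    field_tokens = set(tokenize(field_text))
    return any(token in field_tokens for token in tokenize(query_text))


def matches_phrase(field_text, query_text):
    """Like match_phrase: the query tokens appear adjacent and in order."""
    field_tokens = tokenize(field_text)
    query_tokens = tokenize(query_text)
    n = len(query_tokens)
    return any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )


about = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}

print([i for i, text in about.items() if matches_any(text, "rock climbing")])     # [1, 2]
print([i for i, text in about.items() if matches_phrase(text, "rock climbing")])  # [1]
```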

To highlight the matched text in the results, add a highlight section to the request:

GET /megacorp/employee/_search
{
  "query": { "match_phrase": { "about": "rock climbing" } },
  "highlight": { "fields": { "about": {} } }
}

The response now includes a highlight section; by default the matches are wrapped in <em> tags:

{
  ...
  "hits": {
    "total": 1,
    "max_score": 0.23013961,
    "hits": [
      {
        ...
        "_score": 0.23013961,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        },
        "highlight": {
          "about": [ "I love to go <em>rock</em> <em>climbing</em>" ]
        }
      }
    ]
  }
}

The term query can handle numbers, Booleans, dates, and text.
First, create and index some documents that represent products, each with a price field and a productID field.

$ curl -X POST "localhost:9200/my_store/products/_bulk" -H 'Content-Type: application/json' -d'
{ "index": { "_id": 1 }}
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8" }
'

When looking up an exact value, we usually do not want the query to be scored; we only want to include or exclude documents. So we use a constant_score query to run the term query in non-scoring mode and assign a uniform score of 1.
A query placed inside a filter clause is not scored and no relevance is computed, so every result gets the default score of 1.

GET /my_store/products/_search
{ "query": { "constant_score": { "filter": { "term": { "price": 20 } } } } }

The same kind of query on productID:

GET /my_store/products/_search
{ "query": { "constant_score": { "filter": { "term": { "productID": "XHDK-A-1293-#fJ3" } } } } }

This query finds nothing. To see why, analyze the field value:

GET /my_store/_analyze
{ "field": "productID", "text": "XHDK-A-1293-#fJ3" }

{
  "tokens": [
    { "token": "xhdk", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 1 },
    { "token": "a", "start_offset": 5, "end_offset": 6, "type": "<ALPHANUM>", "position": 2 },
    { "token": "1293", "start_offset": 7, "end_offset": 11, "type": "<NUM>", "position": 3 },
    { "token": "fj3", "start_offset": 13, "end_offset": 16, "type": "<ALPHANUM>", "position": 4 }
  ]
}

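Elasticsearch represents this UPC with 4 separate tokens instead of a single token.
All letters have been lowercased.
The hyphens and the hash sign (#) are lost.

So when a term query looks for the exact value XHDK-A-1293-#fJ3, it finds no documents, because that value is not in the inverted index.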
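
The mismatch can be reproduced with a rough imitation of the analysis step. The sketch assumes that splitting on non-alphanumeric characters and lowercasing is close enough to the default analyzer for this particular value; simple_analyze is a hypothetical helper, not an Elasticsearch API.

```python
import re


def simple_analyze(text):
    """Rough imitation of default text analysis for this example only:
    split on anything that is not a letter or digit, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


product_id = "XHDK-A-1293-#fJ3"
tokens = simple_analyze(product_id)

print(tokens)                # ['xhdk', 'a', '1293', 'fj3']
print(product_id in tokens)  # False: the exact value is not an indexed term
```

A term query compares its value against the indexed terms without analyzing it, so the exact string only matches when the field stores the whole value as a single term.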

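Recreate the index with productID mapped as keyword, which is not split into tokens:

DELETE /my_store

PUT /my_store
{
  "mappings": {
    "products": {
      "properties": {
        "productID": { "type": "keyword" }
      }
    }
  }
}

After indexing the data again, the same query finds the document.

To match any of several exact values, use a terms query:

GET /my_store/products/_search
{ "query": { "constant_score": { "filter": { "terms": { "price": [20, 30] } } } } }

The bool filter is a compound filter: it accepts multiple other filters as arguments and combines them into arbitrary Boolean (logical) combinations.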

A bool filter has three parts:

{
  "bool": {
    "must": [],
    "should": [],
    "must_not": []
  }
}

must
All clauses must match; equivalent to AND.

must_not
No clause may match; equivalent to NOT.

should
At least one clause must match; equivalent to OR.

GET /my_store/products/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "price": 20 } },
        { "term": { "productID": "XHDK-A-1293-#fJ3" } }
      ],
      "must_not": { "term": { "price": 30 } }
    }
  }
}

GET /my_store/products/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "productID": "KDKE-B-9947-#kL5" } },
        { "bool": {
            "must": [
              { "term": { "productID": "JODL-X-1937-#pV7" } },
              { "term": { "price": 30 } }
            ]
        }}
      ]
    }
  }
}
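
To check which products the two bool queries above should return, the Boolean logic can be modelled in plain Python. This is a sketch under assumptions: it evaluates in memory over the four sample products, treats productID as an exact (keyword) value, and follows the description in the text by requiring at least one should clause whenever any are given. The helper names term, bool_filter, and matching_ids are hypothetical.

```python
products = {
    1: {"price": 10, "productID": "XHDK-A-1293-#fJ3"},
    2: {"price": 20, "productID": "KDKE-B-9947-#kL5"},
    3: {"price": 30, "productID": "JODL-X-1937-#pV7"},
    4: {"price": 30, "productID": "QQPX-R-3956-#aD8"},
}


def term(field, value):
    """Exact-value clause: true when the document's field equals value."""
    return lambda doc: doc.get(field) == value


def bool_filter(must=(), should=(), must_not=()):
    """Combine clauses as described in the text for must, should, must_not."""
    def predicate(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # Assumption: when should clauses exist, at least one has to match.
        return not should or any(clause(doc) for clause in should)
    return predicate


def matching_ids(predicate):
    return [doc_id for doc_id, doc in products.items() if predicate(doc)]


first = bool_filter(
    should=[term("price", 20), term("productID", "XHDK-A-1293-#fJ3")],
    must_not=[term("price", 30)],
)
second = bool_filter(
    should=[
        term("productID", "KDKE-B-9947-#kL5"),
        bool_filter(must=[term("productID", "JODL-X-1937-#pV7"), term("price", 30)]),
    ],
)

print(matching_ids(first))   # [1, 2]
print(matching_ids(second))  # [2, 3]
```

Because bool_filter returns an ordinary predicate, a bool clause can be nested inside another one, which mirrors how the second query is written.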

The range query supports the following operators:

gt: > (greater than)
lt: < (less than)
gte: >= (greater than or equal to)
lte: <= (less than or equal to)

GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": { "range": { "price": { "gte": 20, "lt": 40 } } }
    }
  }
}
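
The four range operators correspond directly to ordinary comparisons. The following sketch applies the bounds from the query above to the sample prices in memory; in_range and the prices dictionary are assumptions for illustration only.

```python
import operator

# Each range keyword paired with the comparison it stands for.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True when value satisfies every bound, e.g. {"gte": 20, "lt": 40}."""
    return all(OPERATORS[name](value, limit) for name, limit in bounds.items())


# Prices of the four sample products, keyed by document ID.
prices = {1: 10, 2: 20, 3: 30, 4: 30}

hits = [doc_id for doc_id, price in prices.items()
        if in_range(price, {"gte": 20, "lt": 40})]
print(hits)  # [2, 3, 4]
```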

To test whether a field has a value, first index a few sample posts:

POST /my_index/posts/_bulk
{ "index": { "_id": "1" }}
{ "tags" : ["search"] }
{ "index": { "_id": "2" }}
{ "tags" : ["search", "open_source"] }
{ "index": { "_id": "3" }}
{ "other_field" : "some data" }
{ "index": { "_id": "4" }}
{ "tags" : null }
{ "index": { "_id": "5" }}
{ "tags" : ["search", null] }

The exists query finds documents in which tags has at least one non-null value:

GET /my_index/posts/_search
{ "query": { "constant_score": { "filter": { "exists": { "field": "tags" } } } } }

Wrapping exists in must_not finds the documents that have no value for tags:

GET /my_index/posts/_search
{ "query": { "bool": { "must_not": { "exists": { "field": "tags" } } } } }
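
Which of the five sample posts count as having tags is easy to get wrong, so here is a small in-memory model of the rule: a field exists when it holds at least one non-null value. The has_value helper is hypothetical, and Python's None stands in for JSON null.

```python
def has_value(doc, field):
    """Mimic exists: the field holds at least one non-null value."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True


posts = {
    "1": {"tags": ["search"]},
    "2": {"tags": ["search", "open_source"]},
    "3": {"other_field": "some data"},
    "4": {"tags": None},
    "5": {"tags": ["search", None]},
}

with_tags = [doc_id for doc_id, doc in posts.items() if has_value(doc, "tags")]
without_tags = [doc_id for doc_id, doc in posts.items() if not has_value(doc, "tags")]

print(with_tags)     # ['1', '2', '5']
print(without_tags)  # ['3', '4']
```

Document 5 still counts as having tags because one of its two values is non-null, while document 4 does not because its only value is null.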

Cluster health

GET _cluster/health

Monitoring individual nodes

GET _nodes/stats

Cluster statistics

GET _cluster/stats

Index statistics

GET my_index/_stats
GET my_index,another_index/_stats
GET _all/_stats

Pending tasks

GET _cluster/pending_tasks

To include column headers in the output of the _cat APIs, add the ?v parameter. A plain GET /_cat lists the available endpoints:

GET /_cat

=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}

Original article: https://blog.csdn.net/zhengshuoa/article/details/87875128

Original post by 优速盾-小U. If you repost it, please credit the source: https://www.cdnb.net/bbs/archives/6716