Elasticsearch Notes

Notes on the official Elasticsearch documentation

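Distributed document store

Documents are stored as serialized JSON; the data lives in key-value pairs.

Full-text search: near real-time search backed by an inverted index. (An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.)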
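To make the inverted index definition concrete, here is a minimal sketch in Python. It is not how Lucene stores its index; the sample documents and the whitespace tokenizer are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercase word to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "880 Holmes Lane",
    2: "244 Columbus Place",
    3: "990 Mill Lane",
}
index = build_inverted_index(docs)
print(index["lane"])      # IDs of the documents that contain "lane"
print(index["columbus"])  # IDs of the documents that contain "columbus"
```

Looking up a word is a single dictionary access, which is why searching the index is much cheaper than scanning every document.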

When dynamic mapping is enabled, Elasticsearch automatically detects and adds new fields to the index.

The same field can be mapped as more than one type so that it can be analyzed in different ways.
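As a sketch of what mapping one field as several types can look like, the Python snippet below builds an explicit mapping in which a field is indexed as text and also gets a keyword sub-field. The field name "city" and the idea of sending this body with a create-index request are assumptions for illustration; the state.keyword field used in the aggregation later in these notes is a sub-field of this kind.

```python
import json

# Hypothetical field "city": indexed as text for full-text search,
# with a keyword sub-field for exact matching, sorting, and aggregations.
mapping = {
    "mappings": {
        "properties": {
            "city": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword"}
                },
            }
        }
    }
}

# This JSON would be the body of a create-index request (index name not shown here).
print(json.dumps(mapping, indent=2))
```

Queries can then address city for analyzed full-text matching and city.keyword for exact values.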

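Aggregations operate alongside search requests, so searching and aggregating can happen in a single request.

Searching data

The REST API is used to manage the cluster and to index and search data (from an Elasticsearch client, the Kibana Developer Console, or the command line).

The Elasticsearch REST APIs support structured queries, full text queries, and complex queries that combine the two.

Full-text queries return results sorted by relevance.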

Query DSL

SQL-style queries

Analyzing data

Aggregations

key metrics, patterns, and trends

Machine learning

Elastic scaling

cluster

node

shard

Nodes can be added freely and the cluster rebalances itself: as the cluster grows (or shrinks), Elasticsearch automatically migrates shards to rebalance the cluster.

index (logical grouping) -> shards (physical) -> nodes

graph LR
index(Index logical grouping)-->shards
shards-->primary_shards
primary_shards-->replicas_shards(read-only replica shards)
primary_cluster-->CCR(Cross-cluster replication)
CCR-->replicated_cluster(read-only replicated cluster)


CAT API

$ curl -X GET "127.0.0.1:9200/_cat/health?v&pretty"
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1566820997 12:03:17 cluster-name green 15 9 6073 6039 0 0 0 0 - 100.0%
$ curl -X GET "127.0.0.1:9200/_cat/indices?v&pretty" |more
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-xxx-2019.08.22 1NcDlWKNQxKK529JwuQxFg 5 0 308686 0 548.3mb 548.3mb
green open bbbb-2019.07.02 NTzO-PQESb2YrrUPzs0fVA 5 0 161398 0 99mb 99mb
green open logstash-cccc-2019.08.26 -1DGNXaFRiqcowhhQ4Y2Tg 5 0 1364938 0 178.4mb 178.4mb

GET /customer/_doc/1 // retrieve a single document

Bulk API (batch document operations)

Loading data

# https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
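The file posted to _bulk above is newline-delimited JSON: an action line followed by a document line, repeated for each document. The sketch below builds such a body in Python from two records trimmed from the sample data; the helper name to_bulk_body is hypothetical.

```python
import json


def to_bulk_body(docs):
    """Serialize (doc_id, source) pairs into newline-delimited JSON for the bulk API."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))  # action line
        lines.append(json.dumps(source))                            # document line
    return "\n".join(lines) + "\n"  # the body must end with a newline


# Two records trimmed from the accounts sample data.
docs = [
    (0, {"account_number": 0, "balance": 16623, "state": "CO"}),
    (1, {"account_number": 1, "balance": 39225, "state": "IL"}),
]
body = to_bulk_body(docs)
print(body, end="")
```

The resulting string can be saved to a file and sent with the same curl command, using --data-binary so the newlines are preserved.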

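Search, analyze, and machine learning

Unlike SQL, there is no cursor or similar server-side state here: once the search results are returned, the request is finished.

  1. REST request URI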
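GET /bank/_search?q=*&sort=account_number:asc&pretty
{
"took" : 63, // time the query took, in milliseconds
"timed_out" : false, // whether the search timed out
"_shards" : { // how many shards were searched and how many succeeded, were skipped, or failed
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : { // search results
"total" : { // matching documents
"value": 1000, // number of matches
"relation": "eq" // "eq" means the value is exact
},
"max_score" : null,
"hits" : [ { // the actual list of results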
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"sort": [0],
"_score" : null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"sort": [1],
"_score" : null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, ...
]
}
}
  2. REST request body

    1. match_all

      GET /bank/_search
      { // uses the Query DSL
      "query": { "match_all": {} },
      "sort": [
      { "account_number": "asc" }
      ],
      // alternative: "sort": { "balance": { "order": "desc" } }
      "size": 1, // defaults to 10
      "from": 10, // offset of the first result, defaults to 0; useful for paging through results
      "_source": ["account_number", "balance"] // return only these fields instead of the full _source
      }
    2. match

      GET /bank/_search
      {
      "query": { "match": { "account_number": 20 } }
      }
    3. match_phrase (phrase match)

        GET /bank/_search
        {
        "query": { "match_phrase": { "address": "mill lane" } }
        }
  3. bool

    1. must (AND)
    2. should (OR)
    3. must_not (NOT)
    4. filter
      1. range
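The bool clauses listed above can be combined in one request body. The following Python sketch assembles such a body for the bank index, together with from/size paging; the helper name bool_search_body and the specific field values are assumptions chosen for illustration.

```python
import json


def bool_search_body(page, page_size=10):
    """Build a search body: a bool query plus from/size paging (page is zero-based)."""
    return {
        "query": {
            "bool": {
                # must: the clause has to match (AND)
                "must": [{"match": {"address": "lane"}}],
                # should: optional clause; matching it raises the score (OR)
                "should": [{"match": {"state": "IL"}}],
                # must_not: documents matching the clause are excluded (NOT)
                "must_not": [{"match": {"gender": "F"}}],
                # filter: has to match, but does not contribute to the score
                "filter": [{"range": {"balance": {"gte": 20000, "lte": 30000}}}],
            }
        },
        "sort": [{"account_number": "asc"}],
        "from": page * page_size,
        "size": page_size,
    }


body = bool_search_body(page=2)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /bank/_search; changing page moves the from offset in steps of page_size.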

analyze

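A term query is an exact match: the search term is not passed through an analyzer, so the document must contain the term exactly as given.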
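The practical consequence shows up on analyzed text fields. The Python sketch below imitates the behaviour locally; the analyze function is a rough stand-in (lowercase, split on non-alphanumerics) and not the real Elasticsearch analyzer, and the function names are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def term_matches(indexed_tokens, term):
    """term query: the search term is compared as-is, without analysis."""
    return term in indexed_tokens


def match_matches(indexed_tokens, query_text):
    """match query: the query text is analyzed first; any shared token matches."""
    return any(tok in indexed_tokens for tok in analyze(query_text))


indexed = analyze("880 Holmes Lane")  # tokens stored for an analyzed text field
print(indexed)
print(term_matches(indexed, "Holmes"))   # capitalised term vs. lowercased tokens
print(term_matches(indexed, "holmes"))
print(match_matches(indexed, "Holmes"))
```

Under these assumptions the capitalised term finds nothing while the match query does, which is why term queries are usually aimed at keyword fields.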

GET /bank/_search
{
"size": 0, // do not return search hits
"aggs": {
"group_by_state": {
"terms": { // terms aggregation
"field": "state.keyword",
"size": 5, // number of buckets to return, defaults to 10
"order": { // sort order of the buckets
"average_balance": "desc"
}
},
"aggs":{ // nested sub-aggregation
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1000,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "bank",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"account_number": 1,
"balance": 39225,
"firstname": "Amber",
"lastname": "Duke",
"age": 32,
"gender": "M",
"address": "880 Holmes Lane",
"employer": "Pyrami",
"email": "amberduke@pyrami.com",
"city": "Brogan",
"state": "IL"
}
}
]
},
"aggregations": {
"group_by_state": {
"doc_count_error_upper_bound": -1,
"sum_other_doc_count": 923,
"buckets": [
{
"key": "CO",
"doc_count": 14,
"average_balance": {
"value": 32460.35714285714
}
},
{
"key": "NE",
"doc_count": 16,
"average_balance": {
"value": 32041.5625
}
},
{
"key": "AZ",
"doc_count": 14,
"average_balance": {
"value": 31634.785714285714
}
},
{
"key": "MT",
"doc_count": 17,
"average_balance": {
"value": 31147.41176470588
}
},
{
"key": "VA",
"doc_count": 16,
"average_balance": {
"value": 30600.0625
}
}
]
}
}
}
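To read the buckets out of a response like the one above, a client only needs to walk the aggregations object. This Python sketch works on a trimmed copy of that response; the helper name bucket_averages is hypothetical.

```python
def bucket_averages(response, agg_name="group_by_state", metric="average_balance"):
    """Return (key, doc_count, average) tuples from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"], b[metric]["value"]) for b in buckets]


# Trimmed copy of the response above (first two buckets only).
response = {
    "aggregations": {
        "group_by_state": {
            "buckets": [
                {"key": "CO", "doc_count": 14, "average_balance": {"value": 32460.35714285714}},
                {"key": "NE", "doc_count": 16, "average_balance": {"value": 32041.5625}},
            ]
        }
    }
}
rows = bucket_averages(response)
for key, count, avg in rows:
    print(f"{key}: {count} accounts, average balance {avg:.2f}")
```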

Installing Elasticsearch

Elasticsearch ships with a bundled OpenJDK; set the JAVA_HOME environment variable to use your own Java installation instead.

[root@dab238b13031 elasticsearch]# ls -1
LICENSE.txt
NOTICE.txt
README.textile
bin // executables
config // configuration files
data // shard data directory
jdk // bundled JDK
lib
logs // logs
modules
plugins // plugins, one subdirectory per plugin
[root@dab238b13031 elasticsearch]# ls jdk/
bin conf include jmods legal lib release

The Docker image is based on CentOS 7.

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.3.1
$ grep vm.max_map_count /etc/sysctl.conf
vm.max_map_count=262144

Running

# Run in the background and pass settings on the command line
# Configuration file: config/elasticsearch.yml
./bin/elasticsearch -d -Ecluster.name=my_cluster -Enode.name=node_1

Checking that it is running

curl -X GET "localhost:9200/?pretty"
{
"name": "dab238b13031",
"cluster_name": "docker-cluster",
"cluster_uuid": "mNH6fmV0RE2WVvavM-9bFA",
"version": {
"number": "7.3.1",
"build_flavor": "default",
"build_type": "docker",
"build_hash": "4749ba6",
"build_date": "2019-08-19T20:19:25.651794Z",
"build_snapshot": false,
"lucene_version": "8.1.0",
"minimum_wire_compatibility_version": "6.8.0",
"minimum_index_compatibility_version": "6.0.0-beta1"
},
"tagline": "You Know, for Search"
}

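Configuration

After changing the configuration files or the secure settings, a restart is required.

There are three configuration files. They live in the config directory by default; set ES_PATH_CONF=/path/to/my/config to use a different location.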

  • elasticsearch.yml for configuring Elasticsearch
  • jvm.options for configuring Elasticsearch JVM settings
    • 7-9:-Xmx2g (the part before the colon is the range of Java versions the option applies to)
  • log4j2.properties Log4j 2 for configuring Elasticsearch logging
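The version prefix in jvm.options can be read as: no prefix applies to every Java version, "8:" to exactly version 8, "8-:" to version 8 and above, and "7-9:" to the closed range. The Python sketch below interprets a line under that reading; the function name applies_to is hypothetical and this is not the parser Elasticsearch itself uses.

```python
def applies_to(line, java_major):
    """Return the JVM flag if a jvm.options line applies to java_major, else None."""
    line = line.strip()
    if not line or line.startswith("#"):   # blank lines and comments carry no flag
        return None
    if line.startswith("-"):               # no version prefix: applies to every version
        return line
    prefix, flag = line.split(":", 1)
    if "-" in prefix:                      # "7-9" is a closed range, "8-" means 8 and above
        low, high = prefix.split("-", 1)
        in_range = int(low) <= java_major and (high == "" or java_major <= int(high))
    else:                                  # "8" means exactly version 8
        in_range = int(prefix) == java_major
    return flag if in_range else None


print(applies_to("7-9:-Xmx2g", 8))
print(applies_to("7-9:-Xmx2g", 11))
print(applies_to("8-:-Xmx2g", 11))
```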
[root@dab238b13031 elasticsearch]# ls -1 bin/
elasticsearch
elasticsearch-certgen
elasticsearch-certutil
elasticsearch-cli
elasticsearch-croneval
elasticsearch-env
elasticsearch-enve
elasticsearch-keystore // manages secure settings
elasticsearch-migrate
elasticsearch-node
elasticsearch-plugin
elasticsearch-saml-metadata
elasticsearch-setup-passwords
elasticsearch-shard
elasticsearch-sql-cli
elasticsearch-sql-cli-7.3.1.jar
elasticsearch-syskeygen
elasticsearch-users
x-pack-env
x-pack-security-env
x-pack-watcher-env

Secure settings (reload with POST _nodes/reload_secure_settings)

Adding a string setting

cat /file/containing/setting/value | bin/elasticsearch-keystore add --stdin the.setting.name.to.set

Adding a file

bin/elasticsearch-keystore add-file the.setting.name.to.set /path/example-file.json

Removing a setting

bin/elasticsearch-keystore remove the.setting.name.to.remove

Index lifecycle management policies

https://www.elastic.co/guide/en/elasticsearch/reference/current/using-policies-rollover.html

Machine learning (enabled by default)

xpack.ml.enabled

heap size

Default: 1 GB

Xms (minimum heap size) and Xmx (maximum heap size)

Set Xmx and Xms to no more than 50% of your physical RAM

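A larger heap leaves more room for internal caches, but it can also cause long garbage-collection pauses.

Leaving too little memory to the operating system hurts the filesystem cache.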
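As a worked example of the 50% rule, the Python sketch below turns a machine's physical RAM into heap flags. The function names and the optional cap_mb parameter are hypothetical, and using the same value for Xms and Xmx is an assumption made here, not something stated in these notes.

```python
def recommended_heap_mb(physical_ram_mb, cap_mb=None):
    """Half of physical RAM in MB, optionally limited by an upper cap (assumed parameter)."""
    heap = physical_ram_mb // 2
    if cap_mb is not None:
        heap = min(heap, cap_mb)
    return heap


def heap_flags(physical_ram_mb, cap_mb=None):
    """Return Xms/Xmx flags; both use the same value (an assumption of this sketch)."""
    heap = recommended_heap_mb(physical_ram_mb, cap_mb)
    return [f"-Xms{heap}m", f"-Xmx{heap}m"]


print(heap_flags(16384))                # a machine with 16 GB of RAM
print(heap_flags(131072, cap_mb=30000))  # a large machine with an explicit cap
```

The flags would go into jvm.options; the remaining RAM stays available to the operating system for the filesystem cache.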

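System configuration

# Entry in /etc/security/limits.conf: raise the open-file limit
elasticsearch  -  nofile  65535
# Disable swap
sudo swapoff -a
# Raise the mmap count limit
sysctl -w vm.max_map_count=262144
# Raise the limit on the number of threads
ulimit -u 4096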

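DNS cache: with a security manager in place, the JVM caches positive hostname lookups indefinitely and negative lookups for 10 seconds. Elasticsearch overrides this to cache positive lookups for 60 seconds and negative lookups for 10 seconds.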

Aggregations

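Four families

  1. Bucketing (grouping)
    1. terms
      1. size
        1. Each shard returns its own top buckets and the results are then merged, so the final counts are not fully accurate
        2. A size beyond the int range is rejected: Numeric value (1000000000000000000000) out of range of int (-2147483648 - 2147483647)\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@4af5e4f0; line: 16, column: 54]
    2. composite
      1. [composite] aggregation cannot be used with a parent aggregation
  2. Metric
    1. avg / weighted avg
  3. Matrix (operates on multiple fields and produces a matrix result)
  4. Pipeline (aggregates the output of other aggregations)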
    1. Each bucket may be sorted based on its _key, _count or its sub-aggregations.
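The note above that terms results are approximate can be reproduced with a small simulation. In this Python sketch each shard reports only its local top shard_size terms before the merge; the shard contents are made up, and the merge logic is a simplification rather than the actual Elasticsearch implementation.

```python
from collections import Counter


def approximate_top_terms(shards, size, shard_size):
    """Merge each shard's local top `shard_size` terms and return the global top `size`."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical term values held by two shards.
shard_a = ["CO"] * 5 + ["NE"] * 4 + ["AZ"] * 3
shard_b = ["AZ"] * 5 + ["NE"] * 4 + ["CO"] * 3

exact = dict(Counter(shard_a + shard_b).most_common(3))
approx = dict(approximate_top_terms([shard_a, shard_b], size=3, shard_size=2))
print(exact)
print(approx)
```

In this made-up data every term really occurs 8 times, but CO and AZ are each cut off on one shard and come out undercounted. Asking each shard for more candidates (a larger shard_size) reduces the error at the cost of more data to merge.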

Nesting aggregations is very powerful.

// Syntax: the top-level key is aggregations, or aggs for short
"aggregations" : {
"<aggregation_name>" : {
"<aggregation_type>" : {
<aggregation_body>
}
[,"meta" : { [<meta_data_body>] } ]?
[,"aggregations" : { [<sub_aggregation>]+ } ]?
}
[,"<aggregation_name_2>" : { ... } ]*
}