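Notes on the official Elasticsearch documentation
Distributed document store
Documents are stored as serialized JSON; the data lives in key-value pairs.
Full-text search: near real-time search backed by an inverted index. (An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.)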
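To make the inverted index definition concrete, here is a minimal sketch in Python. It is a toy, not how Lucene stores its index: the tokenizer (lowercase, split on non-word characters) and the sample documents are assumptions made up for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercase word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy tokenizer: lowercase the text and split on non-word characters.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents.
docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "Quick thinking",
}
index = build_inverted_index(docs)
print(index["brown"])  # [1, 2]
print(index["quick"])  # [1, 3]
```

Looking up a word is then a single dictionary access, which is why searching does not need to scan every document.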
When dynamic mapping is enabled, Elasticsearch automatically detects and adds new fields to the index.
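The same field can be mapped as several types, so that it can be analyzed in different ways.
Aggregations operate alongside search requests, so searching and aggregating can happen in the same request.
Searching data
The REST API is used to manage the cluster and to index and search data (from an Elasticsearch client, the Kibana developer console, or the command line).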
The Elasticsearch REST APIs support structured queries, full text queries, and complex queries that combine the two.
Full-text queries return results sorted by relevance.
Analyzing data
Aggregations
key metrics, patterns, and trends
Machine learning
Elastic scaling
cluster
node
shard
Nodes can be added freely and the cluster rebalances itself: as the cluster grows (or shrinks), Elasticsearch automatically migrates shards to rebalance the cluster.
index (logical grouping) -> shards (physical) -> nodes
graph LR
index(Index logical group)-->shards
shards-->primary_shards
primary_shards-->replicas_shards(replica shards read-only)
primary_cluster-->CCR(Cross-cluster replication)
CCR-->replicated_cluster(replicated cluster read-only)
CAT API
curl -X GET "127.0.0.1:9200/_cat/health?v&pretty"
Fetching a single document
GET /customer/_doc/1
Bulk API (batch document operations)
Loading data
https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true
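The bulk API takes newline-delimited JSON: one action line followed by one document line per operation, with a trailing newline at the end. A minimal sketch of building such a body in Python follows; the helper name `build_bulk_body` and the sample values are made up for illustration, and only `index` actions are covered.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body of `index` actions from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: which index and id the next line should be stored under.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents; the real accounts.json has more fields.
body = build_bulk_body(
    "bank",
    [
        ("1", {"account_number": 1, "balance": 100}),
        ("2", {"account_number": 2, "balance": 200}),
    ],
)
print(body, end="")
```

The resulting string would be sent as the request body of `POST /_bulk` with the `Content-Type: application/x-ndjson` header.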
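Search, analyze, and machine learning
search
Unlike SQL, there is no notion of a cursor here: once the results have been returned, the request is finished and nothing is kept open on the server.
- REST request URL
GET /bank/_search?q=*&sort=account_number:asc&pretty
- REST request body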
match_all
GET /bank/_search
{ // uses the Query DSL
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  // "sort": { "balance": { "order": "desc" } }
  "size": 1, // defaults to 10
  "from": 10, // offset of the first result, defaults to 0; useful for paging through results
  "_source": ["account_number", "balance"] // return only these fields instead of the full _source
}
match
GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}
match_phrase (phrase matching)
GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
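The `from` and `size` parameters used in the match_all request above are what paging is built on. A small sketch of turning a page number into a request body, assuming pages are numbered from 1 (the helper name `page_request` is made up):

```python
import json


def page_request(page, page_size=10):
    """Return a match_all search body for the given 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        "from": (page - 1) * page_size,  # offset of the first hit on this page
        "size": page_size,               # number of hits per page
    }


print(json.dumps(page_request(2)))
```

Because no cursor is kept on the server, every page is a fresh query, so a stable sort order is needed for the pages to line up.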
bool
- must (AND)
- should (OR)
- must_not (NOT)
- filter
- range
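The clauses listed above can be combined in one bool query. A sketch of assembling the request body in Python; the helper `bool_query` is made up, and the `age` and `state` fields are assumptions about the bank sample data rather than something stated in these notes.

```python
import json


def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Build a search body with a bool query, leaving out empty clauses."""
    clauses = {}
    for name, value in (
        ("must", must),
        ("should", should),
        ("must_not", must_not),
        ("filter", filter_),
    ):
        if value:
            clauses[name] = list(value)
    return {"query": {"bool": clauses}}


body = bool_query(
    must=[{"match": {"age": "40"}}],           # AND: must match
    must_not=[{"match": {"state": "ID"}}],     # NOT: must not match
    filter_=[{"range": {"balance": {"gte": 20000, "lte": 30000}}}],  # no effect on scoring
)
print(json.dumps(body, indent=2))
```

The body would be sent with `GET /bank/_search`. A range query inside `filter` restricts the hits without changing their relevance scores.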
analyze
A term query means an exact match: the search text is not run through an analyzer, so the document must contain the whole term exactly as given.
GET /bank/_search
{
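The difference between term and match can be imitated with a toy model. This sketch assumes the field is an analyzed text field and uses a made-up analyzer (lowercase, split on non-word characters) and a made-up address value; it only illustrates the idea and is not Elasticsearch's implementation.

```python
import re


def analyze(text):
    """Toy stand-in for an analyzer: lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def match_hits(indexed_text, query_text):
    """match: the query text is analyzed too, and any shared token is a hit."""
    return bool(set(analyze(indexed_text)) & set(analyze(query_text)))


def term_hits(indexed_text, query_term):
    """term: the query term is compared with the indexed tokens exactly as given."""
    return query_term in analyze(indexed_text)


address = "880 Holmes Lane"  # hypothetical field value
print(match_hits(address, "Holmes Lane"))  # True
print(term_hits(address, "Holmes"))        # False: the indexed token is "holmes"
print(term_hits(address, "holmes"))        # True
```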
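Installing Elasticsearch
It ships with a bundled OpenJDK; set the JAVA_HOME environment variable to use a Java installation of your own instead.
[root@dab238b13031 elasticsearch]# ls -1
The Docker image is based on CentOS 7.
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.3.1
Running
# run in the background, passing settings on the command line
Checking that it is running
curl -X GET "localhost:9200/?pretty"
Configuration
Changes to the configuration files and to secure settings take effect only after a restart.
There are three configuration files. They live in the config directory by default; set ES_PATH_CONF=/path/to/my/config to use another location.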
- elasticsearch.yml for configuring Elasticsearch
- jvm.options for configuring Elasticsearch JVM settings
  - 7-9:-Xmx2g (the part before the colon is the range of Java versions the line applies to)
- log4j2.properties for configuring Elasticsearch logging (Log4j 2)
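The version prefix in jvm.options can be read as follows: `8:` means exactly Java 8, `8-:` means Java 8 and later, `8-9:` means Java 8 through 9, and a line starting with `-` applies to every version. The sketch below is my own reading of that syntax in Python, not Elasticsearch's parser, and the function name is made up.

```python
def jvm_option_applies(line, java_major):
    """Return the JVM flag if this jvm.options line applies to java_major, else None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    if line.startswith("-"):
        return line  # no version prefix: applies to every Java version
    versions, sep, flag = line.partition(":")
    if not sep:
        return None  # not a recognizable option line
    if "-" in versions:
        low, _, high = versions.partition("-")
        # An empty upper bound ("8-") means "this version and later".
        if int(low) <= java_major and (high == "" or java_major <= int(high)):
            return flag
        return None
    return flag if int(versions) == java_major else None


print(jvm_option_applies("7-9:-Xmx2g", 8))   # -Xmx2g
print(jvm_option_applies("7-9:-Xmx2g", 11))  # None
print(jvm_option_applies("8-:-Xmx2g", 11))   # -Xmx2g
```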
[root@dab238b13031 elasticsearch]# ls -1 bin/
Secure settings (reloadable ones are applied with POST _nodes/reload_secure_settings)
Add a string setting
cat /file/containing/setting/value | bin/elasticsearch-keystore add --stdin the.setting.name.to.set
Add a file setting
bin/elasticsearch-keystore add-file the.setting.name.to.set /path/example-file.json
Remove a setting
bin/elasticsearch-keystore remove the.setting.name.to.remove
Index lifecycle management policies
https://www.elastic.co/guide/en/elasticsearch/reference/current/using-policies-rollover.html
Machine learning (enabled by default)
xpack.ml.enabled
heap size
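Defaults to 1 GB.
Xms (minimum heap size) and Xmx (maximum heap size)
Set Xmx and Xms to no more than 50% of your physical RAM.
A large heap leaves more room for internal caches, but it can lead to long garbage-collection pauses.
If too little memory is left for the operating system, the filesystem cache suffers.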
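The 50% rule can be turned into jvm.options lines with a few lines of Python. This is a sketch under two assumptions that are not in these notes: Xms and Xmx are set to the same value, and the heap is expressed in megabytes. The helper name `heap_flags` is made up.

```python
def heap_flags(physical_ram_mb, fraction=0.5):
    """Return Xms/Xmx flags for a heap of `fraction` of the physical RAM (at most 50%)."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    heap_mb = int(physical_ram_mb * fraction)
    # Assumption: minimum and maximum heap size are kept equal.
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


print(heap_flags(16384))  # ['-Xms8192m', '-Xmx8192m']
```

The rest of the RAM is left to the operating system for the filesystem cache.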
System configuration
# /etc/security/limits.conf
elasticsearch - nofile 65535
sudo swapoff -a  # disable swap
sysctl -w vm.max_map_count=262144
ulimit -u 4096
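The values above can be checked from a script before starting a node. A sketch in Python for Linux, using the standard `resource` module and `/proc/sys/vm/max_map_count`; the function names are made up, and the limits read are those of the process running the script, which may differ from those of the elasticsearch user.

```python
import resource

# Minimum values taken from the settings listed above.
REQUIRED = {"nofile": 65535, "nproc": 4096, "max_map_count": 262144}


def find_problems(current, required=REQUIRED):
    """Return a message for every setting in `current` that is below its minimum."""
    problems = []
    for name, minimum in required.items():
        value = current.get(name)
        if value is None:
            continue  # the value could not be read on this system
        if value < minimum:
            problems.append(f"{name} is {value}, expected at least {minimum}")
    return problems


def read_current():
    """Read the soft limits of this process and the kernel mmap count setting."""
    current = {}
    for name, limit in (("nofile", "RLIMIT_NOFILE"), ("nproc", "RLIMIT_NPROC")):
        if hasattr(resource, limit):
            soft, _hard = resource.getrlimit(getattr(resource, limit))
            current[name] = float("inf") if soft == resource.RLIM_INFINITY else soft
    try:
        with open("/proc/sys/vm/max_map_count") as handle:
            current["max_map_count"] = int(handle.read())
    except OSError:
        pass  # not Linux, or /proc is not available
    return current


if __name__ == "__main__":
    for problem in find_problems(read_current()):
        print(problem)
```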
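DNS caching: with a security manager in place, the JVM by default caches positive hostname lookups indefinitely and negative lookups for 10 seconds. Elasticsearch overrides this to cache positive lookups for 60 seconds and negative lookups for 10 seconds.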
Aggregations
Four families
- bucketing (grouping)
  - terms
    - size
      - Each shard first returns a limited number of its own top terms, and these are then merged, so the result is not fully accurate.
      - size must fit in an int; a larger value fails with: Numeric value (1000000000000000000000) out of range of int (-2147483648 - 2147483647)\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@4af5e4f0; line: 16, column: 54]
  - composite
    - [composite] aggregation cannot be used with a parent aggregation
- metric
  - avg / weighted avg
- matrix (produces a matrix computed over multiple fields)
- pipeline (aggregates the output of other aggregations)
  - Each bucket may be sorted based on its _key, _count or its sub-aggregations.
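Why a terms aggregation can be inaccurate is easiest to see with a toy simulation. In this Python sketch each shard returns only its own top terms and the coordinating step merges them. The shard contents are made up, and using the requested size as the per-shard limit is a simplification chosen for the example.

```python
from collections import Counter


def terms_agg(shards, size, shard_size=None):
    """Merge each shard's top `shard_size` terms and return the global top `size`."""
    shard_size = shard_size or size
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top terms; the rest is never seen.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical per-shard term counts.
shards = [
    {"x": 10, "y": 6, "z": 5},
    {"w": 9, "z": 6, "y": 1},
]
print(terms_agg(shards, size=2))                # [('x', 10), ('w', 9)]
print(terms_agg(shards, size=2, shard_size=3))  # [('z', 11), ('x', 10)]
```

With two terms per shard, `z` is dropped on the first shard and ends up undercounted, although it is the most frequent term overall. Asking each shard for more terms fixes the result at the cost of more data being transferred.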
Nesting aggregations is very powerful.
// Syntax: aggregations or aggs
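A nested aggregation is just an `aggs` block inside another aggregation. The sketch below builds such a body in Python: a terms bucket with an avg metric computed per bucket. The helper name, the aggregation names, and the `state.keyword` field are assumptions for illustration; only `balance` appears in these notes.

```python
import json


def avg_per_term(term_field, avg_field, size=10):
    """Build a search body that groups by term_field and averages avg_field per bucket."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "group_by_term": {
                "terms": {"field": term_field, "size": size},
                # Nested aggregation: computed once for every bucket of the parent.
                "aggs": {
                    "average_value": {"avg": {"field": avg_field}}
                },
            }
        },
    }


body = avg_per_term("state.keyword", "balance")
print(json.dumps(body, indent=2))
```

The body would be sent with `GET /bank/_search`; `aggregations` can be written in place of `aggs`.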