Elasticsearch notes

Posted on 2019-08-26 | Updated on: 2019-08-26 | Category: elastic

Notes on the official Elasticsearch documentation

Distributed document store

Data is stored as serialized JSON documents; each document holds its data as key-value pairs (fields).

Full-text search: documents are indexed so that they can be searched in near real time (An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.)
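A minimal sketch of the inverted-index idea quoted above, assuming a naive analyzer that lowercases text and splits on anything that is not a letter or digit (Elasticsearch's real analyzers are far more elaborate). The function names are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Naive analyzer: lowercase, then split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {1: "Mill Lane", 2: "Holmes Lane", 3: "Columbus Place"}
index = build_inverted_index(docs)
print(index["lane"])  # [1, 2]
```

A query for a word is then a dictionary lookup instead of a scan over every document.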

When dynamic mapping is enabled, Elasticsearch automatically detects and adds new fields to the index.

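The same field can be indexed in more than one way (for example as both text and keyword) so that it can be analyzed differently for different purposes.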

Aggregations operate alongside search requests: a single request can both search documents and aggregate the results.

Searching data

Use the REST APIs to manage the cluster and to index and search data (via an Elasticsearch client, the Kibana Developer Console, or the command line).

The Elasticsearch REST APIs support structured queries, full text queries, and complex queries that combine the two.

Full-text queries return results sorted by relevance.

Query DSL

SQL-style queries

Analyzing data

Aggregations

key metrics, patterns, and trends

Machine learning

Scalability and resilience

cluster

node

shard

Nodes can be added freely and shards are rebalanced automatically. As the cluster grows (or shrinks), Elasticsearch automatically migrates shards to rebalance the cluster.

index (logical group) -> shards (physical) -> nodes

graph LR
index(Index - logical group)-->shards
shards-->primary_shards
primary_shards-->replicas_shards(replica shards - copies of the primary that also serve reads)
primary_cluster-->CCR(Cross-cluster replication)
CCR-->replicated_cluster(replicated cluster - read-only)
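How a document ends up on one shard of an index: the Elasticsearch docs describe the routing formula `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document `_id`. The sketch below assumes that formula but uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers it prints will not match a real cluster. It does show why the number of primary shards cannot simply be changed on an existing index: a different modulus sends documents to different shards.

```python
import zlib


def route_to_shard(routing: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards (crc32 stands in for murmur3)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# With default routing, _routing is the document _id.
doc_ids = [str(i) for i in range(10)]
placement = {doc_id: route_to_shard(doc_id, 5) for doc_id in doc_ids}
print(placement)
```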

CAT API

$ curl -X GET "127.0.0.1:9200/_cat/health?v&pretty"
epoch      timestamp cluster        status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1566820997 12:03:17  cluster-name   green          15         9   6073 6039    0    0        0             0                  -                100.0%
$ curl -X GET "127.0.0.1:9200/_cat/indices?v&pretty" |more
health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-xxx-2019.08.22  1NcDlWKNQxKK529JwuQxFg   5   0     308686            0    548.3mb        548.3mb
green  open   bbbb-2019.07.02          NTzO-PQESb2YrrUPzs0fVA   5   0     161398            0       99mb           99mb
green  open   logstash-cccc-2019.08.26 -1DGNXaFRiqcowhhQ4Y2Tg   5   0    1364938            0    178.4mb        178.4mb
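The `?v` output above is a header row followed by whitespace-separated rows, which is easy to read but awkward to script against. A minimal Python sketch that turns such text into dictionaries, assuming no column is ever empty (for machine consumption the `_cat` APIs also accept `format=json`, which avoids this parsing entirely). `parse_cat_table` is a hypothetical helper name.

```python
def parse_cat_table(text):
    """Parse `_cat/...?v` output: a header row, then whitespace-separated rows."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   bbbb-2019.07.02          NTzO-PQESb2YrrUPzs0fVA   5   0     161398            0       99mb           99mb
green  open   logstash-cccc-2019.08.26 -1DGNXaFRiqcowhhQ4Y2Tg   5   0    1364938            0    178.4mb        178.4mb
"""
rows = parse_cat_table(sample)
print(rows[0]["index"], rows[0]["docs.count"])  # bbbb-2019.07.02 161398
```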

$ GET /customer/_doc/1  (fetch a single document)

bulk API (batch document operations)

Uploading data


# https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
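The `_bulk` endpoint used above expects newline-delimited JSON: an action line followed by a source line for each document, and the body must end with a newline. A minimal sketch that builds such a body in Python; `build_bulk_body` is a hypothetical helper, and the document values are taken from the sample `bank` data.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body("bank", [("1", {"account_number": 1, "balance": 39225})])
print(body, end="")
```

The result can be written to a file and sent with the same `curl ... --data-binary "@file"` command as above.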

Search, analyze, and machine learning

search

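Unlike SQL, there are no server-side cursors here: once the search results come back, Elasticsearch is done with the request.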

REST request URI

GET /bank/_search?q=*&sort=account_number:asc&pretty
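{
  "took" : 63, // how long the search took, in milliseconds
  "timed_out" : false, // whether the search timed out
  "_shards" : { // how many shards were searched, and how many succeeded, failed or were skipped
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : { // search results
    "total" : { // how many documents matched
        "value": 1000, // the count
        "relation": "eq" // "eq": value is exact; "gte": value is only a lower bound
    },
    "max_score" : null, // null because results are sorted by a field, not by score
    "hits" : [ {  // the actual results, 10 by default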
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}

REST request body

match_all

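GET /bank/_search
{   // uses the Query DSL
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  // alternative: "sort": { "balance": { "order": "desc" } }
  "size": 1,  // number of hits to return, default 10
  "from": 10, // offset of the first hit to return, default 0; useful for pagination
  "_source": ["account_number", "balance"] // return only these fields instead of the whole _source
}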
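`from` and `size` together implement pagination. A small Python sketch that converts a 1-based page number into these two parameters; `page_params` is a hypothetical helper, and the 10,000 limit assumes the default value of the `index.max_result_window` setting, which caps `from + size`.

```python
MAX_RESULT_WINDOW = 10_000  # assumed default of index.max_result_window


def page_params(page: int, page_size: int = 10) -> dict:
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": page_size}


print(page_params(11, page_size=1))  # {'from': 10, 'size': 1}, as in the request above
```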

match

GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}

match_phrase (match a whole phrase)

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
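The difference between `match` and `match_phrase`: with its default `or` operator, `match` accepts a document if any analyzed query term appears, while `match_phrase` (with the default slop of 0) requires all terms to appear next to each other and in order. A minimal Python sketch of both rules, assuming a simple lowercase word tokenizer in place of the standard analyzer; the addresses are made-up sample values.

```python
import re


def analyze(text):
    # Stand-in for the standard analyzer: lowercase word tokens.
    return re.findall(r"[0-9a-z]+", text.lower())


def match(field_text, query):
    """`match` with the default OR operator: any query term may appear."""
    tokens = set(analyze(field_text))
    return any(term in tokens for term in analyze(query))


def match_phrase(field_text, query):
    """`match_phrase` with slop 0: the terms must appear adjacent and in order."""
    tokens, terms = analyze(field_text), analyze(query)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


addresses = ["198 Mill Lane", "990 Mill Road", "880 Holmes Lane"]
print([a for a in addresses if match(a, "mill lane")])         # all three
print([a for a in addresses if match_phrase(a, "mill lane")])  # ['198 Mill Lane']
```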

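bool
1. must (AND: every clause must match)
2. should (OR: clauses that should match; matches raise the score)
3. must_not (NOT: no clause may match)
4. filter (must match, but does not affect the score)
  1. range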
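The matching rules of the four `bool` clause types can be sketched in Python with predicate functions standing in for the individual queries. Relevance scoring is not modelled. The sketch assumes the documented default for `minimum_should_match`: when there are no `must` or `filter` clauses, at least one `should` clause has to match; otherwise `should` clauses are optional. The account values are made up.

```python
def bool_matches(doc, must=(), should=(), must_not=(), filter_=()):
    """Return True if `doc` satisfies a bool query whose clauses are predicate functions."""
    if not all(clause(doc) for clause in must):      # must: AND
        return False
    if not all(clause(doc) for clause in filter_):   # filter: AND, but never scored
        return False
    if any(clause(doc) for clause in must_not):      # must_not: NOT
        return False
    if should and not must and not filter_:
        # Only should clauses: at least one has to match (minimum_should_match = 1).
        return any(clause(doc) for clause in should)
    # Otherwise should clauses are optional and would only raise the relevance score.
    return True


accounts = [
    {"age": 40, "state": "ID", "balance": 25000},
    {"age": 40, "state": "TX", "balance": 25000},
    {"age": 40, "state": "TX", "balance": 5000},
]
hits = [
    a for a in accounts
    if bool_matches(
        a,
        must=[lambda d: d["age"] == 40],
        must_not=[lambda d: d["state"] == "ID"],
        filter_=[lambda d: 20000 <= d["balance"] <= 30000],  # like a range filter
    )
]
print(hits)  # only the TX account with balance 25000
```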

analyze

term means an exact match: the search value is not run through the analyzer, so the field must contain the whole term exactly as given.
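Why a `term` query on an analyzed text field often finds nothing: the indexed terms went through the analyzer, but the `term` value does not. A minimal Python sketch, assuming a lowercase word tokenizer in place of the standard analyzer; this is also why exact-value lookups and terms aggregations use `keyword` fields such as `state.keyword`.

```python
import re


def analyze(text):
    # Stand-in analyzer: lowercase word tokens, roughly what a text field stores.
    return re.findall(r"[0-9a-z]+", text.lower())


indexed_terms = set(analyze("880 Holmes Lane"))  # {'880', 'holmes', 'lane'}


def term_query(value):
    """term: the value is NOT analyzed, so it must equal an indexed term exactly."""
    return value in indexed_terms


def match_query(value):
    """match: the value goes through the same analyzer before the lookup."""
    return any(t in indexed_terms for t in analyze(value))


print(term_query("Holmes"), match_query("Holmes"))  # False True
```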

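GET /bank/_search
{
  "size": 0, // do not return search hits, only aggregation results
  "aggs": {
    "group_by_state": {
      "terms": { // terms aggregation: one bucket per distinct value
        "field": "state.keyword",
        "size": 5, // number of buckets to return, default 10
        "order": { // sort buckets by the sub-aggregation below
          "average_balance": "desc"
        }
      },
      "aggs": { // nested (sub-)aggregation, computed per bucket
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}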

{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1000,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "bank",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "account_number": 1,
                    "balance": 39225,
                    "firstname": "Amber",
                    "lastname": "Duke",
                    "age": 32,
                    "gender": "M",
                    "address": "880 Holmes Lane",
                    "employer": "Pyrami",
                    "email": "amberduke@pyrami.com",
                    "city": "Brogan",
                    "state": "IL"
                }
            }
        ]
    },
    "aggregations": {
        "group_by_state": {
            "doc_count_error_upper_bound": -1,
            "sum_other_doc_count": 923,
            "buckets": [
                {
                    "key": "CO",
                    "doc_count": 14,
                    "average_balance": {
                        "value": 32460.35714285714
                    }
                },
                {
                    "key": "NE",
                    "doc_count": 16,
                    "average_balance": {
                        "value": 32041.5625
                    }
                },
                {
                    "key": "AZ",
                    "doc_count": 14,
                    "average_balance": {
                        "value": 31634.785714285714
                    }
                },
                {
                    "key": "MT",
                    "doc_count": 17,
                    "average_balance": {
                        "value": 31147.41176470588
                    }
                },
                {
                    "key": "VA",
                    "doc_count": 16,
                    "average_balance": {
                        "value": 30600.0625
                    }
                }
            ]
        }
    }
}
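What the request computes, sketched in Python on a handful of in-memory documents: group by a field, average another field per group, sort buckets by that average, keep `size` buckets, and count the remaining documents as `sum_other_doc_count`. In the response above the five buckets hold 14 + 16 + 14 + 17 + 16 = 77 documents, which leaves 1000 - 77 = 923. This single-process version is exact; a real cluster computes buckets per shard and merges them, and the docs note that `doc_count_error_upper_bound` is reported as -1 when ordering by a sub-aggregation because the error cannot be calculated. The helper name and sample documents are made up.

```python
from collections import defaultdict


def terms_ordered_by_avg(docs, field, metric_field, size=10):
    """Group docs by `field`, average `metric_field` per group, and keep the `size`
    buckets with the highest average (like ordering by an avg sub-aggregation, desc)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda b: b["avg"], reverse=True)
    top = buckets[:size]
    return {
        "buckets": top,
        # documents that fell into buckets beyond `size`
        "sum_other_doc_count": len(docs) - sum(b["doc_count"] for b in top),
    }


docs = [
    {"state": "CO", "balance": 100},
    {"state": "CO", "balance": 300},
    {"state": "NE", "balance": 250},
    {"state": "AZ", "balance": 50},
]
result = terms_ordered_by_avg(docs, "state", "balance", size=2)
print(result)
```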

Installing Elasticsearch

Elasticsearch ships with a bundled OpenJDK; set the JAVA_HOME environment variable to use a Java version you installed yourself.

[root@dab238b13031 elasticsearch]# ls -1
LICENSE.txt
NOTICE.txt
README.textile
bin       // executables
config    // configuration files
data      // shard data
jdk       // bundled JDK
lib
logs      // log files
modules
plugins   // plugins, one subdirectory per plugin
[root@dab238b13031 elasticsearch]# ls jdk/
bin  conf  include  jmods  legal  lib  release

The Docker image is based on CentOS 7.


docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.3.1
$ grep vm.max_map_count /etc/sysctl.conf
vm.max_map_count=262144

Running


# -d: run in the background (as a daemon); -E: set configuration on the command line
# The configuration file is config/elasticsearch.yml
./bin/elasticsearch -d -Ecluster.name=my_cluster -Enode.name=node_1

Check that it is running

curl -X GET "localhost:9200/?pretty"
{
  "name": "dab238b13031",
  "cluster_name": "docker-cluster",
  "cluster_uuid": "mNH6fmV0RE2WVvavM-9bFA",
  "version": {
    "number": "7.3.1",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "4749ba6",
    "build_date": "2019-08-19T20:19:25.651794Z",
    "build_snapshot": false,
    "lucene_version": "8.1.0",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}

Configuration

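Changes to the configuration files and to secure settings take effect only after a restart (except reloadable secure settings, see below).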

There are three configuration files, in the config directory by default; set ES_PATH_CONF=/path/to/my/config to use a different location.

elasticsearch.yml for configuring Elasticsearch
jvm.options for configuring Elasticsearch JVM settings
- In 7-9:-Xmx2g, the part before the colon is the range of Java versions the option applies to
log4j2.properties Log4j 2 for configuring Elasticsearch logging
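A sketch of how such version-prefixed `jvm.options` lines are interpreted, assuming the forms described in the 7.x docs: no prefix applies to every Java version, `8:` to exactly Java 8, `8-:` to Java 8 and later, and `7-9:` to Java 7 through 9; lines starting with `#` are comments. `option_for` is a hypothetical helper, not part of Elasticsearch.

```python
import re

# Optional "N:", "N-:" or "N-M:" prefix, then the JVM option itself.
LINE = re.compile(r"^(?:(\d+)(-)?(\d+)?:)?(-.*)$")


def option_for(line, java_version):
    """Return the JVM option on `line` if it applies to `java_version`, else None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised jvm.options line: {line!r}")
    lower, dash, upper, option = m.groups()
    if lower is None:
        return option            # no prefix: applies to every version
    lo = int(lower)
    if dash is None:
        hi = lo                  # "8:"   -> exactly Java 8
    elif upper is None:
        hi = None                # "8-:"  -> Java 8 and later
    else:
        hi = int(upper)          # "7-9:" -> Java 7 through 9
    in_range = java_version >= lo and (hi is None or java_version <= hi)
    return option if in_range else None


print(option_for("7-9:-Xmx2g", 8), option_for("7-9:-Xmx2g", 11))  # -Xmx2g None
```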

[root@dab238b13031 elasticsearch]# ls -1 bin/
elasticsearch
elasticsearch-certgen
elasticsearch-certutil
elasticsearch-cli
elasticsearch-croneval
elasticsearch-env
elasticsearch-enve
elasticsearch-keystore    // manages the keystore that holds secure settings
elasticsearch-migrate
elasticsearch-node
elasticsearch-plugin
elasticsearch-saml-metadata
elasticsearch-setup-passwords
elasticsearch-shard
elasticsearch-sql-cli
elasticsearch-sql-cli-7.3.1.jar
elasticsearch-syskeygen
elasticsearch-users
x-pack-env
x-pack-security-env
x-pack-watcher-env

Secure settings (reloadable ones are reloaded with POST _nodes/reload_secure_settings)

Add a string value

cat /file/containing/setting/value | bin/elasticsearch-keystore add --stdin the.setting.name.to.set

Add a file

bin/elasticsearch-keystore add-file the.setting.name.to.set /path/example-file.json

Remove a setting

bin/elasticsearch-keystore remove the.setting.name.to.remove

Index lifecycle management (ILM) policies

https://www.elastic.co/guide/en/elasticsearch/reference/current/using-policies-rollover.html

Machine learning (enabled by default)

xpack.ml.enabled

heap size

Default: 1 GB

Xms (minimum heap size) and Xmx (maximum heap size); set both to the same value

Set Xmx and Xms to no more than 50% of your physical RAM

A large heap leaves more room for internal caches, but can cause longer garbage-collection pauses.

If the heap takes too much of the RAM, too little is left for the operating system's filesystem cache.
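A small Python sketch of the sizing rule above: half of physical RAM, with `-Xms` equal to `-Xmx`. The 31 GB cap is an added assumption based on the Elasticsearch docs' advice to stay below the JVM's compressed object pointer threshold (near 32 GB, the exact value varies by JVM); `jvm_heap_flags` is a hypothetical helper.

```python
COMPRESSED_OOPS_LIMIT_GB = 31  # assumption: stay under the ~32 GB compressed-oops threshold


def suggested_heap_gb(physical_ram_gb: float) -> float:
    """Half of physical RAM, capped below the compressed-oops threshold."""
    return min(physical_ram_gb / 2, COMPRESSED_OOPS_LIMIT_GB)


def jvm_heap_flags(physical_ram_gb: float) -> list:
    heap = int(suggested_heap_gb(physical_ram_gb))  # round down to whole GB
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]  # equal min and max avoids heap resizing


print(jvm_heap_flags(8))    # ['-Xms4g', '-Xmx4g']
print(jvm_heap_flags(128))  # ['-Xms31g', '-Xmx31g']
```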

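System configuration

Raise the open-file limit for the elasticsearch user (line for /etc/security/limits.conf):
elasticsearch - nofile 65535

sudo swapoff -a  # disable swap

sysctl -w vm.max_map_count=262144  # raise the limit on memory-mapped areas

ulimit -u 4096  # raise the maximum number of threads

DNS caching: Elasticsearch overrides the JVM defaults and caches successful lookups for 60 seconds and failed lookups for 10 seconds.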

Aggregations

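Four families:

Bucketing (grouping)
1. terms
  1. size
    1. Each shard returns only its own top terms and the coordinating node merges them, so the counts are approximate rather than exact
    2. A size that does not fit in an int is rejected: Numeric value (1000000000000000000000) out of range of int (-2147483648 - 2147483647)\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@4af5e4f0; line: 16, column: 54]
2. composite
  1. Cannot be nested under another aggregation: [composite] aggregation cannot be used with a parent aggregation
Metric
1. avg / weighted avg (weighted_avg)
Matrix (produces a matrix computed over multiple fields)
Pipeline (aggregates the output of other aggregations)
1. Each bucket may be sorted based on its _key, _count or its sub-aggregations.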
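Why `terms` counts are approximate: each shard reports only its local top terms, so a term that is just below the cut on one shard loses those documents in the merge. A minimal Python simulation with two made-up shards; the `shard_size` parameter plays the role of the aggregation's `shard_size` option (which the docs say defaults to `size * 1.5 + 10`), and raising it makes the result exact here.

```python
from collections import Counter


def top_n(counts, n):
    # Highest count first; ties broken alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def exact_terms(shards, size):
    total = Counter()
    for shard in shards:
        total.update(shard)
    return top_n(total, size)


def approximate_terms(shards, size, shard_size):
    """Each shard reports only its local top `shard_size` terms; the merge sums those."""
    merged = Counter()
    for shard in shards:
        for term, count in top_n(shard, shard_size):
            merged[term] += count
    return top_n(merged, size)


shards = [{"x": 5, "y": 4, "z": 3}, {"z": 5, "w": 4, "y": 1}]
print(exact_terms(shards, 2))                      # [('z', 8), ('x', 5)]
print(approximate_terms(shards, 2, shard_size=2))  # [('x', 5), ('z', 5)]
```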

Aggregations can be nested, which makes them very powerful.

// Syntax: the key can be spelled aggregations or aggs
"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}

Untitled

Posted on 2019-08-26 | Updated on: 2019-08-26 | Category: nginx

nginx notes

Chapter 3: Practical scenarios

Static web serving

Cross-origin access

The server tells browsers which other origins (sites) are allowed to request its content.

add_header Access-Control-Allow-Origin http://example.com;
add_header Access-Control-Allow-Methods POST,GET,PUT,DELETE,OPTIONS;
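The headers only take effect in the browser, which compares them with the page's origin. A simplified Python sketch of that decision for a request without credentials, assuming GET, HEAD and POST are the "simple" methods that skip the preflight (real browsers also look at request headers and content type, which this sketch ignores). `browser_allows` is a hypothetical helper.

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}


def browser_allows(page_origin, method, response_headers):
    """Simplified browser-side CORS decision for a request without credentials."""
    allowed_origin = response_headers.get("Access-Control-Allow-Origin")
    if allowed_origin not in ("*", page_origin):
        return False
    if method.upper() in SIMPLE_METHODS:
        return True
    # Other methods (PUT, DELETE, ...) go through a preflight OPTIONS request,
    # whose Access-Control-Allow-Methods list must include the method.
    listed = response_headers.get("Access-Control-Allow-Methods", "")
    return method.upper() in {m.strip().upper() for m in listed.split(",")}


headers = {
    "Access-Control-Allow-Origin": "http://example.com",
    "Access-Control-Allow-Methods": "POST,GET,PUT,DELETE,OPTIONS",
}
print(browser_allows("http://example.com", "PUT", headers))    # True
print(browser_allows("http://other.example", "GET", headers))  # False
```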

Hotlink protection

nginx supports a simple approach based on the HTTP Referer header (module ngx_http_referer_module):

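referer_hash_bucket_size 64;
referer_hash_max_size 2048;
# Syntax: valid_referers none | blocked | server_names | string ...;
# $invalid_referer is set to an empty string if the Referer is valid, otherwise to "1"
# none: the Referer header is missing
# blocked: the header is present but its value was stripped by a firewall or proxy (it does not start with http:// or https://)
# server_names: the Referer matches one of the server names
# string ...: a server name with an optional URI prefix (may start or end with *); a regular expression starts with ~

# Note: the directive is valid_referers (plural), but the variable it sets is $invalid_referer
valid_referers none blocked server_names
               *.example.com example.* www.example.org/galleries/
               ~\.google\.;
if ($invalid_referer) {
    return 403;
}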
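A simplified Python sketch of how the `valid_referers` rules above decide the value of `$invalid_referer`. Assumptions of this sketch: a leading `*.` is treated as a suffix match and a trailing `.*` as a prefix match, regular expressions are matched against the text after `http://` or `https://` (as the nginx docs describe) and case-insensitively, and ports are ignored. `invalid_referer` and `PATTERNS` are hypothetical names, not nginx internals.

```python
import re

# Hypothetical patterns mirroring the valid_referers example above.
PATTERNS = ["*.example.com", "example.*", "www.example.org/galleries/", r"~\.google\."]


def _host_matches(host, pattern_host):
    host, pattern_host = host.lower(), pattern_host.lower()
    if pattern_host.startswith("*."):
        return host.endswith(pattern_host[1:])   # "*.example.com": any subdomain
    if pattern_host.endswith(".*"):
        return host.startswith(pattern_host[:-1])  # "example.*": any suffix
    return host == pattern_host


def invalid_referer(referer, server_names=(), allow_none=True, allow_blocked=True,
                    patterns=PATTERNS):
    """Return "" when the Referer is accepted and "1" otherwise, like $invalid_referer."""
    if referer is None:                          # "none": header missing
        return "" if allow_none else "1"
    scheme = re.match(r"https?://", referer, re.IGNORECASE)
    if not scheme:                               # "blocked": scheme stripped by a proxy
        return "" if allow_blocked else "1"
    rest = referer[scheme.end():]                # text after http:// or https://
    host, _, path = rest.partition("/")
    path = "/" + path
    if any(_host_matches(host, name) for name in server_names):
        return ""
    for pattern in patterns:
        if pattern.startswith("~"):
            if re.search(pattern[1:], rest, re.IGNORECASE):
                return ""
            continue
        pattern_host, slash, prefix = pattern.partition("/")
        if _host_matches(host, pattern_host) and path.startswith(slash + prefix):
            return ""
    return "1"


print(repr(invalid_referer("https://evil.net/img.jpg")))  # '1' -> nginx would return 403
```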

Proxy services

http

https

rtmp

imap/pop3/smtp

Forward proxy (client side)

Reverse proxy (server side)

ngx_http_proxy_module

Syntax:	`proxy_pass URL;` // the URL can use http or https, or point to a UNIX-domain socket
Default:	—
Context:	`location`, `if in location`, `limit_except`

Common Linux tools

Posted on 2019-08-23 | Updated on: 2019-08-23 | Category: linux

Network monitoring:

NetHogs is an open-source command-line tool (similar to the Linux top command) that shows network bandwidth usage per process or program in real time.

C Primer Plus study notes

Posted on 2019-08-20 | Updated on: 2019-08-20 | Category: c

C Primer Plus study notes

Chapter 1: Getting to know C


CentOS 7: disable suspend when the laptop lid is closed

Posted on 2019-08-13 | Updated on: 2019-08-13 | Category: centos

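CentOS 7: disable suspend when the laptop lid is closed
I have an old Dell laptop at home running CentOS 7.
With the default configuration, closing the lid suspends the laptop and drops the network connection. Keeping the lid open takes up space and lets dust get into the keyboard, so it is not very convenient.
systemd can handle ACPI events, and this default can be changed by modifying the behavior of systemd-logind.service.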
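A minimal Python sketch of the configuration change, assuming the common approach of setting `HandleLidSwitch=ignore` in the `[Login]` section of `/etc/systemd/logind.conf` and then running `systemctl restart systemd-logind`; the post's own steps are not included here and may differ. `set_logind_option` is a hypothetical helper that only transforms text, so it can be tried without touching the real file.

```python
import re


def set_logind_option(conf_text, key, value):
    """Return logind.conf text with `key=value` set, replacing an existing
    (possibly commented-out) entry or adding one under [Login]."""
    line = f"{key}={value}"
    # "=" right after the key, so HandleLidSwitchDocked does not match HandleLidSwitch.
    pattern = re.compile(rf"^#?[ \t]*{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(conf_text):
        return pattern.sub(lambda _m: line, conf_text, count=1)
    if "[Login]" in conf_text:
        return conf_text.replace("[Login]", f"[Login]\n{line}", 1)
    return conf_text + f"[Login]\n{line}\n"


sample = "[Login]\n#HandleLidSwitch=suspend\n#HandleLidSwitchDocked=ignore\n"
print(set_logind_option(sample, "HandleLidSwitch", "ignore"))
```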


git study

Posted on 2019-08-12 | Updated on: 2019-08-12 | Category: git

git study

How Git treats data:

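Git treats its data more like a stream of snapshots; a file that has not changed is stored as a link to the previous identical file.

Subversion stores each version as a set of differences from the original files.
Nearly every Git operation runs locally, so it is fast. Other centralized VCSs (CVCS) can do very little while offline.
Git stores SHA-1 checksums (40 hexadecimal characters) and uses them to index and reference content.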
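How Git derives the 40-character SHA-1 for a file's contents: it hashes a `blob <size>` header, a NUL byte, and then the raw bytes. This Python sketch computes the same ID as `git hash-object <file>`; identical content always yields the same ID, which is why an unchanged file in a new snapshot can simply point at the existing object. `git_blob_sha1` is an illustrative name.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Object ID Git assigns to file contents: SHA-1 of b"blob <size>\\0" + data."""
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```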

The three states in Git

Modified -> working directory
Staged (marked to go into the next commit) -> staging area
Committed (stored in the database) -> repository



Arch Linux: cleaning up pacman

Posted on 2019-08-06 | Updated on: 2019-08-06 | Category: arch

pacman

Remove cached package files

pacman -Scc

[Tue Aug 06 talen@tp-arch-tianfei pkg]$ sudo pacman -Scc
[sudo] password for talen:
Cache directory: /var/cache/pacman/pkg/
:: Do you want to remove ALL files from cache? [y/N] y
removing all files from cache...
Database directory: /var/lib/pacman/
:: Do you want to remove unused repositories? [Y/n] y
removing unused sync repositories...

Fixing Chinese filename display in the Linux terminal

Posted on 2019-08-05 | Updated on: 2019-08-05 | Category: linux

[talen@tp-arch-tianfei nginx]$ ls
'Mastering Nginx.pdf'
'Nginx 1 Web Server Implementation Cookbook.pdf'
'Nginx Essentials.pdf'
'Nginx From Beginner to Pro.pdf'
'Nginx HTTP Server, Third Edition.pdf'
'Nginx Module Extension.pdf'
'Nginx'$'\346\225\231\347\250\213\344\273\216\345\205\245\351\227\250\345\210\260\347\262\276\351\200\232''('$'\350\277\220\347\273\264\347\224\237\345\255\230\346\227\266\351\227\264''TTLSA'$'\345\207\272\345\223\201'').pdf'
 nginx
 nginx-kernel.txt
 nginx-pdf
 nginx.conf.info
 nginx.dot
'nginx: See Active connections _ Connections Per Seconds.html'
 nginx__try_files
 nginx_architecture.png
 nginx_conf.dot
 nginx_setup.dot
'nginx'$'\347\254\254\344\270\211\346\226\271\346\250\241\345\235\227''.txt'
'nginx'$'\347\274\226\350\257\221\345\217\202\346\225\260''.txt'
''$'\345\206\263\346\210\230''Nginx'$'\357\274\232'' '$'\347\263\273\347\273\237\345\215\267'' - '$'\351\253\230\346\200\247\350\203\275''Web'$'\346\234\215\345\212\241\345\231\250\350\257\246\350\247\243\344\270\216\350\277\220\347\273\264''(jb51.net).pdf'
''$'\345\256\236\346\210\230''Nginx_'$'\345\217\226\344\273\243''Apache'$'\347\232\204\351\253\230\346\200\247\350\203\275''Web'$'\346\234\215\345\212\241\345\231\250''.'$'\345\274\240\345\256\264''.'$'\346\211\253\346\217\217\347\211\210''.pdf'
''$'\345\256\236\346\210\230''nginx'
''$'\346\267\261\345\205\245\345\211\226\346\236\220''Nginx.pdf'
''$'\346\267\261\345\205\245\347\220\206\350\247\243''Nginx'$'\346\250\241\345\235\227\345\274\200\345\217\221\344\270\216\346\236\266\346\236\204\350\247\243\346\236\220''.pdf'
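The `$'\346\225\231...'` pieces above are the UTF-8 bytes of the Chinese characters, printed as octal escapes because `ls` typically treats them as non-printable in the current locale. A small Python sketch that decodes runs of such escapes back into text, to see what the names actually are; it does not strip the surrounding shell quote marks, and `decode_octal_escapes` is a hypothetical helper.

```python
import re


def decode_octal_escapes(text):
    """Replace runs of \\ooo octal byte escapes (as printed by ls) with the UTF-8 text they encode."""
    def to_text(match):
        octets = re.findall(r"\\([0-7]{3})", match.group(0))
        return bytes(int(o, 8) for o in octets).decode("utf-8", errors="replace")
    return re.sub(r"(?:\\[0-7]{3})+", to_text, text)


print(decode_octal_escapes(r"nginx\347\274\226\350\257\221\345\217\202\346\225\260.txt"))  # nginx编译参数.txt
```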


react study

Posted on 2019-08-01 | Updated on: 2019-08-01 | Category: react, front-end

React study notes

graph TB
JavaScript-->JavaScript_runtime
JavaScript_runtime-->Node.js
JavaScript-->v8
v8(Google Chrome V8 engine)-->Node.js
Node.js-->npm(npm package manager)
npm-->webpack(webpack bundler)
JSX-->babel(babel compiler)
npm-->create_react_app(create-react-app official scaffolding tool)
create_react_app-->React(React framework)
React_component(React component - three basic elements)
babel-->React_component
render-->React_component
React_component-->React
React-->vdom(virtual DOM)
React-->diff(diff algorithm)

out_data(external data)-->this.props
this.props-->render(render method)
inner_data(internal data)-->this.state
this.state-->render

add_dom(Step 1 - add a DOM container to the HTML)-->script_tag(Step 2 - add a script tag)
script_tag-->create_react(Step 3 - create a React component)

Data-driven programming

Syntax


Python errors QA

Posted on 2019-07-31 | Updated on: 2020-06-04 | Category: python

Python errors and their causes

Q1

2019-07-31 17:23:44,663 - jms_perm.py - INFO - perm_key: (['名字(zhi.ming)'], ['v-hostname.hx(10.10.100.100)'])
Traceback (most recent call last):
  File "jms_perm.py", line 338, in <module>
    perm_args_list = perm_process()
  File "jms_perm.py", line 318, in perm_process
    long_time_perms[perm_key] = perm
TypeError: unhashable type: 'list'

A1

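Python dict keys must be hashable. perm_key is a tuple of lists; a list is mutable and therefore unhashable, so hashing the tuple fails.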
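One way to fix it, sketched under the assumption that the two lists only need to identify the permission: convert them to tuples, which are hashable as long as their elements are. The rest of `jms_perm.py` is not shown, so `make_perm_key` is a hypothetical helper; if element order should not matter, `frozenset` would be the alternative.

```python
def make_perm_key(users, assets):
    """Build a dict key from two lists: tuples of strings are hashable, lists are not."""
    return (tuple(users), tuple(assets))


long_time_perms = {}
perm_key = make_perm_key(["名字(zhi.ming)"], ["v-hostname.hx(10.10.100.100)"])
long_time_perms[perm_key] = "perm"  # works: the key is now a tuple of tuples
print(perm_key)
```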

