侧边栏壁纸
博主头像
孔子说JAVA博主等级

成功只是一只沦落在鸡窝里的鹰,成功永远属于自信且有毅力的人!

  • 累计撰写 352 篇文章
  • 累计创建 135 个标签
  • 累计收到 10 条评论

目 录CONTENT

文章目录

ES教程17-ElasticSearch-SQL插件使用SQL查询示例详解

孔子说JAVA
2022-10-22 / 0 评论 / 0 点赞 / 129 阅读 / 9,000 字 / 正在检测是否收录...
广告 广告

ElasticSearch-SQL(后文简称es-sql)是ElasticSearch的一个插件,提供了es的类sql查询的相关接口,使用者可以使用sql进行es数据的查询,但有一定的局限性,没有原生的Query DSL那么强大,对于嵌套属性和某些函数的支持并不怎么好,但是平时用来查询下数据基本够用了。

1、数据准备

本教程使用 kibana 插件执行查询,es版本为7.17.3。

1.1 创建索引

PUT /book_info
{
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 2
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
		"fields":{"keyword": {"type": "keyword","ignore_above": 256}}
            },
            "author": {
                "type": "text",
		"fields":{"keyword": {"type": "keyword","ignore_above": 256}} 
            },
            "release_date": {
                "type": "date",
		"format":"yyyy-MM-dd"
				
            },
            "page_count": {
                "type": "integer"
            }
        }
    }
}

:可以对一个字段提供多种索引模式,同一个字段的值,一个分词,一个不分词。
如果要用到like这种模糊操作,但是type是text的,那么就还需要为该字段设置不分词的索引方式,即type=keyword,如上面的name和author字段。

1.2 批量增加文档

POST /book_info/_bulk
{ "index" : {  "_id" : "101" } }
{ "name" : "hello","author":"tom","release_date":"2020-01-26","page_count": 500 }
{ "index" : {  "_id" : "102" } }
{ "name" : "world","author":"Marry","release_date":"2018-01-26","page_count": 352 }
{ "index" : {  "_id" : "103" } }
{ "name" : "test","author":"test","release_date":"2019-03-26","page_count": 436 }
{ "index" : {  "_id" : "104" } }
{ "name" : "demo","author":"demo","release_date":"2020-06-26","page_count": 337 }
{ "index" : {  "_id" : "105" } }
{ "name" : "zhangsan","author":"zhangsan","release_date":"2015-05-21","page_count": 562 }
{ "index" : {  "_id" : "106" } }
{ "name" : "zhangsan02","author":"zhangsan","release_date":"2015-05-21","page_count": 562 }
# 这里需要换行

注意:语句之间不能换行,在语句的末尾需要换行。

1.3 DSL查询数据

# 查询所有数据
GET book_info/_search

image-1666317612778

2、查询语法

X-PACK SQL支持以REST API、SQL CLI客户端以及JDBC的形式连接Elasticsearch。

Mapping concepts across SQL and Elasticsearch

SQL Elasticsearch
column field
row document
table index
schema implicit
database cluster

2.1 查询语法

查询url格式如下:

/_sql?format=txt
  • format 指定返回格式,不设置则默认返回JSON (支持,csv,txt,json,yaml。

请求体body参数:

  • query:要查询的sql参数,如:“select * from user”
  • fetch_size:一个响应中要返回的最大行(或条目)数,默认是1000
  • filter:可选的Elasticsearch查询DSL,以进行其他过滤。
  • request_timeout:请求失败之前的超时,默认是90s
  • page_timeout:分页请求失败之前的超时,默认是45s
  • columnar:以列方式而不是基于行的方式返回结果。有效期为json,yaml,cbor和smile。默认是false。
  • field_multi_value_leniency:当遇到一个字段的多个值(默认值)或宽大处理时抛出异常,并从列表中返回第一个值(不保证将要返回的值-通常以自然的升序方式第一个)。默认是false。
  • index_include_frozen:是否在查询执行中包括冻结索引(默认值)。默认是false。

2.2 SQL语法

在 ElasticSearch 中使用 SQL 查询的语法与在数据库中使用基本一致,具体格式如下:

SELECT [TOP [ count ] ] select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]

3、查询示例

3.1 索引查询

查询索引

可以使用SHOW TABLES查看所有的表(ES中为索引)。

POST /_sql?format=txt
{
  "query": "SHOW TABLES"
}

image-1666318724940

查询字段

可以使用DESCRIBE语句查看表(ES中为索引)中有哪些字段,比如查看 book_info 表(索引)的字段,查询语句如下:

POST /_sql?format=txt
{
  "query": "DESCRIBE book_info"
}

image-1666318776923

3.2 条件查询

可以使用WHERE语句设置查询条件。逻辑操作符如下:

操作 操作符
并且 AND
或者 OR
否定 NOT

无条件查询

POST /_sql?format=txt
{
  "query": "SELECT * FROM book_info"
}

image-1666318152064

单条件查询

POST /_sql?format=txt
{
  "query": "SELECT * FROM book_info where name = 'hello'"
}

image-1666318275386

多条件查询

POST /_sql?format=txt
{
  "query": "SELECT * FROM book_info where name = 'hello' and author='tom'"
}

sql示例:

SELECT name FROM alicia WHERE salary >= 100 AND salary < 200 ORDER BY createTime LIMIT 5;
SELECT name FROM alicia WHERE salary < 200 OR salary = 100 ORDER BY createTime LIMIT 5;
SELECT name FROM alicia WHERE NOT salary = 200 LIMIT 5;

3.3 返回行数查询

limit限制行数

POST /_sql?format=txt
{
  "query": "SELECT * FROM book_info LIMIT 2"
}

fetch_size限制行数

POST /_sql?format=txt
{
  "query": "SELECT * FROM book_info",
  "fetch_size": 3
}

3.4 排序查询

可以使用ORDER BY语句对数据进行排序。可以查询指定的字段。

排序+限制行数

POST /_sql?format=txt
{
  "query": "SELECT * FROM book_info ORDER BY release_date DESC LIMIT 2"
}

排序+限制返回字段

POST /_sql?format=txt
{
  "query": "SELECT name,release_date,page_count FROM book_info ORDER BY release_date DESC  LIMIT 3"
}

3.5 比较查询

操作 操作符
等于 =
不等于 <> 或者 !=
比较 <,<=,>,>=
介于两个值之间的数据范围 BETWEEN
为空/不为空 IS NULL/IS NOT NULL
在 WHERE 子句中规定多个值 IN(, , …)

时间比较查询

POST /_sql?format=txt
{
  "query": "SELECT * FROM book_info WHERE release_date < '2020-01-01'  LIMIT 3"
}

时间区间查询

POST /_sql/translate
{
  "query": "SELECT name,author,release_date FROM book_info WHERE release_date >'2019-01-01' and release_date < '2020-01-01'  LIMIT 3"
}

sql示例:

SELECT name FROM alicia WHERE salary = 100 LIMIT 5;
SELECT name FROM alicia WHERE salary <> 200 ORDER BY createTime LIMIT 5;
SELECT name FROM alicia WHERE salary < 1000 ORDER BY createTime LIMIT 5;
SELECT name FROM alicia WHERE salary BETWEEN 100 AND 200 ORDER BY createTime;
SELECT name FROM alicia WHERE salary IS NOT NULL AND  IS NULL;
SELECT name FROM alicia WHERE salary IN (10000, 10001, 10002, 999) ORDER BY emp_no LIMIT 5;
SELECT name FROM alicia WHERE salary IN (10000, 10001, 10002, 999) ORDER BY emp_no LIMIT 5;

3.6 聚合与分组查询

可以使用GROUP BY语句对数据进行分组,统计出分组记录数量,最大age和平均salary等信息。

用途 函数名
返回平均值 AVG
返回匹配指定条件的行数 COUNT
返回非重复结果的数目 COUNT(DISTINCT)
返回指定的字段中第一个记录的值 FIRST/FIRST_VALUE
返回指定的字段中最后一个记录的值 LAST/LAST_VALUE
返回指定列的最大值 MAX
返回指定列的最小值 MIN
返回数值列的总数 SUM

注意:

  • FIRST/LAST不能用于类型为text的列,除非这个字段也另存为keyword。
  • MAX作用于text或者keyword字段时,被翻译为LAST/LAST_VALUE,因此,它不能在HAVING字句中使用。
  • MIN作用于text或者keyword字段时,被翻译为FIRST/FIRST_VALUE,因此,它不能在HAVING字句中使用。

聚合查询

POST /_sql?format=txt
{
  "query": "SELECT sum(page_count) FROM book_info"
}

POST /_sql?format=txt
{
  "query": "SELECT avg(page_count),sum(page_count),min(page_count),max(page_count),count(page_count) FROM book_info"
}

分组查询

POST /_sql?format=txt
{
  "query": "SELECT name,sum(page_count) FROM book_info group by name"
}

POST /_sql?format=txt
{
  "query": "SELECT author,count(*),max(page_count) as count FROM book_info group by author"
}

count查询

POST /_sql?format=txt
{
  "query": "SELECT count(*),count(distinct author) FROM book_info "
}

常用sql举例:

SELECT AVG(salary) AS avg FROM city;
SELECT COUNT(*) AS count FROM city;
SELECT COUNT(ALL name) AS count_all, COUNT(DISTINCT remark) count_distinct FROM city;
SELECT COUNT(DISTINCT name) unique_name, COUNT(salary) AS salary FROM city;
SELECT FIRST(name) FROM city;
SELECT createTime, FIRST(name) FROM city GROUP BY createTime ORDER BY createTime;
SELECT LAST(name) FROM city;
SELECT createTime, LAST(name) FROM city GROUP BY createTime ORDER BY createTime;
SELECT MAX(salary) AS max FROM city;
SELECT MIN(salary) AS min FROM city;
SELECT SUM(salary) AS sum FROM city;

统计函数

  • KURTOSIS只能直接在字段上使用
  • SKEWNESS只能直接在字段上使用
用途 函数名
峰度(描述总体中所有取值分布形态陡缓程度) KURTOSIS
中位数绝对偏差 MAD
百分位数 PERCENTILE
百分等级 PERCENTILE_RANK
偏态系数(统计数据分布偏斜方向和程度) SKEWNESS
总体标准方差 STDDEV_POP
平方和 SUM_OF_SQUARES
总体方差 VAR_POP

sql示例:

SELECT MIN(salary) AS min, MAX(salary) AS max, KURTOSIS(salary) AS k FROM city;
SELECT MIN(salary) AS min, MAX(salary) AS max, AVG(salary) AS avg, MAD(salary) AS mad FROM city;
SELECT name, PERCENTILE(salary, 95) AS "95th" FROM city GROUP BY name;
SELECT name, PERCENTILE_RANK(salary, 65000) AS rank FROM city GROUP BY name;
SELECT MIN(salary) AS min, MAX(salary) AS max, SKEWNESS(salary) AS s FROM city;
SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_POP(salary) AS stddev FROM city;
SELECT MIN(salary) AS min, MAX(salary) AS max, SUM_OF_SQUARES(salary) AS sumsq FROM city;
SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_POP(salary) AS varpop FROM city;

3.7 二次筛选

使用 HAVING 语句对分组数据进行二次筛选,比如筛选分组sum数量大于500的前2条记录。

POST /_sql?format=txt
{
  "query": "SELECT name,sum(page_count) FROM book_info group by name HAVING sum(page_count)>500 LIMIT 2"
}

3.8 类型转换

操作 操作符
类型转换 ::

sql示例:

SELECT '123'::long AS long;

3.9 LIKE 和 RLIKE 运算符

即使在过滤时,LIKE/RLIKE是一种有效的方案,但是全文搜索谓词 MATCH 和 QUERY 更快更强大,更适合作为首选方案。

  • 如果与LIKE / RLIKE一起使用的字段没有精确的(keyword类型的)非归一化子字段,Elasticsearch SQL将无法运行查询。如果该字段是精确字段或者有精确的子字段,它将按原样使用它,或者即使未在语句中明确指定,它也会自动使用精确的子字段。

sql示例:

SELECT name FROM alicia WHERE name LIKE 'ali%'; 
SELECT name FROM city WHERE name RLIKE 'ali.*';

3.10 SQL Translate API

将SQL语句转换为Elasticsearch查询语法。

POST /_xpack/sql/translate
{
    "query": "SELECT * FROM library ORDER BY page_count DESC",
    "fetch_size": 10
}

4、支持的函数

使用SQL查询ES中的数据,不仅可以使用一些SQL中的函数,还可以使用一些ES中特有的函数。

4.1 SHOW FUNCTIONS

我们可以使用SHOW FUNCTIONS语句查看所有支持的函数,比如搜索所有带有DATE字段的函数可以使用如下语句:

POST /_sql?format=txt
{
  "query": "SHOW FUNCTIONS LIKE '%DATE%'"
}

image-1666320104547

4.2 全文搜索函数

全文搜索函数是ES中特有的,当使用 MATCH 或 QUERY 函数时,会启用全文搜索功能,SCORE函数可以用来统计搜索评分。

(1)使用MATCH函数查询 author 中包含 zhangsan 的记录。

POST /_sql?format=txt
{
  "query": "SELECT author,name,page_count,SCORE() FROM book_info WHERE MATCH(author,'zhangsan') LIMIT 10"
}

image-1666320469783

(2)使用QUERY函数查询 author 中包含 zhangsan 的记录。

POST /_sql?format=txt
{
  "query": "SELECT author,name,page_count,SCORE() FROM book_info WHERE QUERY('author:zhangsan') LIMIT 10"
}

image-1666320507954

0

评论区