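Our content team's services are built on Elasticsearch (ES), so I took some time to read through the code and review the related ES features along the way.
The core of search is the inverted index. ES has several major versions (2.x, 5.x, 6.x, and 7.x) and the APIs differ between them, so always check which version you are targeting.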
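Why
MySQL cannot do high-performance full-text search: LIKE is too inefficient (a pattern with a leading wildcard cannot use an ordinary index, so rows have to be scanned), and it does not support tokenized search.
ES also supports recall ranked by relevance score, whereas a MySQL query only tells you whether a record matches or not.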
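To make the contrast with LIKE concrete, here is a minimal sketch of an inverted index in plain Java. It is a toy model, not how ES or Lucene is implemented: the class name `InvertedIndexSketch`, the `analyze` helper, and the sample documents are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Toy analyzer (an assumption): lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index a document: record its id under every term it contains.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // One map lookup per term, instead of scanning every row the way LIKE '%term%' does.
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch full-text search");
        index.add(2, "MySQL LIKE search is slow");
        index.add(3, "Log analysis with Elasticsearch");
        System.out.println(index.lookup("search"));        // [1, 2]
        System.out.println(index.lookup("elasticsearch")); // [1, 3]
    }
}
```

The cost of a lookup depends on the number of distinct terms queried, not on the number of documents, which is the property that full-text engines build on.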
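Benefits
* Full-text search, structured search, and data analytics
* Horizontal scaling, sharding, and high availability (replicas)
Use cases
- Log data analysis
- Product search for e-commerce sites
- Report analysis
- Multi-table joins or complex multi-table queries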
Documentation
Docs: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/6.3/index.html?spm=a2c4g.11186623.2.15.39c97cdfk5qpQ9
Downloads: https://www.elastic.co/cn/downloads/
Plugins
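IK analyzer
Reference: https://www.jianshu.com/p/653f7b33e63c
Git repository: https://github.com/medcl/elasticsearch-analysis-ik
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v1.10.6/elasticsearch-analysis-ik-1.10.6.zip
Pinyin analyzer
Git repository:
https://github.com/medcl/elasticsearch-analysis-pinyin
Install command:
bin/plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v1.8.0/elasticsearch-analysis-pinyin-1.8.0.zip
Note: the plugin command is bin/plugin in ES 2.x and bin/elasticsearch-plugin in 5.x and later, and the plugin release must match your ES version.
If you cannot find a release for your version, build the plugin yourself and unzip the package into the plugins directory.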
elasticsearch-head
Git repository: https://github.com/mobz/elasticsearch-head
Installation:
for Elasticsearch 5.x, 6.x, and 7.x: site plugins are not supported. Run as a standalone server
for Elasticsearch 2.x: sudo elasticsearch/bin/plugin install mobz/elasticsearch-head
Enable CORS in the configuration
Append the following parameters to the end of elasticsearch.yml so that the head plugin can access ES:
http.cors.enabled: true
http.cors.allow-origin: "*"
Start (from the elasticsearch-head directory):
npm install
npm run start
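Field types defined by ES
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
index
Defaults to true. When set to false, the field is not added to the inverted index and cannot be used as a query condition.
store
ES keeps the original document in _source (unless you have disabled it).
By default, individual fields are not stored separately; their values are extracted from _source.
You can still store a field on its own by setting store: true.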
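Text vs. keyword
Starting with Elasticsearch 5.0 the string type changed significantly: it was split into two new data types, text for full-text search and keyword for keyword search.
Elasticsearch can search strings in two completely different ways. You can match against the whole value, which is keyword search (not_analyzed in pre-5.0 terms), or match against the individual terms produced by analysis, which is full-text search (analyzed).
Text: analyzed (tokenized) and then indexed. Supports fuzzy and exact queries, where an exact query matches an individual analyzed term rather than the original string. Does not support aggregations by default.
keyword: not analyzed; the whole value is indexed as a single term. Supports fuzzy and exact queries. Supports aggregations.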
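Query conditions
matchQuery: the search text is analyzed first, and the resulting terms are matched against the target field. By default, a document is returned if any one of those terms matches.
termQuery: the search text is not analyzed. It is matched against the target field as a single term, and a document is returned only if it equals an indexed term exactly.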
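The difference between the two is easiest to see on a toy model. The sketch below is plain Java, not the ES client: `MatchVsTermSketch`, its whitespace-and-lowercase `analyze` helper, and the sample value are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class MatchVsTermSketch {
    // Toy stand-in for an analyzer: lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Terms that end up in the index for an analyzed (text) field value.
    static Set<String> indexedTerms(String fieldValue) {
        return new HashSet<>(analyze(fieldValue));
    }

    // match: the query is analyzed too; a hit if ANY query term is indexed (OR semantics).
    static boolean match(Set<String> indexed, String query) {
        for (String term : analyze(query)) {
            if (indexed.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // term: the query is NOT analyzed; it must equal one indexed term exactly.
    static boolean term(Set<String> indexed, String query) {
        return indexed.contains(query);
    }

    public static void main(String[] args) {
        Set<String> indexed = indexedTerms("Quick Brown Fox");
        System.out.println(match(indexed, "quick dog"));      // true
        System.out.println(term(indexed, "quick"));           // true
        System.out.println(term(indexed, "Quick Brown Fox")); // false
        System.out.println(term(indexed, "Quick"));           // false
    }
}
```

The last two lines show the usual surprise: a term query against an analyzed field fails when the query text keeps its original casing or spans several words, because only the lowercased individual terms were indexed.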
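match_all
Matches every document.
match (full-text query)
By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. A match query runs roughly as follows:
1. Check the field type: is the field analyzed or not_analyzed?
2. Analyze the query string. If it yields a single term, the match query is executed as a single low-level term query.
3. Find matching documents: look the term up in the inverted index and retrieve the set of documents that contain it.
4. Score each document.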
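term (term-level query)
An exact-value query. The value can be a number, a date, a boolean, or a not_analyzed string (keyword in 5.x and later).
terms
Allows several values to be specified. A document matches if the field contains any one of the given values.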
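Wildcard
The wildcard query is a low-level, term-based query that lets you specify a pattern to match. The pattern is not a regular expression; it uses the standard shell wildcards:
? matches any single character
* matches zero or more characters
Because wildcard and regexp patterns can only be evaluated at query time, against the terms in the index, these queries are relatively slow. Use them with caution where high performance is required.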
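The two wildcard characters can be illustrated by translating a pattern into a `java.util.regex.Pattern`. This is only a sketch of the matching semantics under that assumption; it is not how ES evaluates wildcard queries internally, and `WildcardSketch` is a made-up name.

```java
import java.util.regex.Pattern;

class WildcardSketch {
    // Translate a shell-style wildcard: ? -> exactly one character, * -> zero or more characters.
    // Every other character is quoted so that it matches literally.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The whole term has to match the pattern, not just a part of it.
    static boolean matches(String wildcard, String term) {
        return toPattern(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("W?F*HW", "W1F 7HW")); // true
        System.out.println(matches("W?F*HW", "WF 7HW"));  // false: ? needs exactly one character
        System.out.println(matches("te*t", "tet"));       // true: * may match nothing
    }
}
```

A pattern like this has to be tested against candidate terms one by one, which is why a leading `*` or `?` is especially costly: no prefix is available to narrow down the terms to check.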
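Bool
Combines multiple query conditions into one query.
Fuzzy
If the field is a string type, the fuzzy query matches documents using an edit-distance algorithm.
The edit distance is computed between the query term we supply and the terms of the documents being searched. In older ES versions, if the field is a numeric or date type, the fuzzy query matches by adding and subtracting a margin around the field value; in recent versions a range query is the usual way to express that.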
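For reference, edit distance can be computed with the classic dynamic-programming recurrence. The sketch below implements plain Levenshtein distance (insert, delete, substitute). As far as I know, the ES fuzzy query by default also counts a transposition of two adjacent characters as a single edit, which this sketch does not do; `EditDistanceSketch` is a made-up name.

```java
class EditDistanceSketch {
    // Plain Levenshtein distance using two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b is the prefix length.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that prev always holds the row just completed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));    // 3
        System.out.println(levenshtein("surprize", "surprise")); // 1
        System.out.println(levenshtein("", "abc"));              // 3
    }
}
```

A fuzzy query with a maximum distance of 1 would therefore accept "surprize" for "surprise" but reject "kitten" for "sitting".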
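Shards and replicas
Shard: because ES is a distributed search engine, an index is usually split into several parts,
and these parts, distributed across different nodes, are the shards. ES manages and organizes shards automatically and rebalances them when necessary,
so users rarely need to worry about the details. A single shard can hold at most about 2.1 billion documents; this is a Lucene limit, not a configurable default.
Replica: before 7.0, ES by default created 5 primary shards per index, each with one replica shard (from 7.0 on the default is 1 primary shard).
In other words, such an index consists of 5 primary shards, and every primary shard has one copy.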
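How a document ends up on a particular shard can be sketched as a hash of its routing value (the document id by default) modulo the number of primary shards. The sketch below uses `String.hashCode()` as a stand-in; ES uses a Murmur3 hash, and newer versions refine the formula, so treat this as an approximation. `ShardRoutingSketch` and the sample ids are made up.

```java
class ShardRoutingSketch {
    // Simplified routing: shard = hash(routing) mod number_of_primary_shards.
    // String.hashCode() is an assumption standing in for the real hash function.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    // Total shard copies in the cluster: each primary plus its replicas.
    static int totalShardCopies(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }

    public static void main(String[] args) {
        int primaries = 5;
        for (String id : new String[] {"1", "2", "42", "department-7"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, primaries));
        }
        System.out.println(totalShardCopies(5, 1)); // 10
    }
}
```

Because the shard number depends on the number of primary shards, that number cannot simply be changed after the index is created, while the replica count can be adjusted at any time.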
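Common pitfalls
ES latency
refresh_interval defaults to 1 second; a document written within that interval is not visible to search until the next refresh.
Three ways to deal with this:
1. Handle it in the UI layer: after a successful write, update the UI directly instead of re-querying ES.
2. Add ?refresh=wait_for to the write request (not to the search): the request returns only after a refresh has made the change visible to search.
3. Set refresh=true on the write, which forces an immediate refresh. This is the most expensive option; in the Java client it is WriteRequest.RefreshPolicy.IMMEDIATE.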
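The visibility gap can be modelled with two lists: one for documents that were written but not yet refreshed, and one for documents that search can see. This is a deliberately simplified in-memory model with made-up names (`RefreshSketch`, `index`, `refresh`, `search`); it says nothing about how ES implements segments or the translog.

```java
import java.util.ArrayList;
import java.util.List;

class RefreshSketch {
    private final List<String> buffer = new ArrayList<>();     // written, not yet searchable
    private final List<String> searchable = new ArrayList<>(); // visible to search

    // Write a document; refreshNow = true mimics refresh=true on the write request.
    void index(String doc, boolean refreshNow) {
        buffer.add(doc);
        if (refreshNow) {
            refresh();
        }
    }

    // Mimics the periodic refresh that runs every refresh_interval.
    void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    boolean search(String doc) {
        return searchable.contains(doc);
    }

    public static void main(String[] args) {
        RefreshSketch es = new RefreshSketch();
        es.index("doc-1", false);
        System.out.println(es.search("doc-1")); // false: written but not refreshed yet
        es.refresh();
        System.out.println(es.search("doc-1")); // true
        es.index("doc-2", true);
        System.out.println(es.search("doc-2")); // true: refreshed as part of the write
    }
}
```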
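Updating an ES mapping
1. Set an alias when creating the index, and have the application access ES through the alias. The mapping of an existing field cannot be changed in place, so a mapping change means creating a new index with the new mapping, reindexing the data, and then pointing the alias at the new index.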
Reference code
public void createIndex() throws IOException {
    GetIndexRequest getIndexRequest = new GetIndexRequest();
    getIndexRequest.indices(elasticsearchConfig.getIndexName());
    // Check whether the index already exists
    boolean indexExists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
    if (indexExists) {
        // Drop the old index so it can be recreated with a fresh mapping
        DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest(elasticsearchConfig.getIndexName());
        restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
    }
    // ---- Create the index ----
    CreateIndexRequest request = new CreateIndexRequest(elasticsearchConfig.getIndexName());
    // Configure primary shards and replicas
    request.settings(Settings.builder()
            .put("index.number_of_shards", 5)
            .put("index.number_of_replicas", 1)
    );
    String mappingJsonStr = "{\"properties\":{\"id\":{\"type\":\"long\"},\"name\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},\"provinceCode\":{\"type\":\"keyword\"},\"cityCode\":{\"type\":\"keyword\"},\"areaCode\":{\"type\":\"keyword\"},\"tagIdList\":{\"type\":\"long\"}}}";
    // Create the mapping: name is analyzed with IK, the code fields are exact-match keywords
    request.mapping(ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), mappingJsonStr, XContentType.JSON);
    restHighLevelClient.indices().createAsync(request, RequestOptions.DEFAULT, new ActionListener<CreateIndexResponse>() {
        @Override
        public void onResponse(CreateIndexResponse response) {
            log.info("create index successful");
        }

        @Override
        public void onFailure(Exception e) {
            log.error("create index error", e);
        }
    });
}
/**
 * Full sync from the database to ES. The production data volume is small, so no pagination is needed.
 */
public void syncData() {
    try {
        SessionManager.changeSafeMode(SafeMode.KERNEL);
        long beginTime = System.currentTimeMillis();
        ResultWrapper<List<DepartmentBo>> allDepartmentBoList = departmentService.list();
        if (CollectionUtils.isEmpty(allDepartmentBoList.getData())) {
            return;
        }
        BulkRequest deleteRequest = new BulkRequest();
        BulkRequest bulkRequest = new BulkRequest();
        List<Long> departmentIdList = allDepartmentBoList.getData().stream().map(DepartmentBo::getId).collect(Collectors.toList());
        // Load tags and area info for all departments up front to avoid per-department lookups
        Map<Long, List<BizDataTagBo>> departmentBizDataTagBoMap = departmentService.getDepartmentBizDataTagBoMap(departmentIdList);
        ResultWrapper<List<DepartmentArea>> listDeparmentAreaWrapper = departmentService.findDepartmentAreaByDepartmentIdList(departmentIdList);
        Map<Long, DepartmentArea> areaMap = listDeparmentAreaWrapper.getData().stream().collect(Collectors.toMap(DepartmentArea::getDepartmentId, Function.identity()));
        allDepartmentBoList.getData().forEach(departmentBo -> {
            ESDepartmentBo esDepartmentBo = new ESDepartmentBo();
            DepartmentArea departmentArea = areaMap.get(departmentBo.getId());
            if (departmentArea != null) {
                esDepartmentBo.setAreaCode(departmentArea.getAreaCode());
                esDepartmentBo.setCityCode(departmentArea.getCityCode());
                esDepartmentBo.setProvinceCode(departmentArea.getProvinceCode());
            }
            esDepartmentBo.setId(departmentBo.getId());
            esDepartmentBo.setName(departmentBo.getName());
            if (departmentBizDataTagBoMap.containsKey(departmentBo.getId())) {
                esDepartmentBo.setTagIdList(departmentBizDataTagBoMap.get(departmentBo.getId()).stream().map(BizDataTagBo::getTagId).collect(Collectors.toList()));
            }
            bulkRequest.add(putData(esDepartmentBo));
            deleteRequest.add(deleteData(departmentBo.getId()));
        });
        // Make the newly indexed documents searchable as soon as the bulk call returns
        bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        // Delete the old documents first, then index the fresh ones
        restHighLevelClient.bulk(deleteRequest, RequestOptions.DEFAULT);
        restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        long endTime = System.currentTimeMillis();
        log.info("costTime:{}ms,syncdata size:{}", endTime - beginTime, allDepartmentBoList.getData().size());
    } catch (Exception e) {
        log.error("syncData error", e);
    }
}

// Build a delete request for one department document
private DeleteRequest deleteData(Long id) {
    return new DeleteRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(id));
}

// Build an index request; the department id doubles as the document id
public IndexRequest putData(ESDepartmentBo esDepartmentBo) {
    IndexRequest indexRequest = new IndexRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(esDepartmentBo.getId()));
    indexRequest.source(JSONObject.toJSONString(esDepartmentBo), XContentType.JSON);
    return indexRequest;
}
public class DeparementESServiceImpl implements DeparementESService {
    @Autowired
    private ElasticSearchConfig elasticsearchConfig;
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Override
    public ResultWrapper<Boolean> delete(List<Long> departmentIdList) {
        ResultWrapper<Boolean> resultWrapper = new ResultWrapper<Boolean>(0, "");
        if (CollectionUtils.isEmpty(departmentIdList)) {
            return resultWrapper;
        }
        BulkRequest deleteRequest = new BulkRequest();
        departmentIdList.forEach(id -> {
            deleteRequest.add(new DeleteRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(id)));
        });
        // Refresh immediately so the deletion is visible to the next search
        deleteRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        try {
            restHighLevelClient.bulk(deleteRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            log.error("delete error", e);
        }
        return resultWrapper;
    }

    @Override
    public ResultWrapper<Boolean> save(List<ESDepartmentBo> esDepartmentBoList) {
        ResultWrapper<Boolean> resultWrapper = new ResultWrapper<Boolean>(0, "");
        if (CollectionUtils.isEmpty(esDepartmentBoList)) {
            return resultWrapper;
        }
        BulkRequest bulkRequest = new BulkRequest();
        esDepartmentBoList.forEach(esDepartmentBo -> {
            IndexRequest indexRequest = new IndexRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(esDepartmentBo.getId()));
            indexRequest.source(JSONObject.toJSONString(esDepartmentBo), XContentType.JSON);
            bulkRequest.add(indexRequest);
        });
        bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        try {
            restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            log.error("save error", e);
        }
        return resultWrapper;
    }

    @Override
    public ResultWrapper<Boolean> update(Map<Long, Map<String, Object>> mapMap) {
        ResultWrapper<Boolean> resultWrapper = new ResultWrapper<Boolean>(0, "");
        // An empty bulk request fails validation, so return early for null or empty input
        if (mapMap == null || mapMap.isEmpty()) {
            return resultWrapper;
        }
        BulkRequest bulkRequest = new BulkRequest();
        mapMap.forEach((deparmentId, mapJson) -> {
            // Partial update: only the fields present in mapJson are changed
            UpdateRequest updateRequest = new UpdateRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(deparmentId));
            updateRequest.doc(mapJson);
            bulkRequest.add(updateRequest);
        });
        bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        try {
            restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            log.error("update error", e);
        }
        // Return the wrapper like delete() and save() do, instead of null
        return resultWrapper;
    }

    @Override
    public ResultWrapper<List<Long>> search(QueryDepartmentRequest queryDepartmentRequest) {
        ResultWrapper<List<Long>> resultWrapper = new ResultWrapper<List<Long>>(0, "");
        List<Long> depatementIdList = new ArrayList<>();
        resultWrapper.setData(depatementIdList);
        SearchRequest searchRequest = new SearchRequest(elasticsearchConfig.getIndexName());
        searchRequest.types(ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName());
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // All supplied conditions must hold (AND); "-1" means "no restriction" for the code fields
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
        if (StringUtils.isNotBlank(queryDepartmentRequest.getName())) {
            // name is a text field, so use an analyzed match query
            boolQuery.must(QueryBuilders.matchQuery("name", queryDepartmentRequest.getName()));
        }
        if (StringUtils.isNotBlank(queryDepartmentRequest.getProvinceCode()) && !("-1").equals(queryDepartmentRequest.getProvinceCode())) {
            boolQuery.must(QueryBuilders.termQuery("provinceCode", queryDepartmentRequest.getProvinceCode()));
        }
        if (StringUtils.isNotBlank(queryDepartmentRequest.getAreaCode()) && !("-1").equals(queryDepartmentRequest.getAreaCode())) {
            boolQuery.must(QueryBuilders.termQuery("areaCode", queryDepartmentRequest.getAreaCode()));
        }
        if (StringUtils.isNotBlank(queryDepartmentRequest.getCityCode()) && !("-1").equals(queryDepartmentRequest.getCityCode())) {
            boolQuery.must(QueryBuilders.termQuery("cityCode", queryDepartmentRequest.getCityCode()));
        }
        if (CollectionUtils.isNotEmpty(queryDepartmentRequest.getTagIdList())) {
            boolQuery.must(QueryBuilders.termsQuery("tagIdList", queryDepartmentRequest.getTagIdList()));
        }
        if (CollectionUtils.isNotEmpty(queryDepartmentRequest.getDepartmentIdList())) {
            boolQuery.must(QueryBuilders.termsQuery("id", queryDepartmentRequest.getDepartmentIdList()));
        }
        searchSourceBuilder.query(boolQuery);
        searchRequest.source(searchSourceBuilder);
        try {
            // Use the scroll API to page through all hits; each scroll context is kept alive for 1 minute
            Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
            searchRequest.scroll(scroll);
            SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            String scrollId = searchResponse.getScrollId();
            log.info("total:{}", searchResponse.getHits().getTotalHits());
            SearchHit[] searchHits = searchResponse.getHits().getHits();
            // Collect the current page, then fetch the next one until a page comes back empty
            while (searchHits != null && searchHits.length > 0) {
                for (SearchHit searchHit : searchHits) {
                    depatementIdList.add(Long.parseLong(searchHit.getId()));
                }
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll(scroll);
                searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = searchResponse.getScrollId();
                searchHits = searchResponse.getHits().getHits();
            }
            // Release the scroll context on the server
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId);
            restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            log.error("search error", e);
        }
        return resultWrapper;
    }
}