Elasticsearch Notes


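The company's content team builds its services on top of ES, so I took some time to read through the code and review the relevant ES features along the way.
The core of search is the inverted index. ES has several major versions (2.x, 5.x, 6.x, and 7.x), and the APIs differ between them, so always check which version you are working against.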
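
To make the inverted index concrete, here is a minimal sketch in plain Java. It is not how ES or Lucene implement it; the class name InvertedIndexSketch and the toy analyzer (lowercase, then split on anything that is not a letter or digit) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration class, not part of any ES API.
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Toy analyzer (an assumption): lowercase, split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Index a document: record its id under every term it contains.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up one term: a single map access instead of scanning every document.
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Quick brown fox");
        index.add(2, "Lazy brown dog");
        System.out.println(index.lookup("brown")); // [1, 2]
        System.out.println(index.lookup("fox"));   // [1]
    }
}
```

The point of the structure is that a lookup cost depends on the term, not on the number of documents, which is what a LIKE scan cannot offer.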

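Why

MySQL is a poor fit for high-performance full-text search: LIKE is slow (a pattern with a leading wildcard cannot use an index), and it does not tokenize the text it matches against.
ES also ranks the recalled documents by relevance score, whereas an ordinary MySQL query only tells you whether a row matches or not.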

Benefits

* Full-text search, structured search, and analytics
* Horizontal scaling, sharding, and high availability (replicas)

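Scenarios

Pinyin analysis

Git repository:
https://github.com/medcl/elasticsearch-analysis-pinyin
Install command (ES 2.x syntax; from 5.x on the script is bin/elasticsearch-plugin):
bin/plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v1.8.0/elasticsearch-analysis-pinyin-1.8.0.zip

If no release matches your ES version, build the plugin yourself and unzip the package into the plugins directory.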

elasticsearch-head

Git repository: https://github.com/mobz/elasticsearch-head
Installation:
For Elasticsearch 5.x, 6.x, and 7.x: site plugins are not supported; run it as a standalone server.
For Elasticsearch 2.x: sudo elasticsearch/bin/plugin install mobz/elasticsearch-head
Enable CORS in the configuration:
Append the following settings to the end of elasticsearch.yml so that the head plugin can reach ES:
http.cors.enabled: true
http.cors.allow-origin: "*"
Start (from the elasticsearch-head directory):
npm install
npm run start

Field types defined by ES

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

index

Defaults to true. When set to false, the field is not added to the inverted index and cannot be used as a query condition.

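store

ES keeps the original document in _source (unless you disable it).
By default, individual fields are not stored separately; their values are extracted from _source when needed.
You can still store a field on its own by setting store: true in its mapping.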

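Text vs. keyword

Since Elasticsearch 5.0 the string type has been removed and split into two new data types: text for full-text search and keyword for keyword (exact-value) search.
ES searches strings in two completely different ways. You can match against the whole value as a single term, which is keyword search (not_analyzed in pre-5.0 terminology), or match against the individual terms produced by the analyzer, which is full-text search (analyzed).
 text: analyzed into terms, which are then indexed
       supports full-text, fuzzy, and term-level queries (term-level queries run against the analyzed terms, not the original string)
       does not support aggregations by default
 keyword: not analyzed; the whole value is indexed as a single term
       supports fuzzy and exact queries
       supports aggregations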

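Query conditions

matchQuery: analyzes the search text into terms and matches them against the target field. By default a document is returned if any one of the terms matches.
termQuery: does not analyze the search text. The text is compared as a whole with the terms indexed for the field, and the document is returned only on an exact match.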
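
The difference between the two can be sketched in plain Java. This is only a model of the behavior described above, not the ES implementation; the class MatchVsTermSketch, its method names, and the toy analyzer are assumptions for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class, not part of any ES API.
class MatchVsTermSketch {
    // Toy analyzer (an assumption): lowercase, split on non-alphanumeric characters.
    static Set<String> analyze(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // term-style: the query string is compared as-is with the indexed terms.
    static boolean termQuery(Set<String> indexedTerms, String query) {
        return indexedTerms.contains(query);
    }

    // match-style: the query is analyzed first; any matching term is a hit (OR).
    static boolean matchQuery(Set<String> indexedTerms, String query) {
        for (String term : analyze(query)) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A text field is analyzed at index time; a keyword field keeps one term.
        Set<String> textField = analyze("Quick brown fox");
        Set<String> keywordField = Collections.singleton("Quick brown fox");

        System.out.println(matchQuery(textField, "quick dog"));         // true
        System.out.println(termQuery(textField, "Quick brown fox"));    // false
        System.out.println(termQuery(textField, "quick"));              // true
        System.out.println(termQuery(keywordField, "Quick brown fox")); // true
    }
}
```

The second line is the classic surprise: a term query for the original string finds nothing on a text field, because only the lowercased terms were indexed.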

match_all

Matches every document.

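match (full-text query)

By default ES sorts the result set by relevance score, that is, by how well each document matches the query. A match query runs roughly as follows:
1. Check the field type: is the field analyzed or not_analyzed?
2. Analyze the query string. If it yields a single term, the match query is executed as a single low-level term query.
3. Find the matching documents: look the term up in the inverted index and fetch the set of documents that contain it.
4. Score each document.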

term (term-level query)

Exact matching. A term query matches exact values: numbers, dates, booleans, or not_analyzed strings (keyword fields from 5.x on).

terms

Allows several values to be specified for matching. A document satisfies the query if the field contains any one of the given values.

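Wildcard

A wildcard query is a low-level, term-based query that matches terms against a wildcard pattern (not a full regular expression; that is what the regexp query is for). It uses the standard shell wildcards:
? matches any single character
* matches zero or more characters
Wildcard and regexp patterns can only be evaluated at query time against the terms in the index, so these queries are comparatively slow, especially with a leading wildcard. Use them with care where performance matters.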
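
A rough sketch of the two wildcards and of why the query is expensive, in plain Java. The class WildcardSketch and its methods are hypothetical; the sketch translates the pattern to java.util.regex and tests every term, which models the worst case (a leading wildcard), not what ES does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of any ES API.
class WildcardSketch {
    // Translate shell-style wildcards into a regular expression:
    // '?' -> exactly one character, '*' -> zero or more characters,
    // everything else is quoted and matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Every term is tested against the pattern, so the cost grows with the
    // number of distinct terms rather than with a single index lookup.
    static List<String> matchingTerms(Collection<String> terms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("beijing", "berlin", "boston");
        System.out.println(matchingTerms(terms, "be*"));    // [beijing, berlin]
        System.out.println(matchingTerms(terms, "b?ston")); // [boston]
    }
}
```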

Bool

Combines multiple query clauses (must, should, must_not, filter) into one query.

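Fuzzy

On string fields, a fuzzy query matches documents by edit distance: the number of single-character edits needed to turn the query term into a term found in the index.
On numeric or date fields, older ES versions (such as 2.x) treated fuzziness as a plus/minus range around the given value; on newer versions use a range query for that instead.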
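
Edit distance itself is easy to sketch. The code below computes the plain Levenshtein distance (insert, delete, substitute) with the usual dynamic-programming table; the class name EditDistanceSketch is hypothetical. As far as I know, ES by default also counts a swap of two adjacent characters as a single edit and caps the allowed distance at 2, which this simplified sketch does not model.

```java
// Hypothetical illustration class, not part of any ES API.
class EditDistanceSketch {
    // Plain Levenshtein distance, keeping only two rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b: j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions turn a[0..i) into the empty string
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + substitution);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("elastic", "elastc"));  // 1
        System.out.println(levenshtein("kitten", "sitting"));  // 3
    }
}
```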

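Shards and replicas

Shard: ES is a distributed search engine, so an index is normally split into parts, and these parts, spread across different nodes, are the shards. ES manages and organizes shards automatically and rebalances them when necessary,
 so users rarely need to worry about the details. A single shard can hold at most about 2.1 billion documents (a Lucene limit, not a configurable default).
Replica: before 7.0, ES created 5 primary shards per index by default, each with one replica shard (from 7.0 on the default is 1 primary shard with 1 replica).
 In other words, with the old defaults every index consists of 5 primary shards, and each primary shard has one copy.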
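
How a document ends up on a particular shard can be sketched as follows. ES routes by hashing the routing value (the document id by default) modulo the number of primary shards; the sketch below uses String.hashCode as a stand-in for the hash function ES actually uses, and the class ShardRoutingSketch is hypothetical.

```java
// Hypothetical illustration class, not part of any ES API.
class ShardRoutingSketch {
    // shard = hash(routing) mod number_of_primary_shards.
    // String.hashCode is only a stand-in (an assumption) for the real hash.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 5;
        for (String id : new String[] {"1", "2", "3", "4", "5"}) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, primaryShards));
        }
    }
}
```

Because the shard number depends on the primary shard count, changing that count would send existing ids to different shards. This is why the number of primary shards is fixed when the index is created, while the number of replicas can be changed later.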

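Common pitfalls

ES latency

refresh_interval defaults to 1 second; a document written within that interval is not visible to search until the next refresh.
Three ways to deal with this:
1. Handle it in the UI layer: after a successful write, update the UI directly instead of reading the result back from ES.
2. Add ?refresh=wait_for to the write request (not the search request): the request returns only after a refresh has made the change visible to search.
3. Set refresh=true on the write, which forces a refresh immediately (more expensive under heavy write load).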

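Updating an ES mapping

1. Set an alias when creating the index and have clients use the alias. The mapping of an existing field cannot be changed in place, so the usual route is to create a new index with the new mapping, reindex the data, and then point the alias at the new index.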

Reference code

 public void createIndex() throws IOException {
        String indexName = elasticsearchConfig.getIndexName();
        GetIndexRequest getIndexRequest = new GetIndexRequest();
        getIndexRequest.indices(indexName);
        // Drop the index first if it already exists
        boolean indexExists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        if (indexExists) {
            DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest(indexName);
            restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
        }

        /************* Create the index **************/
        CreateIndexRequest request = new CreateIndexRequest(indexName);
        // Configure primary shards and replicas
        request.settings(Settings.builder()
                .put("index.number_of_shards", 5)
                .put("index.number_of_replicas", 1)
        );
        String mappingJsonStr = "{\"properties\":{\"id\":{\"type\":\"long\"},\"name\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},\"provinceCode\":{\"type\":\"keyword\"},\"cityCode\":{\"type\":\"keyword\"},\"areaCode\":{\"type\":\"keyword\"},\"tagIdList\":{\"type\":\"long\"}}}";
        // Register the mapping for the type (typed mappings are a 6.x-era API)
        request.mapping(ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), mappingJsonStr, XContentType.JSON);
        // createAsync returns immediately; the listener fires once the index has been created or the request has failed
        restHighLevelClient.indices().createAsync(request, RequestOptions.DEFAULT, new ActionListener<CreateIndexResponse>() {
            @Override
            public void onResponse(CreateIndexResponse response) {
                log.info("create index successful");
            }

            @Override
            public void onFailure(Exception e) {
                log.error("create index error", e);
            }
        });
    }

    /**
     * The production data set is small, so no paging is needed.
     */
    public void syncData() {
        try {
            SessionManager.changeSafeMode(SafeMode.KERNEL);
            long beginTime = System.currentTimeMillis();
            ResultWrapper<List<DepartmentBo>> allDepartmentBoList = departmentService.list();
            if (CollectionUtils.isEmpty(allDepartmentBoList.getData())) {
                return;
            }
            BulkRequest deleteRequest = new BulkRequest();
            BulkRequest bulkRequest = new BulkRequest();
            List<Long> departmentIdList = allDepartmentBoList.getData().stream().map(DepartmentBo::getId).collect(Collectors.toList());
            Map<Long, List<BizDataTagBo>> departmentBizDataTagBoMap = departmentService.getDepartmentBizDataTagBoMap(departmentIdList);

            ResultWrapper<List<DepartmentArea>> listDeparmentAreaWrapper = departmentService.findDepartmentAreaByDepartmentIdList(departmentIdList);

            Map<Long, DepartmentArea> areaMap = listDeparmentAreaWrapper.getData().stream().collect(Collectors.toMap(DepartmentArea::getDepartmentId, Function.identity()));
            allDepartmentBoList.getData().forEach(departmentBo -> {
                DepartmentArea departmentArea = areaMap.get(departmentBo.getId());
                ESDepartmentBo esDepartmentBo = new ESDepartmentBo();
                if (areaMap.containsKey(departmentBo.getId())) {
                    esDepartmentBo.setAreaCode(departmentArea.getAreaCode());
                    esDepartmentBo.setCityCode(departmentArea.getCityCode());
                    esDepartmentBo.setProvinceCode(departmentArea.getProvinceCode());
                }
                esDepartmentBo.setId(departmentBo.getId());
                esDepartmentBo.setName(departmentBo.getName());
                if (departmentBizDataTagBoMap.containsKey(departmentBo.getId())) {
                    esDepartmentBo.setTagIdList(departmentBizDataTagBoMap.get(departmentBo.getId()).stream().map(BizDataTagBo::getTagId).collect(Collectors.toList()));
                }
                bulkRequest.add(putData(esDepartmentBo));
                deleteRequest.add(deleteData(departmentBo.getId()));
            });


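            bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
            // Each IndexRequest overwrites the document with the same id, so this delete pass is redundant for ids that are re-indexed right after
            restHighLevelClient.bulk(deleteRequest, RequestOptions.DEFAULT);
            // A bulk call can succeed as a whole while individual items fail, so check the response
            BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            if (bulkResponse.hasFailures()) {
                log.error("syncData bulk index failures: {}", bulkResponse.buildFailureMessage());
            }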
            long endTime = System.currentTimeMillis();
            log.info("costTime:{}ms,syncdata size:{}", endTime - beginTime, allDepartmentBoList.getData().size());
        } catch (Exception e) {
            log.error("syncData error", e);
        }
    }

    private DeleteRequest deleteData(Long id) {
        DeleteRequest deleteRequest = new DeleteRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(id));
        return deleteRequest;
    }

    public IndexRequest putData(ESDepartmentBo esDepartmentBo) {
        IndexRequest indexRequest = new IndexRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(esDepartmentBo.getId()));
        indexRequest.source(JSONObject.toJSONString(esDepartmentBo), XContentType.JSON);
        return indexRequest;
    }

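// Elasticsearch client imports (6.x package names); project-specific imports are omitted
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.action.support.WriteRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class DeparementESServiceImpl implements DeparementESService {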
    @Autowired
    private ElasticSearchConfig elasticsearchConfig;
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Override
    public ResultWrapper<Boolean> delete(List<Long> departmentIdList) {
        ResultWrapper<Boolean> resultWrapper = new ResultWrapper<Boolean>(0, "");
        if (CollectionUtils.isEmpty(departmentIdList)) {
            return resultWrapper;
        }
        BulkRequest deleteRequest = new BulkRequest();
        departmentIdList.forEach(id -> {
            deleteRequest.add(new DeleteRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(id)));
        });
        deleteRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        try {
            restHighLevelClient.bulk(deleteRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            log.error("delete error", e);
        }
        return resultWrapper;
    }

    @Override
    public ResultWrapper<Boolean> save(List<ESDepartmentBo> esDepartmentBoList) {
        ResultWrapper<Boolean> resultWrapper = new ResultWrapper<Boolean>(0, "");
        if (CollectionUtils.isEmpty(esDepartmentBoList)) {
            return resultWrapper;
        }
        BulkRequest bulkRequest = new BulkRequest();
        esDepartmentBoList.forEach(esDepartmentBo -> {
            IndexRequest indexRequest = new IndexRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(esDepartmentBo.getId()));
            indexRequest.source(JSONObject.toJSONString(esDepartmentBo), XContentType.JSON);
            bulkRequest.add(indexRequest);
        });
        bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        try {
            restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            log.error("save error", e);
        }
        return resultWrapper;
    }

    @Override
    public ResultWrapper<Boolean> update(Map<Long, Map<String, Object>> mapMap) {
        ResultWrapper<Boolean> resultWrapper = new ResultWrapper<Boolean>(0, "");
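        // An empty BulkRequest fails validation, so return early for empty input as well
        if (mapMap == null || mapMap.isEmpty()) {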
            return resultWrapper;
        }
        BulkRequest bulkRequest = new BulkRequest();
        mapMap.forEach((deparmentId,mapJson) -> {
            UpdateRequest updateRequest = new UpdateRequest(elasticsearchConfig.getIndexName(), ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName(), String.valueOf(deparmentId));
            updateRequest.doc(mapJson);
            bulkRequest.add(updateRequest);
        });
        bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        try {
            restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            log.error("update error", e);
        }
        // Return the wrapper like the other methods do, not null
        return resultWrapper;
    }

    @Override
    public ResultWrapper<List<Long>> search(QueryDepartmentRequest queryDepartmentRequest) {
        ResultWrapper<List<Long>> resultWrapper = new ResultWrapper<List<Long>>(0, "");
        List<Long> depatementIdList = new ArrayList<>();
        resultWrapper.setData(depatementIdList);
        SearchRequest searchRequest = new SearchRequest(elasticsearchConfig.getIndexName());
        searchRequest.types(ElasticSeachTypeEnum.DEPARTMENT_TAG.getTypeName());
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
        // Full-text match on the analyzed name field
        if (StringUtils.isNotBlank(queryDepartmentRequest.getName())) {
            boolQuery.must(QueryBuilders.matchQuery("name", queryDepartmentRequest.getName()));
        }

        // Exact matches on the keyword fields; a code of "-1" is skipped
        if (StringUtils.isNotBlank(queryDepartmentRequest.getProvinceCode()) && !("-1").equals(queryDepartmentRequest.getProvinceCode())) {
            boolQuery.must(QueryBuilders.termQuery("provinceCode", queryDepartmentRequest.getProvinceCode()));
        }
        if (StringUtils.isNotBlank(queryDepartmentRequest.getAreaCode()) && !("-1").equals(queryDepartmentRequest.getAreaCode())) {
            boolQuery.must(QueryBuilders.termQuery("areaCode", queryDepartmentRequest.getAreaCode()));
        }
        if (StringUtils.isNotBlank(queryDepartmentRequest.getCityCode()) && !("-1").equals(queryDepartmentRequest.getCityCode())) {
            boolQuery.must(QueryBuilders.termQuery("cityCode", queryDepartmentRequest.getCityCode()));
        }
        // terms query: the document matches if the field contains any of the given values
        if (CollectionUtils.isNotEmpty(queryDepartmentRequest.getTagIdList())) {
            boolQuery.must(QueryBuilders.termsQuery("tagIdList", queryDepartmentRequest.getTagIdList()));
        }
        if (CollectionUtils.isNotEmpty(queryDepartmentRequest.getDepartmentIdList())) {
            boolQuery.must(QueryBuilders.termsQuery("id", queryDepartmentRequest.getDepartmentIdList()));
        }
        searchSourceBuilder.query(boolQuery);
        searchRequest.source(searchSourceBuilder);
        Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
        searchRequest.scroll(scroll);
        String scrollId = null;
        try {
            SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            log.info("total:{}", searchResponse.getHits().getTotalHits());
            SearchHit[] searchHits = searchResponse.getHits().getHits();
            // Collect the current page, then fetch the next one until a page comes back empty
            while (searchHits != null && searchHits.length > 0) {
                for (SearchHit searchHit : searchHits) {
                    depatementIdList.add(Long.parseLong(searchHit.getId()));
                }
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll(scroll);
                searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = searchResponse.getScrollId();
                searchHits = searchResponse.getHits().getHits();
            }
        } catch (IOException e) {
            log.error("search error", e);
        } finally {
            // Always release the scroll context, even when the search failed half-way
            if (scrollId != null) {
                try {
                    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
                    clearScrollRequest.addScrollId(scrollId);
                    restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
                } catch (IOException e) {
                    log.error("clear scroll error", e);
                }
            }
        }
        return resultWrapper;
    }
}