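Hibernate Search is a library that integrates Hibernate ORM with Apache Lucene or Elasticsearch by automatically indexing your entities, enabling advanced search functionality: full-text search, geospatial search, aggregations and more. For more information, see Hibernate Search on hibernate.org.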

Last night I uploaded Hibernate Search version 4.4.0.CR1 (the candidate release for 4.4) to SourceForge and to the Maven repository.

Maven users, note the dependency:

<dependency>
 <groupId>org.hibernate</groupId>
 <artifactId>hibernate-search</artifactId>
 <version>4.4.0.CR1</version>
</dependency>

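This release mainly fixes very minor issues. We expect to release 4.4.0.Final next week, probably with no further code changes, only some improvements to the documentation. For a detailed list of all changes in this release, see the release notes.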

More on the new dynamic sharding support

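The new dynamic sharding feature announced in the previous post has been improved; in particular, we now provide a base class that you can extend and that takes care of the most common housekeeping. You can see the effect in the example implementation included in the documentation: less boilerplate for you. Of course you can also ignore this base class and write a fully custom implementation, as long as it implements org.hibernate.search.store.ShardIdentifierProvider.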

We need your sharding stories!

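We have brainstormed a lot, on IRC and on the mailing list, about adding one more method to the interface. This method would let strategy implementors narrow down which shards a delete operation needs to be sent to: currently, deletes are sent to all shards, which is functionally correct but not optimal. The crux of the discussion is whether, in real-world scenarios, you can actually make such a decision. I believe it is possible, so this method will surely be added soon; sadly it is not included yet, because its exact shape is still being worked out. If you have plans for how you would use dynamic sharding, it would be great if you told us, so that we can discuss and improve this feature on more concrete use cases.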
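To make the idea of narrowing deletes concrete, here is a minimal sketch in plain Java. It is purely hypothetical: DeletionTargeting and shardsForDeletion are names invented for this illustration and are not part of Hibernate Search 4.4 or of any agreed design. It assumes the entity id is a timestamp in milliseconds and that shards are named after the GMT hour of day, "0" to "23".

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Hibernate Search 4.4.
final class DeletionTargeting {

    private DeletionTargeting() {
    }

    /** All 24 hourly shard names, "0" through "23". */
    static Set<String> allShards() {
        Set<String> shards = new LinkedHashSet<>();
        for (int hour = 0; hour < 24; hour++) {
            shards.add(String.valueOf(hour));
        }
        return Collections.unmodifiableSet(shards);
    }

    /**
     * Returns the shards a delete must be sent to.
     * Assumption: a Long id is a timestamp in milliseconds, so the hour of day
     * (GMT) identifies the single shard holding the document. For any other
     * id type we cannot decide, and fall back to all shards, which matches
     * the current behaviour described above.
     */
    static Set<String> shardsForDeletion(Object id) {
        if (id instanceof Long) {
            int hour = Instant.ofEpochMilli((Long) id).atZone(ZoneOffset.UTC).getHour();
            return Collections.singleton(String.valueOf(hour));
        }
        return allShards();
    }
}
```

The fallback branch is the interesting part of the open question: whenever the id alone does not reveal the shard, sending the delete everywhere remains the only safe answer.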

As a consequence, for the time being org.hibernate.search.store.ShardIdentifierProvider is marked as experimental in this release cycle, but don't worry: the feature is great and is meant to stay.

Example: time-sliced sharding with the new API, combined with advanced filtering

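In the test below you will see something very unusual for this blog: I am not using the Hibernate API, nor the public Hibernate Search API you might be used to. This is how we run unit tests in the hibernate-search-engine module: in isolation from other services.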

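Example scenario: a system that indexes log messages for some periodic event recorded every second. By design, only one message is expected to be stored for each second. The idea is to shard on an hourly basis, with the logs set up to rotate so that entries older than 24 hours are deleted.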

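This use case benefits from an advanced org.hibernate.search.store.ShardIdentifierProvider, so that within any given hour all writes happen on one specific shard, the latest. If we added to Hibernate Search the extra method to control delete operations, we could also restrict deletes to one specific shard, the oldest. This approach provides several benefits: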

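  • Full-text filter instances on each of the 22 immutable indexes are fully cacheable
  • Index reader instances on these 22 indexes never need to be refreshed
  • Time-range queries can easily target just the subset of indexes they need
  • Add operations happen on a separate backend, which brings several other performance benefits; for example, an NRT backend can keep writing without needing to flush (the need to flush is usually triggered by delete operations).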
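The count of 22 immutable indexes follows from having 24 hourly shards of which one takes all writes and one takes all deletes. A small plain-Java sketch of that bookkeeping follows. HourlyShardRoles and its methods are illustrative names only, and the sketch assumes something the post does not spell out: that the rotation job purges the hour slot the writer will move into next, so that the latest and the oldest shards are always distinct.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: classifies 24 hourly shards for a rolling log window.
final class HourlyShardRoles {

    static final int SHARDS = 24;

    private HourlyShardRoles() {
    }

    /** The shard receiving all writes: the current hour of day. */
    static int latestShard(int currentHour) {
        return Math.floorMod(currentHour, SHARDS);
    }

    /**
     * The shard receiving all deletes. Assumption: rotation clears the slot
     * that will become the write target at the next hour boundary.
     */
    static int oldestShard(int currentHour) {
        return Math.floorMod(currentHour + 1, SHARDS);
    }

    /** Every shard that currently sees neither writes nor deletes. */
    static List<Integer> immutableShards(int currentHour) {
        int latest = latestShard(currentHour);
        int oldest = oldestShard(currentHour);
        List<Integer> result = new ArrayList<>();
        for (int hour = 0; hour < SHARDS; hour++) {
            if (hour != latest && hour != oldest) {
                result.add(hour);
            }
        }
        return result;
    }
}
```

For example, at 21:xx GMT the write target is shard 21, deletes go to shard 22, and the remaining 22 shards can keep their cached filters and index readers untouched until the next hour boundary.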

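This test intentionally does not use the Hibernate ORM API, because this use case of storing log messages is probably better suited to users of Infinispan (reminder: the same indexing technology is included in Infinispan). As a result, some operations such as store/delete/query use internal APIs: when using Hibernate Search with Hibernate ORM these method implementations would be much simpler, but conceptually you have to activate the same filtering options.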

public class LogRotationExampleTest {

        //Make a SearchFactory using the test entity "LogMessage" and enabling the custom sharding strategy.
        @Rule
        public SearchFactoryHolder sfHolder = new SearchFactoryHolder( LogMessage.class )
                .withProperty( "hibernate.search.logs.sharding_strategy", LogMessageShardingStrategy.class.getName() );

        @Test
        public void filtersTest() {
                SearchFactory searchFactory = sfHolder.getSearchFactory();
                storeLog( makeTimestamp( 2013, 10, 7, 21, 33 ), "implementing method makeTimestamp" );
                storeLog( makeTimestamp( 2013, 10, 7, 21, 35 ), "implementing method storeLog" );
                storeLog( makeTimestamp( 2013, 10, 7, 15, 15 ), "Infinispan team meeting" );
                storeLog( makeTimestamp( 2013, 10, 7, 7, 30 ), "reading another bit from Mordechai Ben-Ari" );
                storeLog( makeTimestamp( 2013, 10, 7, 9, 00 ), "email nightmare begins" );
                storeLog( makeTimestamp( 2013, 10, 7, 9, 50 ), "sync-up with Davide" );
                storeLog( makeTimestamp( 2013, 10, 7, 10, 0 ), "first cofee. At Costa!" );
                storeLog( makeTimestamp( 2013, 10, 7, 10, 10 ), "sync-up with Gunnar and Hardy" );
                storeLog( makeTimestamp( 2013, 10, 7, 10, 20 ), "Checking JIRA state for Hibernate Search release plans" );
                storeLog( makeTimestamp( 2013, 10, 7, 10, 30 ), "Check my Infinispan pull requests from the weekend, cleanup git branches" );
                storeLog( makeTimestamp( 2013, 10, 7, 22, 00 ), "Implementing LogMessageShardingStrategy" );

                QueryBuilder logsQueryBuilder = searchFactory.buildQueryBuilder()
                                .forEntity( LogMessage.class )
                                .get();

                Query allLogs = logsQueryBuilder.all().createQuery();

                Assert.assertEquals( 11, queryAndFilter( allLogs, 0, 24 ) );
                Assert.assertEquals( 0, queryAndFilter( allLogs, 2, 5 ) );
                Assert.assertEquals( 1, queryAndFilter( allLogs, 2, 8 ) );
                Assert.assertEquals( 3, queryAndFilter( allLogs, 0, 10 ) );

                deleteLog( makeTimestamp( 2013, 10, 7, 9, 00 ) );
                Assert.assertEquals( 10, queryAndFilter( allLogs, 0, 24 ) );
        }

        private int queryAndFilter(Query luceneQuery, int fromHour, int toHour) {
                SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();

                //In this specific test we use the internal HSQuery API, while
                //you would normally use a FullTextSession:
                HSQuery hsQuery = searchFactory.createHSQuery()
                        .luceneQuery( luceneQuery )
                        .targetedEntities( Arrays.asList( new Class<?>[]{ LogMessage.class } ) );
                hsQuery
                        .enableFullTextFilter( "timeRange" )
                                .setParameter( "from", Integer.valueOf( fromHour ) )
                                .setParameter( "to", Integer.valueOf( toHour ) )
                        ;
                return hsQuery.queryResultSize();
        }

        private void storeLog(long timestamp, String message) {
                LogMessage log = new LogMessage();
                log.timestamp = timestamp;
                log.message = message;

                //You would normally just save the LogMessage through a Session / EntityManager
                //this emulates the same using the internal API:
                SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();
                Work work = new Work( log, log.timestamp, WorkType.ADD, false );
                ManualTransactionContext tc = new ManualTransactionContext();
                searchFactory.getWorker().performWork( work, tc );
                tc.end();
        }

        private void deleteLog(long timestamp) {
                //Again just emulating a delete operation with the internal API,
                //don't worry too much about these details:
                SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();
                Work work = new Work( LogMessage.class, timestamp, WorkType.DELETE );
                ManualTransactionContext tc = new ManualTransactionContext();
                searchFactory.getWorker().performWork( work, tc );
                tc.end();
        }

        /**
         * A ShardIdentifierProvider suitable for the rotating-logs design
         * as described in this test.
         * Sharding isn't actually dynamic as we know all hours in advance, but
         * each addition operation targets a specific index, and a range
         * filter lets queries search only a subset of all indexes.
         */
        public static final class LogMessageShardingStrategy implements ShardIdentifierProvider {

                private Set<String> hoursOfDay;

                @Override
                public void initialize(Properties properties, BuildContext buildContext) {
                        Set<String> hours = new HashSet<String>( 24 );
                        for ( int hour = 0; hour < 24; hour++ ) {
                                hours.add( String.valueOf( hour ) );
                        }
                        hoursOfDay = Collections.unmodifiableSet( hours );
                }

                @Override
                public String getShardIdentifier(Class<?> entityType, Serializable id, String idAsString, Document document) {
                        return fromIdToHour( (Long) id );
                }

                @Override
                public Set<String> getShardIdentifiersForQuery(FullTextFilterImplementor[] fullTextFilters) {
                        for ( FullTextFilterImplementor ftf : fullTextFilters ) {
                                if ( "timeRange".equals( ftf.getName() ) ) {
                                        Integer from = (Integer) ftf.getParameter( "from" );
                                        Integer to = (Integer) ftf.getParameter( "to" );
                                        Set<String> hours = new HashSet<String>();
                                        for ( int hour = from; hour < to; hour++ ) {
                                                hours.add( String.valueOf( hour ) );
                                        }
                                        return Collections.unmodifiableSet( hours );
                                }
                        }
                        return hoursOfDay;
                }

                @Override
                public Set<String> getAllShardIdentifiers() {
                        return hoursOfDay;
                }

        }

        @Indexed( index = "logs" )
        //ShardSensitiveOnlyFilter is a special marker filter which just serves
        //as a transport to provide the filter parameters to ShardIdentifierProvider
        //See details on https://docs.jboss.net.cn/hibernate/search/4.4/reference/en-US/html_single/#query-filter-shard
        @FullTextFilterDef( name = "timeRange", impl = ShardSensitiveOnlyFilter.class )
        public static final class LogMessage {

                private long timestamp;
                private String message;

                @DocumentId
                public long getId() { return timestamp; }

                public void setId(long id) { this.timestamp = id; }

                @Field
                public String getMessage() { return message; }

                public void setMessage(String message) { this.message = message; }
        }

        /**
         * @return a timestamp from the calendar-style encoding using GMT as timezone (precision to the minute)
         */
        private static long makeTimestamp(int year, int month, int date, int hourOfDay, int minute) {
                Calendar gmtCalendar = createGMTCalendar();
                gmtCalendar.set( year, month, date, hourOfDay, minute );
                gmtCalendar.set( Calendar.SECOND, 0 );
                gmtCalendar.set( Calendar.MILLISECOND, 0 );
                return gmtCalendar.getTimeInMillis();
        }

        /**
         * @return the hour of the day from a timestamp, in string format matching the index shard identifiers format
         */
        private static String fromIdToHour(long millis) {
                Calendar gmtCalendar = createGMTCalendar();
                gmtCalendar.setTimeInMillis( millis );
                return String.valueOf( gmtCalendar.get( Calendar.HOUR_OF_DAY ) );
        }

        /**
         * @return a new GMT Calendar
         */
        private static Calendar createGMTCalendar() {
                return Calendar.getInstance( TimeZone.getTimeZone( "GMT" ) );
        }

}
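
The getShardIdentifiersForQuery method above maps a half-open hour range [from, to) to shard names, and returns an empty set when from is greater than to. As a possible variation that is not part of the test, the following plain-Java sketch also handles ranges that wrap past midnight, such as 22 to 2. HourRange and shardsFor are names made up for this illustration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative variation, not taken from the Hibernate Search test suite.
final class HourRange {

    private HourRange() {
    }

    /**
     * Shard names for the half-open hour range [from, to).
     * When from is greater than to, the range is read as wrapping
     * past midnight: (22, 2) selects hours 22, 23, 0 and 1.
     */
    static Set<String> shardsFor(int from, int to) {
        if (from < 0 || from > 23 || to < 0 || to > 24) {
            throw new IllegalArgumentException("hours out of range: " + from + ", " + to);
        }
        int count = to >= from ? to - from : to + 24 - from;
        Set<String> hours = new LinkedHashSet<>();
        for (int i = 0; i < count; i++) {
            hours.add(String.valueOf((from + i) % 24));
        }
        return Collections.unmodifiableSet(hours);
    }
}
```

With this variation a query for the last few hours still touches only the matching shards when the window crosses midnight, instead of matching nothing.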

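The complete test source code can be found in our test suite. I developed this example to illustrate how beneficial it would be to control the sharding of delete operations; I think it is a good example, but I would love to hear how you would use this.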

Where to go next

For development suggestions and brainstorming on the latest features, join the developer mailing list or write to us on the forums.

The issue tracker is JIRA, and all the code is on GitHub: pull requests and feedback are welcome.

