Last night I uploaded Hibernate Search version 4.4.0.CR1 (the release candidate for 4.4) to SourceForge and the Maven repositories.
Maven users, note the dependency coordinates:
<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search</artifactId>
   <version>4.4.0.CR1</version>
</dependency>
This release addresses only very minor issues; 4.4.0.Final is expected next week, likely with no further code changes, just some improvements to the documentation. For a detailed list of all changes in this release, see the release notes.
More on the new dynamic sharding support
The new dynamic sharding feature announced in a previous post received some improvements; in particular we now provide a base class you can extend to take care of the most common housekeeping. You can see the effect in the example implementation included in the documentation: less boilerplate for you. Of course you can also ignore the base class and roll your own implementation, as long as you implement org.hibernate.search.store.ShardIdentifierProvider.
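For illustration, a subclass of that base class might look roughly like the following sketch. It assumes the base class is org.hibernate.search.store.ShardIdentifierProviderTemplate as shown in the documentation's example; treat the exact method names as assumptions and check the linked example for the real contract. The "dayType" field and the shard names are purely hypothetical:

```java
//A sketch, not the documentation example: extend the template and implement
//only the entity-specific parts; the base class handles the bookkeeping.
public class DayTypeShardingStrategy extends ShardIdentifierProviderTemplate {

   @Override
   protected Set<String> loadInitialShardNames(Properties properties, BuildContext buildContext) {
      //Tell Hibernate Search which shards exist at startup:
      return new HashSet<String>( Arrays.asList( "weekday", "weekend" ) );
   }

   @Override
   public String getShardIdentifier(Class<?> entityType, Serializable id, String idAsString, Document document) {
      //Route each document to a shard based on one of its indexed fields
      //("dayType" is a hypothetical field name):
      return document.get( "dayType" );
   }
}
```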
We need your sharding stories!
On IRC and the mailing lists we have been brainstorming about adding one more method to the interface. It would allow the strategy implementor to narrow down which shard a delete operation needs to be sent to: currently delete operations are sent to all shards, which is functionally correct but not optimal. The crux of the discussion is whether, in real-world scenarios, you can actually make such a decision. I believe it is possible, so this method will certainly be added soon; sadly it could not be included yet, as its exact shape is still being worked out. If you have plans for how you would use dynamic sharding, it would be great to hear about them, so we can discuss the feature against more concrete use cases and improve it.
A consequence of this timing is that org.hibernate.search.store.ShardIdentifierProvider is marked as experimental for this release cycle. Don't worry though: the feature is great and is here to stay.
Example: time-rotating shards using the new API, combined with advanced filtering
In the test below you will see something quite unusual for this blog: I am not using the Hibernate APIs, nor the Hibernate Search public API you may be used to. This is how we run unit tests in the hibernate-search-engine module: in isolation from the other services.
Example scenario: a system that indexes log messages of some periodic event, recorded every second. By design, only one message per second is expected to be stored. The idea is to shard on an hourly basis, and the logs are set up to rotate so that logs older than 24 hours are deleted.
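Before diving into the test, the core idea, mapping a log timestamp to an hourly shard identifier, can be sketched in plain Java. The class and method names here are illustrative only, not part of Hibernate Search; the same mapping appears later in the test as the fromIdToHour helper:

```java
import java.util.Calendar;
import java.util.TimeZone;

public class HourShardDemo {

   //Returns the shard identifier ("0".."23") for a log timestamp,
   //using GMT so all nodes agree on the hour boundaries:
   public static String shardIdFor(long timestampMillis) {
      Calendar gmt = Calendar.getInstance( TimeZone.getTimeZone( "GMT" ) );
      gmt.setTimeInMillis( timestampMillis );
      return String.valueOf( gmt.get( Calendar.HOUR_OF_DAY ) );
   }

   //Builds a timestamp in GMT; note java.util.Calendar months are 0-based,
   //so month 10 is November:
   public static long makeTimestamp(int year, int month, int day, int hourOfDay, int minute) {
      Calendar gmt = Calendar.getInstance( TimeZone.getTimeZone( "GMT" ) );
      gmt.clear();
      gmt.set( year, month, day, hourOfDay, minute );
      return gmt.getTimeInMillis();
   }

   public static void main(String[] args) {
      //A message logged at 21:33 GMT lands in shard "21":
      System.out.println( shardIdFor( makeTimestamp( 2013, 10, 7, 21, 33 ) ) );
   }
}
```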
This use case benefits from an advanced org.hibernate.search.store.ShardIdentifierProvider so that in any given hour all writes go to one specific shard: the latest one. If we add the extra method to Search to control delete operations, we could also restrict deletes to one specific shard: the oldest one. This approach provides several benefits:
- the full-text filter instances on each of the 22 immutable indexes are fully cacheable
- the IndexReader instances on those 22 indexes never need to be refreshed
- time-range queries can easily pinpoint the subset of indexes they need to target
- add operations happen on a dedicated backend, which provides several other performance benefits: for example an NRT backend can keep writing without ever needing to flush (the need to flush is normally triggered by delete operations)
This test deliberately avoids the Hibernate ORM APIs, as this use case of storing log messages is probably a better fit for users of Infinispan (reminder: the same indexing technology is included in Infinispan). For this reason some operations such as store/delete/query use internal APIs: when using Hibernate Search with Hibernate ORM these method implementations would be much simpler, but conceptually you have to activate the same filtering options.
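For comparison, with Hibernate ORM the store and query operations emulated below would use the public API instead of the internal one. This is only a sketch of what that would look like, assuming an open Session and the same LogMessage entity and "timeRange" filter defined in the test; it is not runnable on its own:

```java
//Storing: just persist the entity, indexing happens on transaction commit.
session.save( log );

//Querying: wrap the Session and apply the same full-text filter.
FullTextSession fullTextSession = Search.getFullTextSession( session );
FullTextQuery query = fullTextSession.createFullTextQuery( luceneQuery, LogMessage.class );
query.enableFullTextFilter( "timeRange" )
   .setParameter( "from", Integer.valueOf( fromHour ) )
   .setParameter( "to", Integer.valueOf( toHour ) );
int resultSize = query.getResultSize();
```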
public class LogRotationExampleTest {

   //Make a SearchFactory using the test entity "LogMessage" and enabling the custom sharding strategy.
   @Rule
   public SearchFactoryHolder sfHolder = new SearchFactoryHolder( LogMessage.class )
      .withProperty( "hibernate.search.logs.sharding_strategy", LogMessageShardingStrategy.class.getName() );

   @Test
   public void filtersTest() {
      SearchFactory searchFactory = sfHolder.getSearchFactory();
      storeLog( makeTimestamp( 2013, 10, 7, 21, 33 ), "implementing method makeTimestamp" );
      storeLog( makeTimestamp( 2013, 10, 7, 21, 35 ), "implementing method storeLog" );
      storeLog( makeTimestamp( 2013, 10, 7, 15, 15 ), "Infinispan team meeting" );
      storeLog( makeTimestamp( 2013, 10, 7, 7, 30 ), "reading another bit from Mordechai Ben-Ari" );
      storeLog( makeTimestamp( 2013, 10, 7, 9, 00 ), "email nightmare begins" );
      storeLog( makeTimestamp( 2013, 10, 7, 9, 50 ), "sync-up with Davide" );
      storeLog( makeTimestamp( 2013, 10, 7, 10, 0 ), "first cofee. At Costa!" );
      storeLog( makeTimestamp( 2013, 10, 7, 10, 10 ), "sync-up with Gunnar and Hardy" );
      storeLog( makeTimestamp( 2013, 10, 7, 10, 20 ), "Checking JIRA state for Hibernate Search release plans" );
      storeLog( makeTimestamp( 2013, 10, 7, 10, 30 ), "Check my Infinispan pull requests from the weekend, cleanup git branches" );
      storeLog( makeTimestamp( 2013, 10, 7, 22, 00 ), "Implementing LogMessageShardingStrategy" );

      QueryBuilder logsQueryBuilder = searchFactory.buildQueryBuilder()
         .forEntity( LogMessage.class )
         .get();
      Query allLogs = logsQueryBuilder.all().createQuery();

      Assert.assertEquals( 11, queryAndFilter( allLogs, 0, 24 ) );
      Assert.assertEquals( 0, queryAndFilter( allLogs, 2, 5 ) );
      Assert.assertEquals( 1, queryAndFilter( allLogs, 2, 8 ) );
      Assert.assertEquals( 3, queryAndFilter( allLogs, 0, 10 ) );

      deleteLog( makeTimestamp( 2013, 10, 7, 9, 00 ) );
      Assert.assertEquals( 10, queryAndFilter( allLogs, 0, 24 ) );
   }

   private int queryAndFilter(Query luceneQuery, int fromHour, int toHour) {
      SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();
      //In this specific test we use the internal HSQuery API, while
      //you would normally use a FullTextSession:
      HSQuery hsQuery = searchFactory.createHSQuery()
         .luceneQuery( luceneQuery )
         .targetedEntities( Arrays.asList( new Class<?>[]{ LogMessage.class } ) );
      hsQuery
         .enableFullTextFilter( "timeRange" )
            .setParameter( "from", Integer.valueOf( fromHour ) )
            .setParameter( "to", Integer.valueOf( toHour ) );
      return hsQuery.queryResultSize();
   }

   private void storeLog(long timestamp, String message) {
      LogMessage log = new LogMessage();
      log.timestamp = timestamp;
      log.message = message;
      //You would normally just save the LogMessage through a Session / EntityManager;
      //this emulates the same using the internal API:
      SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();
      Work work = new Work( log, log.timestamp, WorkType.ADD, false );
      ManualTransactionContext tc = new ManualTransactionContext();
      searchFactory.getWorker().performWork( work, tc );
      tc.end();
   }

   private void deleteLog(long timestamp) {
      //Again just emulating a delete operation with the internal API,
      //don't worry too much about these details:
      SearchFactoryImplementor searchFactory = sfHolder.getSearchFactory();
      Work work = new Work( LogMessage.class, timestamp, WorkType.DELETE );
      ManualTransactionContext tc = new ManualTransactionContext();
      searchFactory.getWorker().performWork( work, tc );
      tc.end();
   }

   /**
    * A ShardIdentifierProvider suitable for the rotating-logs design
    * described in this test.
    * Sharding isn't actually dynamic as we know all hours in advance, but
    * addition operations target a specific index, and a range
    * filter can narrow queries down to a subset of all indexes.
    */
   public static final class LogMessageShardingStrategy implements ShardIdentifierProvider {

      private Set<String> hoursOfDay;

      @Override
      public void initialize(Properties properties, BuildContext buildContext) {
         Set<String> hours = new HashSet<String>( 24 );
         for ( int hour = 0; hour < 24; hour++ ) {
            hours.add( String.valueOf( hour ) );
         }
         hoursOfDay = Collections.unmodifiableSet( hours );
      }

      @Override
      public String getShardIdentifier(Class<?> entityType, Serializable id, String idAsString, Document document) {
         return fromIdToHour( (Long) id );
      }

      @Override
      public Set<String> getShardIdentifiersForQuery(FullTextFilterImplementor[] fullTextFilters) {
         for ( FullTextFilterImplementor ftf : fullTextFilters ) {
            if ( "timeRange".equals( ftf.getName() ) ) {
               Integer from = (Integer) ftf.getParameter( "from" );
               Integer to = (Integer) ftf.getParameter( "to" );
               Set<String> hours = new HashSet<String>();
               for ( int hour = from; hour < to; hour++ ) {
                  hours.add( String.valueOf( hour ) );
               }
               return Collections.unmodifiableSet( hours );
            }
         }
         return hoursOfDay;
      }

      @Override
      public Set<String> getAllShardIdentifiers() {
         return hoursOfDay;
      }
   }

   @Indexed( index = "logs" )
   //ShardSensitiveOnlyFilter is a special marker filter which just serves
   //as a transport to provide the filter parameters to the ShardIdentifierProvider.
   //See details on https://docs.jboss.net.cn/hibernate/search/4.4/reference/en-US/html_single/#query-filter-shard
   @FullTextFilterDef( name = "timeRange", impl = ShardSensitiveOnlyFilter.class )
   public static final class LogMessage {

      private long timestamp;
      private String message;

      @DocumentId
      public long getId() { return timestamp; }
      public void setId(long id) { this.timestamp = id; }

      @Field
      public String getMessage() { return message; }
      public void setMessage(String message) { this.message = message; }
   }

   /**
    * @return a timestamp from the calendar-style encoding, using GMT as timezone (precision to the minute)
    */
   private static long makeTimestamp(int year, int month, int date, int hourOfDay, int minute) {
      Calendar gmtCalendar = createGMTCalendar();
      gmtCalendar.set( year, month, date, hourOfDay, minute );
      gmtCalendar.set( Calendar.SECOND, 0 );
      gmtCalendar.set( Calendar.MILLISECOND, 0 );
      return gmtCalendar.getTimeInMillis();
   }

   /**
    * @return the hour of the day from a timestamp, in a string format matching the index shard identifiers
    */
   private static String fromIdToHour(long millis) {
      Calendar gmtCalendar = createGMTCalendar();
      gmtCalendar.setTimeInMillis( millis );
      return String.valueOf( gmtCalendar.get( Calendar.HOUR_OF_DAY ) );
   }

   /**
    * @return a new GMT Calendar
    */
   private static Calendar createGMTCalendar() {
      return Calendar.getInstance( TimeZone.getTimeZone( "GMT" ) );
   }
}
The full source code of the test can be found in our test suite. I developed this example to illustrate how beneficial it can be to control the sharding of delete operations; I think it makes a good case, but I would love to hear how you would use this.