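Apache Solr is an open-source search server. It is written in Java and built mainly on HTTP and Apache Lucene. Apache Lucene is an efficient, Java-based full-text search library.
The URL is http://139.198.13.12:7000/solr/admin.html. Note: with Solr 5.5 you must append admin.html; without it, the request returns 404 (page not found).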
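4.2.1 Install the Solr service. The installed version is 5.5.4.
4.2.2 Create a Core
To use Solr, you need to create a Core, which is similar to a database instance. Each Core corresponds to a folder created under the Solr Home path, and the folder name must match the Core name.
4.2.3 Configure the Core
Taking PolicyCore, the Core that the Demo uses on the Solr server, as an example, modify the following three configuration files: solrconfig.xml, managed-schema, and data-config.xml.
solrconfig.xml and managed-schema are copied from the files of the same name under [{Solr Home path}/configsets/basic_configs/conf]. data-config.xml comes from the Solr server package: extract solr-5.5.4.tgz to get the solr-5.5.4 folder, copy db-data-config.xml from [solr-5.5.4/example/example-DIH/solr/db/conf] to [{Solr Home path}/configsets/basic_configs/conf], and rename it to data-config.xml.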
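The copy steps above can be scripted. The following is a minimal sketch, not the author's tooling: the function name `prepare_core_conf` is hypothetical, and it assumes the Core reads its configuration from [{Solr Home path}/{Core name}/conf], which the text does not state explicitly.

```python
import shutil
from pathlib import Path

# The three files the Core needs, as listed in the text.
CORE_FILES = ("solrconfig.xml", "managed-schema", "data-config.xml")


def prepare_core_conf(solr_home, extracted_dir, core_name):
    """Stage data-config.xml in basic_configs, then copy all three files to the Core."""
    solr_home, extracted_dir = Path(solr_home), Path(extracted_dir)
    basic_conf = solr_home / "configsets" / "basic_configs" / "conf"
    dih_example = (extracted_dir / "example" / "example-DIH" / "solr"
                   / "db" / "conf" / "db-data-config.xml")

    # Copy the DIH example config next to the other templates under its new name.
    shutil.copyfile(dih_example, basic_conf / "data-config.xml")

    # Assumption: the Core keeps its configuration in {Solr Home}/{core_name}/conf.
    core_conf = solr_home / core_name / "conf"
    core_conf.mkdir(parents=True, exist_ok=True)
    for name in CORE_FILES:
        shutil.copyfile(basic_conf / name, core_conf / name)
    return core_conf
```

A call such as `prepare_core_conf("/path/to/solr-home", "/path/to/solr-5.5.4", "PolicyCore")` would leave the three files ready for the edits described next; both paths here are placeholders.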
Add the following to the solrconfig.xml configuration file:
- <lib dir="../contrib/extraction/lib" regex=".*\.jar" />
- <lib dir="../dist/" regex="solr-cell-\d.*\.jar" />
- <lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
- <lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
- <lib dir="../contrib/langid/lib/" regex=".*\.jar" />
- <lib dir="../dist/" regex="solr-langid-\d.*\.jar" />
- <lib dir="../contrib/velocity/lib" regex=".*\.jar" />
- <lib dir="../dist/" regex="solr-velocity-\d.*\.jar" />
- <lib dir="../dist/" regex="solr-dataimporthandler-\d.*\.jar" />
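Add the content above after the [<luceneMatchVersion>5.5.4</luceneMatchVersion>] node and before the [<dataDir>${solr.data.dir:}</dataDir>] node.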
- <requestHandler name="/dataimport" class="solr.DataImportHandler">
- <lst name="defaults">
- <str name="config">data-config.xml</str>
- </lst>
- </requestHandler>
The position of the content above is shown in the figure below:
Modify the managed-schema file. Add the following inside the <schema> node:
- <fieldType name="textPolicy_ik" class="solr.TextField">
- <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer" />
- <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer" />
- </fieldType>
Comment out the following configuration:
- <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
Then add the following configuration below it:
- <field name="PolicyID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
- <field name="PolicyGroupID" type="long" indexed="true" stored="true" />
- <field name="PolicyOperatorID" type="long" indexed="true" stored="true" />
- <field name="PolicyOperatorName" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
- <field name="PolicyCode" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
- <field name="PolicyName" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
- <field name="PolicyType" type="string" indexed="true" stored="true" />
- <field name="TicketType" type="int" indexed="true" stored="true" />
- <field name="FlightType" type="int" indexed="true" stored="true" />
- <field name="DepartureDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
- <field name="ArrivalDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
- <field name="ReturnDepartureDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
- <field name="ReturnArrivalDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
- <field name="DepartureCityCodes" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
- <field name="TransitCityCodes" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
- <field name="ArrivalCityCodes" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
- <field name="OutTicketType" type="int" indexed="true" stored="true" />
- <field name="OutTicketStart" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
- <field name="OutTicketEnd" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
- <field name="OutTicketPreDays" type="int" indexed="true" stored="true" />
- <field name="Remark" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
- <field name="Status" type="int" indexed="true" stored="true" />
- <field name="SolrUpdatedTime" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
- <uniqueKey>PolicyID</uniqueKey>
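The date fields in the schema above default to NOW+8HOUR. Solr stores dates in UTC, so presumably the eight-hour shift is there to make stored values read like Beijing wall-clock time (UTC+8); the source does not state the motive, so treat that as an assumption. The sketch below only illustrates the arithmetic, and its function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))


def to_solr_utc(dt):
    """Format an aware datetime the way Solr date fields expect (UTC, trailing Z)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def shifted_like_schema(dt):
    """Mimic default="NOW+8HOUR": the true instant plus eight hours, still labelled UTC."""
    return to_solr_utc(dt + timedelta(hours=8))


local = datetime(2017, 1, 1, 8, 0, 0, tzinfo=BEIJING)
print(to_solr_utc(local))          # 2017-01-01T00:00:00Z (the true UTC instant)
print(shifted_like_schema(local))  # 2017-01-01T08:00:00Z (reads like Beijing time)
```

The trade-off is that the stored values are no longer true UTC instants, so every client that reads or filters on these fields has to apply the same convention.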
Attribute notes:
Modify the data-config.xml file: first comment out the default dataConfig, then add the following configuration after the commented-out content:
- <dataConfig>
- <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://{SQL Server host IP}:{port; may be omitted if it is the default 1433};DatabaseName=SolrDB" user="sa" password="{SQL Server login password}"/>
- <document name="Info">
- <entity name="Policy" dataSource="SolrDB" transformer="ClobTransformer" pk="PolicyID"
- query="SELECT [PolicyID], [PolicyGroupID], [PolicyOperatorID], [PolicyOperatorName], [PolicyCode], [PolicyName], [PolicyType], [TicketType], [FlightType], DATEADD(HOUR, 8, CAST([DepartureDate] AS DATETIME)) [DepartureDate], DATEADD(HOUR, 8, CAST([ArrivalDate] AS DATETIME)) [ArrivalDate], DATEADD(HOUR, 8, CAST([ReturnDepartureDate] AS DATETIME)) [ReturnDepartureDate], DATEADD(HOUR, 8, CAST([ReturnArrivalDate] AS DATETIME)) [ReturnArrivalDate], [DepartureCityCodes], [TransitCityCodes], [ArrivalCityCodes], [OutTicketType], [OutTicketStart], [OutTicketEnd], [OutTicketPreDays], [Remark], [Status], DATEADD(HOUR, 8, CAST([SolrUpdatedTime] AS DATETIME)) [SolrUpdatedTime] FROM [Policy]"
- deltaImportQuery="SELECT [PolicyID], [PolicyGroupID], [PolicyOperatorID], [PolicyOperatorName], [PolicyCode], [PolicyName], [PolicyType], [TicketType], [FlightType], DATEADD(HOUR, 8, CAST([DepartureDate] AS DATETIME)) [DepartureDate], DATEADD(HOUR, 8, CAST([ArrivalDate] AS DATETIME)) [ArrivalDate], DATEADD(HOUR, 8, CAST([ReturnDepartureDate] AS DATETIME)) [ReturnDepartureDate], DATEADD(HOUR, 8, CAST([ReturnArrivalDate] AS DATETIME)) [ReturnArrivalDate], [DepartureCityCodes], [TransitCityCodes], [ArrivalCityCodes], [OutTicketType], [OutTicketStart], [OutTicketEnd], [OutTicketPreDays], [Remark], [Status], DATEADD(HOUR, 8, CAST([SolrUpdatedTime] AS DATETIME)) [SolrUpdatedTime] FROM [Policy] WHERE PolicyID = '${dataimporter.delta.PolicyID}'"
- deltaQuery="SELECT [PolicyID] FROM [Policy] WHERE [SolrUpdatedTime] > '${dataimporter.last_index_time}'">
- <field column="PolicyID" name="PolicyID"/>
- <field column="PolicyGroupID" name="PolicyGroupID"/>
- <field column="PolicyOperatorID" name="PolicyOperatorID"/>
- <field column="PolicyOperatorName" name="PolicyOperatorName"/>
- <field column="PolicyCode" name="PolicyCode"/>
- <field column="PolicyName" name="PolicyName"/>
- <field column="PolicyType" name="PolicyType"/>
- <field column="TicketType" name="TicketType"/>
- <field column="FlightType" name="FlightType"/>
- <field column="DepartureDate" name="DepartureDate"/>
- <field column="ArrivalDate" name="ArrivalDate"/>
- <field column="ReturnDepartureDate" name="ReturnDepartureDate"/>
- <field column="ReturnArrivalDate" name="ReturnArrivalDate"/>
- <field column="DepartureCityCodes" name="DepartureCityCodes"/>
- <field column="TransitCityCodes" name="TransitCityCodes"/>
- <field column="ArrivalCityCodes" name="ArrivalCityCodes"/>
- <field column="OutTicketType" name="OutTicketType"/>
- <field column="OutTicketStart" name="OutTicketStart"/>
- <field column="OutTicketEnd" name="OutTicketEnd"/>
- <field column="OutTicketPreDays" name="OutTicketPreDays"/>
- <field column="Remark" name="Remark"/>
- <field column="Status" name="Status"/>
- <field column="SolrUpdatedTime" name="SolrUpdatedTime"/>
- </entity>
- </document>
- </dataConfig>
Attribute notes:
- USE [SolrDB]
- GO
- CREATE TRIGGER [dbo].[TR_Solr_UPDATE_Policy] ON [dbo].[Policy]
- FOR UPDATE, INSERT
- AS
- BEGIN
- IF UPDATE(PolicyID)
- OR UPDATE(PolicyGroupID)
- OR UPDATE(PolicyOperatorID)
- OR UPDATE(PolicyOperatorName)
- OR UPDATE(PolicyCode)
- OR UPDATE(PolicyName)
- OR UPDATE(PolicyType)
- OR UPDATE(TicketType)
- OR UPDATE(FlightType)
- OR UPDATE(DepartureDate) OR UPDATE(ArrivalDate)
- OR UPDATE(ReturnDepartureDate) OR UPDATE(ReturnArrivalDate)
- OR UPDATE(DepartureCityCodes)
- OR UPDATE(TransitCityCodes)
- OR UPDATE(ArrivalCityCodes)
- OR UPDATE(OutTicketType)
- OR UPDATE(OutTicketStart) OR UPDATE(OutTicketEnd)
- OR UPDATE(OutTicketPreDays)
- OR UPDATE(Remark)
- OR UPDATE(Status)
- BEGIN
- UPDATE dbo.Policy
- SET SolrUpdatedTime = GETDATE()
- FROM dbo.Policy p, inserted i
- WHERE p.PolicyID = i.PolicyID
- END
- END
- GO
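The trigger and the deltaQuery / deltaImportQuery pair work together: the trigger stamps SolrUpdatedTime whenever a tracked column changes, deltaQuery finds the keys stamped after the last import, and deltaImportQuery loads each of those rows. The following sketch simulates that flow with an in-memory SQLite database. It is not DataImportHandler code and not SQL Server syntax: the table is cut down to three columns, the trigger covers UPDATE only, and `delta_import` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Policy (
    PolicyID        TEXT PRIMARY KEY,
    PolicyName      TEXT,
    SolrUpdatedTime TEXT
);
-- SQLite stand-in for TR_Solr_UPDATE_Policy: stamp the row when a tracked column changes.
CREATE TRIGGER TR_Solr_UPDATE_Policy AFTER UPDATE OF PolicyName ON Policy
BEGIN
    UPDATE Policy SET SolrUpdatedTime = datetime('now') WHERE PolicyID = NEW.PolicyID;
END;
""")


def delta_import(conn, last_index_time):
    """Two-step delta import: find changed keys, then fetch each changed row."""
    # Step 1 (deltaQuery): primary keys modified since the last import.
    changed = conn.execute(
        "SELECT PolicyID FROM Policy WHERE SolrUpdatedTime > ?",
        (last_index_time,),
    ).fetchall()
    docs = []
    for (policy_id,) in changed:
        # Step 2 (deltaImportQuery): load the full row for each changed key.
        row = conn.execute(
            "SELECT PolicyID, PolicyName, SolrUpdatedTime FROM Policy WHERE PolicyID = ?",
            (policy_id,),
        ).fetchone()
        docs.append({"PolicyID": row[0], "PolicyName": row[1], "SolrUpdatedTime": row[2]})
    return docs


# Two rows indexed long ago; only P2 is edited after the last import.
conn.executemany("INSERT INTO Policy VALUES (?, ?, ?)", [
    ("P1", "unchanged policy", "2000-01-01 00:00:00"),
    ("P2", "old name", "2000-01-01 00:00:00"),
])
conn.execute("UPDATE Policy SET PolicyName = 'new name' WHERE PolicyID = 'P2'")
print([d["PolicyID"] for d in delta_import(conn, "2001-01-01 00:00:00")])
```

Only the edited row comes back, which is why the delta import stays cheap even when the Policy table is large.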
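SolrNet is one of the open-source .NET clients for Solr.
Solr itself offers scheduled delta import, but in our tests version 1.0 of apache-solr-dataimportscheduler no longer works on Solr 5.5 unless you modify its source code. We therefore took the following approach:
First, develop a RESTful Job scheduling service. This approach supports not only scheduled delta imports but also scheduled full imports.
Then configure everything in our in-house [Job centralized management platform], as shown in the figure below.
With this in place, our JobServer periodically requests the full / delta import links via HTTP GET, HTTP POST, or HTTP HEAD, which implements scheduled full and delta data import. To see how to perform full and delta imports with SolrNet, refer to the FullDataImport() and DeltaDataImport() examples in the Demo code.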
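The exact links configured in the Job platform are not shown in the text. As a rough sketch, the links for the /dataimport handler registered earlier might be built as follows. The host and port come from the admin URL given earlier and the Core name from the PolicyCore example; the `clean` and `commit` values are assumptions, and `dataimport_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://139.198.13.12:7000/solr"  # host and port from the admin URL in the text
CORE = "PolicyCore"                           # Core name from the configuration example


def dataimport_url(command, clean=False, commit=True):
    """Build a /dataimport request URL for 'full-import' or 'delta-import'."""
    if command not in ("full-import", "delta-import"):
        raise ValueError("unsupported command: " + command)
    params = {
        "command": command,
        "clean": str(clean).lower(),    # assumption: wipe the index only on full imports
        "commit": str(commit).lower(),  # assumption: commit as soon as the import finishes
    }
    return f"{SOLR_BASE}/{CORE}/dataimport?{urlencode(params)}"


print(dataimport_url("full-import", clean=True))
print(dataimport_url("delta-import"))
```

A scheduler only needs to request one of these URLs on a timer, which matches the JobServer approach described above.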
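Near-real-time data import is implemented with SolrNet's CRUD API; for examples, see Add(), Delete(), and Query() in the Demo. Near-real-time import is closer to real time than scheduled delta import. In practice, it is better still to update the database and Solr together through a message queue.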
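The Demo's Add(), Delete(), and Query() are not reproduced in the text. For orientation only, the sketch below builds the kind of requests such operations correspond to in Solr's HTTP API (JSON bodies for the /update handler and a URL for /select). It is not SolrNet code; the helper names are hypothetical, and the base URL combines the host from the admin URL with the PolicyCore name.

```python
import json
from urllib.parse import urlencode

SOLR_CORE_URL = "http://139.198.13.12:7000/solr/PolicyCore"  # assumed from the text


def add_payload(doc):
    """JSON body for POST {core}/update that adds or replaces one document."""
    return json.dumps({"add": {"doc": doc}})


def delete_payload(policy_id):
    """JSON body for POST {core}/update that deletes by unique key (PolicyID)."""
    return json.dumps({"delete": {"id": policy_id}})


def query_url(q, rows=10):
    """URL for GET {core}/select that returns matching documents as JSON."""
    return f"{SOLR_CORE_URL}/select?{urlencode({'q': q, 'rows': rows, 'wt': 'json'})}"


print(add_payload({"PolicyID": "P1", "PolicyName": "demo policy", "Status": 1}))
print(delete_payload("P1"))
print(query_url("PolicyName:demo"))
```

Changes sent this way become searchable only after a commit, so a near-real-time writer would also need to commit explicitly or rely on the Core's auto-commit settings.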
Source: http://www.infoq.com/cn/articles/architecture-practice-07-solr