Lucene is an excellent text search engine.
Solr, a subproject of Lucene, provides a web-based interface that handles XML requests and executes XML commands.
"Solr is more of a general-purpose search server, and it assumes you already have structured data (like catalog data, music collections,etc)."
Nutch, again a subproject of Lucene, is an excellent web crawler.
"Nutch is more like an open-source google... it's for crawling, converting, indexing, and searching websites."
Assume you have an existing J2EE application built with Struts and Hibernate. The following major components/classes would be required to use Lucene or Solr:
1. Search index writer: This uses the existing Hibernate methods/APIs to read the records that need to be searchable and creates Lucene indexes from them. The index can be stored on disk or in a database via JDBC. Lucene calls these index records documents; each one holds the information needed to display a search result.
2. Search index reader: This reads the index stored by step 1 (on disk or in a database) and returns the search results. No Hibernate method calls are involved, because the Lucene index is separate from the database. To display the search results, Struts action classes need to be added or altered. When the user clicks a search result to see the full record, the existing Struts/Hibernate functionality (if available) is used to display the details.
Basically, you can think of it as a search engine implementation (like Google) in which you index the records (like websites) and the search results contain only basic information. The detailed information is delivered when the user clicks a link in the search results.
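The writer/reader split described above can be sketched in plain Java. This is a toy in-memory index standing in for Lucene, not the Lucene API: the class name ToySearchIndex, the Summary type, and the recordId/title fields are all hypothetical. It only illustrates that the index stores summary data plus the record key, and that a search never touches the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java stand-in for a search index; this is NOT the Lucene API.
final class ToySearchIndex {
    // One index entry: only what the result page needs to show.
    static final class Summary {
        final long recordId;  // primary key of the database record (hypothetical)
        final String title;   // short text shown in the result list
        Summary(long recordId, String title) {
            this.recordId = recordId;
            this.title = title;
        }
    }

    private final Map<String, Set<Long>> postings = new HashMap<>();
    private final Map<Long, Summary> stored = new HashMap<>();

    // "Writer" side: a batch job would call this for each record it
    // reads through the existing Hibernate layer.
    void add(long recordId, String title, String body) {
        stored.put(recordId, new Summary(recordId, title));
        for (String term : tokenize(title + " " + body)) {
            postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(recordId);
        }
    }

    // "Reader" side: answers a single-term query from the index alone.
    List<Summary> search(String term) {
        List<Summary> hits = new ArrayList<>();
        Set<Long> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids != null) {
            for (Long id : ids) {
                hits.add(stored.get(id));
            }
        }
        return hits;
    }

    // Lowercase the text and split it on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

A Struts action would render the returned summaries as links carrying recordId, and the existing detail action would then load the full record through Hibernate.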
Solr is basically a web service wrapper on top of Lucene; under the hood it also builds a Lucene index. The advantage of Solr is that it is web service based, which makes it easy to keep the index in sync in a distributed environment. It also provides caching, index syncing, etc. out of the box.
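Because Solr is driven by XML messages over HTTP, adding a record amounts to building an add message and posting it to Solr's update handler. Here is a minimal sketch of building that message in plain Java. The helper class SolrAddMessage is hypothetical, and the field names you pass in must match whatever your Solr schema defines.

```java
import java.util.Map;

// Builds the body of a Solr XML "add" message; sending it over HTTP is left out.
final class SolrAddMessage {
    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One <doc> with one <field> element per map entry, in iteration order.
    static String build(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }
}
```

After posting one or more add messages, a separate `<commit/>` message makes the new documents visible to searches.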
Gotchas and Tips:
Sorting:
Maintain a duplicate of each sortable field, indexed as NOT_ANALYZED, and sort on that copy.
Note: sorting is case sensitive.
Give the sort field value a fixed maximum length, say 20 characters; this improves performance.
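One way to act on these sorting tips is to derive the value of the duplicate sort field before indexing. The sketch below is plain Java, not Lucene code. The SortKeys class is hypothetical, and lowercasing the key is my assumption about how to work around case-sensitive ordering, not something Lucene does for you.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Derives the value to store in the duplicate, un-analyzed sort field.
final class SortKeys {
    static final int MAX_SORT_LENGTH = 20; // the cap suggested in the tip above

    // Lowercase so ordering ignores case, then truncate to the fixed length.
    static String sortKey(String value) {
        String key = value.trim().toLowerCase(Locale.ROOT);
        return key.length() <= MAX_SORT_LENGTH ? key : key.substring(0, MAX_SORT_LENGTH);
    }

    // Case-sensitive ordering: what you get when sorting on the raw values.
    static List<String> sortRaw(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Ordering by the derived key instead of the raw value.
    static List<String> sortByKey(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparing(SortKeys::sortKey));
        return copy;
    }
}
```

With the raw values, "Banana" sorts ahead of "apple" because uppercase letters come first in character order; with the derived key, the order is alphabetical regardless of case.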
Indexing:
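Use the same analyzer for indexing and for searching; otherwise query terms may not match the indexed terms.
StandardAnalyzer can be extended with HTMLStripReader (to strip HTML markup) and ISOLatin1AccentFilter (to replace accented Latin-1 characters with their unaccented equivalents).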
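To see why the analyzer must be the same on both sides, here is a toy analysis chain in plain Java. It is not a Lucene Analyzer: the ToyAnalysis class is hypothetical and only mimics the effect of HTML stripping, accent folding, and lowercasing with regular expressions and java.text.Normalizer.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis chain; it imitates the idea, not Lucene's implementation.
final class ToyAnalysis {
    // Strip tags, fold accents, lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        String noTags = text.replaceAll("<[^>]*>", " ");
        String folded = Normalizer.normalize(noTags, Normalizer.Form.NFD)
                                  .replaceAll("\\p{M}+", "");
        List<String> terms = new ArrayList<>();
        for (String t : folded.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Naive whitespace split: a query side that skips the analysis chain.
    static List<String> whitespaceOnly(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // A query matches when every query term is among the indexed terms.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }
}
```

If the text is indexed through analyze but the query goes through whitespaceOnly, a query for "Caf\u00e9" produces a term that was never indexed, so nothing matches. Running the query through analyze as well fixes it.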
Links:
Syntax supported:
http://lucene.apache.org/java/1_4_3/queryparsersyntax.html
Solr + Jquery sample:
http://solrjs.solrstuff.org/test/reuters/
http://www.theserverside.com/news/thread.tss?thread_id=43617
http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html
http://www.ibm.com/developerworks/java/library/j-solr-update/?S_TACT=105AGX01&S_CMP=HP
Wednesday, February 4, 2009
Lucene
Posted on 7:35 AM by Unknown