One of Azure's missing features was SQL's free-text search, a feature I used frequently when implementing search on my sites. I had been meaning to play around with Lucene for a while, and this seemed like the perfect opportunity.
Lucene.Net is a port of the Lucene search engine library, written in C# and targeted at .NET runtime users (http://lucenenet.apache.org).
Before I get started, I want to point out that I realise Azure now has a dedicated Search Service, with a whole load of features added recently. However, I wrote this service a while ago and am planning to move to Azure Search Service pretty soon; I just wanted something written up on both so I can compare them and decide which to go forward with as my main search service.
The search service has two parts: creating the documents (the Lucene indexes to search) and searching those indexes. In this example I've opted to run the Lucene indexer constantly (with a slight delay), picking up my database and rewriting the entire index each pass. An alternative would be to add a new document row every time an article or user is created, and to update that row whenever the article or user is altered; this is probably how my Azure Search Service implementation will work.
Lucene is available on NuGet, along with a few extensions, one of which is for Azure integration. Grab Lucene.Net and Lucene.Net.Store.Azure from NuGet to get started.
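From the Package Manager Console that is just the two package names as they appear on NuGet (versions omitted):

```powershell
Install-Package Lucene.Net
Install-Package Lucene.Net.Store.Azure
```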
I then created a new cloud service using Visual Studio's templates.
With a new worker role.
I now have two new projects, one with a lovely cloud icon next to it.
The cloud service is really just a collection of settings and references to the roles attached.
The worker role project has a similar feel to a console app.
The Lucene search service creates two kinds of document: one for my articles and one for my users.
My article document will consist of the following fields:
And my user document:
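As a sketch of what building one of these documents looks like in Lucene.Net 3.x: the field names and the store/analyse choices below are assumptions for illustration, not my exact schema. `Id` is stored but not analysed so it can be read back from search hits; the text fields are analysed so they are tokenised for full-text search.

```csharp
using Lucene.Net.Documents;

public static class ArticleDocumentFactory
{
    public static Document Create(int id, string title, string content)
    {
        var doc = new Document();
        // Stored verbatim so the reader can return it from a hit.
        doc.Add(new Field("Id", id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Tokenised for searching; Title is also stored for display.
        doc.Add(new Field("Title", title, Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("Content", content, Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }
}
```

The user document is built the same way, just with user fields instead.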
When the service starts there are a few things to set up: a cancellation token for the running service, the database, and storage. I then get the two directories I will be using (article and user).
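Roughly, the setup looks like this; the setting name, container names, and Azure SDK types are assumptions for the era (`AzureDirectory` comes from the Lucene.Net.Store.Azure package and stores the index in a blob container):

```csharp
using System.Threading;
using Lucene.Net.Store.Azure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.Storage;

public class WorkerRole : RoleEntryPoint
{
    private CancellationTokenSource _cts;
    private AzureDirectory _articleDirectory;
    private AzureDirectory _userDirectory;

    public override bool OnStart()
    {
        _cts = new CancellationTokenSource();
        var account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
        // One blob container per index.
        _articleDirectory = new AzureDirectory(account, "article-index");
        _userDirectory = new AzureDirectory(account, "user-index");
        return base.OnStart();
    }
}
```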
I first create an index writer by passing in a few arguments: the Azure storage directory, a StandardAnalyzer (initialised with the Lucene version constant), the maximum field length, and a Boolean indicating whether to create a new index or modify the existing one. If there is a problem with the index, I remove the locks and return.
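In Lucene.Net 3.x that looks something like the following sketch, using the article directory from the setup above:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

IndexWriter writer;
try
{
    writer = new IndexWriter(
        articleDirectory,
        new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30),
        true,                                  // true = create/rebuild, false = modify
        IndexWriter.MaxFieldLength.UNLIMITED); // max length of indexed fields
}
catch (LockObtainFailedException)
{
    // A crashed pass can leave a write.lock behind -- clear it and return;
    // the next pass of the service loop will retry.
    IndexWriter.Unlock(articleDirectory);
    return;
}
```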
I then get all the articles from my repository and start writing the index. When I wrote this service I didn't want to alter my repositories, just bolt the service onto my existing code. I will be creating new methods that select only the fields needed for the document, as pulling back everything can be wasteful.
When writing the index I have a few options:
Once all the rows of the document have been written, I optimise the index and dispose of the writer. I have placed my sleep threads here rather than in the service loop, as I want the service to restart immediately if there is an error, and I don't want to wrap my parallel methods in try/catches in the service class.
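Put together, the article write pass might look like this; the repository and factory names are assumptions. Note the sleep lives here, inside the worker method:

```csharp
foreach (var article in articleRepository.GetAll())
{
    writer.AddDocument(ArticleDocumentFactory.Create(
        article.Id, article.Title, article.Content));
}

writer.Optimize();   // merge segments for faster reads
writer.Dispose();    // flushes and releases the write lock

// Sleeping here (not in the service loop) means an exception skips the
// delay and the service restarts the pass immediately.
Thread.Sleep(TimeSpan.FromMinutes(10));
```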
A similar process runs in parallel for the user index; the document is slightly different and it fetches all the active users rather than articles, but otherwise it is identical.
Once both of these methods have completed, the thread is paused for 1,000 ms on top of the 10 minutes before the process starts again.
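The service loop itself is then just the two passes run side by side, plus that extra pause (method names are illustrative):

```csharp
while (!_cts.Token.IsCancellationRequested)
{
    // Blocks until both index passes (and their internal sleeps) finish.
    Parallel.Invoke(WriteArticleIndex, WriteUserIndex);
    Thread.Sleep(1000);
}
```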
Finally, OnStop signals the cancellation token and allows the last pass of the service to run through to completion, which should prevent locks being left behind and indexes going unoptimised or undisposed.
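A minimal sketch of that OnStop, assuming the loop above was started as a Task in Run() and stored in a field:

```csharp
public override void OnStop()
{
    // Signal the loop to stop, then wait for the in-flight pass so the
    // writers get optimised and disposed before the role shuts down.
    _cts.Cancel();
    _runTask.Wait();
    base.OnStop();
}
```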
And that's the service: it is published to Azure and runs constantly. Now onto the reader:
The reader is much simpler. I pass in the storage account connection string, the storage containers for articles and users, and the maximum number of results. This is handled by my dependency mapper, with the values coming from my web.config.
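The constructor shape might look like this (class and parameter names are assumptions; in my case the DI container supplies the values from web.config):

```csharp
using Lucene.Net.Store.Azure;
using Microsoft.WindowsAzure.Storage;

public class LuceneSearchReader
{
    private readonly AzureDirectory _articleDirectory;
    private readonly AzureDirectory _userDirectory;
    private readonly int _maxResults;

    public LuceneSearchReader(string connectionString,
                              string articleContainer,
                              string userContainer,
                              int maxResults)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        _articleDirectory = new AzureDirectory(account, articleContainer);
        _userDirectory = new AzureDirectory(account, userContainer);
        _maxResults = maxResults;
    }
}
```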
I then just need a search term when using the SearchArticle or SearchUser methods.
I first get the directory and open the index reader in read-only mode. I then initialise the MultiFieldQueryParser, passing in a list of field names.
I then run the query, taking the top results up to my maximum count. These are returned as a list of IDs, which can be used to fetch the full details from my repositories.
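A sketch of SearchArticle under those assumptions (field names match the earlier document sketch):

```csharp
using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public IList<int> SearchArticle(string term)
{
    using (var reader = IndexReader.Open(_articleDirectory, true)) // true = read-only
    using (var searcher = new IndexSearcher(reader))
    {
        var parser = new MultiFieldQueryParser(
            Lucene.Net.Util.Version.LUCENE_30,
            new[] { "Title", "Content" },   // fields to search across
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));

        var hits = searcher.Search(parser.Parse(term), _maxResults);

        // Return just the stored Ids; the repositories hydrate the rest.
        return hits.ScoreDocs
                   .Select(hit => int.Parse(searcher.Doc(hit.Doc).Get("Id")))
                   .ToList();
    }
}
```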
And that's it: a Lucene search service running on an Azure Cloud Service. It works well and is pretty simple to throw together.