Sunday 22 July 2012

Fulltext Search in SQL Server 2012


The Fulltext Search codebase has been significantly revamped to improve both query performance and throughput at large scale (millions of documents) under concurrent updates. With SQL Server 2008, the team moved all index storage into the database file and most of the population logic into the core engine, making fulltext search an integral and fully manageable engine component. However, work remained to make it perform and scale against the best fulltext engines in the industry. With Denali CTP1 (Denali being the codename for SQL Server 2012), they are pleased to deliver this improvement.

They looked at the entire code base: how queries block while waiting for an ongoing index update to release a shared schema lock, how much memory is allocated during index fragment population, how the query code could be reorganized as a streaming table-valued function to optimize TOP N search queries, how key distribution histograms could be maintained so that a search executes on parallel threads, and how to take better advantage of processor compute instructions (for scoring ranks, for example). The end result is a significant boost in performance (10X in many cases involving concurrent index updates with large query workloads) and in scale, without changing any storage structures or the existing API surface. All customers moving from SQL Server 2008 / 2008 R2 to Denali will benefit from this improvement.

Besides the performance and scale improvements, they also added support for property-scoped searches over documents whose file system properties are stored within a fulltext-enabled table. One can now issue a CONTAINS query for all documents that contain a particular term and were written by a particular author, without having to maintain a separate column for the author name in the database.
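As a rough illustration, a property-scoped query of this kind might look like the sketch below, which only builds the T-SQL text in Python. The table `dbo.Documents`, the column `Content`, and the helper names are hypothetical. The `CONTAINS(PROPERTY(...), ...)` form follows the syntax documented for the released SQL Server 2012, which may differ from CTP1, and it assumes the `Author` property has been registered in a search property list attached to the fulltext index.

```python
def sql_literal(text: str) -> str:
    """Quote text as a T-SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def property_search_sql(table: str, column: str, body_term: str,
                        prop: str, prop_term: str) -> str:
    """Build a query that matches body_term in the document text and
    prop_term in one document property (for example, Author).

    Hypothetical helper: table and column names are inserted as-is and are
    not validated, and the terms must not contain double quotes.
    """
    body = sql_literal('"' + body_term + '"')    # full-text phrase in double quotes
    scoped = sql_literal('"' + prop_term + '"')
    return (
        f"SELECT * FROM {table} "
        f"WHERE CONTAINS({column}, {body}) "
        f"AND CONTAINS(PROPERTY({column}, {sql_literal(prop)}), {scoped})"
    )


# Assumed table and column names, for illustration only.
print(property_search_sql("dbo.Documents", "Content",
                          "performance", "Author", "Jane Doe"))
```

The second predicate is the one that replaces the separate author column: the property value is read from the document itself during indexing rather than from the table.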

They also improved the NEAR operator in the CONTAINS predicate so that users can specify the maximum distance between two terms and whether the order of the terms matters. It is important to note that the distance between two words within a single sentence is much smaller than the distance between words in two different sentences (even when the words sit next to each other with only a period in between), or between words separated by paragraphs, bullet points, spreadsheet columns, or worksheets.
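A minimal sketch of the extended NEAR form, again building the T-SQL text in Python. The helper names, the table `dbo.Documents`, and the column `Content` are hypothetical. The `NEAR((term1, term2), distance, order)` shape follows the syntax documented for the released SQL Server 2012, which may differ from CTP1.

```python
def near_condition(terms, max_distance=None, ordered=False) -> str:
    """Build a custom proximity search condition for CONTAINS.

    max_distance=None maps to MAX (no distance limit); ordered=True requires
    the terms to appear in the order given. Hypothetical helper: the terms
    must not contain double quotes.
    """
    if len(terms) < 2:
        raise ValueError("at least two terms are needed for a proximity search")
    if max_distance is not None and max_distance < 0:
        raise ValueError("max_distance must be non-negative")
    quoted = ", ".join('"' + term + '"' for term in terms)
    distance = "MAX" if max_distance is None else str(int(max_distance))
    order = "TRUE" if ordered else "FALSE"
    return f"NEAR(({quoted}), {distance}, {order})"


def near_query(table: str, column: str, condition: str) -> str:
    """Wrap a search condition in a CONTAINS query (names are not validated)."""
    literal = "'" + condition.replace("'", "''") + "'"
    return f"SELECT * FROM {table} WHERE CONTAINS({column}, {literal})"


# "index" followed by "update", at most 5 non-search terms apart.
print(near_query("dbo.Documents", "Content",
                 near_condition(["index", "update"], 5, ordered=True)))
```

Because sentence, paragraph, and similar boundaries inflate the measured distance, a small `max_distance` tends to restrict matches to terms within the same sentence.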

