Monday, January 12, 2015

Interesting Post About Query Parser

  • Paper:
The Magic and Wonder of Query Parsing – Search Technologies

More info: Sometimes it is necessary to apply a custom filter to an existing SQL query (to perform search … query, you should first create … parser will work faster if we …

URL: http://www.searchtechnologies.com/search-query-parsing

  • Tool:
OpenNLP

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
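
To make two of the tasks listed above concrete (sentence segmentation and tokenization), here is a minimal sketch in plain Python. It is not OpenNLP's API: OpenNLP is a Java library that uses trained models, while the regular-expression rules and the function names `split_sentences` and `tokenize` below are my own assumptions for illustration.

```python
import re

def split_sentences(text):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

if __name__ == "__main__":
    for sentence in split_sentences("OpenNLP is a toolkit. It tokenizes text!"):
        print(tokenize(sentence))
```

Rules this simple break on abbreviations such as "Dr." or "e.g.", which is one reason a model-based toolkit is useful in practice.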

  • My understandings:
1. Different users speak different query languages, so each kind of search engine should document its own query syntax and rules.

2. A query is not only a string of simple tokens: how should the parser interpret operators and special signs such as "+/-", "NOT", "OR", and so on?

3. How can you set up the relationships between "chair" and "chairs", between "mouse" and "mice", and so on?

4. The spell checker should run before the query parser, so that it can clean up some dirty data first.

5. Relevancy ranking: documents that contain the search terms in the title or abstract may be considered more relevant. Query parsers can also boost documents that contain all of the terms close together (proximity weighting), or boost documents from friendly web sites while demoting documents from unfriendly ones.
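
Points 2, 3, and 5 above can be sketched together in a few lines of Python. This is only a rough illustration under my own assumptions, not how any particular search engine works: the query syntax (`+term` required, `-term` or `NOT term` excluded, `OR` ignored because bare terms are already alternatives), the tiny `IRREGULAR` plural table, the document layout with `title` and `body` keys, and the `title_boost` value are all hypothetical. Proximity weighting and site-based boosts are left out.

```python
import re

# Hypothetical table of irregular plurals; a real system would use a stemmer or lemmatizer.
IRREGULAR = {"mice": "mouse", "geese": "goose", "children": "child"}

def normalize(term):
    """Map a term to a crude singular form so that "chairs" matches "chair"."""
    term = term.lower()
    if term in IRREGULAR:
        return IRREGULAR[term]
    # Naive rule: strip a trailing "s", but not "ss" (as in "glass").
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term

def parse_query(query):
    """Split a query into required, optional, and excluded term lists."""
    parsed = {"required": [], "optional": [], "excluded": []}
    negate_next = False
    for token in query.split():
        if token == "NOT":
            negate_next = True  # the next term will be excluded
            continue
        if token == "OR":
            continue  # bare terms are already treated as alternatives
        if token.startswith("+") and len(token) > 1:
            bucket, term = "required", token[1:]
        elif token.startswith("-") and len(token) > 1:
            bucket, term = "excluded", token[1:]
        else:
            bucket, term = "optional", token
        if negate_next:
            bucket, negate_next = "excluded", False
        parsed[bucket].append(normalize(term))
    return parsed

def score(doc, parsed, title_boost=2.0):
    """Return 0.0 if the document is filtered out, else a boosted term count."""
    title = {normalize(w) for w in re.findall(r"\w+", doc["title"])}
    body = {normalize(w) for w in re.findall(r"\w+", doc["body"])}
    everything = title | body
    if any(t in everything for t in parsed["excluded"]):
        return 0.0
    if not all(t in everything for t in parsed["required"]):
        return 0.0
    total = 0.0
    for t in parsed["required"] + parsed["optional"]:
        if t in title:
            total += title_boost  # a title match counts more than a body match
        elif t in body:
            total += 1.0
    return total

if __name__ == "__main__":
    q = parse_query("+chairs mice OR desk NOT plastic -metal")
    print(q)
    print(score({"title": "Wooden chair", "body": "A chair and a desk for the office"}, q))
```

Normalizing both the query terms and the document words with the same function is what lets "chairs" in a query match "chair" in a title.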
