Text Queries

Simple Text Searches

A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases. A Single Term is a single word such as "test" or "hello". A Phrase is a group of words surrounded by double quotes such as "hello dolly".
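To illustrate the difference between the two kinds of term, here is a minimal Python sketch that splits a query string into single terms and quoted phrases. It is not Scarab's or Lucene's parser. The function name `split_terms` is hypothetical, and operators, escapes, and modifiers such as "~" are deliberately not handled.

```python
import re

# Group 1 captures a double-quoted phrase; group 2 captures a run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def split_terms(query: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, where kind is "term" or "phrase"."""
    parts = []
    for phrase, term in _TOKEN.findall(query):
        if term:
            parts.append(("term", term))
        else:
            parts.append(("phrase", phrase))
    return parts

print(split_terms('test "hello dolly"'))  # [('term', 'test'), ('phrase', 'hello dolly')]
```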

Wildcard Searches

Wildcard searches allow you to search on partial terms. The wildcard character can appear in the middle or at the end of a term, but not as the first character. You can perform single or multiple character wildcard searches. To perform a single character wildcard search, use the "?" symbol. A single character wildcard search looks for terms that match with any one character in place of the "?". For example, to search for "text" or "test" you can use the search: te?t

To perform a multiple character wildcard search, use the "*" symbol. Multiple character wildcard searches look for 0 or more characters. For example, to search for test, tests or tester, you can use the search: test*
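The wildcard rules above can be sketched in Python by translating "?" into "exactly one character" and "*" into "zero or more characters", then requiring the whole term to match. This is an illustration of the matching semantics only; the function names `wildcard_to_regex` and `wildcard_match` are hypothetical and are not part of Scarab or Lucene.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate ? (exactly one character) and * (zero or more) into a regex."""
    if pattern[:1] in ("?", "*"):
        raise ValueError("a wildcard cannot be the first character of a term")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts))

def wildcard_match(pattern: str, term: str) -> bool:
    # fullmatch: the entire indexed term must match, not just a prefix of it
    return wildcard_to_regex(pattern).fullmatch(term) is not None

terms = ["text", "test", "tests", "tester", "toast"]
print([t for t in terms if wildcard_match("te?t", t)])   # ['text', 'test']
print([t for t in terms if wildcard_match("test*", t)])  # ['test', 'tests', 'tester']
```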

Note: When text is indexed it is converted to lowercase. In most cases the query parser will lowercase your query as well, so the case of the entered query is irrelevant. However, terms that contain a wildcard are not converted, so a wildcard query that uses uppercase letters will not find any matches. For example, if you know an issue exists which includes the text IMPORTANT, searching for either "important" or "IMPORTANT" will find the issue. However, if you use a wildcard, IMPORT* will get no results, while import* will return the issue you expect. In general, it is best to stick with lowercase in your queries.
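The following Python sketch models the behaviour described in the note: the index holds lowercase words, plain query terms are lowercased, and wildcard terms are passed through unchanged. The helper names are hypothetical, and only a single trailing "*" is handled, which is a simplification of real wildcard matching.

```python
import re

def index_terms(text: str) -> set[str]:
    # Indexing: every word is stored in lowercase.
    return {w.lower() for w in re.findall(r"\w+", text)}

def normalize(term: str) -> str:
    # Plain terms are lowercased; terms containing a wildcard are left exactly as typed.
    if "*" in term or "?" in term:
        return term
    return term.lower()

def search(term: str, indexed: set[str]) -> bool:
    q = normalize(term)
    if q.endswith("*"):
        # Simplification: only a single trailing "*" (prefix search) is handled here.
        return any(t.startswith(q[:-1]) for t in indexed)
    return q in indexed

indexed = index_terms("Please review this IMPORTANT issue")
for query in ["important", "IMPORTANT", "IMPORT*", "import*"]:
    print(query, search(query, indexed))
# important True
# IMPORTANT True
# IMPORT* False
# import* True
```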

Fuzzy Searches

Fuzzy searches are based on the Levenshtein Distance, or Edit Distance, algorithm. To do a fuzzy search, use the tilde, "~", symbol at the end of a Single Term. For example, the fuzzy search roam~ finds terms similar in spelling to "roam", such as foam and roams.
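Levenshtein distance counts the minimum number of single-character insertions, deletions, or substitutions needed to turn one word into another. The Python sketch below computes it with the standard dynamic-programming recurrence and then keeps terms within a cutoff. The cutoff of one edit (`max_edits=1`) is an assumption chosen for illustration; the threshold the search engine actually applies is not stated here, and `fuzzy_match` is a hypothetical name.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters are equal)
            ))
        prev = curr
    return prev[-1]

def fuzzy_match(query: str, terms: list[str], max_edits: int = 1) -> list[str]:
    # Assumed cutoff: keep terms at most max_edits edits away from the query.
    return [t for t in terms if levenshtein(query, t) <= max_edits]

print(fuzzy_match("roam", ["foam", "roams", "team", "hello"]))  # ['foam', 'roams']
```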

Proximity Searches

Proximity searches find words that are within a specific distance of each other. To do a proximity search, use the tilde, "~", symbol at the end of a Phrase. For example, to search for "apache" and "jakarta" within 10 words of each other in the text, use the search: "jakarta apache"~10.
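A proximity check for a two-word phrase can be sketched in Python as follows. The distance rule (`required_slop`) counts how many positions the second word must move to sit directly after the first, so words in the right order with nothing between them need 0, and reversing their order costs two extra moves. This rule follows our understanding of how Lucene describes phrase slop and should be treated as an assumption; the function names are hypothetical.

```python
def required_slop(p1: int, p2: int) -> int:
    """Moves needed to bring the second phrase word (at p2) right after the first (at p1)."""
    # In order and adjacent -> 0; each word in between adds 1;
    # reversed order costs two extra moves (adjacent but reversed -> 2).
    return abs(p2 - p1 - 1)

def proximity_match(text: str, first: str, second: str, slop: int) -> bool:
    words = text.lower().split()
    firsts = [i for i, w in enumerate(words) if w == first]
    seconds = [i for i, w in enumerate(words) if w == second]
    return any(required_slop(i, j) <= slop for i in firsts for j in seconds)

doc = "jakarta is the home of many projects including apache"
print(proximity_match(doc, "jakarta", "apache", 10))  # True: 7 words lie in between
print(proximity_match(doc, "jakarta", "apache", 5))   # False
```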

Boolean Operators

Boolean operators allow terms to be combined through logic operators: AND, "+", OR, NOT and "-". Boolean operators must be ALL CAPS. If two terms are entered with no Boolean operator, the OR operator is used by default.

OR The OR operator is used between two terms to search for text that contains either of the terms. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.

AND The AND operator is used to find text that contains both terms anywhere in the text. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.
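Because OR and AND correspond to set union and intersection, they can be illustrated with Python sets of document IDs. The tiny index below is hypothetical example data, not output from Scarab; each term maps to the IDs of the documents that contain it.

```python
# Hypothetical data: term -> IDs of documents that contain it.
index = {
    "jakarta": {1, 2, 3},
    "apache": {2, 4},
}

def docs(term: str) -> set:
    return index.get(term, set())

either = docs("jakarta") | docs("apache")  # jakarta OR apache  (union)
both = docs("jakarta") & docs("apache")    # jakarta AND apache (intersection)
print(either)  # {1, 2, 3, 4}
print(both)    # {2}
```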

+ The "+", or required, operator requires that the term after the "+" symbol exist somewhere in the text. To search for results that must contain "jakarta" and may contain "lucene" use the query: +jakarta lucene.

NOT The NOT operator excludes results that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT. To search for text that contains "jakarta apache" but not "jakarta lucene", use the query: "jakarta apache" NOT "jakarta lucene". Note: The NOT operator cannot be used with just one term. For example, the following search will return no results: NOT "jakarta apache"
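NOT as a set difference, and the reason a lone NOT returns nothing, can be sketched in Python: with no positive clause there is no set of documents to subtract from. The phrase-to-document mapping and the `not_query` helper are hypothetical illustrations.

```python
from typing import Optional

# Hypothetical data: phrase -> IDs of documents that contain it.
index = {
    "jakarta apache": {1, 2, 5},
    "jakarta lucene": {2, 3},
}

def not_query(include: Optional[set], exclude: set) -> set:
    """Set difference: documents matching `include` minus those matching `exclude`."""
    if include is None:
        # A query made only of a NOT clause has no positive set to subtract from,
        # so it matches nothing.
        return set()
    return include - exclude

# "jakarta apache" NOT "jakarta lucene"
print(not_query(index["jakarta apache"], index["jakarta lucene"]))  # {1, 5}
# NOT "jakarta apache" on its own
print(not_query(None, index["jakarta apache"]))  # set()
```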

- The "-", or prohibit, operator excludes results that contain the term after the "-" symbol. To search for text that contains "jakarta apache" but not "jakarta lucene", use the query: "jakarta apache" -"jakarta lucene".
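The required ("+"), prohibited ("-") and plain clauses described in this section can be combined in one small Python sketch. If any "+" clause is present, a document must match all of them; otherwise plain clauses are ORed together (the default operator); "-" clauses are then removed. In this sketch plain clauses are simply ignored for matching once a "+" clause exists; in Lucene such clauses typically affect only the ranking of results, which is not modelled here. The `evaluate` function and the index are hypothetical.

```python
# Hypothetical data: term or phrase -> IDs of documents that contain it.
index = {
    "jakarta": {1, 2, 3},
    "lucene": {3, 4},
    "jakarta apache": {1, 2},
    "jakarta lucene": {2},
}

def evaluate(required=(), optional=(), prohibited=()):
    """Combine clauses, each given as a set of document IDs."""
    if required:
        # "+" clauses: every one must match.
        result = set.intersection(*required)
    elif optional:
        # Plain clauses with no "+" present: any one may match (the default OR).
        result = set.union(*optional)
    else:
        # Only prohibited clauses: there is nothing to start from.
        result = set()
    for excluded in prohibited:
        # "-" clauses: drop every document that contains the term.
        result = result - excluded
    return result

# +jakarta lucene
print(evaluate(required=[index["jakarta"]], optional=[index["lucene"]]))  # {1, 2, 3}
# "jakarta apache" -"jakarta lucene"
print(evaluate(optional=[index["jakarta apache"]], prohibited=[index["jakarta lucene"]]))  # {1}
# jakarta lucene (no operator, so OR is used)
print(evaluate(optional=[index["jakarta"], index["lucene"]]))  # {1, 2, 3, 4}
```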

Grouping

Grouping is supported using parentheses to group clauses to form sub queries. This can be very useful if you want to control the Boolean logic for a query. To search for either "jakarta" or "apache" and "website", use the query: (jakarta OR apache) AND website. This eliminates any confusion and makes sure that website must exist and that at least one of jakarta or apache must also exist.
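The effect of grouping can be seen with Python sets, where parentheses likewise decide which operation is evaluated first. The index is hypothetical example data; the point is that the same three terms give different results under different groupings.

```python
# Hypothetical data: term -> IDs of documents that contain it.
index = {"jakarta": {1, 2}, "apache": {3, 4}, "website": {2, 3, 5}}

def docs(term: str) -> set:
    return index.get(term, set())

# (jakarta OR apache) AND website: the group is evaluated first.
grouped = (docs("jakarta") | docs("apache")) & docs("website")
# jakarta OR (apache AND website): same terms, different grouping.
regrouped = docs("jakarta") | (docs("apache") & docs("website"))
print(grouped)    # {2, 3}
print(regrouped)  # {1, 2, 3}
```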

Escaping Special Characters

To use any of the characters or words described above as part of a term, rather than as an operator, escape them. Note: some operators exist in the underlying Lucene search engine but are not exposed in Scarab. These operators, such as the colon ":", also need to be escaped. The current list of special characters is: + - && || ! ( ) { } [ ] ^ " ~ * ? : \

You can escape special characters by placing the term in quotes or by placing a "\" (backslash) before the special character. You should always use quotes for search strings containing numbers. For example, if you are searching for a database error code such as ORA-00932, you will need to enter the term in quotes in the search input field: "ORA-00932". However, when searching for a normal hyphenated word such as twenty-five, entering the search text as either twenty\-five or "twenty-five" will work.
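Backslash escaping can be sketched in Python as prefixing every special character from the list above with "\". The two-character operators && and || are covered by escaping each "&" and "|" individually. The `escape_query` function is a hypothetical helper, not a Scarab feature; per the advice above, terms containing numbers are still best entered in quotes.

```python
# The special characters listed above; "&&" and "||" are covered by escaping each character.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')

def escape_query(text: str) -> str:
    """Return text with a backslash placed before every special character."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_query("twenty-five"))  # twenty\-five
print(escape_query("title:bug"))    # title\:bug
```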

Quotes also work when AND or OR appear as part of the query text, although these words belong to a set of common English words that are ignored in searches.
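Quoting stops AND or OR from being read as operators, but as common words they are still dropped when searching. The Python sketch below shows that filtering step. The `STOP_WORDS` set is a small, assumed illustrative subset, not the engine's actual list, and `searchable_words` is a hypothetical name.

```python
# Assumed, illustrative subset of common English stop words; the engine's real list may differ.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to"}

def searchable_words(text: str) -> list[str]:
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(searchable_words("rock and roll"))  # ['rock', 'roll']
print(searchable_words("this or that"))   # ['this', 'that']
```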

The instructions given above are based on the Lucene documentation at http://jakarta.apache.org/lucene/docs/queryparsersyntax.html