Term vectors for user questions are computed similarly by using the idf values associated with terms in a given FAQ.
This metric does not require any understanding of the text, which is a good thing because the answers in FAQ files are free natural-language text, often several paragraphs or more in length.
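The term-vector comparison described above can be illustrated with a minimal sketch. It assumes a common tf-idf weighting (raw term frequency times log(N/df)) and cosine similarity; the exact weighting used by FAQ FINDER may differ. The function names and the three-question toy FAQ are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def idf_table(docs):
    """Map each term to log(N / df), where df is the number of docs containing it."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def term_vector(tokens, idf):
    """Weight each term by tf * idf; terms unknown to this FAQ are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy FAQ (illustrative text, not taken from a real FAQ file).
faq = [
    "how do i get a copy of my credit report".split(),
    "can a debt collector call me at work".split(),
    "how long does a bankruptcy stay on my credit report".split(),
]

idf = idf_table(faq)                      # idf values come from the FAQ itself
faq_vectors = [term_vector(doc, idf) for doc in faq]

# The user question is weighted with the same FAQ-specific idf values.
query = term_vector("credit report copy".split(), idf)
scores = [cosine(query, v) for v in faq_vectors]
best = max(range(len(faq)), key=scores.__getitem__)
```

Note that the user question is weighted with the idf table of the FAQ under consideration, so the same question yields different vectors against different FAQ files.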
The semantic-similarity metric enhances term-vector comparison by taking into account a shallow level of semantic analysis of lexical items that appear in user and FAQ questions.
However, because FAQ FINDER performs matching on only a small number of terms, the system needs a means of matching such synonyms.
The need to match related words calls for some level of semantic analysis of user and FAQ questions.
For example, because the consumer credit FAQ file is full of questions about credit reports and debts, it is important that the system identify the relation between ex-spouse and ex-husband.
Because FAQ FINDER is intended to encompass the whole gamut of USENET topics, not just computers, it is impractical to expect even this simple level of domain-specific knowledge representation.
Using a marker-passing algorithm (Quillian 1968) over the WORDNET database, the FAQ FINDER system accepts variations such as ex-husband for ex-spouse.
Marker passing is performed to compare each word in the user's question with each word in the FAQ file question.
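As a rough sketch of this word-by-word comparison, the code below passes markers outward from two words over a small hand-made graph and records where they meet, then fills one score per word pair. The links are toy stand-ins for WORDNET relations, not real WORDNET data, and the depth limit and the scoring rule 1 / (1 + path length) are assumptions made for illustration rather than the values FAQ FINDER uses.

```python
from collections import deque

# Hand-made toy links standing in for WORDNET relations (hypothetical data).
LINKS = {
    "ex-husband": ["ex-spouse"],
    "ex-wife": ["ex-spouse"],
    "ex-spouse": ["spouse"],
    "loan": ["debt"],
    "debt": ["obligation"],
}

def build_graph(links):
    """Turn the link table into an undirected adjacency map."""
    graph = {}
    for src, dsts in links.items():
        for dst in dsts:
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set()).add(src)
    return graph

def spread(graph, start, max_depth):
    """Pass markers outward from start; return {node: distance} up to max_depth."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue  # markers stop spreading at the depth limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen

def path_length(graph, a, b, max_depth=3):
    """Shortest path on which markers from a and b meet, or None if they never do."""
    from_a = spread(graph, a, max_depth)
    from_b = spread(graph, b, max_depth)
    meets = [from_a[n] + from_b[n] for n in from_a if n in from_b]
    return min(meets) if meets else None

def word_score(graph, a, b):
    """Assumed scoring rule: identical words score 1.0, unrelated words 0.0."""
    length = path_length(graph, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)

def score_matrix(graph, user_words, faq_words):
    """One row per user-question word, one column per FAQ-question word."""
    return [[word_score(graph, u, f) for f in faq_words] for u in user_words]

GRAPH = build_graph(LINKS)
user = ["ex-husband", "debt"]
faq = ["ex-spouse", "loan", "report"]
s = score_matrix(GRAPH, user, faq)
```

With these toy links, ex-husband and ex-spouse meet after a single step, while a word with no links in the graph, such as report, never meets any other word and scores zero.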
The matrix s for a user question of length n and a FAQ file question of length m is an n x m matrix representing all possible comparisons of words in the two questions: