What is Latent Semantic Analysis (LSI Indexing)?

Latent Semantic Indexing allows you to use synonyms for keywords or phrases instead of repeating the same keyword phrase, which makes the content read more naturally for both search engines and users. LSI represents terms and documents in a rich, high-dimensional space, allowing the underlying semantic relationships between terms and documents to be exploited during search.
 
Latent Semantic Analysis aka LSA is not the same thing as Latent Semantic Indexing aka LSI. LSA is the basis for LSI. It is the result of LSA that leads to LSI.

LSA is not about replacing keywords with synonyms either. Just replacing keywords with synonyms doesn't make LSA show better results. The whole idea behind LSA is to determine whether the content is relevant to the topic at hand.

For example, say you have a topic titled "Top Dog Breeds". LSA will parse your content to see whether you are really talking about top dog breeds, or whether you are just stuffing in keywords and their synonyms to fool search engines into thinking the article is about top dog breeds when in fact it is full of rubbish with no useful information about them.

LSA is a tool based on mathematical formulas. It works with semantic groups of words. Semantics focuses on the meaning of words rather than the actual words themselves. So in the example article above, it's not enough to just replace the word dog with, say, canine, pooch, mutt, etc. There have to be other words that carry meanings related to dogs, such as companionship, guard, watch, care, etc. And the words must be connected in a meaningful, coherent way.

In simple terms, LSA is a way to check whether your content is what it claims to be.
 
Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix containing word counts per paragraph (rows represent unique words and columns represent each paragraph) is constructed from a large piece of text, and a mathematical technique called singular value decomposition (SVD) is used to project that matrix onto a small number of latent dimensions while preserving as much of its similarity structure as possible. Words are then compared by taking the cosine of the angle between their vectors in the reduced space (equivalently, the dot product of the two normalized vectors). Values close to 1 represent very similar words while values close to 0 represent very dissimilar words.
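To make the steps above concrete, here is a minimal sketch in Python with NumPy. The five toy "paragraphs", the choice of k = 2 latent dimensions, and the helper names (counts, term_vecs, raw_sim, lsa_sim) are all assumptions made up for illustration. Real LSA runs on a large corpus and usually applies a weighting such as tf-idf before the SVD.

```python
import numpy as np

# Toy corpus (made up for illustration): "dog" and "canine" never share
# a paragraph, but both appear alongside "care" and "companionship".
docs = [
    "dog care companionship",
    "canine care companionship",
    "stock market prices",
    "stock market investors",
    "stock prices investors",
]

# Word-by-paragraph count matrix: rows are unique words, columns are paragraphs.
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[row[w], j] += 1

# SVD, then keep only the k largest singular values (k = 2 is an assumption).
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per word

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def raw_sim(w1, w2):
    """Similarity of two words using their raw count rows."""
    return cosine(counts[row[w1]], counts[row[w2]])

def lsa_sim(w1, w2):
    """Similarity of two words in the reduced (latent) space."""
    return cosine(term_vecs[row[w1]], term_vecs[row[w2]])

print("raw dog/canine:", raw_sim("dog", "canine"))
print("LSA dog/canine:", lsa_sim("dog", "canine"))
print("LSA dog/stock: ", lsa_sim("dog", "stock"))
```

On this toy data the raw cosine between "dog" and "canine" is 0 because they never appear in the same paragraph, while the LSA cosine comes out close to 1 because they share the same neighbors. "dog" and "stock" stay close to 0. That is the "latent" part: the similarity comes from shared context, not from the words being swapped in as synonyms.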
 
Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.
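As a rough illustration of the retrieval side, the sketch below indexes a tiny collection with SVD and then ranks documents against a query by "folding" the query into the same latent space. It uses Python with NumPy. The toy documents, k = 2, and the names count_vector and search are hypothetical, chosen only for this example, and this is one common textbook formulation rather than how any particular search engine works.

```python
import numpy as np

# Toy collection (made up for illustration). Document 1 never mentions "dog".
docs = [
    "dog care companionship",
    "canine care companionship",
    "stock market prices",
    "stock market investors",
    "stock prices investors",
]

vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

def count_vector(text):
    """Term-count vector over the fixed vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in row:
            v[row[w]] += 1
    return v

# Indexing step: term-by-document matrix, then truncated SVD.
A = np.column_stack([count_vector(d) for d in docs])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # number of latent dimensions kept (an assumption for this toy data)
U_k, s_k = U[:, :k], s[:k]
doc_vecs = Vt[:k].T  # one k-dimensional row per document

def search(query):
    """Rank documents by cosine similarity to the query in the latent space."""
    q = count_vector(query) @ U_k / s_k  # fold the query into the latent space
    q_norm = np.linalg.norm(q)
    if q_norm == 0:
        return []  # no known words in the query
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * q_norm)
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]

for doc_id, score in search("dog"):
    print(doc_id, round(score, 3), docs[doc_id])
```

With this data, a search for "dog" should rank both of the first two documents at the top, including the "canine" one that does not contain the query word at all, while the finance documents score near 0. That is the practical difference from plain keyword matching.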
 