What is Latent Semantic Analysis (LSI Indexing)?

Latent Semantic Indexing aka LSI is not the same as Latent Semantic Analysis aka LSA. I think the OP made a slight mistake in his title. It should have been "What is Latent Semantic Indexing (LSI)?" The word 'indexing' after LSI is redundant because LSI already stands for Latent Semantic Indexing.

So, @Michael, if you actually knew the difference between LSA and LSI, you wouldn't be surprised that the OP has created two separate threads, because LSA and LSI are two separate topics.
 
Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix of word counts per paragraph (rows represent unique words and columns represent paragraphs) is constructed from a large piece of text, and a mathematical technique called singular value decomposition (SVD) is used to reduce the dimensionality of that matrix while preserving as much of its similarity structure as possible. Words are then compared by taking the cosine of the angle between the vectors formed by any two rows (equivalently, the dot product of the two normalized vectors). Values close to 1 represent very similar words, while values close to 0 represent very dissimilar words.
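
To make that description concrete, here is a minimal sketch in Python with NumPy. The toy paragraphs, the choice of k = 2 retained dimensions, and the helper names (`cosine`, `word_similarity`) are my own assumptions for illustration, not part of any standard LSA library.

```python
import numpy as np

# Toy corpus (assumed for illustration); each string stands in for one paragraph.
paragraphs = [
    "cat dog pet",
    "dog cat animal pet",
    "cat pet animal",
    "stock market trade",
    "market stock price trade",
]

# Rows = unique words, columns = paragraphs, entries = raw word counts.
vocab = sorted({w for p in paragraphs for w in p.split()})
counts = np.array(
    [[p.split().count(w) for p in paragraphs] for w in vocab], dtype=float
)

# SVD, then keep only the k largest singular values (k is an assumed setting).
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per word


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def word_similarity(w1, w2):
    """Compare two words by the cosine of their reduced row vectors."""
    return cosine(word_vecs[vocab.index(w1)], word_vecs[vocab.index(w2)])


print("cat vs dog:  ", round(word_similarity("cat", "dog"), 3))
print("cat vs stock:", round(word_similarity("cat", "stock"), 3))
```

In this tiny corpus the animal words and the finance words never share a paragraph, so "cat" and "dog" come out close to 1 and "cat" and "stock" close to 0. Real text is much noisier, and practical setups usually weight the counts (for example with tf-idf) before the SVD.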
 
Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.
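
As a rough sketch of the indexing and retrieval side, the Python/NumPy snippet below builds a small latent index and ranks documents against a query. The documents, k = 2, and the `search` helper are assumptions for illustration, and the query "fold-in" step (projecting the query with U_k and dividing by the singular values) is one common formulation rather than the only one.

```python
import numpy as np

# Toy document collection (assumed for illustration).
docs = [
    "cat dog pet",
    "dog cat animal pet",
    "cat pet animal",
    "stock market trade",
    "market stock price trade",
]

# Term-document count matrix: rows = terms, columns = documents.
vocab = sorted({w for d in docs for w in d.split()})
term_doc = np.array(
    [[d.split().count(w) for d in docs] for w in vocab], dtype=float
)

# Index step: truncated SVD with k latent dimensions (k is an assumed setting).
k = 2
U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
doc_vecs = Vt[:k].T  # one k-dimensional vector per document


def search(query, top_n=3):
    """Rank documents by cosine similarity to the query in the latent space."""
    q = np.array([query.split().count(w) for w in vocab], dtype=float)
    if not q.any():
        return []  # no known terms in the query
    q_hat = (q @ U_k) / s_k  # fold the query into the latent space
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    scores = doc_vecs @ q_hat / norms
    order = np.argsort(-scores)[:top_n]
    return [(int(i), float(scores[i])) for i in order]


for idx, score in search("dog"):
    print(idx, round(score, 3), docs[idx])
```

The point of the example is that the query "dog" also retrieves "cat pet animal", a document that never contains the word "dog", because both fall along the same latent pattern. That is the kind of term-concept relationship the SVD is meant to surface.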
 