Explore Python Gensim Library For NLP

Avinash Navlani
10 min read · Apr 9, 2022

In this tutorial, we will focus on the Gensim Python library for text analysis.

Gensim is short for "Generate Similar". It is a free, open-source Python library for natural language processing, written by Radim Rehurek, and it is used for word embeddings, topic modeling, and text similarity. It was developed to generate word and document vectors, and it can also extract topics from text documents. Gensim is scalable, robust, fast, and platform-independent, and it provides efficient multicore implementations.

In this tutorial, we are going to cover the following topics:

Contents

1. Installing Gensim
2. Create Gensim Dictionary
3. Bag of Words
4. TF-IDF
5. Word2Vec
6. Pretrained Word2Vec: Google’s Word2Vec, Stanford’s GloVe, and fastText
7. Google’s Word2Vec
8. Stanford GloVe
9. Facebook fastText
10. Doc2Vec
11. Summary

Installing Gensim

Gensim is one of the most powerful libraries for natural language processing. It supports Bag of Words, TF-IDF, Word2Vec, Doc2Vec, and topic modeling. Let's install the library…
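
As a minimal sketch, assuming Gensim is installed from PyPI with `pip install gensim`, the snippet below checks whether the package is present in the current environment. The helper name `installed_version` is hypothetical and used only for illustration; it relies solely on the Python standard library, so it runs whether or not Gensim has been installed yet.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if no such distribution is found.

    `installed_version` is an illustrative helper, not part of Gensim.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("gensim")
if version is None:
    # Assumption: the package is installed from PyPI under the name "gensim".
    print("gensim is not installed; try: pip install gensim")
else:
    print(f"gensim {version} is installed")
```

If the check reports that Gensim is missing, running the `pip install gensim` command in the same environment and re-running the snippet should print the installed version instead.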

Avinash Navlani

Sr Data Scientist | Analytics Consulting | Data Science Communicator | Helping Clients to Improve Products & Services with Data