Unsupervised natural language processing library.
Kadot just lets you process a text easily.
>>> hello_world = Text("Kadot just lets you process a text easily.")
>>> hello_world.ngrams(n=2)
[('Kadot', 'just'), ('just', 'lets'), ('lets', 'you'), ('you', 'process'), ('process', 'a'), ('a', 'text'), ('text', 'easily')]
🔋 What's included?
Kadot includes tokenizers, text generators, classifiers, word-level and document-level vectorizers, as well as a spell checker, a fuzzy string matching utility and a stopword detector.
The philosophy of Kadot is "never hardcode the language rules": it relies on unsupervised solutions in order to support most languages. As a result, it will never include Treebank-based algorithms (such as a POS tagger); use TextBlob for that.
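To make the "no hardcoded language rules" idea concrete, here is a minimal sketch of one unsupervised way to guess stopwords: keep the words that occur in almost every document of a corpus. This is an illustration only, not Kadot's actual stopword detector; the function name `guess_stopwords` and the `min_doc_ratio` parameter are hypothetical.

```python
import re
from collections import Counter


def guess_stopwords(documents, min_doc_ratio=0.8):
    """Return the words found in at least `min_doc_ratio` of the documents.

    Hypothetical helper for illustration; it is not part of Kadot.
    No word list is hardcoded, so the same code works for any language
    that separates words with spaces or punctuation.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(re.findall(r"\w+", doc.lower())))
    threshold = min_doc_ratio * len(documents)
    return sorted(word for word, count in doc_freq.items() if count >= threshold)


english = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "a bird is in the tree",
    "the fish swims in the sea",
]
french = [
    "le chat dort sur le lit",
    "le chien mange le pain",
    "le train part à midi",
]

print(guess_stopwords(english))
print(guess_stopwords(french))
```

On these tiny corpora only "the" and "le" pass the threshold. A real corpus would need far more documents for the statistics to be meaningful.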
🤔 How to use it?
You can play with the TextBlob-like syntax:
>>> from kadot import Text
>>> example_text = Text("This is a text sample !")
>>> example_text.words
['This', 'is', 'a', 'text', 'sample']
>>> example_text.ngrams(n=2)
[('This', 'is'), ('is', 'a'), ('a', 'text'), ('text', 'sample')]
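For readers curious about what the `words` and `ngrams` results above represent, the following plain-Python sketch reproduces them with a naive regex tokenizer and a sliding window. It is an assumption-laden illustration, not Kadot's implementation; `simple_words` and `simple_ngrams` are hypothetical names.

```python
import re


def simple_words(text):
    """Split text into word tokens, dropping punctuation (naive regex tokenizer)."""
    return re.findall(r"\w+", text)


def simple_ngrams(words, n=2):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = simple_words("This is a text sample !")
print(words)
# ['This', 'is', 'a', 'text', 'sample']
print(simple_ngrams(words, n=2))
# [('This', 'is'), ('is', 'a'), ('a', 'text'), ('text', 'sample')]
```

A text with fewer than `n` words simply yields an empty list.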
Or you can use the word vectorizer to find relations between words:
>>> history_book = text_from_file('history_book.txt')
>>> vectors = history_book.vectorize(window=20, reduce_rate=300)
>>> vectors.apply_translation(vectors['man'], vectors['woman'], vectors['king'], best=1)  # 'man' is to 'woman' what 'king' is to...
[('queen', 0.98872148869)]
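The analogy query above can be understood as vector arithmetic: take the offset from 'man' to 'woman', apply it to 'king', and look for the nearest word. The sketch below shows that idea on hand-made toy vectors using cosine similarity. It is only an assumption about the general technique, not a description of how `apply_translation` is implemented; the `analogy` function and the vector values are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(vectors, a, b, c, best=1):
    """Rank words by similarity to vectors[c] + (vectors[b] - vectors[a]).

    Hypothetical helper: `vectors` maps each word to a list of floats.
    The three query words are excluded from the candidates.
    """
    target = [vc + vb - va for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [
        (word, cosine(target, vec))
        for word, vec in vectors.items()
        if word not in (a, b, c)
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:best]


# Toy 3-dimensional vectors, invented for this example.
toy_vectors = {
    "man": [1.0, 0.0, 0.1],
    "woman": [1.0, 1.0, 0.1],
    "king": [0.2, 0.0, 1.0],
    "queen": [0.2, 1.0, 1.0],
    "apple": [0.9, 0.1, 0.0],
}

print(analogy(toy_vectors, "man", "woman", "king", best=1))
```

With these toy values 'queen' ranks first. Vectors learned from a real corpus are noisier, which is why the score in the Kadot example is close to, but not exactly, 1.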
For more usage, check the examples. More advanced documentation is coming.
Use the pip command that refers to your Python 3.x interpreter.
In my case:
$ pip3 install kadot
It currently requires Python's standard library, NumPy, SciPy and scikit-learn.
Kadot is released under the MIT license.
Issues and pull requests are very welcome. Come help us!