Last commit: Nov. 4, 2017
Created: Feb. 10, 2017

Kadot

Unsupervised natural language processing library.

Kadot just lets you process a text easily.

>>> from kadot import Text
>>> hello_world = Text("Kadot just lets you process a text easily.")
>>> hello_world.ngrams(n=2)

[('Kadot', 'just'), ('just', 'lets'), ('lets', 'you'), ('you', 'process'), ('process', 'a'), ('a', 'text'), ('text', 'easily')]
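
To make the idea of n-grams concrete, here is a minimal plain-Python sketch that reproduces the output above. It is not Kadot's implementation: the `tokenize` and `ngrams` helpers are hypothetical names, and splitting on `\w+` is a simplifying assumption about tokenization.

```python
import re


def tokenize(text):
    """Split text into word tokens, dropping punctuation (a simplifying assumption)."""
    return re.findall(r"\w+", text)


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    # A sliding window of width n; yields an empty list when len(tokens) < n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = tokenize("Kadot just lets you process a text easily.")
print(ngrams(words, n=2))
```

Nothing in this sketch depends on the language of the text, which is in the spirit of the library: no grammar rules are hardcoded.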

🔋 What's included?

Kadot includes tokenizers, text generators, classifiers, word-level and document-level vectorizers, as well as a spell checker, a fuzzy string matching utility and a stopword detector.

The philosophy of Kadot is "never hardcode the language rules": use unsupervised solutions to support most languages. It will therefore never include Treebank-based algorithms (such as a POS tagger); use TextBlob for that.

🤔 How to use it?

You can play with the TextBlob-like syntax:

>>> from kadot import Text
>>> example_text = Text("This is a text sample !")
>>> example_text.words

['This', 'is', 'a', 'text', 'sample']

>>> example_text.ngrams(n=2)

[('This', 'is'), ('is', 'a'), ('a', 'text'), ('text', 'sample')]

Or you can use the word vectorizer to get relations between words:

>>> history_book = text_from_file('history_book.txt')
>>> vectors = history_book.vectorize(window=20, reduce_rate=300)
>>> vectors.apply_translation(vectors['man'], vectors['woman'], vectors['king'], best=1)

# 'man' is to 'woman' what 'king' is to...
[('queen', 0.98872148869)]
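
The analogy query above can be illustrated with a small, self-contained sketch. It assumes the common "vector offset" approach (add the `woman - man` offset to `king`, then rank the other words by cosine similarity); whether Kadot's `apply_translation` works exactly this way is not confirmed here. The `cosine` and `analogy` helpers and the hand-picked three-dimensional vectors are hypothetical and purely illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c, best=1):
    """Answer 'a is to b what c is to ...' by ranking words near c + (b - a)."""
    target = [vc + vb - va for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three query words so they cannot be returned as the answer.
    scored = [(word, cosine(vec, target))
              for word, vec in vectors.items() if word not in (a, b, c)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:best]


# Toy vectors chosen by hand for illustration only (not learned from any corpus).
toy_vectors = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.1],
}

print(analogy(toy_vectors, "man", "woman", "king", best=1))
```

With these toy vectors the top candidate is 'queen'. Real vectors learned from a corpus give noisier scores, as in the output above.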

For more usage examples, see the examples. More detailed documentation is coming.

🔨 Installation

Use the pip command that refers to your Python 3.x interpreter. In my case:

$ pip3 install kadot

It currently requires Python's standard library, NumPy, SciPy and scikit-learn.
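
If you want to verify that these dependencies are visible to your interpreter, a short check along the following lines may help. This is only a sketch: `missing_dependencies` is a hypothetical helper, not part of Kadot, and it assumes the usual import names (`numpy`, `scipy`, and `sklearn` for scikit-learn).

```python
from importlib.util import find_spec


def missing_dependencies(modules=("numpy", "scipy", "sklearn")):
    """Return the names of the given top-level modules that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if find_spec(name) is None]


missing = missing_dependencies()
print("Missing packages:", ", ".join(missing) if missing else "none")
```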

⚖️ License

Kadot is released under the MIT license.

🚀 Contribute

Issues and pull requests are welcome. Come help us!

Latest Releases

The big release! (Jul. 31, 2017)
PositionalWordVectorizer (Jul. 11, 2017)
Essentials (Jun. 15, 2017)
Classifiers! (Feb. 27, 2017)
First step to stability (Feb. 23, 2017)