Natural-Language-Processing (NLP)
Implementations of Natural Language Processing algorithms in Python 3
This repository provides demo programs implementing basic NLP algorithms in Python 3. I hope these programs help people understand both the theory behind these NLP algorithms and their implementation.
I will enrich the implementations and descriptions from time to time. If you include any of my work in your website or project, please add a link to this repository and send me an email to let me know.
Your comments are welcome. Thanks,
Algorithm | Description | Link |
---|---|---|
POS (Part-of-Speech) tagging by Hidden Markov Model (HMM) and Viterbi with smoothing methods | An HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states. The Viterbi algorithm computes the most probable sequence of state changes (and its probability): given the parameters of the HMM and a particular output sequence, it finds the state sequence most likely to have generated that output by maximizing over all possible state sequences. In sequence analysis, for example, the same method can be used to distinguish coding from non-coding sequences. This implementation includes two programs: a training program that applies add-one smoothing to the initial and transition probability tables (the emission probability table is not smoothed), and a tagging program that combines the HMM, Viterbi, Good-Turing smoothing, and some feature engineering (regular rules) for unknown words. | Specification and HMM-Learner Source Code , HMM-Viterbi Algorithm Source Code |
Sentiment classification of hotel reviews by a Naive Bayes classifier with Laplace smoothing | Naive Bayes is a family of statistical probability classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between features. This implementation includes two programs: a training program that applies Laplace smoothing to the posterior probability tables, and a classification program that runs the Naive Bayes classifier and drops unseen words. The Naive Bayes class code is duplicated across the two programs; you can factor it out or merge the two modules into one program. The implementation includes "Laplace_smoothing" to prevent zero probabilities for features/attributes that do not occur with every class value. | Specification and Naive Bayes Learner Source Code , Naive Bayes Classifier Source Code |
Sentiment classification of hotel reviews by vanilla and averaged Perceptron classifiers | The perceptron is a supervised binary classifier. This implementation includes two programs: a training program supporting both the vanilla and the averaged model, and a classification program. The Perceptron class code is duplicated across the two programs; you can factor it out or merge the two modules into one program. | Specification and Perceptron Learner Source Code , Perceptron Classifier Source Code |
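To illustrate the Viterbi decoding described above, here is a minimal sketch of the algorithm on a toy two-state HMM. The probability tables and tag set below are illustrative made-up numbers, not the repository's trained model, and the sketch omits the smoothing steps:

```python
# Minimal Viterbi decoder for a toy HMM POS tagger.
# The probability tables below are illustrative, NOT the repository's trained model.

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observation sequence."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # maximize over all predecessor states
            prob, prev = max(
                (best[t - 1][p] * trans_p[p][s] * emit_p[s].get(observations[t], 0.0), p)
                for p in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # trace back from the most probable final state
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

states = ("NOUN", "VERB")
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.4, "bark": 0.1}, "VERB": {"dogs": 0.05, "bark": 0.5}}

print(viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p))  # → ['NOUN', 'VERB']
```

A real tagger would work in log space to avoid underflow on long sentences, and would smooth the tables as described in the row above.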
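The Naive Bayes row describes Laplace smoothing and dropping unseen words at classification time; the sketch below shows both ideas on toy data. The training documents and the single-class design are illustrative assumptions, not the repository's code:

```python
# Toy multinomial Naive Bayes with Laplace (add-one) smoothing.
# Illustrative sketch only; the repository splits training and classification
# into two separate programs.
from collections import Counter, defaultdict
import math

class NaiveBayes:
    def train(self, docs):
        """docs: list of (token_list, label) pairs."""
        self.label_counts = Counter(label for _, label in docs)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in docs:
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.label_counts.values())

    def predict(self, tokens):
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            score = math.log(self.label_counts[label] / self.total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokens:
                if w not in self.vocab:      # drop unseen words at test time
                    continue
                # Laplace smoothing: add 1 to every word count
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny illustrative "hotel review" data, not the repository's training corpus
train_docs = [
    (["great", "clean", "room"], "pos"),
    (["lovely", "staff"], "pos"),
    (["dirty", "room"], "neg"),
    (["rude", "staff"], "neg"),
]
nb = NaiveBayes()
nb.train(train_docs)
print(nb.predict(["clean", "staff"]))  # → pos
```

Without the `+ 1` in the smoothed count, any word that never occurred with a class would drive that class's log probability to negative infinity, which is exactly the zero-count problem Laplace smoothing prevents.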
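The difference between the vanilla and averaged perceptron is that the averaged variant returns the mean of all weight vectors seen during training rather than the final one, which usually generalizes better. A minimal sketch of the averaged update, on made-up linearly separable data (again an illustration, not the repository's feature pipeline):

```python
# Toy averaged perceptron for binary classification (labels in {-1, +1}).
# Illustrative sketch; the repository's programs handle real review features.

class AveragedPerceptron:
    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def train(self, data, epochs=10):
        # accumulate the weights after every example, then average at the end
        w_sum = [0.0] * len(self.w)
        b_sum, count = 0.0, 0
        for _ in range(epochs):
            for x, y in data:
                activation = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
                if y * activation <= 0:          # misclassified: vanilla update
                    self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
                    self.b += y
                w_sum = [ws + wi for ws, wi in zip(w_sum, self.w)]
                b_sum += self.b
                count += 1
        # replace the final weights with their running average
        self.w = [ws / count for ws in w_sum]
        self.b = b_sum / count

    def predict(self, x):
        return 1 if sum(wi * xi for wi, xi in zip(self.w, x)) + self.b > 0 else -1

# Made-up separable data: the sign of the first feature decides the label
data = [([1.0, 0.0], 1), ([2.0, 1.0], 1), ([-1.0, 0.0], -1), ([-2.0, -1.0], -1)]
p = AveragedPerceptron(2)
p.train(data)
print(p.predict([1.5, 0.0]))  # → 1
```

Dropping the averaging step (returning `self.w` as updated) gives the vanilla perceptron; keeping both paths in one class mirrors how a single training program can support both models.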
References:
Hal Daumé III, A Course in Machine Learning (v. 0.99 draft), Chapter 4: The Perceptron.
University of Southern California, Spring 2018, CSCI 544 — Applied Natural Language Processing. Instructor: Professor Ron Artstein