CS458 Natural Language Processing
Self-study 5: Naive Bayes, Text Classification & Sentiment Analysis
Krishnendu Ghosh
Department of Computer Science & Engineering
Indian Institute of Information Technology Dharwad
Naive Bayes
1. Scikit-learn
Install Scikit-learn: pip install scikit-learn
Example Code:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
# Sample data
texts = ["I love programming", "Python is great", "I hate bugs", "Debugging is hard"]
labels = [1, 1, 0, 0]
# Vectorize the text
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=42)
# Train Naive Bayes model
model = MultinomialNB()
model.fit(X_train, y_train)
# Predict and evaluate
predictions = model.predict(X_test)
print("Predictions:", predictions)
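The comment above says "predict and evaluate", but the snippet only prints the predictions. A held-out accuracy rounds out the evaluation step; here is a sketch that repeats the pipeline so it runs standalone, using scikit-learn's `accuracy_score`:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

texts = ["I love programming", "Python is great", "I hate bugs", "Debugging is hard"]
labels = [1, 1, 0, 0]

X = CountVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=42)

model = MultinomialNB().fit(X_train, y_train)
# Fraction of held-out labels predicted correctly, between 0.0 and 1.0
accuracy = accuracy_score(y_test, model.predict(X_test))
print("Accuracy:", accuracy)
```

With only four documents the test split holds a single example, so the accuracy is either 0.0 or 1.0; the pattern is the same for real datasets.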
2. NLTK
Install NLTK: pip install nltk
Example Code:
import nltk
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy
# Sample data
training_data = [
    ({"love": True, "programming": True}, "positive"),
    ({"hate": True, "bugs": True}, "negative"),
    ({"debugging": True, "hard": True}, "negative"),
]
# Train the classifier
classifier = NaiveBayesClassifier.train(training_data)
# Test the classifier
test_data = [{"love": True, "python": True}, {"hate": True, "debugging": True}]
for test in test_data:
    print(f"Classification for {test}: {classifier.classify(test)}")
# Display accuracy (measured on the training data itself, so it is optimistic)
print("Accuracy:", accuracy(classifier, training_data))
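The feature dictionaries above are written by hand; from raw text they can be generated with a small helper (a sketch; the `word_features` name is introduced here, not part of NLTK):

```python
def word_features(text):
    """Map a sentence to the word-presence dict format NLTK's classifier expects."""
    return {word: True for word in text.lower().split()}

print(word_features("I love Python"))  # {'i': True, 'love': True, 'python': True}
```

Real pipelines would also strip punctuation and possibly stopwords before building the dict.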
3. Custom Implementation
Example Code:
import numpy as np
# Sample data
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]])
y = np.array([0, 1, 1, 0])
# Calculate probabilities
def train_naive_bayes(X, y):
    classes = np.unique(y)
    priors = {cls: np.mean(y == cls) for cls in classes}
    likelihoods = {cls: np.mean(X[y == cls], axis=0) for cls in classes}
    return priors, likelihoods
priors, likelihoods = train_naive_bayes(X, y)
print("Priors:", priors)
print("Likelihoods:", likelihoods)
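`train_naive_bayes` only estimates the parameters; classifying a new vector also needs a decision rule. A Bernoulli-style scorer could look like this (a sketch; `predict_naive_bayes` and the `eps` clipping are additions, not part of the original code):

```python
import numpy as np

# Reuses the toy data and the priors/likelihoods computed by train_naive_bayes above
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]])
y = np.array([0, 1, 1, 0])
priors = {cls: np.mean(y == cls) for cls in np.unique(y)}
likelihoods = {cls: np.mean(X[y == cls], axis=0) for cls in np.unique(y)}

def predict_naive_bayes(x, priors, likelihoods, eps=1e-9):
    """Pick the class maximizing log P(class) + sum of Bernoulli log-likelihoods."""
    scores = {}
    for cls in priors:
        p = np.clip(likelihoods[cls], eps, 1 - eps)  # clip to avoid log(0)
        scores[cls] = np.log(priors[cls]) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return max(scores, key=scores.get)

print(predict_naive_bayes(np.array([1, 1, 0]), priors, likelihoods))  # 1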
Text Classification
1. Scikit-learn
Key Features: consistent fit/predict API across models; built-in text vectorizers (CountVectorizer, TfidfVectorizer)
Example Code:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
# Sample data
texts = ["I love programming", "Python is amazing", "Debugging is hard", "I hate bugs"]
labels = [1, 1, 0, 0]
# Vectorization
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=42)
# Train classifier
clf = MultinomialNB()
clf.fit(X_train, y_train)
# Predict
predictions = clf.predict(X_test)
print(predictions)
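CountVectorizer uses raw term counts; TfidfVectorizer has the same scikit-learn API but down-weights words that appear in every document. A sketch of the swap:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["I love programming", "Python is amazing", "Debugging is hard", "I hate bugs"]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)  # rows are TF-IDF weighted, not raw counts

clf = MultinomialNB().fit(X, labels)
# Note: new text must go through transform(), not fit_transform()
pred = clf.predict(vectorizer.transform(["I love Python"]))
print(pred)
```

On a corpus this small the two vectorizers behave almost identically; the difference shows up with longer documents and larger vocabularies.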
2. NLTK
Example Code:
import nltk
from nltk.classify import NaiveBayesClassifier
# Training data
training_data = [
({"text": "I love programming"}, "positive"),
({"text": "Python is amazing"}, "positive"),
({"text": "Debugging is hard"}, "negative"),
({"text": "I hate bugs"}, "negative"),
]
# Train classifier
classifier = NaiveBayesClassifier.train(training_data)
# Test data
test_data = {"text": "I love Python"}
print("Classification:", classifier.classify(test_data))
3. TensorFlow/Keras
Key Features: high-level Sequential API; built-in tokenization and padding utilities; supports GPU training
Example Code:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
# Sample data
texts = ["I love programming", "Python is amazing", "Debugging is hard", "I hate bugs"]
labels = [1, 1, 0, 0]
# Preprocess text (tokenization and padding)
tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(texts)
X = tokenizer.texts_to_sequences(texts)
X = tf.keras.preprocessing.sequence.pad_sequences(X, maxlen=5)
# Create model
model = Sequential([
    Embedding(input_dim=50, output_dim=16, input_length=5),
    LSTM(32),
    Dense(1, activation='sigmoid')
])
# Compile and train model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Keras expects array-like labels; convert the Python list to a NumPy array
import numpy as np
model.fit(X, np.array(labels), epochs=5, verbose=1)
4. PyTorch
Key Features: dynamic computation graphs; full control over the training loop
Example Code:
import torch
# Note: torchtext.legacy requires torchtext < 0.12; later releases removed it
from torchtext.legacy.data import Field, TabularDataset, BucketIterator
from torchtext.data.utils import get_tokenizer
# Define fields
TEXT = Field(tokenize=get_tokenizer("basic_english"), lower=True)
LABEL = Field(sequential=False, use_vocab=False)
# Example training and processing
# Define your model as per your needs
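Since `torchtext.legacy` is gone from recent torchtext releases, PyTorch text classifiers are now usually wired up by hand. A minimal bag-of-embeddings sketch (the class name and all hyperparameters here are illustrative, not from the original slides):

```python
import torch
import torch.nn as nn

class BagOfEmbeddings(nn.Module):
    """Average each document's word embeddings, then classify with a linear layer."""
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)  # default mode: mean
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, offsets):
        return self.fc(self.embedding(token_ids, offsets))

model = BagOfEmbeddings(vocab_size=50, embed_dim=16, num_classes=2)
tokens = torch.tensor([1, 4, 7, 2, 9])  # two documents flattened into one tensor
offsets = torch.tensor([0, 3])          # start index of each document
logits = model(tokens, offsets)
print(logits.shape)  # torch.Size([2, 2])
```

EmbeddingBag avoids padding entirely: documents of different lengths are concatenated and separated by offsets, which is why no `pad_sequences`-style step appears here.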
Sentiment Analysis
1. VADER (Valence Aware Dictionary and sEntiment Reasoner)
Installation:
pip install vaderSentiment
Key Features: lexicon- and rule-based; tuned for social-media text; needs no training data
Example Code:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
text = "I love programming! It's amazing!"
score = analyzer.polarity_scores(text)
print(score)
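`polarity_scores` returns `neg`, `neu`, `pos`, and a normalized `compound` score in [-1, 1]. VADER's documentation suggests ±0.05 cutoffs for turning `compound` into a label; an illustrative helper (the function name is introduced here):

```python
def label_from_compound(compound):
    """Map VADER's compound score to a coarse label using the conventional ±0.05 cutoffs."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(label_from_compound(0.83))   # positive
print(label_from_compound(-0.4))   # negative
print(label_from_compound(0.0))    # neutral
```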
2. Hugging Face Transformers
Installation:
pip install transformers
Key Features: pretrained transformer models; one-line pipeline API for common tasks
Example Code:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier("I love programming in Python!")
print(result)
3. AllenNLP
Installation:
pip install allennlp
Key Features: PyTorch-based research library; pretrained model archives loadable via Predictor.from_path
Example Code:
from allennlp.predictors import Predictor
predictor = Predictor.from_path("https://allennlp.s3.amazonaws.com/models/sentiment-analysis-bert.tar.gz")
result = predictor.predict(sentence="I love this!")
print(result)
4. PyTorch
Installation:
pip install torch
Key Features: same flexibility as the PyTorch text-classification setup above; pairs with torchtext for data loading
Example Code:
import torch
# Note: torchtext.legacy requires torchtext < 0.12; later releases removed it
from torchtext.legacy.data import Field, TabularDataset, BucketIterator

# tokenize='spacy' needs the spaCy package plus a downloaded language model
TEXT = Field(tokenize='spacy', batch_first=True)
LABEL = Field(sequential=False, use_vocab=False)
# Define fields, model, and other necessary steps as per your dataset
Thank You