Build frequency and likelihood tables for the data
Page 4/4 | Date: 07.12.2023 | Size: 111.07 KB | #113353
6 Build frequency and likelihood tables for the data.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
# X (texts) and y (labels, 1 = positive and 0 = negative) are assumed to be defined in the earlier steps
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
vectorizer = CountVectorizer()
X_train_bow = vectorizer.fit_transform(X_train)
X_test_bow = vectorizer.transform(X_test)
classifier = MultinomialNB()
classifier.fit(X_train_bow, y_train)
features = vectorizer.get_feature_names_out()
# feature_log_prob_ stores natural-log likelihoods, so np.exp (not 2 ** x) recovers P(word | class)
likelihood_table = pd.DataFrame(data={"So'zlar": features, 'Ijobiy (Positive) Likelihood': np.exp(classifier.feature_log_prob_[1]), 'Salbiy (Negative) Likelihood': np.exp(classifier.feature_log_prob_[0])})
# Convert the label masks to NumPy arrays so they can select rows of the sparse matrix
positive_rows = np.asarray(y_train) == 1
negative_rows = np.asarray(y_train) == 0
frequency_table = pd.DataFrame(data={"So'zlar": features, 'Ijobiy (Positive) Frequency': np.asarray(X_train_bow[positive_rows].sum(axis=0)).ravel(), 'Salbiy (Negative) Frequency': np.asarray(X_train_bow[negative_rows].sum(axis=0)).ravel()})
frequency_table['Ijobiy (Positive) Frequency'] = frequency_table['Ijobiy (Positive) Frequency'] + 1 # add-one (Laplace) smoothing for words that never occur in this class
frequency_table['Salbiy (Negative) Frequency'] = frequency_table['Salbiy (Negative) Frequency'] + 1 # add-one (Laplace) smoothing for words that never occur in this class
likelihood_table.to_csv('likelihood_table.csv', index=False)
frequency_table.to_csv('frequency_table.csv', index=False)
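To make the two tables concrete, here is a minimal sketch that builds them by hand using only the standard library. The toy corpus `docs` and the helper `build_tables` are hypothetical and exist only for illustration; the likelihoods use add-one smoothing, which is what `MultinomialNB` applies with its default `alpha=1.0`.

```python
from collections import Counter

# Hypothetical toy corpus: (text, label) with 1 = positive, 0 = negative
docs = [
    ("good great good", 1),
    ("great fun", 1),
    ("bad boring", 0),
    ("bad bad fun", 0),
]

def build_tables(docs):
    """Return (vocabulary, frequency table, likelihood table) for labelled texts."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in docs:
        counts[label].update(text.split())
    vocab = sorted(set(counts[0]) | set(counts[1]))
    # Raw word counts per class
    frequency = {w: {c: counts[c][w] for c in (0, 1)} for w in vocab}
    totals = {c: sum(counts[c].values()) for c in (0, 1)}
    # Add-one (Laplace) smoothing: (count + 1) / (class total + vocabulary size)
    likelihood = {
        w: {c: (counts[c][w] + 1) / (totals[c] + len(vocab)) for c in (0, 1)}
        for w in vocab
    }
    return vocab, frequency, likelihood

vocab, frequency, likelihood = build_tables(docs)
for w in vocab:
    print(w, frequency[w], likelihood[w])
```

Within each class the smoothed likelihoods sum to 1 over the vocabulary, which is a quick sanity check for a table built this way.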
7 Using Bayes' theorem, compute the probabilities that words belong to the positive or negative class.
# frequency_table already includes the +1 smoothing, so normalizing each column gives P(word | class)
jobiy_likelihoods = frequency_table['Ijobiy (Positive) Frequency'] / frequency_table['Ijobiy (Positive) Frequency'].sum()
salbiy_likelihoods = frequency_table['Salbiy (Negative) Frequency'] / frequency_table['Salbiy (Negative) Frequency'].sum()
# Class priors; the mean equals P(positive) only when labels are encoded as 0/1
jobiy_prior = y_train.mean()
salbiy_prior = 1 - jobiy_prior
# Evidence P(word), then Bayes' theorem: P(class | word) = P(word | class) * P(class) / P(word)
total_word_likelihood = jobiy_likelihoods * jobiy_prior + salbiy_likelihoods * salbiy_prior
jobiy_posterior = (jobiy_likelihoods * jobiy_prior) / total_word_likelihood
salbiy_posterior = (salbiy_likelihoods * salbiy_prior) / total_word_likelihood
result_table = pd.DataFrame(data={"So'zlar": features, 'Ijobiy (Positive) Posterior': jobiy_posterior, 'Salbiy (Negative) Posterior': salbiy_posterior})
result_table.to_csv('result_table.csv', index=False)
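The posterior computation for a single word can be sketched as follows. The function `word_posterior` and the input numbers are hypothetical, chosen only to show the arithmetic of Bayes' theorem for two classes.

```python
def word_posterior(lik_pos, lik_neg, prior_pos):
    """Return (P(positive | word), P(negative | word)) via Bayes' theorem."""
    prior_neg = 1.0 - prior_pos
    # Evidence P(word): total probability of the word over both classes
    evidence = lik_pos * prior_pos + lik_neg * prior_neg
    return lik_pos * prior_pos / evidence, lik_neg * prior_neg / evidence

# Hypothetical inputs: P(word | positive) = 0.3, P(word | negative) = 0.1, P(positive) = 0.5
post_pos, post_neg = word_posterior(0.3, 0.1, 0.5)
print(round(post_pos, 3), round(post_neg, 3))  # 0.75 0.25
```

Because the evidence term is shared, the two posteriors always sum to 1.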
8 Evaluate the model's classification accuracy on the test set.
from sklearn.metrics import accuracy_score
X_test_bow = vectorizer.transform(X_test)
y_pred = classifier.predict(X_test_bow)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
The code converts the test data to bag-of-words (BoW) format, runs the model on it, and compares the predictions with the true labels. It then computes the accuracy with the "accuracy_score" metric and prints it. In other words, it reports what fraction of the test examples the model classified correctly.
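For reference, accuracy is simply the share of predictions that match the true labels. Here is a minimal sketch of that calculation; the helper `accuracy` and the label lists are hypothetical and are not the scikit-learn implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels and predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))  # 0.6
```

On imbalanced data a high accuracy can be misleading, so it may be worth checking per-class results as well.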