    Date: 24.12.2023
    Related
    Practical assignment 2, Narzikulov Zafarbek

    5 For each word, compute the probability of that word appearing in each class
    The code below uses the MultinomialNB Bayes classifier and the CountVectorizer bag-of-words (BoW) model:
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    X = prepared_data # the prepared (preprocessed) form of the data
    y = labels # class labels (positive or negative); this can be your own data supplied as a variable
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    vectorizer = CountVectorizer()
    X_train_bow = vectorizer.fit_transform(X_train)
    X_test_bow = vectorizer.transform(X_test)
    classifier = MultinomialNB()
    classifier.fit(X_train_bow, y_train)
    y_pred = classifier.predict(X_test_bow)
    accuracy = accuracy_score(y_test, y_pred)
    print("Model evaluation result (accuracy):", accuracy)
    This code trains a Bayes classifier: while fitting, MultinomialNB estimates, for every word in X, the probability of that word appearing in each class. The accuracy score then shows what share of the test examples the model classified correctly.
    6 Build frequency and likelihood tables for the data.
    import numpy as np
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    # X and y are the prepared texts and labels from the previous step (assumed encoding: 1 = positive, 0 = negative)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    vectorizer = CountVectorizer()
    X_train_bow = vectorizer.fit_transform(X_train)
    X_test_bow = vectorizer.transform(X_test)
    classifier = MultinomialNB()
    classifier.fit(X_train_bow, y_train)
    features = vectorizer.get_feature_names_out()
    # feature_log_prob_ holds natural-log probabilities; its rows follow classifier.classes_ (here [0, 1])
    likelihood_table = pd.DataFrame(data={'Word': features, 'Positive Likelihood': classifier.feature_log_prob_[1], 'Negative Likelihood': classifier.feature_log_prob_[0]})
    likelihood_table['Positive Likelihood'] = np.exp(likelihood_table['Positive Likelihood']) # convert log-probabilities back to probabilities
    likelihood_table['Negative Likelihood'] = np.exp(likelihood_table['Negative Likelihood']) # convert log-probabilities back to probabilities

    y_train_arr = np.asarray(y_train) # plain array so the boolean masks can select rows of the sparse matrix
    frequency_table = pd.DataFrame(data={'Word': features, 'Positive Frequency': np.asarray(X_train_bow[y_train_arr == 1].sum(axis=0)).ravel(), 'Negative Frequency': np.asarray(X_train_bow[y_train_arr == 0].sum(axis=0)).ravel()})

    frequency_table['Positive Frequency'] = frequency_table['Positive Frequency'] + 1 # add-one smoothing: account for words that never occur in this class
    frequency_table['Negative Frequency'] = frequency_table['Negative Frequency'] + 1 # add-one smoothing: account for words that never occur in this class
    likelihood_table.to_csv('likelihood_table.csv', index=False)
    frequency_table.to_csv('frequency_table.csv', index=False)
