Date: 11.12.2023
Related: B. Elov, N. Xudayberganov, Z. Xusainova, Til va madaniyat

... is called the ... process. We now look at an implementation of the TF-IDF method in Python. To implement the BoW (TF-IDF) model described above, the TfidfVectorizer class from the Sklearn library can be used.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# "The price of these houses is low, but they are not close to the city center"
sentence_1 = "Bu uylar narxi arzon, lekin ular shahar markaziga yaqin emas"
# "The price of this product is not low"
sentence_2 = "Bu mahsulot narxi arzon emas"

# without IDF smoothing (smooth_idf=False)
print("Without IDF smoothing:")
# compute the tf-idf values
tf_idf_vec = TfidfVectorizer(use_idf=True,
                             smooth_idf=False,
                             ngram_range=(1, 1),  # to use only bigrams: ngram_range=(2, 2)
                             stop_words=["bu", "lekin", "ular"])
# transform
tf_idf_data = tf_idf_vec.fit_transform([sentence_1, sentence_2])
# build the dataframe; rows are numbered 1 and 2, one per sentence
# (get_feature_names() was removed in scikit-learn 1.2, so get_feature_names_out() is used)
tf_idf_dataframe = pd.DataFrame(tf_idf_data.toarray(),
                                index=[1, 2],
                                columns=tf_idf_vec.get_feature_names_out())
print(tf_idf_dataframe)
print("\n")

# with IDF smoothing (smooth_idf=True)
tf_idf_vec_smooth = TfidfVectorizer(use_idf=True,
                                    smooth_idf=True,
                                    ngram_range=(1, 1),
                                    stop_words=["bu", "lekin", "ular"])
tf_idf_data_smooth = tf_idf_vec_smooth.fit_transform([sentence_1, sentence_2])
print("With IDF smoothing:")
tf_idf_dataframe_smooth = pd.DataFrame(tf_idf_data_smooth.toarray(),
                                       index=[1, 2],
                                       columns=tf_idf_vec_smooth.get_feature_names_out())
print(tf_idf_dataframe_smooth)
_________________________
Without IDF smoothing:
      arzon      emas  mahsulot  markaziga     narxi    shahar     uylar     yaqin
1  0.262912  0.262912   0.00000   0.445149  0.262912  0.445149  0.445149  0.445149
2  0.412859  0.412859   0.69903   0.000000  0.412859  0.000000  0.000000  0.000000

With IDF smoothing:
      arzon      emas  mahsulot  markaziga     narxi    shahar     uylar     yaqin
1  0.302873  0.302873  0.000000   0.425677  0.302873  0.425677  0.425677  0.425677
2  0.448321  0.448321  0.630099   0.000000  0.448321  0.000000  0.000000  0.000000
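
The values in the two tables can be checked by hand. The sketch below is not part of the original text. It assumes scikit-learn's documented defaults: raw term counts as tf, idf = ln(n / df) + 1 without smoothing, idf = ln((1 + n) / (1 + df)) + 1 with smoothing, and L2 normalization of each row. The token lists are written out manually (lowercased, with the stop words "bu", "lekin", "ular" removed), and the helper name tfidf_rows is hypothetical.

```python
import math

# Tokens of the two sentences after lowercasing and stop-word removal.
# Written out by hand here (an assumption about the vectorizer's tokenization).
docs = [
    ["uylar", "narxi", "arzon", "shahar", "markaziga", "yaqin", "emas"],
    ["mahsulot", "narxi", "arzon", "emas"],
]

vocab = sorted({term for doc in docs for term in doc})
n_docs = len(docs)
# Document frequency: in how many sentences each term occurs.
df = {term: sum(term in doc for doc in docs) for term in vocab}


def tfidf_rows(smooth):
    """Return one L2-normalized tf-idf row per sentence (hypothetical helper)."""
    rows = []
    for doc in docs:
        weights = []
        for term in vocab:
            tf = doc.count(term)  # raw term count
            if smooth:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            else:
                idf = math.log(n_docs / df[term]) + 1
            weights.append(tf * idf)
        norm = math.sqrt(sum(w * w for w in weights))
        rows.append([round(w / norm, 6) for w in weights])
    return rows


plain = tfidf_rows(smooth=False)
smoothed = tfidf_rows(smooth=True)

print(vocab)
for label, rows in (("Without IDF smoothing:", plain), ("With IDF smoothing:", smoothed)):
    print(label)
    for number, row in enumerate(rows, start=1):
        print(number, row)
```

Terms that occur in both sentences (arzon, emas, narxi) get idf = 1 under either formula. Smoothing only lowers the idf of the terms that occur in a single sentence, from ln 2 + 1 to ln 1.5 + 1, which is why the gap between the two groups of weights is narrower in the second table.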
