NLTK (Natural Language Toolkit)
Page 96/182 | Date: 19.05.2024 | Size: 5.69 MB | #244351
Related: Python sun'iy intellekt texnologiyasi, Darslik 2024
NLTK (Natural Language Toolkit):
NLTK is a powerful library for working with, analyzing, and classifying text.
pip install nltk
Then download the NLTK tokenizer data:
import nltk
nltk.download('punkt')  # recent NLTK releases may need 'punkt_tab' instead
Scikit-learn:
Scikit-learn is an easy-to-use and efficient library for building classification models.
pip install scikit-learn
TensorFlow or PyTorch:
These are used to build deep learning models, independently of the libraries above.
pip install tensorflow
or
pip install torch
You can build an NLP model with either TensorFlow or PyTorch.
Once one of the libraries above is installed, you can use its functions to analyze text and to recognize words and sentences. You can also build classification models with the scikit-learn library.
The following code example may help:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
# Prepare the data ("..." stands for the remaining items; replace it with your own texts and labels)
texts = ["Text 1", "Text 2", ..., "Text N"]
labels = [1, 0, ..., 1]  # category label for each text
# Convert the texts into word-count features with CountVectorizer
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)
# Create and train the classification model
classifier = MultinomialNB()
classifier.fit(X_train, y_train)
# Predict labels for the test set
y_pred = classifier.predict(X_test)
# Compute the share of correct predictions
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy}")
This example classifies texts such as “Text 1”, “Text 2”, ..., “Text N” with a Naive Bayes classifier. You can also try other classification algorithms, or build a deep learning model with TensorFlow or PyTorch.
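To show what a multinomial Naive Bayes classifier does with word counts, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: the function names, the whitespace tokenization, and the toy data are all assumptions made for illustration, and the smoothing is plain add-one (Laplace) smoothing.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(texts, labels, alpha=1.0):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # word_counts[label][word] -> count
    vocabulary = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()  # naive whitespace tokenization
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary, alpha

def predict_naive_bayes(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            count = word_counts[label][token]
            # smoothed log-likelihood of the word given the class
            score += math.log((count + alpha) / (total_words + alpha * len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data, invented for this sketch
toy_texts = ["good great film", "great acting good plot",
             "bad boring film", "boring plot bad acting"]
toy_labels = [1, 1, 0, 0]

nb_model = train_naive_bayes(toy_texts, toy_labels)
print(predict_naive_bayes(nb_model, "good film"))
print(predict_naive_bayes(nb_model, "boring film"))
```

In practice MultinomialNB is the better choice; the sketch only makes the counting and scoring steps visible.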
In Python, several widely used libraries are available for building machine learning and classification systems. These systems use ML algorithms to learn from and organize text, images, or other kinds of data.
To get started in this area, here are a few examples based on the following libraries:
Scikit-Learn: a library of simple yet powerful machine learning tools. It includes many kinds of ML algorithms.
For example, a simple classifier:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Split the data into training and test sets
# (features and labels are assumed to be defined already)
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
# Create and train the classifier
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
# Predict classes for the test data and compute the accuracy
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
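The idea behind a k-nearest-neighbors classifier with n_neighbors=3 can be sketched in a few lines of plain Python: find the three training points closest to the query and take a majority vote. The function name knn_predict and the toy points below are assumptions for illustration, not part of scikit-learn.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    # Pair each Euclidean distance with its label, then sort by distance
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Two well-separated toy clusters, invented for this sketch
toy_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
              (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
toy_classes = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(toy_points, toy_classes, (1.1, 1.0)))
print(knn_predict(toy_points, toy_classes, (5.0, 4.8)))
```

This brute-force version compares the query against every training point, which is fine for small data; KNeighborsClassifier can use faster search structures.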
TensorFlow and Keras: very popular libraries for building powerful deep learning systems. TensorFlow lets you define a computation graph and execute it.
For example, building a Keras model with TensorFlow and Keras:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Build the Keras model (input_dim is assumed to hold the number of input features)
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=input_dim))
model.add(Dense(units=10, activation='softmax'))
# Compile and train the model
# (X_train, y_train, X_val, y_val are assumed to exist; labels must be one-hot encoded for categorical_crossentropy)
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val))
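The model above ends in a softmax layer and is trained with categorical cross-entropy, so it may help to see those two pieces in isolation. The following is a textbook-style sketch in plain Python, not Keras's internal implementation; the helper names and the example numbers are assumptions.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]  # shift for numerical stability
    total = sum(exps)
    return [value / total for value in exps]

def one_hot(class_index, num_classes):
    """Encode a class index as a vector with a single 1.0."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]

def categorical_crossentropy(target, probabilities, eps=1e-12):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probabilities))

# Example scores for three classes, invented for this sketch
example_logits = [2.0, 1.0, 0.1]
probs = softmax(example_logits)
target = one_hot(0, 3)
loss = categorical_crossentropy(target, probs)
print(probs)
print(loss)
```

The loss shrinks as the probability of the true class approaches 1, which is what training with model.fit tries to achieve on average over the data.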
Natural Language Toolkit (NLTK): a library with many useful datasets and functions for working with text.
For example, recognizing and classifying text with NLTK:
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
# Split the data into training and test sets
# (text_data and labels are assumed to be defined already)
X_train, X_test, y_train, y_test = train_test_split(text_data, labels, test_size=0.2, random_state=42)
# Tokenize with NLTK and drop English stop words
# (requires the NLTK 'stopwords' corpus and tokenizer data to be downloaded first)
stop_words = stopwords.words('english')  # TfidfVectorizer expects a list here, not a set
vectorizer = TfidfVectorizer(tokenizer=word_tokenize, stop_words=stop_words)
classifier = MultinomialNB()
# Chain vectorization and classification into a single Pipeline
model = Pipeline([('vectorizer', vectorizer), ('classifier', classifier)])
# Train the model and evaluate it on the test set
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
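To make the TF-IDF step less opaque, here is a small pure-Python sketch of the weighting. It assumes whitespace tokenization as a stand-in for word_tokenize, and it uses a smoothed idf, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which to my understanding matches TfidfVectorizer's defaults; treat that as an assumption rather than a guarantee. The function name and the toy documents are invented.

```python
import math
from collections import Counter

def tfidf_vectors(documents, stop_words=frozenset()):
    """Return (vocabulary, vectors) with L2-normalized tf-idf weights."""
    # Lowercase, split on whitespace, and drop stop words
    tokenized = [
        [token for token in doc.lower().split() if token not in stop_words]
        for doc in documents
    ]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0  # avoid dividing by zero
        vectors.append([w / norm for w in weights])
    return vocabulary, vectors

# Toy documents and stop words, invented for this sketch
toy_docs = ["the cat sat on the mat", "the dog sat on the log"]
toy_stop_words = {"the", "on"}

vocab, vecs = tfidf_vectors(toy_docs, toy_stop_words)
print(vocab)
print(vecs[0])
```

Words that occur in every document (here "sat") receive a lower weight than words specific to one document (here "cat"), which is the point of the idf factor.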
These examples give a general starting point for building recognition and classification systems. Keep in mind that the system and the algorithm must be chosen and tuned to fit each problem.
To implement the basics of data recognition and classification in Python, you can rely on the following key concepts: