Using Algorithms for Determining the Distance Between Words in Natural Language Processing
# ... build a set (the union of the two word sets)
rvector = X_set.union(Y_set)
for w in rvector:
    # build the binary presence vectors l1 and l2
    if w in X_set: l1.append(1)
    else: l1.append(0)
    if w in Y_set: l2.append(1)
    else: l2.append(0)
c = 0
# cosine formula
for i in range(len(rvector)):
    c += l1[i]*l2[i]
cosine = c / float((sum(l1)*sum(l2))**0.5)
print("cosine similarity: ", cosine)
Result:
cosine similarity:  0.4
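The fragment above relies on X_set, Y_set, l1 and l2 being prepared earlier in the listing. As a minimal self-contained sketch of the same calculation, the function below assumes plain whitespace tokenization and lower-casing. The function name, the tokenization, and the handling of empty input are illustrative assumptions, not part of the original listing.

```python
def cosine_similarity(text_x, text_y):
    """Cosine similarity between binary word-presence vectors of two texts."""
    # Assumption: words are separated by whitespace and case is ignored.
    x_set = set(text_x.lower().split())
    y_set = set(text_y.lower().split())

    # Union of both word sets, as in the rvector of the listing above.
    vocabulary = x_set | y_set

    # Binary vectors: 1 if the word occurs in the text, 0 otherwise.
    l1 = [1 if w in x_set else 0 for w in vocabulary]
    l2 = [1 if w in y_set else 0 for w in vocabulary]

    # Dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(l1, l2))
    norm = (sum(l1) * sum(l2)) ** 0.5

    # Assumption: an empty text is treated as having similarity 0.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print("cosine similarity: ", cosine_similarity("a b c d", "a b e f"))
```

Because the vectors contain only 0 and 1, sum(l1) equals the squared length of the first vector, which is why the denominator can be written as the square root of sum(l1)*sum(l2).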
Conclusion
There are several measures that characterize the distance between
words. Having examined the Hamming distance, the Levenshtein distance,
and cosine similarity above, we can conclude that the appropriate use
of each method is determined by the result it produces and by its
efficiency. When comparing words that begin with the same characters,
the Hamming distance (which is defined only for strings of equal
length) is simpler and more efficient than the other methods in
natural language processing. However, as the strings grow longer and
the positions at which they differ shift, the Levenshtein distance
becomes the more effective measure. Furthermore, among the methods
presented above, cosine similarity is the most suitable choice when
working with large texts or documents. Thus, if the Hamming distance
is taken as the basic starting point, the Levenshtein distance can be
regarded as its more refined counterpart for words, while cosine
similarity is regarded as the most suitable method for working with
texts or documents, which are collections of words.
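The contrast drawn above between the Hamming and Levenshtein distances can be illustrated with a short sketch. The function names and sample words below are illustrative assumptions, and the Levenshtein routine is the standard row-by-row dynamic-programming formulation, not necessarily the implementation used earlier in this paper.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def levenshtein_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(hamming_distance("kitob", "kitab"))
    print(levenshtein_distance("kitob", "kitoblar"))
```

The Hamming routine is a single pass over the characters but rejects strings of different lengths, whereas the Levenshtein routine accepts any pair of strings at the cost of a nested loop over both.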
Nizomaddin XUDAYBERGANOV, Shaxboz HASANOV