Distributed File Systems:
Python includes tools for connecting to distributed file systems (e.g., HDFS).
Python also integrates with several well-known Big Data ecosystems (e.g., Apache Kafka, Apache Flink) and works well for managing long-running, large-scale data workloads.
Practical examples may look like the following, though the data, libraries, and platforms can vary:
1. Apache Spark va PySpark:
An example of processing data with Spark via PySpark:

from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("example").getOrCreate()

# Load the data
data = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)

# Display the data
data.show()

# Get summary statistics
data.describe().show()
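To make explicit what describe() reports, here is a plain-Python sketch of the same per-column summary (count, mean, standard deviation, min, max). The sample values are hypothetical, standing in for one numeric CSV column:

```python
import statistics

# Hypothetical numeric column, standing in for one CSV column
values = [3.0, 5.0, 7.0, 9.0]

summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "stddev": statistics.stdev(values),  # sample stddev, as Spark's describe() reports
    "min": min(values),
    "max": max(values),
}

print(summary)
```

On a real DataFrame, Spark computes these statistics in parallel across partitions; the sketch only shows what the numbers mean.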
2. Hadoop MapReduce Streaming:
Running Python scripts with MapReduce Streaming:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming*.jar \
-files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py \
-input input_dir -output output_dir
Mapper script (mapper.py):

#!/usr/bin/env python
import sys

# Emit a (word, 1) pair for every word on stdin
for line in sys.stdin:
    words = line.strip().split()
    for word in words:
        print(f"{word}\t1")
Reducer script (reducer.py):

#!/usr/bin/env python
import sys

current_word = None
current_count = 0

# Input arrives sorted by key, so equal words are adjacent
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # Skip lines with a malformed count
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print(f"{current_word}\t{current_count}")
        current_count = count
        current_word = word

# Flush the final word
if current_word:
    print(f"{current_word}\t{current_count}")
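Before submitting a job to a cluster, the mapper and reducer logic is often checked locally with a shell pipe such as cat input.txt | python mapper.py | sort | python reducer.py. The same word-count flow can be sketched in-process; the sample lines below are hypothetical:

```python
# Simulate mapper | sort | reducer locally, without Hadoop
lines = ["big data with python", "python with spark"]

# Mapper phase: emit (word, 1) pairs
pairs = []
for line in lines:
    for word in line.strip().split():
        pairs.append((word, 1))

# Shuffle/sort phase: group equal keys together, as Hadoop does between phases
pairs.sort()

# Reducer phase: sum the counts for each word
counts = {}
for word, count in pairs:
    counts[word] = counts.get(word, 0) + count

print(counts)
```

The dictionary-based reducer here is a shortcut; the streaming reducer above instead relies on the sorted order to sum adjacent runs of equal keys, which is what lets it process input as a stream.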