



    Date: 19.05.2024
    Python sun'iy intellekt texnologiyasi, Darslik 2024 (Python Artificial Intelligence Technology, Textbook, 2024)

    Distributed File Systems:
    The Python ecosystem provides tools for connecting to distributed file systems such as HDFS.
    Python also integrates with several well-known Big Data ecosystems (e.g., Apache Kafka, Apache Flink) and works well for managing large volumes of data over long periods.
    Practical examples may look like the following, although the data, libraries, and platforms involved may change:
    1. Apache Spark and PySpark:
    An example of processing data with Spark through PySpark:
    from pyspark.sql import SparkSession

    # Create a Spark session
    spark = SparkSession.builder.appName("example").getOrCreate()
    # Load the data (first row as header, column types inferred)
    data = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)
    # Show the data
    data.show()
    # Compute some summary statistics
    data.describe().show()
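To make the last step more concrete without a Spark cluster, the sketch below reproduces in plain Python the five rows that describe() reports for a numeric column (count, mean, stddev, min, max). It is only an illustration: the function name describe and the sample list ages are hypothetical, and the standard deviation is assumed to be the sample (n - 1) version.

```python
import statistics


def describe(values):
    """Summarize one numeric column the way a describe() table does (hypothetical helper)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }


ages = [23, 31, 35, 47]  # hypothetical column values, not taken from any real dataset
summary = describe(ages)
for name, value in summary.items():
    print(f"{name}\t{value}")
```

In Spark the same numbers are computed in parallel across partitions; the plain-Python version only shows what each row of the summary means.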
    2. Hadoop MapReduce Streaming:
    Running Python scripts with MapReduce Streaming:
    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming*.jar \
    -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py \
    -input input_dir -output output_dir
    Mapper script (mapper.py):
    #!/usr/bin/env python3
    import sys

    # Emit "word<TAB>1" for every word read from standard input
    for line in sys.stdin:
        words = line.strip().split()
        for word in words:
            print(f"{word}\t1")
    Reducer script (reducer.py):
    #!/usr/bin/env python3
    import sys

    current_word = None
    current_count = 0
    for line in sys.stdin:
        # Each input line is "word<TAB>count"; skip lines that do not parse
        try:
            word, count = line.strip().split('\t', 1)
            count = int(count)
        except ValueError:
            continue
        # Hadoop sorts mapper output by key, so equal words arrive consecutively
        if current_word == word:
            current_count += count
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_count = count
            current_word = word
    # Emit the total for the last word, if any input was read
    if current_word is not None:
        print(f"{current_word}\t{current_count}")
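Before submitting a job to a cluster, it can help to check the word-count logic locally. The sketch below imitates the three phases of the streaming job (map, sort, reduce) inside a single Python process. The function names map_words, reduce_counts, and word_count and the sample input are hypothetical; sorted() stands in for Hadoop's shuffle-and-sort phase and is not how Hadoop itself implements it.

```python
from itertools import groupby


def map_words(lines):
    """Map phase: yield a (word, 1) pair for every word, like mapper.py."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reduce phase: sum the counts of consecutive pairs that share a word."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort, and reduce in order on an iterable of text lines."""
    shuffled = sorted(map_words(lines))  # stand-in for Hadoop's shuffle-and-sort
    return dict(reduce_counts(shuffled))


sample = ["big data big", "data python"]  # hypothetical input lines
counts = word_count(sample)
print(counts)
```

The reducer depends on its input being sorted by key, which is why the sort step sits between the two phases here just as it does in the real job.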
