Big data accelerated the digital economy, enabling use cases that had previously been impossible or cost-prohibitive to achieve.
New big data approaches
Around 2005, we entered the era of Web 2.0, when companies began to realize just how much
data users generated through social media and other online services.
Data of all types, structured and unstructured, needed to be collected, processed, and analyzed. The technologies of the day couldn't process it, at least not economically. A new approach was needed.
In 2004, Google published a paper on MapReduce, a programming model for processing large data sets. Yahoo got involved in the project, and Hadoop was created. In 2008, Yahoo released Hadoop to the Apache Software Foundation, which went on to release Apache Hadoop 1.0 in 2011.
Hadoop, an open source framework, accelerated the utility and growth of big data.
The Hadoop Distributed File System (HDFS) is a storage system that distributes data across clusters of computers.
MapReduce enables parallel processing of that distributed data to increase performance. The
combination enabled the big data use cases that accelerated the digital economy, such as the
360-degree views of ecommerce customers. These use cases had previously been impossible or cost-prohibitive to achieve.
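To make the model concrete, the sketch below walks the map-shuffle-reduce pattern through a word count in plain Python. It is a single-machine illustration, not Hadoop's implementation: the hard-coded chunks stand in for file blocks that HDFS would spread across nodes, and a multiprocessing pool stands in for the cluster's parallel map and reduce tasks.

```python
from collections import defaultdict
from multiprocessing import Pool

def map_phase(chunk):
    """Map task: emit a (word, 1) pair for every word in one input partition."""
    return [(word, 1) for word in chunk.split()]

def shuffle(mapped):
    """Shuffle: group intermediate pairs by key, as the framework does between phases."""
    groups = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            groups[word].append(one)
    return groups

def reduce_phase(item):
    """Reduce task: collapse one key's values into a final count."""
    word, ones = item
    return word, sum(ones)

if __name__ == "__main__":
    # Stand-ins for file blocks that HDFS would distribute across the cluster.
    chunks = ["big data big ideas", "data moves fast", "big clusters process data"]
    with Pool() as pool:
        mapped = pool.map(map_phase, chunks)                     # map tasks run in parallel
        grouped = shuffle(mapped)                                # group intermediate pairs by word
        counts = dict(pool.map(reduce_phase, grouped.items()))  # reduce tasks run in parallel
    print(counts)  # e.g. {'big': 3, 'data': 3, 'ideas': 1, ...}
```

Because each map task reads only its own partition, the same job scales out by adding workers or nodes rather than by buying a bigger machine, which is the economic shift the Hadoop model introduced.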
The Hadoop framework rapidly expanded with tools for deploying and managing clusters,
scheduling processes,
querying data, and more. Spark, an open source data processing engine for large data sets, became popular because it delivered computational speed, scalability, and programmability for big data, particularly for streaming data, graph data, machine learning (ML), and artificial intelligence (AI) applications. Spark stores and processes data in memory, which is key to its performance because applications avoid slow disk accesses.
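A minimal PySpark sketch shows that in-memory behavior at work. The file name events.txt is a hypothetical stand-in for a real data set of log lines; everything else uses the standard Spark DataFrame API.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InMemoryDemo").getOrCreate()

# Hypothetical input: a text file with one log line per row.
lines = spark.read.text("events.txt")
errors = lines.filter(lines.value.contains("ERROR"))
errors.cache()  # mark the filtered data to be kept in memory

print(errors.count())  # first action reads from disk, then populates the cache
print(errors.count())  # second action is served from memory, with no disk scan

spark.stop()
```

The second count() skipping the disk entirely is what makes iterative workloads, such as ML training loops that reuse the same data set many times, so much faster on Spark.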