The following are some of the data tools
that many cloud providers offer their users:
Object Storage
Enables organizations to store any type
of data in its native format—this is ideal
for building modern applications that
require scale and flexibility
Data Integration
Easy-to-use tools that connect to
public and private data sources such as
databases and applications and reliably
transfer and synchronize the data to the
datastores in the data lake
Data Preparation
Visual tools to create data
transformations between the source
and the target
Data Catalog
An inventory of enterprise-wide data
assets to help search, explore, and
govern data in the data lake
Data Streaming
Lets organizations process data in
real time, enabling resilient stream
processing operations such as filters,
joins, maps, aggregations, and
other transformations
Data Management
Hadoop, Spark, databases, and query
tools that help organizations manage
data across all stores in the data lake
Analytics
Tools to help organizations understand
and discover trends in their data and
use them to guide decision-making
Using those tools, companies can
start data lakes for their unstructured
data on a small scale and continually
expand them with new data types, data
sources, and applications to derive
value from the data.
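
The stream-processing operations named in the tool list (filters, joins, maps, and aggregations) can be illustrated with a minimal sketch in plain Python. It is not tied to any provider's streaming service; the event fields, the `USER_REGION` lookup table, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical click events arriving one at a time, as from a stream.
EVENTS = [
    {"user": "u1", "page": "/home", "ms": 120},
    {"user": "u2", "page": "/cart", "ms": 340},
    {"user": "u1", "page": "/cart", "ms": 95},
    {"user": "u3", "page": "/home", "ms": 50},
]

# Hypothetical reference data used for the join step.
USER_REGION = {"u1": "EMEA", "u2": "APAC"}


def only_slow(events, threshold_ms):
    """Filter: keep events at or above the latency threshold."""
    for event in events:
        if event["ms"] >= threshold_ms:
            yield event


def with_region(events, regions):
    """Join + map: enrich each event with the user's region."""
    for event in events:
        yield {**event, "region": regions.get(event["user"], "unknown")}


def total_ms_by_region(events):
    """Aggregation: total latency per region over the events seen."""
    totals = defaultdict(int)
    for event in events:
        totals[event["region"]] += event["ms"]
    return dict(totals)


# Generators process one event at a time, so nothing is buffered
# until the final aggregation consumes the pipeline.
pipeline = with_region(only_slow(iter(EVENTS), 90), USER_REGION)
result = total_ms_by_region(pipeline)
print(result)
```

A production streaming engine adds windowing, fault tolerance, and parallelism on top of these same building blocks; the sketch only shows how the operations compose.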
Data lakes
The emergence of public clouds had a profound impact on the way
organizations could tackle big data challenges. The availability of cheap,
reliable, and virtually unlimited storage let companies ingest and store data
raw and unchanged, instead of cleaning, transforming, and aggregating it
before storage. That, in turn, enabled methods of analyzing the data that
weren’t previously available.
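
The idea of storing data raw and applying structure only when it is read can be sketched as follows. This is a minimal illustration under stated assumptions: a temporary local directory stands in for cloud object storage, and the sensor records, the `readings.jsonl` file name, and both function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw records with inconsistent shapes, stored as-is.
RAW_RECORDS = [
    '{"sensor": "a", "temp_c": 21.5}',
    '{"sensor": "b", "temp_f": 70.7}',
    '{"sensor": "a", "temp_c": 22.5, "note": "recalibrated"}',
]


def ingest_raw(lake_dir, lines):
    """Write records unchanged; no cleaning or schema is applied."""
    path = Path(lake_dir) / "readings.jsonl"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_celsius(path):
    """Apply a schema at read time: normalize every reading to Celsius."""
    for line in path.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        if "temp_c" in record:
            celsius = record["temp_c"]
        else:
            celsius = (record["temp_f"] - 32) * 5 / 9
        yield record["sensor"], round(celsius, 1)


# A temporary directory stands in for the data lake (assumption).
with tempfile.TemporaryDirectory() as lake:
    raw_path = ingest_raw(lake, RAW_RECORDS)
    stored = raw_path.read_text(encoding="utf-8").splitlines()
    readings = list(read_celsius(raw_path))

print(readings)
```

Because the stored lines are byte-for-byte what was ingested, a later analysis can read the same file with a different interpretation, for example one that uses the `note` field, without re-ingesting anything.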
James Dixon, then chief technology officer at Pentaho, coined the term “data
lake” for this new approach. Rather than creating isolated data warehouses, a
data lake promised to be a single repository for all of a company’s information.
Data lakes can be built with Hadoop technologies or with object storage and
managed data services provided by a cloud provider. By delegating
infrastructure work and application management to a cloud provider, companies
can reduce the IT effort that big data tasks require and focus on
data management.