我爱电子书-《Learning Spark, 2nd Edition: Lightning-Fast Data Analytics》| pdf + epub + mobi + awz3, 高清版, 带目录，Kindle版, 多看精排版下载

注重体验与质量的电子书资源下载网站

分类于: 其它编程语言

简介

Learning Spark, 2nd Edition: Lightning-Fast Data Analytics 豆 0.0分

资源最后更新于 2020-10-05 18:44:10

作者：Tathagata Das

出版社：O'Reilly Media

出版日期：2020-01

ISBN：9781492050049

文件格式： pdf

标签： Spark 大数据计算机科学数据分析分布式软件工程 BigData

简介· · · · · ·

Data is getting bigger, arriving faster, and coming in varied formats—and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark.

Updated to emphasize new features in Spark 2.x., this second edition shows data engineers and scientists why structure and unification in Spark matters. ...

直接下载

1. Introduction to Unified Analytics with Apache Spark
The Genesis of Big Data and Distributed Computing at Google
Hadoop at Yahoo!
Spark’s Early Years at AMPLab
What is Apache Spark?
Speed
Ease of Use
Modularity
Extensibility
Why Unified Analytics?
Apache Spark Components as a Unified Stack
Apache Spark’s Distributed Execution and Concepts
Developer’s Experience
Who Uses Spark, and for What?
Data Science Tasks
Data Engineering Tasks
Machine Learning or Deep Learning Tasks
Community Adoption and Expansion
2. Downloading Apache Spark and Getting Started
Step 1: Download Apache Spark
Spark’s Directories and Files
Step 2: Use Scala Shell or PySpark Shell
Using Local Machine
Step 3: Understand Spark Application Concepts
Spark Application and SparkSession
Spark Jobs
Spark Stages
Spark Tasks
Transformations, Actions, and Lazy Evaluation
Spark UI
Databricks Community Edition
First Standalone Application
Using Local Machine
Counting M&Ms for the Cookie Monster
Building Standalone Applications in Scala
Summary
3. Apache Spark’s Structured APIs
A Bit of History…
Unstructured Spark: What’s Underneath an RDD?
Structuring Spark
Key Merits and Benefits
Structured APIs: DataFrames and Datasets APIs
DataFrames API
Common DataFrame Operations
Datasets API
DataFrames vs Datasets
What about RDDs?
Spark SQL and the Underlying Engine
Catalyst Optimizer
Summary
4. Spark SQL and DataFrames — Introduction to Built-in Data Sources
Using Spark SQL in Spark Applications
Basic Query Example
SQL Tables and Views
Data Sources for DataFrames and SQL Tables
DataFrameReader
DataFrameWriter
Parquet
JSON
CSV
Avro
ORC
Image
Summary
5. Spark SQL and Datasets
Single API for Java and Scala
Scala Case Classes and JavaBeans for Datasets
Working with Datasets
Creating Sample Data
Transforming Sample Data
Memory Management for Datasets and DataFrames
Dataset Encoders
Spark’s Internal Format vs Java Object Format
Serialization and Deserialization (SerDe)
Costs of Using Datasets
Strategies to Mitigate Costs
Summary
6. Loading and Saving Your Data
Motivation for Data Sources
File Formats: Revisited
Text Files
Organizing Data for Efficient I/O
Partitioning
Bucketing
Compression Schemes
Saving as Parquet Files
Delta Lake Storage Format
Delta Lake Table
Summary

简介

Learning Spark, 2nd Edition: Lightning-Fast Data Analytics 豆 0.0分

简介· · · · · ·

目录

猜你喜欢

Apache Kylin权威指南（第2版）: 大数据技术丛书

智联网：未来的未来

Prometheus监控实战: 云计算与虚拟化技术丛书

大数据掘金: 挖掘商业世界中的数据价值

智慧银行——未来银行服务新模式: 中国金融四十人论坛书系

大数据技术体系详解：原理、架构与实践