Preface 
Acknowledgements
About the Author
Chapter 1 Introduction to Data Analytics 
1.1                Introduction                                                                                      
1.2                What Is Data?                                                                                   
1.2.1     Data Relationships
1.2.2     Data Models
1.3                Types of Data                                                                                   
1.4                Nature of Data                                                                                  
1.5                Data Visualization                                                                            
1.6                Data Analysis Methods                                                                    
1.6.1     Correlation                                                                              
1.6.2     Regression                                                                              
1.6.3     Forecasting                                                                             
1.6.4     Clustering                                                                               
1.6.5     Classification                                                                          
1.7                Web Data                                                                                          
1.7.1     Evolution of Analytic Scalability                                           
1.7.2     Reporting vs. Analysis                                                           
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 2 Data Analytics Life-cycle 
2.1                Introduction                                                                                      
2.2                Business Drivers for Analytics                                                        
2.2.1     Increasing Profitability and Growth                                       
2.2.2     Strengthening Customer Experience and Intimacy               
2.2.3     Driving Digital Transformation and Innovation                    
2.2.4     Managing Regulatory and Compliance Risks                       
2.2.5     Increasing Operational Efficiency                                         
2.3                Typical Analytical Architecture                                                       
2.3.1     Data Analytical Architecture                                                 
2.3.2     Challenges of Conventional Systems
2.4                Analytic Processes and Tools                                                                                                          
2.4.1     Types of Analytics
2.4.2     Modern Data Analytic Tools
2.5                Data Analytic Life-cycle
2.5.1     Need for a Data Analytic Life-cycle
2.5.2     Phases of Data Analytic Life-cycle
2.6                Key Roles for Successful Analytic Projects
2.7                Modern-day Intelligence
2.7.1     Business Intelligence vs. Data Science
2.7.2     Intelligent Data Analysis
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 3 Fundamentals of Big Data  
3.1                Introduction to Big Data                                                                                                          
3.2                Big Data Concepts and Terminology
3.2.1     Big Data Processing Activities
3.2.2     Common Terminologies
3.3                Fundamentals of Big Data Types                                                    
3.4                Big Data Analytics                                                                           
3.4.1     Text Analytics                                                                        
3.4.2     Audio Analytics                                                                   
3.4.3     Video Content Analytics                                                      
3.4.4     Social Media Analytics                                                           
3.4.5     Predictive Analytics                                                                
3.5                Distributed File System in Big Data                                               
3.6                Big Data Characteristics                                                                
3.6.1     The 5 V’s of Big Data
3.6.2     Challenges of Processing Big Data                                       
3.7                Drivers for Big Data                                                                       
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 4 Big Data Analytics Technology 
4.1                Introduction to Big Data Analytics                                                
4.2                Big Data Analysis Framework                                                          
4.3                Approaches for Big Data Analysis                                                    
4.4                Understanding Text Analytics and Big Data                                    
4.4.1     Text Mining Process                                                             
4.4.2     Applications of Text Analytics
4.5                Predictive Analysis of Big Data                                                        
4.5.1     Predictive Analytics Models                                                   
4.5.2     Predictive Analytics Algorithms                                             
4.6                Procedural vs. Functional Programming Models for Big Data       
4.7                Big Data Integration Process                                                          
4.8                Big Data Technology Landscape                                                    
4.8.1     Big Data Architecture                                                          
4.8.2     Big Data Storage                                                                  
4.9                Big Data Key Roles                                                                        
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 5 Fundamentals of Hadoop 
5.1                Introduction                                                                                    
5.2                Problems with Traditional Large-scale Systems                               
5.3                Five V’s of Big Data
5.4                What Is Hadoop?                                                                            
5.5                History of Hadoop                                                                          
5.6                Why Hadoop?                                                                                 
5.7                Different Flavors of Hadoop                                                           
5.8                Different Modes of Hadoop                                                           
5.8.1     Standalone Mode                                                                  
5.8.2     Pseudo-distributed Mode (Single-node Cluster)                  
5.8.3     Fully Distributed Mode                                                        
5.9                Core Components of Hadoop                                                        
5.10           Hadoop Ecosystem                                                                         
5.11           Data Ingestion Layer                                                                      
5.12           ETL and ELT                                                                                  
5.13           Ingestion Tools in Hadoop Ecosystem                                           
5.14           Data Storage Layer                                                                         
5.14.1     Data Storage Tools                                                             
5.15           Processing Layer                                                                              
5.16           Analysis Layer                                                                                 
5.17           Management and Coordination                                                     
5.18           Anatomy of a Hadoop Cluster: HDFS Architecture                      
5.19           Data Locality in Hadoop                                                                
5.20           Configuration Files in Hadoop
5.21           Limitations of Hadoop                                                                   
5.22           Distributed Cache in Apache Hadoop
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 6 Hadoop Distributed File System 
6.1                Introduction                                                                                    
6.2                Virtualization                                                                                 
6.3                Downloading VMware                                                                   
6.4                Installing VMware                                                                          
6.5                VirtualBox                                                                                      
6.5.1     VirtualBox Installation Steps                                                 
6.6                HDP Sandbox Download and Installation                                     
6.7                Ambari Administration                                                                  
6.8                HDFS Command Line Interface                                                    
6.8.1     JPS Command                                                                      
6.8.2     List of Files                                                                            
6.8.3     File Management                                                                 
6.8.4     Upload and Download Files                                                 
6.8.5     Ownership and Validation
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 7 MapReduce
7.11           Hadoop Reducer                                                                               
7.12            Hadoop Key-Value Pair                                                                   
7.13            Input Format in MapReduce                                                            
7.14            InputSplit in MapReduce                                                                 
7.15            Hadoop Record Reader                                                                    
7.16            MapReduce Partitioner                                                                    
7.16.1     MapReduce Combiner                                                         
7.17            Shuffling and Sorting in MapReduce                                              
7.17.1     Hadoop Output Format                                                        
7.18            Input Split vs. HDFS Block in MapReduce                                    
7.19            MapOnly Job in MapReduce                                                           
7.20            Hadoop Speculative Execution                                                        
7.21            Hadoop Counters                                                                              
7.22            Hadoop Optimization                                                                       
7.23            MapReduce Performance Tuning: Best Practices                           
7.23.1     System Level Best Practices                                                
7.23.2     Application Level Best Practices                                         
7.24            YARN                                                                                              
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 8 Hadoop Ingestion
8.1                Introduction                                                                                      
8.2                Data Ingestion Types                                                                       
8.2.1     Real-time Data Ingestion (RTDI)                                          
8.2.2     Batch-based Data Ingestion (BBDI)                                      
8.2.3     Lambda Architecture Data Ingestion (LADI)                        
8.3                Benefits of Data Ingestion                                                               
8.3.1     Data Ingestion Tools Selection                                              
8.4                Introduction to Sqoop                                                                      
8.5                Features of Sqoop                                                                            
8.6                Basic SQL Commands and Connecting from Cloudera                  
8.7                Basic Sqoop Commands from Cloudera Command Prompt           
8.8                Sqoop Importing                                                                              
8.9                Sqoop Incremental Import                                                               
8.10           Sqoop Export                                                                                    
8.11           Advantages of Sqoop                                                                       
8.12           Disadvantages of Sqoop
10.8            HBase Coprocessor                                                                          
10.9            Setting HBase Environment                                                             
10.10       Creating HBase Tables                                                                    
10.11       Listing All Tables
10.12       Adding Data to a Table                                                                    
10.13       Getting a Row of Data                                                                     
10.14       Scanning a Table                                                                              
10.15       Counting the Number of Rows in a Table                                       
10.16       Altering a Table                                                                               
10.17       Deleting a Table Row or Column
10.18       Disabling and Enabling a Table                                                       
10.19       Truncating and Dropping a Table                                                    
10.20       Determining if Table Exists                                                             
10.21       Creating a Hive External Table Stored by HBase                           
10.21.1     Defining an External Table over HBase Tables                 
10.21.2    Mapping Specific HBase Columns and Column Families     
10.21.3     Working Hive with HBase (Integration)                            
10.22       Advanced Indexing in HBase                                                          
10.23       HIndex                                                                                              
10.23.1     Writing Data with Index                                                    
10.23.2     Reading Data with Index                                                    
10.23.3     HIndex Features                                                                 
10.24       HBase Admin API                                                                           
10.25       HBase Client API
10.25.1     Put Method                                                                         
10.25.2     Get Method                                                                         
10.26       Using HBase in Hadoop Applications                                             
10.27       HBase Advanced Usage                                                                   
10.27.1     Filters                                                                                  
10.27.2     The Filter Hierarchy                                                           
10.27.3     Comparison Operators                                                        
10.27.4     Comparators                                                                       
10.27.5     Comparison Filters                                                             
10.28       Dedicated Filters                                                                              
10.29       Decorating Filters
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 11 Hadoop Streaming 
11.1           Introduction                                                                                    
11.2           Real-time Analytics                                                                        
11.2.1     Choosing the Proper Tool for Real-time Analytics            
11.2.2     Apache Spark Streaming                                                    
11.2.3     Apache Samza                                                                      
11.2.4     What Would a Perfect Solution Entail?                              
11.2.5     Challenges to Be Solved                                                     
11.3           Thread Pooling                                                                               
11.4           Stream Computing                                                                         
11.5           The Future of Data Streaming                                                        
11.6            Stream Computing’s Advantages in the Big Data world                
11.7            How Streaming Works                                                                   
11.8            Real-time Streams vs. Batch Processing                                         
11.9            Hadoop Streaming                                                                          
11.9.1     Hadoop Streaming Characteristics                                     
11.9.2     Specifying Other Plugins for Jobs                                      
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 12 Pig Latin 
12.1           Introduction                                                                                    
12.2           Basic Features of Apache Pig
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 13 Fundamentals of Spark
13.6            Design Principles of Apache Spark                                                 
13.7            Advantages of Spark                                                                        
13.8            Disadvantages of Apache Spark                                                      
13.9            Installation of Apache Spark on Windows                                      
13.10       Apache Spark Physical Architecture                                               
13.11       Apache Spark Layered Architecture                                                
13.11.1     Resilient Distributed Dataset                                              
13.11.2     Directed Acyclic Graph (DAG)                                         
13.12       Ways to Create RDD in Spark                                                         
13.13       Paired RDD                                                                                      
13.14       Features of Spark RDD                                                                    
13.15       Persistence and Caching Mechanisms in Apache Spark                  
13.16       Operations of Apache Spark RDD                                                   
13.16.1     Transformations                                                                  
13.16.2     Actions                                                                                
13.17       Limitations of Apache Spark RDD and Ways to Overcome It        
13.18       Directed Acyclic Graph (DAG)                                                       
13.19       DAG in Apache Spark                                                                     
13.19.1     Need for DAG in Apache Spark                                        
13.19.2     Working Principle of DAG in Spark                                  
13.20       Applications of Apache Spark                                                         
13.20.1     Streaming Data                                                                   
13.21       Spark in Real-world                                                                         
13.22       Use Cases of Spark                                                                          
13.23       Spark vs. Hadoop                                                                             
13.24       Sample Program                                                                               
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 14 Introduction to NoSQL Database Concepts 
14.1           Introduction                                                                                      
14.2           Relational Databases
14.3           NoSQL Definition                                                                         
14.4            Types of NoSQL Databases                                                            
14.4.1     Column Family Databases                                                  
14.4.2     Key-Value Pair Database                                                      
14.4.3     Document Store                                                                 
14.4.4     Graph Database                                                                  
14.5            Examples of NoSQL Databases                                                     
14.6            Advantages of NoSQL Databases                                                     
14.7            NoSQL Usage                                                                                
14.8            SQL vs. NoSQL                                                                             
14.9            NewSQL
14.10       ACID                                                                                              
14.10.1     Atomicity                                                                         
14.10.2     Consistency                                                                      
14.10.3     Isolation                                                                            
14.10.4     Durability                                                                         
14.11       BASE                                                                                              
14.12       Two-phase Commit                                                                        
14.12.1     Commit–request Phase                                                     
14.13       Schema                                                                                           
14.13.1     Sharding and Shared-Nothing Architecture
14.13.2     Horizontal and Vertical Data Partitioning
14.13.3     Four Basic Strategies for Shard Structure                         
14.14       Brewer’s CAP Theorem                                                                 
14.15       Cassandra – Definition and Features                                              
14.15.1     Definition                                                                         
14.15.2     Features                                                                            
14.15.3     Key Structures in Cassandra                                               
14.15.4     Cassandra Advantages and Use Cases                                 
14.16       MongoDB                                                                                       
14.16.1     Architecture of MongoDB                                                
14.16.2     MongoDB Advantages and Use Cases                             
14.17       HBase                                                                                             
14.17.1     HBase Architecture                                                          
14.18       Comparing Cassandra, MongoDB, and HBase                              
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 15 Cassandra Data Model 
15.1           Introduction                                                                                    
15.2           Use Cases of Cassandra                                                                    
15.3            Cassandra Installation in Windows Environment                           
15.3.1     Installing Python 2.7.x Edition                                          
15.3.2     Installing Apache Cassandra                                                 
15.4            Cassandra Basic CQL                                                                     
15.5            How to Create, Alter, Drop and Use Keyspace in Cassandra         
15.5.1     Create Keyspace                                                                 
15.5.2     Simple Strategy                                                                  
15.5.3     Network Topology Strategy                                               
15.6            Column Families                                                                            
15.6.1     Types of Columns                                                              
15.7            Cassandra Table                                                                             
15.7.1     Inserting and Displaying Data from the Table                    
15.7.2     Updating the Table Data                                                    
15.8            Data Types in Cassandra                                                                   
15.8.1     Collection Data Type in Cassandra                                    
15.9            Cassandra BATCH                                                                         
15.10       Difference Between Cassandra and RDBMS                                 
15.11       Denormalization                                                                            
15.12       Design Patterns                                                                              
15.12.1     Coexistence Patterns                                                        
15.13       RDBMS Migration Patterns                                                          
15.14       CAP Patterns                                                                                  
15.15       Temporal Patterns                                                                          
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 16 Cassandra Architecture 
16.1           Introduction                                                                                    
16.1.1     Cassandra Architecture                                                      
16.1.2     Features of Cassandra                                                           
16.2           Cassandra’s Peer-to-Peer Approach                                               
16.3            Gossip and Failure Detection                                                         
16.4            SSTables and Commit Log
16.4.1     Partition and Token                                                            
16.4.2     Compression Offset Map                                                   
16.4.3     Cassandra Commit Log
16.5           Cassandra Memtable                                                                      
16.5.1     Memtable Allocation Types                                               
16.5.2     Slab Allocator                                                                       
16.5.3     Memtable Flush                                                                  
16.5.4     Row Cache                                                                         
16.5.5     Cassandra Memtable Metrics                                             
16.6            Hashing to the Rescue                                                                   
16.7            Compaction in Cassandra                                                               
16.8            Tombstones in Cassandra                                                               
16.9            Hinted Handoff                                                                              
16.10       Anti-entropy and Read Repair                                                       
16.10.1     Anti-entropy                                                                     
16.10.2     Read Repair
16.11       Bloom Filters in Cassandra                                                             
16.11.1     Bloom Filter                                                                     
16.11.2     Changing Bloom Filter                                                    
16.12       Load Balancing in Cassandra                                                            
16.13       Cassandra Read Process                                                                 
16.13.1     Example of Cassandra Read Process                                
16.14       Cassandra Write Process                                                                
16.15       Staged Event-Driven Architecture (SEDA)                                    
16.16       Cassandra Migration                                                                      
16.16.1     Migration Approaches                                                      
16.16.2     Partition Key Cache                                                          
16.16.3     Partition Summary                                                           
16.16.4     Partition Index                                                                  
16.16.5     Cache Migration Pattern                                                  
16.16.6     Estimating a Migration                                                     
16.17       Streaming                                                                                       
16.17.1     Streaming Based on Netty                                                
16.17.2     Zero-copy Streaming                                                        
16.17.3     Parallelizing the Streaming of Keyspaces
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 17 MongoDB 
17.1           Introduction                                                                                    
17.2           History of MongoDB
17.3           MongoDB Environment Setup                                                      
17.3.1     Install MongoDB on Windows                                          
17.3.2     Starting the MongoDB Server                                           
17.4            MongoDB Schema Design                                                             
17.5            Key Features of MongoDB                                                             
17.6            RDBMS vs. MongoDB                                                                  
17.7            MongoDB Query Language (MQL)                                              
17.8            MongoDB Database, Collection and Documents                          
17.9            MongoDB Server                                                                           
17.10       MongoDB Client Through the JavaScript Shell
17.11       CRUD Operations in MongoDB
17.11.1     Creating Database in MongoDB (C of CRUD)               
17.11.2     Creating Collection in MongoDB                                    
17.11.3     Listing Down the Databases Available in MongoDB       
17.11.4     Inserting Records into Collection (Table)                        
17.11.5     Showcasing the Current Database Used                           
17.11.6     Showcasing the Tables (Collections) in the Current Database
17.11.7     Reading Collections in MongoDB (R of CRUD)           
17.11.8     Updating Documents in MongoDB (U of CRUD)
17.11.9     Delete Operation in MongoDB (D of CRUD)                
17.11.10     Dropping (Deleting) a Particular Database                     
17.12       pretty() Method
17.13       AND in MongoDB                                                                          
17.14       OR in MongoDB                                                                            
17.15       Using AND and OR Together                                                          
17.16       NOR in MongoDB                                                                          
17.17       NOT in MongoDB                                                                          
17.18       Creating and Querying Through Indexes                                       
17.18.1     The createIndex() Method
17.18.2     MongoDB’s dropIndex() Method
17.18.3     The dropIndexes() Method
17.18.4     The getIndexes() Method
17.19       MongoDB Compass
17.19.1     MongoDB Connection                                                    
17.19.2     Creating Database in Compass                                         
17.19.3     Adding Documents in Compass
17.19.4      MongoDB View                                                               
17.19.5      Filters in Compass                                                            
17.19.6      Sorting in Compass                                                          
17.19.7      Limit Option in Compass                                                 
17.19.8      Skip Option in Compass                                                  
17.19.9      Project Option in Compass                                              
17.19.10     Dropping a Database in Compass                                   
17.19.11     Dropping a Collection in Compass                                
17.19.12     Importing Documents in Compass                                 
17.19.13     Aggregations Option in Compass                                   
17.19.14     Schema Option in Compass                                           
17.19.15     Update MongoDB Compass with the Latest Version    
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 18 Big Data Visualizations  
18.1           Introduction                                                                                    
18.2           History of Data Visualization                                                         
18.3            Big Data Visualization                                                                     
18.4            Importance of Big Data Visualization                                            
18.5            How Does Data Visualization Work?                                            
18.6            Types of Data Visualization                                                              
18.7            Challenges of Big Data Visualization                                               
18.8            Introduction to Tableau                                                                  
18.8.1     Features of Tableau                                                            
18.8.2     Tableau Product Suite                                                        
18.8.3     Installation of Tableau                                                        
18.8.4     Tableau for Big Data Visualization                                       
18.9            Python for Data Visualization                                                        
18.9.1     Installation of Python                                                         
18.9.2     Visualization of Data Using Python                                   
18.9.3     Matplotlib                                                                          
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 19 Business Implementation of Big Data 
19.1           Introduction                                                                                    
19.2           Big Data in Business                                                                       
19.2.1     Big Data in Marketing                                                       
19.2.2     Big Data in Banking Sector
19.2.3     Big Data in Healthcare Sector
19.2.4     Big Data in Education Sector
19.3            Security in Big Data                                                                         
19.3.1     User Access Control                                                             
19.4            Big Data on Cloud                                                                           
19.5            Best Practices in Big Data Implementation                                     
19.6            Latest Trends in Big Data                                                                
19.6.1     Big Data Analytics Will Incorporate Artificial Intelligence
19.6.2     The Use of Blockchain for Data Security Will Increase      
19.6.3     The Internet of Things (IoT) Will Drive Streaming Analytics Adoption
19.6.4     The Rise of DataOps                                                            
19.6.5     Data-as-a-Service (DaaS)                                                     
19.6.6     Data Mesh                                                                            
19.6.7     Synthetic Data                                                                      
19.6.8     Empowerment of Self-service Analytics                             
19.6.9     Data Democratization                                                          
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 20 Limitations of Hadoop and Solutions to Overcome Them 
20.1           Introduction                                                                                      
20.2           Problem with Small Files                                                                 
20.3            Vulnerability                                                                                    
20.4            Long Processing Time                                                                     
20.5            Not Easy to Use                                                                               
20.6            Supports Only Batch Processing                                                      
20.7            No Delta Iteration                                                                            
20.8            Security Issues                                                                                  
Summary | Multiple Choice Questions | Short-answer Questions | Essay-type Questions
Chapter 21 Big Data Case Studies 
21.1           Applications of Big Data in the Retail Industry                              
21.1.1     Customer Segmentation                                                       
21.1.2     Inventory Management                                                        
21.1.3     Price Optimization                                                               
21.1.4     Fraud Detection                                                                    
21.1.5     Supply Chain Optimization                                                  
21.1.6     Predictive Analytics
21.2            Applications of Big Data in the Logistics Industry                       
21.2.1     Route Optimization                                                              
21.2.2     Supply Chain Visibility                                                        
21.2.3     Risk Management                                                                
21.2.4     Fleet Management                                                                
21.2.5     Warehouse Optimization                                                      
21.2.6     Pricing Optimization                                                            
21.2.7     Quality Control                                                                    
21.2.8     Environmental Sustainability                                               
21.3            Applications of Big Data in the Manufacturing Industry                
21.3.1     Predictive Maintenance                                                        
21.3.2     Quality Control                                                                    
21.3.3     Supply Chain Optimization                                                  
21.3.4     Production Optimization                                                      
21.3.5     Energy Efficiency                                                                
21.3.6     Product Development                                                           
21.3.7     Risk Management                                                                
21.3.8     Warranty Analytics                                                              
21.3.9     Customer Analytics                                                              
21.4            Applications of Big Data in the Travel Industry                              
21.4.1     Customer Service                                                                 
21.4.2     Predictive Maintenance                                                        
21.4.3     Weather Forecasting                                                            
21.4.4     Customer Sentiment Analysis                                              
21.4.5     Destination Management                                                     
21.4.6     Operational Efficiency                                                         
21.4.7     Revenue Management                                                          
Summary
Appendix A: Model Questions                                                                
Appendix B: Capstone Projects                                                              
Appendix C: Model Syllabi                                                                     
Index