Data Mining Techniques
Arun K Pujari
Price: 950
ISBN: 9789386235053
Language: English
Pages: 432
Format: Paperback
Dimensions: 120 x 180 mm
Year of Publishing: 2016
Territorial Rights: World
Imprint: Universities Press

This book addresses all the major and latest techniques of data mining. It deals in detail with the algorithms for discovering association rules, for clustering and for building decision trees, and with techniques such as neural networks, genetic algorithms, rough set theory and support vector machines used in data mining. The algorithmic details of different techniques such as Apriori, Pincer-search, Dynamic Itemset Counting, FP-Tree growth, SLIQ, SPRINT, BOAT, CART, RainForest, BIRCH, CURE, BUBBLE, ROCK, STIRR, PAM, CLARANS, DBSCAN, GSP, SPADE and SPIRIT are covered. The book also discusses the mining of web, spatial, temporal and text data. In the third edition, the chapter on data warehousing concepts was thoroughly revised to include multidimensional data modelling and cube computation. The discussion on genetic algorithms was also expanded into a separate chapter. In the fourth edition, a chapter on the ROC curve for visualizing the performance of a binary classifier, covering methods for computing AUC and its uses, has been included.

Students of computer science, mathematical science and management will find this introductory textbook beneficial for a first course on the subject; the exposition of concepts with supporting illustrative examples and exercises makes it suitable for self-study as well.

Arun K Pujari, faculty and Dean of the School of Computer and Information Sciences, University of Hyderabad (UoH), is currently serving as the vice-chancellor of the Central University of Rajasthan. He obtained his postgraduate degree in mathematics from Sambalpur University (1974) and his PhD from IIT Kanpur (1980). He joined UoH in 1985 as a reader and became a professor in 1990. Professor Pujari has wide experience as an administrator. He has served as a member of UGC, DST, DRDO, ISRO and AICTE, and as vice-chancellor of Sambalpur University (November 2008 to November 2011). He has also been on visiting assignments to several institutions, including the Institute of Industrial Sciences, University of Tokyo; International Institute of Software Technology, United Nations University, Macau; University of Memphis, USA; and Griffith University, Australia.

Foreword
Prologue
Preface to the Fourth Edition
Preface to the First Edition
Acknowledgements
1. INTRODUCTION
1.1 Introduction
1.2 Data Mining as a Subject
1.3 Guide to this Book
2. DATA WAREHOUSING
2.1 Introduction
2.2 Data Warehouse Architecture
2.3 Dimensional Modelling
2.4 Categorisation of Hierarchies
2.5 Aggregate Function
2.6 Summarisability
2.7 Fact–Dimension Relationships
2.8 OLAP Operations
2.9 Lattice of Cuboids
2.10 OLAP Server
2.11 ROLAP
2.12 MOLAP
2.13 Cube Computation
2.14 Multiway Simultaneous Aggregation (ArrayCube)
2.15 BUC - Bottom-Up Cubing Algorithm
2.16 Condensed Cube
2.17 Coalescing
2.18 Dwarf
2.19 Other Cubing Techniques
2.20 Skycube
2.21 View Selection - Partial Materialisation
2.22 Data Marting
2.23 ETL
2.24 Data Cleaning
2.25 ELT vs. ETL
2.26 Cloud Data Warehousing
Further Reading
Exercises
Bibliography
3. DATA MINING
3.1 Introduction
3.2 What is Data Mining?
3.3 Data Mining: Definitions
3.4 KDD vs. Data Mining 
3.5 DBMS vs. DM 
3.6 Other Related Areas 
3.7 DM Techniques 
3.8 Other Mining Problems 
3.9 Issues and Challenges in DM 
3.10 DM Application Areas 
3.11 DM Applications—Case Studies 
3.12 Conclusions 
Further Reading 
Exercises 
Bibliography 
4. ASSOCIATION RULES 
4.1 Introduction 
4.2 What is an Association Rule? 
4.3 Methods to Discover Association Rules 
4.4 Apriori Algorithm 
4.5 Partition Algorithm 
4.6 Pincer-Search Algorithm 
4.7 Dynamic Itemset Counting Algorithm 
4.8 FP-tree Growth Algorithm 
4.9 Eclat and dEclat 
4.10 Rapid Association Rule Mining (RARM) 
4.11 Discussion on Different Algorithms 
4.12 Incremental Algorithm 
4.13 Border Algorithm 
4.14 Generalised Association Rule
4.15 Association Rules with Item Constraints 
4.16 Summary 
Further Reading 
Exercises 
Bibliography 
5. CLUSTERING TECHNIQUES 
5.1 Introduction 
5.2 Clustering Paradigms 
5.3 Partitioning Algorithms 
5.4 k-Medoid Algorithms 
5.5 CLARA 
5.6 CLARANS 
5.7 Hierarchical Clustering 
5.8 DBSCAN 
5.9 BIRCH 
5.10 CURE 
5.11 Categorical Clustering Algorithms 
5.12 STIRR 
5.13 ROCK 
5.14 CACTUS 
5.15 Conclusions 
Further Reading 
Exercises 
Bibliography 
6. DECISION TREES 
6.1 Introduction 
6.2 What is a Decision Tree? 
6.3 Tree Construction Principle 
6.4 Best Split 
6.5 Splitting Indices 
6.6 Splitting Criteria 
6.7 Decision Tree Construction Algorithms 
6.8 CART 
6.9 ID3 
6.10 C4.5 
6.11 CHAID 
6.12 Summary 
6.13 Decision Tree Construction with Presorting 
6.14 RainForest 
6.15 Approximate Methods 
6.16 CLOUDS 
6.17 BOAT 
6.18 Pruning Technique 
6.19 Integration of Pruning and Construction 
6.20 Summary: An Ideal Algorithm 
6.21 Other Topics 
6.22 Conclusions 
Further Reading 
Exercises 
Bibliography 
7. ROUGH SET THEORY 
7.1 Introduction 
7.2 Definitions 
7.3 Example 
7.4 Reduct 
7.5 Propositional Reasoning and PIAP to Compute Reducts
7.6 Types of Reducts 
7.7 Rule Extraction 
7.8 Decision Tree
7.9 Rough Sets and Fuzzy Sets 
7.10 Granular Computing 
Further Reading 
Exercises 
Bibliography 
8. GENETIC ALGORITHM 
8.1 Introduction 
8.2 Basic Steps of GA 
8.3 Selection
8.4 Crossover 
8.5 Mutation 
8.6 Data Mining Using GA 
8.7 GA for Rule Discovery 
8.8 GA and Decision Tree 
8.9 Clustering Using GA 
Conclusions 
Further Reading 
Exercises 
Bibliography 
9. OTHER TECHNIQUES 
9.1 Introduction 
9.2 What is a Neural Network? 
9.3 Learning in NN 
9.4 Unsupervised Learning 
9.5 Data Mining Using NN: A Case Study 
9.6 Support Vector Machines 
9.7 Conclusions 
Further Reading 
Exercises 
Bibliography 

10. PERFORMANCE EVALUATION - ROC CURVE
10.1 Introduction
10.2 Classification Accuracy
10.3 ROC Space
10.4 ROC Curves
10.5 ROC Curves and Class Distribution
10.6 ROC Convex Hull (ROCCH)
10.7 Method to Find the Optimal Threshold Point
10.8 Combining Classifiers
10.9 Area Under the ROC Curve (AUC)
10.10 Methods to Compute AUC
10.11 Averaging ROC Curves
10.12 ROC for Multi-class Classifiers
10.13 Precision–Recall Graph
10.14 DET Curves
10.15 Cost Curves
Further Reading
Exercises
Bibliography
11. WEB MINING 
11.1 Introduction 
11.2 Web Mining 
11.3 Web Content Mining 
11.4 Web Structure Mining 
11.5 Web Usage Mining 
11.6 Text Mining 
11.7 Unstructured Text 
11.8 Episode Rule Discovery for Texts 
11.9 Hierarchy of Categories 
11.10 Text Clustering 
11.11 Conclusions 
Further Reading 
Exercises 
Bibliography 
12. TEMPORAL AND SPATIAL DATA MINING 
12.1 Introduction 
12.2 What is Temporal Data Mining? 
12.3 Temporal Association Rules 
12.4 Sequence Mining 
12.5 The GSP Algorithm 
12.6 SPADE 
12.7 SPIRIT 
12.8 WUM 
12.9 Episode Discovery 
12.10 Event Prediction Problem 
12.11 Time-series Analysis 
12.12 Spatial Mining 
12.13 Spatial Mining Tasks 
12.14 Spatial Clustering 
12.15 Spatial Trends 
12.16 Conclusions 
Further Reading 
Exercises 
Bibliography 
Index