Machine Learning for Data Streams

with Practical Examples in MOA

A hands-on approach to tasks and techniques in data stream mining and real-time analytics, with examples in MOA, a popular, freely available open-source software framework.

Today many information sources—including sensor networks, financial markets, social networks, and healthcare monitoring—are so-called data streams, arriving sequentially and at high speed. Analysis must take place in real time, with partial data and without the capacity to store the entire data set. This book presents algorithms and techniques used in data stream mining and real-time analytics. Taking a hands-on approach, the book demonstrates the techniques using MOA (Massive Online Analysis), a popular, freely available open-source software framework, allowing readers to try out the techniques after reading the explanations.
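
To make the one-pass, bounded-memory constraint concrete, here is a minimal sketch. It is not taken from the book or from MOA. It uses Welford's online algorithm to maintain a running mean and variance while keeping only three numbers, however many values arrive. The class name `RunningStats` and its methods are hypothetical and chosen for illustration. Java is used because MOA itself is written in Java.

```java
/**
 * Hypothetical example class: constant-memory running mean and variance
 * (Welford's online algorithm). Each value is processed once and then
 * discarded, as required when mining a data stream.
 */
public class RunningStats {
    private long n = 0;      // number of values seen so far
    private double mean = 0; // running mean
    private double m2 = 0;   // running sum of squared deviations from the mean

    /** Incorporates one new stream element in O(1) time and memory. */
    public void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the already-updated mean
    }

    public long count() { return n; }

    public double mean() { return mean; }

    /** Population variance of the values seen so far (0 if none). */
    public double populationVariance() {
        return n > 0 ? m2 / n : 0.0;
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        // The array only simulates a source; RunningStats never stores the values.
        double[] stream = {2, 4, 4, 4, 5, 5, 7, 9};
        for (double x : stream) {
            stats.add(x);
        }
        System.out.printf("n=%d mean=%.2f variance=%.2f%n",
                stats.count(), stats.mean(), stats.populationVariance());
    }
}
```

For this sample stream, the program prints a mean of 5.00 and a variance of 4.00. These are the same values a batch computation over the stored data would give, but here memory use does not grow with the length of the stream.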

The book first offers a brief introduction to the topic, covering big data mining, basic methodologies for mining data streams, and a simple example of MOA. More detailed discussions follow, with chapters on sketching techniques, change, classification, ensemble methods, regression, clustering, and frequent pattern mining. Most of these chapters include exercises, an MOA-based lab session, or both. Finally, the book discusses the MOA software, covering the MOA graphical user interface, the command line, use of its API, and the development of new methods within MOA. The book will be an essential reference for readers who want to use data stream mining as a tool, researchers in innovation or data stream mining, and programmers who want to create new algorithms for MOA.

Albert Bifet is Professor of Computer Science at Télécom ParisTech.

Ricard Gavaldà is Professor of Computer Science at the Universitat Politècnica de Catalunya in Barcelona.

Geoff Holmes is Professor and Dean of Computing at the University of Waikato in Hamilton, New Zealand.

Bernhard Pfahringer is Professor of Computer Science at the University of Auckland, New Zealand.

Table of Contents

List of Figures
List of Tables
Preface
I Introduction
1 Introduction
2 Big Data Stream Mining
3 Hands-on Introduction to MOA
II Stream Mining
4 Streams and Sketches
5 Dealing with Change
6 Classification
7 Ensemble Methods
8 Regression
9 Clustering
10 Frequent Pattern Mining
III The MOA Software
11 Introduction to MOA and Its Ecosystem
12 The Graphical User Interface
13 Using the Command Line
14 Using the API
15 Developing New Methods in MOA
Bibliography
Index
