COMP08142 2018 Big Data

General Details

Full Title
Big Data
Transcript Title
Big Data
Code
COMP08142
Attendance
N/A %
Subject Area
COMP - Computing
Department
COEL - Computing & Electronic Eng
Level
08 - NFQ Level 8
Credit
05 - 05 Credits
Duration
Semester
Fee
Start Term
2018 - Full Academic Year 2018-19
End Term
9999 - The End of Time
Author(s)
Fran O'Regan, John Weir, Donny Hurley
Programme Membership
SG_KAPPL_H08 201800 Bachelor of Arts (Honours) in Computing in Application Design and User Experience
SG_KSMAR_H08 201800 Bachelor of Science (Honours) in Computing in Smart Technologies
Description

The module introduces students to the concept of Big Data. By applying Big Data techniques, students learn how to approach problems in this field.

Learning Outcomes

On completion of this module the learner will/should be able to:

1. Discuss the problem of managing data at scale and why traditional data management systems are insufficient.

2. Describe Big Data programming models such as MapReduce and how to use them on real examples.

3. Utilise distributed file systems and learn how to manage a cluster.

4. Query large data sets in near real time and explain the importance of proper query languages for Big Data.

Teaching and Learning Strategies

A practical approach to teaching and learning will be used, with problem-based learning where possible. The one-hour lecture will introduce core concepts of Big Data analytics. The lab practicals will apply the concepts covered in the lectures and show them working on continuous data collections.

Module Assessment Strategies

Students will be assessed by a final exam contributing 60% of their final grade and an ongoing project contributing the remaining 40%. The project, submitted before the end of term, consists of implementing and querying a big data cluster. It is developed iteratively throughout the semester, with milestones set along the way.

Repeat Assessments

Repeat exam and/or project

Indicative Syllabus

Discuss the problem of managing data at scale and why traditional data management systems are insufficient.

  • Examining the scale of the problem.
  • The possibilities and ethics of big data collection.

Describe Big Data programming models such as MapReduce and how to use them on real examples.

  • Discuss the various data management tools in the context of big data (e.g. relational, NoSQL).
  • Implement a big data programming model such as MapReduce.

Utilise distributed file systems and learn how to manage a cluster.

  • Hadoop.
  • HDFS.
  • Amazon S3.

Query large data sets in near real time and explain the importance of proper query languages for Big Data.

  • Utilise a big data query language such as Hive.
  • Compare with SQL.
  • Use built-in user-defined functions (UDFs) to manipulate dates and strings, along with other data-mining tools.

Coursework & Assessment Breakdown

Coursework & Continuous Assessment
40 %
End of Semester / Year Formal Exam
60 %

Coursework Assessment

No.  Title             Type     Form     Percent  Week     Learning Outcomes Assessed
1    Big Data Project  Project  Project  40 %     OnGoing  2,3,4

End of Semester / Year Assessment

No.  Title       Type        Form              Percent  Week             Learning Outcomes Assessed
1    Final Exam  Final Exam  Closed Book Exam  60 %     End of Semester  1,2,3,4

Full Time Mode Workload


Type                  Location             Description           Hours  Frequency  Avg Workload
Lecture               Not Specified        Lecture               1      Weekly     1.00
Laboratory Practical  Computer Laboratory  Practical             2      Weekly     2.00
Independent Learning  Not Specified        Independent Learning  4      Weekly     4.00
Total Full Time Average Weekly Learner Contact Time: 3.00 Hours

Online Learning Mode Workload


Type                  Location                 Description           Hours  Frequency  Avg Workload
Online Lecture        Distance Learning Suite  Lecture               1      Weekly     1.00
Directed Learning     Not Specified            Directed Learning     1      Weekly     1.00
Independent Learning  Not Specified            Independent Learning  5      Weekly     5.00
Total Online Learning Average Weekly Learner Contact Time: 2.00 Hours

Required & Recommended Book List

Recommended Reading
2015-04-11 Hadoop: The Definitive Guide O'Reilly Media
ISBN 1491901632 ISBN-13 9781491901632

Ready to unlock the power of your data? With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You'll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This edition includes new case studies, updates on Hadoop 2, a refreshed HBase chapter, and new chapters on Crunch and Flume. Author Tom White also suggests learning paths for the book.

  • Store large datasets with the Hadoop Distributed File System (HDFS)
  • Run distributed computations with MapReduce
  • Use Hadoop's data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Load data from relational databases into HDFS, using Sqoop
  • Perform large-scale data processing with the Pig query language
  • Analyze datasets with Hive, Hadoop's data warehousing system
  • Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems

Recommended Reading
2017-04-21 Big-Data Analytics for Cloud, IoT and Cognitive Computing Wiley-Blackwell
ISBN 1119247020 ISBN-13 9781119247029

Recommended Reading
2012-12-22 MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems O'Reilly Media
ISBN 1449327176 ISBN-13 9781449327170

Design patterns for the MapReduce framework, until now, have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you're using. Each pattern is explained in context, with pitfalls and caveats clearly identified - so you can avoid some of the common design mistakes when modeling your Big Data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. Hadoop MapReduce code is provided to help you learn how to apply the design patterns by example. Topics include:

  • Basic patterns, including map-only filter, group by, aggregation, distinct, and limit
  • Joins: traditional reduce-side join, reduce-side join with Bloom filter, replicated join with distributed cache, merge join, Cartesian products, and intersections
  • Binning, sharding for other systems, sorting, sampling, unions, and other patterns for organizing data
  • Job optimization patterns, including multi-job map-only job folding, and overloading the key grouping to perform two jobs at once

Module Resources