Full Title Programming for Big Data

Short Title Programming for Big Data

Code COMP09017
Level 09
Credit 05

Author Loftus, Mary
Department Computing & Electronic Eng

Subject Area Computing
Attendance N/A%
Fee

Description

This module introduces students to the architectures and tools that underpin the management and processing of large-scale datasets that are too big for conventional approaches. Students will learn to use these architectures and tools to code solutions, query data from structured, unstructured, and streamed sources, and analyse that data using appropriate algorithms.

Students will also be able to evaluate a variety of Big Data cloud platform providers (e.g. Amazon Web Services, Microsoft Azure) in order to deploy and host data solutions.


Indicative Syllabus

Understanding the Big Data Context 

Outline the challenges that come with Big Data, and how they break traditional paradigms. 

 

Analysing a typical Big Data Technology Stack 

Outline a typical Big Data technology stack and examine the technologies at each layer: 

- Describe the role of distributed file systems e.g. the Hadoop Distributed File System (HDFS)

- Describe the role of a distributed processing system e.g. MapReduce, Apache Spark

- Describe some querying approaches for different types of data stores
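
As an illustration of the distributed processing model named above, the following is a minimal single-machine sketch of the MapReduce pattern (map, shuffle, reduce) applied to a word count. It uses plain Python only and does not call the Hadoop or Spark APIs; the function names and sample lines are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: in a real cluster each line could live on a different node.
counts = word_count(["big data big tools", "data at scale"])
```

In a real deployment the map and reduce functions would run in parallel on many machines and the shuffle would move data across the network; here all three phases run in one process so that the data flow is easy to follow.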
 

Installing and/or Configuring a Distributed File System e.g. Hadoop 

- Downloading and installing a Distributed File System

- Downloading and installing Apache Spark

- Running HDFS on Amazon AWS

- Running HDFS on Azure
 

Working with Data - Query Languages & Environments  

- Relational Data e.g. Hive, MySQL 

- Non-Relational Data e.g. HBase & Cassandra

- Streaming Data e.g. Apache Spark
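
As a small illustration of the relational querying topic above, the sketch below runs a grouped aggregate in SQL. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a cluster; Hive and MySQL accept similar SELECT ... GROUP BY statements, but their connection code differs. The table name, columns, and rows are invented for illustration.

```python
import sqlite3

# In-memory database used as a stand-in for a relational store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 20.0), ("a", 22.0), ("b", 30.0)],
)

# Count the readings and average the temperature for each sensor.
rows = conn.execute(
    "SELECT sensor, COUNT(*), AVG(temperature) "
    "FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
conn.close()
```

Each row in the result holds the sensor name, the number of readings for that sensor, and their average temperature.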
 

Machine Learning with Spark and Python

- Implementing Machine Learning Algorithms e.g. Linear Regression, Logistic Regression using Spark & Python

- Collaborative Filtering for Recommender Systems
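
To illustrate how a linear regression can be fitted over distributed data, the sketch below summarises each partition as a small tuple of sums, merges the tuples, and solves for the line from the merged totals. This is a plain-Python sketch of single-feature ordinary least squares, not the Spark MLlib implementation; the function names and sample partitions are assumptions made for illustration.

```python
from functools import reduce


def partition_stats(points):
    """Summarise one partition of (x, y) points as mergeable sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge(a, b):
    """Combine the summaries of two partitions element by element."""
    return tuple(u + v for u, v in zip(a, b))


def fit(stats):
    """Solve single-feature least squares from the merged sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data split across two partitions; every point lies on y = 2x + 1.
partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
total = reduce(merge, map(partition_stats, partitions))
slope, intercept = fit(total)
```

Because only five numbers per partition need to be combined, the per-partition step can run in parallel and the merge step stays cheap regardless of how many points each partition holds.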

 

Graph Analytics with GraphX

- Define & Describe a Graph

- Identify scenarios where Graph Databases suit your data

- Analyse data using GraphX 
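
As an example of the kind of analysis covered in this topic, the sketch below computes PageRank on a small directed graph by repeated iteration. It is a textbook, normalised variant written in plain Python and is not the GraphX implementation, whose API and scaling of ranks differ; the edge list, damping factor, and iteration count are assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iteratively compute normalised PageRank for a list of (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share each round.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src, targets in out_links.items():
            if targets:
                # Split this node's damped rank evenly across its out-links.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # A node with no out-links spreads its rank over all nodes.
                share = damping * rank[src] / len(nodes)
                for node in nodes:
                    new_rank[node] += share
        rank = new_rank
    return rank


# Hypothetical graph: a -> b -> c -> a forms a cycle, and d also links to c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
```

In this graph node c has two incoming edges and ends up with the highest rank, while node d has none and keeps only the baseline share.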

 

Building Real World Applications

- Design and implement systems using Big Data architectures, tools & frameworks across a variety of industry domains e.g. business intelligence, recommender systems, Internet of Things, industrial and manufacturing sensors, health informatics

- Consider the ethical implications and risks of your proposed solution within the design process


Learning Outcomes
On completion of this module the learner will/should be able to
  1. Discuss the problem of managing data at scale and why traditional data management systems are insufficient

  2. Evaluate state-of-the-art architectures, tools & frameworks for working with Big Data

  3. Implement Big Data solutions using a synthesis of different data paradigms e.g. distributed data and streaming data, structured and unstructured data

  4. Compare a variety of Big Data query languages and identify optimum query approaches for a variety of scenarios

  5. Outline some well-known Big Data problem scenarios from a variety of domains, and from the student's own experience, and evaluate some standard, state-of-the-art approaches to solving them with appropriate architectures, tools, & frameworks

  6. Evaluate some of the human and organisational issues involved in integrating Big Data solutions across the enterprise, and in current research questions from the domain e.g. ethics, privacy, bias, and cybersecurity


Assessment Strategies

Problem-based learning will be used in weekly labs, which will build week-on-week to form a semester-long project.

An end-of-semester project will challenge students to integrate and synthesise module knowledge into a cohesive, fully formed piece of assessable work.


Module Dependencies
Pre Requisite Modules
Co Requisite Modules
Incompatible Modules

Coursework Assessment Breakdown %
Course Work / Continuous Assessment 100 %

Coursework Assessment Breakdown

Description | Outcome Assessed | % of Total | Assessment Week
Project | 1,2,3,4,5,6 | 50 | End of Semester
Big Data Implementation I | 2,3,4,5,6 | 10 | Week 4
Big Data Implementation II | 2,3,4,5 | 20 | Week 8
Big Data Implementation III | 3,4,5,6 | 20 | Week 11


End Exam Assessment Breakdown

Description Outcome Assessed % of Total Assessment Week


Mode Workload

Type | Location | Description | Hours | Frequency | Avg Weekly Workload
Lecture | Computer Laboratory | Lecture & Computer Lab | 3 | Weekly | 3.00
Independent Learning | Not Specified | Independent Research & Reading | 4 | Weekly | 4.00

Total Average Weekly Learner Workload 3.00 Hours

Mode Workload

Type | Location | Description | Hours | Frequency | Avg Weekly Workload
Lecture | Online | Lecture | 1.5 | Weekly | 1.50
Directed Learning | Online | Virtual Lab | 1.5 | Weekly | 1.50
Independent Learning | Online | Independent Research & Reading | 4 | Weekly | 4.00

Total Average Weekly Learner Workload 3.00 Hours

Resources
Book Resources

Other Resources
Url Resources
Additional Info

ISBN BookList

Book Cover Book Details
Venkat Ankam 2016 Big Data Analytics with Spark and Hadoop
ISBN-10 1785884697 ISBN-13 9781785884696
Aris Gkoulalas-Divanis, Abderrahim Labbi 2014 Large-Scale Data Analytics Springer Science & Business Media
ISBN-10 1461492424 ISBN-13 9781461492429
David Loshin 2013 Big Data Analytics Elsevier
ISBN-10 0124186645 ISBN-13 9780124186644