Full Title Data Analytics and Visualisation

Short Title Data Analytics and Visualisati

Code COMP09014
Level 09
Credit 05

Author Unnikrishnan, Saritha
Department Computing & Electronic Eng

Subject Area Computing
Attendance N/A
Fee

Description

This module covers the data analysis and visualisation skills required for a Masters in Data Science. It introduces the learner to state-of-the-art data analysis tools and techniques that help to interpret and extract meaningful information from data, using public datasets. The learner will gain expertise in data preprocessing, exploratory data analysis and visualisation, pattern recognition and discriminative classification. The learner will also design and create interactive dashboards for data visualisation and solve real-world data analysis and classification problems.


Indicative Syllabus

Data preprocessing:

  • Feature scaling and standardisation 
  • Handling noise and missing data
  • Handling categorical data and label encoding
  • Splitting the dataset into training and testing sets
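
The preprocessing steps listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch and not part of the module materials: the function names, the toy `ages` and `colours` data, and the choice of mean imputation for missing values are all assumptions made for the example. Only the standard library is used so that the arithmetic stays visible; in practice a library such as pandas or scikit-learn would typically be used instead.

```python
import random
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Standardisation: shift to zero mean and scale to unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


def label_encode(categories):
    """Label encoding: map each distinct category to an integer code."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly, then split into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: one numeric column with a missing value,
# and one categorical column.
ages = [22.0, None, 35.0, 41.0]
colours = ["red", "green", "red", "blue"]

ages_filled = impute_mean(ages)
ages_scaled = min_max_scale(ages_filled)
ages_std = standardise(ages_filled)
codes, mapping = label_encode(colours)
train, test = split_train_test(list(zip(ages_std, codes)))
```

For brevity the sketch scales the whole column before splitting. To avoid leaking information from the test set, the scaling statistics would normally be computed on the training set only and then applied to the test set.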

Data visualisation:

  • Graphics fundamentals
  • How data is visually encoded for human perception and understanding
  • Mapping visualisation techniques to specific datasets
  • 2D and 3D visualisation
  • Python and R libraries will be used
  • Public datasets, such as those on Kaggle, will be used as data sources
  • Data visualisation tools such as Tableau and Power BI will be introduced

Pattern recognition:

  • How to perform exploratory data analysis
  • How to identify interesting patterns in the data
  • Unsupervised techniques such as K-means clustering and Principal Component Analysis (PCA) for dimensionality reduction will be discussed
  • Programming languages such as Python and R will be used, though others are not excluded
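
As an illustration of the clustering topic above, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not taken from the module materials: the function name `kmeans`, the toy points, and the simplistic initialisation from the first k points are assumptions chosen to keep the run deterministic. Random or k-means++ initialisation is more usual in practice, and PCA is not shown here.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Simplifying assumption: start from the first k points (deterministic).
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, labels


# Hypothetical toy data: two well-separated groups in 2D.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other, even though both starting centroids come from the same group.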

Discriminative Classification:

  • How to classify new data into categories using supervised machine learning techniques
  • Logistic Regression, Linear Discriminant Analysis (LDA), Decision trees/Random Forest will be discussed
  • Python and R libraries will be used, though others are not excluded
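
To make the supervised classification topic above concrete, here is a minimal sketch of logistic regression with a single feature, fitted by batch gradient descent in plain Python. It is an illustration under stated assumptions rather than the module's own method: the names `fit_logistic` and `predict`, the learning rate, the number of epochs, and the toy "hours studied" data are all hypothetical. The feature is centred first, echoing the preprocessing topic, which also helps gradient descent behave well. LDA, decision trees and random forests are not shown.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(x, w, b, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical toy data: hours studied and whether the learner passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

# Centre the feature on the training mean before fitting.
centre = sum(hours) / len(hours)
xs = [h - centre for h in hours]

w, b = fit_logistic(xs, passed)

# Classify new, unseen observations using the same centring.
new_low = predict(1.0 - centre, w, b)
new_high = predict(4.2 - centre, w, b)
```

New observations must be transformed with the centring value learned from the training data, not with a value recomputed from the new data.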

 


Learning Outcomes
On completion of this module the learner will/should be able to
  1. Apply techniques such as feature scaling, standardisation, missing data handling and encoding to preprocess and clean data.  

  2. Visualise the processed data graphically, identify the correlation between the features, and interpret the linear/non-linear relationships in the data.

  3. Make meaningful inferences from the data, and remove or retain features based on variable importance.

  4. Create an interactive dashboard for data visualisation.

  5. Identify patterns in the data using exploratory data analysis and clustering techniques.

  6. Discriminate patterns in new data using trained discriminative classification models.

  7. Summarise, analyse, and relate research in the area of exploratory data analysis and pattern recognition in writing. Appreciate the data ethics and constraints that apply to the use of data in real-world scenarios.

  8. Design, implement and test a solution to a real-world problem using the techniques learned above.


Assessment Strategies

Three individual assignments and a final project are used to assess the learning outcomes.

20% data preprocessing assessment

25% data visualisation and analysis assessment

10% research assessment

45% project in which the learner identifies and solves a research problem or use case of their own choosing in the data visualisation area.

 

 


Module Dependencies
Pre Requisite Modules
Co Requisite Modules
Incompatible Modules

Coursework Assessment Breakdown %
Coursework 100 %
End of Semester / Year Formal Examination 0 %

Coursework Assessment Breakdown

Description                               Outcome Assessed   % of Total   Assessment Week
Moodle Quiz                               1,2                20           Week 6
Problem based assignment                  3,4,5              25           Week 9
Written assignment on literature review   7                  10           Week 11
Final project                             1,2,3,4,5,6,8      45           Week 13


End Exam Assessment Breakdown

Description Outcome Assessed % of Total Assessment Week


Mode Workload

Type                   Location        Description            Hours   Frequency   Avg Weekly Workload
Lecture                Online          Lecture                1       Weekly      1.00
Laboratory Practical   Online          Practical              2       Weekly      2.00
Independent Learning   Not Specified   Independent Learning   4       Weekly      4.00

Total Average Weekly Learner Workload 7.00 Hours (3.00 contact hours)

Resources
Book Resources

Other Resources
Url Resources
Additional Info

ISBN BookList

Book Details
Thomas H. Davenport, Jeanne G. Harris 2007 Competing on Analytics Harvard Business Press
ISBN-10 1422103323 ISBN-13 9781422103326
Bradley Efron, Trevor Hastie 2016 Computer Age Statistical Inference Cambridge University Press
ISBN-10 1107149894 ISBN-13 9781107149892
Stéphane Tufféry 2011 Data Mining and Statistics for Decision Making Wiley
ISBN-10 0470688297 ISBN-13 9780470688298