COMP07190 2020 Data Mining

General Details

Full Title
Data Mining
Transcript Title
Data Mining
Code
COMP07190
Attendance
N/A %
Subject Area
COMP - Computing
Department
HEAL - Health & Nutritional Sciences
Level
07 - NFQ Level 7
Credit
05 - 05 Credits
Duration
Semester
Fee
Start Term
2020 - Full Academic Year 2020-21
End Term
9999 - The End of Time
Author(s)
Padraig McGourty, Thomas Smyth, Richeal Burns, Dr. Sasirekha Palaniswamy Lecturer
Programme Membership
SG_SINFO_B07 202000 Bachelor of Science in Health and Medical Information Science
SG_SDATA_E07 202000 Certificate in Health Data Analytics
Description

This module introduces the basic concepts, principles, methods and techniques of Data Mining and its applications. It helps learners develop skills and techniques for practical applications of data mining and engage in pattern discovery on big data. The importance of pattern discovery and interesting applications of data mining will be discussed. Data mining tasks such as clustering, classification and rule learning will be presented, together with data mining processes, namely data preparation, task identification and classification/prediction algorithms. Machine learning algorithms, neural networks, clustering approaches and text mining applications in big data will be introduced.

Learning Outcomes

On completion of this module the learner will/should be able to:

1. Understand the basic concepts, principles, methods and techniques of Data Mining and its applications.

2. Develop skills and techniques for practical applications of data mining and engage in pattern discovery on big data (Data Exploration).

3. Apply techniques in Knowledge Discovery in Databases, moving from data to knowledge using data mining tools and techniques (Data Mining).

4. Apply techniques in visualization of data to aid data mining and to display the results of data mining (Data Presentation).

Teaching and Learning Strategies

Teaching and learning for this module will be carried out through a combination of online lectures, computer-based critical appraisal and online practicals. Blended learning approaches will be adopted, consistent with digital learning paradigms.

Online delivery of 1 lecture per week with self-directed learning. Guidance will be provided on relevant areas for self-directed learning.

Online delivery of a 2-hour workshop weekly, where students will be directed to complete interactive activities to enhance their study skills and knowledge.

Question and answer sessions will be provided in the live classroom.

A variety of methods of instruction such as discussion, group work, interactive exercises, use of online resources and/or use of audio/visual material will be provided. Core skills will be embedded into all modules to ensure all students have an equal opportunity to succeed. These may include academic writing, oral presentations, reading techniques or research abilities. Accessible materials will be provided to students, including slides, documents, audio/visual material and textbooks, enabling students to slow down or speed up recordings, etc., in accordance with universal distance learning.

All module content will be based on the principles of UDL to ensure equitable access to content and learning.

Module Assessment Strategies

This module will be assessed by both a final project (50%) and continuous assessment (50%).

Repeat Assessments

Repeat examination will follow a similar format as applicable.

Indicative Syllabus

  • Introduction to basic concepts, principles, methods and techniques of Data Mining and its applications

  • Introduction to tools, methods and techniques for practical applications of data mining

  • Practical application of pattern/knowledge discovery on big data

  • Working with APIs

  • The importance of pattern discovery and interesting applications of data mining

  • Data mining tasks such as clustering, classification and rule learning

  • Data mining processes, namely data preparation, task identification and classification/prediction algorithms

  • Machine learning algorithms, neural networks, clustering approaches and text mining applications in big data

  • Knowledge discovery in databases

  • Privacy, security and legal aspects of Data Mining

  • Data Mining applications, e.g. healthcare, retail, etc.

  • Pitfalls of Data Mining

Coursework & Assessment Breakdown

Coursework & Continuous Assessment
100 %

Coursework Assessment

No. | Title | Type | Form | Percent | Week | Learning Outcomes Assessed
1 | Data Mining - Assessment | Continuous Assessment | Assessment | 50 % | OnGoing | 1,2
2 | Data Mining - Project | Project | Project | 50 % | End of Semester | 2,3,4

Online Learning Mode Workload


Type | Location | Description | Hours | Frequency | Avg Workload
Lecture | Online | Data Mining - Lecture | 1 | Weekly | 1.00
Problem Based Learning | Online | Data Mining - PBL | 2 | Weekly | 2.00
Independent Learning | Not Specified | Independent study | 4 | Weekly | 4.00
Total Online Learning Average Weekly Learner Contact Time: 3.00 Hours

Required & Recommended Book List

Required Reading
2011-08-08 Data Mining John Wiley & Sons
ISBN 1118029127 ISBN-13 9781118029121

This book reviews state-of-the-art methodologies and techniques for analyzing enormous quantities of raw data in high-dimensional data spaces, to extract new information for decision making. The goal of this book is to provide a single introductory source, organized in a systematic way, in which we could direct the readers in analysis of large data sets, through the explanation of basic concepts, models and methodologies developed in recent decades.

Required Reading
2011-04-18 Data Mining and Statistics for Decision Making Wiley
ISBN 0470688297 ISBN-13 9780470688298

Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized linear models, regularized regression, PLS regression, decision trees, neural networks, support vector machines, Vapnik theory, naive Bayesian classifier, ensemble learning and detection of association rules. They are discussed along with illustrative examples throughout the book to explain the theory of these methods, as well as their strengths and limitations.

Key Features:

  • Presents a comprehensive introduction to all techniques used in data mining and statistical learning, from classical to latest techniques.

  • Starts from basic principles up to advanced concepts.

  • Includes many step-by-step examples with the main software (R, SAS, IBM SPSS) as well as a thorough discussion and comparison of those software.

  • Gives practical tips for data mining implementation to solve real world problems.

  • Looks at a range of tools and applications, such as association rules, web mining and text mining, with a special focus on credit scoring.

  • Supported by an accompanying website hosting datasets and user analysis.

Statisticians and business intelligence analysts, students as well as computer science, biology, marketing and financial risk professionals in both commercial and government organizations across all business and industry sectors will benefit from this book.

Required Reading
2012-12-20 Data Mining - Concepts and Techniques Morgan Kaufmann Series in Data Management Systems

Required Reading
2020-06-20 Data Mining Methods and Models Wiley

Required Reading
2003-05-29 Exploratory Data Mining and Data Cleaning Wiley-Interscience
ISBN 0471268518 ISBN-13 9780471268512

Written for practitioners of data mining, data cleaning and database management. Presents a technical treatment of data quality including process, metrics, tools and algorithms. Focuses on developing an evolving modeling strategy through an iterative data exploration loop and incorporation of domain knowledge. Addresses methods of detecting, quantifying and correcting data quality issues that can have a significant impact on findings and decisions, using commercially available tools as well as new algorithmic approaches. Uses case studies to illustrate applications in real life scenarios. Highlights new approaches and methodologies, such as the DataSphere space partitioning and summary based analysis techniques. Exploratory Data Mining and Data Cleaning will serve as an important reference for serious data analysts who need to analyze large amounts of unfamiliar data, managers of operations databases, and students in undergraduate or graduate level courses dealing with large scale data analysis and data mining.

Module Resources

Other Resources

R

Python

SQL