Full Title Introductory Programming for Data Science 

Short Title Introductory Programming for D

Code COMP09016
Level 09
Credit 05

Author Hurley, Donny
Department Computing & Electronic Eng

Subject Area Computing
Attendance N/A
Fee

Description

Programming for Data Science introduces the learner to the core concepts of data science programming. The student will be introduced to the Python programming language in general, and to the SciPy ecosystem in particular. The student will use functions to manipulate lists before implementing multi-dimensional arrays using NumPy or a similar library in order to perform statistical operations and solve systems of linear equations. The student will then manipulate data frames and time-series data using pandas or a similar library. The student will learn techniques for reading in data from multiple sources, including querying APIs and scraping unstructured websites. Databases will be introduced through SQL programming: the student will create, modify and query a database through Python, and prepare the query results within SciPy structures for further data analysis. The module assumes that the student has some experience with at least one programming language.


Indicative Syllabus

Python Introduction 

  • Review common Python functionality, showing how it differs from other programming languages, and the “Pythonic” style of programming

  • Introduction to Jupyter Notebook and the Spyder IDE for Python programming
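
A minimal sketch of what “Pythonic” style means in practice, assuming only the standard library: the same task (squaring the even numbers in a list) written first with explicit indices, as it might be in C or Java, and then with a list comprehension. The function names and data are hypothetical, chosen for illustration.

```python
# Hypothetical example: squares of the even numbers in a list.

def even_squares_loop(values):
    # Index-based style, as commonly written in C or Java.
    result = []
    for i in range(len(values)):
        if values[i] % 2 == 0:
            result.append(values[i] * values[i])
    return result


def even_squares_pythonic(values):
    # Pythonic style: iterate over the items directly with a list comprehension.
    return [v * v for v in values if v % 2 == 0]


data = [3, 4, 7, 10, 12]
print(even_squares_loop(data))      # [16, 100, 144]
print(even_squares_pythonic(data))  # [16, 100, 144]

# Built-ins such as enumerate and sum replace manual index bookkeeping.
for position, value in enumerate(data, start=1):
    print(position, value)
print(sum(v for v in data if v > 5))  # 29
```

Both functions return the same list; the comprehension simply removes the index variable and the explicit append.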

SciPy 

  • NumPy and its data structures 

  • Matplotlib for simple data presentations and visualisations 

  • Linalg for manipulating vectors and matrices, implementing linear algebra techniques and solving problems such as computing eigenvalues/eigenvectors 

  • Stats package for generating summary statistics, calculating p-values and running statistical tests, showing how to carry out in Python the tasks covered in the concurrently running Applied Probability and Statistics module 
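
The following sketch, assuming NumPy and SciPy are installed, touches each SciPy item listed above except plotting: a NumPy array, solving a small linear system and computing eigenvalues with scipy.linalg, and summary statistics plus a two-sample t-test with scipy.stats. The matrix and the randomly generated samples are illustrative only.

```python
import numpy as np
from scipy import linalg, stats

# NumPy: a 2-D array (matrix) and a 1-D array (vector).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

# scipy.linalg: solve the linear system A x = b.
x = linalg.solve(A, b)
print("solution x:", x)

# scipy.linalg: eigenvalues (returned as complex numbers) and eigenvectors (as columns).
vals, vecs = linalg.eig(A)
print("eigenvalues:", vals.real)

# scipy.stats: summary statistics and a two-sample t-test on simulated samples.
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=11.0, scale=2.0, size=30)
print(stats.describe(group_a))
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

For this matrix the solution is x = (0.1, 0.6) and the eigenvalues are 2 and 5. Because eigenvectors are only defined up to scale, checking that A v = λv holds is a more reliable test than comparing vecs with hand-computed vectors.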

Pandas 

  • Learn how to read data into DataFrame structures from different file sources such as CSV, JSON and XML 

  • How to query these DataFrame structures 

  • The details about how such structures are indexed 

  • Merge DataFrames and generate summary tables 
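
A short sketch of the pandas workflow listed above, assuming pandas is installed. The CSV text is held in memory with io.StringIO so the example is self-contained; with real files a path would be passed to pd.read_csv, and pd.read_json or pd.read_xml (pandas 1.3 or later) would cover the other formats. The store and revenue data are invented.

```python
import io
import pandas as pd

# Illustrative CSV content; in practice a file path would be passed to read_csv.
sales_csv = io.StringIO(
    "store_id,month,revenue\n"
    "1,2023-01,200\n"
    "1,2023-02,250\n"
    "2,2023-01,120\n"
    "2,2023-02,180\n"
)
stores_csv = io.StringIO(
    "store_id,city\n"
    "1,Athlone\n"
    "2,Galway\n"
)

sales = pd.read_csv(sales_csv)
stores = pd.read_csv(stores_csv)

# Query: boolean filtering, and the equivalent DataFrame.query string syntax.
high = sales[sales["revenue"] > 150]
also_high = sales.query("revenue > 150")

# Indexing: make store_id the index, then select rows by label with .loc.
by_store = sales.set_index("store_id")
store_1 = by_store.loc[1]

# Merge the two DataFrames on their shared key column.
merged = sales.merge(stores, on="store_id", how="left")

# Summary table: total revenue per city (rows) and month (columns).
summary = merged.pivot_table(index="city", columns="month",
                             values="revenue", aggfunc="sum")
print(summary)
```

Because store_id is not unique in the sales table, by_store.loc[1] returns a DataFrame of matching rows rather than a single row, which is one of the indexing details the syllabus item refers to.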

Data Gathering 

  • Tools to query APIs 

  • Converting the data received into a format that can be used directly with pandas DataFrames 

  • Using BeautifulSoup to scrape unstructured webpages 

  • Techniques to ensure consistency of data and amalgamating data where appropriate 

  • Group data into logical pieces and manipulate dates 
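
The data-gathering steps can be sketched as follows, assuming pandas and Beautiful Soup (the bs4 package) are installed. To keep the example offline, the API response and the web page are hard-coded strings; the station codes, towns and readings are hypothetical. A real API call would first fetch the JSON body over HTTP.

```python
import json
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical JSON body as an API might return it.
api_body = """
{"readings": [
  {"station": "S1", "time": "2023-03-01T09:00:00", "temp": 7.5},
  {"station": "S1", "time": "2023-03-02T09:00:00", "temp": 8.1},
  {"station": "S2", "time": "2023-03-01T09:00:00", "temp": 6.0}
]}
"""
# Convert the list of JSON records into a DataFrame.
readings = pd.json_normalize(json.loads(api_body)["readings"])
# Consistency: turn ISO 8601 strings into proper datetime values.
readings["time"] = pd.to_datetime(readings["time"])

# Hypothetical HTML fragment standing in for an unstructured web page.
html = """
<ul id="stations">
  <li data-id="S1">Riverside <span>Athlone</span></li>
  <li data-id="S2">Hilltop <span>Mullingar</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
# Scrape one record per list item: the data-id attribute and the <span> text.
stations = pd.DataFrame(
    [{"station": li["data-id"], "town": li.span.get_text(strip=True)}
     for li in soup.select("#stations li")]
)

# Amalgamate the two sources on the shared station code.
combined = readings.merge(stations, on="station", how="left")

# Group into logical pieces: mean temperature per station per calendar day.
combined["day"] = combined["time"].dt.date
daily = combined.groupby(["station", "day"])["temp"].mean()
print(daily)
```

The "html.parser" backend ships with Python, so no extra parser such as lxml is needed for this sketch.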

Database 

  • Compare strengths/weaknesses of different Database Management Systems (DBMS) – SQLite, PostgreSQL, MySQL, SQL Server 

  • Deploy a DBMS 

  • Create a simple database structure  

  • Learn the basic SQL statements and practise writing them hands-on against a live database 

  • How to use string patterns and ranges to search data 

  • How to sort and group data in result sets 

  • Learn how to work with multiple tables in a relational database using join operations 

  • Using Python to connect to databases, create tables, load data, query data with SQL and analyse the results 

  • Reading data directly into Pandas DataFrames or other appropriate data structures 

  • Insert results into the database from Python structures 
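
A sketch of the Python-to-database workflow, using the standard-library sqlite3 module with an in-memory SQLite database (so no server needs to be deployed) and pandas for reading a result set into a DataFrame and writing it back. The table names, columns and marks are hypothetical; other DBMSs such as PostgreSQL or MySQL would be reached through their own driver packages, with broadly similar SQL.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database; passing a file name instead would persist it to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a simple two-table structure linked by student id.
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
cur.execute("CREATE TABLE result (student_id INTEGER REFERENCES student(id), "
            "module TEXT, mark INTEGER)")

# Load data with parameterised inserts (? placeholders guard against SQL injection).
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, "Aoife", "Data Science"), (2, "Brian", "Data Science"),
                 (3, "Ciara", "Software Development")])
cur.executemany("INSERT INTO result VALUES (?, ?, ?)",
                [(1, "SQL", 72), (1, "Python", 65), (2, "SQL", 48), (3, "Python", 81)])
conn.commit()

# String pattern search with LIKE, sorted with ORDER BY.
names = cur.execute(
    "SELECT name FROM student WHERE course LIKE '%Data%' ORDER BY name").fetchall()

# Range search with BETWEEN (inclusive at both ends).
in_range = cur.execute(
    "SELECT student_id, module, mark FROM result "
    "WHERE mark BETWEEN 60 AND 80 ORDER BY mark DESC").fetchall()

# JOIN the two tables, GROUP BY student, and read the result set into pandas.
averages = pd.read_sql_query(
    "SELECT s.name, AVG(r.mark) AS avg_mark, COUNT(*) AS n_results "
    "FROM student AS s JOIN result AS r ON r.student_id = s.id "
    "GROUP BY s.name ORDER BY avg_mark DESC", conn)
print(averages)

# Insert the DataFrame back into the database as a new table.
averages.to_sql("average_mark", conn, index=False)
print(cur.execute("SELECT COUNT(*) FROM average_mark").fetchone())
```

pandas accepts a plain sqlite3 connection directly; for other database systems it generally expects an SQLAlchemy engine instead.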

Algorithms 

  • Discuss the need for efficient algorithms, particularly in data science 

  • Big O notation and how to compare algorithms 

  • Demonstrate how algorithms of differing complexity differ in speed, and what this means for real-time analysis of data 
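
To make the comparison of algorithm speeds concrete, the sketch below (standard library only) times two ways of checking a list for duplicates: an O(n^2) pairwise comparison and an O(n) average-case approach using a set. The function names are hypothetical, and exact timings depend on the machine.

```python
import random
import timeit


def has_duplicates_quadratic(items):
    # O(n^2): compare every pair of elements.
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    # O(n) on average: set membership tests take constant time on average.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    for n in (250, 500, 1000):
        # n distinct values, so neither function can stop early (worst case).
        data = random.sample(range(10 * n), n)
        t_quad = timeit.timeit(lambda: has_duplicates_quadratic(data), number=3)
        t_lin = timeit.timeit(lambda: has_duplicates_linear(data), number=3)
        print(f"n={n:5d}  quadratic={t_quad:.4f}s  set-based={t_lin:.6f}s")
```

Each time n doubles, the quadratic version's running time should grow by roughly a factor of four, while the set-based version's should roughly double, which is what Big O notation predicts.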


Learning Outcomes
On completion of this module the learner will/should be able to
  1. Create and manipulate vectors, matrices and n-dimensional tensors using a data science programming language  

  2. Employ appropriate functions to implement linear algebra and statistical procedures 

  3. Employ appropriate packages to create, read and manipulate tabular and time-series data  

  4. Evaluate and implement techniques to gather and store information from various unstructured data sources  

  5. Design and deploy database systems ensuring durability, high availability and high performance

  6. Interrogate database systems using an appropriate querying language  

  7. Describe techniques to analyse the efficiency of algorithms and to compare the effectiveness of different algorithms  


Assessment Strategies

Three assignments assess the learner on the three topics of Python, databases and algorithms. In addition, a final project requires the learner to create a solution that satisfies all learning outcomes (LOs) for a data-science-specific use case provided by the learner.

20% Python assignment – this involves completing a Jupyter notebook that tests the student’s knowledge of basic Python (deploying methods) and of using SciPy and pandas to read in files. The assignment consists of short questions answered in the notebook.

20% Database assignment – this involves creating a database and then populating its tables from an external source. Queries will be designed for specific uses of the database.

10% Algorithm assignment – running tests on different algorithms, analysing the results and commenting on their efficiency.

50% Project – creating a solution that satisfies all LOs for a data-science-specific use case provided by the learner.


Module Dependencies
Pre Requisite Modules
Co Requisite Modules
Incompatible Modules

Coursework Assessment Breakdown %
Course Work / Continuous Assessment 100 %

Coursework Assessment Breakdown

Description          | Outcome Assessed | % of Total | Assessment Week
Python Assignment    | 1, 2, 3, 4       | 20         | Week 6
Database Assignment  | 5, 6             | 20         | Week 11
Algorithm Assignment | 7                | 10         | Week 13
Project              | 1, 2, 3, 4, 5, 6 | 50         | End of Semester


End Exam Assessment Breakdown

Description Outcome Assessed % of Total Assessment Week


Mode Workload

Type                 | Location            | Description       | Hours | Frequency | Avg Weekly Workload
Lecture              | Lecture Theatre     | Lecture           | 1     | Weekly    | 1.00
Laboratory Practical | Computer Laboratory | Practical         | 2     | Weekly    | 2.00
Independent Learning | Not Specified       | Directed Learning | 4     | Weekly    | 4.00

Total Average Weekly Learner Workload 3.00 Hours

Mode Workload

Type                 | Location                | Description          | Hours | Frequency | Avg Weekly Workload
Online Lecture       | Distance Learning Suite | Lecture              | 1.5   | Weekly    | 1.50
Independent Learning | Not Specified           | Independent Learning | 5.5   | Weekly    | 5.50

Total Average Weekly Learner Workload 1.50 Hours

Resources
Book Resources

Other Resources
Url Resources
Additional Info

ISBN BookList

Book Details
Jake VanderPlas 2017 Python Data Science Handbook O'Reilly Media
ISBN-10 1491912057 ISBN-13 9781491912058
Wes McKinney 2017 Python for Data Analysis O'Reilly Media
ISBN-10 1491957662 ISBN-13 9781491957660
Anthony DeBarros 2017 Practical SQL No Starch Press
ISBN-10 1593278276 ISBN-13 9781593278274
Korry Douglas, Susan Douglas 2003 PostgreSQL Sams Publishing
ISBN-10 0735712573 ISBN-13 9780735712577