
Data Analysis and Visualization with Python (R-Module)

Exam number: 6765

Semester: from 1st semester

Duration of the module: One semester

Form of the module (i.e. obligatory, elective etc.): Elective

Frequency of module offer: Summer semester 2016

Prerequisites: Interest in learning programming and data analysis (statistical / marketing); prior knowledge of statistics and/or programming concepts (you know Python as more than just a snake) is a plus.

Applicability of module for other study programmes:
Obligatory or elective in other study programmes. For further information check regulations of the study programme.

Person responsible for module: Prof. Dr. Achim Koberstein

Name of the professor: Prof. Dr. Achim Koberstein

Language of teaching: English

ECTS-Credits (based on the workload): 6

Workload and its composition (self-study, contact time):
Contact time (Lecture, tutorial etc.): 60 h; self-study: 120 h

Contact hours (per week in semester): 4

Methods and duration of examination:
Submission of two take-home programming assignments, plus implementation of a working data analysis solution and its presentation (30-40 min)

Emphasis of the grade for the final grade: Please check regulations of the study programme

Aim of the module (expected learning outcomes and competencies to be acquired):
Participants learn basic programming concepts in a data analysis setting, using Python as the example language. Python is one of the most in-demand programming languages in scientific research and in highly qualified jobs in industry. The course consists of two blocks: (1) an introduction to programming using Python; (2) hands-on experience using Python's data analysis capabilities to investigate publicly available (big) data.

Contents of the module:
The first block follows the standard class format: lecture, tutorial, (homework). In this part, students are introduced to programming, the Python language and its state-of-the-art data analysis capabilities, including the underlying data analysis theory.
Outline of the 1st block (tentative):
1. Baby Basics, Data Types, Data Collections
2. Strings, Decision Structures, Loop Structures
3. Functions, Objects, Classes, Interface, 1st Home Assignment
4. Work with Files, DBMS, SQL, Python as a Data Operator
5. Statistics, Plotting and Visualization
6. Regression, Clustering
7. Getting Data from the Internet, 2nd Home Assignment
In the second block, students receive analysis cases with clearly defined research aims. Working in small groups, they develop a solution using the knowledge acquired in the first block and carry out the data analysis. Students will acquire the ability to apply specific analysis techniques and to select appropriate programming methods for solving business tasks.
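As an illustration of the kind of task covered by topics 5 and 6 of the outline, the following is a minimal sketch of a simple linear regression using only the Python standard library. It is not official course material: the function name `fit_line` and the spend/sales figures are invented for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data (invented for illustration): advertising spend vs. units sold
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(spend, sales)
print(f"sales = {slope:.2f} * spend + {intercept:.2f}")
```

In practice, libraries such as pandas and NumPy (covered in the recommended literature) would typically replace the hand-written arithmetic; the standard-library version is used here only to keep the sketch self-contained.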

Teaching and learning methods:
Programming lectures are accompanied by tutorials and homework assignments. Students are expected to solve the take-home exercises. Students will work in small groups to develop, implement and present working data analysis solutions.

Literature (compulsory reading, recommended literature):
Zelle, John M.: Python Programming: An Introduction to Computer Science, 2nd Ed., Franklin, Beedle & Associates Inc. 2010.
Grus, Joel: Data Science from Scratch: First Principles with Python, O'Reilly Media 2015.
McKinney, Wes: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O'Reilly Media 2012.

Further information:
Registration in Moodle Viadrina required.