International Business Administration

Data Analysis and Visualization with R

Exam number: 6853

Semester: from 1st semester

Duration of the module: One block

Form of the module (e.g. obligatory, elective): Obligatory for DSDS, otherwise elective

Frequency of module offer: Every winter semester, usually 2nd block

Prerequisites: Students should bring a strong interest in the R programming language and a good knowledge of statistics at Bachelor level. Moreover, students are strongly advised to attend an accompanying statistics course at Master level to gain further insight into the theoretical foundations of the models applied in this course.

Applicability of module for other study programmes:
Obligatory or elective in other study programmes. For further information, check the regulations of the respective study programme.

Person responsible for module: Prof. Dr. Sven Husmann

Name of the professor: Prof. Dr. Sven Husmann, Dr. Antoniya Shivarova, Dr. Rick Steinert

Language of teaching: English

ECTS-Credits (based on the workload): 6

Workload and its composition (self-study, contact time):
Contact time: 42 h, self-study: 138 h

Contact hours (per week in semester): 4

Methods and duration of examination:
A written home assignment (approx. 3 hours) and an oral exam (approx. 20 minutes), both on the same day.

Emphasis of the grade for the final grade: Please check the regulations of the respective study programme.

Aim of the module (expected learning outcomes and competencies to be acquired):
In an increasingly digitized world, enormous amounts of data are produced continuously. In 2020, for example, every minute 147,000 photos were uploaded to Facebook, 500 hours of video were uploaded to YouTube, and more than 41 million messages were exchanged on WhatsApp (www.domo.com/learn/data-never-sleeps-8). There is therefore great interest in systematically collecting and analyzing these data and using them for future decision-making.
In this course, we shed light on data analysis from a practitioner's perspective. Since large and complex data can only be handled properly with professional data analysis software, we will leave the convenience of spreadsheet applications such as MS Excel and turn to the R programming language. We will start with the basics of R, e.g. installing the RStudio IDE, and work our way up by answering important questions: how to access data efficiently, how to use basic programming features such as functions and loops, and finally how to implement various modeling strategies. Since this is an applied course, we will always focus on the applications of the data analysis process and visualize the corresponding results in an understandable way.
Participants will learn how to competently apply and critically analyze various standard techniques of data analysis and visualization. In detail, students will gain insight into the whole process of data analysis, from the first screening of the data to the final model decision. Basic methods using brute-force search as well as established and powerful models from various research areas will be implemented.

Contents of the module:

  • R programming
  • Bootstrapping and simulation
  • Heuristics and linear models
  • Constrained and unconstrained optimization
  • Logistic regression, linear discriminant analysis
  • Decision trees
  • Bagging, random forests, boosting

Teaching and learning methods:
Online lectures and exercises, case studies, consultations, self-study. A computer with a webcam and a microphone is required to participate in this course.

Special features (e.g. percentage of online-work, practice, guest speaker, etc.):
All lectures and exercises are embedded in online tutorial sites where students can also run code and self-assess their learning progress. Furthermore, every week there will be a case study that will be discussed during consultation hours.

Literature (compulsory reading, recommended literature):
Literature will be announced in the Moodle course.

Further information:
Self-enrollment in the Moodle course is required.