International Business Administration

Machine Learning with R

Name of module in English: Machine Learning with R

Exam number: 6794

Semester: From the first semester

Duration of the module: One block

Form of the module (i.e. obligatory, elective etc.): Elective

Frequency of module offer: Every summer semester, usually 2nd block

Prerequisites: Students should have a basic understanding of data analysis and a deep interest in statistics. The ability to program in the R language is recommended. The module "Introduction to Portfolio Management with R" is an optional, but very useful, pre-course for Machine Learning with R.

Applicability of module for other study programmes:
Obligatory or elective in other study programmes. For further information, check the regulations of the respective study programme.

Person responsible for module: Prof. Dr. Sven Husmann

Name of the professor: Prof. Dr. Sven Husmann

Language of teaching: English

ECTS-Credits (based on the workload): 6

Workload and its composition (self-study, contact time):
Contact time: 34 h; self-study: 146 h

Contact hours (per week in semester): 3

Methods and duration of examination:
Oral exam of approximately 20 minutes and a home assignment, both on the same day.

Emphasis of the grade for the final grade: Please check the regulations of the study programmes.

Aim of the module (expected learning outcomes and competencies to be acquired):
In practice, prediction is a central concern of many economic actors: for example, forecasting future stock returns from current company and capital market data, or predicting the default risk of loans from personal data such as age, occupation, and place of residence. In the Machine Learning course, you will learn more generally how to use statistical methods to search for structural relationships (models) in historical data sets, for example the influence of age, occupation, and place of residence (inputs) on the individual interest rate of personal loans (output). In particular, we are interested in models that yield the best possible output prediction when the estimated models are applied to new input data. Under ideal conditions, this problem is best solved with a classical OLS regression. Unfortunately, these ideal conditions often do not hold in practice, so that even a model that fits the historical data almost perfectly can yield relatively poor forecasts when applied to new data.
Historically, statisticians have focused mainly on problems where the modeler wants to infer relationships from small data sets. In recent years, however, and especially with the advent of the Internet, data sets have become large and also high-dimensional: in many analyses, we are dealing either with potentially millions of examples and a large set of variables (Big Data), or with only a moderate number of examples but still a large set of variables (high-dimensional data). Since the true underlying relationship in the data is usually unknown, it must be estimated. Both the size of the data and the unknown relationship typically pose numerical and statistical challenges for many models. In such cases, the modeler is often tempted to use increasingly complex models to achieve a better fit to the data at hand. However, sufficiently complex models can "discover" apparent relationships in historical data (in-sample) and still turn out to be completely unsuitable for forecasting (out-of-sample).
Machine learning (also called statistical learning) methods provide a practical set of tools for making the best possible predictions from large amounts of data with unknown structural relationships. The Machine Learning with R course will help you understand the relationship between in-sample and out-of-sample estimation and generally enable you to select and fit appropriate models for forecasting. Since R is one of the most widely used programming languages for statistics and machine learning, you will learn in this course how to program your own machine learning models in R. You will also learn about prominent machine learning frameworks that originate from other programming languages and how to interface with them directly from R.

Contents of the module:

  • Bias-Variance Tradeoff
  • Penalized Regression: Ridge, Lasso, Elastic Net
  • Cross-Validation, Information Criterion (IC) Based Methods
  • Neural Networks: Design, Training, Regularization
  • Keras and TensorFlow

Teaching and learning methods:
Online lectures, online exercises, case studies, self-study. You need a computer with a webcam and a microphone to participate in this course.

Special features (e.g. percentage of online-work, practice, guest speaker, etc.):
All lectures and exercises are embedded in online tutorial sites where students can also run code and self-assess their learning progress. Furthermore, every week there will be a case study that needs to be prepared and will be discussed in groups.

Literature (compulsory reading, recommended literature):
Literature will be announced in the Moodle course.

Further information:
Registration in Moodle Viadrina required.