Introduction to the foundations of Data Science with a focus on business applications. We emphasize supervised machine learning algorithms to build predictive decision support models for credit risk, marketing analytics, and several other use cases.
"Data is the new oil..."
You might have heard this or a similar phrase before. Big Data, Analytics, Data Science, Artificial Intelligence, Machine Learning, ... many 'colorful' terms refer to the increasing use of analytical models that aim at extracting insight from the vast amounts of data that the digital society generates.
The module Business Analytics and Data Science (BADS) is concerned with theories, concepts, and practices to inform and support decision-making by means of formal, data-driven methods. We revisit different forms of model-based decision support, examine the standard workflow of modern data analysis, and discuss a broad set of models for descriptive and predictive analytics. Predictive modeling is the main focus of the course. Many corporate use cases of analytics and data science involve the prediction of some future state or behavior. For example, a marketer might want to predict the way in which customers respond to certain marketing stimuli.
We introduce statistical principles of learning from data and cover several common prediction methods, ranging from established industry workhorses like logistic regression to state-of-the-art machine learning algorithms such as gradient boosting and heterogeneous ensembles. Subsequently, we dive into specific tasks in the predictive modeling pipeline such as feature selection or remedies to the class imbalance problem. Given the variety of specialized modeling tasks and challenges, we focus on topics with high relevance to the managerial decision problem. Cost sensitivity is a good example. A prediction may, and typically will, be inaccurate. When building a predictive model to guide managerial decision-making, different types of errors are often associated with different costs. How can we make our analytical models aware of error costs? Beyond error costs, what is a good approach to judging the adequacy of an analytical model in business applications? Another crucial point relates to model explainability. Even if a model produces accurate predictions, to build trust and use it in mission-critical settings, we must ensure that the inner logic of the model is transparent, fully understood, and agrees with domain knowledge. To that end, we introduce the emerging XAI (explainable AI) movement and discuss selected approaches to explain and diagnose black-box machine learning models. Throughout the course, we highlight interdependencies between methods (e.g., a machine learning algorithm) and their applications in business, emphasizing use cases in marketing and credit risk analytics.
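To give a flavor of the cost-sensitivity question, here is a minimal sketch in Python. It is an illustration only, not the method taught in the course. It assumes a binary classifier that outputs calibrated probabilities, and it assumes hypothetical error costs (a missed defaulter costs 5 units, a wrongly rejected good customer costs 1 unit). The function names and the toy data are made up for this example. Under these assumptions, flagging a case as positive is cheaper in expectation whenever the predicted probability p satisfies (1 - p) * cost_fp <= p * cost_fn, which gives the cutoff cost_fp / (cost_fp + cost_fn).

```python
from typing import Sequence


def cost_minimizing_threshold(cost_fp: float, cost_fn: float) -> float:
    """Cutoff that minimizes expected cost, assuming calibrated probabilities.

    Predict the positive class when p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("Error costs must be positive.")
    return cost_fp / (cost_fp + cost_fn)


def total_cost(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float,
    cost_fp: float,
    cost_fn: float,
) -> float:
    """Sum the misclassification costs incurred at a given cutoff."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # false positive, e.g. a good customer rejected
        elif predicted == 0 and label == 1:
            cost += cost_fn  # false negative, e.g. a defaulter accepted
    return cost


# Hypothetical toy data: 1 = default, 0 = repaid; probabilities are made up.
Y_TRUE = [0, 0, 1, 0, 1, 0, 1, 0]
Y_PROB = [0.05, 0.10, 0.20, 0.30, 0.40, 0.45, 0.80, 0.15]
COST_FP, COST_FN = 1.0, 5.0  # assumed costs, not taken from the course

cutoff = cost_minimizing_threshold(COST_FP, COST_FN)
print("cost-sensitive cutoff:", round(cutoff, 4))
print("cost at 0.5 cutoff:", total_cost(Y_TRUE, Y_PROB, 0.5, COST_FP, COST_FN))
print("cost at cost-sensitive cutoff:", total_cost(Y_TRUE, Y_PROB, cutoff, COST_FP, COST_FN))
```

On this toy data, lowering the cutoff trades a few extra false positives for fewer of the more expensive false negatives. If the probabilities are poorly calibrated, the closed-form cutoff no longer applies as stated, and the cutoff would have to be tuned on validation data instead.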
The module consists of a lecture and a tutorial session. The lecture introduces relevant concepts and provides room for discussion. The goal of the tutorial is to empower students to develop state-of-the-art analytical models using contemporary programming libraries for data science. Specifically, we will use the Python programming language. Students receive demos on how to implement specific algorithms from scratch and work with real-world data to solve common modeling tasks themselves.
In summary, the module pursues the following learning objectives:
Students are familiar with the three branches of descriptive, predictive, and prescriptive analytics and appreciate the relationships between these streams.
Given some data, students are able to select appropriate techniques to summarize and visualize the data so as to maximize managerial insight.
Students understand the potential and also the limitations of predictive analytics to aid decision-making. They comprehend when and how business applications can benefit from predictive analytics. Given a decision task, they are able to recommend suitable prediction methods.
Students are familiar with statistical programming languages. Using standard tools, they can develop basic and advanced prediction models and assess their accuracy in a statistically sound manner.
Students do not strictly need prior experience in computer programming to join the course. We reserve the first two weeks of the tutorial to introduce programming principles and the Python programming language. That said, we expect high and continuous engagement with the module in general and the tutorial in particular, including ample time for self-study, to ensure completion of our ambitious learning program. Students who wish to prepare for the course are invited to complete some of the many excellent tutorials on Python programming. A simple web search for "Python programming introduction" produces plenty of results. Alternatively, check out the corresponding resources on Python.org.
Looking forward to seeing you in BADS.
Stefan received a PhD from the University of Hamburg in 2007, where he also completed his habilitation on decision analysis and support using ensemble forecasting models in 2012. He then joined the Humboldt-University of Berlin in 2014, where he heads the Chair of Information Systems at the School of Business and Economics. He serves as an associate editor for the International Journal of Business Analytics, Digital Finance, and the International Journal of Forecasting, and as department editor of Business & Information Systems Engineering (BISE). Stefan has secured substantial amounts of research funding and published several papers in leading international journals and conferences. His research concerns the support of managerial decision-making using quantitative empirical methods. He specializes in applications of (deep) machine learning techniques in the broad scope of marketing and risk analytics. Stefan actively participates in knowledge transfer and consulting projects with industry partners, from start-up companies to global players and not-for-profit organizations.