This course is taught as part of the Master's program MARSYS (MARine EcoSYStem and Fishery Sciences) at the Institute of Marine Ecosystem and Fishery Science (IMF), University of Hamburg, Germany.

The course is designed for 36 hours in total (~120 min per lecture), excluding the time students spend on their case studies, and is held over one semester with 2 lectures per week. It can also be used for short-term intensive training (e.g. 1 week) or for self-study at the Bachelor, Master or PhD level.

This course will **introduce** the **principles of data science** and show how to extract insights from data in order to understand complex behaviors and trends and to draw inferences. It will teach **skills in three major areas** with a focus on marine topics. The course is also suitable for scientists from other fields, as the key concepts are the same across disciplines.

*Data Analysis with R* builds heavily on the **tidyverse framework** and introduces several of its packages, which provide an R syntax ‘dialect’ that simplifies data import, processing and visualization.
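To give a flavour of this 'dialect', here is a minimal sketch of a typical tidyverse-style processing step using the `dplyr` package. The data frame `hydro`, its column names and its values are made up for illustration and are not taken from the course material.

```r
library(dplyr)

# Hypothetical hydrographic measurements (made-up values, for illustration only)
hydro <- data.frame(
  station = c("A", "A", "B", "B"),
  depth_m = c(5, 50, 5, 50),
  temp_c  = c(14.2, 6.1, 15.0, 5.8)
)

# Keep only near-surface samples, then compute the mean temperature per station
surface_summary <- hydro %>%
  filter(depth_m <= 10) %>%
  group_by(station) %>%
  summarise(mean_temp = mean(temp_c))

print(surface_summary)
```

Each step reads as a verb applied to the result of the previous one, which is what makes the pipe-based style comparatively easy to follow for newcomers.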

At the end of the course students will

- understand the principles of data science
- be trained in formulating and investigating research questions within the marine context
- feel confident working with one of the most popular software environments for data analysis
- be familiar with various data types and data structures
- have learned how to
    - import data into and export data from R
    - subset, manipulate and transform data
    - write their own functions and apply iterations such as loops
    - compute descriptive statistics

- be able to visualize data in various ways, including creating maps
- understand the principles of statistical modelling and the mathematics behind simple linear regression models
- be able to
    - apply different linear model families
    - compare and select models
    - visualize model results
    - evaluate model diagnostics using real datasets

- learn how to work as part of a research team to produce scientific products

This course assumes no prior knowledge of computer programming or statistical modelling. Some knowledge of basic statistics, however, will be advantageous. For an efficient workflow, please make sure to download the data and install all required software before working through the material provided. The course will be taught in the institute’s computer room using RStudio Server Pro. The server version can be accessed from any location through an internet browser, so no further preparation is required. If you want to work on your own computer using a desktop version, see lecture 1 for installation information.

The course provides 18 lectures (each ~120 min) covering the topics **Programming in R**, **Data Exploration & Visualization** and **Statistical Modelling**. Each lecture contains interactive quizzes and exercises throughout, which each student should complete individually. Some of the exercises also require a bit of homework. Please note that the interactive quizzes only work in the browser and not in the PDF files.

During the slide show, the following single-character keyboard shortcuts enable alternate display modes:

`o` enables overview mode

`w` toggles widescreen mode

`f` enables fullscreen mode

`h` enables code highlight mode

`control` (Windows) or `command` (Mac) together with `+` / `-` zooms in or out

`p` opens a separate window for additional information (does not work in Safari).

Pressing `esc` exits all of these modes.

As part of the R-Lab 2.0 project at the University of Hamburg, all quiz questions in the lectures have additionally been converted into a swirl course. Students can now answer the quiz questions directly within R, without all the additional information shown in the slides.

Two case studies are provided:

- one on data exploration and visualization using hydrographical data for the Baltic Sea → suitable after lecture 11
- one on statistical modelling using fish catch data for the Baltic Sea → suitable after lecture 15 or later

These case studies are meant as group exercises (3-4 students) but can easily be split into individual tasks. Each group is expected to work in R Markdown for communicating their work progress and results.

The solution script for case study 2 will be made accessible for a short time after the assignments have been submitted. If you are not part of the course and are interested in the solution script, feel free to contact me!

Page built on: 📆 2018-03-10 ‒ 🕢 12:05:32