Online Lecture Series: BD2K Guide to the Fundamentals of Data Science | International Union for the Scientific Study of Population

The NIH Big Data to Knowledge program is pleased to announce the BD2K Guide to the Fundamentals of Data Science, a series of online lectures given by experts from across the country covering a range of diverse topics in data science. This course is an introductory overview that assumes no prior knowledge or understanding of data science.

The series starts Friday, September 9th, and will run weekly throughout the year from 12 noon to 1 pm Eastern Time / 9 am to 10 am Pacific Time. No registration is required.

***To join the meeting online: https://global.gotomeeting.com/join/786506213

***To join by phone only: +1 (872) 240-3311; Access Code: 786-506-213

***First GoToMeeting? Try a test session: http://help.citrix.com/getready

This is a joint effort of the BD2K Training Coordinating Center (TCC), the BD2K Centers Coordination Center (BD2KCCC), and the NIH Office of the Associate Director for Data Science. For more information about the series and to see archived presentations, go to:

http://www.bigdatau.org/data-science-seminars

SCHEDULE

9/9/16: Introduction to big data and the data lifecycle (Mark Musen, Stanford).

9/16/16: SECTION 1: DATA MANAGEMENT OVERVIEW (Bill Hersh, Oregon Health & Science University).

9/23/16: Finding and accessing datasets, Indexing and Identifiers (Lucila Ohno-Machado, UCSD).

9/30/16: Data curation and Version control (Pascale Gaudet, Swiss Institute of Bioinformatics).

10/7/16: Ontologies (Michel Dumontier, Stanford).

10/14/16: Provenance (Zachary Ives, Penn).

10/21/16: Metadata standards (Susanna-Assunta Sansone, Oxford).

10/28/16: SECTION 2: DATA REPRESENTATION OVERVIEW (Anita Bandrowski, UCSD).

11/4/16: Databases and data warehouses, Data: structures, types, integrations (Chaitan Baru, NSF).

11/11/16: No lecture (Veterans Day).

11/18/16: Social networking data (TBD).

12/2/16: Data wrangling, normalization, preprocessing (Joseph Picone, Temple).

12/9/16: Exploratory Data Analysis (Brian Caffo, Johns Hopkins).

12/16/16: Natural Language Processing (Noemie Elhadad, Columbia).

The following topics will be covered in January through May of 2017:

SECTION 3: COMPUTING OVERVIEW

Workflows/pipelines

Programming and software engineering; API; optimization

Cloud, Parallel, Distributed Computing, and HPC

Commons: lessons learned, current state

SECTION 4: DATA MODELING AND INFERENCE OVERVIEW

Smoothing, Unsupervised Learning/Clustering/Density Estimation

Supervised Learning/prediction/ML, dimensionality reduction

Algorithms, incl. Optimization

Multiple testing, False Discovery Rate

Data issues: Bias, Confounding, and Missing data

Causal inference

Data Visualization tools and communication

Modeling Synthesis

SECTION 5: ADDITIONAL TOPICS

Open science

Data sharing (including social obstacles)

Ethical Issues

Extra considerations/limitations for clinical data

Reproducible Research

SUMMARY and NIH context