The NIH Big Data to Knowledge (BD2K) program is pleased to announce the BD2K Guide to the Fundamentals of Data Science, a series of online lectures given by experts from across the country on a diverse range of topics in data science.  This course is an introductory overview that assumes no prior knowledge of data science.

 

The series starts Friday, September 9th, and will run once per week all year, from 12 noon to 1 pm Eastern Time (9 am to 10 am Pacific Time).  No registration is required.

***To join the meeting online: https://global.gotomeeting.com/join/786506213

***To join by phone only: +1 (872) 240-3311; Access Code: 786-506-213

***First GoToMeeting? Try a test session: http://help.citrix.com/getready

 

This is a joint effort of the BD2K Training Coordinating Center (TCC), the BD2K Centers Coordination Center (BD2KCCC), and the NIH Office of the Associate Director for Data Science.  For more information about the series and to see archived presentations, go to:

http://www.bigdatau.org/data-science-seminars

 

SCHEDULE

9/9/16:  Introduction to big data and the data lifecycle (Mark Musen, Stanford).

9/16/16: SECTION 1: DATA MANAGEMENT OVERVIEW (Bill Hersh, Oregon Health & Science University).

9/23/16: Finding and accessing datasets, indexing and identifiers (Lucila Ohno-Machado, UCSD).

9/30/16: Data curation and version control (Pascale Gaudet, Swiss Institute of Bioinformatics).

10/7/16: Ontologies (Michel Dumontier, Stanford).

10/14/16: Provenance (Zachary Ives, Penn).

10/21/16: Metadata standards (Susanna-Assunta Sansone, Oxford).

 

10/28/16: SECTION 2: DATA REPRESENTATION OVERVIEW (Anita Bandrowski, UCSD).

11/4/16:  Databases and data warehouses, Data: structures, types, integrations (Chaitan Baru, NSF).

11/11/16: No lecture (Veterans Day).

11/18/16: Social networking data (TBD).

12/2/16:  Data wrangling, normalization, preprocessing (Joseph Picone, Temple).

12/9/16:  Exploratory Data Analysis (Brian Caffo, Johns Hopkins).

12/16/16: Natural Language Processing (Noemie Elhadad, Columbia).

 

The following topics will be covered from January through May 2017:

SECTION 3: COMPUTING OVERVIEW

  Workflows/pipelines

  Programming and software engineering; API; optimization

  Cloud, Parallel, Distributed Computing, and HPC

  Commons: lessons learned, current state

 

SECTION 4: DATA MODELING AND INFERENCE OVERVIEW

   Smoothing, Unsupervised Learning/Clustering/Density Estimation

   Supervised Learning/prediction/ML, dimensionality reduction

   Algorithms, incl. Optimization

   Multiple testing, False Discovery Rate

   Data issues: Bias, Confounding, and Missing data

   Causal inference

   Data Visualization tools and communication

   Modeling Synthesis

 

SECTION 5: ADDITIONAL TOPICS

   Open science

   Data sharing (including social obstacles)

   Ethical Issues

   Extra considerations/limitations for clinical data

   Reproducible Research

   SUMMARY and NIH context