The Digital Research Bootcamp

January 19-22, 2016

The Graduate Center, CUNY


The inaugural GC Digital Research Bootcamp took place over four days in January 2016. It provided participants with an opportunity to develop digital research skills and connect with others in an interdisciplinary environment. The workshops introduced a range of digital tools and methods for doing research, with multiple tracks targeted at different skillsets and disciplinary focuses.

Free of charge to participants, the GC Digital Research Bootcamp was developed in partnership with Software Carpentry, the New York Public Library, Mozilla Science Lab, Humanities Intensive Learning and Teaching, and the Digital Humanities Summer Institute. It was made possible by the sponsorship of GCDI, The Graduate Center, and The CUNY Strategic Investment Initiatives Program.

Instructors included volunteers from Software Carpentry, members of the GCDI community, and current and past GC Digital Fellows.

Interest in the Bootcamp surpassed our expectations: we received more than three times as many applications as there were available spots. Applicants were selected for participation based on numerous factors, and applicants who were not selected will automatically be considered for future programs as space allows.

Past Workshops


The following are the workshops that were offered during the Digital Research Bootcamp in January 2016. Please note that future Digital Research Institutes will offer different workshops. This is an archive of our previous offerings to give you a sense of the types of materials and topics you might expect from future programs. Additionally, future workshops will likely utilize a different structure, incorporate different materials and examples, and cover different topic areas even if the workshops are similarly titled.

Note: Links to the materials used in the workshops have been included as icons in the upper right corner of each workshop's description as follows:

  • Jump to the day in the schedule containing this workshop
  • The GitHub repository containing code and other materials used in the workshop
  • The live notes taken by participants during the workshop
  • Additional information about the workshop
  • Jump back to the top of the workshop list

Command Line

Introduction to the UNIX command line. Topics covered will include navigating the filesystem, manipulating the environment, executing useful commands, and using pipes to communicate between programs. This session will teach you how to communicate directly with your computer’s operating system using a text-based interface and is a useful first step in learning many other technical skills. For more information, visit: http://swcarpentry.github.io/shell-novice/

Git

Git is a tool for managing changes to a set of files. It allows users to access open source repositories, recover earlier versions of a project, and collaborate with other contributors. This session will be beneficial to anyone working with data, code, or text. For more information, visit: http://swcarpentry.github.io/git-novice/

Python

Python is a programming language that can be used for a wide range of tasks, including collecting and analyzing data and building web applications. It is among the most widely used languages in academic research. This session will be useful for anyone who wants to take a programmatic approach to research. For more information, visit: http://swcarpentry.github.io/python-novice-inflammation/

Data Cleaning

Working with real-world data is often messy and complicated. This session is an overview of some common problems that arise when working with datasets, including converting between file types, resolving inconsistent metadata, and working with APIs.
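As an illustration of the kind of problem described above, the following is a minimal sketch of converting CSV data to JSON while tidying inconsistent column names. It is not taken from the workshop materials; the sample data and the function names (normalize_header, csv_to_records) are hypothetical and use only Python's standard library.

```python
import csv
import io
import json

# Hypothetical sample data: inconsistent header spelling and stray whitespace.
RAW_CSV = """Name , AGE,Home City
Ada, 36 ,London
Grace,45, New York
"""


def normalize_header(name):
    """Trim a column name, lowercase it, and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


def csv_to_records(text):
    """Parse CSV text into a list of dicts with cleaned keys and values."""
    reader = csv.reader(io.StringIO(text))
    headers = [normalize_header(h) for h in next(reader)]
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        records.append({key: value.strip() for key, value in zip(headers, row)})
    return records


records = csv_to_records(RAW_CSV)
json_text = json.dumps(records, indent=2)  # same data, now in JSON form
print(json_text)
```

All values remain strings in this sketch; deciding which columns should become numbers or dates is a separate cleaning step.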

Data Formats

This session will go more in-depth into the different types of data researchers work with and the challenges that commonly arise when trying to clean data and prepare it for analysis. This session will be beneficial to anyone who needs to convert between data types, has large amounts of data to work with, or may encounter missing or messy data.

Twitter API

This session will cover the basics of accessing data via the Twitter API, including specific challenges that arise when working with large, text-based data sets. This session will be beneficial for anyone who wants to collect data from Twitter or other social networks.

Scientific Python

This session will discuss how to work with N-dimensional array data using NumPy and SciPy. It will cover tools for efficiently filtering data, methods for statistically analyzing data, linear algebra routines, and various signal processing routines. It will also discuss how to use masked arrays to work with missing values. This session is aimed at researchers working with N-dimensional quantitative data.

HTML, CSS, and JavaScript

Modern web pages are created using HTML to control content, CSS to control appearance, and JavaScript to dictate behavior. This session will be helpful for anyone who wants to build on the web.

Data Visualization

Communicating your findings can be just as important as getting those findings in the first place. This session will introduce ways to visually represent complex data (text-based, numerical, and temporal) in order to reach a wider audience.

Text Analysis with NLTK

The Natural Language Toolkit (NLTK) is a Python library that allows researchers to work with text-based data, such as literary works or social media corpora. In this session, you will learn how to use Python to analyze large amounts of text to find word frequencies, collocations, and other patterns that are hard to see without computational methods. This session will be of particular use to researchers who work with any form of text-based data.
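To give a sense of what word frequencies and collocations look like in practice, here is a minimal sketch that counts words and adjacent word pairs. It deliberately uses only Python's standard library rather than NLTK itself, so it does not reflect the workshop's materials; the sample text and the tokenize helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample text; a real project would load a full corpus.
TEXT = "The whale swam. The white whale swam away from the ship."


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


tokens = tokenize(TEXT)

# Word frequencies: how often each token occurs.
word_counts = Counter(tokens)

# Candidate collocations: counts of adjacent word pairs (bigrams).
# This naive version ignores sentence boundaries.
bigram_counts = Counter(zip(tokens, tokens[1:]))

print(word_counts.most_common(3))
print(bigram_counts.most_common(2))
```

Raw bigram counts are only a starting point; dedicated libraries such as NLTK offer statistical measures for ranking collocations.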

Digital Identity with WordPress

This session will discuss the basics of building a digital academic identity in online spaces to effectively showcase your research and network with scholars in your field. We will use the CUNY Academic Commons to explore personal website creation using the WordPress platform, and will review best practices for organization and design, theme editing, content management, blogging, and image use. Participants who have ideas for project-based or personal websites are welcome to begin implementing these in this session.

GIS

These sessions will explore the basics of mapping and geospatial representation of data and apply that knowledge to make maps, visualize spatial relationships, and analyze spatial data on CartoDB. They will be useful for anyone whose data has a spatial component or who wants to incorporate spatial relationships into their analysis.

Machine Learning

This session will introduce regression analysis, data preprocessing, model selection, supervised and unsupervised machine learning, and dimension reduction using the scikit-learn (sklearn) machine learning library. This session is aimed at researchers who want to find patterns in their data or use their data to predict a phenomenon.

Past DRB Schedule


Time           Topic                       Instructor(s)
Day 1: Tuesday, January 19, 2016
9:30 - 10:00   Registration & Coffee
10:00 - 11:15  Command Line 1              Software Carpentry
11:15 - 11:30  Break
11:30 - 12:45  Command Line 2              Software Carpentry
12:45 - 2:00   Lunch + Introductions
2:00 - 3:15    Python 1                    Software Carpentry
3:15 - 3:30    Break
3:30 - 4:45    Python 2                    Software Carpentry
4:45 - 6:00    Office Hours
Day 2: Wednesday, January 20, 2016
9:30 - 10:00   Coffee
10:00 - 11:15  Git 1                       Software Carpentry
11:15 - 11:30  Break
11:30 - 12:45  Git 2                       Software Carpentry
12:45 - 2:00   Lunch
2:00 - 3:15    Python 3                    Software Carpentry
3:15 - 3:30    Break
3:30 - 4:45    Python 4                    Software Carpentry
4:45 - 6:00    Office Hours
Day 3: Thursday, January 21, 2016
9:30 - 10:00   Coffee
10:00 - 11:15  Data Cleaning               Evan Misshula
11:15 - 11:30  Break
11:30 - 12:45  Twitter API                 Stephen Zweibel, Patrick Smyth
11:30 - 12:45  Data Formats                Evan Misshula
12:45 - 2:00   Lunch
2:00 - 3:15    NLTK 1                      Michelle Johnson McSweeney, Patrick Smyth
2:00 - 3:15    Scientific Python           Hannah Aizenman
3:15 - 3:30    Break
3:30 - 4:45    NLTK 2                      Michelle Johnson McSweeney, Patrick Smyth
3:30 - 4:45    Machine Learning            Hannah Aizenman
4:45 - 6:00    Office Hours
Day 4: Friday, January 22, 2016
9:30 - 10:00   Coffee
10:00 - 11:30  Mapping                     Michelle Johnson McSweeney, Hannah Aizenman
10:00 - 11:30  Digital Identity/WordPress  Mary Catherine Kinniburgh, Patrick Sweeney
11:30 - 11:45  Break
11:45 - 1:00   Mapping                     Michelle Johnson McSweeney, Hannah Aizenman
11:45 - 1:00   HTML/CSS/JS                 Ian Phillips
1:00 - 2:00    Lunch
2:00 - 3:30    Data Visualization          Micki Kaufman
3:30 - 3:45    Wrap Up
3:45 - 4:30    Reception