Working with Spreadsheets

  internship

  Devin Johnson

Fancy algorithms, lots of code, and beautiful visualizations are usually the first things that come to mind when people think about data science. While these are part of the equation, a central (if sometimes overlooked) piece of the puzzle is the raw data itself: where it lives and how to work with it most efficiently. My internship focuses on one part of this, namely spreadsheets and the most efficient ways to work with them using RStudio.

If you’ve worked with data, no doubt you’ve worked with spreadsheets. From the start, the goal of this project has been to speak to two groups: those who work with data in spreadsheets directly and those who work with spreadsheet data using RStudio. We’re currently developing a resource that surveys the most common packages for handling data stored in spreadsheets. For each package, it covers how the package reads in and formats data, as well as the package’s current state and its system requirements. We’re also developing a series of educational posts that demonstrate how RStudio functionality can make handling and analyzing spreadsheet data easier, and how R users can draw on the best features of spreadsheets to inform their own work, from scripting to developing Shiny apps. Rather than treating analysts who work primarily in R and those who work primarily in spreadsheets as two distinct groups that shall never meet, the resource we’re developing aims to build a bridge between them, showing how workflows that blend spreadsheets and RStudio can make for powerful analyses and impactful products.

Since starting, I’ve thoroughly enjoyed my time at RStudio, especially being able to work with and learn from Jenny Bryan and Mine Çetinkaya-Rundel. Not only have I been able to bring my own experience handling all kinds of spreadsheet data to the project, but I’ve also learned a great deal about the different spreadsheet workflows used for data entry and the delicate balance between keeping sheets usable for humans and for computers. What I’ve enjoyed most about RStudio is its community and culture. Although my time here has been short, I’ve felt welcome since day one and have been able to participate in company-wide events and meet so many amazing people.