Guide to Editing Podcasts on Audacity

This post distributes copies of Joseph Cohen’s (2022) “Editing Discussion Podcasts with Audacity”, a brief tutorial that walks users through compiling and editing a podcast episode in Audacity, a free sound editing program. You can download a copy of the program here. The basic principles shown here carry over to other programs, like Adobe Audition.

Queens College Courses in Content Creation

These courses might not be available every semester. For more information about these courses, contact their sponsoring departments. A link to these departments is given at the bottom of this page.

Basic Concepts

  • SOC 240: Content Creation & Creative Entrepreneurship (Joseph Cohen). This is a survey course that covers various parts of the content-creation field. Note that there are multiple courses with the code SOC 240; only Prof. Joseph Cohen’s Fall 2022 section covers these topics.

Writing & Digital Publishing

  • ENGL 210W. Introduction to Creative Writing
  • ENGL 211W. Introduction to Writing Nonfiction
  • ARTS 187. Graphic Novel
  • ARTS 192. Storyboarding & Storytelling
  • ARTS 248. Book Design and Production
  • CMLIT 336. Forms of Fiction
  • CMLIT 341. Life Writing
  • JOUR 101W. Introduction to News Reporting
  • JOUR 201. Digital Journalism
  • JOUR 202. Visual Storytelling
  • MEDST 245. Screenwriting
  • MEDST 246. Art of the Adaptation

Direction & Production

  • MEDST 200. Principles of Sound and Image
  • MEDST 241. Multimedia
  • MEDST 255. Social Media
  • MEDST 314. Directing
  • MEDST 316. Commercial Production
  • MEDST 317. Advanced Post Production


  • ARTS 157. Digital Moviemaking
  • ARTS 207. Introduction to Video Editing
  • DRAM 111. Introduction to Theater Design.
  • MEDST 240. Styles of Cinema
  • MEDST 243. Introduction to Filmmaking
  • MEDST 265. Producing Independent Movies
  • MEDST 310. Documentary Filmmaking
  • MEDST 318. Cinematography

Visual Art and Design

  • ARTH 264. History of Graphic Art
  • ARTS 190. Design Foundations
  • ARTS 191. Basic Software for Design
  • ARTS 244. Color
  • ARTS 250. Design Thinking
  • PHOTO 165. Digital Photography
  • ARTS 195. Photoshop Basics
  • ARTS 188. Illustration
  • ARTS 211. Introduction to Adobe Illustrator
  • ARTS 257. Digital Illustration


  • MEDST 313. Creative Sound Production
  • MEDST 330. The Music Industry
  • MUSIC 314. Recording Studio Fundamentals
  • MUSIC 318. Digital Recording I

Web, Games, and Apps

  • ARTS 214. Web Design.
  • CSCI 081. Introduction to Web Programming.
  • CSCI 082. Multimedia Fundamentals and Applications
  • ARTS 172. Games Design
  • ARTS 263. App Design


  • DRAM 100. Introduction to Acting
  • DRAM 150. Introduction to Dance
  • MEDST 151. Public Speaking
  • MEDST 249. Media Performance
  • MEDST 257. Nonverbal Communication

Enterprise Management

  • BALA 200. Introduction to Entrepreneurship
  • CSCI 088. Advanced Productivity Tools for Business.
  • MEDST 250. Introduction to Media Law
  • MEDST 264. Business of Media
  • PSYCH 226. Introduction to Industrial and Organizational Psychology
  • PSYCH 362. Organization Performance Management
  • SOC 224. Complex Organizations

Audience Engagement, Marketing and Promotion

  • BALA 398. Principles of Marketing
  • DATA 334. Social Research Methods
  • BUS 334. Marketing Research
  • MEDST 222. Introduction to Public Relations
  • MEDST 260. Advertising and Marketing
  • MEDST 364. Advertising, Consumption, and Culture

Focus on Audience Groups

  • AFST 202. Introduction to Black Cultures
  • AMST 110. Introduction to American Society
  • ANTH 104. Language, Culture & Society
  • ANTH 222. Sex, Gender & Culture
  • LALS 201. Contemporary Society and Film in Latin America
  • MEDST 225. Ethnicity in American Media
  • MEDST 259. Intercultural Communication
  • URBST 113. Urban Subcultures and Lifestyles
  • WGS 101. Introduction to Women and Gender Studies
  • WGS 104. Introduction to LGBTQ Studies

Deep Dives

  • SOC 103. Sociology of Life in the United States
  • SOC 216. Social Psychology
  • AMST 212. Popular Arts in America
  • ANTH 232. Photography and the Visual World
  • ANTH 285. Sociolinguistics
  • ANTH 332. Anthropology of Memory
  • ARTH 255. Late Modern and Contemporary Art
  • ARTH 256. Contemporary Art Practices
  • CMLIT 337. Archetypes
  • ENGL 170W. Introduction to Literary Study
  • ENGL 371. Twentieth- and Twenty-First-Century Drama and Performance
  • ENGL 390. Comedy and Satire
  • LCD 105. Introduction to Psycholinguistics
  • LCD 110. Phonetics
  • MEDST 100. Media Technologies from Gutenberg to the Internet
  • MEDST 101. The Contemporary Media.
  • MEDST 103. Interpersonal Communications.
  • MEDST 110. Political Communications
  • MEDST 145. History of Broadcasting
  • MEDST 211. Introduction to Sports Television
  • MEDST 257. Nonverbal Communication
  • MEDST 262. Political Economy of Media
  • PSYCH 231. Psychology of Human Motivation
  • PSYCH 217. Life-Span Developmental Psychology
  • PSYCH 334. Development of Perception and Cognition
  • SOC 218. Mass Communication and Popular Culture

More Information from Departments. At Queens College, courses are run by their sponsoring departments. For questions about individual classes, we recommend visiting department websites or contacting department personnel.

Fall 2022 Courses Taught by Joseph Cohen

Courses taught by Prof. Joseph Cohen (Queens College, CUNY) during the Fall semester of 2022.

SOC 240: Content Creation Entrepreneurship

Students will learn the basics of operating a business or nonprofit enterprise that creates content for social media and Web 2.0 platforms (like YouTube, podcasting, Instagram, TikTok, etc.).  This course combines instruction on business management and the creative process and will give students an opportunity to develop and launch their own enterprise. 

Class is scheduled to meet on Wednesdays, 1:40 PM to 4:30 PM in KY 264.

SOC 391: Internship at Queens Podcast Lab

Students need instructor approval to enroll: send a query to

Students will join the Queens Podcast Lab production team to help develop public scholarship and educational programming through YouTube, podcasting, social media, and Internet-distributed multimedia. Time commitments range from 3 to 9 hours per week during the semester. Click here for more information about Credit Internships at the Queens Podcast Lab.

The class will meet virtually via Zoom at a time TBD.

DATA 334: Social Research Methods

Students will learn how to use empirical research methods to develop and refine information and knowledge.  Topics include: evidence-based decision-making, scientific inference-making, developing and refining facts and theories, research project design, exploratory research, qualitative research (including focus groups, interviews, ethnography, experimental), quantitative research (including surveys, database analysis, analytics), measurement, sampling, statistical analysis, and research communication.  This course includes a practical component where students will field a pro bono research project for a real-world client.

The class will meet Mondays and Wednesdays from 10:45 AM to 11:55 AM, location TBD.

DATA 793: Thesis

Students need instructor approval to enroll: send a query to

Click here for more information about theses with Prof. Cohen.

The class will meet virtually via Zoom at a time TBD.

Fangraphs Seasonal Pitching Data, 2000 – 2019

This data set contains seasonal pitching data from the MLB from 2000 to 2019.

Download a Zip Archive of the Data and Script (includes CSV and RDS)

Click here for an explanation of the variables

Wrangling Operation

The operation requires the following packages, particularly Bill Petti’s excellent baseballr package for wrangling MLB data:

# Download the Baseball R Package:
# devtools::install_github(repo = "BillPetti/baseballr")

library(baseballr)
library(dplyr)       # for arrange()


First, I create a table of player identifiers from the Chadwick Baseball Bureau using get_chadwick_lu() in baseballr. These identifiers will help users merge this data table to other baseball data.

# Player Identifiers
dat.playerid <- get_chadwick_lu()
dat.playerid <- dat.playerid[c(1:7,13:15,19,25:28)]
saveRDS(dat.playerid, "Player Identifiers.RDS")
identifiers <- readRDS("Player Identifiers.RDS")

Next, I download seasonal MLB pitching performance data from Fangraphs through fg_pitch_leaders().

# Scraping Pitching Data
for (i in 2000:2019){
  temp <- fg_pitch_leaders(i, i, league = "all", qual = "n", ind = 1)
  assign(paste0("fg_pitch_", i), temp)
}

As these tables are identically structured, just representing different years, I can stack them together using rbind():

dat.pit <- fg_pitch_2000
for (i in 2001:2019){
  temp <- get(paste0("fg_pitch_", i))
  temp <- rbind(dat.pit, temp)
  assign("dat.pit", temp)
}

Rename the identifier in the Fangraphs table so that it matches the Chadwick Bureau identifier data. I then merge the two sets so that the pitching data can more readily be merged with other sources.

names(dat.pit)[1] <- paste("key_fangraphs")

dat.pit <- merge(dat.pit, identifiers, by = "key_fangraphs")

# Clean Up Data Types and Sort
dat.pit$key_fangraphs <- as.numeric(dat.pit$key_fangraphs)
dat.pit$Season <- as.numeric(dat.pit$Season)
dat.pit <- arrange(dat.pit, Name, Season)

# Write data
write.csv(dat.pit, "Fangraphs Pitching Leaders 2000 - 2019.csv")
saveRDS(dat.pit, "Fangraphs Pitching Leaders 2000 - 2019.RDS")

# Clean up memory 
rm(list=ls(pattern = "fg_pitch"))

Photo Credit. Derivative work by Amineshaker, based on “Nolan Ryan in Atlanta” by Wahkeenah. Public domain.

Fangraphs Seasonal Batting Data, 2000 – 2019

This data set contains seasonal batting data from the MLB from 2000 to 2019.

Download a Zip Archive of the Data and Script (includes CSV and RDS)

Click here for an explanation of the variables

Wrangling Operation

The operation requires the following packages, particularly Bill Petti’s excellent baseballr package for wrangling MLB data:

# Download the Baseball R Package:
# devtools::install_github(repo = "BillPetti/baseballr")

library(baseballr)
library(dplyr)       # for arrange()


First, I create a table of player identifiers from the Chadwick Baseball Bureau using get_chadwick_lu() in baseballr. These identifiers will help users merge this data table to other baseball data.

# Player Identifiers
dat.playerid <- get_chadwick_lu()
dat.playerid <- dat.playerid[c(1:7,13:15,19,25:28)]
saveRDS(dat.playerid, "Player Identifiers.RDS")
identifiers <- readRDS("Player Identifiers.RDS")

Next, I download seasonal MLB batting leader tables from Fangraphs through fg_bat_leaders().

# Scraping Batting Data
for (i in 2000:2019){
  temp <- fg_bat_leaders(i, i, league = "all", qual = "n", ind = 1)
  assign(paste0("fg_bat_", i), temp)
}

As these tables are identically structured, just representing different years, I can stack them together using rbind():

dat.bat <- fg_bat_2000
for (i in 2001:2019){
  temp <- get(paste0("fg_bat_", i))
  temp <- rbind(dat.bat, temp)
  assign("dat.bat", temp)
}

Rename the identifier in the Fangraphs table so that it matches the Chadwick Bureau identifier data. I then merge the two sets so that the batting data can more readily be merged with other sources.

names(dat.bat)[1] <- paste("key_fangraphs")

temp <- merge(dat.bat, identifiers, by = "key_fangraphs")

# Clean Up Data Types and Sort
temp$key_fangraphs <- as.numeric(temp$key_fangraphs)
temp$Season <- as.numeric(temp$Season)
temp <- arrange(temp, Name, Season)

# Write data
write.csv(temp, "Fangraphs Batting Leaders 2000 - 2019.csv")
saveRDS(temp, "Fangraphs Batting Leaders 2000 - 2019.RDS")

# Clean up memory 
rm(list=ls(pattern = "fg_bat"))

Using the Data

To load the data into your analysis:

# To load the CSV
dat.bat <- read.csv("Fangraphs Batting Leaders 2000 - 2019.csv")

# To load the RDS
dat.bat <- readRDS("Fangraphs Batting Leaders 2000 - 2019.RDS")

If the files are not in your working directory, prepend the path to the folder containing them.

An Introduction to Longitudinal Analysis

What is “Longitudinal Data”?

Longitudinal data is data that covers subjects over multiple time periods. It is often contrasted with “cross-sectional data”, which measures subjects at a single point in time.

The data object dat (below) offers an example of longitudinal attendance data for Major League Baseball teams.

# To install baseballr package:
# library(devtools)
# install_github("BillPetti/baseballr")

# Packages used below
library(baseballr)   # team_results_bref()
library(parsedate)   # assumed source of parse_date() used below
library(ggplot2)
library(scales)      # date_breaks(), date_format(), comma


team <- c("TOR", "NYY", "BAL", "BOS", "TBR",
          "KCR", "CLE", "DET", "MIN", "CHW",
          "LAA", "HOU", "SEA", "OAK", "TEX",
          "NYM", "PHI", "ATL", "MIA", "WSN",
          "MIL", "STL", "PIT", "CHC", "CIN",
          "SFG", "LAD", "ARI", "COL", "SDP")
for (i in team){
  temp <- team_results_bref(i, 2019)
  temp <- temp[c(1, 2, 3, 4, 5, 18)]
  assign(paste0("temp.dat.", i), temp)
}
temp.dat <- temp.dat.TOR
for (i in team[-1]){
  temp <- get(paste0("temp.dat.",i))
  temp.1 <- rbind(temp.dat, temp)
  assign("temp.dat", temp.1)
}
dat <- temp.dat

dat$Date <- paste0(dat$Date, ", 2019")

dat <- subset(dat, !(grepl("(", dat$Date, fixed= T)))

dat$Date <- as.Date(parse_date(dat$Date, default_tz=""))
head(dat, 10)
## # A tibble: 10 x 6
##       Gm Date       Tm    H_A   Opp   Attendance
##    <dbl> <date>     <chr> <chr> <chr>      <dbl>
##  1     1 2019-03-28 TOR   H     DET        45048
##  2     2 2019-03-29 TOR   H     DET        18054
##  3     3 2019-03-30 TOR   H     DET        25429
##  4     4 2019-03-31 TOR   H     DET        16098
##  5     5 2019-04-01 TOR   H     BAL        10460
##  6     6 2019-04-02 TOR   H     BAL        12110
##  7     7 2019-04-03 TOR   H     BAL        11436
##  8     8 2019-04-04 TOR   A     CLE        10375
##  9     9 2019-04-05 TOR   A     CLE        12881
## 10    10 2019-04-06 TOR   A     CLE        18429

Our data set includes date, home/away, opponent, and attendance data for the 2019 season. Data comes from Baseball Reference, downloaded using the excellent baseballr package. To see how I downloaded and prepared this data for analysis, download this page’s Markdown file here.

Basic Terminology

Some terminology:

  • Units refer to the individual subjects that we are following across time. In the above data, our units are baseball teams.
  • Periods refer to the time periods in which the subjects were observed. Above, our periods are dates.
  • Cross-sections refer to comparisons across units within the same time period. A cross-section of our data set would only include attendance data for a single day.
  • Time series refer to data series pertaining to the same unit over time. Were our data set composed of only one team’s attendance data, it would contain only one time series.
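To make these terms concrete, here is a toy version of such a data set in base R. The teams, dates, and attendance figures below are invented for illustration:

```r
# Toy longitudinal data: two units (teams) observed over three dates
dat.toy <- data.frame(
  Tm   = rep(c("NYM", "TOR"), each = 3),
  Date = rep(as.Date(c("2019-04-01", "2019-04-02", "2019-04-03")), 2),
  Attendance = c(25000, 27000, 31000, 15000, 16500, 18000)
)

# A cross-section: all units on a single date
cross.section <- subset(dat.toy, Date == as.Date("2019-04-02"))

# A time series: one unit across all dates
time.series <- subset(dat.toy, Tm == "NYM")
```

Subsetting on a period yields a cross-section; subsetting on a unit yields a time series.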

How is Longitudinal Data Useful?

Without longitudinal data, we are left to work with cross-sectional snapshots. A snapshot might tell us that 44,424 people came to see the Mets’ 2019 home opener, or that fans made 2,412,887 visits to Citi Field that year.

Longitudinal data allows us to assess the effects of changes in our units of analysis and in the environments in which they operate. Unpacking phenomena over time gives you new vantage points and bases for comparison:


# Subset to Mets home games for the home-attendance plot below
dat.nym <- subset(dat, Tm == "NYM" & H_A == "H")

# Converting Date into POSIXct format. See below.
dat.nym$Date.P <- as.POSIXct(dat.nym$Date)

ggplot(dat.nym, aes(x = Date.P, y = Attendance)) + 
  geom_col() +
  scale_x_datetime(breaks = date_breaks("7 days"), labels = date_format(format = "%m/%d")) +
  xlab("Date") + ylab("Attendance") + 
  scale_y_continuous(breaks = seq(0,45000,5000), label = comma) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  ggtitle("New York Mets Home Game Attendance, 2019")


It allows us to unpack time effects specifically, as with this table that gives us mean attendance by day:

dat.nym$day <- factor(as.POSIXlt(dat.nym$Date.P)$wday,
                      labels = c("Monday", "Tuesday", "Wednesday",
                                 "Thursday", "Friday", "Saturday", "Sunday"))
temp <- aggregate(Attendance ~ day, data = dat.nym, mean)
temp$Attendance = round(temp$Attendance, 0)
names(temp)[1] <- paste("Day")
##         Day Attendance
## 1    Monday      22184
## 2   Tuesday      27660
## 3 Wednesday      27406
## 4  Thursday      30734
## 5    Friday      30826
## 6  Saturday      36087
## 7    Sunday      32914

Or mean attendance by month:

dat.nym$month <- factor(as.POSIXlt(dat.nym$Date.P)$mo,
                      labels = c("April", "May", "June",
                                 "July", "August", "September"))
temp <- aggregate(Attendance ~ month, data = dat.nym, mean)
temp$Attendance = round(temp$Attendance, 0)
names(temp)[1] <- paste("Month")
##       Month Attendance
## 1     April      28613
## 2       May      28244
## 3      June      30943
## 4      July      35780
## 5    August      34135
## 6 September      26831

And, of course, the finer-grained data allows us to test more relationships, like mean attendance by opponent. Here are the top five teams:

temp <- aggregate(Attendance ~ Opp, data = dat.nym, mean)
temp$Attendance = round(temp$Attendance, 0)
names(temp)[1] <- paste("Opponent")
temp <- temp[order(-temp$Attendance),]
rownames(temp) <- 1:18
head(temp, 5)
##   Opponent Attendance
## 1      NYY      42736
## 2      LAD      35627
## 3      PIT      35565
## 4      CHC      35511
## 5      WSN      34885

And this finer-grained data allows us to develop models that incorporate consideration of time:

dat$month <- factor(as.POSIXlt(dat$Date)$mo,
                      labels = c("March", "April", "May", "June",
                                 "July", "August", "September"))
dat$day <- factor(as.POSIXlt(dat$Date)$wday,
                      labels = c("Monday", "Tuesday", "Wednesday",
                                 "Thursday", "Friday", "Saturday", "Sunday"))
summary(lm(Attendance ~ month + day, dat))
## Call:
## lm(formula = Attendance ~ month + day, data = dat)
## Residuals:
##    Min     1Q Median     3Q    Max 
## -28160  -8806    -18   8168  30107 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     32092.0     1075.6  29.836  < 2e-16 ***
## monthApril      -3977.2     1108.0  -3.590 0.000334 ***
## monthMay        -3146.8     1102.0  -2.855 0.004316 ** 
## monthJune        -839.5     1100.5  -0.763 0.445598    
## monthJuly         136.7     1110.2   0.123 0.902040    
## monthAugust     -1428.9     1100.5  -1.298 0.194190    
## monthSeptember  -2943.5     1103.9  -2.666 0.007694 ** 
## dayTuesday      -5247.4      634.8  -8.266  < 2e-16 ***
## dayWednesday    -5023.0      550.1  -9.131  < 2e-16 ***
## dayThursday     -4592.4      554.6  -8.281  < 2e-16 ***
## dayFriday       -3461.8      593.3  -5.834 5.76e-09 ***
## daySaturday       167.4      541.2   0.309 0.757108    
## daySunday        3257.9      536.2   6.076 1.33e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 10660 on 4713 degrees of freedom
## Multiple R-squared:  0.09668,    Adjusted R-squared:  0.09438 
## F-statistic: 42.03 on 12 and 4713 DF,  p-value: < 2.2e-16

Special Considerations with Longitudinal Data

If you are getting into analyzing longitudinal data, there are three things to know from the outset:

  1. Special wrangling operations, particularly understanding how unit-time identifiers work and how to perform commonly-used data transformations associated with longitudinal analysis.
  2. Standard description methods employed with this kind of data.
  3. Special modeling considerations that arise when using longitudinal data in regressions.

Each topic will be discussed in this module.
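As a preview of the first point, here is a minimal base-R sketch of one commonly-used transformation: lagging a variable within each unit of a unit-period table. The teams, seasons, and win totals are invented for illustration:

```r
# Toy unit-period data: two teams, three seasons each
dat.long <- data.frame(
  Tm     = rep(c("NYM", "TOR"), each = 3),
  Season = rep(2017:2019, 2),
  Wins   = c(70, 77, 86, 76, 73, 67)
)

# Sort by unit, then period, so lags line up within each unit
dat.long <- dat.long[order(dat.long$Tm, dat.long$Season), ]

# Lag Wins by one period within each unit; each unit's first period gets NA
dat.long$Wins.lag <- ave(dat.long$Wins, dat.long$Tm,
                         FUN = function(x) c(NA, head(x, -1)))
```

Grouping the lag by unit (here via ave()) matters: a naive lag across the stacked table would leak one team’s last season into the next team’s first.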

Post to WordPress through R Markdown

Blogs are a great communications tool, and finding a way to post to WordPress through R Markdown can be a nice efficiency hack. This post documents a method for generating WordPress posts through RStudio. It uses the RWordPress package by Duncan Temple Lang. This tutorial is basically a scaled-down version of Peter Baumgartner’s excellent tutorial.

At the time of posting, both that and this tutorial are distributed using a CC-BY-NC-SA license.

Set Up Your Session

Installing the RWordPress and XMLRPC Packages

The RWordPress package is distributed through GitHub. It can be installed using the install_github() operation in the devtools package. You will also need to install Lang’s XMLRPC package.
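A minimal install chunk might look like the following. The GitHub repository paths are my assumption about where Duncan Temple Lang hosts these packages; confirm them before running:

```r
# Install devtools from CRAN if you do not already have it:
# install.packages("devtools")
library(devtools)

# Repository paths assumed; verify on GitHub before installing
install_github("duncantl/RWordPress")
install_github("duncantl/XMLRPC")
```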


Load Necessary Packages

In addition to RWordPress and XMLRPC, we need knitr, a suite of operations for generating reports.
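A load chunk for this setup might be:

```r
library(RWordPress)
library(XMLRPC)
library(knitr)
```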


Create / Load a Script with Your Credentials

To post to WordPress, your login and password must be included in the script. Posting or circulating a Markdown file with your WordPress login credentials creates a huge web-security risk. On the other hand, we teach you to be open about your code, so that others can review, replicate, and build on it. How to reconcile the two?

Baumgartner’s solution is to add the following variables to your .Rprofile file using this code:

options(WordPressLogin = c(yourUserName = 'yourPassword'),
        WordPressURL = 'yourWordPressURL')

Do not delete the quotation marks in this template. For WordPressURL, be sure to add “https://” before and “/xmlrpc.php” after your site address (e.g., a site at example.com would use “https://example.com/xmlrpc.php”).

Once you have generated and stored the file, you can source it in any Markdown file that you intend to post. I called my file “WPCrds.R”:

source("E:/My Documents/WPCrds.R") 
#Or wherever you store the file

Set Images to Post to Imgur

To save time and resources, post your images to the free image-hosting site Imgur. To do so, add these knitr options to your script:

# To upload images to Imgur
opts_knit$set(upload.fun = imgur_upload, base.url = NULL)

# Set default image dimensions
# Change them to change the dimensions of figures
opts_chunk$set(fig.width = 5, fig.height = 5, cache = TRUE)

Generate a Polished html Report

Create a Markdown file that renders your report in html. Render it locally to something sufficiently polished, and then move on to the next step.

I will download and insert an image into the report, in case someone needs to do that:

#Download the image (replace the URL with your image's address)
download.file("https://example.com/your-image.jpg",
  destfile = "BBCard.jpg")

Then, OUTSIDE the chunk, enter this line into the main text of the document:

![Baseball Card](BBCard.jpg)


Post Your Information

In another chunk, set your post title as the variable postTitle, and define fileName as the name of the Markdown file that you are using to generate the post:

postTitle = "Your Title"
fileName = "Your-Markdown-File.Rmd"

Post using the knit2wp command, whose options include:

  • the Markdown file name (option: input)
  • the post title (option: title)
  • whether the post should be published upon rendering (option: publish, set true to directly publish – not advisable)
  • for a new post, the action variable should be set to “newPost” (option: action)
  • to set the post’s category. Use the slugs designated on your site. (option: categories)
  • to set the post’s tags. Use the slugs designated on your site. (option: tags)
postID <- knit2wp(
        input = fileName, 
        title = postTitle, 
        publish = FALSE,
        action = "newPost",
        categories = c("Insert Slugs here"),
        tags = c("insert slugs here"),
        mt_excerpt = "Insert excerpt here"
        )

Clean It Up on WordPress

From here, you will have a draft post on your WordPress site. I find WordPress to have a far more user-friendly interface to complete the last steps of a post.

Remember: Resist the temptation to do everything in R if it costs a lot of time. Yes, it would be cool if you could do it. The problem is that no one but data nerds will appreciate it, and there aren’t a lot of data nerds out there.


To post to WordPress through R Markdown, start by adding this setup to your script. Remember that the source() line points to a file with your credentials. For this post, the setup is:


postTitle = "Post to WordPress via R Markdown"
fileName = "Using-RWordPress.Rmd"

postID <- knit2wp(
        input = fileName, 
        title = postTitle, 
        publish = FALSE,
        action = "newPost",
        categories = c("analytics"),
        tags = c("communications", "Markdown"),
        mt_excerpt = "Publish a Markdown report direct to your WordPress blog"
        )

Write the post until it is polished, and then publish by running this code. I recommend setting eval = F on this chunk so that it doesn’t run when you render the Markdown file into html. Once the file renders, execute this chunk on its own, not as part of a larger operation.

Then, polish the piece in WordPress.

I Couldn’t Finish My Book, So I Enlisted a Sports Psychologist

I recently came across a TED talk by Mihaly Csikszentmihalyi from Claremont Graduate University. I learned of Prof. Csikszentmihalyi’s work a few years ago, when I was struggling to finish my second book. It was part of a formative professional experience that influenced both my work and my career advising. I spoke about the experience in an episode of The Annex with Clayton Childress from the University of Toronto:

Here’s the story:

I was working on my 2017 book and hit a writer’s block that I needed to overcome to finish it. There are times when you are in the zone and can produce page after page of great writing. There are others when your writing is garbage, and it is hard to get out a paragraph.

I reasoned that my situation was much like that of an athlete with a hot hand versus one in a slump, and went to a sports psychologist to ask how he would treat me if, say, I were a pitcher who couldn’t find the strike zone or a basketball player who freezes at the free-throw line.

He told me that he often refers patients to Csikszentmihalyi’s (1990) Flow: The Psychology of Optimal Experience. For me, the big take-away was this: were I a pitcher, Csikszentmihalyi’s approach would advise me to enjoy the experience of pitching, to concentrate on each thrown ball, and to block out the batter, the inning, and all the rest. Being in the zone is a form of immersion in an activity, in which your mind focuses on and develops a rhythm around the activity’s most basic elements: throwing a baseball in pitching, writing paragraphs in writing.

Here’s the TED talk for more:

This experience was formative for me in both my approach to work and my advising of students. It is very important to enjoy the immediate experience of the tasks involved in one’s job, and you can lead a pleasurable life if you are happy to jump into its day-to-day on a regular basis. For me personally, it encouraged me to shed projects that I didn’t like or find rewarding, and to invest in things that may not be conventionally valued but that I did well, loved, cared about, and enjoyed.

For students who come to me seeking career advice, I tell them to spend their twenties and early thirties finding a place in the world that suits their natural dispositions and involves tasks that they enjoy doing. I strongly believe that it is possible to enjoy a comfortable, rewarding livelihood in any line of work, so long as you are really good at it. And you can’t get really good at something unless you are doing it day in and day out. And you can’t do something day in and day out if you don’t like doing it.

Photo Credit. Bain News Service, publisher. Eric Erickson, pitcher, Detroit AL (baseball), 1917. Photograph.

Am I Rich?

Studies show that people do a poor job of guessing how their economic situation compares to other people’s (example studies here, here, and here). This situation makes sense: it is impolite to pry into other people’s finances, and most people’s knowledge about what constitutes a “high” or “low” income comes from the brackets they read when filing their taxes.

One way to get a more precise sense of one’s own position on the economic ladder is to examine how income and wealth are distributed across society at large, which can be done using data from the Survey of Consumer Finances. Our focus will be on households, as opposed to individuals, because most people pool their money and expenditures at this level of organization, and people’s personal fortunes often depend heavily on their household’s economic situation. We think of a household’s place on this ladder as a matter of two variables: income (how much money flows into the household regularly) and wealth (the value of the household’s property, less its debts).


In 2019, the median American household earned $58,644. The middle 50% – between the 25th and 75th percentiles – earned between $30,543 and $107,717. An income of at least $191,605 puts you in society’s top 10%, one of $289,960 puts you in the top 5%, and one of $867,620 makes you a One-Percenter in terms of income. To be in the bottom 10%, a household had to take in less than $16,290.
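These cut-points are simply percentiles of the household income distribution. As a sketch of how such benchmarks are computed, here is base R’s quantile() applied to an invented income vector (the numbers are illustrative, not SCF estimates):

```r
# Invented household incomes, in dollars, for illustration only
incomes <- c(12000, 25000, 41000, 58000, 76000,
             98000, 125000, 160000, 240000, 900000)

# Percentile benchmarks: bottom 10%, quartiles, top 10%
quantile(incomes, probs = c(0.10, 0.25, 0.50, 0.75, 0.90))
```

With survey data like the SCF, the real calculation also applies sampling weights, which quantile() alone does not handle.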

The bar chart below gives a more detailed sense of the distribution of household incomes.

Distribution of Household Income, 2019


The 2019 SCF data put the median U.S. household net worth at $121,511. The middle 50% (between the 25th and 75th percentiles) are worth between $12,436 and $403,358. More than 10% of American households have negative net worth, which is to say that they owe more money than they own in property. A net worth of $1.2 million puts you in the top 10%, one of $2.6 million puts you in the top 5%, and one of $11 million is enough to be part of the One Percent.

The bar chart below describes the distribution of net worth among U.S. households:

Distribution of Household Wealth in 2019

From Broad Categories to Specific Ideas

People have a poor idea of where they stand in relation to society at large. Most wealthy people think that they are middle class. These figures give readers a more specific understanding of where they stand in relation to others in terms of income and wealth. Of course, these benchmarks do not account for age, region, or a host of other factors. If there’s interest, I can break this out further.

Scripts and Data on OSF.

Photo Credit. Glackens, L. M., artist. The Rich Child’s Fourth, 1911. N.Y.: Keppler & Schwarzmann, Puck Building. Photograph.

Steps in a Data Analysis Job

Part of teaching data analytics involves giving students an idea of the steps involved in performing analytical work. Here is my take on the steps. Please offer your comments below. The goal is to prepare young people to do better work when they go out into the field.

1. Know Your Client

On whose behalf am I performing this analysis?

The first step in a data analysis job is pinning down the identity of your client. Ultimately, data analysis is about creating information. If you are going to do that well, you need some sense of your audience’s technical background, their practical interests, their preferred mode of reasoning or communicating, and much else.

2. Start With a Clear Understanding of the Question

What information does my client want? Why do they want it? How do they plan to use it, and is that something I want to be a part of? Can I answer the question with the resources and tools at my disposal, or do I at least have a clear idea of which resources and tools the job requires?

You do not want to take on an analysis job that lacks clear informational goals. You may have to work with a client to pin those goals down. Doing so requires that you identify factual information that would substantially influence your client’s practical decision-making. In addition, it is best to avoid (or at least avoid taking the lead on) jobs that you can’t handle. It is probably also wise to avoid clients who are not genuinely interested in or influenced by your findings, or who begin with an answer already in mind.

Many clients are not highly familiar with data analysis, and may not be highly numerate. It is your professional responsibility to ensure that they understand what data analysis can and cannot do.

3. What Am I Supposed to be Examining?

What are my objects of analysis?  Which qualities, behaviors, or outcomes am I assessing?  Why?

These kinds of questions point students towards working hard to pin down the research design that guides their study. Researchers should have a clear sense of whose personal qualities or behaviors need to be examined, and exactly what about them is relevant to the client’s decision-making problem or goal. If done well, the researcher should have a clear sense of the theoretical propositions or concepts that are most crucial to the client’s goals, and how to find analyzable data to examine them.

4. Assess and Acquire Data

How can I get data on these units’ characteristics or behaviors? How do I get access to them? What is the quality of the data’s sample (i.e., how representative is the data)? Do these measures look reliable and valid?

All analyses should make some assessment of the data, using the tools that you acquired in your methods class. As the saying goes: Garbage In, Garbage Out. Don’t bother bringing a battery of high-powered analytical tools to a crap data set.

5. Clean and Prepare Data

Secure data, correct errors, refine measurements, assess missingness, and identify possible outliers.

This is the process of turning the raw, messy data that one often encounters into a clean, tidy set that is amenable to the analytical operations of your toolkit.
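A minimal base-R sketch of the kinds of checks this step involves; the toy data and the 120-year age cutoff are my own illustration:

```r
# Toy data with a missing value and an implausible entry
dat.raw <- data.frame(
  id  = 1:6,
  age = c(34, 41, NA, 29, 350, 52)   # 350 is a likely data-entry error
)

# Assess missingness
n.missing <- sum(is.na(dat.raw$age))

# Flag impossible values as missing rather than silently dropping rows
dat.raw$age[!is.na(dat.raw$age) & dat.raw$age > 120] <- NA

# Inspect the cleaned variable
summary(dat.raw$age)
```

Recoding impossible values to NA, rather than deleting rows, preserves the rest of each case for analysis and keeps the missingness visible.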

6. Implement Analytical Operations.

Implement analytical procedures designed to extract specific types of information from data.

Over the course of the semester, we will study a range of tools to extract information from data.

7. Interpret Analytical Operations

Convert statistical results into natural-language explanations, and assess their implications for your client’s thinking.

The computer does the job of processing the data. Your job is to interface that output with the problem being confronted by your client. In the interest of making them a partner, it helps to develop the skill of translating complicated results into cognitively-accessible, but still rigorous and correct, explanations that everyone can understand.

You empower clients by giving them access to what you see in the statistical results. Win their confidence on this front, and they will work with you again in the future.

8. Communication

Find ways to convey information so that it resonates with the client.

These are the skills that will keep you from being someone’s basement tech.

Photo Credit. United States Office Of Scientific Research And Development. National Defense Research Committee. Probability and Statistical Studies in Warfare Analysis. Washington, D.C.: Office of Scientific Research and Development, National Defense Research Committee, Applied Mathematics Panel, 1946. Pdf.
