Category Archives: Posts

Guide to Editing Podcasts on Audacity

This post distributes copies of Joseph Cohen’s (2022) “Editing Discussion Podcasts with Audacity”, a brief tutorial that walks users through the process of compiling and editing a podcast episode using Audacity, a freeware sound editing program. You can download a copy of the program here. The basic principles shown here can be used in other programs, like Adobe Audition.

Queens College Courses in Content Creation

These courses might not be available every semester. For more information about these courses, contact their sponsoring departments. A link to these departments is given at the bottom of this page.

Basic Concepts

  • SOC 240: Content Creation & Creative Entrepreneurship (Joseph Cohen). This is a survey course that tries to cover various parts of the content creation field. Note that there are multiple courses with the code SOC 240, and only Prof. Joseph Cohen’s section in Fall 2022 covers these topics.

Writing & Digital Publishing

  • ENGL 210W. Introduction to Creative Writing
  • ENGL 211W. Introduction to Writing Nonfiction
  • ARTS 187. Graphic Novel
  • ARTS 192. Storyboarding & Storytelling
  • ARTS 248. Book Design and Production
  • CMLIT 336. Forms of Fiction
  • CMLIT 341. Life Writing
  • JOUR 101W. Introduction to News Reporting
  • JOUR 201. Digital Journalism
  • JOUR 202. Visual Storytelling
  • MEDST 245. Screenwriting
  • MEDST 246. Art of the Adaptation

Direction & Production

  • MEDST 200. Principles of Sound and Image
  • MEDST 241. Multimedia
  • MEDST 255. Social Media
  • MEDST 314. Directing
  • MEDST 316. Commercial Production
  • MEDST 317. Advanced Post Production


  • ARTS 157: Digital Moviemaking
  • ARTS 207. Introduction to Video Editing
  • DRAM 111. Introduction to Theater Design.
  • MEDST 240. Styles of Cinema
  • MEDST 243. Introduction to Filmmaking
  • MEDST 265. Producing Independent Movies
  • MEDST 310. Documentary Filmmaking
  • MEDST 318. Cinematography

Visual Art and Design

  • ARTH 264. History of Graphic Art
  • ARTS 190. Design Foundations
  • ARTS 191. Basic Software for Design
  • ARTS 244. Color
  • ARTS 250. Design Thinking
  • PHOTO 165. Digital Photography
  • ARTS 195. Photoshop Basics
  • ARTS 188. Illustration
  • ARTS 211. Introduction to Adobe Illustrator
  • ARTS 257. Digital Illustration


  • MEDST 313. Creative Sound Production
  • MEDST 330. The Music Industry
  • MUSIC 314. Recording Studio Fundamentals
  • MUSIC 318. Digital Recording I

Web, Games, and Apps

  • ARTS 214. Web Design.
  • CSCI 081. Introduction to Web Programming.
  • CSCI 082. Multimedia Fundamentals and Applications
  • ARTS 172. Games Design
  • ARTS 263. App Design


  • DRAM 100. Introduction to Acting
  • DRAM 150. Introduction to Dance
  • MEDST 151. Public Speaking
  • MEDST 249. Media Performance
  • MEDST 257. Nonverbal Communication

Enterprise Management

  • BALA 200. Introduction to Entrepreneurship
  • CSCI 088. Advanced Productivity Tools for Business.
  • MEDST 250. Introduction to Media Law
  • MEDST 264. Business of Media
  • PSYCH 226. Introduction to Industrial and Organizational Psychology
  • PSYCH 362. Organization Performance Management
  • SOC 224. Complex Organizations

Audience Engagement, Marketing and Promotion

  • BALA 398. Principles of Marketing
  • DATA 334. Social Research Methods
  • BUS 334. Marketing Research
  • MEDST 222. Introduction to Public Relations
  • MEDST 260. Advertising and Marketing
  • MEDST 364. Advertising, Consumption, and Culture

Focus on Audience Groups

  • AFST 202. Introduction to Black Cultures
  • AMST 110. Introduction to American Society
  • ANTH 104. Language, Culture & Society
  • ANTH 222. Sex, Gender & Culture
  • LALS 201. Contemporary Society and Film in Latin America
  • MEDST 225. Ethnicity in American Media
  • MEDST 259. Intercultural Communication
  • URBST 113. Urban Subcultures and Lifestyles
  • WGS 101. Introduction to Women and Gender Studies
  • WGS 104. Introduction to LGBTQ Studies

Deep Dives

  • SOC 103. Sociology of Life in the United States
  • SOC 216. Social Psychology
  • AMST 212. Popular Arts in America
  • ANTH 232. Photography and the Visual World
  • ANTH 285. Sociolinguistics
  • ANTH 332. Anthropology of Memory
  • ARTH 255. Late Modern and Contemporary Art
  • ARTH 256. Contemporary Art Practices
  • CMLIT 337. Archetypes
  • ENGL 170W. Introduction to Literary Study
  • ENGL 371. Twentieth- and Twenty-First-Century Drama and Performance
  • ENGL 390. Comedy and Satire
  • LCD 105. Introduction to Psycholinguistics
  • LCD 110. Phonetics
  • MEDST 100. Media Technologies from Gutenberg to the Internet
  • MEDST 101. The Contemporary Media.
  • MEDST 103. Interpersonal Communications.
  • MEDST 110. Political Communications
  • MEDST 145. History of Broadcasting
  • MEDST 211. Introduction to Sports Television
  • MEDST 257. Nonverbal Communication
  • MEDST 262. Political Economy of Media
  • PSYCH 231. Psychology of Human Motivation
  • PSYCH 217. Life-Span Developmental Psychology
  • PSYCH 334. Development of Perception and Cognition
  • SOC 218. Mass Communication and Popular Culture

More Information from Departments. At Queens College, courses are run by their sponsoring departments. For questions about individual classes, we recommend visiting department websites or contacting department personnel:

Fall 2022 Courses Taught by Joseph Cohen

Courses taught by Prof. Joseph Cohen (Queens College, CUNY) during the Fall semester of 2022.

SOC 240: Content Creation Entrepreneurship

Students will learn the basics of operating a business or nonprofit enterprise that creates content for social media and Web 2.0 platforms (like YouTube, podcasting, Instagram, TikTok, etc.).  This course combines instruction on business management and the creative process and will give students an opportunity to develop and launch their own enterprise. 

Class is scheduled to meet on Wednesdays, 1:40 PM to 4:30 PM in KY 264.

SOC 391: Internship at Queens Podcast Lab

Students need instructor approval to enroll: send a query to joseph.cohen@qc.cuny.edu.

Students will join the Queens Podcast Lab production team to help develop public scholarship and educational programming through YouTube, podcasting, social media, and Internet-distributed multimedia. Time commitments range from 3 to 9 hours per week during the semester. Click here for more information about Credit Internships at the Queens Podcast Lab.

The class will meet virtually via Zoom at a time TBD.

DATA 334: Social Research Methods

Students will learn how to use empirical research methods to develop and refine information and knowledge.  Topics include: evidence-based decision-making, scientific inference-making, developing and refining facts and theories, research project design, exploratory research, qualitative research (including focus groups, interviews, ethnography, experimental), quantitative research (including surveys, database analysis, analytics), measurement, sampling, statistical analysis, and research communication.  This course includes a practical component where students will field a pro bono research project for a real-world client.

The class will meet Mondays and Wednesdays from 10:45 AM – 11:55 AM at TBD.

DATA 793: Thesis

Students need instructor approval to enroll: send a query to joseph.cohen@qc.cuny.edu.

Click here for more information about theses with Prof. Cohen.

The class will meet virtually via Zoom at a time TBD.

Fangraphs Seasonal Pitching Data, 2000 – 2019

This data set contains seasonal pitching data from the MLB from 2000 to 2019.

Download a Zip Archive of the Data and Script (includes CSV and RDS)

Click here for an explanation of the variables

Wrangling Operation

The operation requires the following packages, particularly Bill Petti’s excellent baseballr package for wrangling MLB data:

# Download the baseballr package:
# devtools::install_github(repo = "BillPetti/baseballr")
library(baseballr)  # get_chadwick_lu(), fg_pitch_leaders()
library(dplyr)      # arrange()


First, I create a table of player identifiers from the Chadwick Baseball Bureau using get_chadwick_lu() in baseballr. These identifiers will help users merge this data table to other baseball data.

# Player Identifiers
dat.playerid <- get_chadwick_lu()
dat.playerid <- dat.playerid[c(1:7,13:15,19,25:28)]
saveRDS(dat.playerid, "Player Identifiers.RDS")
identifiers <- readRDS("Player Identifiers.RDS")

Next, I download seasonal MLB pitching performance data from Fangraphs through fg_pitch_leaders()

# Scraping Pitching Data
for (i in 2000:2019){
  temp <- fg_pitch_leaders(i, i, league = "all", qual = "n", ind = 1)
  assign(paste0("fg_pitch_", i), temp)
}

As these are all identical versions of the same data table, just representing different years, I can stack them together using rbind():

dat.pit <- fg_pitch_2000
# Start at 2001: the 2000 table is already in dat.pit
for (i in 2001:2019){
  temp <- get(paste0("fg_pitch_", i))
  temp <- rbind(dat.pit, temp)
  assign("dat.pit", temp)
}
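
The same stacking can be written without creating one object per year. The following is a minimal sketch, not the author's script: `fetch_season()` is a hypothetical stand-in for the per-season download, and it returns a tiny made-up table so that the example runs without network access.

```r
# Hypothetical stand-in for a per-season download such as fg_pitch_leaders();
# the values are made up for illustration only.
fetch_season <- function(year) {
  data.frame(playerid = c(1, 2),
             Season   = year,
             IP       = c(100, 50),
             stringsAsFactors = FALSE)
}

seasons <- 2000:2002

# Collect each season's table in a list ...
season_tables <- lapply(seasons, fetch_season)

# ... then stack them all with a single rbind() call
stacked <- do.call(rbind, season_tables)

nrow(stacked)          # 2 rows per season x 3 seasons
table(stacked$Season)  # each season appears the same number of times
```

Keeping the tables in a list avoids `assign()` and `get()`, and it makes it harder to stack the same year twice by accident.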

Rename the identifier in the Fangraphs table so that it is the same as the Chadwick Bureau identifier data. I then merge the two sets so that the pitching data can more readily be merged with other sources.

names(dat.pit)[1] <- paste("key_fangraphs")

# Clean Up Data Types and Sort
dat.pit$key_fangraphs <- as.numeric(dat.pit$key_fangraphs)
dat.pit$Season <- as.numeric(dat.pit$Season)
dat.pit <- arrange(dat.pit, Name, Season)

# Write data
write.csv(dat.pit, "Fangraphs Pitching Leaders 2000 - 2019.csv")
saveRDS(dat.pit, "Fangraphs Pitching Leaders 2000 - 2019.RDS")

# Clean up memory 
rm(list=ls(pattern = "fg_pitch"))

Photo Credit. By derivative work: Amineshaker (talk)Image:Nolan_Ryan_in_Atlanta.jpg: Wahkeenah – Image:Nolan_Ryan_in_Atlanta.jpg|200px, Public Domain, https://commons.wikimedia.org/w/index.php?curid=5022538

Fangraphs Seasonal Batting Data, 2000 – 2019

This data set contains seasonal batting data from the MLB from 2000 to 2019.

Download a Zip Archive of the Data and Script (includes CSV and RDS)

Click here for an explanation of the variables

Wrangling Operation

The operation requires the following packages, particularly Bill Petti’s excellent baseballr package for wrangling MLB data:

# Download the baseballr package:
# devtools::install_github(repo = "BillPetti/baseballr")
library(baseballr)  # get_chadwick_lu(), fg_bat_leaders()
library(dplyr)      # arrange()


First, I create a table of player identifiers from the Chadwick Baseball Bureau using get_chadwick_lu() in baseballr. These identifiers will help users merge this data table to other baseball data.

# Player Identifiers
dat.playerid <- get_chadwick_lu()
dat.playerid <- dat.playerid[c(1:7,13:15,19,25:28)]
saveRDS(dat.playerid, "Player Identifiers.RDS")
identifiers <- readRDS("Player Identifiers.RDS")

Next, I download seasonal MLB batting leader tables from Fangraphs through fg_bat_leaders()

# Scraping Batting Data
for (i in 2000:2019){
  temp <- fg_bat_leaders(i, i, league = "all", qual = "n", ind = 1)
  assign(paste0("fg_bat_", i), temp)
}

As these are all identical versions of the same data table, just representing different years, I can stack them together using rbind():

dat.bat <- fg_bat_2000
for (i in 2001:2019){
  temp <- get(paste0("fg_bat_", i))
  temp <- rbind(dat.bat, temp)
  assign("dat.bat", temp)
}

Rename the identifier in the Fangraphs table so that it is the same as the Chadwick Bureau identifier data. I then merge the two sets so that the batting data can more readily be merged with other sources.

names(dat.bat)[1] <- paste("key_fangraphs")

temp <- merge(dat.bat, identifiers, by = "key_fangraphs")

# Clean Up Data Types and Sort
temp$key_fangraphs <- as.numeric(temp$key_fangraphs)
temp$Season <- as.numeric(temp$Season)
temp <- arrange(temp, Name, Season)

# Write data
write.csv(temp, "Fangraphs Batting Leaders 2000 - 2019.csv")
saveRDS(temp, "Fangraphs Batting Leaders 2000 - 2019.RDS")

# Clean up memory 
rm(list=ls(pattern = "fg_bat"))
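
One thing to keep in mind about the `merge()` step above: by default it keeps only rows whose key appears in both tables. The sketch below uses made-up rows to show the difference between that default and `all.x = TRUE`. Only the column name `key_fangraphs` comes from the script. The `key_mlbam` column and all values are assumptions for illustration.

```r
# Made-up rows for illustration
stats <- data.frame(key_fangraphs = c(1, 2, 3),
                    HR = c(30, 12, 5))
ids   <- data.frame(key_fangraphs = c(1, 2),
                    key_mlbam = c(1001, 1002))

# Default: keep matched rows only
inner <- merge(stats, ids, by = "key_fangraphs")

# all.x = TRUE: keep every row of stats, filling unmatched identifiers with NA
left <- merge(stats, ids, by = "key_fangraphs", all.x = TRUE)

nrow(inner)
nrow(left)
sum(is.na(left$key_mlbam))
```

Comparing `nrow()` before and after a merge is a quick way to see whether any players were dropped because they had no match in the identifier table.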

Using the Data

To call the data in your analysis:

# To load the CSV
dat.bat <- read.csv(paste0(data.directory, "Fangraphs Batting Leaders 2000 - 2019.csv"))

# To load the RDS
dat.bat <- readRDS(paste0(data.directory, "Fangraphs Batting Leaders 2000 - 2019.RDS"))

Where data.directory is the path to the folder containing the data files (ending in a slash, since paste0() does not add a separator).
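
As an alternative, `file.path()` builds the path and inserts the separator itself. This is a small self-contained sketch: a temporary folder stands in for `data.directory`, and the file name and its contents are made up so that the example can run anywhere.

```r
# Assumption: a temporary folder stands in for your data directory
data.directory <- tempdir()

# file.path() inserts the separator, so no trailing slash is needed
csv.path <- file.path(data.directory, "Example Batting File.csv")

# Write a tiny made-up table so that there is something to read back
write.csv(data.frame(Name = "Example Player", Season = 2019),
          csv.path, row.names = FALSE)

dat.bat <- read.csv(csv.path)
dat.bat
```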

An Introduction to Longitudinal Analysis

What is “Longitudinal Data”?

Longitudinal data is data that covers subjects over multiple time periods. Longitudinal data is often contrasted with “cross-sectional data”, which measures subjects at a single point in time.

The data object dat (below) offers an example of longitudinal attendance data for Major League Baseball teams.

# To install baseballr package:
# library(devtools)
# install_github("BillPetti/baseballr")
library(baseballr)  # team_results_bref()
library(parsedate)  # parse_date()


team <- c("TOR", "NYY", "BAL", "BOS", "TBR",
          "KCR", "CLE", "DET", "MIN", "CHW",
          "LAA", "HOU", "SEA", "OAK", "TEX",
          "NYM", "PHI", "ATL", "MIA", "WSN",
          "MIL", "STL", "PIT", "CHC", "CIN",
          "SFG", "LAD", "ARI", "COL", "SDP")
# Download each team's 2019 results and keep the columns of interest
for (i in team){
  temp <- team_results_bref(i, 2019)
  temp <- temp[c(1, 2, 3, 4, 5, 18)]
  assign(paste0("temp.dat.", i), temp)
}
# Stack the team tables into one data set
temp.dat <- temp.dat.TOR
for (i in team[-1]){
  temp <- get(paste0("temp.dat.",i))
  temp.1 <- rbind(temp.dat, temp)
  assign("temp.dat", temp.1)
}
dat <- temp.dat

dat$Date <- paste0(dat$Date, ", 2019")

dat <- subset(dat, !(grepl("(", dat$Date, fixed= T)))

dat$Date <- as.Date(parse_date(dat$Date, default_tz=""))
head(dat, 10)
## # A tibble: 10 x 6
##       Gm Date       Tm    H_A   Opp   Attendance
##    <dbl> <date>     <chr> <chr> <chr>      <dbl>
##  1     1 2019-03-28 TOR   H     DET        45048
##  2     2 2019-03-29 TOR   H     DET        18054
##  3     3 2019-03-30 TOR   H     DET        25429
##  4     4 2019-03-31 TOR   H     DET        16098
##  5     5 2019-04-01 TOR   H     BAL        10460
##  6     6 2019-04-02 TOR   H     BAL        12110
##  7     7 2019-04-03 TOR   H     BAL        11436
##  8     8 2019-04-04 TOR   A     CLE        10375
##  9     9 2019-04-05 TOR   A     CLE        12881
## 10    10 2019-04-06 TOR   A     CLE        18429

Our data set includes date, home/away, opponent, and attendance data for the 2019 season. Data comes from Baseball Reference, downloaded using the excellent baseballr package. To see how I downloaded and prepared this data for analysis, download this page’s Markdown file here.

Basic Terminology

Some terminology:

  • Units refer to the individual subjects that we are following across time. In the above data, our units are baseball teams.
  • Periods refer to the time periods in which the subjects were observed. Above, our periods are dates.
  • Cross-sections refer to comparisons across units within the same time period. A cross-section of our data set would only include attendance data for a single day.
  • Time series refer to data series pertaining to the same unit over time. Were our data set only comprised of one team’s attendance data, it could be said to contain only one time series.
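
These four terms can be illustrated with a toy data set. The sketch below uses made-up attendance figures for two teams over three dates. It is not the Baseball Reference data shown above.

```r
# Made-up attendance figures (illustration only)
toy <- data.frame(
  Tm = rep(c("NYM", "TOR"), each = 3),
  Date = rep(as.Date(c("2019-04-04", "2019-04-05", "2019-04-06")), times = 2),
  Attendance = c(100, 110, 120, 200, 210, 220)
)

units   <- unique(toy$Tm)    # the subjects followed over time
periods <- unique(toy$Date)  # the times at which they were observed

# A cross-section: every unit, one period
cross_section <- subset(toy, Date == as.Date("2019-04-05"))

# A time series: one unit, every period
time_series <- subset(toy, Tm == "NYM")
```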

How is Longitudinal Data Useful?

Without longitudinal data, we are left to work with cross-sectional snapshots. A snapshot might tell us that 44,424 people came to see the Mets’ 2019 home opener, or that fans made 2,412,887 visits to Citi Field that year.

Longitudinal data allows us to assess the effects of changes in our units of analysis and the environments in which they operate. Unpacking phenomena over time gives you new vantage points and bases for comparison:


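library(ggplot2)
library(scales)  # date_breaks(), date_format(), comma

# Assumption: dat.nym holds the Mets' home games; its creation is not shown in this post
dat.nym <- subset(dat, Tm == "NYM" & H_A == "H")

# Converting Date into POSIXct format.  See below.
dat.nym$Date.P <- as.POSIXct(dat.nym$Date)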

ggplot(dat.nym, aes(x = Date.P, y = Attendance)) + 
  geom_col() +
  scale_x_datetime(breaks = date_breaks("7 days"), labels = date_format(format = "%m/%d")) +
  xlab("Date") + ylab("Attendance") + 
  scale_y_continuous(breaks = seq(0,45000,5000), label = comma) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  ggtitle("New York Mets Home Game Attendance, 2019")

[Plot: New York Mets Home Game Attendance, 2019]

It allows us to unpack time effects specifically, as with this table that gives us mean attendance by day:

dat.nym$day <- factor(as.POSIXlt(dat.nym$Date.P)$wday,
                      labels = c("Monday", "Tuesday", "Wednesday",
                                 "Thursday", "Friday", "Saturday", "Sunday"))
temp <- aggregate(Attendance ~ day, data = dat.nym, mean)
temp$Attendance = round(temp$Attendance, 0)
names(temp)[1] <- paste("Day")
##         Day Attendance
## 1    Monday      22184
## 2   Tuesday      27660
## 3 Wednesday      27406
## 4  Thursday      30734
## 5    Friday      30826
## 6  Saturday      36087
## 7    Sunday      32914
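
When building day-of-week labels, note that `as.POSIXlt()$wday` numbers days from 0 (Sunday) to 6 (Saturday), and that converting between date and date-time classes can shift a date across midnight depending on the time zone. The sketch below is one cautious way to cross-check the labels: it indexes a name vector by `wday + 1` on plain `Date` values. The three dates are chosen only for illustration.

```r
# Position 1 corresponds to wday == 0 (Sunday)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

dates <- as.Date(c("2019-03-28", "2019-03-31", "2019-04-01"))

# For Date input, as.POSIXlt() works in UTC, so the calendar day does not shift
wday <- as.POSIXlt(dates)$wday

# Map each number to its name; levels fix the display order
day <- factor(day_names[wday + 1], levels = day_names)

data.frame(dates, wday, day)
```

Checking a few rows against a calendar before aggregating is a cheap safeguard against mislabeled groups.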

Or mean attendance by month:

dat.nym$month <- factor(as.POSIXlt(dat.nym$Date.P)$mon,
                      labels = c("April", "May", "June",
                                 "July", "August", "September"))
temp <- aggregate(Attendance ~ month, data = dat.nym, mean)
temp$Attendance = round(temp$Attendance, 0)
names(temp)[1] <- paste("Month")
##       Month Attendance
## 1     April      28613
## 2       May      28244
## 3      June      30943
## 4      July      35780
## 5    August      34135
## 6 September      26831

And, of course, the finer grained data allows us to test more relationships, like mean attendance by opponent. Here are the top five teams:

temp <- aggregate(Attendance ~ Opp, data = dat.nym, mean)
temp$Attendance = round(temp$Attendance, 0)
names(temp)[1] <- paste("Opponent")
temp <- temp[order(-temp$Attendance),]
rownames(temp) <- 1:18
head(temp, 5)
##   Opponent Attendance
## 1      NYY      42736
## 2      LAD      35627
## 3      PIT      35565
## 4      CHC      35511
## 5      WSN      34885

And this finer-grained data allows us to develop models that incorporate consideration of time:

dat$month <- factor(as.POSIXlt(dat$Date)$mon,
                      labels = c("March", "April", "May", "June",
                                 "July", "August", "September"))
dat$day <- factor(as.POSIXlt(dat$Date)$wday,
                      labels = c("Monday", "Tuesday", "Wednesday",
                                 "Thursday", "Friday", "Saturday", "Sunday"))
summary(lm(Attendance ~ month + day, dat))
## Call:
## lm(formula = Attendance ~ month + day, data = dat)
## Residuals:
##    Min     1Q Median     3Q    Max 
## -28160  -8806    -18   8168  30107 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     32092.0     1075.6  29.836  < 2e-16 ***
## monthApril      -3977.2     1108.0  -3.590 0.000334 ***
## monthMay        -3146.8     1102.0  -2.855 0.004316 ** 
## monthJune        -839.5     1100.5  -0.763 0.445598    
## monthJuly         136.7     1110.2   0.123 0.902040    
## monthAugust     -1428.9     1100.5  -1.298 0.194190    
## monthSeptember  -2943.5     1103.9  -2.666 0.007694 ** 
## dayTuesday      -5247.4      634.8  -8.266  < 2e-16 ***
## dayWednesday    -5023.0      550.1  -9.131  < 2e-16 ***
## dayThursday     -4592.4      554.6  -8.281  < 2e-16 ***
## dayFriday       -3461.8      593.3  -5.834 5.76e-09 ***
## daySaturday       167.4      541.2   0.309 0.757108    
## daySunday        3257.9      536.2   6.076 1.33e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 10660 on 4713 degrees of freedom
## Multiple R-squared:  0.09668,    Adjusted R-squared:  0.09438 
## F-statistic: 42.03 on 12 and 4713 DF,  p-value: < 2.2e-16

Special Considerations with Longitudinal Data

If you are getting into analyzing longitudinal data, there are three things to know from the outset:

  1. Special wrangling operations, particularly understanding how unit-time identifiers work and how to perform the data transformations commonly used in longitudinal analysis.
  2. Standard description methods employed with this kind of data.
  3. Special modeling considerations that arise when using longitudinal data in regressions.

Each topic will be discussed in this module.
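
As a preview of the first item, here is a minimal sketch of a unit-time identifier and a within-unit lag, using base R and made-up numbers. The column names `id`, `Attendance.lag`, and `Attendance.diff` are hypothetical.

```r
# Made-up attendance for two teams over three games
toy <- data.frame(
  Tm = rep(c("NYM", "TOR"), each = 3),
  Gm = rep(1:3, times = 2),
  Attendance = c(100, 110, 120, 200, 210, 220)
)

# Sort by unit, then period, so that "previous row" means "previous period"
toy <- toy[order(toy$Tm, toy$Gm), ]

# Unit-time identifier: should be unique for every row
toy$id <- paste(toy$Tm, toy$Gm, sep = "_")
stopifnot(!anyDuplicated(toy$id))

# Lag attendance by one period within each team
toy$Attendance.lag <- ave(toy$Attendance, toy$Tm,
                          FUN = function(x) c(NA, head(x, -1)))

# Change since the previous game
toy$Attendance.diff <- toy$Attendance - toy$Attendance.lag
toy
```

The first game of each team has no previous period, so its lag and difference are `NA` rather than a value borrowed from another team.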

Post to WordPress through R Markdown

Blogs are a great communications tool. Finding a way to post to WordPress through R Markdown can be a nice efficiency hack. This post documents a method for generating WordPress posts through RStudio. It uses the RWordPress package by Duncan Temple Lang. This tutorial is basically a scaled-down version of Peter Baumgartner’s excellent tutorial.

At the time of posting, both that and this tutorial are distributed using a CC-BY-NC-SA license.

Set Up Your Session

Installing the RWordPress and XMLRPC Packages

The RWordPress package is distributed through GitHub. It can be installed using the install_github() operation in the devtools package. You also need to install Lang’s XMLRPC package.
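
The installation step might look like the sketch below. The repository paths `duncantl/XMLRPC` and `duncantl/RWordPress` are assumptions about where the packages live on GitHub, so check them before running anything. The code only reports which packages are missing and prints the matching install command rather than installing.

```r
# Assumed GitHub locations of the two packages (verify before use)
repos <- c(XMLRPC = "duncantl/XMLRPC", RWordPress = "duncantl/RWordPress")

# Which of these packages are not yet installed?
not_installed <- names(repos)[!vapply(names(repos), requireNamespace,
                                      logical(1), quietly = TRUE)]

# Print the install command for each missing package instead of running it
for (pkg in not_installed) {
  message('devtools::install_github("', repos[[pkg]], '")')
}
```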


Load Necessary Packages

In addition to RWordPress and XMLRPC, we need knitr, a suite of operations for generating reports.


Create / Load a Script with Your Credentials

To post to WordPress, your login and password must be included in the script. Posting or circulating a Markdown file with your WordPress login credentials creates a huge web security risk. On the other hand, we teach you to be open about your coding, so that others can review, replicate, and build on it. How to negotiate the two?

Baumgartner’s solution is to add the following variables to your .Rprofile file using this code:

options(WordPressLogin = c(yourUserName = 'yourPassword'),
        WordPressURL = 'yourWordPressURL')

Do not delete the quotation marks in this template. For the ‘yourWordPressURL’, be sure to add “https://” before and “/xmlrpc.php” after your site address. So, for the site “joeblog.com”, you would enter “https://joeblog.com/xmlrpc.php”.
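
To avoid typos when assembling that address, the rule can be wrapped in a small helper. This is only a sketch: `wp_endpoint()` is a hypothetical function name, and the login values are the template placeholders from above, not real credentials.

```r
# Hypothetical helper: build the XML-RPC endpoint from a site address
wp_endpoint <- function(site) {
  site <- sub("^https?://", "", site)  # drop a protocol if one was typed
  site <- sub("/+$", "", site)         # drop trailing slashes
  paste0("https://", site, "/xmlrpc.php")
}

wp_endpoint("joeblog.com")

# Template values only; replace them with your own before use
options(WordPressLogin = c(yourUserName = "yourPassword"),
        WordPressURL   = wp_endpoint("joeblog.com"))

getOption("WordPressURL")
```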

Once you have generated and stored the file, you can source it in any Markdown file that you intend to post. I called my file “BlogCreds.R”:

source("E:/My Documents/BlogCreds.R") 
#Or wherever you store the file

Set Images to Post to imgur.com

To save time and resources, post your images to the open online site Imgur.com. To do so, add these knitr options to your script:

# To upload images to imgur.com
opts_knit$set(upload.fun = imgur_upload, base.url = NULL)  

# Set default image dimensions
# Change them to change the dimensions of figures
opts_chunk$set(fig.width = 5, fig.height = 5, cache = TRUE)

Generate a Polished html Report

Create a Markdown file that renders your report in html. Render it locally to something sufficiently polished, and then move on to the next step.

I will download and insert an image into the report, in case someone needs to do that:

#Download the image
  destfile = "BBCard.jpg")

Then, OUTSIDE the chunk, enter into the main text of the document this line, without the hashtag

![Baseball Card](BBCard.jpg)

Baseball Card

Post Your Information

In another chunk, set your post title as the variable postTitle, and define fileName as the name of the Markdown file that you are using to generate the post:

postTitle = "Your Title"
fileName = "Your-Markdown-File.Rmd"

Post using the knit2wp command, whose options include:

  • the Markdown file name (option: input)
  • the post title (option: title)
  • whether the post should be published upon rendering (option: publish; setting it to TRUE publishes directly, which is not advisable)
  • for a new post, the action variable should be set to “newPost” (option: action)
  • to set the post’s category. Use the slugs designated on your site. (option: categories)
  • to set the post’s tags. Use the slugs designated on your site. (option: tags)

postID <- knit2wp(
        input = fileName, 
        title = postTitle, 
        publish = FALSE,
        action = "newPost",
        categories = c("Insert Slugs here"),
        tags = c("insert slugs here"),
        mt_excerpt = "Insert excerpt here"
)

Clean It Up on WordPress

From here, you will have a draft post on your WordPress site. I find WordPress to have a far more user-friendly interface to complete the last steps of a post.

Remember: Resist the temptation to do everything in R if it costs a lot of time. Yes, it would be cool if you could do it. The problem is that no one but data nerds will appreciate it, and there aren’t a lot of data nerds out there.


To post to WordPress through R Markdown, start with this setup to your script. Remember that the source() line points to a file with your credentials. For this post, it is:


postTitle = "Post to WordPress via R Markdown"
fileName = "Using-RWordPress.Rmd"

postID <- knit2wp(
        input = fileName, 
        title = postTitle, 
        publish = FALSE,
        action = "newPost",
        categories = c("analytics"),
        tags = c("communications", "Markdown"),
        mt_excerpt = "Publish a Markdown report direct to your WordPress blog"
)

Then write the script until it is polished, and then you can post by running this code. I recommend that you set eval = F, so that it doesn’t run when you render the Markdown file into html. Then, once the file runs, execute this code on its own, not as part of a larger operation.

Then, polish the piece in WordPress.

I Couldn’t Finish My Book, So I Enlisted a Sports Psychologist

I recently came across a TED talk by Mihaly Csikszentmihalyi from Claremont Graduate University. I learned of Prof. Csikszentmihalyi’s work a few years ago, when I was struggling to finish my second book. It was part of a formative professional experience that influenced both my work and my career advising. I spoke about the experience in an episode of The Annex with Clayton Childress from the University of Toronto:

Here’s the story:

I was working on my 2017 book and hit a writer’s block that needed overcoming if I were going to finish it. There are times when you are in the zone and can produce page after page of great writing. Then there are others where your writing is garbage, and it is hard to get out a paragraph.

I reasoned that my situation was very similar to that of an athlete who has a hot hand versus one who is in a slump, and went to a sports psychiatrist to ask how he would treat me if, say, I were a pitcher who couldn’t hit the strike zone or a basketball player who freezes when taking a free throw.

He told me that he often refers patients to Csikszentmihalyi’s (1990) Flow: The Psychology of Optimal Experience. For me, the big take-away was this: Were I a pitcher, Csikszentmihalyi’s approach would advise me to enjoy the experience of the pitch, to concentrate on each thrown ball, and to block out the batter, the inning, and all that. Being in the zone is a form of immersion in an activity, where your mind focuses on, and develops a rhythm with, the most basic elements of the activity. Throwing a baseball in pitching. Writing paragraphs in writing.

Here’s the TED talk for more:

This experience was formative for me in both my approach to work and my advising of students. It is very important to enjoy the immediate experience of the tasks involved with one’s job, and one can lead a pleasurable life if one is happy to jump into the day-to-day of their job on a regular basis. For me personally, it encouraged me to shed projects that I didn’t like and didn’t feel were rewarding, and to invest in things that may not be conventionally valued but were things that I did well, loved, cared about, and enjoyed.

For students who come to me seeking career advice, I tell them to spend their twenties and early thirties finding a place in the world that suits their natural dispositions and involves tasks that they enjoy doing. I strongly believe that it is possible to enjoy a comfortable, rewarding livelihood in any line of work, so long as you are really good at it. And you can’t get really good at something unless you are doing it day in and day out. And you can’t do something day in and day out if you don’t like doing it.

Photo Credit. Bain News Service, Publisher. Eric Erickson, pitcher, Detroit AL baseball. , 1917. Photograph. https://www.loc.gov/item/2014701275/.

Steps in a Data Analysis Job

Part of teaching data analytics involves giving students an idea of the steps involved in performing analytical work. Here is my take on the steps. Please offer your comments below. The goal is to prepare young people to do better work when they go out into the field.

1. Know Your Client

On whose behalf am I performing this analysis?

The first step in a data analysis job is pinning down the identity of your client. Ultimately, data analysis is about creating information. If you are going to do that well, you need some sense of your audience’s technical background, their practical interests, their preferred mode of reasoning or communicating, and much else.

2. Start With a Clear Understanding of the Question

What information does my client want? Why do they want this information? How do they plan to use it, and is this something that I want to be a part of? Can I answer it using the resources or tools at my disposal, or do I have a clear idea of which resources or tools need to be acquired to get the job done?

You do not want to take on an analysis job that lacks clear informational goals. You may have to work with a client to pin down those informational goals. Doing so requires that you identify factual information that would substantially influence your client’s practical decision-making. In addition, it is best to avoid (or at least avoid taking the lead on) jobs that you can’t handle. It is probably also a good idea to avoid clients who are not genuinely interested in, or open to being influenced by, your findings, or who begin with an answer already in mind.

Many clients are not highly familiar with data analysis, and may not be highly numerate. It is your professional responsibility to ensure that they understand what data analysis can and cannot do.

3. What Am I Supposed to be Examining?

What are my objects of analysis?  Which qualities, behaviors, or outcomes am I assessing?  Why?

These kinds of questions point students towards working hard to pin down the research design that guides their study. Researchers should have a clear sense of whose personal qualities or behaviors need to be examined, and exactly what about them is relevant to the client’s decision-making problem or goal. If done well, the researcher should have a clear sense of the theoretical propositions or concepts that are most crucial to the client’s goals, and how to find analyzable data to examine them.

4. Assess and Acquire Data

How can I get data on these units’ characteristics or behaviors? How do I get access to them? What is the quality of the data’s sample (i.e., how representative is the data)? Do these measures look reliable and valid?

All analyses should make some assessment of the data, using the tools that you acquired in your methods class. As the saying goes: Garbage In, Garbage Out. Don’t bother bringing a battery of high-powered analytical tools to a crap data set.

5. Clean and Prepare Data

Secure data, correct errors, refine measurements, assess missingness, and identify possible outliers.

This is the process of turning the raw, messy data that one often encounters into a clean, tidy set that is amenable to the analytical operations of your toolkit.
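
A minimal sketch of what a few of these checks can look like in R, using a made-up table. The column names, the rule that negative ages are errors, and the 1.5 x IQR outlier convention are all assumptions chosen for illustration. A real project needs rules that fit its own data.

```r
# Made-up raw data with a missing value, an impossible value, and an extreme value
raw <- data.frame(age    = c(25, 31, NA, 42, -3, 38),
                  income = c(40, 52, 47, 45, 50, 900))

# Assess missingness: count of NA values per column
missing_counts <- colSums(is.na(raw))

# Correct errors: treat impossible ages as missing
clean <- raw
clean$age[!is.na(clean$age) & clean$age < 0] <- NA

# Flag possible outliers with the 1.5 x IQR rule (one common convention)
q     <- unname(quantile(clean$income, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
clean$income.outlier <- clean$income < q[1] - fence |
                        clean$income > q[2] + fence

missing_counts
clean
```

Flagging rather than deleting suspicious values keeps the decision about what to do with them visible and reversible.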

6. Implement Analytical Operations

Implement analytical procedures designed to extract specific types of information from data.

Over the course of the semester, we will study a range of tools to extract information from data.

7. Interpret Analytical Operations

Convert statistical results into natural-language explanation, and assess their implications for the client’s thinking.

The computer does the job of processing the data. Your job is to interface that output with the problem being confronted by your client. In the interest of making them a partner, it helps to develop the skill of translating complicated results into cognitively-accessible, but still rigorous and correct, explanations that everyone can understand.

You empower clients by giving them access to what you see in the statistical results. Win their confidence on this front, and they will work with you again in the future.

8. Communication

Find ways to convey information so that it resonates with the client.

These are the skills that will keep you from being someone’s basement tech.

Photo Credit. United States Office Of Scientific Research And Development. National Defense Research Committee. Probability and Statistical Studies in Warfare Analysis. Washington, D.C.: Office of Scientific Research and Development, National Defense Research Committee, Applied Mathematics Panel, 1946. Pdf. https://www.loc.gov/item/2009655233/.

Avoid Starting Out by Chasing the Latest Hot Thing

I’m passing along an excellent post by Terence Shin that I caught on the data analysis blog KDnuggets. Mr. Shin’s post speaks to those of you who have a great enthusiasm for the latest hot thing in data analysis — the skills or tools that are being discussed on the blogs and in the press. Currently, one of these hot things is machine learning. There is always something.

Read the post yourself. It is very well done. Its key points are: (1) Machine learning (or any hot thing in data analysis) is only one part of the toolkit that you will have to wield to solve real-world problems, and (2) A proper engagement of machine learning (or any advanced skill) requires an understanding of foundational skills.

To my mind, learning data analysis is similar to learning any occupational skill. It is not altogether different from learning to be a craftsperson, like an electrician or carpenter. One group of trainees learns how to wire a house, and another learns to translate raw data into information to support decision-making. It is possible to write and execute scripts to perform highly-complicated statistical operations using documentation and web searches, much like I could perform a do-it-yourself rewiring of my home by watching YouTube videos. However, as you gain practical experience in this field, you will find that there is much more to the job than knowing how to wield the most complicated piece of equipment.

Chasing the latest hot thing seems likely to hinder students’ overall development as data analysts. Start by mastering basic tools to solve simple problems, and work your way up to more complex problems and more sophisticated tools. This strategy puts you in the position of a problem-solver and question-answerer, and something more than a programming tech. I think such a strategy is more likely to result in you being a competent entry-level analyst upon graduation.

That being said, I do not want to discourage students from experimenting with stuff while they are learning. That passion will keep you pushing your boundaries and learning over your career, because there is plenty to learn over a lifetime (I’m still learning). Do not extinguish that passion. Just make sure that you don’t become over-focused on this or that method to the detriment of becoming a solid analyst with a good grasp of their toolkit.

Photo Credit. Johnson, Paula J, and Michael Crummett. Wheelwrights and cartwrights Dale Thibault and Harvey Howes, Miles City, Montana. United States Miles City Montana, 1979. Miles City, Montana. Photograph. https://www.loc.gov/item/afc1981005_01_22928/.
