
Tuesday, February 24, 2015

Female Academic Minions

I came across a tweet from @paulcoxon and immediately thought to myself, where are all the female minions?


So I fixed it!  :)



Original tweets can be found here:

Monday, February 23, 2015

When are Statistics Jobs Posted?

One of the better websites that regularly posts Statistics jobs is this one at the Department of Statistics at the University of Florida. If you have ever looked for a postdoc, a faculty position, government job or applied statistician type-job in statistics, you have probably come across this website before.

As I am currently a postdoc, I was curious about two things: (1) What is the most frequent type of job posted, and who is the target audience of this website? (2) Do certain types of statistics jobs tend to be posted at a particular time of the academic year?

To do this, I enlisted the help of some wonderful R packages from Hadley Wickham for gathering the data (rvest), cleaning the data (stringr, lubridate) and visualizing the data (ggplot2).  One caveat about this data is that the website only lists job postings from August 2014 until now.  The R code is available below in Rmarkdown and Markdown and in a gist.

For simplicity, I grouped the type of positions into four categories:

1. faculty = tenured or non-tenured faculty position including chairs, deans and department heads.
2. postdoc = postdoctoral fellows
3. lecturer = lecturer or instructor
4. statistician = a statistician whose primary role is data analysis or managing other data analysts.

The majority of statistics jobs posted on the UF website since August 2014 have been faculty positions.


Statistics job postings are fairly uniformly posted Mon-Fri on this UF website.


The frequency of the statistics job postings increases Sept - Nov.


This increase in the months Sept-Nov is mostly driven by academic faculty positions (not surprisingly).  If you are looking for postdoc positions, they tend to be more frequently posted after this time period.  Lecturer/Instructor positions are fairly uniform. Similarly, applied statistician jobs do not seem to have a peak range. 



UF Department of Statistics Job Postings

Stephanie Hicks
23 Feb 2015

Purpose

This Rmd uses the UF Department of Statistics Job Postings website to determine the frequency of faculty, postdoc, lecturer and statistician jobs over the academic year.

One caveat: The website only has data starting from Aug 2014 up until now, so I cannot include the postings over the summer, but I am interested in seeing how these plots differ after including spring and summer of 2015.

Load libraries

library(rvest)
library(stringr)
library(lubridate)
library(ggplot2)

Scrape data

First, we scrape the tables from the UF Statistics Jobs website.

I'm using the rvest package to parse the html page. The data is contained in tables in the html pages, so I'm using the read_html() function (named html() in the rvest release available in early 2015) and the html_table() function to parse the html and extract the tables, respectively.

pgs = vector("list", 17)
for(i in 1:17){
    jobs <- read_html(paste0("http://www.stat.ufl.edu/jobs/?page=", i))
    pgs[[i]] = do.call(rbind, html_table(jobs))
}
dat = do.call(rbind, pgs)
colnames(dat) = c("Location", "Description", "Date")
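
To see what html_table() hands back without hitting the live site, here is a minimal offline sketch. The two HTML strings are made-up stand-ins for two pages of the listing (the real markup may well differ), and read_html() is the name current rvest releases use for the parsing function.

```r
library(rvest)

# Hypothetical stand-ins for two pages of the job listing.
page1 <- read_html("<table>
  <tr><th>Location</th><th>Description</th><th>Date</th></tr>
  <tr><td>Harvard Statistics</td><td>Lecturer</td><td>12/15/2014</td></tr>
  <tr><td>Mount Holyoke College</td><td>Visiting Lecturer in Statistics</td><td>02/12/2015</td></tr>
</table>")
page2 <- read_html("<table>
  <tr><th>Location</th><th>Description</th><th>Date</th></tr>
  <tr><td>NC State University</td><td>Biostatistician</td><td>11/11/2014</td></tr>
</table>")

# html_table() returns a list with one data frame per <table> on the page,
# so each page is collapsed with rbind before the pages are stacked.
pages <- list(page1, page2)
pgs <- lapply(pages, function(p) do.call(rbind, html_table(p)))
toy <- do.call(rbind, pgs)
colnames(toy) <- c("Location", "Description", "Date")
toy
```

The lapply() call plays the same role as the for loop above; either works, the loop just makes the page number explicit.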

These are the top 10 most frequent job description titles.

head(sort(table(dat$Description), decreasing = TRUE), 10)
## 
##                Assistant Professor                Postdoctoral Fellow 
##                                 26                                 18 
##                    Biostatistician Assistant/Associate/Full Professor 
##                                 17                                 11 
##                       Statistician      Assistant/Associate Professor 
##                                 10                                  8 
##   Tenure Track Assistant Professor            Postdoctoral Fellowship 
##                                  8                                  7 
##   Assistant or Associate Professor  Assistant Professor of Statistics 
##                                  6                                  6

Data Cleaning

Using the str_detect() function in the stringr R package, we can use regular expressions to subset the data frame for any jobs that match the pattern "Lecture".

head(dat[str_detect(dat$Description, "Lecture"),])
##                                    Location
## 9   INDIANA UNIVERSITY / BLOOMINGTON CAMPUS
## 15                    Mount Holyoke College
## 19                    University of Glasgow
## 100                Department of Statistics
## 118                      Harvard Statistics
## 119                      Harvard Statistics
##                                           Description       Date
## 9                                            Lecturer 02/17/2015
## 15                    Visiting Lecturer in Statistics 02/12/2015
## 19  Lecturer / Senior Lecturer / Reader in Statistics 02/10/2015
## 100                       Full Time Lecturer Position 12/23/2014
## 118                                          Lecturer 12/15/2014
## 119                                   Senior Lecturer 12/15/2014

Because the str_detect() function matches against a single regular expression, we can use the paste() function with collapse='|' to join several alternatives into one pattern and subset the rows matching either "Lecture" or "Instructor".

head(dat[str_detect(dat$Description, paste(c("Lecture", "Instructor"), collapse='|')),])
##                                    Location
## 9   INDIANA UNIVERSITY / BLOOMINGTON CAMPUS
## 15                    Mount Holyoke College
## 19                    University of Glasgow
## 100                Department of Statistics
## 118                      Harvard Statistics
## 119                      Harvard Statistics
##                                           Description       Date
## 9                                            Lecturer 02/17/2015
## 15                    Visiting Lecturer in Statistics 02/12/2015
## 19  Lecturer / Senior Lecturer / Reader in Statistics 02/10/2015
## 100                       Full Time Lecturer Position 12/23/2014
## 118                                          Lecturer 12/15/2014
## 119                                   Senior Lecturer 12/15/2014
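
Here is the same trick on a small made-up vector of titles (the titles are illustrative, not taken from the scraped data), so you can see the combined pattern that paste() builds:

```r
library(stringr)

# Made-up job titles for illustration.
titles <- c("Lecturer", "Course Instructor", "Assistant Professor",
            "Senior Lecturer", "Biostatistician")

# Join the alternatives with "|" so they form one regular expression.
pattern <- paste(c("Lecture", "Instructor"), collapse = "|")
pattern
# "Lecture|Instructor"

# Keep only the titles that match either alternative.
titles[str_detect(titles, pattern)]
# "Lecturer" "Course Instructor" "Senior Lecturer"
```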

For simplicity, I grouped the data into four categories:

  1. faculty = tenured or non-tenured faculty position including chairs, deans and department heads.
  2. postdoc = postdoctoral fellows
  3. lecturer = lecturer or instructor
  4. statistician = a statistician whose primary role is data analysis or managing other data analysts.

I_faculty = str_detect(dat$Description, paste(c("Professor", "Tenure", "tenure", "Faculty", 
                                             "Assistant", "Chair", "Dean", "Department", 
                                             "Head"), collapse='|'))
I_postdoc = str_detect(dat$Description, paste(c("Post", "Fellow"), collapse='|'))
I_lecturer = str_detect(dat$Description, paste(c("Lecture", "Instructor"), collapse='|'))
I_statistician = str_detect(dat$Description, paste(c(ignore.case("Biostatistic"), 
                                                  "Statistician", "Scientist", 
                                                  "Staff", "Professional", "Analyst", 
                                                  ignore.case("Researcher"), "Programmer", 
                                                  "Research Associate", "Master",
                                                  "Manager", "Director", "Investigator",
                                                  "Specialist", "Consultant", "VP", 
                                                  "Bioinformatician", "Biometrician", 
                                                  "Computational"), collapse='|'))

Now, let's create a new column called "Position" that holds the job category for each posting.

dat$Position <- ifelse(I_postdoc, "Postdoc", ifelse(I_faculty, "Faculty", 
                                         ifelse(I_lecturer, "Lecturer", 
                                         ifelse(I_statistician, "Statistician", "Other"))))
dat[which(dat$Position == "Other"),]
##                             Location
## 56   IDEAS European training network
## 74                Aerojet Rocketdyne
## 143      Odyssey Reinsurance Company
## 184              NC State University
## 328               Indiana University
## 340  Univeristy of California, Davis
## 420 Applied Research Solutions, Inc.
## 448            Computational Biology
##                                   Description       Date Position
## 56                 14 Early stage researchers 01/26/2015    Other
## 74                          Summer Internship 01/16/2015    Other
## 143                    Underwriting Associate 12/02/2014    Other
## 184             Grants Proposal Administrator 11/11/2014    Other
## 328                        Bloomington Campus 10/03/2014    Other
## 340                                Statistics 09/30/2014    Other
## 420 Test and Evaluation Subject Matter Expert 09/04/2014    Other
## 448                  University of Pittsburgh 08/27/2014    Other
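
A row lands in "Other" only when none of the four pattern sets match. When a title matches more than one set, the order of the nested ifelse() calls decides the label: the first test that is TRUE wins. A minimal sketch on made-up titles, with shortened pattern lists chosen just for this illustration:

```r
library(stringr)

# Made-up titles; the second one matches both the postdoc and the
# statistician patterns.
titles <- c("Assistant Professor", "Postdoctoral Research Associate",
            "Senior Lecturer", "Biostatistician", "Summer Internship")

is_postdoc      <- str_detect(titles, "Post|Fellow")
is_faculty      <- str_detect(titles, "Professor|Faculty")
is_lecturer     <- str_detect(titles, "Lecture|Instructor")
is_statistician <- str_detect(titles, "Biostatistic|Statistician|Research Associate")

# Tests are evaluated outermost first, so postdoc takes priority.
position <- ifelse(is_postdoc, "Postdoc",
            ifelse(is_faculty, "Faculty",
            ifelse(is_lecturer, "Lecturer",
            ifelse(is_statistician, "Statistician", "Other"))))

data.frame(titles, position)
```

Here "Postdoctoral Research Associate" is labelled a postdoc even though it also matches "Research Associate", and "Summer Internship" falls through to "Other".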

We see there are a few descriptions that were not able to be categorized using the regex patterns provided above. We'll use some google-fu next to determine where they belong.

Turns out the "University of Pittsburgh" advertisement is for a postdoc. The "Bloomington Campus" and "Statistics" advertisements are for faculty positions. The "14 Early stage researchers" are for statistician positions. I removed the last four ("Summer Internship", "Underwriting Associate", "Grants Proposal Administrator", "Test and Evaluation Subject Matter Expert") as I don't think they are relevant to the analysis here.

dat[which(dat$Description == "University of Pittsburgh"),]$Position <- "Postdoc" 
dat[which(dat$Description %in% c("Bloomington Campus", "Statistics")),]$Position <- "Faculty"
dat[which(dat$Description %in% c("14 Early stage researchers")),]$Position <- "Statistician"
dat = dat[!(dat$Description %in% c("Summer Internship", "Underwriting Associate",
                                      "Grants Proposal Administrator", 
                                      "Test and Evaluation Subject Matter Expert")),]
dat[which(dat$Position == "Other"),]
## [1] Location    Description Date        Position   
## <0 rows> (or 0-length row.names)
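
One detail about the statistician patterns is worth a second look. If the case-insensitivity marker is stored as an attribute on the string (an assumption about the stringr version used here), then wrapping a single alternative in ignore.case() before it goes through c() and paste() would have no effect, because c() drops attributes. Current stringr releases express case-insensitive matching with regex(ignore_case = TRUE) applied to the whole pattern. A small sketch on made-up titles:

```r
library(stringr)

# Attributes on one element are lost once it goes through c().
# The attribute name here is a hypothetical marker, for illustration only.
flagged <- structure("Biostatistic", ignore_case = TRUE)
attributes(c(flagged, "Statistician"))
# NULL

# Made-up titles with unusual capitalisation.
titles <- c("Senior biostatistician", "RESEARCHER", "Staff Scientist")
pattern <- paste(c("Biostatistic", "Researcher"), collapse = "|")

# Plain string pattern: matching is case-sensitive.
str_detect(titles, pattern)
# FALSE FALSE FALSE

# regex() with ignore_case = TRUE makes the whole pattern case-insensitive.
str_detect(titles, regex(pattern, ignore_case = TRUE))
# TRUE TRUE FALSE
```

Note that this makes every alternative case-insensitive, which is broader than flagging only two of them.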

OK, so now we have dealt with grouping all the positions. Let's use the lubridate R package to make the Date column more R friendly. I'm using the mdy() function to tell R this column contains dates in the form of "month/day/year". The month() function extracts the month from each of the rows.

table(month(mdy(dat$Date)))
## 
##   1   2   8   9  10  11  12 
##  53  44  53  96 111  78  48
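
If you want to check what mdy() and month() do step by step, here is a small sketch on four date strings in the same "month/day/year" format (picked by hand for illustration):

```r
library(lubridate)

dates <- c("02/17/2015", "12/23/2014", "08/27/2014", "12/15/2014")

# mdy() parses month/day/year strings into Date objects.
parsed <- mdy(dates)
class(parsed)
# "Date"

# month() pulls out the month number of each date.
month(parsed)
# 2 12 8 12

# table() then counts how many dates fall in each month.
table(month(parsed))
```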

Let's add a few other columns to our data frame.

dat$Position = factor(dat$Position)
dat$Date = mdy(dat$Date)
dat$month = factor(month(dat$Date, label=TRUE), 
                   levels = c("Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb"))
dat$dayOfWeek = wday(dat$Date, label = TRUE) # day of week
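
The explicit levels are what make the months sort in academic-year order (Aug first, Feb last) instead of calendar order. A minimal sketch with hand-picked dates; I'm using base R's month.abb here instead of month(label=TRUE) so the labels do not depend on the machine's locale, which is a swap made only for this illustration:

```r
library(lubridate)

d <- mdy(c("09/15/2014", "01/26/2015", "08/27/2014", "11/11/2014"))
academic_year <- c("Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb")

# month.abb holds the English month abbreviations "Jan" ... "Dec".
m <- factor(month.abb[month(d)], levels = academic_year)

# Sorting follows the level order, not the calendar order.
as.character(sort(m))
# "Aug" "Sep" "Nov" "Jan"

# table() keeps every level, including months with zero postings.
table(m)
```

This level order is also the order ggplot2 uses for the x-axis in the month plots.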

Data visualization

The frequency of job postings by position, day of the week and month:

ggplot(dat, aes(x = Position)) + geom_bar() # frequency job by type

ggplot(dat, aes(x = dayOfWeek)) + geom_bar(position="dodge")

ggplot(dat, aes(x = month)) + geom_bar() # frequency job by month

Job postings by date, day of the week and month (colors represent the type of position).

ggplot(dat, aes(x = Date, fill = Position)) + geom_bar(position="dodge")

ggplot(dat, aes(x = dayOfWeek, fill = Position)) + geom_bar(position="dodge")

ggplot(dat, aes(x = month, fill = Position)) + geom_bar(position="dodge")

Most academic faculty positions are posted Sept-Nov and most postdoc positions are posted after that time period.
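
To read the numbers behind a dodged bar chart like the last one, a cross-tabulation of month by position does the job. The rows below are made up purely to show the shape of the result; with the real data the call would be table(dat$month, dat$Position).

```r
# Made-up rows for illustration only; these are not the scraped postings.
academic_year <- c("Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb")
toy <- data.frame(
  month = factor(c("Sep", "Oct", "Oct", "Nov", "Jan", "Jan", "Feb"),
                 levels = academic_year),
  Position = c("Faculty", "Faculty", "Faculty", "Lecturer",
               "Postdoc", "Postdoc", "Statistician")
)

# Rows are months (in academic-year order), columns are position types.
counts <- table(toy$month, toy$Position)
counts

# Look up a single cell by name.
counts["Oct", "Faculty"]
# 2
```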

---
title: "UF Department of Statistics Job Postings"
author: "Stephanie Hicks"
date: "23 Feb 2015"
output:
  html_document:
    keep_md: true
---
## Purpose
This Rmd uses the UF Department of Statistics Job Postings website to determine
the frequency of faculty, postdoc, lecturer and statistician jobs over the
academic year.
One caveat: The website only has data starting from Aug 2014 up until now,
so I cannot include the postings over the summer, but I am interested in seeing
how these plots differ after including spring and summer of 2015.
#### Load libraries
```{r, message=FALSE}
library(rvest)
library(stringr)
library(lubridate)
library(ggplot2)
```
#### Scrape data
First, we scrape the tables from the UF Statistics Jobs website.
I'm using the `rvest` package to parse the html page. The data is contained in
tables in the html pages, so I'm using the `read_html()` function (named
`html()` in the rvest release available in early 2015) and the `html_table()`
function to parse the html and extract the tables, respectively.
```{r}
pgs = vector("list", 17)
for(i in 1:17){
    jobs <- read_html(paste0("http://www.stat.ufl.edu/jobs/?page=", i))
    pgs[[i]] = do.call(rbind, html_table(jobs))
}
dat = do.call(rbind, pgs)
colnames(dat) = c("Location", "Description", "Date")
```
These are the top 10 most frequent job description titles.
```{r}
head(sort(table(dat$Description), decreasing = TRUE), 10)
```
#### Data Cleaning
Using the `str_detect()` function in the `stringr` R package, we can
use regular expressions to subset the data frame for any jobs that match the
pattern "Lecture".
```{r}
head(dat[str_detect(dat$Description, "Lecture"),])
```
Because the `str_detect()` function matches against a single regular
expression, we can use the `paste()` function with `collapse='|'` to join
several alternatives into one pattern and subset the rows matching
either "Lecture" or "Instructor".
```{r}
head(dat[str_detect(dat$Description, paste(c("Lecture", "Instructor"), collapse='|')),])
```
For simplicity, I grouped the data into four categories:
1. faculty = tenured or non-tenured faculty position including chairs, deans
and department heads.
2. postdoc = postdoctoral fellows
3. lecturer = lecturer or instructor
4. statistician = a statistician whose primary role is data analysis or
managing other data analysts.
```{r}
I_faculty = str_detect(dat$Description, paste(c("Professor", "Tenure", "tenure", "Faculty",
                                             "Assistant", "Chair", "Dean", "Department",
                                             "Head"), collapse='|'))
I_postdoc = str_detect(dat$Description, paste(c("Post", "Fellow"), collapse='|'))
I_lecturer = str_detect(dat$Description, paste(c("Lecture", "Instructor"), collapse='|'))
I_statistician = str_detect(dat$Description, paste(c(ignore.case("Biostatistic"),
                                                  "Statistician", "Scientist",
                                                  "Staff", "Professional", "Analyst",
                                                  ignore.case("Researcher"), "Programmer",
                                                  "Research Associate", "Master",
                                                  "Manager", "Director", "Investigator",
                                                  "Specialist", "Consultant", "VP",
                                                  "Bioinformatician", "Biometrician",
                                                  "Computational"), collapse='|'))
```
Now, let's create a new column called "Position" that holds the job category for each posting.
```{r}
dat$Position <- ifelse(I_postdoc, "Postdoc", ifelse(I_faculty, "Faculty",
                                         ifelse(I_lecturer, "Lecturer",
                                         ifelse(I_statistician, "Statistician", "Other"))))
dat[which(dat$Position == "Other"),]
```
We see there are a few descriptions that were not able to be categorized using
the regex patterns provided above. We'll use some google-fu next to determine
where they belong.
Turns out the "University of Pittsburgh" advertisement is for a postdoc. The
"Bloomington Campus" and "Statistics" advertisements are for faculty positions.
The "14 Early stage researchers" are for statistician positions. I removed
the last four ("Summer Internship", "Underwriting Associate",
"Grants Proposal Administrator", "Test and Evaluation Subject Matter Expert")
as I don't think they are relevant to the analysis here.
```{r}
dat[which(dat$Description == "University of Pittsburgh"),]$Position <- "Postdoc"
dat[which(dat$Description %in% c("Bloomington Campus", "Statistics")),]$Position <- "Faculty"
dat[which(dat$Description %in% c("14 Early stage researchers")),]$Position <- "Statistician"
dat = dat[!(dat$Description %in% c("Summer Internship", "Underwriting Associate",
                                      "Grants Proposal Administrator",
                                      "Test and Evaluation Subject Matter Expert")),]
dat[which(dat$Position == "Other"),]
```
OK, so now we have dealt with grouping all the positions. Let's use the
`lubridate` R package to make the Date column more R friendly. I'm using the
`mdy()` function to tell R this column contains dates in the form of
"month/day/year". The `month()` function extracts the month from each of the
rows.
```{r}
table(month(mdy(dat$Date)))
```
Let's add a few other columns to our data frame.
```{r}
dat$Position = factor(dat$Position)
dat$Date = mdy(dat$Date)
dat$month = factor(month(dat$Date, label=TRUE),
                   levels = c("Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb"))
dat$dayOfWeek = wday(dat$Date, label = TRUE) # day of week
```
#### Data visualization
The frequency of job postings by position, day of the week and month:
```{r}
ggplot(dat, aes(x = Position)) + geom_bar() # frequency job by type
ggplot(dat, aes(x = dayOfWeek)) + geom_bar(position="dodge")
ggplot(dat, aes(x = month)) + geom_bar() # frequency job by month
```
Job postings by date, day of the week and month (colors represent the type
of position).
```{r, message=FALSE}
ggplot(dat, aes(x = Date, fill = Position)) + geom_bar(position="dodge")
ggplot(dat, aes(x = dayOfWeek, fill = Position)) + geom_bar(position="dodge")
ggplot(dat, aes(x = month, fill = Position)) + geom_bar(position="dodge")
```
Most academic faculty positions are posted Sept-Nov and
most postdoc positions are posted after that time period.

Monday, February 16, 2015

Simple Sour Cream Muffins - Three Ways



Sour cream is an ingredient I love to use in baking. It can add a tart flavor and creamy texture to many different baked goods! If you don't have sour cream on hand, I find Greek yogurt is a good substitute too.

On weekday mornings, I'm always running behind schedule and do not have time to make oatmeal or eggs (which I love to do on the weekends).  I'm also one of those people who need to eat something in the morning; otherwise I feel like lunch can never come soon enough.  Muffins have been my favorite breakfast because they are so portable and they freeze really well! Yes, you heard right! On the weekends I will make a batch or two of muffins and they will last me a good month.  Rather than paying $2-4 for a muffin every day, I take a muffin out of the freezer and pop it into the microwave for 30-45 seconds when I get to work.  Add a cup of coffee and I'm set until lunch!

Today I'm showcasing how sour cream can be used in a diverse set of muffins: blueberry, pumpkin and corn bread muffins.



Sour Cream and Blueberry Muffins

Ingredients:

- 3/4 cup granulated sugar
- 1/4 cup unsalted butter, room temp
- 2 large eggs
- 2 tsp vanilla extract
- 1/2 cup sour cream
- 2 1/4 cups all-purpose flour
- 1 1/2 tsp baking powder
- 1/2 tsp baking soda
- 1/2 tsp salt
- 1 1/2 cups blueberries, fresh or frozen



Recipe:

1) Pre-heat oven to 375°F. Grease 12 muffin tins or paper baking cups.

2) Mix sugar and butter. Add in vanilla extract, eggs and sour cream.

3) Mix together the flour, baking powder, baking soda and salt.  Add the dry ingredients to the wet ingredients, but do not over mix.

4) Fold in blueberries.  Spoon the muffin batter into the 12 muffin tins.

5) Bake at 375°F for 18-22 mins. Let cool on a wire rack.

Optional: Add the zest of 1 lemon to make lemon blueberry sour cream muffins.  Add coarse white sparkling sugar for garnish.




Sour Cream and Pumpkin Muffins

Ingredients:

- 1/3 cup brown sugar
- 1/4 cup canola oil
- 1 tsp vanilla extract
- 3/4 cup pure pumpkin purée
- 1/2 cup sour cream
- 1 egg
- 1/2 cup all-purpose flour
- 1/2 cup whole-wheat flour
- 1/2 tsp baking powder
- 1/2 tsp baking soda
- 1/2 tsp ground cinnamon
- 1/4 tsp ground ginger
- 1/4 tsp salt
- 3 tbsp raw pumpkin seeds



Recipe:

1) Pre-heat oven to 350°F. Grease 12 muffin tins or paper baking cups.

2) Mix brown sugar and oil. Add in vanilla extract, egg, pumpkin puree and sour cream.

3) Mix together the flours, baking powder, baking soda, cinnamon, ginger and salt.  Add the dry ingredients to the wet ingredients, but do not over mix.

4) Spoon the muffin batter into the 12 muffin tins. Sprinkle on raw pumpkin seeds.

5) Bake at 350°F for 20 mins. Let cool on a wire rack.




Sour Cream and Cornbread Muffins

Ingredients:

- 1/4 cup unsalted butter, room temp
- 3 tbsp granulated sugar
- 2 large eggs
- 1/2 cup sour cream
- 1/2 cup milk (I used almond milk as a substitute)
- 1 cup all-purpose flour
- 2/3 cup yellow cornmeal
- 1 1/2 tsp baking powder
- 1/2 tsp baking soda
- 1/2 tsp salt


Recipe:

1) Pre-heat oven to 425°F. Grease 12 muffin tins or paper baking cups.

2) Mix sugar and butter. Add in vanilla extract, eggs, milk and sour cream.

3) Mix together the flour, cornmeal, baking powder, baking soda and salt.  Add the dry ingredients to the wet ingredients, but do not over mix.

4) Spoon the muffin batter into the 12 muffin tins.

5) Bake at 425°F for 15-17 mins. Let cool on a wire rack.





Finish with some jam or a big pat of butter!