About

We study how the observation hours that US-based physical therapy (PT) programs require of prospective students (henceforth “criterion hours”) vary with geographic location.

Dataset

The dataset was compiled from the PTCAS website (programs and their observation-hour criteria) and Google Maps (program locations). It has the following shape:

dim(oh_data)
## [1] 249   8

The first number is the number of programs (rows) and the second is the number of descriptive attributes (columns). Previewing the first 10 rows:

head(oh_data, n=10)

Outlining the non-trivial columns:

Summary Statistics

Requirement Category

oh_data %>%
  ggplot() +
    geom_bar(aes(as.factor(Category))) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    xlab('Requirement Category') +
    ylab('Number of Programs')
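The bar heights above are per-category program counts, and the same numbers can be printed as a table. The sketch below is a minimal illustration. It runs on a small hypothetical tibble (`toy`) standing in for oh_data, and its values are made up. It uses dplyr's `count()` to tabulate the Category column.

```r
library(dplyr)

# Hypothetical toy data standing in for oh_data; values are made up.
toy <- tibble(Category = c(1, 1, 2, 2, 2, 3, 4))

# Number of programs per requirement category: the same counts
# that geom_bar() draws as bar heights.
category_counts <- toy %>%
  count(Category, name = "n_programs")

print(category_counts)
```

On the real data, `oh_data %>% count(Category)` should give a table whose counts sum to the 249 rows reported by dim().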

where the categories are:

categories

Criterion Hours

Here’s a summary of the number of criterion hours:

summary(oh_data$criterion_hours)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   10.00   40.00   50.00   65.03  100.00  300.00      24

The proportion of programs that specify criterion hours is:

sum(!is.na(oh_data$criterion_hours)) / nrow(oh_data)
## [1] 0.9036145

The proportion of programs that require or highly recommend observation hours (Category < 4) is:

sum(oh_data$Category < 4) / nrow(oh_data)
## [1] 0.9638554
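Both proportions above follow the pattern "count of TRUE divided by number of rows". The sketch below shows an equivalent idiom, the mean of a logical vector. It also shows how to guard against missing values. A comparison such as `Category < 4` yields NA for a missing Category, and a single NA makes `sum()` return NA. The real output above is not NA, so Category appears to have no missing values, and the guard only matters if that changes. The tibble `toy` is hypothetical. Note one design choice: `na.rm = TRUE` also drops NA rows from the denominator, whereas dividing by `nrow()` keeps them.

```r
library(dplyr)

# Hypothetical toy data; NA marks a program with no stated hours
# or (for Category) an uncoded program.
toy <- tibble(
  criterion_hours = c(40, 100, NA, 50, 40),
  Category        = c(1, 2, 3, NA, 4)
)

# The mean of a logical vector is the proportion of TRUE values,
# equivalent to sum(...) / nrow(...) when there are no NAs.
prop_with_hours <- mean(!is.na(toy$criterion_hours))

# Category < 4 is NA where Category is NA; na.rm = TRUE drops those
# rows from both numerator and denominator.
prop_required <- mean(toy$Category < 4, na.rm = TRUE)

c(with_hours = prop_with_hours, required = prop_required)
```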

The distribution of hours across all programs looks like:

oh_data %>%
  filter(!is.na(criterion_hours)) %>%
  ggplot() +
    geom_histogram(aes(criterion_hours), bins = 15)

Next, the distributions conditioned on requirement category, restricted to categories 1 and 2 (the categories that impose a requirement):

oh_data %>%
  filter(!is.na(criterion_hours) & Category %in% c(1,2)) %>%
  ggplot() +
    geom_violin(aes(as.factor(Category), criterion_hours))

The two distributions are similar, with modes at 40 and 100 hours.
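The violin plot shows smoothed density peaks. Because requirements tend to be round numbers, counting exact values is a reasonable numeric cross-check on where the modes sit. The sketch below is a minimal version that runs on hypothetical toy data with made-up values. It keeps every tied value, so a bimodal category reports two rows.

```r
library(dplyr)

# Hypothetical toy data; values are made up for illustration.
toy <- tibble(
  Category        = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  criterion_hours = c(40, 40, 100, 100, 50, 40, 40, 100, 100, 20, NA)
)

# Most frequent criterion_hours value(s) within each category;
# ties are all kept, so a category can report more than one mode.
modes <- toy %>%
  filter(!is.na(criterion_hours)) %>%
  count(Category, criterion_hours) %>%
  group_by(Category) %>%
  filter(n == max(n)) %>%
  ungroup()

print(modes)
```

Applied to the real data, the same pipeline would start from `oh_data %>% filter(Category %in% c(1, 2))`.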

Mean criterion hours:

mean(oh_data$criterion_hours, na.rm = TRUE)
## [1] 65.03111

Standard deviation:

sd(oh_data$criterion_hours, na.rm = TRUE)
## [1] 38.94598
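R's `sd()` is the sample standard deviation, which uses an n − 1 denominator, and `na.rm = TRUE` simply drops missing values first. The sketch below recomputes both statistics by hand on a hypothetical toy vector, so the formula behind the numbers above is explicit.

```r
# Hypothetical toy vector of criterion hours (NA = program with no stated hours).
hours <- c(40, 100, NA, 50, 40, 300)

x <- hours[!is.na(hours)]                 # same effect as na.rm = TRUE
n <- length(x)
x_bar <- sum(x) / n                       # sample mean
s <- sqrt(sum((x - x_bar)^2) / (n - 1))   # sample SD with n - 1 denominator

c(mean = x_bar, sd = s)
```

The hand-computed values match `mean(hours, na.rm = TRUE)` and `sd(hours, na.rm = TRUE)`.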

Incorporating Geography

For analysis, we will only include programs in the contiguous US:

oh_analysis_data <- oh_data %>%
  filter(Instituition != 'University of Puerto Rico' & Instituition != 'Andrews University')
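The filter above excludes programs by name. A name-independent complement is to flag any program whose coordinates fall outside a rough bounding box around the contiguous US. The box limits below are approximate values rounded outward; they are an assumption and are not taken from the dataset. The tibble `toy` is hypothetical, although it reuses the dataset's Instituition, Latitude and Longitude column names. A rectangle is crude, since it also covers parts of Canada and Mexico, but for US-based programs it only needs to separate mainland coordinates from offshore ones.

```r
library(dplyr)

# Approximate extent of the contiguous US in decimal degrees.
# ASSUMPTION: rounded-outward bounds, not derived from the source data.
CONUS_LAT <- c(24.4, 49.4)
CONUS_LON <- c(-124.8, -66.9)

# Hypothetical toy data; names and coordinates are illustrative only.
toy <- tibble(
  Instituition = c("Program A", "Program B", "Program C"),
  Latitude     = c(40.0, 18.4, 33.0),
  Longitude    = c(-90.0, -66.1, -117.0)
)

# Flag programs whose coordinates fall outside the approximate box;
# between() is inclusive at both ends.
outside <- toy %>%
  filter(!between(Latitude, CONUS_LAT[1], CONUS_LAT[2]) |
         !between(Longitude, CONUS_LON[1], CONUS_LON[2]))

print(outside$Instituition)
```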

Visualizing

Points

First, we view each program as a point on the US map, where color indicates the hours requirement:

# State outlines for the base map
us <- map_data('state')
# Mainland outline (not used in this plot)
usa <- map_data('usa') %>%
  filter(region == 'main')

gg <- oh_analysis_data %>%
  filter(!is.na(criterion_hours)) %>%
  ggplot() +
    # Grey states as the base layer
    geom_polygon(data = us, aes(x = long, y = lat, group = group),
                 fill = 'grey', color = 'white') +
    # One point per program, shaded by its criterion hours
    geom_point(aes(x = Longitude, y = Latitude, color = criterion_hours), size = 10) +
    scale_color_distiller(palette = 1, direction = 1) +
    guides(color = guide_colorbar(title = 'Required Hours',
                                  barwidth = 2, barheight = 20,
                                  title.theme = element_text(size = 15),
                                  label.theme = element_text(size = 20))) +
    # Approximate map projection so the US is not stretched
    coord_quickmap() +
    ggtitle('PTCAS Program Criterion Hours') +
    # Large title; hide axes, gridlines and backgrounds so only the map shows
    theme(plot.title = element_text(size = 30),
          axis.line = element_blank(),
          axis.text = element_blank(),
          axis.ticks = element_blank(),
          axis.title = element_blank(),
          panel.background = element_blank(),
          panel.border = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          plot.background = element_blank())

gg