Gender Equality in STEM Programs

Recently I’ve been interested in investigating how gender equality – or equivalently, inequality – has evolved over time in Canada. Using the University of Waterloo’s public Cognos cubes, specifically those for undergraduate enrollment, I have found some pretty interesting results. Below is a brief summary of these findings.

To begin, let’s talk a bit more about our population of study and the data used in our analysis. The target population that I’m examining here is the set of all undergraduate university students, and the sample population is the set of all undergraduate students who have enrolled at the University of Waterloo. The sample chosen for the analysis is the sample population restricted to students who enrolled between 1996 and 2013, where a term is included only if it has at least 20 distinct programs with at least 1 student in them. We make no distinction between students enrolled in a co-operative program and those who are not.
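The term-selection rule can be sketched as follows. This is a minimal illustration in Python, not the R code used for the analysis; the record layout `(term, program, students)` and the term labels are hypothetical and will not match the actual Cognos export.

```python
from collections import defaultdict

MIN_PROGRAMS = 20  # threshold stated in the text


def eligible_terms(records, min_programs=MIN_PROGRAMS):
    """Return the sorted terms that have at least `min_programs` distinct
    programs with at least one enrolled student.

    `records` is an iterable of (term, program, students) tuples
    (a hypothetical layout chosen for this sketch).
    """
    programs_by_term = defaultdict(set)
    for term, program, students in records:
        if students >= 1:  # programs with no students do not count
            programs_by_term[term].add(program)
    return sorted(term for term, programs in programs_by_term.items()
                  if len(programs) >= min_programs)


# Toy data with a lowered threshold, just to show the behaviour.
records = [
    ("1996-09", "LIFESCI", 120),
    ("1996-09", "CS", 80),
    ("1997-01", "LIFESCI", 110),
    ("1997-01", "CS", 0),  # no students, so this program is not counted
]
print(eligible_terms(records, min_programs=2))  # ['1996-09']
```

With the real threshold of 20, none of the toy terms would qualify, which is why the example lowers it to 2.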

All analysis and visualization is done in the free academic version of Revolution R, Version 7.

For each date, at the Program (e.g. Life Sciences) and Faculty (e.g. Science) level, a total is computed for each of the female and male genders, and the percentage of females is then calculated as |Females|/(|Males| + |Females|). An ordered bar chart is then rendered for each date using the ggplot2 R package, with the Program on the x-axis and the percentage female on the y-axis, and a GIF animation of these charts is produced to study the time evolution, as seen below.
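As a rough illustration of the calculation (in Python, not the original R), the sketch below computes |Females|/(|Males| + |Females|) per program and orders the programs from least to greatest share. The program names and counts are made up, and raising an error for an empty program is an assumption of this sketch.

```python
def percent_female(females, males):
    """Share of females among students recorded as female or male."""
    total = females + males
    if total == 0:
        # Assumption of this sketch: an empty program is treated as an error.
        raise ValueError("no students enrolled")
    return females / total


def ordered_programs(counts):
    """Sort programs from least to greatest share of females.

    `counts` maps program -> (females, males) for a single date.
    """
    shares = {program: percent_female(f, m)
              for program, (f, m) in counts.items()}
    return sorted(shares.items(), key=lambda item: item[1])


# Made-up counts for three hypothetical programs on one date.
counts = {"A": (30, 70), "B": (60, 40), "C": (10, 90)}
for program, share in ordered_programs(counts):
    print(f"{program}: {share:.0%}")
```

The resulting order (C, A, B here) is the left-to-right order the bars would take in the chart for that date.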


Bar colors indicate the Faculty of each Program. Click the image above to properly view the animation.

The abbreviations for each Program are explained here. A quick scan over the image shows no noticeable change in the overall shape other than a -very- slight flattening of the center bars and a slight increase in the slopes near the extreme ends during later years.

Using this data, I use the following method as a crude estimate of Faculty-wide, time-dependent gender bias, which I define as how gender-biased a Faculty is relative to past or future states or enrollments of the university. Suppose that for a fixed date we have n programs and P=\{P_{1,F_1}, P_{2,F_2}, ..., P_{n,F_n}\} is the set of percentages of females in the n programs, ordered from least to greatest percentage of females in the first index, where the second index identifies the Faculty under which the program falls. Let P_{F}=\{P_{k,F_k} \in P: F_k = F\} and n_{F} = | P_{F} |, and let k_1 < k_2 < ... < k_{n_F} be the first indices of the elements of P_F, i.e. the positions of the programs of F in the overall ordering. Then for each Faculty F, we denote the (female-dominated) gender bias as

G(F) = \frac{k_{n_F} + k_{n_F - 1} + k_{n_F - 2}}{3n}

which we can think of as a three-term average [1] of the quantiles k/n of the three most female-dominated programs in F. A value close to 100% (less biased) is generally preferred.
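To make the measure concrete, here is a small Python sketch (not the original R code). It reads the measure as the average quantile, i.e. position in the overall ordering divided by n, of a Faculty's three most female-dominated programs; that reading, the tuple layout, and the error raised for a Faculty with fewer than three programs are all assumptions of the sketch.

```python
def gender_bias(programs, faculty, top=3):
    """Average quantile of the `top` most female-dominated programs in `faculty`.

    `programs` is a list of (program, faculty, share_female) tuples for one
    date (a hypothetical layout chosen for this sketch).
    """
    n = len(programs)
    ranked = sorted(programs, key=lambda p: p[2])  # least to greatest share
    # 1-based positions of this faculty's programs in the overall ordering
    ranks = [k for k, (_, fac, _) in enumerate(ranked, start=1)
             if fac == faculty]
    if len(ranks) < top:
        # Assumption: the text does not say how smaller faculties are handled.
        raise ValueError(f"{faculty} has fewer than {top} programs")
    return sum(ranks[-top:]) / (top * n)


# Made-up shares for six hypothetical programs on one date.
programs = [
    ("E1", "ENG", 0.10), ("E2", "ENG", 0.15), ("M1", "MATH", 0.30),
    ("E3", "ENG", 0.35), ("M2", "MATH", 0.45), ("M3", "MATH", 0.60),
]
print(gender_bias(programs, "ENG"))   # (1 + 2 + 4) / 18, about 0.389
print(gender_bias(programs, "MATH"))  # (3 + 5 + 6) / 18, about 0.778
```

In this toy example the MATH programs sit nearer the top of the ordering, so MATH scores closer to 100% than ENG.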

Taking only the STEM Faculties into consideration (SCI, ENG, MATH), we plot out this measure over time using the lattice R package below:

Gender Bias over Time

The blue circles indicate points in time, the red lines are LOESS curves and the green lines are smoothing splines. The science faculty seems to follow a roughly sinusoidal trend and the engineering faculty a mostly linear one, except for a sudden rise in the 2003-2005 date range, while the maths faculty is the most sporadic of the three. There is an apparent outlier near 2008 in the maths faculty, although this may be explained by increased interest in the new FARM program and other finance-related programs in light of the latest U.S. recession.

A least-squares regression with slope and intercept interaction factors is also fitted in R to estimate long-term trends, and is shown below:

R Regression

Here, Idx is just a normalized Date variable. From the results, we can see that the long-run growth rates in the MATH and SCI Faculties are not significantly different from one another, and we can expect a long-term growth in female gender bias of approximately 0.23% per term in these faculties in the near future, while for engineering this is closer to 0.03%.
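For reference, a regression with Faculty-specific intercepts and slopes can be written in the general form below. The coefficient symbols and the choice of ENG as the baseline Faculty are assumptions made for illustration; they are not taken from the R output.

```latex
G_t(F) = \beta_0 + \beta_1\,\mathrm{Idx}_t
       + \sum_{f \in \{\mathrm{MATH},\,\mathrm{SCI}\}}
         \left(\gamma_f + \delta_f\,\mathrm{Idx}_t\right)\mathbf{1}[F = f]
       + \varepsilon_{t,F}
```

Under this parameterization, the long-run growth per unit of Idx is \beta_1 for the baseline Faculty and \beta_1 + \delta_f for Faculty f, so comparing the growth of two Faculties amounts to comparing their \delta_f terms.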

With this in mind, it looks like we won’t be seeing gender equality for at least 2 decades in the sciences, and for several times that long in the mathematics and engineering faculties.

To replicate these results, as well as see the charts above in higher resolution and examine the source data, you can check out the relevant Skydrive directory here.

If you have any comments or suggestions for future statistical projects, let me know in the comments section below.

[1] An average is used here in order to smooth out outliers, of which there are a few in the data, particularly in the architecture program.