As much as I can, I integrate a data analysis component into my introduction to sociology classes. I am not trying to do anything really complicated, but I want my students to get a very basic taste of what it means to think with data. For a long time, I had the perfect tool at hand in the form of MicroCase Workbooks. There were several of them (a couple for introduction, one for marriages and families, one for social research). MicroCase is a bare-bones version of the more common statistical software in the social sciences. It is a small program (it does not take up much space on your hard drive) that runs on Windows only. However, it uses the GSS, the American Community Survey, and the World Values Survey, and it gives students the opportunity to select their own variables and construct their own tables / maps / pie charts / scatterplots / time lines. Students have always found it easy to use and actually fun. Well, that is over, as the publisher decided to no longer update the software or the databases. Since then, I have been looking for alternatives. And, of course, publishers’ reps have been more than eager to try to sell me on their latest tools… which are all inadequate for my own purposes. And sending intro students into SPSS is out of the question… heck, I don’t want to go into SPSS.
So, what is a SocProf supposed to do? Well, there are now tons of data and databases that are publicly available. Why not create my own exercises? It will be cheaper for my students and my exercises can be exactly the way I want them. There are also now a lot of visualization tools, some of them provided directly by the same organizations that make the data available (like the UN development report or Gapminder). I no longer depend on the good will of a corporate publisher to keep updating a product that is going to be costly to students. Win-win. On the losing side, it is going to be time-consuming to build up these exercises. I just spent an hour cleaning up data from the CDC on suicide in the US. And if the visualization tools are not available, I can always use Tableau.
So, for instance, I started simple with some data on suicide in the US. The CDC was the organization with the most data on that. Starting with this:
The first problem with this map is that it is not interactive, and the level of detail (by county) makes it a bit busy even if you can clearly see regional patterns. These regional patterns actually make for an interesting puzzle for my students to solve. That can be a starting point, but it is hard to create rankings from it, for instance.
A second option is to use the CDC interactive tool through WISQARS. So, basically, it looks like this:
As you can see above, you have a series of menus, drop-downs and radio buttons. You can filter things out. I kept the entire US but I selected “suicide” for intent of injury. And I kept the largest spread (2000 – 2006). I kept all the demographic subsets at their defaults. And I got this as a result:
Several problems with this: (1) on the right-hand side, it says “Hover over a state with your mouse to see its name and rate”… that does not work. I tried different browsers including *gasp* Explorer, and no dice. (2) The export data function creates a csv file that takes a lot of cleaning up if you want to do even the simplest statistical operations and visualizations. Which is what I ultimately ended up doing in Tableau Public (sorry, the embed still does not work).
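For anyone who would rather script that cleanup than do it by hand, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the file layout, the column names, the `clean_export` and `to_number` helpers, and especially the numbers, which are made-up placeholders and not CDC figures. The actual export may be laid out differently, so treat this as a starting point rather than a recipe.

```python
import csv
import io

# Hypothetical stand-in for an exported file. Placeholder state names and
# made-up numbers, NOT real CDC figures; the real layout may differ.
RAW_EXPORT = """\
Report generated by an online query tool
State,Deaths,Population,Crude Rate
State A,"1,200","8,000,000",15.0
State B,120,"600,000",20.0**
State C,--,"700,000",--
State D,400,"5,000,000",8.0

Note: ** marks a rate the tool flags as unstable
"""

def to_number(text):
    """Strip thousands separators and footnote asterisks; None if not numeric."""
    cleaned = text.replace(",", "").replace("*", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None

def clean_export(raw, header_first_col="State", n_cols=4):
    """Keep the data rows below the header and rank them by rate per 100,000."""
    rows = []
    in_table = False
    for record in csv.reader(io.StringIO(raw)):
        if not in_table:
            # Skip title lines until the header row shows up.
            in_table = bool(record) and record[0] == header_first_col
            continue
        if len(record) != n_cols:
            continue  # blank lines and footnotes
        deaths = to_number(record[1])
        population = to_number(record[2])
        if deaths is None or not population:
            continue  # suppressed or missing values
        rate = round(deaths * 100000 / population, 1)
        rows.append({"state": record[0], "deaths": deaths,
                     "population": population, "rate": rate})
    # Highest rate first, which gives the ranking the maps cannot.
    return sorted(rows, key=lambda r: r["rate"], reverse=True)

ranking = clean_export(RAW_EXPORT)
for rank, row in enumerate(ranking, start=1):
    print(rank, row["state"], row["rate"])
```

The sketch recomputes the rate from deaths and population instead of trusting the exported rate column, because that column is where footnote markers tend to end up. Rows with suppressed values are dropped rather than guessed at.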
The map, though, shows the same pattern as the county one above.
Third option, if you really want an interactive map, and still from the CDC, there is another interactive tool that is a bit trickier to manipulate but does the job: Health Data Interactive:
Again, you get to set your options and get an interactive map (with some missing data and only 44 states reporting, which is kinda annoying).
Beyond maps, though, the CDC has some good data visualizations but again, the raw data are harder to track down. For instance, you can get a broad overview over time:
Again, you can set up some interesting questions regarding the shifts in age categories with the highest suicide rates, when the shift happened and why. But you can drill down even further and consider race and ethnicity:
Why whites and American Indian / Alaskan Native / Pacific Islanders (from my little Tableau thing, we already know that Alaska has a high rate)?
Ok, let’s add sex into the mix:
Across the board, men are way more likely to commit suicide than women. Adding sex does not alter the racial / ethnic patterns. So, should we pity white men after all?
Finally, let’s add age. Let’s start with the 10-24 age category:
One can only ask, what is going on with young American Indian / Alaskan Native / Pacific Islanders? Whites are no longer strikingly higher than other racial and ethnic categories, for that age category.
But once you move up the age ladder, into the 25 – 64 category:
Then, whites pop up again in the higher rates.
Ok, how about 65 and older:
See what happens with American Indian / Alaskan Native / Pacific Islanders? And Whites?
Ok, how about some trends?
Note the uptick with the recession. Otherwise, a familiar gender pattern.
Let’s separate men and women and compare by age categories, first, for men:
The interesting trend here is the progressive joining of the 25-64 (up) and the 65+ (down).
Now, we already know that women are much less likely to commit suicide than men. And this visualization has an extra age category but one can see that the relative increase is greater for women than men. This is especially the case in the 45-54 category.
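The difference between a higher rate and a greater relative increase tends to trip up intro students, so a tiny worked example can help. This is only a sketch: the `relative_change` helper is my own naming, and the rates are made-up round numbers chosen to make the arithmetic obvious, not values read off the CDC charts.

```python
def relative_change(start_rate, end_rate):
    """Percent change from the starting rate to the ending rate."""
    return (end_rate - start_rate) * 100 / start_rate

# Made-up illustrative rates per 100,000, NOT actual CDC values.
men_start, men_end = 20.0, 22.0
women_start, women_end = 5.0, 6.0

men_change = relative_change(men_start, men_end)
women_change = relative_change(women_start, women_end)

print("men: +%.1f points, %.0f%% increase" % (men_end - men_start, men_change))
print("women: +%.1f points, %.0f%% increase" % (women_end - women_start, women_change))
```

With these placeholder numbers, the men's rate rises by more points and stays far higher, yet the women's rate grows faster in relative terms. That is the same kind of pattern described above.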
And now, for the fun of a different visualization, let’s add yet another variable: the means of suicide:
I am normally not a big fan of stacked bars, but in this case, I think it works. You can clearly see that men are more likely to use a firearm in all age categories whereas suffocation and poisoning are more used by women. One could explore access and cultural factors in the decision to use one mechanism or another to kill oneself.
This gender aspect is more visible if one filters out other variables:
So, as you can see, there is a lot to explore and a lot of sociological puzzles to be solved, just by using some very basic data, with limited variables, and just by using publicly available data visualizations.
I’ll continue to share these things as I build them.