Data Resources

Resources in Social Science Data Analysis

This page collects tips, techniques, and tools that I have found useful for analyzing the large-scale quantitative surveys common in social science research. Many of the links here also apply to a wide range of general data analysis and research questions.

ipums_data_prep.py - A Python script to prepare IPUMS dataset extracts for loading into an RDBMS

If you do not have access to a statistical package, or simply want to load datasets downloaded from the IPUMS Project into a database, this script will help prepare the files for loading into a relational database such as PostgreSQL.
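To illustrate the kind of preparation involved, here is a minimal sketch that slices fixed-width records into fields and writes them as CSV, a format PostgreSQL can bulk-load with COPY. It is not the code in ipums_data_prep.py. The function names, the column layout, and the sample records are all hypothetical; real column positions come from the codebook that accompanies each extract, and the all-integer column typing is a simplifying assumption.

```python
import csv
import io

# Hypothetical layout: (column name, 0-based start offset, width).
# Real offsets must be taken from the codebook shipped with the extract.
LAYOUT = [("year", 0, 4), ("serial", 4, 8), ("age", 12, 3)]


def fixed_width_to_csv(lines, layout, out):
    """Slice fixed-width records into fields and write them as CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in layout])
    for line in lines:
        record = line.rstrip("\r\n")
        if not record:
            continue  # skip blank lines
        writer.writerow(
            [record[start:start + width].strip() for _, start, width in layout]
        )


def create_table_sql(table, layout):
    """Build a CREATE TABLE statement, assuming every column is an integer."""
    cols = ", ".join(f"{name} integer" for name, _, _ in layout)
    return f"CREATE TABLE {table} ({cols});"


# Made-up sample records, for illustration only.
sample = ["200000000001 34\n", "200000000002  7\n"]
buf = io.StringIO()
fixed_width_to_csv(sample, LAYOUT, buf)
csv_text = buf.getvalue()
```

The resulting CSV text can be written to a file and loaded with PostgreSQL's COPY command (or psql's \copy) after creating the table.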

Graph.py - A Python Graph Library

Graphs are among the most fundamental and versatile data structures in computer science. They apply so broadly that I find myself constantly reinventing the wheel, building a new graph library for each project. In an attempt to consolidate, here is a basic outline of my own implementation of graphs in Python, a language well suited to high-level data structures and algorithms. You can view the source above or download the file here as graph.py.txt.
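For readers who want a sense of what such a library involves, here is a minimal sketch of an adjacency-set graph with a breadth-first shortest-path search. It is not the code in graph.py; the class and method names are assumptions chosen for illustration, and the sketch assumes that None is never used as a node.

```python
from collections import deque


class Graph:
    """Illustrative adjacency-set graph (hypothetical API, not graph.py)."""

    def __init__(self, directed=False):
        self.directed = directed
        self._adj = {}  # node -> set of neighboring nodes

    def add_node(self, node):
        self._adj.setdefault(node, set())

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        self._adj[u].add(v)
        if not self.directed:
            self._adj[v].add(u)

    def neighbors(self, node):
        return self._adj[node]

    def shortest_path(self, start, goal):
        """Return a fewest-edges path from start to goal, or None."""
        if start not in self._adj or goal not in self._adj:
            return None
        parents = {start: None}  # also serves as the visited set
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                # Walk the parent links back to the start node.
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in self._adj[node]:
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None


g = Graph()
g.add_edge("a", "b")
g.add_edge("b", "c")
g.add_edge("a", "d")
path = g.shortest_path("a", "c")
```

Storing neighbors in sets keeps edge insertion and membership tests cheap; a weighted graph would need a dict of dicts instead.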

spssread.pl - A Perl script to extract metadata from SPSS data files

This is a Perl script I wrote to extract metadata (aka "dictionary data") from SPSS (.sav) files without using the SPSS program itself. The script can read variable names and labels, as well as all value labels associated with the variables in a data file.
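As a rough illustration of the idea, here is a sketch in Python that reads the file label and the variable names and labels from a .sav file. It is not a port of spssread.pl. The byte offsets follow the system file layout as described in the PSPP developer documentation and should be treated as assumptions to verify; the sketch further assumes a little-endian file with Latin-1 text, stops at the first record that is not a variable record, and does not read value labels. The function name is hypothetical.

```python
import struct


def read_sav_variables(stream):
    """Return (file_label, [(name, label), ...]) from a binary .sav stream.

    Assumes little-endian integers and Latin-1 text (both assumptions).
    """
    header = stream.read(176)  # assumed fixed-size file header record
    if len(header) < 176 or header[:4] != b"$FL2":
        raise ValueError("not an SPSS system file")
    file_label = header[109:173].decode("latin-1").rstrip()

    variables = []
    while True:
        raw = stream.read(4)
        if len(raw) < 4:
            break
        (rec_type,) = struct.unpack("<i", raw)
        if rec_type != 2:  # 2 is assumed to mark a variable record
            break
        var_type, has_label, n_missing, _print_fmt, _write_fmt = struct.unpack(
            "<5i", stream.read(20)
        )
        name = stream.read(8).decode("latin-1").rstrip()
        label = ""
        if has_label == 1:
            (label_len,) = struct.unpack("<i", stream.read(4))
            padded = (label_len + 3) // 4 * 4  # labels are padded to 4 bytes
            label = stream.read(padded)[:label_len].decode("latin-1")
        stream.read(8 * abs(n_missing))  # skip missing-value definitions
        if var_type != -1:  # -1 is assumed to mark a long-string continuation
            variables.append((name, label))
    return file_label, variables
```

A file would be read with something like `with open(path, "rb") as f: read_sav_variables(f)`, where `path` points to a .sav file.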

Dataset Aggregation and Transposition

This paper explores some common ways to transform datasets using aggregation, disaggregation, and transposition techniques. The examples here are simple, but they represent basic transformation problems that researchers are likely to encounter. The emphasis is on using SAS and SPSS to transform datasets, with some discussion of the benefits and limits of working in an SQL-based environment.
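To make the two main operations concrete, here is a minimal sketch in plain Python rather than SAS, SPSS, or SQL. It aggregates person-level records to household-level means and transposes a long (one row per observation) dataset into a wide (one row per unit) one. The function names, column names, and data are made up for illustration and do not come from the paper.

```python
from collections import defaultdict


def aggregate_mean(rows, key, value):
    """Collapse rows to one mean of `value` per distinct `key`."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in rows:
        acc = totals[row[key]]
        acc[0] += row[value]
        acc[1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


def long_to_wide(rows, id_col, name_col, value_col):
    """Turn one row per (id, name) pair into one row per id."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {id_col: row[id_col]})
        record[row[name_col]] = row[value_col]
    return list(wide.values())


# Hypothetical person-level records nested within households.
persons = [
    {"household": 1, "income": 30000},
    {"household": 1, "income": 50000},
    {"household": 2, "income": 20000},
]
mean_income = aggregate_mean(persons, "household", "income")

# Hypothetical repeated measures in long form.
long_rows = [
    {"id": 1, "year": "y2000", "score": 10},
    {"id": 1, "year": "y2001", "score": 12},
    {"id": 2, "year": "y2000", "score": 7},
]
wide_rows = long_to_wide(long_rows, "id", "year", "score")
```

In SQL terms, the first function corresponds roughly to GROUP BY with AVG. The second has no direct standard SQL counterpart, which hints at why transposition tends to be awkward in a purely relational setting.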