Hi!

czep

About czep

I am a Programmer/Analyst, Graduate Student, Research Analyst, Director of Research, Director of Engineering, Business Intelligence Engineer, Senior Statistician, Director of Analytics, Data Scientist.

Thus far my career has spanned four US Presidents and six generations of high school students. I’ve properly worked for eight different companies and consulted for dozens more.

If we overlook Pizza Delivery Driver, my first real job title out of college was Programmer/Analyst. I had travelled West in search of the Great American Novel, and instead found employment at a software company. I worked on an Informix database using the ACE Report Writer, PERL (it was all caps back then), and csh scripts for HP-UX machines in the IS department (Information Systems) of a software company that made mainframe connectivity software. Terminal emulators. This was when big data meant reel-to-reel tape and email addresses were a novelty. I often shipped data to customers in the mail, on floppy disks spanned using PKZIP. The first website I ever created dates from this period and it is partially immortalized at the Wayback Machine: philjrhigh.

After two years of writing SQL, playing Doom, reading postmodern fiction, and discussing Spinoza and Quine with philosophy grad students, I moved across the country again to start graduate school in Sociology at SUNY Stony Brook where I also took classes in the Applied Math and Computer Science departments. I learned methodology and statistics from Michael Schwartz and Judith Tanur, regression analysis from Stephen Finch, algorithms from Michael Bender, probability from Michael Taksar, and the wondrous art of SAS programming from Frank Romo.

I am extremely grateful for the training I received in Sociology because it has made me a much better data scientist than I would have been with a purely technical education. On a fundamental level, data science is social science, and those who would concern themselves solely with algorithms and math will fail to make the necessary connections to draw insights from their data systems. I’m on a mission to educate the hordes of ignorant techies who vacuously dismiss social science as “soft” and unworthy of study. That so many engineers have not shed their undergraduate biases and the tribe mentality used to placate their young egos and soothe them into thinking they were superior for studying the “hard” sciences is partially responsible for the sad state of the tech industry today: coding fads, rampant sexism, harassment, appeals to ahistorical self-serving ideologies, a lack of social conscience, surveillance infrastructure, tax avoidance, and an unhealthy preoccupation with advertising revenue. We desperately need more social scientists in tech.

Besides helping me contextualize my work, there was another important synergy I realized by combining social science with the relational model. Approaching statistical packages like SAS and SPSS after already being familiar with SQL vastly expanded my analytical toolkit and provided me with a perspective unavailable to those who concentrate on traditional databases. And knowing SQL gave me an edge compared to my academic peers who did not have data munging experience with relational systems and scripting languages.

Cumulatively, I’ve written more code in SAS and Visual Basic, but today I do most of my work in R and Python. If you read some of my recent code you may detect some old VBisms that are hard to unlearn. I imagine it’s like hearing Italian in a hillbilly accent.

I’m also a husband and father and these will always be my number one and two priorities. H. & O., my love, always.

My personal motto is based on pragmatic altruism: “the best way to help yourself is by helping others”.

About this blog

The title Variables and Observations refers to the nomenclature used by the SAS programming language to denote the columns and rows of a dataset. Serving as a fortuitous pun, the word observations can also be used in English to describe a loose collection of largely unstructured notes about a range of topics. This makes it quite appropriate for a personal blog. I suppose Features and Examples would be more recognizable by today’s young machine learning students, but it clearly doesn’t have the same ring to it. I did also consider The World as Data and Representation but decided against it in light of the serious mental strain it would require to continuously inject references to German metaphysics throughout my blog posts.

The tagline is more or less something I chose simply to avoid leaving jekyll’s site.description empty. Contemporary work, especially in the tech industry, is one of my themes because I have a frustratingly lengthy history with it. But I am honestly not marketing-savvy enough to commit to a single, focused theme. I’ve loosely categorized each post as belonging to one or more of the following topics:

  • Data. This used to be ‘analytics’ but I’m sick of people asking me to help troubleshoot GA.
  • Linux. The most powerful operating system on the planet.
  • Statistics. I’m a frequentist.
  • Culture. These are the essays that will get me in trouble someday.
  • Web. Properly, ‘webdev’.
  • Work. This Weberian life.

You cannot comment here because I am afraid of moderating. My contact information can be found here.

This is a static site generated by jekyll, hosted on S3, and deployed with s3_website. The CSS is purposefully minimal. I started with a recent Bootstrap Reboot and then cut out all the junk I didn’t need (like, really, who uses abbr?). On this site, there is no JavaScript, no cookies, and no tracking, analytics, or ad-serving libraries. I finally got around to deploying to CloudFront with a TLS cert so the site should now be redirecting to https://czep.net/.

For 2018 I’ve decided to use Computer Modern as the primary font family. It was originally designed by Donald Knuth for TeX and graciously made available as a web font by Christian Lawson-Perfect. Just look at those italics, aren’t they beautiful? And so worth the fact that I made your browser download 500+ KB of font files to render them, no?

I’ve gone fairly deep into logistic regression.

My most cited work is a little essay I wrote many years ago about a very large number.

My favicon is simply an interesting pattern found in random noise.

You can read a previous version of my About Me page from 2015. I also keep around an old and still mostly functional version of my site dating from around 2003.

Scott Czepiel
San Francisco