Variables and Observations

Data analysis at the speed of thought

12 Jan 2020 • czep • Data • 2,700 words

I have a provocative question for experienced and beginner data scientists alike, whether you are fully fluent in the syntax you use to analyze data or not quite comfortable with the command line. Do you think a graphical tool for exploratory data analysis could make you more productive? Would you consider using such a tool? What would you want such a tool to do for your workflow?

I’m developing an R package to help you analyze data at the speed of thought.

Linux From Scratch on EC2

09 Sep 2017 • czep • Linux • 2,500 words

I recently built an EC2 AMI based on Linux From Scratch and documented the process in a hint for the LFS project. Early last year, in a state of growing frustration with the evolution of mainstream Linux distributions, I wrote about Linux From Scratch and the benefits of having the freedom to carefully craft your operating system based on personal preferences. I am now more convinced than ever that we need the Linux From Scratch project to inspire momentum to counter the prevailing forces of uniformity which threaten to remove the creative element from computing altogether. I’m no longer participating in the boring, cookie-cutter, committee-designed mainstream distro upgrade circus. From now on, it’s my distro, my rules. My goal is to run LFS systems, in production, everywhere.

This post will explain my motivations in a bit more detail. You can read my EC2 hint here.

Full text search with Postgres and Django

05 Jul 2017 • czep • Web • 5,800 words

Search is one of the most fundamental problems in computer science, occupying half of Volume 3 of Donald Knuth’s classic work The Art of Computer Programming, so it would seem fair to assume that by 2017 it would be mainly a done deal. But search is still hard. General purpose algorithms may perform decently in the average case, but what constitutes decent performance may not scale well enough to meet user demands. Unless you’re building a search engine, you are probably not designing your data structures specifically to take advantage of search algorithms. When web application developers work on adding a search feature, they are not going to start evaluating different algorithms with Big-O notation. And they’re not going to ask whether quicksort or merge sort would obtain the best performance on their specific dataset. Instead, most likely, they will defer to whatever features are available in their chosen database and application framework, and simply go from there.

Because search is hard, many sites either defer to whatever vanilla features are available in their framework or outsource to a third-party library such as the Lucene-based document retrieval system Elasticsearch, or to a search-as-a-service provider like Algolia. While the cost in both time and money of integrating third-party libraries or services can be prohibitive for a small development team, the negative impact of providing a poor search interface cannot be overstated. In this post I will walk through the process of building decent search functionality for a small to medium-sized website using Django and Postgres.

Legion of lobotomized unices

12 Jun 2017 • czep • Linux • 1,100 words

Over the past two decades, changes have been underway with profound consequences for both social organization and system design. Virtual machines, cloud computing, and containers are reducing the need for general purpose multi-user systems and the stewardship that maintenance of such systems requires. At the same time, while we live in an age in which we are more connected than ever, we are increasingly cut off from one another because the systems we use are isolated clients. The loss of centralized loci of computing has changed the way we work and communicate online, in many ways making it more difficult to collaborate and causing more isolation by removing the shared spaces that brought us together in the past. Unix systems, which used to be the durable social centers of computing, have been replaced by a disposable legion of lobotomized unices.

Let's privatize Air Traffic Control!

06 Jun 2017 • czep • Culture • 500 words

Because the private sector can do things so much better.

Tech and Occupational Prestige

05 Mar 2017 • czep • Work • 1,200 words

In my post on Age and Tech, I explored earnings trends by age for various technical occupations using data from the 2015 American Community Survey. The analysis suggests that technical occupations offer a bright future for people of all ages. Income in tech jobs is consistently well above national averages and the earnings curve by age remains strong. Older workers in tech remain employed at higher rates than in other occupations. In this post I will explore another dimension of tech occupations: social status or occupational prestige. How do Americans perceive technical occupations on the ladder of social status?

My little rant about systemd

24 Feb 2017 • czep • Linux • 700 words

The real issue with systemd isn’t technical; it’s sociological. How did this system achieve widespread adoption despite widespread opposition, and despite admissions even among its proponents that it was nowhere near stable? Understanding the social organization of open source development should be the primary goal of the community if we hope to learn from and prevent such mistakes in the future.

Age and Tech

24 Dec 2016 • czep • Work • 5,300 words

Whether and to what extent age is a prevalent factor in the technical job market are questions that are frequently raised in press articles, blog posts, and discussion boards. The usual—but by no means sole—concern is that tech companies discriminate against older workers in favor of younger ones. The sources of this tendency are multi-faceted. In this post we will explore the Age Question in tech, by examining the latest American Community Survey.

Generating post-hoc session ids in SQL

16 Oct 2016 • czep • Data • 2,300 words

This is a short demonstration of the power of analytic functions in SQL to generate session ids for raw event data.
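
To give a flavor of the idea, here is a minimal sketch, not the query from the post itself. It assumes a hypothetical `events` table with `user_id` and `ts` (Unix seconds) columns and an assumed 30-minute inactivity threshold, and it runs the SQL through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or later). `LAG` flags each event that follows a long gap, and a running `SUM` of those flags numbers the sessions per user.

```python
import sqlite3

# Hypothetical raw events: (user_id, event time in Unix seconds).
EVENTS = [
    ("a", 0),
    ("a", 600),    # 10 minutes after the previous event: same session
    ("a", 4200),   # 60 minutes after the previous event: new session
    ("b", 100),
    ("b", 200),
]

SESSION_GAP_SECONDS = 1800  # assumed 30-minute inactivity threshold

QUERY = """
WITH flagged AS (
    SELECT
        user_id,
        ts,
        -- 1 when the gap since this user's previous event exceeds the threshold.
        -- The first event per user has no previous row (LAG is NULL), so it gets 0.
        CASE
            WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > :gap
            THEN 1 ELSE 0
        END AS is_new_session
    FROM events
)
SELECT
    user_id,
    ts,
    -- Running count of session starts gives a per-user session number.
    1 + SUM(is_new_session) OVER (
        PARTITION BY user_id ORDER BY ts
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS session_num
FROM flagged
ORDER BY user_id, ts
"""


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Load events into an in-memory table and return (user_id, ts, session_num) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (user_id TEXT, ts INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", events)
        return conn.execute(QUERY, {"gap": gap}).fetchall()
    finally:
        conn.close()


rows = sessionize(EVENTS)
for row in rows:
    print(row)
```

Combining `user_id` with `session_num` yields an id that is unique across users; the threshold is a judgment call that depends on the product being analyzed.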

Deploying Django projects

05 Sep 2016 • czep • Web • 4,300 words

It takes 5 minutes for anyone with a passing familiarity with web development to get to Django’s “It Worked!” page after starting from scratch in a clean development environment. And then it takes 5 days to figure out how to deploy your simple ‘Hello World’ application to a production server. In this post I will describe a recipe for deployment of Django projects so you can focus on application development rather than wrestling with 500s.
