Full text search with Postgres and Django

Search is one of the most fundamental problems in computer science, filling roughly half of Volume 3 of Donald Knuth’s classic work The Art of Computer Programming, so it would seem fair to assume that by 2017 it would be mostly a done deal. But search is still hard. General-purpose algorithms may perform decently in the average case, but what counts as decent performance may not scale well enough to meet user demands. Unless you’re building a search engine, you are probably not designing your data structures specifically to take advantage of search algorithms. When web application developers add a search feature, they are not going to evaluate different algorithms with Big-O notation, and they are not going to ask whether quicksort or merge sort would perform best on their specific dataset. Instead, they will most likely defer to whatever features are available in their chosen database and application framework, and simply go from there.

Because search is hard, many sites either settle for whatever vanilla features their framework offers or outsource to a third-party tool such as the Lucene-based document retrieval system Elasticsearch, or to a search-as-a-service provider like Algolia. While the cost in both time and money of integrating third-party libraries or services can be prohibitive for a small development team, the negative impact of providing a poor search interface cannot be overstated. In this post I will walk through the process of building decent search functionality for a small to medium-sized website using Django and Postgres.

Let's privatize Air Traffic Control!

Because the private sector can do things so much better.

Tech and Occupational Prestige

In my post on Age and Tech, I explored earnings trends by age for various technical occupations using data from the 2015 American Community Survey. The analysis suggests that technical occupations offer a bright future for people of all ages. Income in tech jobs is consistently well above national averages and the earnings curve by age remains strong. Older workers in tech remain employed at higher rates than in other occupations. In this post I will explore another dimension of tech occupations: social status or occupational prestige. How do Americans perceive technical occupations on the ladder of social status?

Age and Tech

Whether and to what extent age is a factor in the technical job market are questions frequently raised in press articles, blog posts, and discussion boards. The usual, but by no means sole, concern is that tech companies discriminate against older workers in favor of younger ones. The sources of this tendency are multi-faceted. In this post we will explore the Age Question in tech by examining the latest American Community Survey.

Generating post-hoc session ids in SQL

This is a short demonstration of the power of analytic functions in SQL to generate session ids for raw event data.
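As a rough illustration of the idea (not the post's own query), the sketch below assigns session ids with two window functions: LAG finds the gap to a user's previous event, and a running SUM counts how many sessions have started so far. The `events(user_id, ts)` table, the `sessionize` helper, and the 30-minute inactivity gap are all assumptions made for the example; it runs the SQL through Python's built-in sqlite3 module, which needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

# Hypothetical schema: events(user_id, ts), with ts in epoch seconds.
SESSION_SQL = """
WITH flagged AS (
    SELECT user_id,
           ts,
           -- 1 marks the first event of a session: either the user's first
           -- event (LAG is NULL) or one arriving after a gap longer than :gap.
           CASE
               WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) <= :gap
               THEN 0
               ELSE 1
           END AS new_session
    FROM events
)
SELECT user_id,
       ts,
       -- The running total of session starts is a per-user session number.
       SUM(new_session) OVER (
           PARTITION BY user_id
           ORDER BY ts
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS session_id
FROM flagged
ORDER BY user_id, ts
"""


def sessionize(rows, gap=1800):
    """Load (user_id, ts) rows into an in-memory table and return
    (user_id, ts, session_id) tuples ordered by user and time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(SESSION_SQL, {"gap": gap}).fetchall()
    conn.close()
    return result
```

Session ids produced this way restart at 1 for each user, so a globally unique id would have to combine `user_id` with `session_id`.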

Deploying Django projects

It takes 5 minutes for anyone with a passing familiarity with web development to get to Django’s “It Worked!” page after starting from scratch in a clean development environment. And then it takes 5 days to figure out how to deploy your simple ‘Hello World’ application to a production server. In this post I will describe a recipe for deployment of Django projects so you can focus on application development rather than wrestling with 500s.

Svelte Apache

The agenda for this post is to strip down an Apache install to the minimal configuration needed to feasibly run the server. We’ll make it slender and elegant; in a word, svelte. This can serve as a starting point for incrementally adding functionality as your particular installation requires it. We will also turn on some monitoring modules to provide helpful diagnostics about a running Apache server. In addition, we will install a Python script modeled on PHP’s phpinfo() function to quickly show a lot of detail about the environment in which Apache is running. Finally, since all the things should be encrypted these days, we will set ourselves up with a certificate courtesy of Let’s Encrypt, which will allow us to serve our site over HTTPS and be accepted by modern browsers.
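To give a feel for what a phpinfo()-style page might look like in Python, here is a minimal sketch; it is not the script from the post. It assumes the server hands requests to a standard WSGI callable named `application` (for example through mod_wsgi), and the `collect_info` helper is a name invented for this example.

```python
import html
import os
import platform
import sys


def collect_info(environ):
    """Gather interpreter, platform, and request details as (label, value) pairs."""
    info = [
        ("Python version", sys.version),
        ("Executable", sys.executable),
        ("Platform", platform.platform()),
        ("Working directory", os.getcwd()),
    ]
    # Server-supplied request variables (SERVER_SOFTWARE, REQUEST_METHOD, ...).
    info += sorted((key, str(value)) for key, value in environ.items())
    return info


def application(environ, start_response):
    """WSGI entry point: render the collected details as an HTML table."""
    rows = "".join(
        "<tr><th>{}</th><td>{}</td></tr>".format(html.escape(k), html.escape(v))
        for k, v in collect_info(environ)
    )
    body = "<html><body><table>{}</table></body></html>".format(rows).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

A page like this exposes a lot about the server, so it is best kept behind access controls or removed once the diagnostics are done.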

Linux From Scratch in 2016

The Linux From Scratch project is very much alive and well in 2016. What began in the late 1990s as an educational process for building a completely customized GNU/Linux system from source code is still very much relevant today. What you will learn from going through the LFS book will augment your Linux knowledge like nothing else. Give it a try; you won’t be disappointed!

Adventures with Flask

Deployment of web applications can be a frustrating endeavor when trying to make the various pieces of a stack work together. Development environments are typically set up with lightweight test web servers and file-based databases to simplify debugging of application code. This means that once your code is in good shape locally, you still face various challenges moving it from your workstation to a production server. One of the pesky issues I always face when moving to production is getting the app to talk to the database. To help verify that my production database configuration is working properly, I wrote a basic Flask app, flask-db-test. It’s a simple app presenting a form which exposes a single database model, allowing you to see if reads and writes are working.

Pedantic SQL

Pedantic SQL is a style guide and code beautifier for ANSI SQL intended to make SELECT statements more readable. I use the term ‘beautifier’ loosely because one may well argue that no amount of formatting will be sufficient to make SQL look pretty. I call it ‘pedantic’ because it is insistently rigid in how it is applied. But this rigidity is what makes the resulting queries readable. The lack of consistency in SQL writing styles makes it entirely too difficult to read others’ queries. A lot of bad practices have evolved which detract from query readability and extensibility. A query should be understandable at a quick glance. Following this style guide will make your queries a lot easier to grasp. It will also make them a lot easier to extend or build upon later.
