I am writing an R package for exploratory data analysis in the browser with ReactJS. In my last post I outlined my lofty ambitions for a graphical data analysis tool to make exploratory data analysis easier, describing the motivations that grew out of my struggle with the context switch between analysis and programming. In this post I will take a whirlwind tour of my thoughts and inspirations, starting with the front end, going deep into the back end, and then journeying back to the front end again. By the end I hope to have made the case that I’m not crazy and this thing might actually work.
I haven’t written a SQL post since Generating post-hoc session ids in SQL. I don’t ordinarily think of SQL as a good candidate for blog posts because, to me, SQL is just boring. I do use it every day, though, and I’ve certainly internalized a lot of handy tricks. Today I’d like to share one of those rare moments when I sat back and thought to myself, “wow, this query is beautiful!” The solution involved not just one but two cross joins, plus a window function, to count the number of events occurring at or above each level of a score.
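The core counting trick can be sketched in miniature. This is not the post’s actual query: the table, columns, and data here are hypothetical, it runs against an in-memory SQLite database for portability, and it uses a single cross join with a plain aggregate rather than the two cross joins and window function described above. It only illustrates the shape of the idea: pair every candidate score level with every event, then count the events at or above each level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical events table: each event carries a numeric score.
conn.executescript("""
CREATE TABLE events (id INTEGER, score INTEGER);
INSERT INTO events VALUES (1, 10), (2, 20), (3, 20), (4, 30);
""")

# Treat each distinct score as a level, cross join levels with events,
# and count the events whose score is at or above each level.
rows = conn.execute("""
SELECT levels.score AS level,
       COUNT(*)     AS n_at_or_above
FROM (SELECT DISTINCT score FROM events) AS levels
CROSS JOIN events
WHERE events.score >= levels.score
GROUP BY levels.score
ORDER BY levels.score
""").fetchall()

print(rows)  # [(10, 4), (20, 3), (30, 1)]
```

Four events score at least 10, three score at least 20, and one scores at least 30. A window function (a cumulative count over scores in descending order) can produce the same result without the quadratic join, which is presumably where the post’s version comes in.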
I have a provocative question to ask of experienced and beginner data scientists alike, whether you are fully fluent in the syntax you use to analyze data or not quite comfortable with the command line. Do you think a graphical tool for exploratory data analysis could make you more productive? Would you consider using such a tool? What would you envision such a tool doing for your workflow? I’m developing an R package to help you analyze data at the speed of thought.
I recently built an EC2 AMI based on Linux From Scratch and documented the process in a hint for the LFS project. Early last year, in a state of growing frustration with the evolution of mainstream Linux distributions, I wrote about Linux From Scratch and the benefits of having the freedom to carefully craft your operating system to suit your personal preferences. I am now more convinced than ever that we need the Linux From Scratch project to inspire momentum against the prevailing forces of uniformity, which threaten to remove the creative element from computing altogether. I’m no longer participating in the boring, cookie-cutter, committee-designed mainstream distro upgrade circus. From now on, it’s my distro, my rules. My goal is to run LFS systems, in production, everywhere. This post will explain my motivations in a bit more detail. You can read my EC2 hint here.
Search is one of the most fundamental problems in computer science, occupying the first half of Volume 3 of Donald Knuth’s classic work The Art of Computer Programming, so it would seem fair to assume that by 2017 it would be mainly a done deal. But search is still hard. General-purpose algorithms may perform decently in the average case, but what constitutes decent performance may not scale well enough to meet user demands. Unless you’re building a search engine, you are probably not designing your data structures specifically to take advantage of search algorithms. When web application developers add a search feature, they are not going to start evaluating algorithms with Big-O notation, and they’re not going to ask whether quicksort or merge sort would perform best on their specific dataset. Instead, most likely, they will defer to whatever features are available in their chosen database and application framework and simply go from there. Because search is hard, many sites either settle for the vanilla features of their framework or outsource the problem to a third-party library such as the Lucene-based document retrieval system Elasticsearch, or to a search-as-a-service provider like Algolia. While the cost, in both time and money, of integrating third-party libraries or services can be prohibitive for a small development team, the negative impact of a poor search interface cannot be overstated. In this post I will walk through the process of building decent search functionality for a small to medium sized website using Django and Postgres.
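The post itself builds on Postgres full-text search through Django; as a rough, self-contained stand-in, the same database-native pattern (index documents in the database, query with a match operator, rank the results) can be shown with SQLite’s FTS5 extension. Everything here is hypothetical illustration: the `docs` table, the sample rows, and the query term are made up, and FTS5 is standing in for Postgres’s `tsvector`/`tsquery` machinery, not reproducing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A full-text-indexed virtual table: FTS5 tokenizes both columns.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Postgres search", "Full-text search with tsvector and tsquery"),
        ("Django tips", "Deploying a Django site behind nginx"),
        ("Unrelated", "A post about cooking"),
    ],
)

# MATCH runs the full-text query; the built-in rank column orders by relevance.
hits = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank",
    ("search",),
).fetchall()
print(hits)
```

Only the first document contains the term, so it is the only hit. The appeal of this approach, in SQLite and Postgres alike, is that search lives next to the data with no extra service to deploy, which is exactly the trade-off the post weighs against Elasticsearch and Algolia.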
Over the past two decades, changes have been underway with profound consequences for both social organization and system design. Virtual machines, cloud computing, and containers are reducing the need for general-purpose multi-user systems and the stewardship that maintaining such systems requires. At the same time, while we live in an age in which we are more connected than ever, we are increasingly cut off from one another because the systems we use are isolated clients. The loss of centralized loci of computing has changed the way we work and communicate online, in many ways making collaboration more difficult and deepening our isolation by removing the shared spaces that brought us together in the past. Unix systems, which used to be the durable social centers of computing, have been replaced by a disposable legion of lobotomized unices.
Because the private sector can do things so much better.
In my post on Age and Tech, I explored earnings trends by age for various technical occupations using data from the 2015 American Community Survey. The analysis suggests that technical occupations offer a bright future for people of all ages. Income in tech jobs is consistently well above national averages and the earnings curve by age remains strong. Older workers in tech remain employed at higher rates than in other occupations. In this post I will explore another dimension of tech occupations: social status or occupational prestige. How do Americans perceive technical occupations on the ladder of social status?
The real issue with systemd isn’t technical; it’s sociological. How did this system achieve widespread adoption despite widespread opposition and an admission, even among its proponents, that it wasn’t nearly stable enough? Understanding the social organization of open source development should be the primary goal of the community if we hope to learn from such mistakes and prevent them in the future.