Adventures with Flask | Variables and Observations

Deployment of web applications can be a frustrating endeavor when trying to make the various pieces of a stack work together. Development environments are typically set up with lightweight test web servers and file-based databases to simplify debugging of application code. This means that once your code is in good shape locally, you still face various challenges moving it from your workstation to a production server.

One of the pesky issues I always face when moving to production is getting the app to talk to the database. To help verify that my production database configuration is working properly, I wrote a basic flask app, flask-db-test. It’s a simple app presenting a form which exposes a single database model, allowing you to see if reads and writes are working. In this post I will explain how the app works and describe the stack that I use for web development.

Even though the functionality of the flask-db-test app could be accomplished using a single python file, the project is organized using principles designed for large flask applications. The typical single-file ‘hello world’ flask test apps may be useful for basic testing that your deployment stack is operational, but then you may find that other incongruities exist when trying to make your larger applications work.

Thus, one of the design goals of flask-db-test is to use patterns that one will find in larger flask applications. Mainly, this entails organizing the project in multiple files, factoring logical groupings of application functionality into their own modules, using blueprints to attach the modules to the application at run time, populating configuration settings dynamically, and using a factory function to instantiate the application object. All of this is overkill for a simple test app, but it makes the test app resemble the larger apps that you intend to deploy. By following the same patterns and structure, you will (in theory) have an easier time getting your real apps working once the test app is working.
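To make the factory idea concrete, here is a minimal, framework-free sketch. It is not Flask's API and not the code in flask-db-test: create_app, stuff_routes, and the DEFAULTS values are all made up for illustration, and a plain WSGI callable stands in for the Flask application object. The point is only the shape: configuration is merged when the factory is called, feature modules are attached in a loop, and nothing is built at import time.

```python
# Framework-free sketch of the application-factory pattern (NOT Flask's API).
# Every name here (create_app, stuff_routes, DEFAULTS) is hypothetical.

DEFAULTS = {"DATABASE_URI": "sqlite:///flask-db-test.db", "DEBUG": False}


def stuff_routes():
    """A 'module' of related views, playing the role a blueprint would."""
    def index(config):
        return "stuff list, backed by " + config["DATABASE_URI"]
    return {"/": index}


def create_app(overrides=None, modules=(stuff_routes,)):
    """Build a fresh WSGI application from configuration at run time."""
    config = dict(DEFAULTS)
    config.update(overrides or {})

    routes = {}
    for module in modules:  # attach each module's routes to this app
        routes.update(module())

    def application(environ, start_response):
        view = routes.get(environ.get("PATH_INFO", "/"))
        if view is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        body = view(config).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]

    application.config = config  # exposed so callers can inspect it
    return application
```

Because each call to create_app returns an independent object, a test can build one app against SQLite and another against Postgres in the same process, which is exactly the kind of swapping this project is about.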

Now let’s take a look at the stack I use for most of my projects.

My stack:

Deployment environment: EC2
OS: CentOS
Web server: Apache
Scripting language: Python
WSGI adapter: mod_wsgi
Framework: Flask
Database: PostgreSQL
Database adapter: psycopg2
ORM: SQLAlchemy
Local development: OSX

Yeah I know all the cool kids are using nginx and green unicorn, but I lost my hipster license in an unfortunate altercation some years back at a juice bar when my neck beard got caught in someone’s copy of Gravity’s Rainbow. Since then, the elders of the internet will only let me play with apache, aka httpd, the Real hypertext transfer protocol daemon. So save your asynchronous posturing for the playground—this stack is strictly for grown-ups. Besides, my server doesn’t need a Supervisor to change its diapers.

Flask is an easy to use yet powerful micro-framework for building web applications in Python. Flask’s architecture is based on providing a very basic set of core features while allowing for plug-in extensions to provide additional features as the developer desires. Thus, even though the current stable version has not been updated since July 2013, the ecosystem around Flask continues to be very active because developers can use popular extensions like SQLAlchemy and WTForms to extend the basic functionality of Flask.

Personally I find Flask a lot easier to work with compared to larger frameworks like Django which can be more opinionated about how certain functionality, such as the template engine and database ORM, needs to be implemented. One reason I stick to Python for web apps, rather than the obvious choice of Ruby and Rails, is that I often need to work with machine learning pipelines or statistical data analysis and having easy access to scikit-learn, numpy, scipy, and RPy is required. Another reason is that while Flask is highly expressive, it isn’t as magical as Rails. Abstraction is a wonderful thing, but the distance with which Ruby shields you from what’s actually going on can be frustrating. When writing Ruby apps, I’m never quite sure what I’m doing, but with Flask I’m a lot more comfortable and can more readily see the reasoning behind why and where things need to go.

PostgreSQL is my first choice in databases. Based on experience with all the major RDBMS platforms, PostgreSQL is the one I trust the most. There are many technical reasons to choose Postgres over the other one, but these are well beyond the scope of this article and I don’t want to descend too deeply into religious beliefs. In any case, no matter what your choice of database, it is wise to avoid coupling your application to any proprietary extensions that would render your code non-portable. Try to stick with ANSI SQL as closely as possible. When it’s time to optimize (which, if you’re reading this, you haven’t reached yet), then it does make sense to explore the unique features of your database system that may speed things up. However, you should be prepared to have to switch database backends at some point, so make sure to code defensively and offer a functioning code path that will continue to work in a database agnostic environment.

I find EC2 to be the best deployment option for quickly spinning up and testing new web apps. Platforms like Heroku can be attractive as they simplify a lot of operational headache (deployment, database connectivity, ssl), but this can increase pain later when needing to scale or migrate to dedicated hardware. Every dependency you build into your app to save time and effort now can come back to haunt you later. And the pain will be much greater when you’re balancing the demands of a growing user base alongside burgeoning technical debt. Don’t get me wrong, Platforms-as-a-Service like Heroku and the various other pieces of the Amazon stack can offer genuine benefits, if you know how to use them correctly and if they truly make sense with your architecture.

In 1999, I heard a bon mot along the lines of “Show me a website that won’t run on a 400MHz Pentium and I’ll show you a dead dot-com.” I think a similar saying could be true today, replacing “400MHz Pentium” with “EC2 t2.small instance”. My point is to be smart and start small and self-contained. This provides you with the maximum flexibility for moving to larger instances or dedicated colocated servers, and gradually integrating third party services when the time is right.

My OS of choice is CentOS. Enterprise clients tend to like support contracts (at least the finance departments do!), and moving from CentOS to RHEL is almost completely hassle-free. There are a few things that irritate me, namely the horror show that is systemd and the fact that SELinux probably creates more problems than it solves, but on balance I prefer to do things the Red Hat way. It’s consistent, predictable, built for the enterprise, and rock-solid stable. If not for RHEL, I would probably be happily using Slackware or FreeBSD, but of course I’d have to get that neck beard going again.

Now that my biases are out in the open, let me try to explain what I’m actually trying to accomplish here and hopefully be of help to others using the same or at least some parts of the stack I use. It may be a surprise that I admit this, but I’m not really a good DBA! I’m much more of a power user of databases, but I still need to do a lot of research every time I deploy a new database server. One place where I always get tripped up is pg_hba.conf. I’m sure I’ve read the documentation for every Postgres version dating back to 2002, but since setup is for me a relatively rare activity, it never seems to sink in. I get it working, usually after trying a million different things without a rational scientific approach to what I’m doing, and then I have no idea how I actually got it to work in the first place!

Another problem that arises is the big move from development box to production. A lot of things are different when working on a local setup, even if it’s a VM running the same OS as your production build.

So, I decided to create a basic test case and experiment with different connection settings until I finally figured out how it all works. Thus, one of the primary purposes of flask-db-test is to provide a quick and easy test case which can swap in different database configurations with ease.

We’ve come a long way from serving scripts out of cgi-bin.

Remember why web application frameworks evolved: there’s an awful lot of common boilerplate and recurring patterns that most web apps need to solve. I still write cgi scripts from time to time for very basic one-off problems, but the minute they become the least bit complex I will find myself muttering, “wouldn’t it be great if I could use a template here?” or “I sure hope nobody needs to make any changes to the database after I’ve hard-coded all this SQL.” But, in order to provide you with all this power, web frameworks force you to make some compromises, learn how they work, and invest some time into architecting your app rather than merely slapping together a script. This is why it can be difficult to get all the pieces working together.

The purpose of deciding on a technology stack isn’t merely to make friends on the internet and participate in flame wars. The real reason to choose a stack and stick with it is to provide yourself with a consistent base that you understand in detail and on which you can quickly and confidently deploy your applications with a minimum of struggle. The more you use your stack, the more you’ll be able to recognize its shortcomings and begin to explore and evaluate specific alternatives when the time is right.

The Readme file for flask-db-test is terse but it does include all the steps necessary to successfully deploy the app on a fresh EC2 instance using the stack I’ve outlined above. For the remainder of this article I will walk through all these steps and add some additional detail so you can see exactly why and how everything works.

Install

So let’s get started with a fresh EC2 instance running CentOS 7. The link to the official CentOS7 AMI is here. Make sure your Security Group allows inbound ssh from your development workstation’s IP address and inbound and outbound http traffic to and from everywhere. After connecting to your instance for the first time, the very first thing you want to do is run an update. Also, get vim or another editor of your choice.

sudo yum -y update
sudo yum install vim

Next, create a user account unless you’re happy being known generically as ‘centos’. Also add yourself to these groups so you can have sudo privileges, and set up your ssh key.

sudo useradd coolnamebro
sudo passwd coolnamebro
sudo usermod -G adm,wheel,systemd-journal -a coolnamebro
sudo -u coolnamebro mkdir -p /home/coolnamebro/.ssh
sudo cp /home/centos/.ssh/authorized_keys /home/coolnamebro/.ssh
sudo chmod 700 /home/coolnamebro/.ssh
sudo chown coolnamebro:coolnamebro /home/coolnamebro/.ssh/authorized_keys

To make directory listings a little prettier and provide some quick shortcuts to frequently used commands, I always drop the following lines into /etc/profile.d/colorls.sh.

# sudo vim /etc/profile.d/colorls.sh
alias l='ls -alh --color=auto' 2>/dev/null
alias la='ls -al --color=auto' 2>/dev/null
alias ls='ls --color=auto' 2>/dev/null
alias p='ps aux'
alias t='top'

Now, let’s get apache working. We will need the following packages to get started. We will need gcc to compile the psycopg2 adapter later, so we will go ahead and install the development group as well.

sudo yum install httpd python python-devel python-virtualenv httpd-devel mod_wsgi
sudo yum groupinstall development

Next we are going to prepare the /var/www directory. We will add two directories: wsgi-scripts will be the entry point for all our mod_wsgi applications. The apps directory will contain all of our application code. We will also create a new group www, to which we will add ourselves and any other users who will need to administer the site.

sudo mkdir /var/www/{wsgi-scripts,apps}

# setup webserver admin group and directories
sudo groupadd www
sudo usermod -a -G www coolnamebro
sudo chown -R root:www /var/www
sudo chmod 2775 /var/www
find /var/www -type d -exec sudo chmod 2775 {} +
find /var/www -type f -exec sudo chmod 0664 {} +

The last two commands apply these permissions to everything already under /var/www; the setgid bit (the leading 2 in 2775) then makes new files and directories inherit the www group. Log out and log in again to pick up the new group membership.

Apache configuration can be a complicated beast; fortunately, the default install is relatively sane. I urge you to read the documentation to familiarize yourself with the configuration files and the various options you have as a web server administrator. Consider removing unnecessary modules (of which there are many). But for now, we are just going to do the bare minimum to get our test app working. This means enabling mod_wsgi and adding the necessary directives for apache to serve our flask apps.

Apache 2.4 ships with three MPMs (Multi-Processing Modules) which control how the server handles connections. By default it will use the trusty ‘prefork’ MPM but you may want to consider one of the newer options, ‘worker’ or ‘event’ (especially if you catch any flak from your nginx friends). We will stick to prefork for our test, which means that Apache will handle each new connection request with a dedicated child process. In addition, mod_wsgi has two possible modes of operation: embedded mode and daemon mode. In embedded mode, the Python sub-interpreter handling your application is embedded as part of the Apache process (or thread) handling the connection. Embedded mode requires the least amount of configuration but will be relatively heavy on system resources, particularly memory, especially if used in combination with apache prefork. To scale, you should consider running in daemon mode. The choice is up to you and it only requires two additional directives in httpd.conf. For reference, the mod_wsgi documentation is extremely well written and thorough.

sudo vim /etc/httpd/conf/httpd.conf

# add these lines at the end of the file
WSGIScriptAlias /flasktest /var/www/wsgi-scripts/flask-db-test.wsgi

<Directory /var/www/wsgi-scripts>
    Require all granted
</Directory>

Run apachectl configtest after making changes to your httpd.conf file to validate that you didn’t inadvertently bork anything. Then fire up Apache and visit your public IP address to make sure you see the default Welcome page. You will probably also want to enable apache to run at system boot.

sudo systemctl start httpd
sudo systemctl enable httpd
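If you want to confirm that the WSGIScriptAlias is wired up before any Flask code gets involved, a throwaway WSGI script can help. The sketch below is not the script shipped with flask-db-test; it is a hypothetical stand-in you could temporarily save as the .wsgi file. It relies on two mod_wsgi behaviors: it looks for a callable named application by default, and it sets the environ key mod_wsgi.process_group, which is empty in embedded mode.

```python
# Hypothetical diagnostic WSGI script: reports which mod_wsgi mode served it.
# Uses only the standard WSGI calling convention, no framework required.


def application(environ, start_response):
    # mod_wsgi populates 'mod_wsgi.process_group'; empty means embedded mode.
    group = environ.get("mod_wsgi.process_group", "")
    if group:
        mode = "daemon mode (process group: {})".format(group)
    else:
        mode = "embedded mode"

    body = "mod_wsgi is running in {}\n".format(mode).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Requesting /flasktest should then return a one-line plain text answer. If it does not, the problem is in the Apache or mod_wsgi layer rather than in your application.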

At this point we could proceed to installing flask-db-test but because I’ve made psycopg2 one of the dependencies, we will need to install Postgres first. (If you don’t want to bother with Postgres and would rather run the test using SQLite instead, you can skip this section and edit requirements.txt to remove the dependency: psycopg2==2.6.1).

PostgreSQL

Installing Postgres on CentOS or any RHEL derivative is a breeze. The first step is to exclude postgres from the default yum repositories.

sudo vim /etc/yum.repos.d/CentOS-Base.repo
# add this line to the end of both the [base] and [updates] sections:
exclude=postgresql*

Now, we will install the Postgres repo so that when we use yum, it will pick up the latest and greatest rather than the older version in the default repo. For more information, and to ensure you are using the latest url, please see these instructions.

sudo yum localinstall http://yum.postgresql.org/9.4/redhat/rhel-7-x86_64/pgdg-centos94-9.4-1.noarch.rpm

Take a look at what packages are available:

yum list postgres*

Next, we will install the core packages that are needed.

sudo yum install postgresql94.x86_64 \
                     postgresql94-contrib.x86_64 \
                     postgresql94-libs.x86_64 \
                     postgresql94-odbc.x86_64 \
                     postgresql94-server.x86_64 \
                     postgresql94-test.x86_64 \
                     postgresql94-debuginfo.x86_64

Initialize Postgres with the following command. After that completes, start up the server and also enable it to launch at boot time:

sudo /usr/pgsql-9.4/bin/postgresql94-setup initdb
sudo systemctl start postgresql-9.4
sudo systemctl enable postgresql-9.4.service

And now for the dreaded pg_hba.conf. This file controls how Postgres will evaluate whether to allow connections from clients. On our default install, you should see the following lines at the end of the file:

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            ident
# IPv6 local connections:
host    all             all             ::1/128                 ident

The first line handles any connections occurring over unix sockets and the latter two are for TCP/IP connections. The authentication methods ‘peer’ and ‘ident’ mean that Postgres will look up the operating system user id belonging to the connection and allow the connection to occur if there is a postgres user with the same name. Since our default install only has one user, postgres, the only way to currently connect to the database is to open a shell as the postgres user. If we were to create a new postgres user with a name matching our operating system username, then we could connect to any database on the server. Our web application will be connecting to postgres using the same user that apache child processes will run as, which by default is ‘apache’. So, we could add a new postgres user named apache and be on our way. This is not advisable, however, because it will allow any code that apache runs to connect to any database. What we should really do is create a user specifically for our app, and only allow it to connect to a database that it owns. You can read all about how this works in the chapter on Client Authentication in the Postgres manual.

To accomplish this, edit pg_hba.conf and add the following line above the first ‘local’ entry. We need to put this first or else our connection attempts will be matched by the default line, which will fail since we have no operating system user named ‘flasktest’.

# sudo vim /var/lib/pgsql/9.4/data/pg_hba.conf

local   flasktestdb     flasktest   md5

This requires md5-hashed passwords be used for local connections (unix domain sockets) to the database named ‘flasktestdb’ for user ‘flasktest’.
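The ordering rule is the part that always trips me up, so here is a toy model of it. This is a deliberately simplified sketch, not how Postgres is implemented: it ignores address matching and most keywords, and parse_records and auth_method are names I made up. It only demonstrates that the first record matching connection type, database, and user decides the method, with no fall-through to later lines.

```python
# Toy model of pg_hba.conf first-match semantics (simplified on purpose:
# no address/netmask checks, no role groups, no 'sameuser' and friends).

PG_HBA = """
# our new line goes first
local   flasktestdb     flasktest                               md5
local   all             all                                     peer
host    all             all             127.0.0.1/32            ident
"""


def parse_records(text):
    """Turn pg_hba-style lines into (type, database, user, method) tuples."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if fields[0] == "local":
            conn_type, database, user, method = fields[:4]
        else:  # host-style lines carry an address column before the method
            conn_type, database, user, _address, method = fields[:5]
        records.append((conn_type, database, user, method))
    return records


def auth_method(records, conn_type, database, user):
    """Return the method of the first matching record, or None if none match."""
    for rec_type, rec_db, rec_user, method in records:
        if (rec_type == conn_type
                and rec_db in ("all", database)
                and rec_user in ("all", user)):
            return method
    return None  # no matching record means the connection is refused


records = parse_records(PG_HBA)
print(auth_method(records, "local", "flasktestdb", "flasktest"))  # md5
print(auth_method(records, "local", "postgres", "postgres"))       # peer
```

Swap the first two records and the flasktest connection is matched by the catch-all peer line instead, which is the failure described above.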

After making this change you’ll need to restart postgres.

sudo systemctl restart postgresql-9.4

Now we have to create the postgres user and database that our web app will use. Since the only way we can connect to the database is from a shell owned by user postgres, we will first need to open a root shell, then use sudo to open a psql client as user postgres.

sudo /bin/bash
sudo -u postgres psql

Now we can issue the two SQL commands below to create the user (role) and database for our test app.

create role flasktest with login encrypted password 'test12345';
create database flasktestdb with owner flasktest;

Use \q to exit the psql client and exit to return to your normal shell—don’t go running around as root for too long!

To fix an issue that will come up when installing psycopg2, you’ll need to create a symlink to pg_config so the python installer will know where to find it.

sudo ln -s /usr/pgsql-9.4/bin/pg_config /usr/local/bin/pg_config

flask-db-test

Now we can actually install our test app. First, install git so you can clone the repository:

sudo yum install git

Now, enter the /var/www/apps directory and clone the flask-db-test project. We will move the packaged wsgi script to our wsgi-scripts directory, create and activate a virtual environment for our python interpreter, and then install the python packages necessary for our flask application to work.

cd /var/www/apps
git clone https://github.com/czep/flask-db-test.git
cd flask-db-test
mv flask-db-test.wsgi ../../wsgi-scripts
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

If you get a permission denied error when running git clone, make sure you logged out and back in so that your membership in the www group is known by the shell. If you get an error message saying “Error: pg_config executable not found”, see the above section on creating a symlink for pg_config. If you get an error saying “unable to execute gcc: No such file or directory”, make sure you’ve installed gcc using the yum command from earlier.
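Two of those errors come down to a program missing from your PATH, so you can check for them up front. The helper below is a small sketch of my own (missing_tools is a hypothetical name, not part of flask-db-test) built on shutil.which from the standard library, which needs Python 3.3 or newer.

```python
import shutil


def missing_tools(names=("pg_config", "gcc")):
    """Return the command names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# Example: report anything that would make 'pip install psycopg2' fail.
for tool in missing_tools():
    print("not found on PATH:", tool)
```

An empty result does not guarantee the build will succeed, but a non-empty one tells you which of the earlier steps to revisit.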

Usage - SQLite

To use flask-db-test, edit or add new configuration files in the local directory. To activate a config, create a symlink from config_live.py pointing to the desired config file. We will start with our sqlite config.

ln -s local/config_sqlite.py config_live.py
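When you are juggling several config files it is easy to lose track of which one the symlink currently points at. Here is a small sketch for inspecting it. I am assuming the config files are plain Python modules with uppercase setting names, which is the convention Flask itself uses when loading configuration; load_live_config is a hypothetical helper and not how flask-db-test necessarily loads its settings.

```python
import os
import runpy


def load_live_config(path="config_live.py"):
    """Return (real file behind the symlink, dict of UPPERCASE settings)."""
    target = os.path.realpath(path)       # follow the symlink to the real file
    namespace = runpy.run_path(target)    # execute it and collect its globals
    settings = {k: v for k, v in namespace.items() if k.isupper()}
    return target, settings


# Example usage from the project directory:
#   target, settings = load_live_config()
#   print(target)
#   print(sorted(settings))
```

Keep in mind that this executes the config file, so only point it at files you trust.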

Ensure your virtual environment is still active—your shell prompt should be prefaced with “(venv)”. We first need to initialize the database using the following command:

python manage.py init_database

If all goes well, you will see the file flask-db-test.db in the flask-db-test directory. If you connect to it in sqlite, you’ll see an empty table called ‘stuff’:

sqlite3 flask-db-test.db
.tables
.schema stuff
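The same check can be scripted with Python's built-in sqlite3 module, which is handy if the sqlite3 command line client is not installed on the server. This is only a sketch: list_tables is a name I made up, and the file name in the usage comment is the one from the step above.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of the tables in an SQLite database file."""
    conn = sqlite3.connect(db_path)  # note: creates the file if it is missing
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example usage from the project directory:
#   print(list_tables("flask-db-test.db"))
```

If init_database worked, the returned list should include 'stuff'.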

However, there are a few things wrong with the file as is, and a few frustratingly sub-optimal things we need to do to get it to work. Frankly, sqlite should not be used for production in a web app. Single-user, embedded systems fine, but not a multi-user web site. So, I’m not terribly concerned with getting this to work in a production worthy manner. The main issues surround the permissions model and SELinux. Since you created it with a script running as your user, it will be owned by you, and is going to have permissions of 644. We really need the apache user to be able to write to this file. In addition, we need apache to be able to write to files in the flask-db-test directory and we will need to alter some SELinux labels in order to allow apache to write to the database file. First,

sudo chown apache flask-db-test.db
sudo usermod -a -G www apache

Next, SELinux. I admit I have no interest in figuring out the proper contexts because I’m never going to be using sqlite in production like this. So to get this to work, just set selinux to permissive mode (sudo setenforce 0). But, promise me that you will set it back to enforcing (sudo setenforce 1) after this test is done. I hate to be yet another “just turn off SELinux” guy, but yeah, just turn off SELinux and it will work.

Now, restart apache (or if using daemon mode, touch the wsgi script).

# mod_wsgi in embedded mode
sudo systemctl restart httpd

# mod_wsgi in daemon mode
touch ../../wsgi-scripts/flask-db-test.wsgi

Now turn SELinux back on before I 0wn your server.

Usage - Postgres

Let’s try activating the configuration for using Postgres as the database backend. First, edit the file local/config_postgres.py and enter the proper credentials in the SQLAlchemy database URI setting.

SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://flasktest:test12345@/flasktestdb'
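Notice that there is nothing between the @ and the / in that URI. With no host given, psycopg2 connects over the Unix domain socket, which is what our ‘local’ line in pg_hba.conf matches. The sketch below shows one way to assemble such a URI and take it apart again. build_uri is a hypothetical helper of mine; the percent-encoding matters once your password contains characters like @ or /.

```python
from urllib.parse import quote_plus, urlsplit


def build_uri(user, password, database, host="", port=None):
    """Assemble a postgresql+psycopg2 URI, percent-encoding the credentials."""
    netloc = "{}:{}@{}".format(quote_plus(user), quote_plus(password), host)
    if port is not None:
        netloc += ":{}".format(port)
    return "postgresql+psycopg2://{}/{}".format(netloc, database)


uri = build_uri("flasktest", "test12345", "flasktestdb")
parts = urlsplit(uri)
print(uri)             # postgresql+psycopg2://flasktest:test12345@/flasktestdb
print(parts.hostname)  # None, i.e. no TCP host: use the local socket
print(parts.path)      # /flasktestdb
```

Passing host="127.0.0.1" instead would route the connection over TCP, where the ‘host’ lines in pg_hba.conf apply rather than our new ‘local’ one.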

Delete the symlink and remap it to point to the postgres config file, then initialize the database.

rm config_live.py
ln -s local/config_postgres.py config_live.py
python manage.py init_database

Go back to a root shell and connect using psql to the flasktestdb database:

sudo /bin/bash
sudo -u postgres psql flasktestdb

You should now be able to see a table called ‘stuff’ in the default list of relations. Check it out:

\dt
\d stuff

Before continuing, we have a couple of SELinux commands to run so that apache can talk to psycopg2:

sudo semanage fcontext -a -t httpd_sys_script_exec_t /var/www/apps/flask-db-test/venv/lib/python2.7/site-packages/psycopg2/_psycopg.so
sudo restorecon -v /var/www/apps/flask-db-test/venv/lib/python2.7/site-packages/psycopg2/_psycopg.so

Restart apache, and visit the site again. Bask in the glory of the goodness!