Auto-Reflecting Tables and Columns in SQLAlchemy

python and reflection on water

How a Few Lines of Code Makes me Want to do Cartwheels Down the Street

If you ask me, Python and databases both jumped in the air and kicked up their heels the day SQLAlchemy made its first appearance. Have any of you had to struggle with writing database queries and other complicated SQL statements? Look no further: SQLAlchemy helps abstract those pesky details to a better place! It is a very comprehensive package that uses an Object Relational Mapper (ORM) to help you interact in a much more Pythonic way with your databases.

And… it is (mostly) database-agnostic.

I say mostly because you still need to know enough to be dangerous to properly configure your particular dialect.

What I like about SQLAlchemy is that I can use objects in Python along with their typical syntax. What I do not like was that I can use objects in Python along with their typical syntax.

Yes.

You read that correctly.

Things were great when I first started using this package.

But like any relationship there are ups and downs.

I figured out that my database-driven needs would not be fully met by having to declare all objects and their properties up-front.

I needed something a bit more dynamic.

Sooooo…. after a lot of soul-searching and digging around the SQAlchemy docs, I made an important discovery: You don’t need to use class-based attributes in order to leverage the awesome power of session-based queries for your databases.

In short: this opens up a whole wealth of possibilities!

I have to take a brief segway here to mention that the code in this article is licenced under the Apache 2.0 software licence. The following is a portion of the Python code we will be looking at:

Simple, right?

Well, kind of: allow me to explain.

I have authored several Python scripts that use Pandas and SQLite. For those who have not yet had the pleasure of working with Pandas it is an amazing package with tons of applications in Data Science and many other fields. I marvel at the simplicity at which one can specify various keyword arguments to various Pandas functions in order to achieve amazing results. My scripts use SQLite databases to store incoming, ‘mid-stream’ and result datasets. While SQLite is great for prototyping I am now at a point where I need more.

Like authentication.

I could port my existing code to be able to point to a different database, but what I really want is something dialect-agnostic. Enter SQLAlchemy and its powerful ORM. It allows developers to interact with their databases as Python-friendly classes. Very cool!

Session-Based Queries

One of SQLAlchemy’s huge benefits is session-based queries which employ Python object-oriented syntax rather than SQL statements. For those who have worked with two or more database technologies, I feel somewhat sheepish in spelling out why this is beneficial. For those who have not had the pleasure, it is like hiring a translator when speaking with each of the outgoing, highly-conversational people on an around-the-world Zoom call.

SQLAlchemy’s ORM is your translator. Having ‘object’ in the title means it is keen on… objects. There are a couple of packages such as Elixir and SQLSoup that extend SQLAlchemy; they too use an object-oriented approach. Which is fine if your code uses (drum roll please…) objects.

One slight problem is that some *ahem* most of my code uses JavaScript Object Notation (JSON) to configure parameters at run-time. I have a hockey-sock full of riveting tidbits about JSON: these are the subject for another article. Suffice to say that many of my Python scripts use string-based parameters obtained from JSON for their configuration.

After mucking around in the SQLAlchemy documentation combined with some inspection via the Eclipse debugger, I cobbled together a small but hefty code snippet that enables me to refer to each and every one of my tables and columns within a given database by name, vice having to specify them as objects and attributes.

When I first saw an error-free console after cobbling this code together I wanted to do cartwheels down the street!

Some may be asking ‘so what, why should I care’? Perhaps a more fullsome example is appropriate. Before diving into the code below, if you are wondering how to configure a database using SQLAlchemy I recommend you check out the following helpful articles: SQLAlchemy — Python Tutorial by Vinay Kudari and How to use Python SQLite3 using SQLAlchemy by Mahmud Ahsan.

Once you’ve got your Python environment configured along with a test database, go ahead and load the following snippet into your IDE of choice:

With a SQLAlchemy engine configured you can supply it to the get_data function along with the name of a table and column from your test database. Say our test database contains a table called ‘ThisIsATable’ and in it a column called ‘ThisIsAColumn’. Within ‘ThisIsAColumn’ we have several text-based rows, at least one of which contains ‘Please try to find me’.

Summary

If all goes well you should have a Pandas DataFrame containing a sub-set of the table from your test database. Thanks to a few lines of code we are now free to write our database-agnostic queries that are also dynamic. With these queries, we can obtain arbitrary slices data for Pandas to chew on, thanks to simple string-based parameters.

From an ‘easy to maintain’ perspective, the code leverages public functions and attributes. Calling on private attributes is generally ill-advised since these could change in a future release; I tend to shy away from private attributes whenever possible. A point worth mentioning is that the Session in the above code is invoked within the function get_data. This is not the recommended practice. Please see this article by A Gordon for further information on creating and using Sessions. The SQLAlchemy docs also contain information on creating and using Sessions, specifically the section entitled ‘When do I construct a Session, when do I commit it, and when do I close it?’.

Armed with the above code, you and your proof-of-concept Python scripts are now able to autoload and reflect database metadata by specifying only a bare minimimum of information via handy-dandy strings.

Strings! Such wonderful things.

Say it with me: Strings are wonderful things!

In my opinion, this represents the dawn of advanced querying via the database-agnostic goodness of SQLAlchemy, together with the hugely powerful Pandas library combined with string-based keyword arguments! I think another snake image is appropriate: it sums up the relationship amongst SQLAlchemy, Pandas and text-based interaction. Together, they form a trifecta that is primed for the limitless expanse of Natural Language Processing along with many other use cases.

Snake forming a triangle with the words ‘SQLAlchemy’, ‘Pandas’ and ‘Text-Based Interaction’ along three sides.
Snake image courtesy of Laura Barry on Upslash

What are your experiences with Python and SQLAlchemy? Please share!

E

Leave a Comment

Required fields are marked *