kajtajm: Planning my first Python / PyObjC / MySQLdb project
Never have I planned a program so much in detail before starting coding. And stranger still, I’m planning the program to be coded in a language in which I haven’t written a single line of code. Nonetheless, I’m eager and it feels good.
I woke up early this Sunday, my mind concentrating on the many steps I will have to take before the first project will see the light of day. I had planned eight prerequisite lessons before even starting the coding. And despite receiving many good pieces of advice, nobody had relieved me of my main worry — the input grid for the data entry in matrix format.
To speed up things, I thought of writing some auto-generated INSERT statements from the current Google Spreadsheet that I’ve been using. I would then at least have the basic data from the first two weeks of the year to play around with in my MySQL database.
And then it struck me: I don’t need a grid at all, ever. I’m much better served by a mere text field.
Look at how I record my time bookings right now as I go, in a flat file in the standard Mac text editor (a bit sloppily when it comes to indentation):
Sun 17.1.2010
0700-0730 PyObjC Thought of how to get rid of the grid
-0800 kajtajm Planned the object structures
-0830 routine Showered and dressed
-0845 PyObjC Planned a blog entry
This is an extremely fast way of booking time. Why wouldn’t I use the same simple syntax in a Cocoa text box instead of a grid? Using a multi-line text box for input will relieve me of finding a grid, learning it, and fighting it to follow my needs. And I’d like it to behave like a text field, anyway. The pretty straight column formatting of a grid is a forsakable luxury for me.
I would assume Python is easily able to parse my sloppily indented text and enter it properly into my MySQL table, which I plan to look like this:
CREATE TABLE kajtajmrow ( kajdate DATE, timefrom SMALLINT(4) ZEROFILL, timeto SMALLINT(4) ZEROFILL, project CHAR(10), kajtext CHAR(40), hours DECIMAL(8,2), keyword CHAR(10), PRIMARY KEY(kajdate, timefrom));
I already made a couple of technical decisions in writing my CREATE statement.
- Field naming. Short names. To get rid of reserved words, I prefix the fields in question with “kaj”, as it’s then clear the name is something I came up with, and as it relieves me of inventing supposedly more descriptive names that are so long they render my SELECT output hard to read. I think the end result is descriptive enough and will minimise distraction and misunderstandings later on when I code and work with the MySQL client command line.
- Time formats. I hardly expect to do much insightful statistics based on the time of day, so I reduced the times to convenient input devices. I avoid the hassle of converting times to and from a TIME format that in this case pleases only the computer. I made them SMALLINT(4) to reduce space consumption which may give faster indexing when I load all my thousands of time bookings from the 1980s and 1990s, and I added ZEROFILL as my eyes recognise 0815 as a time quicker than 815. Besides, I intend to strictly enforce times ending in 00 15 30 or 45, so I will need to make considerations for checking the format anyway.
- Hours format. The DECIMAL(8,2) gives me a maximum of 999999.99 hours, and if I track on the average of 3000 hours a year, this should do it for the next 300 years. The field is able to represent hours that I don’t want, such as 3.14 h, but that’s a blow that I’m willing to take. Worse, I will want to report hours mostly as integers, so I will need to format them upon output. At times in reports, I may wish to use the fractions present in most fonts, such as 3½, 2¼, 5¾. That’s as accurate as 3.5, 2.25 and 5.75 but consumes less space and looks more similar to 3, 2 and 5. Luckily, hours just a computational, non-editable field derived from timefrom and timeto. Still, I decided to store it, as the parsing rule for calculating hours from timefrom and timeto is more easily implemented in Python upon editing, than in MySQL upon reporting (where it’s doable but cumbersome).
- Text fields and formats. I chose char(10) for the main project field. In ten chars, allowing mixed entry, I can quickly enter a fairly descriptive project code. The second text field is the free-form text explaining what I’ve done, where char(40) can contain sufficient text to explain what I’ve accomplished, without spending too much horizontal space on the screen or in a report (and if I need to explain in more detail what I did in an two-hour booking, I split it into two one-hour bookings). Finally, as longer projects have subgoals or subtasks, I added another redundant computational field in addition to hours: keyword. I will parse it in Python to be the first word of kajtext, again expecting simplified SELECT syntax when reporting. Sure, hours and keyword are strictly speaking redundant fields. But I can easily check for errors, and I expect the savings in clarity and lack of redundancy in the code to far outweigh the drawbacks of redundancy in the data.
- The keys. Theoretically, the database structure allows for double bookings, with one task for 1015-1130 and another 1030-1115. I may need to write a cleanup utility to identify such double bookings later on.
With the database structure set, it’s time for planning the Python program. As I noted in a previous post, I need an input and an output screen.
For mere input of new time bookings, I don’t expect many complexities in the UI. I’ll enter bookings into an empty text box, either writing or pasting text according to the simple syntax in the example on top.
- If data is valid and interpretable, then clicking on the [Import] command button will successfully parse user input and convert it into INSERT statements.
- If data isn’t valid, I think nothing shall be entered into the database. Rows which don’t pass (1014 is not a valid time in kajtajm) should be prefixed with a “?”, perhaps with the slight modification that rows with too long values for the project or kajtext fields are prefixed with “!” for easy distinction and thus better usability. (The redundant keyword field I’ll just truncate after ten characters, as keywording lines is optional and all data is present in the kajtext field anyway).
The above method of giving user feedback avoids the necessity of any further error messages, that would merely clutter both my code and my user interface. As a user, I think I will be able to understand how to edit invalid rows and remove the “?” or “!” by hand before trying [Import] again.
For editing existing time bookings, it’s trickier. Yes, I can use mostly the same syntax as for pure entry. But to be able to edit, I won’t have only INSERTs but will need also UPDATEs and DELETEs. And I will need a mechanism for the end user to identify which rows to edit.
Selecting the rows to edit should be an easy thing. I can offer input fields for kajdate (a range), project and kajtext. With “%” in the project and kajtext input fields, I can make the search into a LIKE. And I probably also need a maximum rows input field (default perhaps 50), to limit the number of rows retrieed into the text field.
Rows that the user didn’t touch don’t need any action, so the first thing I plan to happen when the user presses the [Merge] button (it’s no longer a mere import) is to compare the rows that are in the text box at that point in time with the rows that were read into memory from the database and served as the basis for user editing in the text box. If there are no changes, no MySQL statements are needed and database traffic will ensue (as it would cost response time even if I am at localhost).
This is how I plan to treat a row (in the form of a parsed time entry) that somehow differs from what was read in from the database.
- A disappeared row. This would perhaps intuitively indicate a delete, and does mean that the end user has deleted the row from the inbox. But that may have been by mistake, after having painstakingly entered fifteen new rows. So if I allow deleting rows this easily, I’m probably in for involuntary deletes and low usability. Besides, the user might want to simplify editing of rows by cleaning away correct rows from the text field, thus eliminating clutter. Anyway, I have decided to disregard a discovery of a disappeared row, and solve the delete need by requiring explicit, conscious deletion of some kind (see the next item!).
- The primary key (i.e. combination of kajdate and timefrom) existed in the set that was read in from the database, but at least one other field has changed. This I should likely convert to a mere UPDATE of the non-key fields of timeto, project, kajtext, hours and keyword. The exception would be that the user consciously wants to do the aforementioned delete. The best way to do this, that I’ve come up with, is to enter a special project code of “d” as in “delete”. Sure, that prevents the usage of such a project, but I wouldn’t want to use it anyway, and most importantly, I don’t want to make it easy to delete rows by mistake.
- The primary key didn’t exist in the set that was read in from the database. That sounds like a clear INSERT case to me, and that’s what it is.
In the input screen, I will probably want to provide some semi-hard coded data entry statistics, such as the hours worked this year for each of the six top projects (however many I decide will fit on the screen), or the total number of hours and rows entered. Most importantly, the input screen should show the time booking with the highest kajdate - timefrom combination, to remind me of where I need to continue my data entry.
The real benefit of keeping track of time comes from interesting reports. And here, I’m really looking forward to slicing and dicing the data in Python, given its tuple concept that enables array computations (smells like APL to me!). But first, the report has to be defined. The definition of the screening criteria can be the same as for the editing screen. Sorting needs extra attention. Possible sorting and aggregation fields are year, month, week [1], date, project and keyword. The way I thought of doing this is to simply allow the end user to write a code like YMP (year-month-project), P (project), MD (month-date) or PM (project-month) in a text field. Remember, the user is me, so I don’t need a lot of UI distraction. In fact, i consider such a text field has a high usability.
The above report definition is quite flexible, but has its limitations. For one, I would have fixed numeric columns, and for the first kajtajm release, I will always provide two columns: hours and percentage of total hours. Later, I could conceive of matrix reports where I have different columns for different projects, keywords or days of week. Also, I can imagine longer, more descriptive project names and sort orders, but for now, that’s strictly a distraction.
But now for the fun part that I have missed since I drifted into management and gave up programming: the matrix slicing and dicing of the reporting cube, at will. The fun starts already with converting the selection criteria into the WHERE part of the SELECT, and the sorting fields to a GROUP BY, where the only complexity (and it’s a minor one) is to extract YEAR(), MONTH(), and WEEK() out of kajdate.
I then expect to read the entire SELECT statement into memory in one fell swoop, then do both subtotal calculations and report formatting from the memory tuples. Here, I’m not yet sure of the Python structures needed. I haven’t even coded my “Hellöu wörld” yet. But I think I need a dynamically defined object (forgive me that I probably don’t use the proper Python lingo [2]) — let’s call it rptrow — with one attribute for each sorting / aggregation field, one for the hours and one for the percentage. I’ll then loop through the SELECT answer set, and enter a tuple of rptrow into an array for each row in the answer set, plus an additional row each time when a non-leaf-level aggregation field changes (to make room for the subtotals). I plan to do the subtotalling on top, since I like to read the subtotals in subheaders, not subfooters. With the SELECT set finally read into memory including spare rows for subheaders, I just need for Python to compute the subtotals and percentages. Then, my report is done, although it’s still only in memory.
The final proof is in the output of the report.
- Flat file, fixed font. To see whether I’m on track, I should just dump the memory structure into a flat file or a text field for a fixed font. This should even be quite legible on-screen, and can be used as input for prettier reports in, say, Google Docs or an email.
- Flat file, comma separated. I will also want to pretty-format some reports with non-fixed fonts but still straight right-aligned columns for hours and percentages. That can be done in a spreadsheet. The easiest way to import is probably using a comma separated file (and it would likely suffice that I write the “file” into a text field, from which I can copy it to the clipboard and then a file).
Perhaps I want to pretty-format a report directly from kajtajm, generating a PDF? Hmm, I tend to think this should be left on the wish list, like the direct clipboard dump. But if I come up with a way to paste the report into Google Docs as a table, I’d be happy.
That should be it! Of course I still need to write the program, but it does seem a lot more manageable now, after this “programming by thinking while blogging” session.
[1] Weeks are numbered 1-52 (some years 1-53) in a standard way in Europe.
[2] I owe it to Patrick Crews, database QA engineer at Sun/MySQL and self-confessed Python lover, that I came this far. Without his inspiring and unsolicited explanation of some key Python concepts, I wouldn’t have felt comfortable enough to go ahead with this thinking.


Use PyObjC to code Python under the Mac Cocoa framework, and MySQLdb to interface with MySQL! That’s the advice I got in reply to my
What tool should I use to develop small MySQL end user applications on my Mac?


In the MySQL Community team, our charter is to serve the MySQL community — new and old MySQL users alike. One of the ways we do this is by facilitating information exchange between community members, where the new can learn from the old.
Currently, there are 217 entries in it. It contains what our user community think it is the-best-of-the-best from what the community has produced, and it’s tagged and voted for.
As I’ve 

