New series: The Python Tourist

I've started writing what will (hopefully) be a series of articles about common pitfalls in Python programming. Originally I was going to call it "Things (Not) to do When Visiting Python", but shortened that to "The Python Tourist". "Tourist" seems appropriate, since I move around a lot between languages and tend to forget all those little "gotchas" while I'm away. I'll probably do one for PHP and maybe JavaScript, as I start to learn more about those. If someone has already patented "The [whatever] Tourist" for a title, leave a comment and I'll try to think of something else.

First article: Passing Mutable Objects as Default Args

Incidentally, there are several other similar lists out there already, "Python Warts", etc. I'm just documenting the ones that I tend to screw up, for my own quick reference.
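
For a quick taste of the pitfall named in the first article's title, here is a minimal sketch (the function names are made up for illustration and are not taken from the article itself):

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` appends to the same list.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Using None as a sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")   # surprise: this is the same list as `first`
third = append_fresh("a")
fourth = append_fresh("b")    # a separate list each time
```

The usual workaround, shown in `append_fresh`, is to default to `None` and build the mutable object inside the function body.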

RSS via email

I've tried several types of RSS readers, but haven't had much luck finding one that works the way I want it to:

  1. Standalone readers: The problem I've found with them is that they don't (and really can't) maintain historic information on the feeds you subscribe to. You just get a snapshot of the current feed state, so you would have to stay connected 24 hours a day to not miss anything.
  2. Web-based aggregators: I used the Bloglines aggregator for a while. I liked that it kept track of old posts so I didn't have to check feeds 24 hours a day to stay current. The thing I didn't like was that you had to mark an entire feed as read/unread, and couldn't mark individual posts. (You can mark certain posts to keep, but that's not the same thing.) And if you accidentally click on the wrong icon, it marks your entire set of feeds as "read". Argh! That happened a lot ...

In my frustration, I did a search for RSS readers on Wikipedia, and discovered something I never knew existed: email-based aggregators! [UPDATE 4/24/2006: Wikipedia has inexplicably taken down their list of RSS readers. A very similar list can be found here: RSS Aggregators] There are several around, but after looking at their sites, I chose Squeet. This is how RSS reading should be! Now posts are delivered to my mailbox, where I've set up filters to place them into various folders by topic. Each post is an individual item, so I can mark them read/unread/deleted just like normal messages, and they happily sit there forever until I get a chance to read them. What's more, switching machines is now painless - the feeds sit in my IMAP account, so I can grab them from any machine (I just have to set the filters up on each machine).

Moving to jEdit (from XEmacs)

I've been using XEmacs as my primary text editor forever. However, it has some annoying features that have bugged me over the years:

  1. Syntax coloring is somewhat limited - not enough unique "things" can be colorized.
  2. Certain modes (e.g. python-mode) have a "shift block left/right" feature, but it is not available in all modes.
  3. HTML mode is a pain. First, it asks for my email address every time I start a new page, so it can create an annoying template that I have to delete, and secondly the formatting really doesn't work very well.
  4. When I edit C/C++ code on my Windows XP machine, and edit the same code later under Linux, the formatting is not the same. For some reason the indentation gets screwed up. (Possibly a version-creep thing between the two machines, but still, it shouldn't get screwed up.)

I've tried many editors over the years as a replacement (vim, cream, NEdit, jed), but none have matched XEmacs, so I've lived with its misfeatures. I had tried jEdit a few times before, but stumbled on two things:

  1. I've found no way to assign a shortcut to run an external process like "make". This is an area where XEmacs really shines. After "make" runs, you get a clickable list of errors and can immediately jump to the problem spots. jEdit seems to have no equivalent functionality, though I'd love to hear that I'm wrong on this.
  2. Although jEdit is cross-platform, fonts under Linux (xorg-x11) are really limited and ugly for some reason.

So I trudged on with XEmacs ...

The final straw for XEmacs came last week as I was editing some HTML and PHP code for the site, and XEmacs kept splitting the screen to give me a really bizarre error message that I could neither track down nor figure out how to turn off. Editing any HTML/PHP page seemed to do the same thing. Given the other warts, and since I'm getting into an HTML/PHP phase right now, I decided to take the plunge and try to switch to jEdit.

Preventing snooping with .htaccess

When a visitor goes to your site, say http://www.example.com/foo, if the webserver doesn't find a file like "index.html", "index.php", etc. in "foo", it will by default let the visitor browse your directory tree. Personally, I think that this should be OFF by default, and only enabled if you really want people to freely browse your files. But, it is easy to turn off. All you have to do is place the following line in your .htaccess file:

Options -Indexes

If you place this in the .htaccess at the root directory of your site, it will automatically apply to all directories underneath. Then, if there are directories you want people to browse, you can enable them by placing an .htaccess in the particular directory with:

Options +Indexes

If you only want to hide certain files from the generated listing (they can still be fetched by anyone who knows the URL), you can do that with:

IndexIgnore *.php *~

(This example blocks the listing of .php and *~ [backup] files.)

I was a little surprised to find that my webhost didn't set "Options -Indexes" by default, so it's a good thing to check.
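
One rough way to check is to look at what the server sends back for a directory that has no index file. The sketch below is only an illustration: the helper name is made up, and it relies on the fact that Apache's auto-generated listings carry a title of the form "Index of /path".

```python
import re

# Apache's mod_autoindex titles its generated pages "Index of /some/path".
_LISTING_TITLE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(html):
    """Return True if `html` resembles an Apache auto-generated index page."""
    return bool(_LISTING_TITLE.search(html))
```

Feed it the body of a page fetched from a directory URL such as http://www.example.com/foo/. With "Options -Indexes" in effect, the server should answer with a 403 Forbidden error instead of a listing.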

A little PHP security

I've been picking up a few things about securely running PHP apps while getting the site up and running. I am by no means a PHP security expert, but I thought I'd write down the things I've found out so far so I can find them again when I forget. The first big thing that bothered me was that I had to make all my files world-readable to get anything to work. This bugged me, so I searched around a while for a better way to do it.
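
To see how many files are affected, a small audit script can help. This is a minimal sketch, assuming a POSIX-style filesystem; `world_readable_files` is a name invented for this example, and it only reports files without changing any permissions.

```python
import os
import stat


def world_readable_files(root):
    """Yield paths under `root` whose mode grants read access to 'other'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            # Only report regular files; skip symlinks, sockets, etc.
            if stat.S_ISREG(mode) and mode & stat.S_IROTH:
                yield path
```

Running `sorted(world_readable_files("."))` from the site's root directory gives a list to review by hand.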

Shredding with SLAX

Lately, my favorite Linux live-cd is SLAX. I run Gentoo on my main system, but a live-cd is useful in a number of situations. I've used KNOPPIX in the past, but it is getting to be a little heavy for my tastes, with versions now taking up the better part of a DVD. SLAX is nice and small, just under 200 MB currently - quicker to download and burn in those "I need a live-cd NOW" situations. Since I'm more interested in having a toolkit that I can use as the need arises, rather than a full-blown system for day-to-day use, the smaller live-cds fit my needs better.

Ironically, my primary use of Linux live-cds over the past few years has been as a tool to rescue Windows systems that have become unbootable, either from spyware/viruses, hardware failures (thanks Dell! Twice!), or random acts of Windows weirdness. In those situations, I will boot from the SLAX CD, copy the Windows data across the network to my primary Linux box via samba, wipe the disk with dd (so the Windows installer doesn't try to "repair" the bad system), then rebuild and copy the data back over samba.

This weekend we decided to clear out a closet of old computers and donate them to charity. The problem was that most of them had been used in a family business and contained confidential client data that we didn't want to ship out to the world. Some of the machines were bootable, some were not, but clearly, booting into Windows and trying to delete the files from a running system was a bad idea (first, it is hard to know you've gotten everything, and secondly you can't wipe the entire system while it is running). So, time to boot into a SLAX live-cd.
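
The basic idea behind wiping from a live-cd is to overwrite the storage itself rather than delete files through the installed system. The sketch below shows that idea as a single pass of zeros, much like pointing dd at /dev/zero. It is an illustration only, not the procedure from the full article: the function name is made up, and one zero pass is an assumption about what counts as "wiped". It is destructive, so try it on a scratch file first.

```python
import os


def overwrite_with_zeros(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing file with zeros, in place.

    Returns the number of bytes written.
    """
    zeros = bytes(chunk_size)
    with open(path, "r+b") as target:
        # Seeking to the end reports the size for regular files and,
        # on Linux, for block devices as well.
        size = target.seek(0, os.SEEK_END)
        target.seek(0)
        remaining = size
        while remaining > 0:
            step = min(chunk_size, remaining)
            target.write(zeros[:step])
            remaining -= step
        # Push the data out of Python's and the kernel's buffers.
        target.flush()
        os.fsync(target.fileno())
    return size
```

Pointed at a whole-disk device from the live-cd (as root), this would cover the whole disk instead of relying on file-by-file deletion.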

More on Wordpress "Import" ...

I thought that perhaps I was imagining it, but I went through a great number of my old posts on my blogger.com site and confirmed it: Wordpress screwed up the formatting of the blog entries. I have to ask: Why? Why should Wordpress need more than read-only access to import my blog entries? It's like it was written to intentionally screw up your blog so there would be no going back.

Given the defacement of my main page, and now the screwing up of the entries themselves, I'd wholeheartedly recommend staying away from the "Import" function in Wordpress.

All about Python and Unicode

And now for the first real bit of content, having nothing to do with the underlying mechanics of making the site work ...

This is a paper I wrote last year documenting the research I did to understand the ins and outs of how Python and Unicode interact. I include it here in case someone else has the same sort of questions I did, and also to bring it up to the top level of the site since there is (now) no direct link to it anywhere else.

Paper: All About Python and Unicode

Small hack required to fix RSS support

I ran across a problem with RSS support in the default Wordpress installation: The RSS (and RSS2, Atom) feeds were being given URIs of the form feed://. Apparently this is a new style way of denoting feeds that is being implemented in some RSS readers. Unfortunately, Firefox didn't recognize it, and neither did (one of) the RSS readers I use (Habari Xenu). The fix was just to remove the feed: prefix from the URIs.

To adjust this, just edit the footer.php file, which can be found in the theme directory, for example:

       /blogroot/wp-content/themes/default/footer.php

Remove the "feed:" strings from this file and all should be well again.
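
If you would rather not edit by hand, the same change can be scripted. This is a rough sketch that assumes the prefix always appears at the very start of an href attribute value; the helper name and the sample link are invented for illustration and are not copied from the actual theme file.

```python
import re

# Matches "feed:" only at the start of an href value, so other uses of
# the word "feed" in the template are left alone.
_FEED_HREF = re.compile(r"""(href=["'])feed:""")


def strip_feed_prefix(text):
    """Return `text` with the "feed:" prefix removed from href values."""
    return _FEED_HREF.sub(r"\1", text)


sample = '<a href="feed:http://www.example.com/feed/">Entries (RSS)</a>'
fixed = strip_feed_prefix(sample)
```

Read footer.php into a string, pass it through `strip_feed_prefix`, and write the result back (after making a backup copy of the file).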

Ouch ... Wordpress trashed my old blog.

Well, it certainly didn't take long for me to hit a problem :-(

Wordpress has this tempting little "Import" button in the administrator panel. When you click on it, it offers to import your existing blog content from a variety of places, such as blogger.com. That seemed convenient, so I told it to import my site, and then sat back as it defaced my blog! It replaced my theme (which I had spent considerable time customizing) with this ugly page that looks like a server error, with some advertising for Wordpress. If I hadn't known better, I'd have thought my site had been hit with a rootkit.
