Unformed thoughts from scripting

This is the post I was talking about in the previous post. I had a couple of tabs open for the last few weeks, loosely related to bash-scripting. I want to capture these links so I can allow myself to close the tabs, and I also want to explain why I think they are useful. Then I’ll ramble on for a couple of paragraphs about some other thoughts that arose in the process of pulling these explanations together (accompanied by other links of course).

  1. the ‘test’ man-page
  2. tldp discussion of ‘here’ documents

I’ve come to like the “test” command as a handy tool when automating any tasks that act upon the file-system. This is probably something that many would think is ugly, but I like it and here’s why …

So obviously you understand why I need to check the status of files (existence, readability, writeability). No need for me to explain that. If you think using the ‘test’ command is bad, it’s probably because you think it inelegant. You believe that using more idiomatic language features is the way to do this kind of thing. You might be smarter than me; you might think that my reasons for choosing this approach relate to cognitive limitations (which is true, but not invalid), or to laziness in not taking the time to learn the details of said language features.

I like the ‘test’ command precisely because it is not idiomatic. In the spirit of the unix philosophy it is a small, simple program that does one thing very well. I know exactly what it is going to do in any setting. There are no obscure syntax rules surrounding it. I don’t need to memorize any special cases. I can copy and paste single-line expressions til the cows come home and I can even use them on the command-line unmodified.

# print "file exists" if myfile exists
$ test -f myfile && echo "file exists"

could alternatively be expressed as:

$ if [ -f myfile ]; then
>     echo "file exists"
> fi

which is more verbose and idiomatic, but has drawbacks: it takes three lines as opposed to one, and it relies quite heavily on some loaded syntax rules. It can’t be easily transplanted to another context. It is awkward to paste onto the command-line, or to keep in your history. There are also various language subtleties in place here: spaces don’t matter in some places (e.g. indentation) but do in others (e.g. the spaces between ‘if’ and ‘[’, or between ‘[’ and ‘-f’).
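
To make the "copy and paste single-line expressions" point concrete, here is a minimal sketch of the file-status checks mentioned earlier (existence, readability, writeability), one ‘test’ per line. The function name describe_file and the temporary file are my own illustrative choices, not anything from a real script.

```shell
#!/bin/sh
# describe_file is a hypothetical helper name used only for illustration.
# Each line is an independent 'test' expression that could be pasted
# onto a command-line as-is.
describe_file() {
    test -e "$1" || { echo "$1: does not exist"; return 1; }
    test -f "$1" && echo "$1: regular file"
    test -d "$1" && echo "$1: directory"
    test -r "$1" && echo "$1: readable"
    test -w "$1" && echo "$1: writable"
    return 0
}

# Try it on a scratch file, and on a path that should not exist.
tmpfile=$(mktemp)
describe_file "$tmpfile"
describe_file "$tmpfile.missing" || echo "(non-zero exit status)"
rm -f "$tmpfile"
```

Any one of those lines can be lifted out on its own, which is the property I am arguing for.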

There is also a fundamental dishonesty in the picture here because the ‘[’ element appears to be a syntactical element when it is in fact a command!

$ which [
/usr/bin/[

so effectively you could just do our sample logic like this:

$ [ -f myfile ] && echo "file exists"

“Here Be Dragons”: you are now using punctuation to name a command. You are misrepresenting a command as punctuation. For somebody who knows what’s going on this is fine (in fact, when I did a course a few years ago where this concept was introduced, the instructor took pains to point this out), but for somebody who doesn’t there is room for confusion.

Punctuation can mean different things in different circumstances. Sometimes a ‘[‘ may need to be escaped. Sometimes it may not. I’d like to be able to wallop out a simple command on a command-line without having to painfully construct weird syntax. I don’t / can’t keep all the different rules for all the different languages I use salient in my mind at all times.
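
A quick way to convince yourself that ‘[’ and ‘test’ really are the same command under two names is to run one expression through both and compare exit statuses. This is only a sketch; same_result is a hypothetical helper name. (In bash and dash both spellings are also shell builtins, alongside the programs on disk, but they behave the same way for these cases.)

```shell
#!/bin/sh
# same_result is a hypothetical helper: it evaluates one expression with
# both spellings of the command and reports the two exit statuses.
same_result() {
    test "$@"; status_test=$?
    [ "$@" ]; status_bracket=$?
    echo "test: $status_test, [: $status_bracket"
}

same_result -d /        # the root directory exists
same_result -f /        # ...but it is not a regular file
same_result abc = abd   # string comparison that fails
```

The only difference is that ‘[’ insists on a closing ‘]’ as its last argument.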

Which brings me to the argument that maybe I take this approach just because I am (we are?) too limited in mental capabilities. In a world where people often overestimate their capabilities this is perhaps a virtue. I also like to consider that the code I produce isn’t just a showcase of my own intimate knowledge of a particular language. I like to consider the perspective of somebody else looking at what I’ve done, who might not have the luxury of teasing out the latent meaning carried by an arcane combination of punctuation and spaces.

I often visualise the tired and stressed-out support engineer. One of my colleagues (or even myself) sitting in the office in the late evening trying to determine why a red light has gone off. In such a scenario spirits may be low and tempers may be frayed, and they almost certainly don’t have the capacity to instantly recall the rules that have shaped the various “musical notes” they now see in front of them. How can they be sure the error isn’t hiding among the opaque parts? How do they make the fix? “Should this be an asterisk [*] or a plus [+]?” “Is that a tab or four spaces?”

Sometimes it’s just enough to visualise myself in 6 months looking at my own code and going W. T. F. !

I can also use ‘test’ for examining environment-variables:

$ test "$myvar" || echo "myvar is not set"

carries a lot less baggage than:

$ [ "x$myvar" = x ] && echo "myvar is not set"
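
One subtlety worth spelling out: a variable can be unset, or set but empty, and a plain ‘test’ on its value cannot tell those apart. Here is a small sketch that distinguishes the three states using the standard ‘${var+word}’ expansion. The function name classify_myvar is made up for this example.

```shell
#!/bin/sh
# classify_myvar is a hypothetical helper that reports the state of the
# shell variable 'myvar' using nothing but 'test'.
classify_myvar() {
    # ${myvar+set} expands to "set" only if myvar is set (even to "").
    test -n "${myvar+set}" || { echo "unset"; return; }
    # Quoting matters: an unquoted value containing spaces would be
    # split into several arguments and confuse 'test'.
    test -n "$myvar"       || { echo "empty"; return; }
    echo "non-empty"
}

unset myvar;        classify_myvar
myvar="";           classify_myvar
myvar="two words";  classify_myvar
```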

“here” docs

If I have a load of different items to process I like to express it thus:

$ while read -r package_name; do
>   yum install -y "$package_name"
> done <<-END
>     httpd
>     screen
>     git
>     svn
>     gcc-c++
>     autoconf
>     automake
>     glibc-devel.i686
>     libstdc++-devel.i686
> END

which is how I might describe the installation of a bunch of pre-requisite packages for a 32-bit cross-compiling environment.

What I have done here is separate the command performing the installation from the list of packages to be installed. At a stroke you have separated concerns but also provided locality, and simplicity.

There are two alternatives here that spring to mind. The first is to simply repeat the command multiple times each with the parameter changed. Instantly people will say this breaks the “don’t repeat yourself” principle, with the very pragmatic concern that any mistakes in the composition of the command need to be corrected for each repetition, and also any later fixes can provide fertile ground for further bugs where one or more repetitions of a fix are not perfectly reproduced.

The second alternative is to separate the list of parameters into another file, which is “sounder engineering practice” but also increases complexity, as you’ve now created two files to maintain rather than one. The practice of externalising parameters makes less sense for scripting languages than for compiled programs, since a script can be easily edited.
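
For comparison, here is a rough sketch of the inline list and the separate-file approach side by side. The loop is identical in both cases; only the source of standard input changes. The name process_list is hypothetical, ‘echo’ stands in for the real install command, and the list file is a throwaway temporary file.

```shell
#!/bin/sh
# process_list is a hypothetical helper; 'echo' stands in for the real
# command (e.g. a package installation) so the sketch is safe to run.
process_list() {
    while read -r item; do
        echo "would install: $item"
    done
}

# Inline list: command and data live in the same file.
process_list <<END
httpd
git
END

# External list: same loop, but now there is a second file to maintain.
list_file=$(mktemp)
printf '%s\n' httpd git > "$list_file"
process_list < "$list_file"
rm -f "$list_file"
```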

Some good thoughts on this topic are here.

In the course of putting this little essay together I found this little quote from Linus:

“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”

which I suppose could be deployed either in support of some of my arguments here, or against the whole process of writing about it 🙂

Also, a quick note on the syntax of here-documents: each line up until ‘END’ (which may be any arbitrary string, as specified after ‘<<’) is fed to the standard input of the ‘while’ loop, where ‘read’ assigns it to the variable “package_name”. The use of ‘-’ may be less clear. It means that leading tabs (not spaces) on each line are stripped before the loop sees them, which is handy for presentation purposes (although in this case leading whitespace wouldn’t be a problem anyway, since ‘read’ trims it).
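
Since tabs and spaces look the same on a web page, here is a small sketch that makes the difference visible. The script text is built with printf so that the tab characters are unambiguous, then piped to ‘sh’. The name run_demo is just something I made up for the example.

```shell
#!/bin/sh
# run_demo is a hypothetical helper. It generates a tiny script whose
# here-document has one tab-indented line (\t) and one space-indented
# line, then runs that script with sh.
run_demo() {
    printf 'cat <<-END\n\ttab-indented line\n    space-indented line\n\tEND\n' | sh
}

run_demo
# Prints:
# tab-indented line
#     space-indented line
```

Only the leading tab is removed; the four leading spaces survive. The tab in front of the closing ‘END’ is stripped too, which is why the terminator can be indented along with the body.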

Also, something to bear in mind is that any word prefixed with ‘$’ will be treated as a shell variable, and will be expanded. Now this is very cool if you want to generate a here-doc based on parameters inferred elsewhere in your script.
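
As a sketch of what that might look like, suppose the architecture suffix from my package list were held in a variable rather than typed out. The names target_arch and render are invented for this illustration.

```shell
#!/bin/sh
# target_arch and render are hypothetical names used for illustration.
target_arch=i686

# The delimiter is unquoted, so $target_arch is expanded each time the
# here-document is read.
render() {
cat <<END
glibc-devel.$target_arch
libstdc++-devel.$target_arch
END
}

render
# Prints:
# glibc-devel.i686
# libstdc++-devel.i686
```

Change the variable and the generated list follows, without touching the body of the here-doc.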

If you want to output your here-doc without expansion surround your termination string (which is “END” in my example) with single-quotes e.g.

$ while read -r f; do
>   echo "$f"
> done <<-'END'
>     normal text
>     $dont_expand_this
> END

A complete discussion of here-docs is available from the Linux Documentation Project.