How Do You Spell the Names of the Months?

One construct that I’ve seen (and used) a lot in Perl is a hash that maps month abbreviations to numeric values that can be fed to POSIX::strftime:

our %months = (
    "jan" => 0, "feb" => 1, "mar" => 2, ...
);

This is useful for parsing log files and such. It works, it’s quick and easy, and it doesn’t require a whole tree of dependent modules (which are always on the wrong side of the Internet) to be installed.

But what’s always bugged me is that this is the sort of thing that the machine ought to know already. And besides, it’s US-centric: what if the person running the script is in a non-English-speaking country?

Fortunately, I18N::Langinfo knows the names of the months.

I18N::Langinfo provides the langinfo function, a wrapper around nl_langinfo(3), the system function that returns the string corresponding to some internationalizable element, such as the names of the months and the days of the week. See /usr/include/langinfo.h for a complete list. In this case, we want the constants ABMON_1 through ABMON_12.

Here’s a script that looks up the month name abbreviations and returns them as an array (returning a more useful hash, as above, is left as an exercise for the reader):

#!/usr/local/bin/perl
use I18N::Langinfo;

our @monabbrs = &getmonths;

for (my $i = 1; $i <= 12; $i++)
{
        print "Mon $i: [$monabbrs[$i]]\n";
}

sub getmonths
{
        my @retval;

        for (my $i = 1; $i <= 12; $i++)
        {
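                # Call the constant sub ABMON_$i by name in I18N::Langinfo's
                # namespace (a symbolic reference, so no "use strict 'refs'").
                $retval[$i] = langinfo(&{"I18N::Langinfo::ABMON_$i"}());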
        }

        return @retval;
}

Note the use of a named reference to dig the ABMON_* constants out of I18N::Langinfo’s namespace. I could have used

use I18N::Langinfo qw(langinfo ABMON_1 ... ABMON_12);

but didn’t like it, for obvious reasons.
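For what it's worth, here is one way the "more useful hash" might look. This is only a sketch: the name getmonthhash is made up for illustration, and it fetches the constants with can() rather than a named reference so that it runs under use strict. Keys are lowercased abbreviations and values are 0-based month numbers, as in the hand-written %months at the top. It doesn't try to deal with output encoding for non-ASCII month names.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo);

# Hypothetical helper: build a hash like the hand-written %months,
# keyed by lowercased abbreviation, with 0-based month numbers.
sub getmonthhash
{
        my %months;

        for my $i (1 .. 12)
        {
                # can() returns a code ref to the constant sub ABMON_$i,
                # so no symbolic reference is needed.
                my $const = I18N::Langinfo->can("ABMON_$i")
                        or die "ABMON_$i is not available on this system";

                $months{lc langinfo($const->())} = $i - 1;
        }

        return %months;
}

my %months = getmonthhash();

# List the abbreviations in calendar order.
for my $abbr (sort { $months{$a} <=> $months{$b} } keys %months)
{
        print "$abbr => $months{$abbr}\n";
}
```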

A Better &getmonths

Now, one problem is that this defaults to the current locale. But what if you’re a German living in Japan, trying to parse a Russian log file? For that, we need to be able to specify the log file’s locale information.

Here’s a second version of the &getmonths function that takes an optional locale argument. Since langinfo only returns information about the current locale, &getmonths temporarily sets the locale to the one the caller requested, then restores it before returning.

use POSIX qw(setlocale LC_CTYPE LC_ALL);
...
sub getmonths
{
        my $locale = shift;
        my $oldlocale = setlocale(LC_ALL);      # save what we restore later
        my @retval;

        setlocale(LC_ALL, $locale) if defined $locale;

        for (my $i = 1; $i <= 12; $i++)
        {
                # Same named-reference lookup as in the first version.
                $retval[$i] = langinfo(&{"I18N::Langinfo::ABMON_$i"}());
        }

        setlocale(LC_ALL, $oldlocale) if defined $locale;

        return @retval;
}

This version calls POSIX::setlocale with only a category argument to query the current locale. The other two setlocale calls temporarily set the locale to the one requested by the caller, then restore the old one.

Note that this version sets LC_ALL, which sets everything i18n-related to use the same locale. It’s conceivable that you might want to use US English for months, Russian for money, and Japanese for dates, and it’s possible to handle this, but that’s left as an exercise for the reader.
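As a partial answer to that exercise, here is a sketch that touches only LC_TIME, the category that month names belong to, and leaves money, numbers and the rest alone. The function name getmonths_lctime is invented for this example, and can() again stands in for the named reference. One assumption to be aware of: the strings come back in the requested locale's encoding, which need not match the rest of the program's.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo);
use POSIX qw(setlocale LC_TIME);

# Hypothetical variant of getmonths that changes only the LC_TIME category.
sub getmonths_lctime
{
        my $locale = shift;
        my $oldlocale = setlocale(LC_TIME);     # query only
        my @retval;

        if (defined $locale)
        {
                defined setlocale(LC_TIME, $locale)
                        or die "Can't set LC_TIME to $locale";
        }

        for my $i (1 .. 12)
        {
                $retval[$i] = langinfo(I18N::Langinfo->can("ABMON_$i")->());
        }

        # Put LC_TIME back the way we found it.
        setlocale(LC_TIME, $oldlocale) if defined $locale;

        return @retval;
}

# The "C" locale exists everywhere, so it makes a safe demonstration.
my @c = getmonths_lctime("C");
print "First month in the C locale: $c[1]\n";
```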

Discussion

setlocale is POSIX-standard, which means that analogous functions can be written in C, PHP, and so on.

However, one obvious problem with setlocale is that it (evidently) uses a global variable to set the C library’s idea of what the current locale is. This means that the functions above are not thread-safe. If you have multiple threads in a program, then the functions above have to take a lock, and while it is held no other thread may do anything that depends on the locale, not even print a message to the user. Hence, if at all possible, it’s best to stuff this into an initialization function and cache its results.
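The caching idea might look something like this. It's a sketch under a few assumptions: cached_months and %month_cache are names made up here, the lookup uses can() instead of a named reference, and the cache is filled once at startup, before any threads exist, so no locking is shown.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo);
use POSIX qw(setlocale LC_ALL);

my %month_cache;        # locale name => array ref of abbreviations

# Hypothetical wrapper: switch locales at most once per locale name.
sub cached_months
{
        my $locale = shift;
        my $key = defined $locale ? $locale : "";       # "" = current locale

        unless ($month_cache{$key})
        {
                my $oldlocale = setlocale(LC_ALL);
                my @abbrs;

                setlocale(LC_ALL, $locale) if defined $locale;

                for my $i (1 .. 12)
                {
                        $abbrs[$i] = langinfo(I18N::Langinfo->can("ABMON_$i")->());
                }

                setlocale(LC_ALL, $oldlocale) if defined $locale;

                $month_cache{$key} = \@abbrs;
        }

        return @{ $month_cache{$key} };
}

# Warm the cache during initialization: the current locale, plus "C".
cached_months($_) for (undef, "C");
```

After that, later calls such as cached_months("C") only read the hash and never touch setlocale.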

There’s a catopen(3) function that could probably allow you to extract the month names in a locale other than the current one, but it’s not as convenient. It looks as though it requires you to know where the locale files are stored on your machine.