“Motion Capture” for Text-to-Speech?

I had a random thought over the weekend, and while I suspect it’s not original, I couldn’t find anyone working on it.

One big reason why text-to-speech (TTS) synthesis sucks so badly is that the result sounds flat. Yes, the synthesizer can try to infer cadence and tone from things like commas, paragraph breaks, exclamation points, and question marks, but the result still falls far short of what a human reader sounds like. In the end, the problem seems to be Turing-hard, since you need to understand the meaning of a piece of text in order to read it properly.

So would it be possible to record a human reading a piece of text, and extract just the intonation, cadence, and pacing? Hollywood already uses motion capture, in which cameras record the movements of a human being and those movements are used to make a CGI creature move the same way (e.g., Gollum in The Lord of the Rings, or Shrek). In fact, you can combine multiple people’s movements into one synthesized creature, say by using one person’s stride, another’s hand movements, and a third person’s facial expressions.

So why not apply the same principle to synthesized speech? For instance, you could have someone read a paragraph of text. We already have voice-recognition software, so it should be possible to analyze that recording and match it to the individual words and phonemes in the text. That gives you timing: things like the length of the pause at a comma, or the overall reading speed. The recording can then be analyzed for things like whether a given word was spoken more loudly, or at a higher pitch, than the surrounding words, and by how much. All of this can be converted to speech markup.
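For the markup itself, something like SSML (an existing speech-synthesis markup format) seems like a natural target; the annotations extracted above map fairly directly onto its tags. A made-up example (the words, pause length, and prosody values are all invented):

<speak>
  And then, <break time="300ms"/> against all odds,
  <prosody pitch="+15%" volume="loud">she said yes</prosody>.
</speak>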

This means that you could synthesize Stephen Fry reading a book in Patrick Stewart’s voice.

Perhaps more to the point, if you poke around Project Gutenberg, you’ll see that there are two types of audio books: ones generated via TTS, and ones read by people. The recordings of humans are, of course, better, but they require that an actual person sit down and read the whole book from start to finish, which is time-consuming.

If it were possible to apply a human’s reading style to the synthesis of a known piece of text, then it would be possible for multiple people to share the job of recording an audio book. Allow volunteers to read one or two pages at a time, and synthesize a recording of those pages using the volunteer’s intonation and cadence, but using a standard voice.

I imagine that there would still be lots of problems with this — for instance, it might feel somewhat jarring when the book switches from one person’s reading style to another’s — but it should still be an improvement over what we have now. And there are probably lots of other problems that I can’t imagine.

But hey, it would still be an improvement. Is anyone out there working on this?

Disk Hack

One of the things I enjoy about Unix system administration is the MacGyver aspect of it: when something goes pear-shaped, and your preferred tools aren’t available because they’re on the disk that just died, or on the other side of the pile of smoking ashes that used to be a router, you have to figure out how to recover with what you’ve got left. It’s a bit like that scene in Apollo 13 when they realize that the space capsule has a round hole for the air filter, but only square filters, and the engineer dumps all the equipment the astronauts have available onto the table and says, “We’ve got to find a way to make this fit into the hole for this, using nothing but that.”

So anyway, my mom’s Mac recently died. And, naturally, there are no available backups. But I said I’d do what I could, and took the machine home.

I’m glad we wrote off the old machine as a total loss, since it was (the tense should give you an idea of what’s coming) an iMac, one of those compact everything-in-the-monitor models that tries oh-so-hard to fit everything into as small a space as possible. Of course, this compactness means that there’s no room to do anything: the disk is behind the LCD display, wedged in a tight slot between the graphics card and the DVD drive. And the whole thing is wrapped in — I kid you not — foil, most likely to help control airflow. So basically I wound up ripping things out with little or no grace or elegance. If it wasn’t totaled then, it certainly is now.

At any rate, that left me with a disk that, thankfully, turned out to be unharmed. The next question was how to hook it up. I was pretty sure it had an HFS or HFS+ filesystem, which meant that the obvious thing to do would be to put it in a Mac to read. But I don’t have a Mac that I could put a second internal drive in. I toyed briefly with the idea of finding an enclosure and whatever conversion hardware would be necessary to turn an internal SATA drive into an external USB drive, but figured that was too hard for a one-shot. Then I found that my Linux box has hfs and hfsplus filesystem kernel modules, so hey.

(Of course, I don’t know how stable the Linux HFS driver is, so I figured it’d be best to write-protect the disk. At which point I discovered that this model only has one hardware jumper slot, and it doesn’t write-protect the disk. Fuck you very much, Maxtor/Apple.)

The Linux HFS driver turned out to be good enough for reading, and I could mount the disk and read files, so yay. The next question was how to get the files from there to a laptop that I could bring over to my parents’. This is complicated by the fact that on HFS, a file isn’t just a stream of bytes, the way it is in Unix; it has two “forks”: the data fork contains the actual data of the file (e.g., a JPEG image), and the resource fork contains metadata, such as the file’s icon, the application that should open it by default, and so on. I didn’t want to lose that if I could help it.

The way the Linux HFS driver deals with resource forks is to expose virtual (or transient, or whatever you want to call them) files: if you open myfile, you get the data fork, which looks just like any ordinary file. But if the file has a resource fork, you can also open myfile/rsrc and read its contents. This meant that, in the worst case, I could 1) copy the data forks over to a directory on the Mac, then 2) find which files have resource forks, with something like

find /mnt -type f |
while IFS= read -r filename; do
    # A nonempty "file/rsrc" means the file has a resource fork
    if [ -s "$filename/rsrc" ]; then
        echo "$filename"
    fi
done

and 3) somehow re-graft the resource forks onto the files on the Mac end. But that seemed like a lot of work.

Apple software is often distributed on .dmg (disk image) files, which are mounted as virtual disks. I figured that’d be the obvious way to package up the contents of the disk. So I dd’ed the raw disk device (/dev/sdb rather than /dev/sdb2, which was the mounted partition, in order to get the entire disk, including the partition map and such), but when I tried to mount the result, it didn’t work, so presumably there’s more to a .dmg file than just the raw disk data. I tried a couple of variations on that theme, but without success.
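That attempt amounted to something like this, run on the Linux box (the output filename is just illustrative):

# Slurp the whole disk (partition map and all) into a file,
# in the hope that the result would be mountable as a .dmg
dd if=/dev/sdb of=imac-disk.dmg bs=2M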

(In passing, I also noticed that rsync supports “extended attributes” on both my Mac and my Linux box, so I tried using that to copy files (with resource forks) over, only to find that the two implementations use different options to say “turn on extended attributes”, so the client couldn’t start the remote server correctly.)
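If I remember right, the mismatch is in the option letters themselves: rsync 3.x spells it -X (--xattrs), while the rsync Apple ships (2.6.9) spells it -E. So each of these works locally, but the client ends up starting the remote rsync with a flag the other side doesn’t know:

# On the Linux box (rsync 3.x): -X asks for extended attributes
rsync -aX /mnt/ mac:/recovered/

# On the Mac (Apple's rsync): the same request is spelled -E
rsync -aE mac:/recovered/ /mnt/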

Eventually, I realized that dd could be used not just to read a disk image, but to write one. Yes, I said above that reading from the disk and writing to a file didn’t produce a usable disk image. But I also said that .dmg files are mounted like disks, and that implies that there has to be a device to mount.

So on the Mac, I created a disk image file with hdiutil create, then opened it with Disk Utility. Forcing “Verify disk” made Disk Utility attach the image as /dev/disk2s9, just before telling me that there was no usable filesystem on it. That was fine; all I wanted was for it to create a /dev/disk* device that I could write to. Then I was able to

ssh linuxbox 'dd if=/dev/sdb bs=2M' | dd bs=2048k of=/dev/disk2

to transfer the raw contents of the disk to the “disk data” portion of the disk image.

To my slight surprise, this actually worked. Yes, I had to repair the disk image afterward, but from the log messages, that appears to be because some of the superblock copies were missing (the disk is 120 GB, but I only created a 32 GB image).
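In hindsight, I suspect the Disk Utility dance can be skipped entirely by doing it from the command line, something like this (untested, and hdiutil’s options vary a bit between OS versions):

# Create an image with no filesystem or partition layout at all...
hdiutil create -size 32g -layout NONE empty.dmg
# ...then attach it without trying to mount anything, which
# should still create a writable /dev/diskN device
hdiutil attach -nomount empty.dmg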

The final problem should be that of getting the disk image from my locked-down(-ish) laptop to my mom’s new vanilla Mac, but I don’t think I’ll bother. It’ll be a lot easier, and better in the long run, to put the old disk image onto an external drive that can then double as a backup disk, so I don’t have to do this again.

Because while it can be fun to solve a puzzle and figure out how to fix something with suboptimal tools, there’s also wisdom in avoiding getting into such situations in the first place.

“Avowed”

The New York Times ran a piece about the David Mabus affair (tl;dr version: he’s a mentally-ill troll who’d been sending death threats to people for years, and was finally arrested after enough people complained to the police).

It begins:

Over the years, someone writing as David Mabus made himself known to scientists and avowed atheists across North America in thousands of threatening e-mails and violently profane messages on Twitter.

The phrase “avowed atheists” annoyed me, because I see it a lot. I even twatted about it:

The phrase “avowed atheist” still annoys me, though. When’s the last time someone was an “avowed Baptist”?

Then I realized that with an entire browserful of Internet at my disposal, I could answer that question.

Meta-Social Code

I had an idea the other day, and I’m not sure why no one’s implemented it. I suspect that either a) someone has and I don’t know about it, or b) there’s some fundamentally unsolvable problem. If so, please point this out in the comments.

At any rate, the idea is this: as a site owner, I want to make it easy for people to link to my site. I want a wall of social-site buttons on every page, with every site from AOL to Zynga.

But as a reader, I only have a few sites that I use to link to places. I don’t want to wade through row after row of useless icons just to find the one that I want to use to share the URL.

So it seems the obvious thing to do would be for some entrepreneur (not me) to offer “social site button bar as a service”. The site owner adds some markup to the page to say “this is something that can be recommended/liked/shared”, and includes a JavaScript script that takes care of the magic behind the scenes. Something like:

<html>
<head>
<script src="http://social.com/siteid12345/api.js"></script>
</head>
<body>
<h1>This is my page</h1>
<p>Hello world</p>
<social:button-bar/>
</body>
</html>

The script can take care of adding <script> elements to load the APIs of whichever social sites are enabled, and can register a DOMContentLoaded listener that replaces <social:button-bar> with a series of site-specific elements, like <g:plusone> and <fb:like>.
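A minimal sketch of what that listener might do (the element and class names are invented, and real button rendering would come from each site’s own API, not from this stub):

<script>
document.addEventListener("DOMContentLoaded", function () {
    // In real life this list would come from the user's saved
    // configuration; it's hardwired for the sketch.
    var enabled = ["plusone", "like"];

    var bar = document.getElementsByTagName("social:button-bar")[0];
    if (!bar) return;

    enabled.forEach(function (name) {
        // Stand-in for the real <g:plusone>, <fb:like>, etc.
        var button = document.createElement("span");
        button.className = "social-button-" + name;
        bar.parentNode.insertBefore(button, bar);
    });
    bar.parentNode.removeChild(bar);
});
</script>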

The end-user, meanwhile, can visit a configuration page and decide which social sites will appear in the button bar. This can be saved in a cookie.

I’ve consed up a quick and dirty prototype, and was surprised that it worked. I haven’t tested it extensively, though.

The obvious objection (aside from “but how does one make money off of this thing?” But they said that about Kozmo.com too. So there) is that this seems like an engraved invitation to cross-site scripting (XSS) holes. And privacy leakage, and who knows what all else.

A related question, which I haven’t answered, is who can see the cookie? If the JavaScript code comes from social.com, then social.com needs to be able to see the user configuration cookie to know which buttons’ code to serve up. But if the reader is looking at content.com, there’s no good way to get a social.com cookie and pass it along. It might be possible to set a content.com cookie, but of course that won’t help when the user surfs over to othersite.org, where we want the end-user to see the same button bar.

I confess that I’m not entirely clear on the policies that govern which sites/scripts can see what data. So it’s quite possible that I’m missing something glaringly obvious.

Fun With Barcodes

If you have an Android phone, odds are that you have the Barcode Scanner app. And if you’ve looked in the settings, you may have noticed one called “Custom Search URL”.

This is, as the name says, a URL you can fill in. Once you do, you’ll get a “Custom search” button at the bottom of the screen when you scan a barcode. A “%s” in the URL will be replaced by the numeric UPC value, and a “%f” by its format (which is displayed next to the code when you scan one).

It seems to me that this can be used as a poor man’s plugin API. You can use http://www.mydomain.org/barcode?f=%f&s=%s, and make barcode be a CGI/PHP/whatever script that looks at the format and code and decides what to do.

For instance, $EMPLOYER has barcoded asset tags on all inventory items. So today I was able to scan a machine’s code and be redirected to the inventory web page for that machine.

Likewise, if it’s an EAN-13 code that begins with 978 or 979, then presumably it’s an ISBN or ISMN, and you can look it up at Amazon, your library, or wherever.
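Here’s a minimal sketch of that dispatch as a plain-sh CGI script (the hostnames and URLs are made up, and I’m going from memory that the app reports formats under names like EAN_13):

#!/bin/sh
# barcode: invoked as .../barcode?f=FORMAT&s=VALUE

# Crude QUERY_STRING parsing; good enough for two known fields
f=`echo "$QUERY_STRING" | sed -n 's/.*f=\([^&]*\).*/\1/p'`
s=`echo "$QUERY_STRING" | sed -n 's/.*s=\([^&]*\).*/\1/p'`

case "$f/$s" in
EAN_13/978*|EAN_13/979*)
	# Presumably an ISBN or ISMN: look it up at Amazon
	url="http://www.amazon.com/s?field-keywords=$s"
	;;
*)
	# Anything else: punt to the (hypothetical) inventory system
	url="http://inventory.mydomain.org/asset?code=$s"
	;;
esac

# Emit a CGI redirect to wherever we decided to go
echo "Location: $url"
echo ""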

As far as I know, you can’t recognize that a UPC corresponds to a CD or DVD without having a table of every CD/DVD publisher, but nothing says your script has to do only redirection; it can present a list of links to the user. So for CDs, you can construct MusicBrainz or Discogs lookup URLs. Or perhaps you can parse the code, get the manufacturer, and, based on the user’s choice, remember what sort of item that manufacturer corresponds to. Over time you’d build up a “good enough” database of the things you scan most often.

I wouldn’t mind having a properly organized library of books, CDs, etc. Which is kind of the point of looking this data up on the net in the first place. But while a phone may make a serviceable barcode scanner, it’s no good for lengthy data input. So really, what I’d like would be for the script to remember what I’ve scanned, along with a quick and dirty readable reference (e.g., “ISBN such-and-such: The Wee Free Men by Terry Pratchett”) and stash that someplace so that I can later go back and import the data into Koha or whatever I’m using.

Of course, since it’s a web page, I’m guessing you have access to the full range of goodies that people put in browsers these days. I’m thinking of geographical location, in particular. So the script could in principle behave differently based on where you’re located at the moment (e.g., at home or at work).

There are lots of possibilities. Must. Explore.

A Couple of Shell Quickies

Since I got asked several sh-related questions, I might as well get a post out of them.

One person asks:

I’m writing a portable shell script to download a file from the web, and then compare it to a locally-stored version of the same file, using diff.

My first version of the script used mktemp to download the web file to a temporary file, run the diff command, and then delete the temporary file afterwards, e.g.:

TEMPFILE=$(mktemp)
wget -q $ONLINEFILEURL -O $TEMPFILE
diff $TEMPFILE $LOCALFILE
rm $TEMPFILE

However, I later discovered that the BSD version of mktemp has a syntax incompatible with the GNU version. So then I got rid of temporary files completely, by using input redirection, e.g.:

diff <(wget -q $ONLINEFILEURL -O -) $LOCALFILE

However while this works fine under bash and ksh, it fails under ash and sh with

Syntax error: "(" unexpected

to which I replied:

The first obvious problem here is that “(wget -q $ONLINEFILEURL -O -)” isn’t a filename, it’s a subprocess: “<(...)” is process substitution, a bash/ksh feature that plain sh and ash don’t support. So the shell sees “<”, expects a filename, and finds “(” instead.

It looks as though the way to get diff to read from stdin is the standard way: specify “-” as the filename, and give it input on stdin. Since you’re feeding it the output from a process, you want to use a pipe:

wget -q $ONLINEFILEURL -O - | diff - $LOCALFILE

I also suggested that he could try to figure out which version of mktemp he was using:

# Wrapper function for GNU mktemp, which takes a full template
gnu_mktemp() {
	mktemp /tmp/tmpfile.XXXXXX "$@"
}

# Wrapper function for BSD mktemp, whose -t option takes a
# prefix rather than a full template
bsd_mktemp() {
	mktemp -t tmpfile "$@"
}

# Try to figure out which flavor we have: GNU mktemp
# recognizes -V and prints a version string, BSD's doesn't
if mktemp -V 2>/dev/null | grep version >/dev/null 2>&1; then
	MKTEMP=gnu_mktemp
else
	MKTEMP=bsd_mktemp
fi

mytmpfile=`$MKTEMP`

And, of course, if race conditions and security aren’t a big concern, there’s always

mytmpfile=/tmp/myprogramname.$$

Another person wanted to write a bash script that would do one thing when run by him or root, and another thing if run by anyone else (basically, die with an error message about insufficient privileges and/or grooviness).

He asked whether the following two expressions were equivalent:

  • if [[ ( `whoami` != "root" ) || ( `whoami` != "coolguy" ) ]]
  • if [[ ! ( `whoami` = "root" ) || ( `whoami` = "coolguy" ) ]]

They’re not, though maybe not for the obvious reasons, because propositional logic is a harsh mistress.

In the first expression,

if [[ ( `whoami` != "root" ) || ( `whoami` != "coolguy" ) ]]

let’s say that the user is joeblow. In this case, “`whoami` != "root"” is true, so the shell can short-circuit the rest of the “||”, because the entire expression is already true.

If the user is root, then the first part, “( `whoami` != "root" )”, is false. However, the second part, “( `whoami` != "coolguy" )”, is true (because root ≠ coolguy), and so the entire expression is “false || true”, which is true. In other words, the first expression is true no matter who runs the script, which is not what he wanted.

The second expression,

if [[ ! ( `whoami` = "root" ) || ( `whoami` = "coolguy" ) ]]

is closer to what he wanted, but doesn’t work because of operator precedence: “!” binds more tightly than “||”, so the expression is equivalent to “(whoami ≠ root) || (whoami = coolguy)”.

In this case, if the user is joeblow, the first clause, “whoami ≠ root”, is true, and so the entire expression is true: joeblow gets in.

Worse yet, if the user is root, then neither the first nor the second clause is true, so the entire expression is false: root gets locked out.

What he really wanted was something like:

if [[ ( `whoami` = "root" ) || ( `whoami` = "coolguy" ) ]]; then
	# Do nothing
	:
else
	# Do something
	echo "Go away"
	exit 1
fi

Except, of course, that since the if-clause is empty, it can be cut out entirely. Then all we need to do is negate the condition and keep only the code from the else-clause:

if [[ ! ((`whoami` = "root" ) || ( `whoami` = "coolguy" )) ]]

Note the extra pair of parentheses, to make sure that the “!” applies to the whole thing.
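Equivalently, De Morgan’s law lets you push the negation inward and drop the parentheses, which some people find easier to read:

if [[ `whoami` != "root" && `whoami` != "coolguy" ]]; then
	echo "Go away"
	exit 1
fi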

(Update, May 18: Fixed HTML entities.)

GDMF Exchange!

Yet more reason to hate MS Exchange. Here are the relevant headers and MIME lines from a meeting notification I got recently:

Header:

Subject: XXX Staff Meeting
Content-Type: multipart/alternative;
	boundary="_000_A32122FBAE23C145ABBEFDEA852187CE01E77A1505B7BLADEBLA03V_"
MIME-Version: 1.0

Body:

--_000_A32122FBAE23C145ABBEFDEA852187CE01E77A1505B7BLADEBLA03V_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

…
--_000_A32122FBAE23C145ABBEFDEA852187CE01E77A1505B7BLADEBLA03V_
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

…
--_000_A32122FBAE23C145ABBEFDEA852187CE01E77A1505B7BLADEBLA03V_
Content-Type: text/calendar; charset="utf-8"; method=REQUEST
Content-Transfer-Encoding: base64

…

--_000_A32122FBAE23C145ABBEFDEA852187CE01E77A1505B7BLADEBLA03V_--

At first glance, all might look normal: there’s a calendar entry with a note attached.

So first we have the plain text version of the note, followed by the HTML version of the note, followed by the vCalendar file describing the meeting itself.

But a closer look at the header shows that the message as a whole has Content-Type: multipart/alternative.

RFC 1521 says:

The multipart/alternative type is syntactically identical to multipart/mixed, but the semantics are different. In particular, each of the parts is an “alternative” version of the same information.

Systems should recognize that the content of the various parts are interchangeable. Systems should choose the “best” type based on the local environment and preferences, in some cases even through user interaction.

In other words, any standards-compliant mail reader should see those three MIME chunks as three different versions of the same information. So if it decides to (or you tell it to) display the HTML version of the note, it shouldn’t display the calendar file. And if it displays the calendar entry, it shouldn’t show the attached note.

Clearly somebody at Microsoft needs to be slapped. Hard.

(And in case you’re wondering, the proper way to do what they’re trying to do would be for the message as a whole to be multipart/mixed with a multipart/alternative chunk for the note, and a text/calendar chunk for the calendar entry. The note chunk would be further subdivided into a text/plain chunk and a text/html chunk.)
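Schematically, with the boundary strings shortened to something readable:

Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="utf-8"
…
--inner
Content-Type: text/html; charset="utf-8"
…
--inner--

--outer
Content-Type: text/calendar; charset="utf-8"; method=REQUEST
…
--outer--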

I think that when people are first told that Exchange is both a mail server and a calendar server, they think it’s kind of like a goose — something that can competently walk, swim, and fly, even though it may not excel at any of those — but in reality, it’s more like a crocoduck: massive fail at every level, no matter how you look at it.

Using snoop/tcpdump as a Filter

Okay, this is kinda cool. Yesterday, I ran snoop (Sun’s version of tcpdump) to help the network folks diagnose a problem we’ve been seeing. Unfortunately, I let it run a bit too long, and wound up with a 1.5 GB file. And the guy who’s going to be looking at it is at a conference, and would rather not download files that big.

Now, I’d known that snoop can dump packets to a file with -o filename and that that file can be read with -i filename; and of course that you can give an expression to say what kinds of packets you want to scan for. But until now, it never occurred to me to put the three of them together. And it turns out that not only does snoop support that, it Does The Right Thing to boot.

Now, one of the reasons I wound up with 1.5 GB worth of packets is that we didn’t know which port the process we were trying to debug would run on until it ran. (That, and the fact that I started scanning early because I wasn’t sure when it would run, and stopped late because the Internet dangled shiny things in front of me.)

At any rate, I was able to run

# snoop -i old-snoop-log -o new-snoop-log host thehost.dom.ain port 50175

and wind up with a packet capture file of manageable size.

And a bit of experimentation showed that tcpdump does the same thing (adjust arguments as appropriate). I’ll have to remember this.
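For tcpdump, I believe the equivalent is something like this (file names invented; note that tcpdump wants an explicit “and” between filter primitives):

tcpdump -r old-dump.pcap -w new-dump.pcap host thehost.dom.ain and port 50175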

Countdown to Backpedaling Widget

Over on the right, in the sidebar, you should see a countdown clock entitled “Countdown to Backpedaling”. (If not, then something went wrong.)

If you’ve been listening to Ask an Atheist, then you should recognize this as a widget version of the Countdown to Backpedaling clock. And if not, then you should definitely be listening to them. Because they’re cool.

At any rate, it’s a clock that counts down to May 22, the day after Jesus’ return and the Day of Judgment, when the backpedaling and excuses begin.

So anyway, now you want to know a) where to download this, b) how to install it, and c) how to complain to me about all the problems you’ve had with (a) and (b).

Download

The main download page is .

If you’re using WordPress, you can download , put it in your wp-content/plugins directory, and with any luck, a “Countdown to Backpedaling” widget should magically show up in your “Widgets” control panel. You can then drag it into position, and it should work.

If you’re using some other software, you’ll want . Installation depends on what you’re using, of course, but you should be able to insert it anywhere that takes HTML.

Configuration

The main configuration option is the “show_secs” variable at the top. If you want to see seconds in the countdown, set it to true. If you find the seconds’ flashing annoying, set it to false.

You can also look through the CSS part, and edit as you please. You might need to change the width.

I might improve on this, if time permits and I don’t get raptured before getting around to it.

If you have any comments or complaints, leave a comment. Bug reports accompanied by a rum and Coke will get higher priority. Bug reports accompanied by a patch will get even higher priority.

Bourne Shell Introspection

So I was thinking about how to refactor our custom Linux and Solaris init scripts at work. The way FreeBSD does it is to have the scripts in /etc/rc.d define variables with the commands to execute, e.g.,

start_cmd='/usr/sbin/foobard'
stop_cmd='kill `cat /var/run/foobar.pid`'

run_rc_command "$1"

where $1 is “start”, “stop”, or whatever, and run_rc_command is a function loaded from an external file. It can check whether $stop_cmd is defined, and if not, take some default action.
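A sketch of what that check might look like inside run_rc_command (illustrative only, not FreeBSD’s actual code; $name is assumed to hold the daemon’s name, as it does in real rc.d scripts):

case "$1" in
stop)
	# Fall back to a generic default if the script didn't
	# define its own stop_cmd
	if [ -z "$stop_cmd" ]; then
		stop_cmd="kill \`cat /var/run/$name.pid\`"
	fi
	eval "$stop_cmd"
	;;
esac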

This is great and all, but I was wondering whether it would be possible to check whether a given shell function exists. That way, a common file could implement a generic structure for starting and stopping daemons, and the daemon-specific file could just set the specifics by defining do_start and do_stop functions.

The way to do this in Perl is to iterate over the symbol table of the package you’re looking at, and see whether each entry is a function. The symbol table for Foo::Bar is %Foo::Bar::; for the main package, it’s %::. Thus:

while (my ($k, $v) = each %::)
{
	# $v is a typeglob; defined(&$v) is true if and only if
	# its CODE slot (a sub by that name) is defined
	if (defined(&$v))
	{
		print "$k is a function\n";
	}
}

sub test_x() {}
sub test_y() {}
sub test_z() {}

But I didn’t know how to do it in the Bourne shell.

Enter type, which tells you exactly that:

#!/bin/sh

# List of all known commands
STD_CMDS="start stop restart status verify"
MORE_CMDS="graceful something_incredibly_daemon_specific"

do_start="This is a string, not a function"

do_restart() {
	echo "I ought to restart something"
}

do_graceful() {
	echo "I am so fucking graceful"
}

for cmd in ${STD_CMDS} ${MORE_CMDS}; do
	if type "do_$cmd" >/dev/null 2>&1; then
		echo "* do_$cmd is defined"
	else
		echo "- do_$cmd is not defined"
	fi
done

And yes, this works not just in bash, but in the traditional, Bourne-just-once shell, on every platform that I care about.

So yay, it turns out that the Bourne shell has more introspection than I thought.