I have a love/hate relationship with daemontools. It’s powerful way to manage services (like web servers and mail servers), but it sometimes doesn’t act the way I expect it to.
Today, I tried to set up qmail-pop3d. I set up a directory called /var/qmail/supervise/qmail-pop3d
and put a file named ‘run’ inside it. This contained:
#!/bin/sh
PATH=/var/qmail/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin
export PATH
tcpserver -H -R 0 pop-3
/var/qmail/bin/qmail-popup vkurup.acornhosting.net
/home/vpopmail/bin/vchkpw /var/qmail/bin/qmail-pop3d Maildir &
Then I connected the run script to my /service directory
# ln -s /var/qmail/supervise/qmail-pop3d /service
Then I checked to see if it was running:
# svstat /service/qmail-pop3d
/service/qmail-pop3d: up (pid 13524) 0 seconds
Hmmm… up 0 seconds. That’s not right - it should be at least 5 or 6 seconds (I can’t type that fast). That usually means that the service is repeatedly failing and restarting itself.
ps shows that qmail-pop3d is running, but trying to connect to port 110 doesn’t work:
$ telnet localhost 110
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
And now I’m stuck. Nothing is getting logged to the qmail-pop3d logs. After some headbanging, I do another ps -ax
and this time notice a process called readproctitle:
11919 ? S 0:05 readproctitle service errors: ...lready used?tcpserve
r: fatal: unable to bind: address already used?tcpserver: fatal: unable to bind:
address already used?tcpserver: fatal: unable to bind: address already used?tcp
server: fatal: unable to bind: address already used?tcpserver: fatal: unable to
bind: address already used?tcpserver: fatal: unable to bind: address already use
d?tcpserver: fatal: unable to bind: address already used?
What’s that? I look it up and find that readproctitle is a kind of scrolling-log for daemontools which shows up when you run ps
. So, it looks like qmail-pop3d
failing because it’s trying to bind port 110 even though it’s already been bound. So this makes me thing that qmail-pop3d
isn’t failing, but that it’s running over and over again. And then I look back at my run script and look at that pesky little ‘&’ at the end. Doh! That tells the process to detach once it starts. It starts normally, then detaches. Once it detaches, daemontools thinks it’s down, so it tries to start it again. (At least that’s the way I understand it). Getting rid of the ‘&’ and adding ‘exec’ to the beginning fixes it and everything works now.
Me: 1 Daemontools: 2123