Discussion:
Suddenly sv does not start, gives a timeout
Peter Hickman
2013-05-22 09:30:54 UTC
One of our servers has started to have a problem with runit. Even after a
reboot we get this:

$ sv start ./service/unicorn/
timeout: down: ./service/unicorn/: 1s, normally up, want up

This has just started without (as far as we can tell) there being any
change to the server. I've even nuked the ./service/* directory so that it
will get rebuilt when the application is deployed (via Capistrano - this is
a Rails app) but that does not seem to help.

The other 23 servers which are set up in the same way have no problem so I
am at a loss as to where to start looking.

Any idea of where I should look for clues?
Robin Bowes
2013-05-22 10:16:44 UTC
Post by Peter Hickman
Any idea of where I should look for clues?
In the logs. What do the logs say?

Or try stopping the service and running it manually from the command
line so you can see the output from the run script.

R.
Peter Hickman
2013-05-22 13:32:14 UTC
Well, this is what we have. Firstly, we had started it manually, so let's kill it:

$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
2980 ? Sl 0:34 services/scorecard_cricket_scores_importer.rb
16599 pts/0 S+ 0:00 grep scorecard
$ kill -9 2980
$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
16671 pts/0 S+ 0:00 grep scorecard

The process has gone and will not be restarted no matter how long you wait.
So we try and start it with sv:

$ sv start ./service/scorecard_cricket_scores_importer/
timeout: down: ./service/scorecard_cricket_scores_importer/: 1s, normally up, want up
$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
16868 pts/0 S+ 0:00 grep scorecard

Still not started. So we try it manually:

$ ./service/scorecard_cricket_scores_importer/run &
[1] 16929
$ ps ax | grep scorecard
731 ? S 0:12 runsv scorecard_cricket_scores_importer
16929 pts/0 Sl 0:10 services/scorecard_cricket_scores_importer.rb
18896 pts/0 R+ 0:00 grep scorecard
$

And it keeps running without any problems for as long as you let it.

There are no errors in the logs and nothing reported in:

runsvdir -P /etc/service log:
..................................................................................................................................................................................................................................................................

Is there some other runit log that I should look into?
Charlie Brady
2013-05-22 13:40:16 UTC
Post by Peter Hickman
$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
2980 ? Sl 0:34 services/scorecard_cricket_scores_importer.rb
16599 pts/0 S+ 0:00 grep scorecard
$ kill -9 2980
You have a race condition here - process 2980 may have already died. Use
"sv d ./service/scorecard_cricket_scores_importer/" to stop the process.

You also should not be using -9 unless you have exhausted other options.
Use -TERM or -QUIT. Using -9 is a bad habit to have.
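
The "TERM first, -9 only as a last resort" advice could be sketched as a small shell function. This is a minimal sketch and not part of runit: the name stop_gracefully and the 5-second default are hypothetical, and for a service supervised by runsv, sv remains the better tool.

```shell
stop_gracefully() {
    # $1 = PID to stop
    # $2 = seconds to wait before escalating (hypothetical default: 5)
    pid=$1
    wait_s=${2:-5}

    # Ask politely first; if the signal cannot be sent, the process is
    # already gone (or not ours), so there is nothing more to do.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$wait_s" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        wait_s=$((wait_s - 1))
    done

    # Still alive after the grace period: fall back to SIGKILL.
    kill -KILL "$pid" 2>/dev/null
}
```

For example, "stop_gracefully 2980 10" would give process 2980 up to ten seconds to exit before SIGKILL is sent.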
Post by Peter Hickman
$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
16671 pts/0 S+ 0:00 grep scorecard
The process has gone and will not be restarted no matter how long you wait.
$ sv start ./service/scorecard_cricket_scores_importer/
timeout: down: ./service/scorecard_cricket_scores_importer/: 1s, normally up, want up
$ ps ax | grep scorecard
731 ? S 0:11 runsv scorecard_cricket_scores_importer
16868 pts/0 S+ 0:00 grep scorecard
$ ./service/scorecard_cricket_scores_importer/run &
[1] 16929
Why start it in the background?
Post by Peter Hickman
$ ps ax | grep scorecard
731 ? S 0:12 runsv scorecard_cricket_scores_importer
16929 pts/0 Sl 0:10 services/scorecard_cricket_scores_importer.rb
18896 pts/0 R+ 0:00 grep scorecard
$
And it keeps running without any problems for as long as you let it
Then your service is faulty. Failing silently is not satisfactory.

Use strace to see what your process is doing, and when and why it is
exiting.
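
One way a service can appear to fail silently is that its error message goes to stderr and never reaches the service log. A common idiom in runit run scripts is to put "exec 2>&1" near the top so that both streams end up in the same place. The function below is only an illustration of that redirection; the name with_merged_output is hypothetical.

```shell
with_merged_output() {
    # Run the given command with stderr redirected into stdout, the same
    # effect that `exec 2>&1` has for everything after it in a run script.
    "$@" 2>&1
}

# Example: the message is written to stderr but arrives on stdout.
with_merged_output sh -c 'echo "ruby: command not found" >&2'
```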
Peter Hickman
2013-05-22 14:22:36 UTC
Aaargh found the cause and it was not sv :)

The way ruby was installed on the machine had changed but the change was
not visible if you logged on as the user and ran the commands manually.
However, when sv did its magic it didn't have the same PATH values and
failed.
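
A difference like this can be checked without sv by looking a command up under a stripped-down environment. The sketch below rests on assumptions: the function name resolves_under_path and the example PATH value are hypothetical, and the environment a run script really gets depends on how runsvdir was started on the machine.

```shell
resolves_under_path() {
    # $1 = PATH value to test, $2 = command name to look up
    # env -i clears the environment so only the given PATH is visible;
    # /bin/sh is given as an absolute path so it is found regardless.
    env -i PATH="$1" /bin/sh -c 'command -v "$1" >/dev/null 2>&1' sh "$2"
}

# Example: does ruby resolve with only the system directories on PATH?
if resolves_under_path "/usr/bin:/bin" ruby; then
    echo "ruby found"
else
    echo "ruby NOT found under this PATH"
fi
```

If the lookup fails here but works in a login shell, setting PATH explicitly at the top of the run script is one way to make the service independent of the login environment.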

I am off to view the log files and see who I should castigate >_<

Thank you for your time, and sorry for wasting it.
