[#20764] Too many open files on large deploys

Date:
2008-06-21 16:37
Priority:
3
Submitted By:
Daniel Kionka (dkionka)
Assigned To:
Ryan Davis (zenspider)
Category:
vlad
State:
Closed
Summary:
Too many open files on large deploys

Detailed description
I have a complex vlad deploy script that worked fine on our test pods but sometimes failed when I ran it on our production
site, which has 30 hosts.  I have not pinpointed the problem, but the more calls to run there are, the worse it gets.

The original error was "Too many open files".  I added `/usr/sbin/lsof -p $$ | wc -l` to see how many open
files there were, and the count reached 800, which is far more than there should be.  Our IT department doubled
"ulimit -n", but then I started getting random errors.  The deploy is simply using too many resources.

My work-around was to reduce the number of run calls by adding commands to an array and running them all at once with
set -e.
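
A minimal sketch of that work-around, not the actual deploy script. The helper name batch_script and the queued commands are hypothetical placeholders. The resulting string would be passed to a single run call inside a vlad task instead of one run per command.

```ruby
# Hypothetical helper: joins queued shell commands into one script so that a
# single remote invocation executes them all. `set -e` makes the shell stop
# at the first command that exits non-zero.
def batch_script(commands)
  (["set -e"] + commands).join("\n")
end

queued = []
queued << "mkdir -p /tmp/example_app/releases" # placeholder command
queued << "ls /tmp/example_app"                # placeholder command

script = batch_script(queued)
puts script
```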

I do not know what pushes it over the edge, but one possibility is that I do my own conditional update/restart based
on the exit code of a remote diff.  That requires catching the exception from run -- unless there is a better way.
I will also open a feature request for that.
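
A rough sketch of that pattern, assuming a run-like method that raises when the command exits non-zero. Here run_or_raise is a local stand-in that executes through sh on the current machine, and the CommandFailed class name is hypothetical; neither is vlad's own API.

```ruby
require "open3"

# Hypothetical error class for this sketch only.
class CommandFailed < StandardError; end

# Local stand-in for a remote `run`: raises when the command exits non-zero.
def run_or_raise(command)
  output, status = Open3.capture2e("sh", "-c", command)
  raise CommandFailed, "exit #{status.exitstatus}: #{command}" unless status.success?
  output
end

# Returns true when the command fails, false when it succeeds.
# Note: diff exits 1 for "files differ" and >1 for real errors; this sketch
# treats every non-zero exit as "changed" and does not tell them apart.
def changed?(diff_command)
  run_or_raise(diff_command)
  false
rescue CommandFailed
  true
end

# "exit 1" stands in for a diff that found differences.
puts "would update and restart" if changed?("exit 1")
```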

Followup

Message
Date: 2009-03-04 23:00
Sender: Ryan Davis

1.3.0 fixes this problem. Thanks for the bug report.

Date: 2009-03-02 19:48
Sender: Ryan Davis

I've switched to ThreadGroup for the Action#execute workers.
I don't know if that is going to address the issue or not as
I have yet to replicate this issue. At the least it'll drop thread
instances quicker/easier.
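
For illustration only, a small sketch of grouping per-host worker threads in Ruby's standard ThreadGroup. This is not vlad's Action#execute code; the host names and the work done in each thread are placeholders.

```ruby
# One worker thread per host, collected in a ThreadGroup.
hosts = %w[host1.example.com host2.example.com host3.example.com]

group   = ThreadGroup.new
results = Queue.new
workers = []

hosts.each do |host|
  worker = Thread.new(host) do |h|
    # A real worker would run the remote command for this host here.
    results << "done: #{h}"
  end
  group.add(worker) # move the thread out of the default group
  workers << worker
end

# Wait for every worker before reading the results.
workers.each(&:join)
puts "#{results.size} hosts finished"
```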

I'd like to do something to replicate this issue and then introspect
and hunt down those open files...

Can you provide me some more basic info? size of project (# of
files), etc? I'm gonna try to write up a simulator script that'll
generate project and vlad setups with the appropriate knobs.

Date: 2008-06-30 11:57
Sender: Daniel Kionka

I found a problem with the work-around today.  If it fails 
on some host, all you get is this long string of commands 
in the error message, and you don't know which of the 
commands actually failed.

Attached Files:

Name Description Download
No Files Currently Attached

Changes:

Field | Old Value | Date | By
close_date | 2009-03-04 23:00 | 2009-03-04 23:00 | zenspider
resolution_id | None | 2009-03-04 23:00 | zenspider
status_id | Open | 2009-03-04 23:00 | zenspider
summary | "Too many open files" on large deploys | 2009-02-17 22:27 | zenspider
summary | "Too many open files" on large deploys | 2008-06-30 11:57 | dkionka