[Mongrel] random cpu spikes, EBADF errors

Zachary Powell zach at plugthegap.co.uk
Wed Oct 24 09:14:58 EDT 2007


Hi, short follow up. I did
lsof -p 27787 | grep mongrel  -c
=> 64 files open

while it was spiking, so it doesn't look like a max-file issue. Looking
through the list, the only thing of note was

mongrel_r 27787 rails    5w  unknown
 /proc/27787/fd/5 (readlink: No such file or directory)

5 was likely the descriptor getting EBADF (it's usually 4-6). I didn't catch
which file it was before the problem ended (running lsof was very slow,
took 20 secs while the problem was occurring), but previously when I've checked
it's always been there and accessible from the web. Also, the dir in question is
full of files, so it hasn't been swept recently (so it couldn't have been
cache expiry while reading or anything weird like that).
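For what it's worth, the double close(5) pattern from the strace I posted earlier can be reproduced in isolation. A minimal Ruby sketch follows; it only mimics the same syscall shape (close succeeding once, then failing), and isn't a claim about what Mongrel does internally:

```ruby
# Minimal sketch of the EBADF pattern from the strace: the first close
# succeeds, the second fails because the descriptor is already invalid.
# In a threaded server the same shape shows up when one thread closes a
# socket that another thread still holds a reference to.
require "tempfile"

file = Tempfile.new("ebadf_demo")
fd = file.fileno
file.close                      # close(fd) = 0

result = nil
begin
  IO.for_fd(fd).close           # close(fd) again -> EBADF
rescue Errno::EBADF
  result = "EBADF"
end
puts result                     # prints "EBADF"
```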

Zach
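P.S. Since lsof is so slow while the box is spiking, the same information can be read straight out of /proc. A rough sketch (it uses the script's own PID so it runs anywhere; substitute the mongrel's PID, e.g. 27787):

```ruby
# Count open descriptors and show where each points, straight from /proc.
# Much cheaper than running lsof under load. Process.pid stands in for
# the mongrel PID here so the sketch is self-contained.
pid = Process.pid
fd_dir = "/proc/#{pid}/fd"
fds = Dir.entries(fd_dir).reject { |e| e.start_with?(".") }
puts "#{fds.size} descriptors open"

# A dangling descriptor (the kind lsof reports as "unknown") readlinks to nothing
fds.each do |fd|
  target = File.readlink("#{fd_dir}/#{fd}") rescue "(gone)"
  puts "#{fd} -> #{target}"
end
```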



On 10/24/07, Zachary Powell <zach at plugthegap.co.uk> wrote:
>
> Hi Guys,
>
> I'm using the latest LiteSpeed 3.2.4 (just upgraded), Mongrel 1.0.1, and Ruby
> 1.8.6, running Red Hat Enterprise Linux ES 4. We have
>
>
> Web/Mysql/Mail server:
>
> # RAID Configuration: RAID 1 (73GBx2)
>
> # HP Memory: 2 GB HP RAM
>
> # HP DL385 G2 Processor: Dual Socket Dual Core Opteron 2214 2.2 GHz
>
>
> (runs between 0.10 - 0.20, mostly MySQL, but spikes up when the issue occurs,
> with LiteSpeed taking ~30% CPU)
>
>
> and App Server:
>
>
> # RAID Configuration: RAID 1 (146GBx2)
>
> # HP Memory: 4 GB HP RAM
>
> # HP DL385 G2 Processor: Dual Socket Dual Core Opteron 2214 2.2 GHz
>
>
> (usually 0.20 - 0.60 with legitimate spikes up to 1.8 for backend
> processes. Spikes up to 2-4 when it happens, depending on how many mongrels
> get the problem (sometimes 2))
>
>
> And these are the mongrels running:
>
>
> MONGREL   CPU  MEM  VIR RES  DATE  TIME   PID
> 8010:hq   3.2  3.4  145 138  Oct23 43:13  20409
> 8011:hq   0.6  3.0  132 125  Oct23 8:15   20412
> 8012:hq   0.1  1.8   81  74  Oct23 1:28   20415
> 8015:dhq  0.0  1.0   50  44  02:41 0:08   4775
> 8016:dhq  0.0  0.7   34  30  02:41 0:01   4778
> 8017:dhq  0.0  0.7   36  30  02:41 0:01   4781
> 8020:af   9.0  3.3  143 137  Oct23 114:41 26600
> 8021:af   5.6  2.0   90  84  Oct23 71:56  26607
> 8022:af   2.4  1.8   80  74  Oct23 30:37  26578
> 8025:daf  0.0  1.0   49  42  02:41 0:04   4842
> 8026:daf  0.0  0.7   34  30  02:41 0:02   4845
> 8027:daf  0.0  0.7   36  30  02:41 0:02   4848
> 8030:pr   0.1  1.5   67  61  Oct23 1:50   16528
> 8031:pr   0.0  0.9   47  40  Oct23 0:17   16532
> 8032:pr   0.0  0.9   44  38  Oct23 0:13   16536
> 8035:dpr  0.2  0.7   36  30  12:30 0:02   22335
> 8036:dpr  0.2  0.7   35  30  12:30 0:02   22338
> 8037:dpr  0.2  0.7   35  30  12:30 0:02   22341
>
>
> (the ones starting with d are in dev mode; I'll try turning them off
> tonight. I hadn't considered this a spill-over issue, but it happened just
> now and turning them off didn't ease it. We had a lot fewer when the problem
> was occurring before, but also a one-box set-up.)
>
>
> It's the 8020-8022 ones that have trouble. It is indeed picking up the page
> cache, and while it's happening I can go to one of the pages in question,
> or cat it over SSH with no problems. I've monitored the Rails log while it was
> happening and haven't seen any EBADF spilling over. Though
> it's conceivable that a spike of hits from a Google crawl could cause a
> problem; I could try siege/ab tonight.
>
>
> I'm not familiar with checking file limits, but this is what I get from
> a command I found by googling:
>
>
> cat /proc/sys/fs/file-nr:
> 2920    0       372880
>
>
> ulimit -a:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1024
> max locked memory       (kbytes, -l) 32
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 77823
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
>
>
>
> Will post the ls -l /proc/{mongrel.pid}/fd/ next time it happens, and see
> how close it's getting to 1024. (Though again it seems strange that it could
> be hitting a max-open-files limit when the other non-cached pages on that pid,
> which are definitely opening files, work fine.)
>
>
> Thanks again,
>
>
> Zach
>
>
>
> On 10/23/07, Dave Cheney <dave at cheney.net> wrote:
> >
> > Are you using a web server in front of your mongrels ? It should be
> > picking up the page cached file before even considering handing the request
> > to a mongrel.
> >
> > Cheers
> >
> >
> > Dave
> >
> > On 24/10/2007, at 11:30 AM, Zachary Powell wrote:
> >
> > close(5)                                = 0
> > close(5)                                = -1 EBADF (Bad file descriptor)
> > read(5, "GET /flower_delivery/florists_in_covehithe_suffolk_england_uk
> > HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image"..., 16384)
> > = 473
> > close(5)                                = 0
> > close(5)                                = -1 EBADF (Bad file descriptor)
> >
> >
> > the file it's trying to get is page-cached, and exists/is fine (I can even
> > go to the URL while this is going on).
> >
> >
> >
> >
> >
> >
>
>
>


More information about the Mongrel-users mailing list