[Mongrel] random cpu spikes, EBADF errors

Zachary Powell zach at plugthegap.co.uk
Wed Oct 24 08:09:03 EDT 2007


Hi Guys,

I'm using the latest LiteSpeed 3.2.4 (just upgraded), Mongrel 1.0.1, and Ruby
1.8.6, running on Red Hat Enterprise Linux ES 4. We have two servers.

Web/MySQL/Mail server:

# RAID Configuration: RAID 1 (73GBx2)

# HP Memory: 2 GB HP RAM

# HP DL385 G2 Processor: Dual Socket Dual Core Opteron 2214 2.2 GHz


(Load runs between 0.10 and 0.20, mostly MySQL, but spikes when the issue
occurs, with LiteSpeed taking ~30% CPU.)


and App Server:


# RAID Configuration: RAID 1 (146GBx2)

# HP Memory: 4 GB HP RAM

# HP DL385 G2 Processor: Dual Socket Dual Core Opteron 2214 2.2 GHz


(Usually 0.20 - 0.60, with legitimate spikes up to 1.8 for backend processes.
It spikes to 2-4 when the problem happens, depending on how many mongrels are
affected (sometimes 2).)


And these are the mongrels running:


MONGREL   CPU  MEM  VIR RES  DATE  TIME   PID
8010:hq   3.2  3.4  145 138  Oct23 43:13  20409
8011:hq   0.6  3.0  132 125  Oct23 8:15   20412
8012:hq   0.1  1.8   81  74  Oct23 1:28   20415
8015:dhq  0.0  1.0   50  44  02:41 0:08   4775
8016:dhq  0.0  0.7   34  30  02:41 0:01   4778
8017:dhq  0.0  0.7   36  30  02:41 0:01   4781
8020:af   9.0  3.3  143 137  Oct23 114:41 26600
8021:af   5.6  2.0   90  84  Oct23 71:56  26607
8022:af   2.4  1.8   80  74  Oct23 30:37  26578
8025:daf  0.0  1.0   49  42  02:41 0:04   4842
8026:daf  0.0  0.7   34  30  02:41 0:02   4845
8027:daf  0.0  0.7   36  30  02:41 0:02   4848
8030:pr   0.1  1.5   67  61  Oct23 1:50   16528
8031:pr   0.0  0.9   47  40  Oct23 0:17   16532
8032:pr   0.0  0.9   44  38  Oct23 0:13   16536
8035:dpr  0.2  0.7   36  30  12:30 0:02   22335
8036:dpr  0.2  0.7   35  30  12:30 0:02   22338
8037:dpr  0.2  0.7   35  30  12:30 0:02   22341


(The ones starting with "d" are in dev mode; I will try turning them off
tonight. I hadn't considered this a spill-over issue, but it happened just
now and turning them off didn't ease it. We had a lot fewer mongrels when the
problem was occurring before, but that was also a one-box setup.)

It's the 8020-8022 ones that have trouble. The web server is indeed picking up
the page cache, and while it's happening I can go to one of the pages in
question, or cat it over SSH, with no problems. I've monitored the Rails log
while it was happening and haven't seen any EBADF spilling over. It's
conceivable that a spike of hits from a Google crawl could cause a problem,
though; I could try siege/ab tonight.

I'm not familiar with checking file limits, but this is what I get from the
commands I found by googling:

cat /proc/sys/fs/file-nr:
2920    0       372880
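
If I'm reading the proc documentation right (an assumption on my part, not
something I've verified on this box), the three file-nr fields are allocated
file handles, allocated-but-unused handles, and the system-wide maximum. A
rough Python sketch of that arithmetic, where parse_file_nr is just a name I
made up for illustration:

```python
def parse_file_nr(line):
    """Split a /proc/sys/fs/file-nr line into its three counters.

    Assumed field order: allocated, allocated-but-unused, system maximum.
    """
    allocated, unused, maximum = (int(field) for field in line.split())
    return {
        "allocated": allocated,
        "unused": unused,
        "max": maximum,
        # Share of the system-wide handle limit currently in use.
        "used_fraction": (allocated - unused) / maximum,
    }


# The line reported by this server.
stats = parse_file_nr("2920    0       372880")
print("%d of %d handles in use (%.2f%%)" % (
    stats["allocated"] - stats["unused"],
    stats["max"],
    100 * stats["used_fraction"],
))
```

By that reading the box is using under 1% of the system-wide limit, so if a
limit is being hit at all it would more likely be the per-process one.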

ulimit -a:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 1024
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 77823
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited



I will post the output of ls -l /proc/{mongrel.pid}/fd/ next time it happens
and see how close it's getting to 1024. (Though again, it seems strange that
it could be hitting a max-open-files problem when the other, non-cached pages
on that pid, which are definitely opening files, work fine.)
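
Rather than counting the ls output by eye, something like the following
Python sketch could do the comparison. It assumes a Linux-style /proc where
each open descriptor shows up as one entry under /proc/<pid>/fd, and it takes
the 1024 soft limit from my shell's ulimit output, which may not be the limit
the mongrel process actually runs with. The function names and the proc_root
parameter are made up for illustration.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count the entries under <proc_root>/<pid>/fd (one per open descriptor)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_headroom(pid, soft_limit=1024, proc_root="/proc"):
    """Return (descriptors open, descriptors left before soft_limit)."""
    used = count_open_fds(pid, proc_root)
    return used, soft_limit - used
```

Calling fd_headroom(26600) as the user that owns the mongrel (or as root)
while the 8020 mongrel is spiking would show how much room is left; a second
value near zero would point at the per-process limit.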

Thanks again,

Zach

On 10/23/07, Dave Cheney <dave at cheney.net> wrote:
>
> Are you using a web server in front of your mongrels ? It should be
> picking up the page cached file before even considering handing the request
> to a mongrel.
>
> Cheers
>
>
> Dave
>
> On 24/10/2007, at 11:30 AM, Zachary Powell wrote:
>
> close(5)                                = 0
> close(5)                                = -1 EBADF (Bad file descriptor)
> read(5, "GET /flower_delivery/florists_in_covehithe_suffolk_england_uk
> HTTP/1.1\nAccept: image/gif, image/x-xbitmap, image/jpeg, image"..., 16384)
> = 473
> close(5)                                = 0
> close(5)                                = -1 EBADF (Bad file descriptor)
>
>
> the file it's trying to get is page cached, and exists/is fine (can even go
> to the URL while this is going on).

