[Nitro] NASTY bug

On Nov 9, 2007 7:54 PM, George Moschovitis wrote:
> Dear devs,
> I am trying to find a nasty bug in
> lib/raw/context/session/cookie.rb
> this file implements a cookie based session store, ie the session data is
> serialized to/from a cookie.
> for security we store both the serialized session data and an encrypted
> version of it (called diggest).
> when deserializing we check the raw data against the diggest to find out if
> the user has tampered with the data.
> this scheme works about 90% of the time. But sometimes (seemingly at
> random) the diggest check fails (ie crypt(data) != diggest)
> for no apparent reason.
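
For readers who have not seen the scheme described above, here is a
minimal sketch of a signed cookie session. It is not the code in
lib/raw/context/session/cookie.rb: the method names, the SECRET constant
and the "--" separator are all made up for illustration, and it uses an
HMAC as the digest, which may differ from what Nitro actually does.

```ruby
require 'openssl'

# Hypothetical secret; a real application would load it from configuration.
SECRET = 'change-me'

# Serialize the session and append an HMAC of the serialized data.
def encode_session(session, secret = SECRET)
  data = [Marshal.dump(session)].pack('m0') # base64 without line breaks
  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), secret, data)
  "#{data}--#{digest}"
end

# Return the session, or nil when the digest does not match the data.
def decode_session(cookie, secret = SECRET)
  data, digest = cookie.to_s.split('--', 2)
  return nil if data.nil? || digest.nil?
  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), secret, data)
  return nil unless secure_compare(expected, digest)
  # Only unmarshal after the digest check has passed.
  Marshal.load(data.unpack('m0').first)
end

# Compare two strings without stopping at the first differing byte.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end
```

Note that the digest here is computed over the encoded string exactly as
it travels in the cookie, so the check does not depend on re-serializing
the session on the way back in.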

I don't use Nitro, so I am replying only because your context could involve
simultaneous disk and network activity. If so, your experience might
mirror mine, and it took me months to work out what it was.
I had file copies _randomly_ fail cmp/diff checks.
I reproduce some details below.
If I were you I'd jump straight to the kernel boot parameters, place
the disks and network under _heavy_ load and look for lost-ticks in

Apparent symptom:
   - Files copied to the PVFS2 area might fail a diff or cmp check
     (see thread below).
   - Typically this occurs when:
       a) large files are copied and
       b) several clients are copying to or reading from the PVFS2 area.
   - No errors were reported in /var/log/messages (but you might see
     reports about lost ticks or CPU frequency changes).
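
A rough way to reproduce the copy check described above is to copy a
large file repeatedly and compare digests of source and copy. The sketch
below is only an illustration: the method names and the copy_N.bin file
naming are invented here, and it uses SHA-256 in place of cmp/diff.

```ruby
require 'digest'
require 'fileutils'

# Copy src to dst, then compare SHA-256 digests of both files.
# Returns true when the copy matches the source.
def copy_and_verify(src, dst)
  FileUtils.cp(src, dst)
  Digest::SHA256.file(src).hexdigest == Digest::SHA256.file(dst).hexdigest
end

# Repeat the copy several times and return the number of mismatches.
def count_bad_copies(src, dst_dir, rounds = 10)
  (1..rounds).count do |i|
    !copy_and_verify(src, File.join(dst_dir, "copy_#{i}.bin"))
  end
end
```

Run it while the disks and network are busy; a non-zero count would match
the symptom. Reads that are served from the page cache may hide a bad
write, so a clean run does not prove much.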

Real symptom:
  - The disks are being placed under load when the network connection
is also under some load.

Related reports:

How I diagnosed:
 - kernel boot parameters:
    report_lost_ticks apic=debug mce=bootlog showopts
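
With those parameters set, the kernel log can be scanned for lost-tick
reports. The sketch below is an assumption-laden helper: the exact
wording of the kernel message varies between versions, so the pattern
simply looks for "lost" followed by "tick" or "ticks" on the same line,
and the method name is made up.

```ruby
# Assumed pattern: any line mentioning lost (timer) ticks, case-insensitive.
LOST_TICK_PATTERN = /lost\b.*\bticks?\b/i

# Return the lines that look like lost-tick reports.
def lost_tick_lines(lines)
  # scrub replaces invalid bytes so odd log content cannot raise here.
  lines.select { |line| line.scrub.match?(LOST_TICK_PATTERN) }
end

# Usage, with the log path mentioned earlier in this mail:
#   puts lost_tick_lines(File.foreach('/var/log/messages'))
```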

Conjectured workaround:
This allowed me to download, compile and install a new kernel.  These
boot parameters may or may not remedy the inconsistent file copy
 - Add kernel boot parameter (severe and gave me boot up problems)
 - Or, less severe, and worked for me, add:

 - Upgrade to kernel 2.6.21 (or more recent?, i.e. I'm using
No kernel parameters need to be passed, e.g. you can drop the no_timer_check.

  - 3 SATA drives arranged as a 3-stripe LVM volume, formatted with XFS
    (openSUSE 10.2 defaults)
  - This may be specific to the nVidia CK804 chipset and/or AMD
    64-bit processors (?)


> I would really like to ask everyone on this list with some free time to have
> a look at the code and help me track down
> this nasty bug.
> thanks in advance,
> -g.
