[Backgroundrb-devel] UTF8 postgres args saving issue

Justin Wood justin.wood at trifectagis.com
Fri Mar 27 00:24:39 EDT 2009

Hi, All

I have encountered an issue where the args field is not saved correctly
to the database.  I get an error like this:

ActiveRecord::StatementInvalid (RuntimeError: ERROR    C22021    M invalid
byte sequence for encoding "UTF8": 0xcb3a    H This error can also happen if
the byte sequence does not match the encoding expected by the server, which
is controlled by "client_encoding".

Here's the SQL that gets sent to the database (note that the dumped args
are written as a string):

F.\src\backend\utils\mb\wchar.c    L1545    Rreport_invalid_encoding: INSERT
INTO "bdrb_job_queues" ("args", "job_key", "taken", "worker_key",
"worker_method", "priority", "finished_at", "tag", "worker_name", "timeout",
"submitted_at", "finished", "runner_info", "submitter_info", "archived_at",
"scheduled_at", "started_at")
VALUES(E'{:car_idiË:inspection_report_name"First week inspection',
E'66', 0, E'', E'send_warranty_notice', NULL, NULL, NULL,
E'notification_worker', NULL, '2009-03-27 03:53:14.036000', 0, NULL, NULL,
NULL, '2009-04-03 03:53:13.917000', NULL) RETURNING "id"):

This won't happen all the time ... only when the bytes from the dump
(represented as a string) contain combinations that can't be interpreted
as UTF8.
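To illustrate, here is a minimal sketch of how a Marshal dump can end up with such a byte sequence. The hash is hypothetical, loosely modelled on the args visible in the SQL above; the value 203 is an assumption, picked because Marshal writes it as the single byte 0xCB, which would line up with the 0xcb3a in the error. It uses force_encoding and valid_encoding?, so it needs Ruby 1.9 or later.

```ruby
# Hypothetical job arguments; the real car_id is not shown in the post.
args = { :car_id => 203, :inspection_report_name => "First week inspection" }

# Marshal.dump returns raw binary data, not text.
dumped = Marshal.dump(args)

# Look for the byte pair named in the PostgreSQL error: 0xCB followed by ':' (0x3A).
has_bad_pair = dumped.bytes.each_cons(2).include?([0xCB, 0x3A])

# Reinterpret the same bytes as UTF-8, roughly what a UTF8 text column expects.
as_text = dumped.dup.force_encoding("UTF-8")

puts "contains 0xCB 0x3A: #{has_bad_pair}"
puts "valid UTF-8: #{as_text.valid_encoding?}"
```

Integers small enough to fit in the single-byte form (for example 66) produce only ASCII bytes in this position, which would explain why the error only shows up for some jobs.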

This could be fixed in a couple of ways, I suppose, the most obvious being
to change how the adapter saves bytea fields, or to encode the dump as
UTF8 ... but I wasn't sure if it would decode back to the original bytes.
So I implemented the foolproof option of Base64 encoding the data, which
means never having to worry about encoding again (this is not the first
time I've had a character encoding issue).

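Here's the bulletproof hack that I added to my BdrbJobQueue:

  # These accessors get around any possible character encoding issues with
  # the database.
  def args=(args)
    # encode64 returns the string without printing it, unlike b64encode
    write_attribute(:args, Base64.encode64(args))
  end

  # Reader body is missing from the archived post; decode64 is the assumed inverse.
  def args
    Base64.decode64(read_attribute(:args))
  end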
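
The Base64 round trip can be checked outside Rails. This is only a sketch under assumptions: the hash is hypothetical (203 is chosen because it marshals to the non-UTF8 byte 0xCB), and it uses Base64.encode64 / Base64.decode64 from the standard library rather than anything BackgrounDRb-specific.

```ruby
require 'base64'

# Hypothetical job arguments, not taken from a real queue row.
args   = { :car_id => 203, :inspection_report_name => "First week inspection" }
dumped = Marshal.dump(args)

# Encode the binary dump; the result uses only A-Z, a-z, 0-9, '+', '/', '='
# and newlines, so it is safe to store in a UTF8 text column.
encoded    = Base64.encode64(dumped)
ascii_only = encoded.bytes.all? { |b| b < 128 }

# Decode and unmarshal to get the original arguments back.
restored = Marshal.load(Base64.decode64(encoded))

puts "ASCII only: #{ascii_only}"
puts "round trip ok: #{restored == args}"
```

The cost is that the stored value grows by roughly a third and is no longer readable in the table, which seems a fair trade for a queue column.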

Hope that helps someone.  It will help anyone who has the problem referred
to here
Note, to the best of my knowledge all my other UTF8 settings are correct.

