Inserting large blobs in MySQL
on Monday 12th January, 2009 Gabe speculated thusly…Last week I setup MySQL replication for a database with two slaves. I wanted to see how resilient it was, and one of the tests I wanted to perform was to create a record on the master and then interrupt the replication of the same record on the slave. This would allow me to see whether MySQL could recover from such an error.
So I thought the easiest way of doing this would be to to insert a large 700mb
file in to MySQL – to give me enough time to interrupt a slave before
replication had completed. Thus being able to test MySQL’s resilience to
loss of connectivity during UPDATE/INSERT queries.
I have never used BLOB columns before so my quest took my on a journey during which I learned a lot about these blobs. Now, even though a MySQL long blob can hold 4GB of data I have encountered
problem after problem dealing with it:
PHP scripts are limited to 32MB memory maximum, this can be upped, but
is not scalable and not a good solution. It means that you cannot simply
read the data of file and perform an INSERT since the file data is
stored in PHP’s memory first. This meant the development of a PHP script
that could read chunks (just a meg or so) from a file and perform UPDATE
queries along with MySQL CONCAT function to effectively append each
chunk.
CONCAT, as I eventually found out, returns NULL if any of it’s arguments
are NULL. In my case when attempting to UPDATE a row with additional
chunks the resulting blob size was always zero. This was because the
very first time CONCAT is used the blob field is empty (NULL), therefore
CONCAT always returned NULL – and so, the data in the blob column never
grew. The query looked like this:
UPDATE `$table` SET `data` = CONCAT(data, '{mysqli_real_escape_string($buffer)}') WHERE `id`=$id";
I then combined CONCAT with COALESCE which will return an alternative
value that you specify in the eventuality the first argument is NULL. By
combining these two I could overcome the above problem.
Resulting query structure:
UPDATE `$table` SET `data` = COALESCE(CONCAT(data, '{mysqli_real_escape_string($buffer)}'), '') WHERE `id`=$id";
Great, now we’re finally adding our chunks on. Then my computer ran out
of disc space – the reason: 7 gigs of MySQL replication logs. Took a
while to find that out. Fixed.
Next problem, although I was certain (via in-depth PHP debugging) that
PHP was correctly reading chunks from the file in a consecutive order,
of the correct size, and the SQL query was being properly executed
without any sly errors the blob column would never end up growing above
12MB (according the PHP MyAdmin). Through even more intensive debugging
I found that after concatenating each chunk the blob would grow by the
correct amount up to 16MB, then it would zero and start again. The
result of pushing a 700MB ISO in to it would always be less 16MB.
Working from a hunch I upped the max_packet_size in my.conf from 16MB to
36MB, the result was as above but this time the blob would grow to 32MB
before zeroing itself.
I then found this MySQL bug report:
http://bugs.mysql.com/bug.php?id=22853
It is considered a bug simply because by upping max_packet_size to
something like 4GB or anything substantial leaves the MySQL server open
to network DoS attacks. There is no work around. A fix is scheduled for
version 6.
Anyway, since we don’t care about security I maxed out the packet_size,
and my chunking script worked, but very quickly I noticed a huge amount of
slowdown once the blob had grown to just a few tens of megs. In fact it
took about 20 minutes to get to 200MB. Without supporting my theory with
any evidence I reckon the CONCAT function reads all the data out of the
database in to memory, does it’s stuff and then stuffs it back in to the
database. The result is an increasingly slow query.
It would seem that although MySQL supports blobs of about 4GB that you
can never get that size unless you:
1. Change the max_packet_size to a stupidly high number, leaving your
server open to DoS attacks so then you can just do a single INSERT. Using CONCAT
is out since it would take forever.
2. Change the memory limit of PHP scripts to be greater than 4GB.
3. Have absolutely bags of RAM since three copies of the file are stored
in RAM (certainly two copies). Since you cannot chunk the file and
perform CONCATS PHP will read the entire file in to it’s memory. This
will all then be buffered by the MySQL driver. Then this is passed to
the database. Therefore if you wish to insert a 4 GB file you will
require >12GB of RAM.
One work around is to chunk the file as before but each chunk is
represented by a single row. These can then be linked together by giving
each row a master file ID, so that all necessary rows can be retrieved.
For my purposes this doesn’t help, I needed queries that took a while,
inserting rows like this will take but a fraction of a second each.
Posted in Development, Information, MySQL, PHP, Programming, Server