Commit when error

josemarquez · September 26, 2006, 6:39pm

Ok, this is my problem:
Im loading around 900,000 records from a file to my database.
So, i commit after it has processed 10,000 with something like this:

if (count % 10000) == 0
@dbh.commit
end

(Where the variable count is the number of lines loaded.)

And now i have this code that process each line:

begin
executeQuery(query)
rescue
$stderr.puts "err: " + $!.to_s
@dbh.commit
end

Here is the problem:
For example, if the programm is loading the 12,000 line and this
executeQuery method produces an error, the lines 10,000 to 12,000 DONT
load to the database. Because i commit after 10,000 lines, so those
2,000 lines havent got commit. So, what i want is that when the
executeQuery produces an error, ok, dont load that line, but i want to
commit the rest of lines that were loaded before.
I putted the “@dbh.commit” in the rescue block, but doesnt work. What
can i do? Where should i put a @dbh.commit in my code?

Hope i explained my case clear,

Thanks in advance

josemarquez · September 27, 2006, 3:21pm

Paul L. wrote:

Jose Montero wrote:

And now i have this code that process each line:
executeQuery method produces an error, the lines 10,000 to 12,000 DONT
load to the database. Because i commit after 10,000 lines, so those
2,000 lines havent got commit. So, what i want is that when the
executeQuery produces an error, ok, dont load that line, but i want to
commit the rest of lines that were loaded before.
I putted the “@dbh.commit” in the rescue block, but doesnt work. What
can i do? Where should i put a @dbh.commit in my code?

Why not solve the problem instead of trying to work around it? Why not
commit on each read record, then commit on groups of 100 read records,
gradually increasing the commit interval until you come up with some
clues
to the problem?

Obviously, once an error condition comes up, trying to commit then is
too
late – the prevailing error condition prevents the commit from being
carried out.

Well the thing here is that if i commit each 5 records, and i get an
error loading the 4th record, those 4 records wont commit, wont load. I
think the thing here is how to recover from an error, how to commit the
records that were already loaded to the database…is it possible?

Thanks

josemarquez · September 26, 2006, 7:55pm

Jose Montero wrote:

And now i have this code that process each line:
executeQuery method produces an error, the lines 10,000 to 12,000 DONT
load to the database. Because i commit after 10,000 lines, so those
2,000 lines havent got commit. So, what i want is that when the
executeQuery produces an error, ok, dont load that line, but i want to
commit the rest of lines that were loaded before.
I putted the “@dbh.commit” in the rescue block, but doesnt work. What
can i do? Where should i put a @dbh.commit in my code?

Why not solve the problem instead of trying to work around it? Why not
commit on each read record, then commit on groups of 100 read records,
gradually increasing the commit interval until you come up with some
clues
to the problem?

Obviously, once an error condition comes up, trying to commit then is
too
late – the prevailing error condition prevents the commit from being
carried out.

josemarquez · September 27, 2006, 5:51pm

Jose Montero wrote:

/ …

Obviously, once an error condition comes up, trying to commit then is
too
late – the prevailing error condition prevents the commit from being
carried out.

Well the thing here is that if i commit each 5 records, and i get an
error loading the 4th record, those 4 records wont commit, wont load. I
think the thing here is how to recover from an error, how to commit the
records that were already loaded to the database…is it possible?

Apparently not. I want to emphasize this is just at first glance,
without
running any tests. To me it would seem better to try to figure out why
the
error is coming up. You obviously have a consistent though random error
that isn’t going away, and that is undermining your database processing.
I
would want to know why.

It looks as though the error is on the database side (not the file
reading
side), and since the database is in an error state after the failed
query,
it cannot commit either. I think you need to analyze the error itself,
rather than try to get past it.

josemarquez · September 27, 2006, 7:15pm

Paul L. wrote:

Jose Montero wrote:

/ …

Obviously, once an error condition comes up, trying to commit then is
too
late – the prevailing error condition prevents the commit from being
carried out.

Well the thing here is that if i commit each 5 records, and i get an
error loading the 4th record, those 4 records wont commit, wont load. I
think the thing here is how to recover from an error, how to commit the
records that were already loaded to the database…is it possible?

Apparently not. I want to emphasize this is just at first glance,
without
running any tests. To me it would seem better to try to figure out why
the
error is coming up. You obviously have a consistent though random error
that isn’t going away, and that is undermining your database processing.
I
would want to know why.

It looks as though the error is on the database side (not the file
reading
side), and since the database is in an error state after the failed
query,
it cannot commit either. I think you need to analyze the error itself,
rather than try to get past it.

Well yes, i have detected when an error comes up. Sometimes its cause
the line contains malformed UTF-8 caracter, in other cases the length is
longer than the size of the field in the database,etc…And im
controlling all this cases, but what i want to do is that if appear an
unknown error, ok, dont process that line, continue and commit the rest
of already loaded lines. And what im trying to figure out is exactly
what u say: what is the state error the database is in, after the failed
query? How could i know this?

Thanks

josemarquez · September 27, 2006, 9:35pm

Jose Montero wrote:

/ …

Well yes, i have detected when an error comes up. Sometimes its cause
the line contains malformed UTF-8 caracter, in other cases the length is
longer than the size of the field in the database,etc…And im
controlling all this cases,

Actually, you are controlling all the cases you are controlling, and the
error comes up because of those cases you aren’t controlling.

but what i want to do is that if appear an
unknown error, ok, dont process that line, continue and commit the rest
of already loaded lines. And what im trying to figure out is exactly
what u say: what is the state error the database is in, after the failed
query? How could i know this?

IMHO, you need to create a bulletproof syntax check that filters out bad
records before they are submitted to the database. The proof will be the
periodic errors you are seeing will stop.

One approach is to create a small test query, using a table that is
identical in design to the main table, but with a single record. Submit
the
records to this table, and if they pass, then submit them to the main
database. This way, you have detected any errors before they are
committed
to the main database. If the small-table test fails, you can still
commit
the prior records in the mainstream activity and then deal with the
error
at your leisure.

The problem with this approach is it takes twice as long to read records
(two queries per record). But any robust error-detection algorithm might
also do that.

josemarquez · September 28, 2006, 11:09pm

On Thu, Sep 28, 2006 at 02:15:28AM +0900, Jose Montero wrote:

Well yes, i have detected when an error comes up. Sometimes its cause
the line contains malformed UTF-8 caracter, in other cases the length is
longer than the size of the field in the database,etc…And im
controlling all this cases, but what i want to do is that if appear an
unknown error, ok, dont process that line, continue and commit the rest
of already loaded lines. And what im trying to figure out is exactly
what u say: what is the state error the database is in, after the failed
query? How could i know this?
If you every line but the line with the error to be commited, then well
you need to commit each and every line. The purpose of a transaction is
to do things atomically, not necessarily to do things in a batch. If one
line is a an atomic transaction for you, then treat it as one. Using
transactions to chunk your interactions with the DB doesn’t make sense.

josemarquez · September 30, 2006, 12:35am

Jose Montero wrote:

Well the thing here is that if i commit each 5 records, and i get an
would want to know why.
longer than the size of the field in the database,etc…And im
controlling all this cases, but what i want to do is that if appear an
unknown error, ok, dont process that line, continue and commit the rest
of already loaded lines. And what im trying to figure out is exactly
what u say: what is the state error the database is in, after the failed
query? How could i know this?

Thanks

Just to add something perhaps irrelevant.

As far as i take it the idea with transaction and commit is to commit
every time database is in a consistent state. In this case that would be
after every line, correct?

I’d assume the main objection would be performance, but isn’t that
another issue entirely? Well, perhaps not. But if you do want to
bulk-load database without the chance of rollback, why not do
bulk-load/bulk insert. Thats the idea of it. If i remember correctly you
can do that (have a weak memory about it since some hack a long time
ago). I haven’t figured out what db you are using, but look for it. I
would be to use the tools transaction/commit/rollback the intended way.

Bulk loading inside transaction is really intended to be able to
rollback it ALL if it failed.

Just my 2 cents (and even them a loan).
/Thomas