Bulk Load, Same Table, Sequential COPY INTO, MAPI

Brandon Jackson usbrandon at gmail.com
Sun Jan 13 18:59:02 CET 2013


Dear developers,

In the last two weeks we have made a lot of progress on the MonetDB
Streaming Bulk Loader step in Pentaho Data Integration (PDI).
Yesterday, I tried to put it to use at something closer to production
scale and started getting errors along the lines of:

"COMMIT: transaction is aborted because of concurency conflicts, will
ROLLBACK instead"

I am not sure when these clear up.  In this case, after previous errors
like this, I established a new connection and simply wanted to create a
table.

sql>CREATE TABLE "testing"."fact_cen_hist"
(
  "CENSUS_SK" INT
, "FACILITY_SK" INT
, "RESIDENT_SK" INT
, "RESIDENT_LOCATION_SK" INT
, "DISCHARGED_TO" CLOB
, "PAYOR" VARCHAR(20)
, "PAYOR_CATEGORY" VARCHAR(20)
, "CENSUS_STATUS" VARCHAR(50)
, "CENSUS_EVENT" VARCHAR(50)
, "IS_ACTIVE" INT
, "IS_MEDICAID_DAY" INT
, "IS_MEDICARE_DAY" INT
, "IS_HOSPICE_DAY" INT
, "IS_COMMERCIAL_INSURANCE_DAY" INT
, "IS_PRIVATE_PAY_DAY" INT
, "IS_MANAGED_CARE_DAY" INT
, "IS_MEDICARE_SECOND_PAYOR" INT
, "IS_MEDICAID_PENDING_DAY" INT
, "DATE_OF_STAY" INT
, "NEXT_EVENT_DATE" INT
, "DATE_BEFORE_NEXT_EVENT" INT
, "CENSUS_EVENT_DATE" INT
, "IS_MEMORY_CARE_DAY" INT
, "IS_AL_DAY" INT
, "IS_SNF_DAY" INT
, "IS_DISCHARGED" INT
, "IS_RTA" INT
, "ROOM_KEY" VARCHAR(5)
, "IS_BED_HOLD" INT
, "BILLABLE_DAY" INT
, "HEAD_IN_BED_DAY" INT
, "FINANCIAL_DAY" INT
, "IS_ADMISSION" INT
, "IS_READMISSION" INT
, "IS_EXPIRED" INT
, "PAYOR_SK" INT
, "IS_SUBMISSION" INT
, "SPLIT_CENSUS" INT
)
;

operation successful (2.7s)
COMMIT: transaction is aborted because of concurency conflicts, will
ROLLBACK instead
sql>

----

Generally speaking, our bulk loader reads a chunk of rows into memory,
say 10k, 20k, or even 100k or 1 million rows.  It then generates a COPY
INTO statement that says how many rows are outbound, streams the rows
through a BufferedWriter with newline terminators, and reads from a
BufferedReader while waiting for prompts and so on.  PDI then fetches
another chunk of rows and repeats the process.  I noticed that when
using the MAPI API, trying to append a COMMIT to the output brought
back an error saying that COMMIT is not allowed in auto-commit mode, so
it looks like each big COPY INTO statement, after it sends its complete
batch of newline-terminated rows, immediately tries to COMMIT.
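
For illustration, a single chunk streamed over MAPI looks roughly like
the following (the delimiters, NULL marker and row data are only
made-up examples; the real statement is generated from the PDI field
metadata):

COPY 10000 RECORDS INTO "testing"."fact_cen_hist"
FROM STDIN USING DELIMITERS '|','\n' NULL AS '';
1|101|2001|3001|Home|Medicare|Medicare A|Active|Admission|1|0|...
2|101|2002|3002||Medicaid|Medicaid Pending|Active|Readmission|1|1|...
[... remaining newline-terminated rows of the 10k chunk ...]

After the last row of the chunk, the step waits for the server prompt
before fetching and sending the next chunk.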

What I want to figure out is why, after the first COPY INTO of however
many rows (10k, for example), I get a message that the commit failed
because of concurrency conflicts and that a ROLLBACK was done instead.

Once that happens, is MonetDB in a funk?  Do I need to shut it down and restart?

What can I do to provide more helpful information to understand the
nature of the problem?

I read on
http://www.monetdb.org/Documentation/Cookbooks/SQLrecipies/LoadingBulkData
that I could supply a much larger guess on the first COPY INTO so that
MonetDB can prepare itself for a huge influx of rows, even though each
batch is smaller.  The example in my mind: knowing that my table has
roughly 6 million rows, I would let 6 million be the guess on the first
call, then make accurate calls where each subsequent COPY INTO is told
that 20k rows are on the way, building up to the 6 million I guessed
would be necessary.  It worries me that this appears to be brittle.  We
might produce 10 million rows in a source table and want to stream them
over.  At the moment, it looks like COMMIT concurrency errors would
come up as we sequentially issue COPY INTOs until the bulk load is
finally complete.
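
To make that concrete, the sequence of statements I have in mind would
look something like the following (the 6 million guess, the 20k chunk
size and the delimiters are only examples):

-- first chunk: the record count is the guess for the whole load
COPY 6000000 RECORDS INTO "testing"."fact_cen_hist"
FROM STDIN USING DELIMITERS '|','\n';
[first 20k newline-terminated rows]

-- every following chunk: the exact count of rows actually being sent
COPY 20000 RECORDS INTO "testing"."fact_cen_hist"
FROM STDIN USING DELIMITERS '|','\n';
[next 20k newline-terminated rows]

... repeated until the roughly 6 million rows have all been streamed ...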

Any ideas or feedback are very much appreciated.  I can provide code
and more detail, and follow whatever procedures are necessary to help
diagnose the issue.  Sample data is also available.

Thanks ahead of time,

Brandon Jackson