Thanks Arjen,

I fully understand that MonetDB cannot see or handle the problem. I was mostly interested in whether you had any experience to share about it.

In the meantime, we have observed that the same ETL fails consistently on a file-based LUN *with* thin provisioning, but never fails on a file-based LUN *without* thin provisioning (all space is pre-allocated). We didn't try with a block-based LUN, but I suspect it would work.
The exact reason why thin provisioning makes it fail is still unknown, but this may help others who hit the same issue.
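
For anyone who wants to check their own LUN outside MonetDB, a rough probe along these lines might help. It is only a sketch, not what our ETL does: the function name `stress_write_fsync` and its parameters are made up for illustration, and it simply writes and fsyncs a series of files under a given directory, recording any OS-level error (such as EIO, "Input/output error") it sees. Files are kept until the end so that a thin-provisioned LUN actually has to grow.

```python
import errno
import os


def stress_write_fsync(directory, n_files=100, size_bytes=1 << 20):
    """Write and fsync n_files files under directory.

    Returns a list of (file index, errno name) for every failed write.
    Hypothetical helper for illustration; not part of MonetDB.
    """
    failures = []
    written = []
    payload = os.urandom(size_bytes)  # incompressible data, so blocks are really allocated
    for i in range(n_files):
        path = os.path.join(directory, f"probe_{i:05d}.bin")
        written.append(path)
        try:
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # force the data down to the (remote) device
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            failures.append((i, name))
    # Clean up only at the end, so the space is not reused during the run.
    for path in written:
        if os.path.exists(path):
            os.remove(path)
    return failures
```

Pointing `directory` at the iSCSI mount and at a local disk in turn should show whether the errors reproduce without MonetDB in the picture. An empty list means no write failed.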

Roberto

On 24 November 2015 at 12:58, Arjen de Rijke <Arjen.de.Rijke@cwi.nl> wrote:
Hi Roberto,

The obvious explanation is network "issues". The error message would likely be wrong, because the assumption is that MonetDB runs on top of a local filesystem. If your write takes longer because of some network issue, it will time out at some point. But timeout settings on a network connection are different from timeouts on writes to hard disks. This might lead to different errors that are not handled by MonetDB in the proper way. I am sure we do not test this setup at the moment.

Another thing that might be important is how well memory mapping works with iSCSI.
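
A quick way to exercise memory-mapped writes on the iSCSI mount, independent of MonetDB, could look like the sketch below. The helper name `mmap_write_probe` is hypothetical, and this is not how MonetDB itself maps its files; it only fills a file through a shared mapping and syncs it. Note that a failing device can also surface as a SIGBUS that kills the process rather than as a Python exception.

```python
import mmap
import os


def mmap_write_probe(path, size_bytes=1 << 20):
    """Fill a new file through a shared memory mapping and sync it to disk.

    Returns None on success, or the OSError that was raised.
    Hypothetical helper for illustration; size_bytes must be > 0.
    """
    try:
        with open(path, "w+b") as f:
            f.truncate(size_bytes)  # sparse file; blocks are allocated on write
            with mmap.mmap(f.fileno(), size_bytes) as m:
                m[:] = b"\xab" * size_bytes  # dirty every page of the mapping
                m.flush()  # msync: write the dirty pages back to the file
            os.fsync(f.fileno())
        return None
    except OSError as exc:
        return exc
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Running it with `path` on the iSCSI LUN and then on a local disk would at least show whether mapped writes behave differently on the two.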

Arjen de Rijke

----- Original Message -----
> From: "Roberto Cornacchia" <roberto.cornacchia@gmail.com>
> To: "Communication channel for MonetDB users" <users-list@monetdb.org>
> Sent: Tuesday, November 24, 2015 12:24:44 PM
> Subject: commit failures with dbfarm on iSCSI LUN

> Hi there,
>
> Do you have any experience with running a dbfarm over iSCSI?
>
> We have tried to use the NAS in our 1Gbit LAN for our largish daily experiments
> with MonetDB. It's a very handy setup and seems more suited than NFS.
>
> It seems to achieve reasonable performance, but we quite regularly (though
> not predictably for now) get commit failures during a rather long ETL.
> We do not get such commit failures when the same db and ETL are run on a local
> disk.
>
> Excerpt from merovingian.log (Jul2015-SP1):
>
> 2015-11-24 11:52:37 ERR trec01[19110]: !ERROR: bm_subcommit: commit failed
> 2015-11-24 11:52:37 ERR trec01[19110]: !ERROR: log_tend: write failed
> 2015-11-24 11:52:37 ERR trec01[19110]: !FATAL: 40000!COMMIT: transation commit
> failed (perhaps your disk is full?) exiting (kernel error: !ERROR: GDKsave:
> error on: name=07/717, ext=theap, mode=1
> 2015-11-24 11:52:37 ERR trec01[19110]: !OS: Input/output error
> 2015-11-24 11:52:37 ERR trec01[19110]: )
>
> The disk is most definitely not full, 1.5 TB available (the same works on a
> local disk with less space available).
> It looks like iSCSI is the problem (it works perfectly apart from these random
> failures).
>
> Can you think of any reason why iSCSI could fail where a real local block
> device would not?
>
> iscsi client (where MonetDB runs): libiscsi 1.11.0
> iscsi storage (where the dbfarm is stored): iscsid 2.0-871
>
> The iSCSI LUN is created as a regular file with thin provisioning (a file that
> dynamically grows on the NAS). We haven't yet tried a fixed-size
> block-level LUN (trying this today anyway).
>
> Hoping someone already has an idea.
>
> Roberto
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list