In our test environment we frequently dump MonetDB databases with the
mclient -D command and restore them by feeding the SQL file created by
the dump back to mclient.
We decided to try out the latest release (5.10.3, Feb2009-SP1).
We installed MonetDB from the monetdb-install script using our
customary --enable-sql --nightly=stable settings.
Installation appeared to complete normally.
Our dump was from
MonetDB server v5.4.1 (64-bit), based on kernel v1.22.1 (64-bit oids)
Copyright (c) 1993-2007 CWI, all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Configured for prefix: /monetdb
Libraries:
libpcre: 7.2 2007-06-19 (compiled with 7.2)
openssl: OpenSSL 0.9.8e 23 Feb 2007 (compiled with OpenSSL 0.9.8e 23 Feb
2007)
Compilation: gcc -g -O2 -std=c99 -fgnu89-inline
Linking : ld -IPA -m elf_x86_64
We attempted to load into
MonetDB server v5.10.3 (64-bit), based on kernel v1.28.3 (64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2009 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Configured for prefix: /monetdb
Libraries:
libpcre: 7.2 2007-06-19 (compiled with 7.2)
openssl: OpenSSL 0.9.8e 23 Feb 2007 (compiled with OpenSSL 0.9.8e 23 Feb
2007)
Compilation: gcc -O2 -std=c99 -fgnu89-inline
Linking : ld -IPA -m elf_x86_64
We create our databases with
> merovingian
> monetdb create DATABASE
> monetdb start DATABASE
The SQL error we get when attempting the load is
ERROR = !SQLException:sql:missing sep line 0 field 0
!SQLException:importTable:failed to import table
There are no errors in the merovingian log.
We have tried 3 installs on 2 different machines - each install succeeds, but
we have never been able to restore from the 5.4.1 dump.
Our tables are 'wide and sparse', that is, many columns contain NULL for
many OIDs.
We have no trouble creating databases on 5.10.3 and populating them with SQL
files made up of INSERT statements.
We tried doing a more manual load of test data (INSERT), then doing a dump
(from 5.10.3), and then doing a restore (to 5.10.3) from that dump and that
works.
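For what it's worth, the "missing sep" error usually means that a data line
inside one of the dump's COPY blocks does not contain the field separator the
COPY header declares - a plausible failure mode for wide, sparse rows full of
NULLs. As a rough illustration (the dump syntax, the sample text, and the
checker function below are all hypothetical approximations, not the exact
format mclient -D emits), one can scan a COPY block for offending lines:

```python
import re

# Hypothetical checker: scan one COPY block of a SQL dump and report
# data lines that lack the declared field separator.
dump = ("COPY 3 RECORDS INTO \"t\" FROM stdin USING DELIMITERS '\t';\n"
        "1\talpha\n"
        "2\tbeta\n"
        "3 gamma\n")

def check_copy_block(text):
    lines = text.splitlines()
    header = re.search(r"COPY (\d+) RECORDS .*DELIMITERS '(.*?)'", lines[0])
    nrecords, sep = int(header.group(1)), header.group(2)
    # return 1-based line numbers of data lines missing the separator
    return [i for i, line in enumerate(lines[1:1 + nrecords], start=2)
            if sep not in line]

print(check_copy_block(dump))
```

Running such a check over the failing 5.4.1 dump could at least narrow down
whether the problem is in the dump file itself or in the 5.10.3 loader.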
Hi,
I was selecting 1 record from a table with 100 fields and the server
crashed with the following message:
MAPI=monetdb@localhost:50000;
ACTION=read_line
QUERY=select * from table1;
ERROR=connection terminated
Press any key to continue...
What does it mean? Thanks. Dariusz.
Hello Francois,
Since the error you get also happens without XRPC, it is very possible
that there is a bug in the query execution, so I have CC-ed this e-mail
to the monetdb-users mailing list, so that more people can look at
this problem.
My answers / further questions are inlined below.
On Mar 27, 2009, at 16:07 , francois guérin wrote:
> On 26 March 2009 at 11:36, Ying Zhang <Y.Zhang(a)cwi.nl> wrote:
>> Hello Francois,
>>
>> I need some more information to be able to tell what's wrong here.
>>
>> - which version of monetdb/xquery are you using? how did you
>> install it
>> (from CVS, using the super tarball, windows installer, RPM package)?
>
> the March 18th nightly build, from a super tarball
>
>> - on which OS? CPU? RAM?
>
> 64-bit Linux, kernel 2.6.24, Mandriva distribution
> Intel(R) Pentium(R) 4 CPU 3.40GHz
> 2GB RAM
>
>> - how did you insert you document? using pf:add-doc() via e.g.
>> mclient or
>> by making an xrpc call?
>
> i used pf:add-doc()
>
>> - have you checked that the insertion of your document has
>> succeeded or not?
>
> it has. the document appears when i call the function pf:documents()
> (but i can't view the whole file, because it's too big)
That should be sufficient. To be more sure, maybe try something like
count(doc("yourdoc.xml")//*), but this could take quite some time with
your document :)
>
>> - have you inserted the document as a read-only document or updatable
>> document?
>
> as an updatable document
>
>> - what is the exact xrpc request you have sent?
>
> get_collocations_PASSAGE("wikipedia","section")
>
>> - which function are you trying to call (please send me the xquery
>> code)?
>
> declare function foo:get_collocations_PASSAGE($corpus as
> xs:string,$word as xs:string){
> for $grp in doc($corpus)/DOCUMENT/Sentence/G
> for $w in $grp/W/@form
> return if(fn:contains($w,$word))
> then foo:groupe_text_PASSAGE($grp)
> else ()
> };
Would you please also send us the code of the function
foo:groupe_text_PASSAGE()
(and the code of the other user-defined functions called by this
function, if any)?
>
> (that code worked with a corpus much smaller than the one called
> "wikipedia")
What is the size of your smaller document?
Would it be possible for you to find out the (size of the) largest
document on which your query still runs?
>> - can you execute the same function without xrpc?
>
> with an .xq file read by "mclient -lx", i got the same error.
>
> my .xq file:
> for $grp in doc("wikipedia")/DOCUMENT/Sentence/G
> for $w in $grp/W/@form
> return if(fn:contains($w,"section"))
> then $grp
> else ()
So the error seems unrelated to the function
foo:groupe_text_PASSAGE().
Have you executed any updating function before executing this
function/code? The functions you mentioned in your e-mail are all
read-only functions.
Could you please also try to add your (4.6GB) document as a read-only
document, and execute the functions with/without XRPC to see what happens?
If your functions execute without any error on a read-only document, it is
most likely that the problem is in the update-related code (updatable
documents are stored differently from read-only documents).
To reproduce and debug your problem, we would need your document.
Would it be possible to put the (zipped) document in a place we could
download?
> - after a document has been stored in the database, is the url (the
> one appearing in pf:documents()) useful? or may i change my files name
> in my computer? or even delete them once they've been stored?
You can rename/remove the original documents, once they have been
added to the DB.
As far as I know, the URLs displayed by pf:documents() are merely
administrative info.
However, you might want to keep your original document as a backup.
Kind regards,
Jennie
>
>
> Thanks,
>
> francois.
>
>
>> Kind regards,
>>
>> Jennie
>>
>> On Mar 25, 2009, at 11:54 , francois guérin wrote:
>>
>>> Hello,
>>>
>>> (it's francois again)
>>>
>>> I've inserted a huge file (4.8GB) into the MonetDB xml database - the
>>> whole process went fine - but when i send an xrpc request, i always get
>>> this error message:
>>> (and i don't know what it means :s )
>>>
>>> <?xml version="1.0" encoding="utf-8"?>
>>> <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
>>> <env:Body>
>>> <env:Fault>
>>> <env:Code>
>>> <env:Value>env:Sender</env:Value>
>>> </env:Code>
>>> <env:Reason>
>>> <env:Text xml:lang="en">Error occurred during execution.
>>> !ERROR: [remap]: 2 times inserted nil due to errors at tuples 3@0,
>>> 5@0.
>>> !ERROR: [remap]: first error was:
>>> !ERROR: CMDremap: operation failed.
>>> !ERROR: interpret_unpin: [remap] bat=559,stamp=-664 OVERWRITTEN
>>> !ERROR: BBPdecref: 1000000006_rid_nid does not have pointer fixes.
>>> !ERROR: interpret_params: leftfetchjoin(param 2): evaluation error.
>>>
>>> </env:Text>
>>> </env:Reason>
>>> </env:Fault>
>>> </env:Body>
>>> </env:Envelope>
>>>
>>> Where does it come from?
>>> Perhaps I should make several small files instead of one big one
>>> (even though the documentation says to prefer one big file to several
>>> small ones)...
>>> My app needs to store huge xml data (either in one big file, or
>>> several small ones) with fast storage and query times.
>>> What would you advise?
>>>
>>> Regards,
>>>
>>>
>>> --
>>> francois.
>>
>>
>
>
>
> --
> francois.
On Thu, Mar 26, 2009 at 09:13:57AM -0700, Yue Sheng wrote:
> (A) For thread 4: are you sure that it is waiting for anything,
> here is the iostat output, each observation one second apart:
msync is called, which is supposed to write your dirty pages
to disk. This needs some investigation, but would surely require a
MacOS system.
>
> disk0 cpu load average
>
> KB/t tps MB/s us sy id 1m 5m 15m
>
> 0.00 0 0.00 0 0 100 0.04 0.05 0.00
>
> So there is no cpu nor disk load. As for threads 5 and 6, one
> would think that the argument '--set gdk_nr_threads=1'
> constrains the number of clients that can connect. Even if that
> assumption is incorrect, right now there are no clients
> connected. During the lifetime of the server process though,
> there have been three possible client candidates:
The number of threads is for processing, i.e. queries will use
up to the specified number of threads to compute the query. Still,
each connected client will use its own thread.
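That split - a bounded pool of processing threads shared by all queries, plus
one thread per connected client - can be mimicked in miniature. The toy model
below is illustrative only (names and structure are assumptions, not MonetDB
internals):

```python
from concurrent.futures import ThreadPoolExecutor
import threading

GDK_NR_THREADS = 1  # processing parallelism, as with --set gdk_nr_threads=1
pool = ThreadPoolExecutor(max_workers=GDK_NR_THREADS)

results = []

def handle_client(query_parts):
    # each client runs in its own thread, but the pieces of its query
    # are computed by the shared, bounded worker pool
    partials = list(pool.map(sum, query_parts))
    results.append(sum(partials))

# three clients connect concurrently; processing is still single-threaded
clients = [threading.Thread(target=handle_client,
                            args=([range(10), range(5)],))
           for _ in range(3)]
for t in clients:
    t.start()
for t in clients:
    t.join()
print(results)
```

The point is that gdk_nr_threads=1 does not limit how many clients can
connect, only how much parallelism each query's computation gets.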
>
> - The server itself. We run the server with "bin/mserver5
> --dbfarm=Users/howardshek/Onur-tmp/dbfarm --dbname=evatick_new
> --dbinit="include sql;" --set gdk_nr_threads=1 --daemon=yes ".
> So I am not sure if it qualifies as a client as well.
Yes, it's a client, but not at the SQL level, so it doesn't take
the lock we were worried about.
>
> - Command line client ran by: "bin/mclient -d evatick_new -l
> sql". This runs a single sql command: "drop table Japan2008;"
> at the very beginning before the data load, and then I exit the
> client via Ctrl-D.
It should have finished, i.e. it would be strange if it still hung on the lock.
>
> - The R code using JDBC. At the moment, that process is killed
> too, so mserver5 might be waiting for some lock/semaphore/mutex
> to finally switch to the thread relevant to that particular
> client and do the cleanup.
Both of the killed clients may be the cause, but investigating would
need a Mac box, etc.
>
> (B) My reply is getting long, and keeps getting rejected by the
> user list....
Sorry about that, but I guess we cannot fix sf.
Niels
Thank you for the quick reply.
So, you mean the '-f none' option is not working?
And, when testing MonetDB performance with a TPC-H-like benchmark,
the performance is measured using 'cat file.sql | mclient -lsql -t -f raw
| grep Timer'?
Thanks so much.
Maria
Re: [MonetDB-users] monetdb output mode -f
From: Martin Kersten <Martin.Kersten@cw...> - 2009-03-25 19:58
Maria wrote:
> Hi, I would like to run a query without printing output in order to
> monitor its performance.
> I tried to use '-f none' on the command line, but it didn't work for
> monetdb5. (I am not using xquery here)
To avoid timing the output rendering, you might consider replacing your
queries with count(*) queries. In all other cases, the data is sent from
the server to mclient for rendering. If you use the mode 'raw', this
amounts to a plain dump without further rendering concerns.
The following might be what you want:
cat file.sql |mclient -lsql -t -f raw |grep Timer
>
> > ./mclient -lsql -uxman -ddemo -t -f none 1.sql
>
> ===> then, it prints all the results on the screen and displays the time.
>
> I want it to only display the time, not the query results.
> I also tried redirecting to /dev/null. In this case, it didn't
> display the time.
>
> Isn't '-f none' working in monetdb5? If so, how can I measure the
> query time?
>
> Thanks in advance.
> Maria.
Now I have tried the same bulk load with COPY on the latest (v5.10.0) with
a single-thread setting.
Failed again!
System:
OS X
MonetDB compiled in 64-bit
number of files to load: 322
total size of files: 15GB
Max rows per file: 3 million
On Wed, Mar 18, 2009 at 11:51 AM, Martin Kersten <Martin.Kersten(a)cwi.nl> wrote:
> Yue Sheng wrote:
>
>> I'm not sure how "The (parallel) load used scratch area as well" is
>> related to the question.
>>
> If you look at the code, you will notice that there is a two phase
> loading process involving (possibly) multiple threads
>
>>
>> Sorry if I'm a bit slow.
>>
>> On Wed, Mar 18, 2009 at 11:25 AM, Martin Kersten <Martin.Kersten(a)cwi.nl<mailto:
>> Martin.Kersten(a)cwi.nl>> wrote:
>>
>> Yue Sheng wrote:
>>
>> Sorry, if I wasn't clear on the first question:
>>
>> (1) we ramp up N for the first insert to claim sufficient space.
>> Sure, understand that one.
>>
>> But:
>>
>> The claimed space got "given back" *right after* the first
>> insert. (this is the part I don't understand.)
>>
>> The (parallel) load used scratch area as well
>>
>> Question: how does the second, third, .... inserts get the
>> "benefit" of the ramp up that we did for the first insert?
>>
>> Is this a bit clearer what my question pertains?
>>
>> Thanks.
>>
>>
>> On Wed, Mar 18, 2009 at 10:26 AM, Martin Kersten
>> <Martin.Kersten(a)cwi.nl <mailto:Martin.Kersten@cwi.nl>
>> <mailto:Martin.Kersten@cwi.nl <mailto:Martin.Kersten@cwi.nl>>>
>> wrote:
>>
>> Yue Sheng wrote:
>>
>> Three questions that bother me are:
>> (1) why we need to ramp up N to the total of all lines in the
>> first insert.
>>
>> to let the kernel claim sufficient space
>>
>> Reason I ask is that right after first insert, the
>> allocation
>> drops right down from, say, 100GB to 35GB, and stays
>> roughly there
>> for *all* subsequent inserts. I totally do not understand
>> this.
>> (2) in your opinion, based on this experience, what could
>> be the
>> potential problem here?
>>
>> little to none, as the files are memory mapped, which only
>> may cause
>> io on some systems
>>
>> (3) in your opinion, would the newer version cure the
>> problem?
>>
>> a system can never correctly guess what will come,
>> especially since the source of a COPY command need not be a file
>> but standard input, i.e. a stream.
>>
>>
>> Thanks.
>>
>>
>> On Tue, Mar 17, 2009 at 10:51 PM, Martin Kersten
>> <Martin.Kersten(a)cwi.nl <mailto:Martin.Kersten@cwi.nl>
>> <mailto:Martin.Kersten@cwi.nl <mailto:Martin.Kersten@cwi.nl>>
>> <mailto:Martin.Kersten@cwi.nl
>> <mailto:Martin.Kersten@cwi.nl> <mailto:Martin.Kersten@cwi.nl
>> <mailto:Martin.Kersten@cwi.nl>>>>
>>
>> wrote:
>>
>> Yue Sheng wrote:
>>
>> Martin,
>>
>> It almost worked...
>>
>> This is what I did and what have happened:
>>
>> I have 322 files to insert into the database,
>> totaling 650
>> million rows
>>
>> I divided the file list into two, then for each
>> sub list
>>
>> (a) I insert first file in the list with N set to
>> 650milllion
>> rows, (b) all subsequent files have N set to the
>> number
>> of lines
>> in *that* file
>>
>> once first list done, then
>>
>> (c) I insert first file in the second list with N
>> set to
>> 650million rows,
>> (d) all subsequent files have N set to the number
>> of lines in
>> *that* file
>>
>> Then the same problem happened: it got stuck at file
>> number
>> 316.
>>
>>
>> ok. using the 650M enables MonetDB to allocate enough
>> space
>> and does
>> not have to fall back on guessing. Guessing is
>> painful, because
>> when a file of N records has been created and it needs
>> more,
>> it makes
>> a file of size 1.3xN. This leads to memory fragmentation.
>>
>> in your case i would have been a little more spacious
>> and used
>> 700M as a start, because a miscalculation of 1 gives a
>> lot of pain.
>> Such advice is only needed in (a)
>>
>>
>> Note: This is further than previous tries, which all
>> stopped in
>> the region of file number 280 +/- a few.
>>
>> My observation:
>> (i) at (a), the VSIZE went up to around 46GB, then
>> after
>> first
>> insert, it drops to around 36GB
>>
>> ok fits
>>
>> (ii) at (c), the VSIZE went up to around 130GB, then
>> after first
>> insert, it drops to around 45GB
>>
>> you tell the system to extend the existing BATs to prepare for
>> another 650M, which means it allocates 2*36GB, plus room for the
>> old one, giving 108GB.
>> then during processing some temporary BATs may be needed, e.g. to
>> check integrity constraints after each file.
>> Then it runs out of swapspace.
>>
>> (iii) the "Free Memory", as reported by Activity
>> Monitor,
>> just
>> before it failed at file number 316, dipped to as
>> low as
>> 7.5MB!
>>
>> yes, you are running out of swapspace on your system.
>> This should not have happened, because the system uses mmapped
>> files; it may be an issue with MacOS or relate to a problem we
>> fixed recently
>>
>>
>>
>> My question:
>> (1) why we need to ramp N up to total number of
>> lines (it
>> takes
>> a long time to do that), then only have it drop down
>> to
>> 30GB-40GB
>> right after
>>
>> this might indicate that on MacOS, just like Windows,
>> mmapped
>> files
>> need to be written to disk. With a disk bandwidth of
>> 50MB/sec it
>> still takes several minutes
>>
>> the first insert and stay roughly there? Does it
>> mean we're
>> giving back all the pre-allocation space back to
>> the OS? Then
>> should we set N always to total number of lines?
>> If so,
>> it would
>> take much much longer to process all the files...
>> (2) How come RSIZE never goes above 4GB? (3) Does
>> sql log
>> file
>> size have some limit, that we need to tweak?
>>
>> no limit
>>
>> (4) Has anyone successfully implemented the 64bit
>> version of
>> MondeDB and successfully inserted more than
>> 1billion rows?
>>
>> your platform may be the first, but Amherst has worked
>> with
>> Macs for
>> years
>>
>> (5) when you say you "...The VSIZE of 44G is not too
>> problematic, i am looking at queries letting it
>> tumble
>> between
>> 20-80 GB....," What does it mean? Mine went up to
>> as high as
>> 135GB...
>>
>> explained above.
>>
>> regards, Martin
>>
>>
>> Thanks, as always.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
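A back-of-the-envelope way to see why preallocating with a large N helps, per
the explanation above: when the loader has to guess, each overflow creates a
new file 1.3x the old size, and the sum of all those intermediate files far
exceeds a single up-front allocation, fragmenting memory along the way. The
model and numbers below are illustrative assumptions, not MonetDB's actual
allocator:

```python
def total_allocated(n_final, start, growth=1.3):
    """Sum of the capacities of every file ever created when a file is
    regrown by `growth` each time it fills (illustrative model)."""
    cap, total = start, start
    while cap < n_final:
        cap = int(cap * growth) + 1  # a new, larger file replaces the old
        total += cap
    return total

# Guessing from a 1M-row start vs. preallocating 650M rows up front:
grown = total_allocated(650_000_000, 1_000_000)
preallocated = 650_000_000
print(grown / preallocated)  # several times more space is touched
```

With a correct up-front N the loop never runs, so only one allocation of the
final size is ever made - which is exactly the advice given for step (a).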
Hi, I would like to run a query without printing output in order to monitor
its performance.
I tried to use '-f none' on the command line, but it didn't work for monetdb5.
(I am not using xquery here)
> ./mclient -lsql -uxman -ddemo -t -f none 1.sql
===> then, it prints all the results on the screen and displays the time.
I want it to only display the time, not the query results.
I also tried redirecting to /dev/null. In this case, it didn't display the
time.
Isn't '-f none' working in monetdb5? If so, how can I measure the query
time?
Thanks in advance.
Maria.
Hi,
I got this message while executing 'copy ... into ...':
runtime error R6031
Attempt to initialize CRT more than once
What does it mean? Thanks. Dariusz.
two tables: table_1 and table_2, both with exactly the same schema. table_1
has 2.15MM rows of data, table_2 has 3.35MM rows of data. No indexing.
created a new table, table_combined, using the same schema as table_1
insert into table_combined select * from table_1;
(30 minutes later)
insert into table_combined select * from table_2;
(40 minutes later)
select count(*) * 1 from table_combined;
this gives ONLY 2.15MM rows, the same rows as table_1
What happened to table_2?!?
Yes, I've checked the dbfarm size. It equals 2*(table_1 + table_2)
So it's there but "just not seeing it"
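For reference, INSERT INTO ... SELECT should make the combined table's
count(*) equal the sum of the two inputs; anything else points at a failed or
invisible second insert (an uncommitted or aborted transaction is one
possibility). A minimal sketch of the expected semantics, using SQLite as a
stand-in for MonetDB with scaled-down row counts (2.15MM -> 215, 3.35MM -> 335):

```python
import sqlite3

# Demonstrate the expected INSERT INTO ... SELECT semantics on a
# stand-in engine: the combined count must be the sum of both tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table_1 (id INTEGER, val TEXT)")
cur.execute("CREATE TABLE table_2 (id INTEGER, val TEXT)")
cur.executemany("INSERT INTO table_1 VALUES (?, ?)",
                [(i, "a") for i in range(215)])
cur.executemany("INSERT INTO table_2 VALUES (?, ?)",
                [(i, "b") for i in range(335)])
cur.execute("CREATE TABLE table_combined (id INTEGER, val TEXT)")
cur.execute("INSERT INTO table_combined SELECT * FROM table_1")
cur.execute("INSERT INTO table_combined SELECT * FROM table_2")
count = cur.execute("SELECT count(*) FROM table_combined").fetchone()[0]
print(count)  # 215 + 335 = 550
```

The fact that the dbfarm holds roughly the size of both tables while count(*)
reports only table_1's rows is consistent with the second insert's data being
written but not visible to the querying session.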