Hello,
I am interested in evaluating MonetDB/XQuery. I have some XML collections
for a project I work on that are of 1-2 million individual XML documents
for which I use XQuery to access for OLAP style reporting operations. This
is a Java-based project, and while working on this project I developed a
thin XQuery-centric API so that I could evaluate/use many different XQuery
implementations, which my company released as an open source project
called Xaj (http://sourceforge.net/projects/xaj). Currently I am using a
Berkeley DB XML implementation, but am interested in exploring other
options.
I'd like to implement a MonetDB version of this API now, but I am having
some difficulty finding out how to effectively access MonetDB through
Java. So far I was able to use the MonetDB JDBC driver to construct XQuery
statements and get XML results, and also I was able to construct MonetDB
pf:add-doc() statements to add documents, but I am wondering if this is
the best approach (and if so, how to add documents to a remote MonetDB
server using the pf:add-doc() mechanism).
The API I am trying to implement you can view here, to give you an idea
what I'm trying to do:
http://xaj.svn.sourceforge.net/viewvc/xaj/xaj/src/net/sf/xaj/XmlDb.java?rev…
It was somewhat modeled after the defunct XML:DB API, but focused just on
add/store/XQuery operations. Any help/advise would be greatly appreciated!
-- m@
I have tables with around 12 million rows and large primary keys. I
find the performance for various queries to be rather strange.
* Why is Q2 (see below) so much slower than Q1? It is actually faster
for me to use Q1 and do the rest of the matching myself.
* Why is Q3 so immensely much faster than Q1? I understand that the
first part of the primary key is selective in Q3. Do I understand right
if Q1 scans until "type" matches, then scans until "name" matches, then
scans until "value" matches? No indexes are put on disk, right? Then
the matching for Q3 should be scan until "value" matches, which
shouldn't be more than at most three times faster, right (if three
columns are read instead of one for Q1)? I do Q1 twice to look for cash
effects, but none are seen... Could some columns from the table "value"
be cached and kept, without giving room for those from "type_name_value"?
* Q4 has the same performance as Q1, as expected.
* Why isn't Q5 faster than Q6? 82% of the rows have type=3. Fewer
should be scanned to find that "!!foo" doesn't exist than for "~~foo",
as "!" has a low ASCII value, while "~" is high. (See COUNT queries at
the bottom.)
Klem fra Nils
Q1: SELECT * FROM type_name_value WHERE type = 3 AND name = 'author' AND
value = 'Jurgen Annevelink'
Q2: SELECT * FROM type_name_value WHERE type = 3 AND name = 'author' AND
value = 'Jurgen Annevelink' AND doc = 1 AND nenc = '1.1.1.1' AND pos = 4
Q3: SELECT * FROM value WHERE value = 'Jurgen Annevelink' AND type = 3
AND name = 'author'
Q4: SELECT * FROM type_name_value WHERE type = 3 AND name = 'author' AND
value = '__bar'
Q5: SELECT * FROM type_name_value WHERE type = 3 AND name = '!!foo' AND
value = '__bar'
Q6: SELECT * FROM type_name_value WHERE type = 3 AND name = '~~foo' AND
value = '__bar'
Q7: SELECT * FROM type_name_value WHERE type = 3000 AND name = '__foo'
AND value = '__bar'
See runs below.
Number of rows
SELECT COUNT(type) FROM type_name_value
% nilsgri.type_name_value # table_name
% count_no_nil_type # name
% int # type
% 8 # length
[ 12165109 ]
Query time: 0.066192
SELECT COUNT(type) FROM value
% nilsgri.value # table_name
% count_no_nil_type # name
% int # type
% 8 # length
[ 12165109 ]
Query time: 0.058111
The PRIMARY KEY of type_name_value is (type,name,value,doc,nenc,pos)
The PRIMARY KEY of value is (value,doc,nenc,pos)
Q1: Search 3 first, match 3 first.
SELECT * FROM type_name_value WHERE type = 3 AND name = 'author' AND
value = 'Jurgen Annevelink'
% nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value # table_name
% type, name, value, doc, nenc, pos, pathid, complete # name
% int, varchar, clob, int, varchar, int, int, boolean # type
% 1, 6, 17, 1, 12, 8, 3, 5 # length
[ 3, "author", "Jurgen Annevelink", 1, "1.1.1.1", 4, 5, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.23579.2.1", 724888, 56, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.2487.3.1", 81996, 5, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.42.2.1", 1071, 5, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.533986.2.1", 17294725, 56, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.576139.3.1", 18745432, 56, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.944085.1.1", 30104242, 107, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.99173.2.1", 3191470, 56, true ]
Query time: 1.643438
Q1: Repeat to check for disk cache effects.
SELECT * FROM type_name_value WHERE type = 3 AND name = 'author' AND
value = 'Jurgen Annevelink'
% nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value # table_name
% type, name, value, doc, nenc, pos, pathid, complete # name
% int, varchar, clob, int, varchar, int, int, boolean # type
% 1, 6, 17, 1, 12, 8, 3, 5 # length
[ 3, "author", "Jurgen Annevelink", 1, "1.1.1.1", 4, 5, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.23579.2.1", 724888, 56, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.2487.3.1", 81996, 5, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.42.2.1", 1071, 5, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.533986.2.1", 17294725, 56, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.576139.3.1", 18745432, 56, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.944085.1.1", 30104242, 107, true ]
[ 3, "author", "Jurgen Annevelink", 1, "1.99173.2.1", 3191470, 56, true ]
Query time: 1.615900
Q2: Search entire, match entire.
SELECT * FROM type_name_value WHERE type = 3 AND name = 'author' AND
value = 'Jurgen Annevelink' AND doc = 1 AND nenc = '1.1.1.1' AND pos = 4
% nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value # table_name
% type, name, value, doc, nenc, pos, pathid, complete # name
% int, varchar, clob, int, varchar, int, int, boolean # type
% 1, 6, 17, 1, 7, 1, 1, 5 # length
[ 3, "author", "Jurgen Annevelink", 1, "1.1.1.1", 4, 5, true ]
Query time: 11.073827
Q3: Use table value instead.
SELECT * FROM value WHERE value = 'Jurgen Annevelink' AND type = 3 AND
name = 'author'
% nilsgri.value, nilsgri.value, nilsgri.value, nilsgri.value,
nilsgri.value, nilsgri.value, nilsgri.value, nilsgri.value # table_name
% value, doc, nenc, pos, type, name, pathid, complete # name
% clob, int, varchar, int, int, varchar, int, boolean # type
% 17, 1, 12, 8, 1, 6, 3, 5 # length
[ "Jurgen Annevelink", 1, "1.1.1.1", 4, 3, "author", 5, true ]
[ "Jurgen Annevelink", 1, "1.23579.2.1", 724888, 3, "author", 56, true ]
[ "Jurgen Annevelink", 1, "1.2487.3.1", 81996, 3, "author", 5, true ]
[ "Jurgen Annevelink", 1, "1.42.2.1", 1071, 3, "author", 5, true ]
[ "Jurgen Annevelink", 1, "1.533986.2.1", 17294725, 3, "author", 56, true ]
[ "Jurgen Annevelink", 1, "1.576139.3.1", 18745432, 3, "author", 56, true ]
[ "Jurgen Annevelink", 1, "1.944085.1.1", 30104242, 3, "author", 107, true ]
[ "Jurgen Annevelink", 1, "1.99173.2.1", 3191470, 3, "author", 56, true ]
Query time: 0.000477
Q4: Search 3 first, match 2 first.
SELECT * FROM type_name_value WHERE type = 3 AND name = 'author' AND
value = '__bar'
% nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value # table_name
% type, name, value, doc, nenc, pos, pathid, complete # name
% int, varchar, clob, int, varchar, int, int, boolean # type
% 1, 0, 0, 1, 0, 1, 1, 5 # length
Query time: 1.617259
Q5: Search 3 first, match 1 first. '!' early in alphabet.
SELECT * FROM type_name_value WHERE type = 3 AND name = '!!foo' AND
value = '__bar'
% nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value # table_name
% type, name, value, doc, nenc, pos, pathid, complete # name
% int, varchar, clob, int, varchar, int, int, boolean # type
% 1, 0, 0, 1, 0, 1, 1, 5 # length
Query time: 0.530727
Q6: Search 3 first, match 1 first. '~' late in alphabet.
SELECT * FROM type_name_value WHERE type = 3 AND name = '~~foo' AND
value = '__bar'
% nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value # table_name
% type, name, value, doc, nenc, pos, pathid, complete # name
% int, varchar, clob, int, varchar, int, int, boolean # type
% 1, 0, 0, 1, 0, 1, 1, 5 # length
Query time: 0.534592
Q7: Search 3 first, match 0 first.
SELECT * FROM type_name_value WHERE type = 3000 AND name = '__foo' AND
value = '__bar'
% nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value,
nilsgri.type_name_value, nilsgri.type_name_value # table_name
% type, name, value, doc, nenc, pos, pathid, complete # name
% int, varchar, clob, int, varchar, int, int, boolean # type
% 1, 0, 0, 1, 0, 1, 1, 5 # length
Query time: 0.000407
Number of rows with type and name mathcing.
SELECT COUNT(type) FROM type_name_value WHERE type = 3 AND name = 'author'
% nilsgri.type_name_value # table_name
% count_no_nil_type # name
% int # type
% 7 # length
[ 2730959 ]
Query time: 0.683863
Number of rows with type mathcing.
SELECT COUNT(type) FROM type_name_value WHERE type = 3
% nilsgri.type_name_value # table_name
% count_no_nil_type # name
% int # type
% 8 # length
[ 10020569 ]
Query time: 0.032573
SELECT COUNT(type) FROM type_name_value WHERE type = 3 AND name < '!!foo'
% nilsgri.type_name_value # table_name
% count_no_nil_type # name
% int # type
% 1 # length
[ 0 ]
Query time: 0.161186
SELECT COUNT(type) FROM type_name_value WHERE type = 3 AND name < '~~foo'
% nilsgri.type_name_value # table_name
% count_no_nil_type # name
% int # type
% 8 # length
[ 10020569 ]
Query time: 0.341732
Number of rows with type mathcing.
real 0m18.372s
user 0m0.005s
sys 0m0.002s
I am using the latest version of monetDB from the web-site(the super source
tarball).
My machine is a X86-64 machine with a red-hat installed on it. When I insert
data into monetDB,
it always requires memory from the system. Then it will be crashed with
running out of memory.
Is there a way to set the limit for memory used by monetDB ? I've read the
docs on the web site
and can't get what I need from that.
Thank you.
--
------------------------------------------------------------
Kang Zhang
Computer Science Dept.
Shanghai Jiao Tong University
I am having some problems with the functions mapi_param*().
1. It seems mapi_param() really wants a char* cast to a char**. (See the
inner workings of mapi_param() vs. mapi_param_string().)
2. If mapi_param(), mapi_param_string() and mapi_param_type() get a
char*, the value used in the query does not change as the char* is
updated.
Using
mapi_bind_var( , , MAPI_INT, int*) and
mapi_bind_var( , , MAPI_VARCHAR, char**)
works as expected, while
mapi_param_type( , , MAPI_INT, MAPI_INT, int*) and
mapi_param_type( , , MAPI_VARCHAR, MAPI_VARCHAR, char**)
do not.
Is there something wrong with interface, or am I just using it wrong?
Here is how I have tried using the functions:
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <Mapi.h>
using namespace std;
Mapi dbh;
void check(MapiHdl hdl, int ret = MOK, bool fatal = true) {
if (ret != MOK || hdl == NULL || mapi_error(dbh)) {
if (hdl) {
mapi_explain_result(hdl,stderr);
} else {
mapi_explain(dbh,stderr);
}
if (fatal) assert(0);//exit(-1);
}
}
void paramInt(MapiHdl hdl, int fldnr, int* val) {
int ret = mapi_param_type(hdl, fldnr, MAPI_INT, MAPI_INT, val);
check(hdl, ret);
}
void paramStr(MapiHdl hdl, int fldnr, char** val) {
// int ret = mapi_param(hdl, fldnr, val);
/* Case 1:
* QUERY = INSERT INTO t VALUES(1, p`)
* ERROR = !SELECT: identifier 'p`' unknown
* Cast to (void*) in call to mapi_param_type() in mapi_param().
*/
int ret = mapi_param(hdl, fldnr, (char**)*val);
/* Case 2:
* Works, sort of.
* Seems it wants a (char*) really.
* But 'foo' is inserted the second time instead of bar.
*/
// int ret = mapi_param_string(hdl, fldnr, MAPI_VARCHAR, *val, NULL);
/* 'foo' twice.
*/
// int ret = mapi_param_string(hdl, fldnr, MAPI_VARCHAR, (char*)val,
NULL);
/* As Case 1.
*/
// int ret = mapi_param_type(hdl, fldnr, MAPI_VARCHAR, MAPI_VARCHAR, val);
/* As Case 1.
*/
// int ret = mapi_param_type(hdl, fldnr, MAPI_VARCHAR, MAPI_VARCHAR,
*val);
/* 'foo' twice.
*/
check(hdl, ret);
}
int main() {
dbh = mapi_connect("localhost", 50000, "nilsgri", "nilsgri",
"sql", "proto");
MapiHdl hdl = mapi_query(dbh, "DROP TABLE t");
check(hdl, MOK, false);
mapi_close_handle(hdl);
hdl = mapi_query(dbh, "CREATE TABLE t (a INT, b VARCHAR(255))");
check(hdl);
mapi_close_handle(hdl);
hdl = mapi_prepare(dbh, "INSERT INTO t VALUES(?, ?)");
check(hdl);
int a = 1;
paramInt(hdl, 0, &a);
char* b1 = strdup("'foo'");
char* b = b1;
paramStr(hdl, 1, &b);
int ret = mapi_execute(hdl);
check(hdl, ret);
a = 2;
char* b2 = strdup("'bar'");
b = b2;
// paramStr(hdl, 1, &b); // I would like not having to do this.
ret = mapi_execute(hdl);
check(hdl, ret);
mapi_close_handle(hdl);
hdl = mapi_quick_query(dbh, "SELECT * FROM t", stderr);
check(hdl, ret);
mapi_close_handle(hdl);
mapi_disconnect(dbh);
mapi_destroy(dbh);
free(b1);
free(b2);
}
Hi all,
I am using MD4 with XQuery + PF/Tijah module.
I have a problem that my server sometimes/frequently crashes. I could
not detect a trigger, as it occurs randomly and at any time.
It gives me this error:
26747 Segmentation fault Mserver --dbinit="module(pathfinder);"
but it gives me any number before it.
I am not sure where I can locate the MD error log (if it exists), and
how it is triggered.
Help?
Thanks,
junte
Our application includes multiple services that independently connect to the
database. When each service connects it reads the database metadata to
discover the column names, many services may start at about the same time.
We are getting the following error:
java.sql.SQLException: TypeException:user.s7172_2[262]:'io.stdout' undefined
in: _416:any
MALException:opt.remap:no implementation present
at
nl.cwi.monetdb.jdbc.MonetConnection$ResponseList.executeQuery(MonetConnection.java:2058)
at
nl.cwi.monetdb.jdbc.MonetConnection$ResponseList.processQuery(MonetConnection.java:1808)
at
nl.cwi.monetdb.jdbc.MonetStatement.internalExecute(MonetStatement.java:371)
at nl.cwi.monetdb.jdbc.MonetStatement.execute(MonetStatement.java:340)
at nl.cwi.monetdb.jdbc.MonetStatement.executeQuery(MonetStatement.java:404)
at
nl.cwi.monetdb.jdbc.MonetDatabaseMetaData.getTypeInfo(MonetDatabaseMetaData.java:2454)
# check status of database
/monetdb status
name state uptime crashes
BUSINESS running 1d 1h 40m 38s 3 2008-04-22 11:23:59
# try a query using mclient running on the same machine as the database
./mclient -dBUSINESS -umonetdb -Pmonetdb -lsql
sql>select * from tables;
!TypeException:user.s5301_3[130]:'io.stdout' undefined in: _193:any
sql>select * from schemas;
!TypeException:user.s5302_3[40]:'io.stdout' undefined in: _63:any
sql>select * from columns;
!TypeException:user.s5303_3[146]:'io.stdout' undefined in: _210:any
sql>\q;
When we stop and restart the merovingian demon, the problem goes away, but may
reoccur the next time we try and start multiple services in short succession.
Environment
MonetDB
MSG merovingian[12454]: starting database 'BUSINESS', up min/avg/max:
121/43165/92925, crash average: 0.00 0.30 0.10 (7-4=3)
MSG BUSINESS[12459]: executing '/monetdb/bin/mserver5' for database 'BUSINESS'
MSG BUSINESS[12459]: # MonetDB server v5.5.0, based on kernel v1.23.0
MSG BUSINESS[12459]: # Serving database 'BUSINESS'
MSG BUSINESS[12459]: # Compiled for x86_64-unknown-linux-gnu/64bit with 64bit
OIDs dynamically linked
JDBC Driver
1.7 (Canephora_p2 20080418 based on MCL v1.3)
hi,
first: about that bug, i'll defiantly file it.
second:
when i migrated this table from mssql to monetdb5 I intentionally inserted
it in that order (recid,ft,val). i used COPY INTO while supplying a file
that i prepared in advanced with such a sorting.
when i first decided to test monetdb5 i did it because i understood it
should perform better than row-oriented DBMSs (like mssql), especially in
CRM-like applications. so... i now understand that monetsdb5 is actually
being expected to perform BAD in that specific situation i described before
(in my previous message).
from my experience in building CRM applications, that kind of "left join"
according to unique number (recid) is very common in CRM applications. am i
missing something here??
Thanks again,
ewilde.
I was testing some queries, and for example this query has made the mserver5
process 100% CPU, which i could not kill and machine could not be rebooted.
running Fedora 8 64 bit.
query like this:
select col1, col2, sum(col3), sum(col4), sum(col5)
from thetable
group by col1, col2;
there are about 160 columns in the table.
col1 and col2 are "smallint"
col3, col4, col5 are "real"
number of distinct values:
in col1 is about 120
in col2 about 80
total number of rows in table is about 400M
I tested the query 2 times,
simpler queries work ok, like this:
select col1, count(*) from thetable group by col1;
this query has crashed mserver5 before, not sure if this already fixed:
select col1, count(1) from thetable group col1;
--
View this message in context: http://www.nabble.com/mserver5-process-locked-unkillable-tp16789362p1678936…
Sent from the monetdb-users mailing list archive at Nabble.com.
hello,
this is how my table - table1(int,int,varchar) - looks like:
recid | ft | val
---------------------------------------
1001 | 301 | a
1001 | 302 | b
1001 | 303 | c
1002 | 301 | d
1002 | 302 | e
1003 | 303 | f
1003 | 301 | g
1003 | 302 | h
1004 | 303 | i
in reality this table has a few millions of records.
for the following query:
select t1.val as val1 , t2.val as val2
from table1 t1
left join table1 t2 on t1.recid=t2.recid and t2.ft=303
where t1.recid=1002 and t1.ft=301
i get the expected result:
val1 | val2
---------------------
d | f
the execution time of this query is very slow: ~1.5 seconds - in comparison
to MSSql:0.016 seconds.
the kind of a query that i really need in my production application doesn't
have only one such a "left join...", but may have even 10. with 5 such
"left join..."s the query execution time is ~25 seconds, which is so far
slower than MSSql: 0.156
more info:
1. i run the queries from SQuirreL SQL Client. the MonetDB5 is installed\run
on the same machine.
the MSSql server was running on another machine in the same LAN.
2. the primary key of table1 is : recid,ft,val
3. at the beginning i had another problem, that i have got an empty result
when running a simple query like this:
select * from table1 t1 where t1.recid=1002 and t1.ft=301
i'm not sure about this, but i guess that this problem had been solved by
creating a view such as:
create view v1 as select recid,ft,val from table1
i have no clue whether these two problems related to each other, but i
mentioned it since it might give you a hint for solving the other problem.
thanks in advanced for any help,
ewilde.