Hi,
As suggested by MonetDB Solutions, we are asking about this topic here.
We are studying how MonetDB stores its database on disk.
So far we have been able to understand pretty much all the files (BAT,
journal, BBP.dir, heap, hash, imprints, and so on), and we have custom
tools to decode and dump them (experimental at this point).
There is one thing, however, that is unclear: we thought that the "delta
BATs", which are the BATs assigned to an SQL table to list the OIDs of
rows that were deleted (named "D_<schema>_<table>" internally), would
contain only *unique* values (since you can never delete the same row
twice). But on several databases that we are running, we found that some
of these BATs contain duplicates. A lot of duplicates, actually.
Especially in the D_sys__columns and D_sys__tables BATs (for the
sys._columns and sys._tables system tables, respectively). Some databases
do not have this "issue", even though they all have the same schema and
process the same kind of data as the others.
Can someone explain whether duplicates are expected in these BATs?
Here is an excerpt of D_sys__tables:
# hexdump -C 02/210.tail
00000000 2e 00 00 00 00 00 00 00 2f 00 00 00 00 00 00 00 |......../.......|
00000010 30 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 |0.......1.......|
00000020 32 00 00 00 00 00 00 00 33 00 00 00 00 00 00 00 |2.......3.......|
00000030 34 00 00 00 00 00 00 00 35 00 00 00 00 00 00 00 |4.......5.......|
00000040 36 00 00 00 00 00 00 00 37 00 00 00 00 00 00 00 |6.......7.......|
00000050 42 00 00 00 00 00 00 00 43 00 00 00 00 00 00 00 |B.......C.......|
00000060 44 00 00 00 00 00 00 00 45 00 00 00 00 00 00 00 |D.......E.......|
00000070 46 00 00 00 00 00 00 00 47 00 00 00 00 00 00 00 |F.......G.......|
00000080 48 00 00 00 00 00 00 00 49 00 00 00 00 00 00 00 |H.......I.......|
00000090 4a 00 00 00 00 00 00 00 4b 00 00 00 00 00 00 00 |J.......K.......|
000000a0 4c 00 00 00 00 00 00 00 4d 00 00 00 00 00 00 00 |L.......M.......|
000000b0 4e 00 00 00 00 00 00 00 4f 00 00 00 00 00 00 00 |N.......O.......|
000000c0 50 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00 |P.......P.......|
000000d0 51 00 00 00 00 00 00 00 50 00 00 00 00 00 00 00 |Q.......P.......|
000000e0 51 00 00 00 00 00 00 00 52 00 00 00 00 00 00 00 |Q.......R.......|
000000f0 50 00 00 00 00 00 00 00 51 00 00 00 00 00 00 00 |P.......Q.......|
00000100 52 00 00 00 00 00 00 00 53 00 00 00 00 00 00 00 |R.......S.......|
00000110 50 00 00 00 00 00 00 00 51 00 00 00 00 00 00 00 |P.......Q.......|
00000120 52 00 00 00 00 00 00 00 53 00 00 00 00 00 00 00 |R.......S.......|
...
You can clearly see duplicate OIDs (the first one being 0x50, at offset 0xc8).
Its entry in BBP.dir (split into sections):
136 32 tmp_210 tmpr_210 02/210 610523782 2 171807 0 0 171807 172032 0 0 0 0
void 0 1 1793 0 0 0 0 0 1000651 0 0 0
oid 8 0 1024 24 25 27 1 46 773603235 1374456 1376256 1
We are using: MonetDB 5 server v11.21.14 (64-bit, 64-bit oids, 128-bit integers).
--
Frédéric Jolliton
Sécuractive
Hi,
Last year, I was looking for a MAL definition of group_concat, and
Nik Schuiling gave me the definition below.
We compile MonetDB from source, and this definition worked on
MonetDB-11.19.9 (Oct2014-SP2). Since that version there have been a
number of newer releases of MonetDB, so I tried to upgrade our
system to the latest version (11.21.13). After installing the newer
version, I ran:
select "DataSetId", group_concat("FieldName") from datasetmarkermeta
group by "DataSetId";
from the mclient interface and got the following error message:
TypeException:kddart.subgroup_concat[5]:'algebra.uselect' undefined
in: TIDs:any := algebra.uselect(g:bat[:oid,:oid],grpid:oid);
MALException:kddart.subgroup_concat[0]:Error in cloned function
TypeException:user.s2_1[15]:'kddart.subgroup_concat' undefined in:
X_27:bat[:oid,:str] :=
kddart.subgroup_concat(X_26:bat[:oid,:str],X_16:bat[:oid,:oid],r1_16:bat[:oid,:oid],true:bit);
program contains errors
It says the algebra.uselect function is not defined. How do I fix this problem?
group_concat MAL definition start
-------------------------------------------
module kddart;
command group_concat(b:bat[:oid,:str]):str
address KDDARTgroup_concat
comment "Mal kddart";
function group_concat(b:bat[:oid, :str]):str;
value := "";
barrier (h, t) := iterator.new(b);
value := value + t;
value := value + "";
redo (h, t) := iterator.next(b);
exit (h, t);
return value;
end group_concat;
function subgroup_concat(b:bat[:oid,:any_1],g:bat[:oid,:oid],e:bat[:oid,:any_2],skip_nils:bit)
:bat[:oid,:str];
nw := aggr.count(e);
nl := calc.lng(nw);
bn := bat.new(:oid, :str, nl);
# check 'e' has some values - this is the list of group IDs in the head
# 'g' is the group to the data BAT head ID mapping
# 'b' is the data column BAT that we are aggregating over
barrier (grpid, t) := iterator.new(e);
# select GID from TID->GID map to get matching TIDs
TIDs := algebra.uselect(g,grpid);
# get DATA for matching TIDs
b_data := algebra.kintersect(b,TIDs);
# aggregate
grpval := group_concat(b_data);
# Store the result for this group
bat.insert(bn, grpid, grpval);
redo (grpid, t) := iterator.next(e);
exit (grpid, t);
return bn;
end subgroup_concat;
I use two versions of MonetDB: one is the PythonUDF build and the other is
plain MonetDB 11.21.13. The "bitand" function compiles and works fine in
plain MonetDB, but when I try to compile the PythonUDF build it gives me errors:
udf.c:14:1: error: no previous prototype for 'UDFbitand'
[-Werror=missing-prototypes]
UDFbitand(flt *ret, lng*a, lng *b)
^
cc1: all warnings being treated as errors
Makefile:876: recipe for target 'lib_udf_la-udf.lo' failed
Any ideas? Thanks in advance.
Before filing a bug report, I'd like to ask here whether this is
intentional/expected.
sql>create table t(a int);
operation successful (0.650ms)
sql>insert into t values (1),(0),(3),(2);
4 affected rows (0.638ms)
sql>
sql>create table sorted_t as select a from t order by a with data;
operation successful (0.807ms)
sql>
sql>select table,column,sorted from sys.storage() where "table"='sorted_t';
+----------+--------+--------+
| table | column | sorted |
+==========+========+========+
| sorted_t | a | false |
+----------+--------+--------+
I am pretty sure it used to be the case that the sorting used in a "CREATE
TABLE AS ... WITH DATA" would result in the tsorted property being true for
the (first) sorted column.
Now this isn't the case. Is that a bug or a feature?
Roberto
Hi,
Today, we encountered a serious memory issue with MonetDB.
*Data:* 10 tables, each with ~600 columns; all tables are EMPTY (no rows).
*Query:* one query involving 25 joins across all 10 tables.
*MonetDB version:* July2015-SP3
The above query was run, on the same MonetDB version with the same table
structure and all tables empty, on two different machines as follows.
*Scenario 1 - Query run on a MacBook Pro*
- RAM - 8 GB (with other processes like Chrome/Eclipse running in
parallel)
- Free disk space - ~100 GB
- MonetDB July2015-SP3 installed from the binary download
The query was run.
*Expected result* - an empty set returned in a few seconds.
*Actual result* - an empty set returned in ~5 minutes.
RAM - all 8 GB utilised.
Swap space used from disk - 10 GB.
This swap space was cleared automatically after some time (around 5 minutes).
*Scenario 2 - With a CentOS virtual machine*
- RAM - 128 GB
- Free disk space - ~650 GB
- MonetDB July2015-SP3 compiled from source code
The query was run.
*Expected result* - an empty set returned in a few seconds.
*Actual result* - the query rebooted this high-end CentOS machine.
RAM - all 128 GB utilised.
After the reboot, the swap memory was freed.
I am puzzled by the following questions:
1. Why should a query that runs on 10 EMPTY tables require gigabytes of
memory?
2. How did this query manage to run on a Mac with just 8 GB RAM, but
crashed a machine with 128 GB RAM?
Is this a known memory leak issue?
Any help much appreciated. Thanks in advance.
Regards,
Vijay.
How can I find all running processes and their statuses in MonetDB?
I found this:
https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/monitor
But I couldn't get it to work. I ran a query (which took some 20 seconds
to complete) in one window, connected to the same database from another
terminal, and ran "select * from sys.queue" several times during those 20
seconds, but all I could get was this:
sql>select * from sys.queue;
+-------+---------+----------------------------+----------+----------+---------+------------+--------------------------+
| qtag  | user    | started                    | estimate | progress | status  | tag        | query                    |
+=======+=========+============================+==========+==========+=========+============+==========================+
| 27339 | monetdb | 2015-08-13 05:35:40.000000 | null     | null     | running | 12532134@0 | select * from sys.queue; |
+-------+---------+----------------------------+----------+----------+---------+------------+--------------------------+
I am using the Oct2014 version of MonetDB on CentOS.
Is there a way to find all running processes and their statuses in
MonetDB?
Thanks & Regards,
Vijayakrishna.P.
Mobile : (+91) 9500402305.
Hi,
I have been using Perl DBD::monetdb from the MonetDB source for close to a
year now. Recently, I discovered that an SQL SELECT of one or two fields
over a large number of records (for example, 100,000 records) is far too
slow given how fast the MonetDB engine is. The same SQL SELECT is much
faster in MySQL, thanks to the C implementation of DBD::mysql.
The MonetDB connector for Python is still much faster than Perl
DBD::monetdb, because it allows the user to tell the connector the maximum
number of records needed; the connector then issues the Xexport command
straight away, and only once.
In Perl, DBD::monetdb calls Mapi.pm via getReply, which issues the Xexport
command many times, depending on the number of records in the SELECT
result. Each time, it retrieves only 100 records, so for 100,000 records
it needs to call Xexport 1,000 times.
I wrote my own patch to make getReply call Xexport only once, for all the
remaining records after the initial request. With the patch, the time it
takes to select 2 fields for 101,024 records drops from more than 24
seconds to under 4 seconds. My patch is:
<pre>
sub getReply {
my ($self)= @_;
if ($self->{active} == 0) {
return $self->getBlock();
} elsif ($self->{next} < $self->{replysize} + $self->{skip}) {
return $self->getRow();
    } elsif ($self->{offset} + $self->{replysize} < $self->{count}) {
# get next slice
my $rs = $self->{replysize};
my $offset = $self->{offset} + $rs;
my $therest = $self->{count} - $self->{replysize};
$self->putblock("Xexport $self->{id} $offset $therest"); # FAST
#$self->putblock("Xexport $self->{id} $offset $rs"); # SLOW
return $self->getBlock();
} else {
# close large results, but only send on next query
if ($self->{id} > 0 && $self->{count} != $self->{replysize}) {
push @{$self->{piggyback}}, "Xclose $self->{id}";
$self->{skip_in}++;
}
$self->{active} = 0;
}
return $self->{active};
}
</pre>
I hope this could help other people.
Regards,
Puthick
I compiled the MonetDB-Python version on top of a regular MonetDB source
compilation, and after that I get this error:
2016-03-30 20:14:11 MSG merovingian[23154]: starting database 'db', up
min/avg/max: 0s/0s/0s, crash average: 0.00 0.00 0.00 (0-0=0)
2016-03-30 20:14:11 MSG db[23176]: arguments: /usr/local/bin/mserver5
--dbpath=/home/shmagi/db/db --set
merovingian_uri=mapi:monetdb://infdbserv02:50000/db --set mapi_open=false
--set mapi_port=0 --set mapi_usock=/home/shmagi/db/db/.mapi.sock --set
monet_vault_key=/home/shmagi/db/db/.vaultkey --set gdk_nr_threads=24 --set
max_clients=64 --set sql_optimizer=default_pipe --set monet_daemon=yes
2016-03-30 20:14:11 MSG merovingian[23154]: database 'db' (23176) has
crashed (dumped core)
2016-03-30 20:14:16 ERR merovingian[23154]: client error: database 'db'
appears to shut itself down after starting, check monetdbd's logfile for
possible hints
I tried to run ./configure, make clean, make, make install from the pure
MonetDB source (without Python), but I still get these errors. I tried
using a prefix to install into another directory, and tried to manually
delete all the files I could find, but I still get the same error. Any
ideas how to fix this problem?
I wrote a function bitand. When I do "select bitand(1111,1010);" it
returns "0" instead of "0.5". My function:
--UDF.c--
str
UDFbitand(long long int a, long long b)
{
long long int weirdAnd(unsigned int a, unsigned int b) {
long long int result = 0;
int coef = 1;
while (a || b) {
result += ((a % 10) && (b % 10)) * coef;
coef *= 10;
a /= 10;
b /= 10;
}
return result;
}
long long int temp = weirdAnd(a, b);
long long int temp2 = weirdAnd(a, b);
int length;
int count;
while (temp != 0)
{
temp/=10;
length++;
}
while (temp2 != 0) {
if (temp2 % 10 == 1)
count++;
temp2 /= 10;
}
float final = count/(float)length;
return MAL_SUCCEED;
}
--UDF.mal--
command bitand(a:lng,b:lng):flt
address UDFbitand
comment "bitand";
--80_UDF.sql--
create function bitand(a BIGINT,b BIGINT)
returns REAL external name udf.bitand;
When I test my C code in NetBeans it works as expected. Any ideas? Thanks
in advance.