Hi Stefan,
Thanks for your reply!
We have run the query a few times with different size of data. There we
used 16G RAM(actually 13.5G was used), and find the size of 10G's data
is the critical point that can run the query. All of the data files'
size are listed below, each file name is a table name(there are only a
few tables are refered -- store_sales, date_dim, item, customer,
catalog_sales, web_sales):
7.4K call_center.dat
1.6M catalog_page.dat
212M catalog_returns.dat
2.9G catalog_sales.dat
27M customer_address.dat
64M customer.dat
77M customer_demographics.dat
9.9M date_dim.dat
77B dbgen_version.dat
149K household_demographics.dat
328B income_band.dat
2.6G inventory.dat
28M item.dat
61K promotion.dat
1.7K reason.dat
1.1K ship_mode.dat
27K store.dat
323M store_returns.dat
3.8G store_sales.dat
4.9M time_dim.dat
1.2K warehouse.dat
19K web_page.dat
98M web_returns.dat
1.5G web_sales.dat
12K web_site.dat
So we guess that the monetdb has no memory management?
For the output of `mserver5 --version` is:
MonetDB 5 server v11.27.13 "Jul2017-SP4" (64-bit, 128-bit integers)
Copyright (c) 1993 - July 2008 CWI
Copyright (c) August 2008 - 2018 MonetDB B.V., all rights reserved
Visit https://www.monetdb.org/ for further information
Found 17.0GiB available memory, 40 available cpu cores
Libraries:
libpcre: 8.38 2015-11-23 (compiled with 8.38)
openssl: OpenSSL 1.0.2g 1 Mar 2016 (compiled with OpenSSL 1.0.2g 1
Mar 2016)
libxml2: 2.9.3 (compiled with 2.9.3)
Compiled by: monetdb(a)MonetDB-0.0 (x86_64-pc-linux-gnu)
Compilation: gcc -g -O2
Linking : /usr/bin/ld -m elf_x86_64
And the size of processes is not limited.
To let you reproduce the problem conveniently, I'll provide more details
here:
you can get tpc-ds from its website(we use version 2.6.0).
Install the tpc-ds, access the directory v2.6.0/tools and run `./dsdgen
-scale 10 -dir /home/monetdb/tpc-ds_test_data10G` to generate the data.
When data has been generated, using the script /expe.sh/ to create
tables and load the data. The query script is 123.tpcds.23.sql.(The
syntaxs of other queries that tpc-ds generates is not suitable for
monetdb all, we don't modify them all when the problem occurred).
One more question, I can't get your reply email, so I don't know how to
reply you, for this case, I could only send a new mail echo time.
Thanks!
Regards,
Rancho
For using the MonetDB-rmath extension, I was interested in using
table-returning functions to return multiple values. Is there a fast and
elegant way to combine the results with the source table in SQL? To make
this concrete, consider some test data and a function to calculate a 95%
confidence interval from a mean and standard error:
drop table test;
create table test as select cast(value as double) as value from
generate_series(0,5);
drop function cis(double,double);
create function cis(mu double, se double) returns table(lci double, uci
double)
begin return select mu-1.96*se, mu+1.96*se;
end;
We can then call the table-returning function:
select * from cis((select value, 1.0 from test));
If we want to re-join the results with the source table, we could use
row_number() or use lateral:
-- Using row_number:
select * from
(select *, row_number() over() as rid from test) as t1
natural join
(select *, row_number() over() as rid from cis((select value,1.0 from
test))) as t2;
-- Using lateral:
select * from test, lateral cis(test.value, cast(1.0 as double)) as t2;
I believe that the latter will lead to a slow loop, even when BAT
functions are available. The table-returning function can be used in the
select statement only for a scalar expression:
select cis(0,1).*; -- ok
select cis(0,1).* from test; -- Error: no such operator 'cis'
select cis(value,1).* from test; -- Error: identifier 'value' unknown
Again, is there a fast and elegant way to combine the source table with
a table-returning function in SQL?
Kindly, Mark.
The JIT functions by Mark Raasveldt provide a powerful alternative
approach to bring C and C++ code into MonetDB. I have provided an
alternative implementation of the MonetDB-rmath extension using JIT
functions.
The JIT implementation is considerably shorter and simpler, and slightly
more general - where the code can deal with function arguments that are
any combination of scalars and BATs. However, there are indications that
the functions are considerably slower. As a minor point, I needed to
rename the functions (e.g. using r_ as a prefix) -- otherwise the
function names would collide with the function signatures in Rmath.h.
To use the JIT-based rmath extension:
git clone https://github.com/mclements/MonetDB-rmath.git
git checkout jit
make
sudo make install
monetdb create test
monetdb release test
monetdb set embedc=yes test
mclient -d test -s "select r_pnorm(1.96,0,1);"
As a challenge for JIT-based extensions, new C functions (e.g. new
functions that call Rmath.h, and for which we want to call the new
functions in MonetDB) will need a header file that is available to
MonetDB. A fragile (non-portable?) solution to find the MonetDB include
folder is to use:
MONETDBINCLUDE = $(patsubst -I%,%,$(firstword $(shell pkg-config
--cflags-only-I monetdb5)))
in the Makefile. This assumes that the MonetDB include folder is the
first include folder in CFLAGS. The extension-specific header for any
new functions can then be copied to that folder:
cp rmath_ext.h $(MONETDBINCLUDE)
I hope that this is useful.
Sincerely, Mark.
Hi all:
I am receiving an error "Failed to import table ... Leftover data ..."
I am using a simple BASH script and the mclient tool to import my data from hundreds of CSV files.
The fields are pipe | separated, text values are surrounded with double-quotes, and rows end with LF (\n).
This is the command I am using to process the files -
for csvfile in `ls | grep -i 2018.*\.csv$`; do mclient -d mydatabase -s "COPY INTO table FROM '/path/to/csv/files/$csvfile' USING DELIMITERS '|', '\n', '\"'"; done
I have a .monetdb file setup in my home directory so as not to need the username and password.
Thanks kindly,
Adam Doherty
Hello,
How do we make MonetDB release memory after usage.
We are seeing high cached memory on the DB Servers and We also see queries failing with the below error - Memory Allocation Failure : Total Memory - 1123348 Physical Memory - 1123348 Virtual Memory - 2667052
How do we overcome this issue?
Thank You,
Gautham
Hello everyone,
I believe I've encountered a bug with the remote table functionality. Here are the steps to replicate it. Tested with MonetDB 11.29.3, both on Ubuntu 14.04.5 LTS and Ubuntu 16.04.4 LTS. The problem persists even after restarting the DB farms and the MonetDB daemons.
On node 1
Create a normal table, insert some data and check its contents:
create table testtable(id int not null, name text, valid boolean);
insert into testtable (id, name) values (1, '1');
insert into testtable (id, name) values (2, '2');
insert into testtable (id) values (3);
select * from testtable;
+------+------+-------+
| id | name | valid |
+======+======+=======+
| 1 | 1 | null |
| 2 | 2 | null |
| 3 | null | null |
+------+------+-------+
3 tuples (6.446ms)
On node 2
Create a remote table and check its contents to make sure it works:
create remote table testtable(id int not null, name text, valid boolean) on 'mapi:monetdb://node1:50000/dbfarm';
select * from testtable;
+------+------+-------+
| id | name | valid |
+======+======+=======+
| 1 | 1 | null |
| 2 | 2 | null |
| 3 | null | null |
+------+------+-------+
3 tuples (6.446ms)
Then try to select anything with a WHERE clause:
select * from testtable where name is null;
+------+------+-------+
| id | name | valid |
+======+======+=======+
| 3 | null | null |
+------+------+-------+
1 tuple (5.186ms)
It seemingly works fine, but now when you try to select anything else, you get the same result:
select * from testtable;
+------+------+-------+
| id | name | valid |
+======+======+=======+
| 3 | null | null |
+------+------+-------+
1 tuple (4.466ms)
If you log out from mclient and then back in, select * works fine again, so there's no data loss. However, even if you try the opposite of the previous query (is not null instead of is null), you get the same result:
select * from testtable where name is not null;
+------+------+-------+
| id | name | valid |
+======+======+=======+
| 3 | null | null |
+------+------+-------+
1 tuple (5.358ms)
Can someone else please try this to confirm it's a bug and not something on my end?
Thanks in advance.
Best regards,
Marcell
Dear all,
A first version for MonetDB-rmath is now available. It includes
approximately 90% of the functions in Rmath.h, including
[rpdq](norm|unif|gamma|beta|lnorm|chisq|nchisq|f|t|binom|cauchy|exp|geom|hyper|
nbinom|pois|weibull|logis|wilcox|signrank) and another 40 functions. The
current implementation uses m4 for code generation and requires the
r-mathlib package under Debian.
One can clone the GitHub repository and install using:
git clone https://github.com/mclements/MonetDB-rmath.git
cd MonetDB-rmath
make
sudo make install
You will then need to use a fresh database. An example script is:
-- calculate the quantile and cumulative distribution function for a
unit normal
select qnorm(0.025,0,1), pnorm(-1.96,0,1);
-- calculate E(Z^2) for Z~Normal(0,1)
select sum(R_pow(value,2)*0.01*dnorm(value,0,1)) from
sys.generate_series(-5.0,5.0,0.01);
-- random numbers
select set_seed(1,2);
select rpois(value*0+100) from sys.generate_series(1,10); -- ok
select rpois(100) from sys.generate_series(1,10); -- repeats the values
10 times:-<
All of the functions are currently scalars. Next steps include adding
BAT functions, including some of the constants at the beginning of
Rmath.h, and adding some statistical tests. As an example, I have
included poissonci(y,boundary) for an exact Poisson confidence interval
for count y for a specific boundary (1=left, 2=right) with default
confidence level of 95%:
select poissonci(10,1), poissonci(10,2);
The equivalent code in R would be poisson.test(10). For confidence
intervals, it would be nice to have a type that is a pair of doubles.
Sincerely, Mark.
I am trying to adapt MonetDB-gsl to use the rmath library. My code is at:
https://github.com/mclements/MonetDB-rmath
Under Ubuntu 16.04 LTS, I used:
sudo apt install r-mathlib
git clone https://github.com/mclements/MonetDB-rmath
cd MonetDB-rmath
make
sudo make install
// start up MonetDB server with a test database
mclient -d test -s "select sys.pchisq(20.0,5.0);"
This compiles but when I try "select sys.pchisq(20.0,5.0);" I get the error:
SELECT: no such binary operator 'pchisq(decimal,decimal)'
Any guidance would be appreciated.
(You may ask why one would want to do this when one could use embedded R
- I wanted to see how fast and easy it was to use the rmath library.)
Kindly, Mark.