Hi all –
I have always used the COPY BINARY INTO … commands to load my 2.0 billion row genetic data into a MonetDB table. With 135 columns, it has been blindingly fast.
Last week I moved from the Jun2016-SP2 release to Dec2016-SP2. My binary loads are taking WAY longer. I killed one after 3 hours (via "call sys.stop(pid)" so it could clean up properly). I then started the load again, thinking perhaps the problem was related to the new columns I was adding.
I have since dropped the table and remade it using the same data and scripts that completed in just over 3 minutes in February on the Jun2016-SP2 release. It is really chugging along – I'm up to 30 minutes and counting. I don't have access to the SQL log files, and merovingian.log shows nothing.
I do notice that previously the binary files, once loaded, were removed from the loading directory. This does not happen now. Were these files previously "moved" and are they now copied?
Has anyone seen this performance issue with the Dec2016-SP2 COPY BINARY INTO … commands?
Thanks - Lynn
Allocating "only" about 2.8 GB does not ring alarms that the plan might be degenerate, assuming your data is not tiny.
Could you share: OS, full query, data sizes, free disk space at time of failure, and mserver5 process size at time of failure?
Thanks, Stefan
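[Editorial note: the "about 2.8 GB" figure above is simply the HEAPextend allocation size from the error quoted below, converted from bytes. A quick check in Python, for illustration only:]

```python
# HEAPextend size from the quoted error, converted to decimal GB and binary GiB
size_bytes = 2890530816
print(round(size_bytes / 10**9, 2))   # 2.89 (decimal GB)
print(round(size_bytes / 2**30, 2))   # 2.69 (binary GiB)
```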
-------- Original message --------
From: Anthony Damico <ajdamico(a)gmail.com>
Date: 11/22/17 12:13 (GMT+01:00)
To: Communication channel for MonetDB users <users-list(a)monetdb.org>
Subject: algebra.join heapextend crash
Hi, I'm using MonetDBLite-R 0.5.0 on a confidential dataset. My syntax worked on MonetDBLite 0.3.1 on the same machine:
ftruncate: Invalid argument
Error in .local(conn, statement, ...) :
Unable to execute statement 'CREATE TABLE ccaed123_b_svcmon_xwalk AS
( SELECT
a.enrolid ,
a.yr ,
a.dataty...'.
Server says 'MALException:algebra.join:GDK reported error. !ERROR: GDKextendf: c
!ERROR: HEAPextend: failed to extend to 2890530816 for 02\57\25766.tail: GDKmremap() failed '.
It's a local drive, the hard disk isn't close to full, and the machine has 700 GB of RAM, but the table it's trying to create might be bigger than that. I don't know if this is related to https://www.monetdb.org/bugzilla/show_bug.cgi?id=3791
thanks
Hi all,
can you help me with the below query?
Is there another way to return query results using the JDBC interface (without using sql.resultSet)?
Thanks in advance!
Best regards, Dennis
-------- Original message --------
From: Dennis Fassl <dennis.fassl(a)uni-konstanz.de>
Date: 25.11.17 16:55 (GMT+01:00)
To: users-list(a)monetdb.org
Subject: MAL query "sql.resultSet"
Hi all,
I am trying to execute the MAL query below.
But the command "sql.resultSet" is terminating mclient. When using the
JDBC interface I get "SQLException: Connection to server lost! (mserver
still alive?)".
Do you have an idea why this is happening?
Thanks and regards,
Dennis
sql.init();
table_names := bat.new(nil:str);
bat.append(table_names, "t0");
bat.append(table_names, "t1");
table_attrs := bat.new(nil:str);
bat.append(table_attrs, "int_col");
bat.append(table_attrs, "str_col");
table_types := bat.new(nil:str);
bat.append(table_types, "int");
bat.append(table_types, "str");
table_lens := bat.new(nil:int);
bat.append(table_lens, 1);
bat.append(table_lens, 1);
table_scales := bat.new(nil:int);
bat.append(table_scales, 0);
bat.append(table_scales, 0);
ints := bat.new(nil:int);
bat.append(ints, 42);
bat.append(ints, 23);
strs := bat.new(nil:str);
bat.append(strs, "foo");
bat.append(strs, "bar");
sql.resultSet(table_names, table_attrs, table_types, table_lens,
table_scales, ints, strs);
_______________________________________________
users-list mailing list
users-list(a)monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
All,
Maybe a dumb question, but I'm curious to know: what is the maximum number of MonetDB instances anyone has set up on a single server? And what level of concurrency has been tested?
Regards,
Sreejith
All,
Are there any data compression methods available in MonetDB? From what I read, by default MonetDB doesn't compress data, to avoid the overhead of decompression when loading the files into memory-mapped files. Is that true? If there is a compression mechanism (like QUERY HIGH compression in Oracle), what is it?
Regards,
Sreejith
Hi,
Following up on this thread (
https://www.monetdb.org/pipermail/users-list/2016-April/009107.html) from
last year, the issue was that the pcre-based regex replacement did not
implement back references in the replacement string.
Needing this feature, I have so far used the pcre implementation via an R function. It works well, but using R just for this is a bit odd, especially because R-core has a pretty dumb dependency on the whole TeX distribution, and building MonetDB Docker images that need to include R and TeX just because of regex isn't a neat option.
I have also tried both the re and regex packages via Python, but this is very slow compared to pcre.
So I finally decided to put a few hours into the MonetDB implementation and added support for back references. They can be written with both \1 and $1 syntax. Back-reference 0 is the whole match; out-of-bound back-references expand to empty strings.
While at it:
- pcre.replace replaces all matches, so I made a variant pcre.replacefirst that replaces only the first match.
- the BAT implementation was ignored because of a missing "module batpcre" in pcre.mal; it always fell back to manifold on the string version. Fixed that.
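[Editorial note: for readers unfamiliar with these semantics, Python's re module (used here purely as an illustration, not part of the patch) behaves the same way:]

```python
import re

s = "mbr_overlap"
# \1, \2 in the replacement string refer to capture groups, as in the patch
print(re.sub(r"([a-z]+)_([a-z]+)", r"[\1]-[\2]", s))  # [mbr]-[overlap]
# count=1 mirrors the pcre.replacefirst variant: only the first match
print(re.sub(r"([a-z])", r"[\1]", s, count=1))        # [m]br_overlap
```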
Also, I'm not sure why no SQL functions seem to be declared for this.
I use the following:
CREATE FUNCTION pcre_replace(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace";
CREATE FUNCTION pcre_replacefirst(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace_first";
Perhaps they could come by default?
Please feel free to use and modify the attached patch (against Jul2017) if you think it's useful. I've tested it quite a bit; no issues found.
Here are some examples:
sql>select name, pcre_replace(name, '([a-z]+)_([a-z]+)', '[\\1]-[\\2]', '') as replaced from sys.functions limit 2;
+-------------+-----------------+
| name        | replaced        |
+=============+=================+
| mbr_overlap | [mbr]-[overlap] |
| mbr_overlap | [mbr]-[overlap] |
+-------------+-----------------+
2 tuples (4.650ms)
sql>select name, pcre_replacefirst(name, '([a-z])', '[\\1]', '') as replaced from sys.functions limit 2;
+-------------+---------------+
| name        | replaced      |
+=============+===============+
| mbr_overlap | [m]br_overlap |
| mbr_overlap | [m]br_overlap |
+-------------+---------------+
2 tuples (4.728ms)
sql>select name, pcre_replace(name, '([a-z])', '[\\1]', '') as replaced from sys.functions limit 2;
+-------------+---------------------------------+
| name        | replaced                        |
+=============+=================================+
| mbr_overlap | [m][b][r]_[o][v][e][r][l][a][p] |
| mbr_overlap | [m][b][r]_[o][v][e][r][l][a][p] |
+-------------+---------------------------------+
2 tuples (10.360ms)
Hi,
I created a table with 9 columns and passed 6 of its columns to a UDF. This UDF is fully vectorizable. However, we found that only a single thread was used during the execution. To understand the problem, we created two simple functions, python_min_map_3 and python_min_map_4, which take 3 and 4 parameters respectively. The MonetDB Linux version we used was Jul2017.
If PYTHON_MAP parallelism kicks in, the function returns the result of each segment processed by the parallel code, so the number of results depends on how many threads are used. Otherwise, a single value is returned (e.g. because numpy.min reduces the whole column).
The result is
- python_min_map_3 returns a couple of numbers
- python_min_map_4 returns a single number
Even when we increased the number of arguments, a single number was always returned. Is there a restriction that PYTHON_MAP functions are only parallelized when they have no more than 3 arguments? Here is our example code below.
CREATE FUNCTION python_min_map_3(x0 FLOAT, x1 FLOAT, x2 FLOAT)
RETURNS FLOAT LANGUAGE PYTHON_MAP {
return numpy.min(x0)
};
select python_min_map_3(x0, x1, x2) from table_0;
CREATE FUNCTION python_min_map_4(x0 FLOAT, x1 FLOAT, x2 FLOAT, x3 FLOAT)
RETURNS FLOAT LANGUAGE PYTHON_MAP {
return numpy.min(x0)
};
select python_min_map_4(x0, x1, x2, x3) from table_0;
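[Editorial note: the behavior described above can be sketched in plain Python outside MonetDB (data and chunk sizes hypothetical): a map-style function runs once per thread on a segment of the column, so a per-segment min yields one value per segment, and a global minimum needs a final combine step.]

```python
# Hypothetical stand-in for column x0, split across 4 "threads"
data = list(range(100))
chunks = [data[i:i + 25] for i in range(0, 100, 25)]
per_chunk = [min(c) for c in chunks]   # what a map-style body returns per segment
print(per_chunk)        # [0, 25, 50, 75] -- one value per segment
print(min(per_chunk))   # 0 -- combine step recovers the global minimum
```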
NumPy in MonetDB reference:
https://www.monetdb.org/blog/embedded-pythonnumpy-monetdb
Best regards,
Hanfeng Chen