Mark,
I think there is a bug in the implementation of the PYTHON_MAP aggregate functions.
The return value of a PYTHON_MAP aggregate can change depending on which other functions appear in the query.
The return value of my function weighted_percentile_0 (definition below) changes when I include median() in the query: the results are incorrect without median(), but correct with it. In this test case the weights (v2) are all 1 and the values (v1) are integers sampled uniformly from [1, 10]. There are ~1000 rows per fctr, so my function should reduce to min(v1) here.
Is some information bleeding between threads? Should I submit a bug report? This is on the default branch.
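For what it's worth, running the same NumPy logic as a plain Python function (outside MonetDB) does reduce to min(v1) when the weights are all 1, which suggests the function body itself is correct. The data below is a stand-in for the mini table (~1000 uniform integers in [1, 10], unit weights; variable names are mine):

```python
import numpy as np

def weighted_percentile_0(a, w):
    # Same body as the PYTHON_MAP aggregate defined below.
    q = np.array([0]) / 100.0
    idx = np.argsort(a)
    a_sort = a[idx]
    w_sort = w[idx]
    # Cumulative sum of weights
    ecdf = np.cumsum(w_sort)
    # Index positions associated with the percentiles
    p = q * (w_sort.sum() - 1)
    # Bounding indices (low and high)
    idx_low = np.searchsorted(ecdf, p, side='right')
    idx_high = np.searchsorted(ecdf, p + 1, side='right')
    idx_high[idx_high > ecdf.size - 1] = ecdf.size - 1
    # Interpolation weights
    weights_high = p - np.floor(p)
    weights_low = 1.0 - weights_high
    # Weighted combination of the bounding values
    x1 = np.take(a_sort, idx_low) * weights_low
    x2 = np.take(a_sort, idx_high) * weights_high
    wp = np.add(x1, x2)
    return wp[0]

# Stand-in data: ~1000 uniform integers in [1, 10], unit weights
rng = np.random.default_rng(0)
v1 = rng.integers(1, 11, size=1000).astype(float)
v2 = np.ones(1000)
print(weighted_percentile_0(v1, v2))  # matches min(v1)
```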
Thanks,
Dave
sql>select fctr,weighted_percentile_0(v1,v2),min(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+
| fctr | L1 | L2 |
+======+==========================+======+
| 1 | 3 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 1 |
| 4 | 1 | 1 |
| 5 | 3 | 1 |
| 6 | 3 | 1 |
| 7 | 2 | 1 |
| 8 | 1 | 1 |
| 9 | 2 | 1 |
+------+--------------------------+------+
9 tuples (126.928ms)
sql>select fctr,weighted_percentile_0(v1,v2),min(v1),median(v1) from mini where fctr < 10 group by fctr order by fctr;
+------+--------------------------+------+------+
| fctr | L1 | L2 | L3 |
+======+==========================+======+======+
| 1 | 1 | 1 | 5 |
| 2 | 1 | 1 | 5 |
| 3 | 1 | 1 | 5 |
| 4 | 1 | 1 | 5 |
| 5 | 1 | 1 | 5 |
| 6 | 1 | 1 | 5 |
| 7 | 1 | 1 | 5 |
| 8 | 1 | 1 | 5 |
| 9 | 1 | 1 | 5 |
+------+--------------------------+------+------+
9 tuples (519.195ms)
sql>
CREATE AGGREGATE weighted_percentile_0(a DOUBLE, w DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON_MAP {
import numpy as np
# Standardize and sort based on values in a
q = np.array([0]) / 100.0
idx = np.argsort(a)
a_sort = a[idx]
w_sort = w[idx]
# Get the cumulative sum of weights
ecdf = np.cumsum(w_sort)
# Find the percentile index positions associated with the percentiles
p = q * (w_sort.sum() - 1)
# Find the bounding indices (both low and high)
idx_low = np.searchsorted(ecdf, p, side='right')
idx_high = np.searchsorted(ecdf, p + 1, side='right')
idx_high[idx_high > ecdf.size - 1] = ecdf.size - 1
# Calculate the weights
weights_high = p - np.floor(p)
weights_low = 1.0 - weights_high
# Extract the low/high indexes and multiply by the corresponding weights
x1 = np.take(a_sort, idx_low) * weights_low
x2 = np.take(a_sort, idx_high) * weights_high
wp = np.add(x1,x2)
    return wp[0]
};
Hi, MonetDB does not support SUM() OVER window functions. Does anyone
have a smart alternative to implement this? Thanks
SELECT
batch_id,
job_count,
SUM(job_count) OVER (ORDER BY duration) as cumjobs,
SUM(job_count) over () as totjobs,
duration
FROM
test_data ;
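A common workaround is to rewrite each window function as a correlated subquery (or a self-join plus GROUP BY). Here is a sketch, using Python's sqlite3 purely as a stand-in engine to show the rewrite, with the table and column names from the query above; note that rows with equal duration are counted together, which can differ from a ROWS-based running sum:

```python
import sqlite3

# In-memory stand-in for the test_data table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_data (batch_id INT, job_count INT, duration INT)")
con.executemany("INSERT INTO test_data VALUES (?, ?, ?)",
                [(1, 10, 5), (2, 20, 3), (3, 30, 8)])

# SUM() OVER (ORDER BY duration) becomes a correlated subquery over
# rows with duration <= the current row's; SUM() OVER () becomes a
# plain scalar subquery over the whole table.
rows = con.execute("""
    SELECT a.batch_id,
           a.job_count,
           (SELECT SUM(b.job_count) FROM test_data b
             WHERE b.duration <= a.duration)          AS cumjobs,
           (SELECT SUM(c.job_count) FROM test_data c) AS totjobs,
           a.duration
    FROM test_data a
    ORDER BY a.duration
""").fetchall()
print(rows)  # [(2, 20, 20, 60, 3), (1, 10, 30, 60, 5), (3, 30, 60, 60, 8)]
```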
Hello,
In my db I have a table with ~300 million rows on which I have to run ~1k aggregation queries periodically.
I have noticed that if I delete even a small number of rows from the table, performance drops sharply compared to when no delete has been made.
Specifically, with no deletes the 1k queries complete in about 2 hours; after deleting a few records, the same task takes about 10-12 hours.
Inserting new rows into the table does not trigger the issue. Moreover, if I dump and restore the table after the delete, everything is back to normal and the queries run fast again.
So the issue seems to be related to the fact that MonetDB does not actually delete the records from the BAT file, right? Are you aware of this issue?
Is there some kind of vacuum operation available? If not, what is the fastest way to dump (or copy) and restore the table, or is there any other trick that may solve the issue?
Any suggestion would be much appreciated,
Thank you
Stefano
Hi:
I have a MonetDB 5 server v11.23.7 "Jun2016-SP1" instance hosting a
table with over 5600 columns (for the US General Social Survey).
If we run a simple query like
SELECT * FROM gss7214_v4 limit 2
we get the first two records as expected.
However, the same query with all the column names explicitly stated like
select
"year","id","wrkstat","hrs1","hrs2","evwork","occ","prestige","wrkslf",<...thousands
of additional columns - see attached...> from "gss7214_v4" WHERE
"year"=1972 LIMIT 2 OFFSET 0;
seems to kill the server. The process keeps running but becomes
non-responsive.
This is on an AWS EC2 t2.small instance with only ~2 GB of memory.
Maybe more memory would solve the issue (?), but in any case I am
concerned that the server hangs without any warning or error message in
the log. Any potential workaround? Suggestions most appreciated.
best
*P
Hi,
so I have read several times on the mailing list that it is indeed possible
to run several MonetDB instances on the same server. I tried this myself
with the following configuration:
System: Linux ... 3.13.0-79-generic #123-Ubuntu SMP Fri Feb 19 14:27:58
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
MonetDB version: mercurial revision 324165f1d055 from branch default
(also tested with the topmost version)
Configure script parameters: --disable-geom --disable-fits
--disable-netcdf --disable-lidar --disable-shp --disable-gsl
--disable-rintegration --disable-pyintegration --disable-jdbc
--prefix=/home/bruder/monetdb-root --disable-assert --disable-testing
--enable-optimize --program-suffix='-dsh'
1. Another version is running with a different user on the default port
50000
2. With my user, I set up a local db-farm, using port 49999. (A prefix
is used on binaries, to prevent any conflicts with the existing version.)
Creating a new database with *monetdbd* works, but once I try to start
it with *monetdb*, it gives me the following error message:
starting database 'test'... FAILED
start: starting 'test' failed: database 'test' appears to shut
itself down after starting, check monetdbd's logfile for possible hints
Another thing I tried was starting mserver5 by hand: mserver5-dsh
--dbpath=/home/bruder/monetdb-dbfarm/test --set
merovingian_uri=mapi:monetdb://dbiron2:49999/test --set mapi_open=false
--set mapi_port=0 --set
mapi_usock=/home/bruder/monetdb-dbfarm/test/.mapi.sock --set
monet_vault_key=/home/bruder/monetdb-dbfarm/test/.vaultkey --set
gdk_nr_threads=16 --set max_clients=64 --set sql_optimizer=default_pipe
--set monet_daemon=yes
This time it won't accept my password (I assume the default password is
*monetdb*): mclient-dsh -p49999 -dtest -umonetdb
InvalidCredentialsException:checkCredentials:invalid credentials
for user 'monetdb'
I added the log file and the properties of the db-farm.
Regards,
Moritz
Is there a place to make suggestions on improvement for monetdb?
Three things that probably seem minor compared to the goals of monetdb, but
which would be very helpful in my opinion:
- better COPY to CSV, a la Postgres, with keywords like "csv" and "header"
- Ctrl-C to cancel a query without killing the mclient session
- history kept and reused between mclient sessions
--
http://yves.zioup.com
gpg: 4096R/32B0F416
Hi all,
I'm working on a utility to bulk load data into MonetDB using the binary
bulk loading options. At the moment I'm running into the issue whereby I'm
unable to insert real NULL values in varchar columns. Only a literal "null"
is inserted, which obviously fails the 'WHERE col IS NULL' condition. Can
anyone tell me what value is used to import a null value into a varchar
column?
Best regards,
Dennis
Hi,
I just noticed a bug in the current revision (accc14b26ef3) of the
default branch. I added a short mal snippet to reproduce the issue. I
guess it's somehow related to sorting a void BAT, but I couldn't reduce
the bug to a simple BAT (without the TPCH table).
Regards,
Moritz
Is there any MonetDB-specific advice (or are there instructions) for using a RAID 0 SSD array with MonetDB, beyond making sure you have backups?
I’m not sure what I’m looking for, other than maybe tips on how to configure a dbfarm (i.e. monetdbd/mserver5 properties).
Thanks,
Vince