Hello,
I have a large database that is commonly used for relatively simple
SELECT queries. Some of these can take quite a bit of time (hours...
overnight) so I've been looking into it a bit. What seems clear is that
mserver5 (Oct2014-SP1) is disk-bound, but one curious thing is that most
of the time it is write-bound. Despite doing only SELECT queries (no
updates or inserts), the amount of data being written is nearly two
times as much as data read.
Why is that? Is the server restructuring the database on the fly in
order to get better performance in later queries? 3X the disk I/O seems
excessive so I wonder if there is some way to tune this, or investigate
further to see what is happening?
Thanks!
Tim
Hi All,
I have been looking into UPSERTing records in MonetDB but didn't find
anything.
A few years ago somebody asked the same question -
https://www.monetdb.org/pipermail/users-list/2011-May/004916.html - but
there was no solution.
Since it has been a few years since then, I was wondering whether
MonetDB now provides the ability to do UPSERTs.
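For what it's worth, in the absence of a native UPSERT/MERGE statement, the usual emulation is an UPDATE of the matching rows followed by an INSERT of the rows that did not match, inside a single transaction. A sketch with hypothetical table names ("target" keyed on "id", new rows in "staging"):

```sql
START TRANSACTION;

-- Step 1: update rows whose key already exists in the target.
UPDATE target
SET    val = (SELECT s.val FROM staging s WHERE s.id = target.id)
WHERE  id IN (SELECT id FROM staging);

-- Step 2: insert the staging rows that were not matched by the update.
INSERT INTO target (id, val)
SELECT s.id, s.val
FROM   staging s
WHERE  NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);

COMMIT;
```

Running both statements in one transaction keeps the two steps atomic with respect to concurrent readers.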
Any pointers here will be really helpful.
Thanks
The MonetDB team at CWI/MonetDB BV is pleased to announce the
Oct2014-SP2 bugfix release of the MonetDB suite of programs.
More information about MonetDB can be found on our website at
<http://www.monetdb.org/>.
For details on this release, please see the release notes at
<http://www.monetdb.org/Downloads/ReleaseNotes>.
As usual, the download location is <http://dev.monetdb.org/downloads/>.
Oct2014-SP2 bugfix release
Client Package
* Changes to the Perl interface, thanks to Stefan O'Rear:
  1. removes "use sigtrap", because this has global effects and should
     not be used by modules, only by the application.
  2. allows Perl 5.8.1+ Unicode strings to be passed to quote() and
     included in statements (UTF-8 encoded, as expected by Monet's str
     module)
  3. quote and unquote now use the same quoting rules as the MonetDB
     server, allowing all characters except NUL to be round-tripped
  4. several character loops have been reimplemented as regexes for
     much greater performance
  5. micro-optimizations to the result fetch loop
  6. block boundaries are preserved in piggyback data so that Xclose
     is not appended or prepended to a SQL command
  7. diagnostic messages #foo before a result header are ignored; this
     is necessary to use recycler_pipe
  8. fail quickly and loudly if we receive a continuation prompt (or
     any other response that starts with a non-ASCII character)
  9. header lines must start with %, not merely contain %, fixing a
     bug when querying a table where string values contain %
  10. after closing a large result set, account for the fact that a
      reply will come and do not lose sync
  11. allow a MAPI_TRACE environment variable to dump wire protocol
      frames to standard output
  12. fixes the maximum MAPI block size to match the server limit of
      16k; previously it would crash on blocks larger than 16k
SQL
* Fixed a typo in a column name of the sys.tablestoragemodel view
(auxillary changed to auxiliary).
Bug Fixes
* 3467: Field aliases with '#' character excise field names in result
set.
* 3605: relational query without result
* 3619: Missing dll on MonetDB Start
* 3622: Type resolution error
* 3624: insert of incomplete or invalid ip address values in inet
column is silently accepted but the values are not stored (they
become/show nil)
* 3626: casting a type without alias results in program contains
errors
* 3628: mclient and ODBC driver report 'type mismatch' when
stddev_pop used in a select which returns 0 rows
* 3629: IF THEN ELSEIF always evaluates the first test as true
* 3630: segv on rel_order_by_column_exp
* 3632: running make clean twice gives an error in
clients/ruby/adapter
* 3633: Wrong result for HAVING with floating-point constant
* 3640: Missing implementation of scalar function: sql_sub(<date>,
<month interval>)
* 3641: SQL lexer fails to detect string end if it the last character
is U+FEFF ZERO WIDTH NO-BREAK SPACE
* 3642: Combined WHERE conditions less-than plus equals-to produce
incorrect results
* 3643: Missing implementations of scalar function: sql_sub(<timetz>,
arg2)
* 3644: COPY INTO fails to import "inet" data type when value has
prefix length in CIDR notation
* 3646: ORDER BY clause does not produce proper results on 'inet'
datatype
* 3649: recycler crashes with concurrent transactions
Hi! I am trying to convert a Lucene index into MonetDB with my app (
https://github.com/hjortron/LuceneToMonetDBConverter). The app iterates
through all the Lucene docs and creates an "insert into" query for each.
The first time, the problem appears around row 80k (and on later runs
the failure point drops dramatically to around row 250); the app throws
an error on: INSERT INTO
sys."EventTracking"(datetime,Hash,UserIP,EventType,SearchResultState,ViewType,PageNumber,ResultsCount,ElapsedTime)
VALUES('2014-12-10
13:22:41','hod','127.0.0.1','Search','>No',0,0,2368195,85)
Unhandled Exception: System.Data.Odbc.OdbcException: ERROR [HY000]
[MonetDB][ODBC Driver 11.19.7]General error
In merovingian.log I have only: "2015-01-26 00:43:03 ERR discovery[846]:
cannot retrieve name info: Temporary failure in name resolution", which
repeats every 30 minutes. On another instance of MonetDB my app works fine.
Please help.
Hi all,
I am wondering if there are any plans to develop horizontal sharding
support for MonetDB? There aren't really any other open-source databases
out there that support horizontal sharding with anywhere close to the
performance of MonetDB.
I realise there are a lot of challenges and scalability issues with
many of the typical DB operators, but for certain use cases that I have
seen it could work really well, likely significantly faster than
HBase/SparkSQL and its rivals.
Does anyone else think this would be a desirable feature?
Best regards,
Alastair
Hi,
No changes done to the basic setup.
I'm running OLAP queries and not all 54 columns are used in any query, it depends on the query which columns are used.
The queries are of the form SELECT SUM(), AVG(), SUM() ... plus a WHERE clause and a GROUP BY.
The WHERE columns compress very well, so it is fast to filter by them.
My last test runs in 340 ms and aggregates between 900 and 2000 rows per result.
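For concreteness, the query shape described above is roughly the following (the table and column names here are hypothetical, for illustration only):

```sql
-- Aggregation over a filtered, grouped slice of a wide fact table;
-- the WHERE column (checkin) compresses well and is cheap to scan.
SELECT hotel_id,
       SUM(price)  AS total_price,
       AVG(nights) AS avg_nights,
       SUM(nights) AS total_nights
FROM   bigcalendar
WHERE  checkin BETWEEN '2015-06-01' AND '2015-06-30'
GROUP BY hotel_id;
```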
Careful with "select * from table where key = 123456" (I'm not an expert, but this is pure OLTP).
Same for me with "select * from bigcalendar limit 2" -> sometimes I have to kill it :) but it's something I don't need at all.
I'm using MonetDB as a kind of cache of my OLTP system.
IMO MonetDB is perfect for OLAP (not only analytical, operational as well! Now I'm working on the refresh approach)
if you are looking for a fast search engine by key, maybe a pure key/value-store is more appropriate.
Hope this helps.
Gustavo Brian
Developer
-----Original Message-----
From: users-list [mailto:users-list-bounces+gustavo.brian=sunhotels.net@monetdb.org] On Behalf Of users-list-request(a)monetdb.org
Sent: Tuesday, 27 January 2015 12:00
To: users-list(a)monetdb.org
Subject: users-list Digest, Vol 29, Issue 20
Send users-list mailing list submissions to
users-list(a)monetdb.org
To subscribe or unsubscribe via the World Wide Web, visit
https://www.monetdb.org/mailman/listinfo/users-list
or, via email, send a message with subject or body 'help' to
users-list-request(a)monetdb.org
You can reach the person managing the list at
users-list-owner(a)monetdb.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of users-list digest..."
Today's Topics:
1. Re: Best background refresh strategy (shamsul hassan)
2. Join index structure (Joseph D'silva)
----------------------------------------------------------------------
Message: 1
Date: Mon, 26 Jan 2015 11:44:49 +0000
From: shamsul hassan <shamsulbuddy(a)gmail.com>
To: Communication channel for MonetDB users <users-list(a)monetdb.org>
Subject: Re: Best background refresh strategy
Message-ID:
<CAONo-dtZpv4LfhKrM+cCh7tZcsDM4daTU80rLigcQR-ruYEbag(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
*338 million rows / 54 columns table and ran queries in ms* -- WOW ..
Do you mind sharing your server configuration and the settings for mserver5?
Also, what kind of queries are you firing?
I find it a bit slow on row-level queries, something like select * from table where key = 123456
Thanks
On Sun, Jan 25, 2015 at 6:56 AM, Martin Kersten <martin(a)monetdb.org> wrote:
> On 25/01/15 06:48, Gustavo Brian wrote:
>
>> Hi,
>>
> Hi,
>
> thanks for the interest and appraisal.
>
>>
>> I'm trying monetdb as a bigtable for calendar based queries.
>> I've created a 338 million rows / 54 columns table and ran queries in ms.
>> That's perfect!
>> I wonder to see if monetdb is suitable for a search service with high
>> traffic (several hundred request per second) and which will the best
>> data update strategy because during the "updates/insert/copy into"
>> actions everything gets blocked.
>>
> Queries should not be a problem, but for updates you should read and
> understand
> https://www.monetdb.org/blog/monetdb-sql-transaction-management-scheme
> https://www.monetdb.org/Documentation/Manuals/SQLreference/Transaction
> s
>
> I would like to know how cluster mode will help as well.
>>
> Depends on the complexity of the query and your machine specs.
> regards, Martin
>
>
>> Thanks,
>> Gustavo Brian
>> _______________________________________________
>> users-list mailing list
>> users-list(a)monetdb.org
>> https://www.monetdb.org/mailman/listinfo/users-list
>>
>>
Hello,
Does anyone know the join index structure used in the database (its
implementation) and how it is utilized by the optimizer? I couldn't
find any documentation or papers out there other than a single sentence
mentioned at the end of this page, which is not much.
https://www.monetdb.org/Documentation/Manuals/MonetDB/Optimizers/Landscape
"*Foreign Key replacements Goal: to improve multi-attribute joins over
foreign key constraints Rationale: the code produced by the SQL frontend
involves foreign key constraints, which provides many opportunities for
speedy code using a join index. Impact: large Status: Implemented in the
SQL strategic optimizer.*"
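As an illustration, the rewrite that sentence describes applies to queries joining over declared foreign-key constraints, like the following (table and column names are hypothetical): according to the quoted text, the SQL optimizer can then generate code that uses the join index backing the constraint instead of building a join structure at query time.

```sql
CREATE TABLE orders (
    id    INT PRIMARY KEY,
    total DECIMAL(10,2)
);
CREATE TABLE lineitem (
    order_id INT REFERENCES orders (id),  -- FK backed by a join index
    qty      INT
);

-- A join over the foreign-key columns is an opportunity for the
-- optimizer to use the join index rather than a generic join.
SELECT o.id, SUM(l.qty)
FROM   lineitem l, orders o
WHERE  l.order_id = o.id
GROUP BY o.id;
```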
Thanks
Joseph
Hi,
I'm trying monetdb as a bigtable for calendar-based queries.
I've created a 338 million rows / 54 columns table and ran queries in ms. That's perfect!
I wonder whether monetdb is suitable for a search service with high traffic (several hundred requests per second), and what the best data update strategy would be, because during the "updates/insert/copy into" actions everything gets blocked.
I would like to know how cluster mode will help as well.
Thanks,
Gustavo Brian
Hi All,
I have recently installed the MonetDB "Oct2014-SP1" release on my Oracle
Enterprise Linux 6 server.
Server configuration -
8 CPU cores
64 GB RAM
I have loaded a data set with 70 columns and 15 million rows.
I have started mserver5 as below --
mserver5 --debug=10 --set gdk_nr_threads=8 --dbpath=/data/monetdb/testdb
--set mapi_open=true --set mapi_port=54321 --set
merovingian_uri=mapi:monetdb://localhost:54321/testdb --set
mapi_usock=/data/monetdb/testdb/.mapi.sock --set
monet_vault_key=/data/monetdb/testdb/.vaultkey --set
sql_optimizer=default_pipe --set max_clients=64
Then I ran the query below --
select product_type,COUNT(DISTINCT sales_document),sum(ord_qty) FROM
bookings where ord_entry_date between '2014-04-01' and '2014-10-01' group
by product_type;
This query took around 90 seconds.
So I started tomograph and checked the plotted graph, which shows a
parallelism usage of only 39.9%; it is utilizing only 1 thread, with
almost 98% of the time spent in the step group.subgroupdone.
My question: even though I started the server with gdk_nr_threads=8,
why is it using just a single thread?
More info --
mserver5 --version
MonetDB 5 server v11.19.7 "Oct2014-SP1" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2014 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.1e 11 Feb 2013 (compiled with OpenSSL 1.0.1e-fips 11
Feb 2013)
libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@localhost (x86_64-unknown-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64
Any pointers ??
Thanks
As mentioned in previous posts, I have some use cases for MonetDB which act on tables with a billion rows. MonetDB is fast and stable on straight SELECT statements or aggregates on single tables.
Problems occurred when doing big joins which should produce a result set of only 30 rows, but ran for hours before failing.
SELECT
tbl1.*,
tbl2.*
FROM
tbl1,
tbl2
WHERE
tbl1.xid = tbl2.xid
AND tbl1.pk = 12345678
(same behavior, if written with JOIN explicitly)
So I am wondering, if I can help MonetDB with partitioning or sharding.
Asked already here:
https://www.monetdb.org/pipermail/developers-list/2014-October/004325.html
What is the state regarding manual partitioning?
During the last weeks I saw significant acceleration with sharding on other DBMSs. Is there anybody who can give an introduction or share experiences with sharding on MonetDB? 3rd-party tools?
What is the maximum number of tables within a MonetDB database? What is the estimated overhead of a single table, multiplied a million times?
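For what it's worth, one manual-partitioning pattern that needs only plain SQL is to split the big table into range-based partition tables and recombine them with a UNION ALL view; whether the optimizer can prune partitions behind such a view is a separate question. A sketch with hypothetical names:

```sql
-- Hypothetical manual range partitioning of a large fact table.
CREATE TABLE facts_2014 (xid INT, pk BIGINT, val DOUBLE);
CREATE TABLE facts_2015 (xid INT, pk BIGINT, val DOUBLE);

-- Recombine the partitions for queries that need the full table.
CREATE VIEW facts AS
    SELECT * FROM facts_2014
    UNION ALL
    SELECT * FROM facts_2015;

-- Queries that know which partition they need can address it directly
-- and avoid scanning the other partitions entirely.
SELECT * FROM facts_2015 WHERE pk = 12345678;
```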
Thank you
Robert