Hello,
I reported in https://www.monetdb.org/bugzilla/show_bug.cgi?id=3549
that string case conversion is very inefficient in MonetDB.
I had a look at the code. For each UTF-8 character it performs a hash
lookup in the source-case BAT to find the corresponding character in
the destination-case BAT.
However, this is overkill for ASCII characters:
- for letters, [A-Z] + 32 = [a-z]
- all other ASCII characters stay the same
Since single-byte characters are very frequent in most texts, it makes
sense to add a cheap test and perform the hash lookup only for
multi-byte characters.
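A minimal sketch of the ASCII fast path described above, in C. This is not
the attached patch and not MonetDB's code: str_to_lower, utf8_seq_len and
slow_lower_utf8 are hypothetical names, and slow_lower_utf8 is a placeholder
that copies multi-byte sequences unchanged where the real code would do the
hash lookup in the case-mapping BATs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder for the expensive per-character lookup.
 * In MonetDB this is where the hash lookup in the case-mapping BATs
 * would happen; this stub just copies the sequence unchanged. */
static size_t slow_lower_utf8(const unsigned char *src, size_t len,
                              unsigned char *dst)
{
    memcpy(dst, src, len);
    return len; /* number of bytes written to dst */
}

/* Byte length of the UTF-8 sequence introduced by lead byte c.
 * Invalid lead bytes are treated as 1-byte sequences. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Lower-case the NUL-terminated UTF-8 string src into dst.
 * With the stub above, dst needs strlen(src) + 1 bytes. */
static void str_to_lower(const char *src, char *dst)
{
    const unsigned char *s = (const unsigned char *) src;
    unsigned char *d = (unsigned char *) dst;

    assert(src != NULL && dst != NULL);
    while (*s != '\0') {
        if (*s < 0x80) {
            /* Fast path: ASCII. [A-Z] + 32 = [a-z], everything else unchanged. */
            *d++ = (*s >= 'A' && *s <= 'Z') ? (unsigned char) (*s + 32) : *s;
            s++;
        } else {
            /* Slow path: multi-byte sequence; stop early if it is truncated. */
            size_t n = utf8_seq_len(*s), k = 1;
            while (k < n && s[k] != '\0')
                k++;
            d += slow_lower_utf8(s, k, d);
            s += k;
        }
    }
    *d = '\0';
}
```

For example, str_to_lower("Hello, W\xC3\x96RLD!", out) yields
"hello, w\xC3\x96rld!": the ASCII letters are lowered by the cheap test,
while the two-byte Ö goes to the stub and is therefore left as is. An
upper-case variant would test [a-z] and subtract 32. A real multi-byte
mapping can change the byte length (U+212A KELVIN SIGN lowercases to ASCII
'k'), so a real implementation must size dst for the worst case.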
I tested this on 831MB (over 360K tuples) of standard English text:
- original str.toLower/str.toUpper: 101 seconds (8 MB/s)
- modified version: 3.6 seconds (230 MB/s)
I expect that even for predominantly multi-byte text the added test
would not cost much.
A side-observation perhaps worth investigating is why that hash lookup is
so expensive.
Please find my patch attached.
Roberto
The MonetDB team at CWI/MonetDB BV is pleased to announce the
Jul2015-SP2 bugfix release of the MonetDB suite of programs.
More information about MonetDB can be found on our website at
<http://www.monetdb.org/>.
For details on this release, please see the release notes at
<http://www.monetdb.org/Downloads/ReleaseNotes>.
As usual, the download location is <http://dev.monetdb.org/downloads/>.
Jul2015-SP2 bugfix release
Bug Fixes
* 2014: 'null' from copy into gets wrong
* 3817: opt_pushselect stuck with multi-table UDF
* 3835: windows does not release ram after operations
* 3836: rand() only gets evaluated once when used as an expression
* 3838: Update column with or without parenthesis produce different
results
* 3840: savepoints may crash the database
* 3841: mclient fails with response "Challenge string is not valid"
* 3842: SQL execution fails to finish and reports bogus error
messages
* 3845: Too many VALUES in INSERT freeze mserver5
* 3847: Wrong SQL results for a certain combination of GROUP BY /
ORDER BY / LIMIT
* 3848: mserver segfault during bulk loading/updating
* 3849: HUGEINT incorrect value
* 3850: DEL character not escaped
* 3851: expression that should evaluate to FALSE evaluates to TRUE in
SELECT query
* 3852: CASE statement produces GDK error on multithreaded database:
BATproject does not match always
* 3854: Complex expression with comparison evaluates incorrectly in
WHERE clause
* 3855: Foreign key referencing table in a different schema - not
allowed.
* 3857: Large LIMIT in SELECT may abort the query
* 3861: Using window functions cause a crash
* 3864: Error in bulk import for chinese character
* 3871: NOT x LIKE triggers "too many nested operators"
* 3872: mserver crashes under specific combination of JOIN and WHERE
conditions
* 3873: mserver5: gdk_bat.c:1015: setcolprops:
Assertion `x != ((void *)0) || col->type == 0' failed.
* 3879: Database crashes when querying with several UNION ALLs.
* 3887: Querying "sys"."tracelog" causes assertion violation and
crash of mserver5 process
* 3889: read only does not protect empty tables
* 3895: read only does not protect this table
Steps to reproduce the error:
1- When testing with a Java client and JDBC driver 2.19, an error occurs
when saving a decimal field; for example, the value 11.23 is recorded as 13.
2- When I run the same test with the 2.8 driver, the value is saved correctly.
The problem is not limited to the 2.19 driver: every version above 2.8 shows
the same mistake.
My Java client is Pentaho Data Integration, and my MonetDB is Jul2015-SP1.
I will zip the test package and send it for testing.
Regards,
--
Luciano Sasso Vieira
Data Scientist & Solutions Architect
luciano(a)gsgroup.com.br <http://www.gsgroup.com.br> | tel: 17 3353-0833
| cel: 17 99706-9335
www.gsgroup.com.br <http://www.gsgroup.com.br>
Hello!
Thanks for the answer. Which branch should I base my patches /
developments on? I see two candidates in
http://dev.monetdb.org/hg/MonetDB/branches:
- default
- Jul2015
Could you advise?
Thanks!
> The documentation for setting up the development environment and for
> getting the source code is pretty good, but I am missing a process
> description, how a volunteer can participate in the software
> development process. E.g. how are bugs assigned,
You could start by picking a bug, trying to fix it and then attaching the
patch to the bug report. We will have a look at it for sure.
> submitted, what needs to be done besides code production?
Writing test cases for bugs without them is also a nice way of getting started.
--
Jörg Strebel
Aachener Straße 2
80804 München
Hello!
I find MonetDB an interesting software product and would be
interested in contributing to its development (and learning about
in-memory databases in the process).
The documentation for setting up the development environment and for
getting the source code is pretty good, but I am missing a description
of how a volunteer can participate in the software development
process. E.g. how are bugs assigned, how are patches submitted, what
needs to be done besides code production?
Is there such a description somewhere?
Best regards
Jörg
--
Jörg Strebel
Aachener Straße 2
80804 München
Hello,
I have been looking into column-oriented databases for a while and was
impressed by MonetDB's performance compared to others.
I started studying MonetDB and found it very interesting; many of the
concepts you use are really intriguing.
I would like to know how MonetDB executes a query that involves string
columns. Does it maintain a dictionary, what kind of index does it use,
and is there an inverted index involved for strings?
Any kind of material, pointers into the source code or other
documentation on this is most welcome. I would love to contribute back
once I understand what is happening.
Regards,
Mit