Hello,
Running the same query on the same data leads to different results
between PostgreSQL and MonetDB.
I have not been able to construct a simple example that reproduces the
problem, so here is what I observe. The query:
select temps_mois.rfoperdmo as c1,
sum((case when dwhinv.dwhinv___rfodomide = 'RH' and
dwhinv.dwhinv___rfoindide = 'tech_rh_effectif_rem_reel' then
dwhinv.dwhinvqte else 0 end)) as m0
from rfoper_temps_mois as temps_mois,
dwhinv as dwhinv,
rfovsn as rfovsn_0,
rrhamv as rrhamv_1,
rrhcov as rrhcov_2,
rfoadv as rfoadv_3
where temps_mois.rfoper___rforefide = 'HPLUS'
and dwhinv.dwhinv___rforefide = 'HPLUS'
and dwhinv.dwhinv___rfodomide = 'RH'
and dwhinv.dwhinv___rfoindide = 'tech_rh_effectif_rem_reel'
and dwhinv.dwhinvdtd = temps_mois.rfoperdtd
and temps_mois.rfoperyea = '2011'
and rfovsn___rforefide = 'HPLUS'
and dwhinv.dwhinv___rfovsnide = rfovsn_0.rfovsnide
and rfovsn_0.rfovsnide = '201111_reel'
and rrhamv_1.rrhamv___rrhvemide='GRACOR'
AND rrhamv_1.rrhamvrvs=1
AND rrhamv_1.rrhamv___rforefide= 'HPLUS'
and dwhinv.dwhinv___rrhempide = rrhamv_1.rrhamvinf
and rrhamv_1.rrhamvsup = 'CEMP'
and rrhcov_2.rrhcov___rrhvcoide='CONTRATS'
AND rrhcov_2.rrhcovrvs=1
AND rrhcov_2.rrhcov___rforefide= 'HPLUS'
and dwhinv.dwhinv___rrhcntide = rrhcov_2.rrhcovinf
and rrhcov_2.rrhcovsup = 'CONTRATS'
and rfoadv_3.rfoadv___rfovdeide='STRC'
AND rfoadv_3.rfoadvrvs=1
AND rfoadv_3.rfoadv___rforefide= 'HPLUS'
and dwhinv.dwhinv_p2rfodstide = rfoadv_3.rfoadvinf
and rfoadv_3.rfoadvsup = 'HPLUS'
group by c1
order by c1
PostgreSQL result:
c1 | m0
----+-----------------
01 | 7111.5376967750
02 | 7100.9108821426
03 | 7150.2597967742
04 | 7151.4283666667
05 | 7109.1641451610
06 | 6976.2108421239
07 | 6329.7404193564
08 | 6302.9823032247
09 | 6426.9459633351
10 | 6519.6889580648
11 | 6549.5235033402
12 | 6492.5477161292
(12 rows)
MonetDB result:
+------+----------------------+
| c1 | m0 |
+======+======================+
| 12 | 1.000000 |
| 11 | 1.000000 |
| 10 | 2.000000 |
| 09 | 0.366667 |
| 08 | 1.000000 |
| 07 | 1.000000 |
| 06 | 1.000000 |
| 05 | 3.000000 |
| 04 | 4.000000 |
| 03 | 15.000000 |
| 02 | 9.964286 |
| 01 | 1.000000 |
+------+----------------------+
12 tuples (212.346ms)
As you can see, all "m0" values are very different.
I have no idea what is causing this, but if I modify the query to
simplify the CASE...WHEN...ELSE part of the select, it produces
the expected result:
replacing sum((case when dwhinv.dwhinv___rfodomide = 'RH' and
dwhinv.dwhinv___rfoindide = 'tech_rh_effectif_rem_reel' then
dwhinv.dwhinvqte else 0 end)) as m0
by sum((case when dwhinv.dwhinv___rfoindide =
'tech_rh_effectif_rem_reel' then dwhinv.dwhinvqte else 0 end)) as m0
It is not exactly the same query, but it should lead to the same result
given my test data (it does in PostgreSQL).
I attach to this email two files with the traces, respectively for the
query with the bad result (ko.txt) and the correct result (ok.txt).
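For reference, the expected semantics of such a SUM(CASE WHEN ... THEN ... ELSE 0 END) aggregate can be sketched on toy data. This is a minimal sketch with hypothetical table and column names (inv, mois, dom, ind, qte standing in for dwhinv and its columns), checked against SQLite rather than against either engine:

```python
import sqlite3

# Toy stand-in for the dwhinv fact table: (month, domain, indicator, qty).
con = sqlite3.connect(":memory:")
con.execute("create table inv (mois text, dom text, ind text, qte real)")
con.executemany("insert into inv values (?,?,?,?)", [
    ("01", "RH", "tech_rh_effectif_rem_reel", 10.0),
    ("01", "RH", "other_indicator",            5.0),
    ("02", "FI", "tech_rh_effectif_rem_reel",  7.0),
    ("02", "RH", "tech_rh_effectif_rem_reel",  3.0),
])

# SUM over a CASE: rows failing the predicate contribute 0 (not NULL),
# so every group still appears, with only the matching rows summed.
rows = con.execute("""
    select mois,
           sum(case when dom = 'RH' and ind = 'tech_rh_effectif_rem_reel'
                    then qte else 0 end) as m0
    from inv group by mois order by mois
""").fetchall()
print(rows)  # [('01', 10.0), ('02', 3.0)]
```

Rows that fail the CASE predicate contribute 0 rather than being dropped, so two engines implementing the same semantics should agree on the per-group totals.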
Config:
Ubuntu Server 11.04 x64
MonetDB v11.7.9 (Dec2011-SP2), MonetDB Database Server v1.6
(Dec2011-SP2), MonetDB Database Server Toolkit v1.0 (Dec2011-SP2)
Thank you very much for your help!!
--
Matthieu Guamis
Axège <http://www.axege.com/>
23, rue Saint-Simon
63000 Clermont-Ferrand
Tél: +33 (0)4 63 05 95 40
Fax: +33 (0)4 73 70 65 29
Email: matthieu.guamis(a)axege.com
We are seeing a very strange behavior from MonetDB from simple queries on a
single table.
In monthly_tou_avg_price table, we have monthly average aggregate data that
were inserted from a very large table (> 2 billion) with hourly data. We
have created the aggregate table to speed up certain queries. Strangely, we
are getting incorrect results from very simple queries on the aggregate
table.
Q1 below returns 9,890 records with yyyy_mm column having empty value for
the 2012-07 records. All of the 2012-07 records are missing the yyyy_mm
value. We ran the SQL statement that inserted those records and it does not
show any empty yyyy_mm values. If we check for empty string or null value
in yyyy_mm column (Q2 and Q3) against the aggregate table, we do not get
any hits. If we query for yyyy_mm = '2012-07' as in Q4, we get exactly
9,890 records showing yyyy_mm populated with the value 2012-07. How could
this be possible, with Q1 showing empty yyyy_mm values for the same 9,890
records and Q4 showing 2012-07 yyyy_mm values?
Q1: select * from monthly_tou_avg_price where header_id = 6
Q2: select * from monthly_tou_avg_price where header_id = 6 and yyyy_mm = ''
Q3: select * from monthly_tou_avg_price where header_id = 6 and yyyy_mm is
null
Q4: select * from monthly_tou_avg_price where header_id = 6 and yyyy_mm =
'2012-07'
Q5 below returns empty value for yyyy_mm even though yyyy_mm column does
not have any empty or null values. Q6 does not return any empty or null
yyyy_mm values and contradicts Q5 result by returning 2012-07. Furthermore,
Q7 contradicts Q5 with 9,890 records with 2012-07 value in yyyy_mm column.
Q5: select distinct yyyy_mm from monthly_tou_avg_price where header_id = 6
Q6: select distinct yyyy_mm from monthly_tou_avg_price
Q7: select * from monthly_tou_avg_price where header_id = 6 and years =
2012 and months = 7
yyyy_mm L15
(empty) 9890
2009-04 7574
2009-05 7574
2009-06 7750
2009-07 7786
2009-08 7788
2009-09 7828
2009-10 7814
2009-11 7844
2009-12 8322
2010-01 8317
2010-02 8363
2010-03 8361
2010-04 9214
2010-05 9124
2010-06 9156
2010-07 9165
2010-08 9195
2010-09 9197
2010-10 9198
2010-11 9234
2010-12 9232
2011-01 9246
2011-02 9238
2011-03 9696
2011-04 9672
2011-05 9670
2011-06 9668
2011-07 9656
2011-08 9698
2011-09 9682
2011-10 9724
2011-11 9706
2011-12 9768
2012-01 9766
2012-02 9764
2012-03 9832
2012-04 9824
2012-05 9826
2012-06 9902
We also ran an UPDATE statement to set yyyy_mm to 2012-07 for the rows with
header_id = 6 and years = 2012 and months = 7, and 9,890 records were
successfully updated. However, Q1 through Q8 returned the same results as
above, with no improvement.
We have not seen this kind of issue in other tables. The problem seems to
be limited to a table that is populated with an INSERT INTO statement. We
checked the result returned by the SELECT portion of Q8, and yyyy_mm is
populated with 2012-07. But the aggregate table seems to end up in a weird
state where it returns incorrect results.
Q8: insert into "monthly_tou_avg_price"
    select a.header_id,
           extract(year from a.opr_date) as y,
           extract(month from a.opr_date) as m,
           b.yyyymm as yyyy_mm, b.svalue as TOU, a.resource_name,
           round(avg(a.lmp_cong_prc),5), round(avg(a.lmp_ene_prc),5),
           round(avg(a.lmp_loss_prc),5), round(avg(a.lmp_prc),5),
           round(avg(a.ns_clr_prc),5), round(avg(a.rd_clr_prc),5),
           round(avg(a.ru_clr_prc),5), round(avg(a.sp_clr_prc),5),
           round(avg(a.shadow_prc),5), round(avg(a.fuel_prc),5),
           count(lmp_cong_prc) as actual_lmpcong_tou_count
    from oasislive.price a
    join oasislive.time b on a.yyyymmddhh = b.yyyymmddhh
    where a.opr_date between '2012-07-01' and '2012-07-31'
    group by a.header_id, y, m, yyyy_mm, TOU, a.resource_name
    order by y, m, a.header_id, TOU, a.resource_name;
We can see this INSERT INTO related issue with other queries as well. Q9
shows that we do not have any values for the 2011-03 ON period, which is
incorrect. In fact, if we specify '2011-03' as in Q10, it shows that we do
in fact have values for the 2011-03 ON period!
Q9: select yyyy_mm, tou, avg(lmp_prc) from monthly_tou_avg_price group by
yyyy_mm, tou order by yyyy_mm, tou
yyyy_mm tou L6
.
.
2011-03 OFF 16.721532797520894
2011-03 ON (null)
2011-04 OFF 15.427080074671187
.
.
Q10: select yyyy_mm, tou, avg(lmp_prc) from monthly_tou_avg_price where
yyyy_mm = '2011-03' group by yyyy_mm, tou order by yyyy_mm, tou
yyyy_mm tou L11
2011-03 OFF 16.72153279752101
2011-03 ON 30.276486575893575
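The invariant being violated here can be stated simply: adding a WHERE clause that selects a single yyyy_mm must reproduce exactly the aggregates that the unrestricted GROUP BY computes for that month. A minimal sketch of the expected behavior, using a hypothetical miniature of monthly_tou_avg_price and SQLite as the reference engine:

```python
import sqlite3

# Hypothetical miniature of monthly_tou_avg_price.
con = sqlite3.connect(":memory:")
con.execute("create table p (yyyy_mm text, tou text, lmp_prc real)")
con.executemany("insert into p values (?,?,?)", [
    ("2011-03", "OFF", 16.0), ("2011-03", "OFF", 17.4),
    ("2011-03", "ON",  30.0), ("2011-03", "ON",  30.6),
    ("2011-04", "OFF", 15.4),
])

# Q9-style query: unrestricted GROUP BY over the whole table.
all_groups = dict(((m, t), a) for m, t, a in con.execute(
    "select yyyy_mm, tou, avg(lmp_prc) from p group by yyyy_mm, tou"))
# Q10-style query: same GROUP BY, restricted to one month.
march = dict(((m, t), a) for m, t, a in con.execute(
    "select yyyy_mm, tou, avg(lmp_prc) from p "
    "where yyyy_mm = '2011-03' group by yyyy_mm, tou"))

# Every restricted group must match the unrestricted result exactly.
for key, avg in march.items():
    assert all_groups[key] == avg
print(march[("2011-03", "ON")])
```

A WHERE clause can only remove rows from consideration; it cannot change the stored values of the rows that remain, so a NULL in Q9 where Q10 finds data points at the table state, not the query.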
I apologize for the long-winded email; I just wasn't sure how to describe
what I was seeing. Please let me know if there is any additional
information that I can provide. By the way, we are currently using the
August 2011-SP2 release, but we saw this issue in older releases as well.
Best regards,
Henry
We are currently running the MonetDB August 2011 SP2 release on CentOS.
create table tt(id int, name varchar(20), value int);
create table t2(id int);
CREATE TRIGGER trg BEFORE INSERT ON tt
REFERENCING NEW as newrow
FOR EACH ROW
BEGIN ATOMIC
INSERT INTO t2 (id) values (newrow.id);
END
insert into tt values(1, 'a', 10);
insert into tt values(2, 'b', 20);
insert into tt values(3, 'a', 30);
insert into tt values(4, 'a', 40);
The resulting values in t2 are:
After insert #1: EMPTY
After insert #2: {1}
After insert #3: {1, 1, 2}
After insert #4: {1, 1, 2, 1, 2, 3}
Tested this in MySQL, and the same trigger works correctly there.
Expected t2 values after insert #4: {1, 2, 3, 4}
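For comparison, the expected per-row semantics (each INSERT firing the trigger exactly once, for its own row only) can be checked against SQLite, whose row-level BEFORE INSERT trigger with NEW plays the role of the REFERENCING NEW AS newrow clause above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table tt (id int, name varchar(20), value int);
    create table t2 (id int);
    -- Row-level BEFORE INSERT trigger: fires once per inserted row.
    create trigger trg before insert on tt
    begin
        insert into t2 (id) values (NEW.id);
    end;
""")

# The four inserts from the report above.
for i, (name, value) in enumerate([("a", 10), ("b", 20), ("a", 30), ("a", 40)], 1):
    con.execute("insert into tt values (?,?,?)", (i, name, value))

ids = [r[0] for r in con.execute("select id from t2 order by id")]
print(ids)  # [1, 2, 3, 4]
```

After the fourth insert, t2 holds exactly one row per insert, matching the expected {1, 2, 3, 4} rather than the re-fired accumulation seen above.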
Regards,
Tapomay.
Hi,
I have setup a cluster of 3 machines with following tags:
mydb/1/master, mydb/2/master, mydb/3/master
On a fourth machine I have created a multiplex funnel as follows:
monetdb create -m monetdb+monetdb@*/mydb/1/master,monetdb+monetdb@*/mydb/2/master,monetdb+monetdb@*/mydb/3/master mydbmaster
My objective here is that I use this multiplex for only DDL/INSERTS/UPDATES.
This would ensure replication on all the 3 masters thus achieving a hacked form of master-master replication.
Then I could use any one of the masters for analytics querying.
However, I am stuck on the following issues while handling fail-over.
1. I expect that if one of the boxes dies, the others keep getting the updates until I notice the failure. Then I could lock the multiplex, sync the dbfarms to let the failed db catch up, and release the multiplex.
2. If the box with tag mydb/3/master dies, the updates still go to 1 and 2. But if mydb/1/master dies, then none of the dbs get any more updates. It looks like the multiplex returns on the first failure without trying the rest of the dbs.
Any pointers as to where I could change this sequential failure logic?
Also what do you think about my replication approach?
Thanks and Regards,
Tapomay
We are encountering a frequent error when writing to MonetDB repeatedly
using the Java MAPI socket. We are running the April 2012 release.
The error is:
[7/29/12 12:10:26 PM] Cory Isaacson:
000!TypeException:user.s0_2[2]:'calc.str' undefined in: _3:any :=
calc.str(_4:str)
22000!TypeException:user.s0_2[3]:'calc.str' undefined in: _5:any :=
calc.str(_6:str)
22000!TypeException:user.s0_2[4]:'calc.str' undefined in: _7:any :=
calc.str(_8:str)
22000!TypeException:user.s0_2[5]:'calc.str' undefined in: _9:any :=
calc.str(_10:str)
22000!TypeException:user.s0_2[6]:'calc.str' undefined in: _11:any :=
calc.str(_12:str)
The only solution is to stop the farm and restart the farm/database.
We have a single Java process writing to the database using MAPI, in
batches of 1000 rows at a time (or less), using the COPY INTO syntax
based on the example Java loader.
Here are the important snippets of the Java code. You can see that we
are opening/closing a connection for each batch, and we have only a
single thread performing this function, no other processes are writing
to the database. We do have other connections reading from the database
(but in the most recent case we only had idle connections). This is
running on CentOS 5.4, with Java 1.6. There is no swap space configured
on this server (we read that this may be a requirement). The data
volumes are very small at this point, only a few million rows.
        String query = "COPY " + (dataRowList.size() + 10)
                + " RECORDS INTO " + tableName
                + " FROM STDIN USING DELIMITERS '\\t','\\n' NULL AS '\\\\N';";
        if (TRACE) logger.trace("commit: query : " + query);
        // the leading 's' is essential, since it is a protocol
        // marker that should not be omitted, likewise the
        // trailing semicolon
        out.write('s');
        out.write(query);
        out.newLine();
        // Write all data rows.
        for (String dataRow : dataRowList) {
            if (TRACE) logger.trace("commit: Writing data row: " + dataRow
                    + " dataRow ends in newline: " + dataRow.endsWith("\n"));
            // The dataRow is \n terminated, so no need to output
            // newLine to the output stream.
            out.write(dataRow);
        }
        out.writeLine(""); // need this one for synchronisation over flush()
        error = in.waitForPrompt();
        if (error != null)
            throw new Exception(error);
        out.writeLine(""); // server wants more; we're going to tell it this is it
        error = in.waitForPrompt();
        if (error != null)
            throw new Exception(error);
        // // disconnect from server
        // server.close();
    } catch (IOException e) {
        logger.error("Unable to connect", e);
        throw new RuntimeException("Unable to connect", e);
    } catch (Throwable th) {
        logger.error("Commit error", th);
        ...
        throw new RuntimeException("Commit error: tableName: " + tableName, th);
    } finally {
        try {
            if (server != null) {
                server.close();
                server = null;
            }
        } catch (Throwable th) {
            logger.error("Error closing MonetDb server connection", th);
        }
    }
Thanks,
Jerry
Hi all:
we are testing monetdb on a sql
select substring(fact_bk,1,1) as aa,sum(1) as recordcount from f_activity
group by aa
The table f_activity has about 5 million rows. When I run this SQL,
MonetDB needs about 30 seconds (if we group by another, non-computed
column, it needs only 1-3 seconds).
When I run this SQL on SQL Server, it needs only 7 seconds.
Is MonetDB not good at computed columns, or have we made some mistake?
Please tell me the reason, thank you.
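One possible workaround (not a fix for the slowdown itself) is to materialize the computed expression once as a real column and group on that column instead. A sketch against SQLite, using the column names from the query above and count(*) in place of the equivalent sum(1):

```python
import sqlite3

# Tiny stand-in for f_activity, with hypothetical fact_bk values.
con = sqlite3.connect(":memory:")
con.execute("create table f_activity (fact_bk text)")
con.executemany("insert into f_activity values (?)",
                [("A123",), ("A456",), ("B789",)])

# Materialize substring(fact_bk, 1, 1) once as a plain column ...
con.execute("alter table f_activity add column aa text")
con.execute("update f_activity set aa = substr(fact_bk, 1, 1)")

# ... then group on the plain column instead of the expression.
rows = con.execute(
    "select aa, count(*) as recordcount from f_activity group by aa order by aa"
).fetchall()
print(rows)  # [('A', 2), ('B', 1)]
```

This trades one pass of UPDATE cost (and some storage) for fast repeated GROUP BY queries, since the engine no longer evaluates the substring per row per query.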
lixiaohua
Shanghai,China
Hi all:
We want to copy a text file into a remote server, but we don't know how to
set mapi_open = yes on Windows. We tried adding --set "mapi_open=yes" in
M5server.bat, but we don't know whether it takes effect. How can we check?
When we use JDBC or ODBC to access a remote MonetDB database, we try
jdbc:monetdb://dvs001:5000/demo, or change the ODBC host to an IP address
(192.168.1.100), but we cannot connect to the database. Only with
localhost can we connect.
Does MonetDB support remote access on Windows?
lixiaohua
Shanghai,China
Hi all:
We want to copy a text file into a remote MonetDB server (Windows).
Copying into the local MonetDB via its IP address works fine, but copying
the same file into a remote MonetDB fails with:
Error: could not open file 'd:\\share\\book1.csv': Operation not permitted
SQLState: 22000
ErrorCode: 0
What is the problem? Is it a bug, or something we don't know?
lixiaohua
Shanghai,China