Hi Stefan:

I am very puzzled by the erratic behavior that we are seeing with both Aug2011-SP2 and the latest version of MonetDB. I would appreciate if you can share some of your insight on the erratic behavior we are seeing.

The problem we saw with Q1 through Q8 queries returned in Aug2011-SP2 environment after we inserted 3 batches of data. After 2 daily inserts (copy into), the data worked fine. After inserting the data for 8/4/2012, we started seeing 8/1/2012 data showing up with empty string in one of the columns again and returning incorrect query results. We restored the database from a working backup and started to insert the data for each daily set to observe where the failure occurs. After inserting 8/3/2012 data, we immediately saw some 50,000 records with empty string in one of the columns for 8/1/2012 data which looked fine earlier. We loaded 8/4/2012 data and saw no change in the problematic records. Ironically, after inserting 8/5/2012 data, the problematic records were reduced to just 164 records. I can't find any rationale for the behavior that we are seeing in our MonetDB.

You had suggested to run some tests using the latest release of MonetDB. We did see couple of issues as described in our previous email. Unfortunately, the latest version also has significant problems like the ones that we are seeing with the Aug2011-SP2 version. The query below returned 37 records with empty strings in yyyy_mm column. Ironically, MonetDB thinks they are "different" empty strings because they are in different rows!

 select distinct yyyy_mm from monthly_tou_avg_price where header_id = 6   order by yyyy_mm

yyyy_mm L1
9890
9902
9826
9824
9832
9764
9766
9768
9706
9724
9682
9698
9656
9668
9670
9672
9696
9238
9246
9232
9234
9198
9197
9195
9165
9156
9124
9214
8361
8363
8317
8322
7844
7814
7828
7788
7786
2009-04 7574
2009-05 7574
2009-06 7750

We have a very large dataset (largest table has 2.5 billion rows with 18 columns) that we are working with. But, we have a robust environment (VMs with 40G to 140G memory and plenty of hard disk space) to support the data volume. The Aug2011-SP2 has been very stable up until this week when everything seems to just fall apart.

Have we hit some sort of a limit in the data volume that MonetDB can handle?

Again, I apologize for the long-winded posts. Any suggestions for more information or a way for us to troubleshoot more effectively will be very much appreciated. Thank you.

Best regards,

Henry


On Thu, Aug 2, 2012 at 5:59 PM, Henry Addington <hs.addington@gmail.com> wrote:
Hi Stefan:

We have rebuilt our database using the latest MonetDB release; and the issue described with queries 1 through 8 has been resolved. We are not positive; but we think there was some sort of data corruption in the large timeseries data which got carried over to the aggregate table. We some just a handful of records in the timeseries table that would randomly return garbled characters or empty string. The reason we think the issue is not completely resolved in the latest release is because when we rebuilt our database in Aug2011-SP2 the problem also went away.

Here is the remaining issue. We ran Q9 and Q10 in the rebuilt Aug2011-SP2 release and the latest release. They show identical results as they are based on same data. However, the 2011-03 OFF aggregates (avg, min, max) are incorrect though the count seems to be ok. We have values in the 2011-03 OFF bucket and should have aggregates.

Q9: select yyyy_mm, tou, avg(lmp_prc) from monthly_tou_avg_price group by yyyy_mm, tou order by yyyy_mm, tou

From rebuilt Aug2011-SP2
yyyy_mm, tou,   avg(lmp_prc),            count(lmp_prc), min(lmp_prc), max(lmp_prc)
......
......
2011-03 OFF 16.72153279686157 15167 0.00000 34.88962
2011-03 ON (null) 15166         (null)   (null)
2011-04 OFF 15.427080074671187 15133 -24.89539 30.68835
2011-04 ON 32.05149548777285 15130 -34.29957 48.99472
.....
......

From July2012 Feature Release
yyyy_mm, tou,   avg(lmp_prc),            count(lmp_prc), min(lmp_prc), max(lmp_prc)
......
......
2011-03 OFF 16.721532797520897 1516700000 0.00000 34.88962
2011-03 ON (null) 1516600000 (null) (null)
2011-04 OFF 15.427080077975218 1513300000 -24.89539 30.68835
2011-04 ON 32.051495489755666 1513000000 -34.29957 48.99472
.......
.......

Q10: select yyyy_mm, tou, avg(lmp_prc) from monthly_tou_avg_price where yyyy_mm = '2011-03' group by yyyy_mm, tou order by yyyy_mm, tou

From rebuilt Aug2011-SP2
2011-03 OFF 16.721532796861684 15167 0.00000 34.88962
2011-03 ON 30.276486572596724 15166 -2.62440 54.96787

From July2012 Feature Release
yyyy_mm, tou,   avg(lmp_prc),            count(lmp_prc), min(lmp_prc), max(lmp_prc)
yyyy_mm tou L1 scale_up_L2 L3 L4
2011-03 OFF 16.72153279752101 1516700000 0.00000 34.88962
2011-03 ON 30.276486575893575 1516600000 -2.62440 54.96787

There are 2 things that I see are problems. First, the count() in the July2012 release appear to  10,000 times the actual count in both Q9 and Q10. Also, Q10 with the where clause shows that we do in fact have values that should show up in the aggregates in Q9.

Do you have any suggestions as to how I can provide you with more information to troubleshoot the issue? I know it will be difficult to duplicate what I am seeing. Thanks for all your hard work!

Best regards,

Henry

On Tue, Jul 31, 2012 at 1:11 AM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Henry,

is there any chance you could upgrade from Aug2011-SP2 to a more recent release of MonetDB, preferably the latest Jul2012 ?

(Or at least test the problematic cases with this latest release?)

Stefan

----- Original Message -----
> We are seeing a very strange behavior from MonetDB from simple
> queries on a
> single table.
>
> In monthly_tou_avg_price table, we have monthly average aggregate
> data that
> were inserted from a very large table (> 2 billion) with hourly data.
> We
> have created the aggregate table to speed up certain queries.
> Strangely, we
> are getting incorrect results from very simple queries on the
> aggregate
> table.
>
> Q1 below returns 9,890 records with yyyy_mm column having empty value
> for
> the 2012-07 records. All of the 2012-07 records are missing the
> yyyy_mm
> value. We ran the SQL statement that inserted those records and it
> does not
> show any empty yyyy_mm values. If we check for empty string or null
> value
> in yyyy_mm column (Q2 and Q3) against the aggregate table, we do not
> get
> any hits. If we query for yyyy_mm= '2012-07' as in Q4, we get exactly
> 9890
> records showing yyyy_mm populated with value 2012-07. How could this
> be
> possible, Q1 showing empty yyyy_mm values for the same 9,890 records
> and Q4
> showing 2012-07 yyyy_mm values?
>
> Q1: select * from monthly_tou_avg_price where header_id = 6
> Q2: select * from monthly_tou_avg_price where header_id = 6 and
> yyyy_mm = ''
> Q3: select * from monthly_tou_avg_price where header_id = 6 and
> yyyy_mm is
> null
> Q4: select * from monthly_tou_avg_price where header_id = 6 and
> yyyy_mm =
> '2012-07'
>
> Q5 below returns empty value for yyyy_mm even though yyyy_mm column
> does
> not have any empty or null values. Q6 does not return any empty or
> null
> yyyy_mm values and contradicts Q5 result by returning 2012-07.
> Furthermore,
> Q7 contradicts Q5 with 9,890 records with 2012-07 value in yyyy_mm
> column.
>
> Q5: select distinct yyyy_mm from monthly_tou_avg_price where
> header_id = 6
> Q6: select distinct yyyy_mm from monthly_tou_avg_price
> Q7: select * from monthly_tou_avg_price where header_id = 6 and years
> =
> 2012 and months = 7
>
>
>
> yyyy_mm L15
> 9890
> 2009-04 7574
> 2009-05 7574
> 2009-06 7750
> 2009-07 7786
> 2009-08 7788
> 2009-09 7828
> 2009-10 7814
> 2009-11 7844
> 2009-12 8322
> 2010-01 8317
> 2010-02 8363
> 2010-03 8361
> 2010-04 9214
> 2010-05 9124
> 2010-06 9156
> 2010-07 9165
> 2010-08 9195
> 2010-09 9197
> 2010-10 9198
> 2010-11 9234
> 2010-12 9232
> 2011-01 9246
> 2011-02 9238
> 2011-03 9696
> 2011-04 9672
> 2011-05 9670
> 2011-06 9668
> 2011-07 9656
> 2011-08 9698
> 2011-09 9682
> 2011-10 9724
> 2011-11 9706
> 2011-12 9768
> 2012-01 9766
> 2012-02 9764
> 2012-03 9832
> 2012-04 9824
> 2012-05 9826
> 2012-06 9902
>
>
>
>
>
>
> We also ran an UPDATE statement to set yyyy_mm to 2012-07 for rows
> with
> header_id = 6 and years = 2012 and months = 7. And, 9,890 records
> were
> successfully updated. However, Q1 through Q8 returned same results as
> above
> with no improvement.
>
> We have not seen this kind of issues in other tables. This problem
> seems to
> be limited to a table that is populated with an INSERT INTO
> statement. We
> checked the result returned by the select portion of Q8 and yyyy_mm
> is
> populated with 2012-07. But, the aggregate table seems to end up in a
> weird
> state where it returns incorrect results.
>
> Q8: insert into "monthly_tou_avg_price"
>       select a.header_id, extract(year from a.opr_date) as y,
>       extract(month
> from a.opr_date) as m,
>                b.yyyymm as yyyy_mm, b.svalue as TOU, a.resource_name,
>                round(avg(a.lmp_cong_prc),5),
>                round(avg(a.lmp_ene_prc),5),
> round(avg(a.lmp_loss_prc),5), round(avg(a.lmp_prc),5),
> round(avg(a.ns_clr_prc),5),
>                round(avg(a.rd_clr_prc),5),
>                round(avg(a.ru_clr_prc),5),
> round(avg(a.sp_clr_prc),5), round(avg(a.shadow_prc),5),
>                round(avg(a.fuel_prc),5), count(lmp_cong_prc) as
> actual_lmpcong_tou_count
>        from oasislive.price a
>         join oasislive.time b
>          on a.yyyymmddhh = b.yyyymmddhh
>     where a.opr_date between '2012-07-01' and '2012-07-31'
>  group by a.header_id, y, m, yyyy_mm, TOU, a.resource_name
>  order by y, m, a.header_id, TOU, a.resource_name;
>
> We can see this INSERT INTO related issue with other queries. Q9
> shows that
> we do not have any values for 2011-03 ON period which is incorrect.
> In
> fact, if we specify '2011-03' as in Q10, it shows that we do in fact
> have
> values for the 2011-03 ON period!
>
> Q9: select yyyy_mm, tou, avg(lmp_prc) from monthly_tou_avg_price
> group by
> yyyy_mm, tou order by yyyy_mm, tou
>
> yyyy_mm tou L6
> .
> .
>
> 2011-03 OFF 16.721532797520894
> 2011-03 ON (null)
> 2011-04 OFF 15.427080074671187
> .
> .
>
> Q10: select yyyy_mm, tou, avg(lmp_prc) from monthly_tou_avg_price
> where
> yyyy_mm = '2011-03' group by yyyy_mm, tou order by yyyy_mm, tou
>
> yyyy_mm tou L11
> 2011-03 OFF 16.72153279752101
> 2011-03 ON 30.276486575893575
>
>
> I apologize for the long-winded email. I just wasn't sure how to
> describe
> what I was seeing.... Please let me know if there is any additional
> information that I can provide you with. By the way, we are currently
> using
> August 2011 -SP2 release, but saw this issue in older releases as
> well.
>
> Best regards,
>
> Henry
>
>
>
>
> We are currently running MonetDB August 2011 SP2 release on CentOS
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond.
> Discussions
> will include endpoint security, mobile security and the latest in
> malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> MonetDB-users mailing list
> MonetDB-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-users
>

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users