From monetdb@frederic.jolliton.com Thu Feb 18 13:06:57 2016
From: Frédéric Jolliton
To: users-list@monetdb.org
Subject: [Low level] Understanding duplicate entries in delta BAT (delete BAT)
Date: Thu, 18 Feb 2016 12:06:54 +0000
Message-ID: <874md6ql29.fsf@fjolliton.rd.securactive.lan>

Hi,

As suggested by MonetDB Solutions, we are asking about this topic here.

We are studying how MonetDB stores its database on disk. So far we have been able to understand pretty much all the files (BAT, journal, BBP.dir, heap, hash, imprints, and so on). We have custom tools to decode and dump them (experimental at this point).

There is one thing, however, that is unclear: we thought that the "delta BATs", which are the BATs assigned to an SQL table to list the OIDs of deleted rows (named "D__" internally), would contain only *unique* values (since you can never delete the same row twice).

But on several of the databases we are running, we found that some of these BATs contain duplicates. Lots of duplicates, actually. This is especially true of the D_sys__columns and D_sys__tables BATs (for the sys._columns and sys._tables system tables, respectively).

Some databases do not have this "issue", even though they all have the same schema and process the same kind of data as the other ones.

Can someone explain whether duplicates are expected in these BATs?
Here is an excerpt of D_sys__tables:

# hexdump -C 02/210.tail
00000000  2e 00 00 00 00 00 00 00  2f 00 00 00 00 00 00 00  |......../.......|
00000010  30 00 00 00 00 00 00 00  31 00 00 00 00 00 00 00  |0.......1.......|
00000020  32 00 00 00 00 00 00 00  33 00 00 00 00 00 00 00  |2.......3.......|
00000030  34 00 00 00 00 00 00 00  35 00 00 00 00 00 00 00  |4.......5.......|
00000040  36 00 00 00 00 00 00 00  37 00 00 00 00 00 00 00  |6.......7.......|
00000050  42 00 00 00 00 00 00 00  43 00 00 00 00 00 00 00  |B.......C.......|
00000060  44 00 00 00 00 00 00 00  45 00 00 00 00 00 00 00  |D.......E.......|
00000070  46 00 00 00 00 00 00 00  47 00 00 00 00 00 00 00  |F.......G.......|
00000080  48 00 00 00 00 00 00 00  49 00 00 00 00 00 00 00  |H.......I.......|
00000090  4a 00 00 00 00 00 00 00  4b 00 00 00 00 00 00 00  |J.......K.......|
000000a0  4c 00 00 00 00 00 00 00  4d 00 00 00 00 00 00 00  |L.......M.......|
000000b0  4e 00 00 00 00 00 00 00  4f 00 00 00 00 00 00 00  |N.......O.......|
000000c0  50 00 00 00 00 00 00 00  50 00 00 00 00 00 00 00  |P.......P.......|
000000d0  51 00 00 00 00 00 00 00  50 00 00 00 00 00 00 00  |Q.......P.......|
000000e0  51 00 00 00 00 00 00 00  52 00 00 00 00 00 00 00  |Q.......R.......|
000000f0  50 00 00 00 00 00 00 00  51 00 00 00 00 00 00 00  |P.......Q.......|
00000100  52 00 00 00 00 00 00 00  53 00 00 00 00 00 00 00  |R.......S.......|
00000110  50 00 00 00 00 00 00 00  51 00 00 00 00 00 00 00  |P.......Q.......|
00000120  52 00 00 00 00 00 00 00  53 00 00 00 00 00 00 00  |R.......S.......|
...

You can clearly see duplicate OIDs (the first one being 0x50 at offset 0xc8).

Its entry in BBP.dir (split into sections):

136 32 tmp_210 tmpr_210 02/210 610523782 2 171807 0 0 171807 172032 0 0 0 0
void 0 1 1793 0 0 0 0 0 1000651 0 0 0
oid 8 0 1024 24 25 27 1 46 773603235 1374456 1376256 1

We are using: MonetDB 5 server v11.21.14 (64-bit, 64-bit oids, 128-bit integers).
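For readers who want to reproduce this check, here is a minimal Python sketch of such a duplicate scan. It rests on an assumption drawn only from the hexdump above, not from MonetDB documentation: the .tail file of a D_ BAT is a bare array of 64-bit little-endian OIDs with no header. The BBP.dir entry is consistent with that reading (171807 × 8 = 1374456 and 172032 × 8 = 1376256, which looks like a live count and an allocated capacity), so the optional COUNT argument lets you ignore the unused end of the file. The script name scan_del_bat.py is hypothetical.

```python
"""Sketch: scan a delete-BAT tail file for repeated OIDs."""
import struct
import sys
from typing import List, Optional, Tuple

# Assumption (inferred from the hexdump only): the tail is a bare array
# of 64-bit little-endian OIDs with no header.
OID_SIZE = 8


def read_oids(data: bytes, count: Optional[int] = None) -> List[int]:
    """Decode raw tail bytes into OIDs, keeping only the first `count` if given."""
    usable = len(data) - len(data) % OID_SIZE  # drop a trailing partial word
    oids = [value for (value,) in struct.iter_unpack("<Q", data[:usable])]
    return oids if count is None else oids[:count]


def first_duplicate(oids: List[int]) -> Optional[Tuple[int, int]]:
    """Return (byte offset, OID) of the first OID seen a second time, or None."""
    seen = set()
    for index, oid in enumerate(oids):
        if oid in seen:
            return index * OID_SIZE, oid
        seen.add(oid)
    return None


def redundant_entries(oids: List[int]) -> int:
    """Count entries beyond the first occurrence of each OID."""
    return len(oids) - len(set(oids))


def main(argv: List[str]) -> None:
    if len(argv) < 2:
        print("usage: scan_del_bat.py TAILFILE [COUNT]")
        return
    with open(argv[1], "rb") as f:
        data = f.read()
    count = int(argv[2]) if len(argv) > 2 else None
    oids = read_oids(data, count)
    print(f"{len(oids)} entries, {len(set(oids))} distinct, "
          f"{redundant_entries(oids)} redundant")
    dup = first_duplicate(oids)
    if dup is not None:
        print(f"first duplicate: OID 0x{dup[1]:x} at offset 0x{dup[0]:x}")


if __name__ == "__main__":
    main(sys.argv)
```

Since the excerpt starts at offset 0, running it on 02/210.tail should report the first duplicate as OID 0x50 at offset 0xc8, matching the hexdump. A set is used because the question is only whether an OID has appeared before, not how often.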
-- 
Frédéric Jolliton
Sécuractive

From guillaume.savary@securactive.net Wed Feb 24 09:41:34 2016
From: Guillaume Savary
To: users-list@monetdb.org
Subject: Re: [Low level] Understanding duplicate entries in delta BAT (delete BAT)
Date: Wed, 24 Feb 2016 09:35:36 +0100
Message-ID: <56CD6B58.2040406@securactive.net>
In-Reply-To: <874md6ql29.fsf@fjolliton.rd.securactive.lan>

On 18/02/2016 13:06, Frédéric Jolliton wrote:
> Can someone explain whether duplicates are expected in these BATs?

Hi all,

Please, does anybody know whether having duplicate entries in a DEL_bat is an issue or not? At the very least, it seems unnecessary.
If it is, we propose to investigate further.

Thanks.

-- 
Guillaume Savary
Securactive R&D

From Niels.Nes@cwi.nl Wed Feb 24 10:00:15 2016
From: Niels Nes
To: users-list@monetdb.org
Subject: Re: [Low level] Understanding duplicate entries in delta BAT (delete BAT)
Date: Wed, 24 Feb 2016 10:00:11 +0100
Message-ID: <20160224090011.GA19697@niels.cwi.nl>
In-Reply-To: <56CD6B58.2040406@securactive.net>

On Wed, Feb 24, 2016 at 09:35:36AM +0100, Guillaume Savary wrote:
> On 18/02/2016 13:06, Frédéric Jolliton wrote:
> > Can someone explain whether duplicates are expected in these BATs?
>
> Hi all,
>
> Please, does anybody know whether having duplicate entries in a DEL_bat is an
> issue or not? At the very least, it seems unnecessary.
> If it is, we propose to investigate further.
>

The D_bats hold row ids.
As we currently don't recycle rows, it seems strange to have the same (already deleted) row in there again.

Niels

> Thanks.
>
> -- 
> Guillaume Savary
> Securactive R&D
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list

-- 
Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl
url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl

From pcoustillas@1g6.biz Mon Aug 8 23:03:56 2016
From: Pierre-Adrien Coustillas
To: users-list@monetdb.org
Subject: Re: [Low level] Understanding duplicate entries in delta BAT (delete BAT)
Date: Mon, 08 Aug 2016 22:57:49 +0200
Message-ID: <1640602693.450884.1470689869521.JavaMail.zimbra@1g6.biz>
In-Reply-To: <20160224090011.GA19697@niels.cwi.nl>

Hi,

For the past 4 months, I had a test script that produced duplicate tables. Since updating to Jun2016-SP1, I have had no more problems.

Maybe this fix: https://www.monetdb.org/bugzilla/show_bug.cgi?id=4036

And you?

Pierre