Hi Panos,

Great assistance here!

One thing:
I can understand why Ariel tried to use OFFSET for that purpose, maybe nice to add to the documentation that skipped lines also should follow the format of the remainder!

Greetings,

Arjen

On Tue, 31 Mar 2020 at 10:51, Panagiotis Koutsourakis <panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Ariel,

The COPY INTO command recognizes a series of records of the form:

<data><field delimiter><data><field delimiter>...<data><record delimiter>

For example the command

  COPY INTO tbl FROM '/path/to/file' USING DELIMITERS '\t','\n','"';

will recognize lines of the form:

1       "MIA"   7       "MIA|115331571990001804"        2000-08-01

but nothing else. Specifically the first 222 lines in the monet_list.txt file be rejected by COPY INTO. You need to delete those lines before you try to load data with copy into.

Take a look at the documentation of COPY INTO in the MonetDB web page [1].

Hope this helps,
Panos.

[1] https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/CSV_bulk_loads

On 3/30/20 7:42 PM, Ariel Abadi wrote:
> Hi Panos!
>
> Enclosed you can find the file (truncated) that I used and the statement I
> run on mclient.
>
> COPY OFFSET 223 INTO "sb_traf"."nrm_20000818" FROM '/tmp/monet_list.txt'
> USING DELIMITERS '\t','\n','"';
>
>
> Thks
> Ariel
>
>
> On Mon, Mar 30, 2020 at 8:07 AM Panagiotis Koutsourakis <
> panagiotis.koutsourakis@monetdbsolutions.com> wrote:
>
>> Hi Ariel,
>>
>> The first issue is not totally surprising to me. My best guess is that
>> reading from a file is using buffered input while reading from stdin is
>> not. See for example [1] and [2]. I am not sure there is a lot we can do at
>> the MonetDB side about this.
>>
>> About the second issue could you please post the exact COPY INTO command
>> and the first few lines of the file you are trying to load so that we can
>> figure out the exact issue?
>>
>> Best regards,
>> Panos.
>>
>> [1]
>> https://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-much-slower-in-c-than-python
>> [2]
>> https://discourse.julialang.org/t/stdin-stdout-10x-slower-than-file-open/17233
>>
>> On 3/26/20 11:51 AM, Ariel Abadi wrote:
>>> Hi!
>>>
>>> After a big battle against the DB, I found 2 issues.
>>>
>>> 1) Running the COPY RECORDS FROM *STDIN *and from *FILE*. Is totally
>>> different in terms of performance. While the first one last 24 minutes,
>> the
>>> second one last only *90 secs.*
>>> 2) As I had generated all those files thru msqldump (which it states the
>>> "CREATE TABLE" instruction and then the COPY RECORDS)... I try ro run the
>>> COPY *XXXX OFFSET *RECORDS (in order to avoid the create table
>>> statement)... and the query, because of the word "STDIN" in the file. It
>>> was not OFFSETTING all the rows I was asking for.
>>>
>>> I hope I've been clear... if not please let me know!
>>>
>>> Ariel
>>>
>>> On Tue, Mar 24, 2020 at 11:33 PM Martin Kersten <martin.kersten@cwi.nl>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I am not sure what issue you are now dealing with, but since you are a
>>>> company
>>>> that seems to use (or considering) MonetDB in your commercial offerings,
>>>> I gladly refer you to MonetDBSolutions(*), where we can give you the
>>>> commercial
>>>> support needed to diagnose your issues.
>>>>
>>>> regards Martin
>>>>
>>>> (*) https://www.monetdbsolutions.com/
>>>>
>>>> On 24/03/2020 23:55, Ariel Abadi wrote:
>>>>> Thanks Martin for your prompt response
>>>>> But I did and msqldump from other database, and now the upload is being
>>>> done thru the BULK LOAD.
>>>>>
>>>>> /COPY 14173375 RECORDS INTO "sb_traf"."nrm_20180818" FROM stdin USING
>>>> DELIMITERS '\t','\n','"';/
>>>>>
>>>>>
>>>>> Any other idea ? :'(
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 24, 2020 at 7:37 PM Martin Kersten <martin.kersten@cwi.nl
>>>> <mailto:martin.kersten@cwi.nl>> wrote:
>>>>>
>>>>>     Hi,
>>>>>
>>>>>     given the numbers I suspect you are loading them as SQL insert
>>>> statements under autocommit mode,
>>>>>     Please read the documentation on Bulk loading.
>>>>>
>>>>>     regards,Martin
>>>>>     Sent from my iPad
>>>>>
>>>>>>     On 24 Mar 2020, at 23:12, Ariel Abadi <aabadi@starbi.com <mailto:
>>>> aabadi@starbi.com>> wrote:
>>>>>>
>>>>>>     
>>>>>>     Hi!
>>>>>>     Maybe somebody can help me, please!
>>>>>>
>>>>>>     I'm trying to upload into Monet a file with 14MM records. Is
>> taking
>>>> more than *24 minutes.*
>>>>>>     It is a fresh new database.
>>>>>>
>>>>>>     Enclosed you will find the server configuration.
>>>>>>
>>>>>>     Monetdb Version Nov2019-SP3
>>>>>>
>>>>>>
>>>>>>     *[root@mn tmp]$ lscpu*
>>>>>>     Architecture:          x86_64
>>>>>>     CPU op-mode(s):        32-bit, 64-bit
>>>>>>     Byte Order:            Little Endian
>>>>>>     CPU(s):                40
>>>>>>     On-line CPU(s) list:   0-39
>>>>>>     Thread(s) per core:    2
>>>>>>     Core(s) per socket:    10
>>>>>>     Socket(s):             2
>>>>>>     NUMA node(s):          2
>>>>>>     Vendor ID:             GenuineIntel
>>>>>>     CPU family:            6
>>>>>>     Model:                 85
>>>>>>     Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
>>>>>>     Stepping:              4
>>>>>>     CPU MHz:               800.479
>>>>>>     BogoMIPS:              4400.00
>>>>>>     Virtualization:        VT-x
>>>>>>     L1d cache:             32K
>>>>>>     L1i cache:             32K
>>>>>>     L2 cache:              1024K
>>>>>>     L3 cache:              14080K
>>>>>>     NUMA node0 CPU(s):
>>>> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
>>>>>>     NUMA node1 CPU(s):
>>>> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>>>>>>     Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
>>>> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
>> pbe
>>>> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
>>>> rep_good nopl xtopology nonstop_tsc
>>>>>>     cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
>>>> tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
>>>> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
>>>> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
>>>>>>     invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow
>>>> vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep
>>>> bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap
>>>> clflushopt clwb intel_pt avx512cd avx512bw
>>>>>>     avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
>>>> cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear
>>>> flush_l1d
>>>>>>
>>>>>>
>>>>>>     *[root@mn tmp]$ cat /proc/meminfo*
>>>>>>
>>>>>>     MemTotal:       83886080 kB
>>>>>>     MemFree:        10535812 kB
>>>>>>     MemAvailable:   80159744 kB
>>>>>>     Buffers:               0 kB
>>>>>>     Cached:         69623932 kB
>>>>>>     SwapCached:            0 kB
>>>>>>     Active:          6625436 kB
>>>>>>     Inactive:       63737872 kB
>>>>>>     Active(anon):     324456 kB
>>>>>>     Inactive(anon):   417184 kB
>>>>>>     Active(file):    6300980 kB
>>>>>>     Inactive(file): 63320688 kB
>>>>>>     Unevictable:           0 kB
>>>>>>     Mlocked:            5320 kB
>>>>>>     SwapTotal:       8388604 kB
>>>>>>     SwapFree:        8381988 kB
>>>>>>     Dirty:               220 kB
>>>>>>     Writeback:             0 kB
>>>>>>     AnonPages:      13626052 kB
>>>>>>     Mapped:           186368 kB
>>>>>>     Shmem:              1800 kB
>>>>>>     KReclaimable:    4134960 kB
>>>>>>     Slab:               0 kB
>>>>>>     SReclaimable:          0 kB
>>>>>>     SUnreclaim:            0 kB
>>>>>>     KernelStack:       16720 kB
>>>>>>     PageTables:        56500 kB
>>>>>>     NFS_Unstable:          0 kB
>>>>>>     Bounce:                0 kB
>>>>>>     WritebackTmp:          0 kB
>>>>>>     CommitLimit:    74123472 kB
>>>>>>     Committed_AS:   18337812 kB
>>>>>>     VmallocTotal:   34359738367 kB
>>>>>>     VmallocUsed:      415836 kB
>>>>>>     VmallocChunk:          0 kB
>>>>>>     Percpu:            39552 kB
>>>>>>     HardwareCorrupted:     0 kB
>>>>>>     AnonHugePages:         0 kB
>>>>>>     ShmemHugePages:        0 kB
>>>>>>     ShmemPmdMapped:        0 kB
>>>>>>     CmaTotal:              0 kB
>>>>>>     CmaFree:               0 kB
>>>>>>     HugePages_Total:       0
>>>>>>     HugePages_Free:        0
>>>>>>     HugePages_Rsvd:        0
>>>>>>     HugePages_Surp:        0
>>>>>>     Hugepagesize:       2048 kB
>>>>>>     Hugetlb:               0 kB
>>>>>>     DirectMap4k:      816396 kB
>>>>>>     DirectMap2M:    82513920 kB
>>>>>>     DirectMap1G:    52428800 kB
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>     _______________________________________________
>>>>>>     users-list mailing list
>>>>>>     users-list@monetdb.org <mailto:users-list@monetdb.org>
>>>>>>     https://www.monetdb.org/mailman/listinfo/users-list
>>>>>     _______________________________________________
>>>>>     users-list mailing list
>>>>>     users-list@monetdb.org <mailto:users-list@monetdb.org>
>>>>>     https://www.monetdb.org/mailman/listinfo/users-list
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users-list mailing list
>>>>> users-list@monetdb.org
>>>>> https://www.monetdb.org/mailman/listinfo/users-list
>>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> users-list mailing list
>>> users-list@monetdb.org
>>> https://www.monetdb.org/mailman/listinfo/users-list
>>>
>>
>> _______________________________________________
>> users-list mailing list
>> users-list@monetdb.org
>> https://www.monetdb.org/mailman/listinfo/users-list
>>
>
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
>
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list


--
====================================================================
ICIS, office M1.00.05                             Radboud University
Mercator 1                                        Faculty of Science
Toernooiveld 212                                      arjen@cs.ru.nl
NL-6525 EC Nijmegen, The Netherlands              +31-(0)24-365 2354
===================== http://www.informagus.nl/ ====================