Hey Imad,

One of the nice things about UTF-8 is that normal ASCII characters are valid UTF-8. Hence “normal strings” in C that contain only ASCII, such as the digits sprintf produces for an int, are already valid UTF-8. Try simply returning the output from sprintf, like this:

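str
UDFyearbracket(str *ret, const date *v)
{
    if (*v == date_nil) {
        *ret = GDKstrdup(str_nil);
    } else {
        int year;
        char *buf;

        fromdate(*v, NULL, NULL, &year);
        /* 15 bytes is enough for a 32-bit int in decimal, sign and NUL included */
        buf = (char *) GDKmalloc(15);
        if (buf == NULL)
            throw(MAL, "UDF.yearbracket", "memory allocation failure");
        sprintf(buf, "%d", year);
        *ret = buf;
    }
    return MAL_SUCCEED;
}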
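
If you want to convince yourself, here is a small standalone sketch (plain C, no MonetDB headers) that applies the same bit test that appears in the strPut assertion from your log, (v[i] & 0x80) == 0, to the output of snprintf. The helper names is_ascii and format_year are made up for illustration and are not part of MonetDB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte of s is 7-bit ASCII.
 * Such a string is also valid UTF-8, byte for byte. */
static int is_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0x80) != 0)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: format a year in decimal into a caller-supplied
 * buffer. Returns 1 on success, 0 if the buffer is too small. */
static int format_year(char *buf, size_t len, int year)
{
    int n = snprintf(buf, len, "%d", year);

    return n >= 0 && (size_t) n < len;
}
```

For example, format_year(buf, sizeof(buf), 2016) with a 15-byte buf fills it with "2016", and is_ascii(buf) then returns 1, so the result can be handed back as a UTF-8 string without any conversion.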

Regards,

Mark

On 29 Dec 2016, at 14:35, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:

Hi Sjoerd,

I tried to use iconv with no luck: I always get an empty string. I assumed that the output I get from sprintf is encoded in "ISO-8859-1".
Can you please take a look at the following implementation:

str
UDFyearbracket(str *ret, const date *v)
{
    if (*v == date_nil) {
        *ret = GDKstrdup(str_nil);
    } else {
        iconv_t cv = iconv_open("UTF-8", "ISO-8859-1");
        int factor = 4;
        size_t fromlen, tolen;
        int year;
        char *buf;
        char *retChar = (char *)*ret;
        fromdate(*v, NULL, NULL, &year);
        buf = (char *) GDKmalloc(15);
        sprintf(buf, "%d", year);
        fromlen = strlen(buf);
        tolen = factor * fromlen + 1;
        retChar = (char *) GDKmalloc(tolen);
        iconv(cv, &buf, &fromlen, &retChar, &tolen);
        iconv_close(cv);
    }
    return MAL_SUCCEED;
}

Thanks



On Thu, Dec 29, 2016 at 1:08 PM, Sjoerd Mullender <sjoerd@monetdb.org> wrote:
Since the MonetDB server is UTF-8 *only*, you should *never* have
non-UTF-8 strings inside the server.  If you have strings in some other
encoding, they should be converted to UTF-8 by whatever client program
you're using.  mclient has options to do this (-e option).
If you want to do conversions yourself, take a look at the iconv related
code in common/stream/stream.c.  Also, the console_read and
console_write functions in that file can give you inspiration.  They
convert Windows wide characters (16-bit encodings of Unicode code
points) to and from UTF-8.  This would be close to converting ints to UTF-8.
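
For what it's worth, a minimal sketch of such a conversion with iconv might look like the following. It is not taken from stream.c: the helper name latin1_to_utf8 is made up for illustration, the source encoding is assumed to be ISO-8859-1, and it uses plain malloc rather than GDKmalloc so that it stays self-contained. Note that iconv advances the output pointer it is given, so the start of the buffer has to be kept in a separate variable.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of MonetDB): convert a NUL-terminated
 * ISO-8859-1 string to a freshly malloc'ed UTF-8 string.
 * Returns NULL on error; the caller frees the result. */
static char *latin1_to_utf8(const char *in)
{
    iconv_t cd;
    size_t inleft, outleft, outsize;
    char *inptr, *out, *outptr;

    assert(in != NULL);
    cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t) -1)
        return NULL;

    inleft = strlen(in);
    outsize = 2 * inleft + 1;   /* a Latin-1 byte needs at most 2 UTF-8 bytes */
    out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    inptr = (char *) in;        /* iconv wants char **; the input is only read */
    outptr = out;               /* iconv advances outptr; out keeps the start */
    outleft = outsize - 1;      /* reserve one byte for the terminating NUL */
    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    *outptr = '\0';
    iconv_close(cd);
    return out;
}
```

With this, latin1_to_utf8("2016") gives back the same four bytes, since digits are ASCII; only bytes above 0x7F, such as 0xE9, grow to two bytes (0xC3 0xA9).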

On 12/29/2016 01:10 AM, imad hajj chahine wrote:
> Hi again Sjoerd,
>
> After digging in the code I found GDKstrFromStr. Does this function
> handle conversion from a normal string to a UTF-8 string?
> Is this the correct way to call it:
>
> str
> UDFyearbracket(str *ret, const date *v)
> {
>     if (*v == date_nil) {
>         *ret = GDKstrdup(str_nil);
>     } else {
>         int year;
>         fromdate(*v, NULL, NULL, &year);
>         *ret = (str) GDKmalloc(15);
>         sprintf(*ret, "%d", year);
>         GDKstrFromStr((unsigned char *)*ret, (unsigned char *)*ret, 15);
>     }
>     return MAL_SUCCEED;
> }
>
> Thank you.
>
> On Wed, Dec 28, 2016 at 11:40 PM, imad hajj chahine
> <imad.hajj.chahine@gmail.com> wrote:
>
>     Thank you Sjoerd,
>
>     Any idea how to convert an integer to a UTF-8 string? Does sprintf
>     come with a variant that can handle UTF-8?
>
>     Thank you.
>
>     On Wed, Dec 28, 2016 at 11:08 PM, Sjoerd Mullender
>     <sjoerd@monetdb.org> wrote:
>
>         See https://dev.monetdb.org/hg/MonetDB-extend/ for a tutorial on
>         how to create a UDF in C.  You can use the URL to clone from.
>
>         On 12/28/2016 09:28 PM, Alberto Ferrari wrote:
>         > Imad, I wish you success with this. Please report back if you get it
>         > working. Could those new functions then be incorporated into a future
>         > version of MonetDB, or perhaps be easily compiled against the current one?
>         > That way users could suggest new useful functions in the future (shame
>         > about SQL UDF performance).
>         >
>         > Regards!
>         >
>         > 2016-12-28 14:48 GMT-03:00 imad hajj chahine
>         > <imad.hajj.chahine@gmail.com>:
>         >
>         >     Hi,
>         >
>         >     After reviewing the other alternatives, SQL and Python UDFs, I was
>         >     stuck either on performance (SQL UDFs) or on usability (Python
>         >     UDFs: they cannot be used with aggregation, and their performance
>         >     with dates is not great).
>         >
>         >     So I decided to go the hard way with C functions. As a bonus, this
>         >     lets me change the functionality without worrying about
>         >     dependencies, which was not the case with the other languages.
>         >
>         >     The purpose is to create a set of formatting functions for Year,
>         >     Quarter, Month, Week and Day brackets, and of course I need to
>         >     create the bulk version of each function for performance.
>         >
>         >     Starting from MTIMEdate_extract_year_bulk, I now have the simple
>         >     function working and can call it successfully from mclient:
>         >
>         >     str
>         >     UDFyearbracket(str *ret, const date *v)
>         >     {
>         >         if (*v == date_nil) {
>         >             *ret = GDKstrdup(str_nil);
>         >         } else {
>         >             int year;
>         >             fromdate(*v, NULL, NULL, &year);
>         >             *ret = (str) GDKmalloc(15);
>         >             sprintf(*ret, "%d", year);
>         >         }
>         >         return MAL_SUCCEED;
>         >     }
>         >
>         >
>         >     For the bulk version I get an error in the log: gdk_atoms.c:1345:
>         >     strPut: Assertion `(v[i] & 0x80) == 0' failed.
>         >     str
>         >     UDFBATyearbracket(bat *ret, const bat *bid)
>         >     {
>         >         BAT *b, *bn;
>         >         BUN i, n;
>         >         str *y;
>         >         const date *t;
>         >
>         >         if ((b = BATdescriptor(*bid)) == NULL)
>         >             throw(MAL, "UDF.BATyearbracket", "Cannot access descriptor");
>         >         n = BATcount(b);
>         >
>         >         bn = COLnew(b->hseqbase, TYPE_str, BATcount(b), TRANSIENT);
>         >         if (bn == NULL) {
>         >             BBPunfix(b->batCacheid);
>         >             throw(MAL, "UDF.BATyearbracket", "memory allocation failure");
>         >         }
>         >         bn->tnonil = 1;
>         >         bn->tnil = 0;
>         >
>         >         t = (const date *) Tloc(b, 0);
>         >         y = (str *) Tloc(bn, 0);
>         >         for (i = 0; i < n; i++) {
>         >             if (*t == date_nil) {
>         >                 *y = GDKstrdup(str_nil);
>         >             } else
>         >                 UDFyearbracket(y, t);
>         >             if (strcmp(*y, str_nil) == 0) {
>         >                 bn->tnonil = 0;
>         >                 bn->tnil = 1;
>         >             }
>         >             y++;
>         >             t++;
>         >         }
>         >
>         >         BATsetcount(bn, (BUN) (y - (str *) Tloc(bn, 0)));
>         >
>         >         bn->tsorted = BATcount(bn) < 2;
>         >         bn->trevsorted = BATcount(bn) < 2;
>         >
>         >         BBPkeepref(*ret = bn->batCacheid);
>         >         BBPunfix(b->batCacheid);
>         >         return MAL_SUCCEED;
>         >     }
>         >
>         >     PS: I am not a C expert, but I can find my way around basic
>         >     operations and pointers.
>         >
>         >     Any help or suggestions are appreciated.
>         >
>         >     Thank you.
>         >
>         >     _______________________________________________
>         >     users-list mailing list
>         >     users-list@monetdb.org <mailto:users-list@monetdb.org>
>         <mailto:users-list@monetdb.org <mailto:users-list@monetdb.org>>
>         >     https://www.monetdb.org/mailman/listinfo/users-list
>         <https://www.monetdb.org/mailman/listinfo/users-list>
>         >     <https://www.monetdb.org/mailman/listinfo/users-list
>         <https://www.monetdb.org/mailman/listinfo/users-list>>
>         >
>         >
>         >
>         >
>         >
>
>         --
>         Sjoerd Mullender
>
>
>
>
>
>
>
>

--
Sjoerd Mullender

