Documentation > USER GUIDE > SQL Functions > String Functions

String Functions

Character string operators

These apply to MonetDB SQL character string types: character, character varying, varchar, clob and subtypes: json and url.

In MonetDB all character strings are UTF-8 encoded.

Operator	Description	Example	Result
\|\|	concatenate two strings. Returns null when one of the strings is null	'Monet' \|\| 'DB'	MonetDB
\|\|	concatenation of a string with a non-string input	'Value: ' \|\| 42	Value: 42

Character string functions

Function	Return type	Description	Example	Result
ascii(s)	int	Unicode code point of the first character of the string s	ascii(R'€')	8364
asciify(s)	same as input	Replace all non-ASCII characters in string s with ASCII equivalents	asciify(R'a € sign')	a EUR sign
char_length(s)	int	number of UTF-8 characters in UTF-8 string s	char_length(R'2€')	2
character_length(s)	int	number of UTF-8 characters in UTF-8 string s	character_length(R'2€')	2
charindex(substr, s)	int	start position of substr in string s or 0 when not found	charindex('gh', 'highigh')	3
charindex(substr, s, i)	int	start position of substr in string s starting at offset i or 0 when not found	charindex('gh', 'highigh', 4)	6
code(int)	char	Unicode character for the integer code point	code(223)	ß
concat(s1, s2)	same as input	concatenate strings s1 and s2. returns null when either s1 or s2 is null	concat('Front ', 242)	Front 242
contains(s1, s2)	boolean	whether string s1 constains string s2. returns null when either s1 or s2 is null	contains('abcd', 'bc')	true
contains(s1, s2, icase)	boolean	whether string s1 constains string s2 matching case insenstive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is null	contains('abcd', 'BC', true)	true
endswith(s1, s2)	boolean	whether string s1 ends with string s2. returns null when either s1 or s2 is null	endswith('abc', 'bc')	true
endswith(s1, s2, icase)	boolean	whether string s1 ends with string s2 matching case insenstive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is null	endswith('abc', 'BC', true)	true
greatest(s1, s2)	same as input	alphabetically ordered last string value of strings s1 and s2. (same as sql_max(s1, s2))	greatest('ab', 'ac')	ac
insert(s1, int pos, int nr, s2)	same as input arg 4	insert string s2 into string s1 after position pos replacing nr characters in string s1. when pos is negative it counts from the end of the string. nr must be >= 0.	insert('abcdef', 2, 1, 'XY')	abXYdef
lcase(s)	same as input	convert string s to lower case	lcase('MonetDB')	monetdb
least(s1, s2)	same as input	alphabetically ordered first string value of strings s1 and s2. (same as sql_min(s1, s2))	least('ab', 'ac')	ab
left(s, int n)	same as input arg 1	return first n characters in the string	left('MonetDB', 3)	Mon
length(s)	int	count number of UTF-8 characters in UTF-8 string s	length(R'2€')	2
locate(s1, s2)	int	locate string s1 in s2, when found return start position else 0	locate('DB', 'MonetDB')	6
locate(s1, s2, int pos)	int	locate string s1 in s2 starting from position pos, when found return start position else 0	locate('DB', 'DBMonetDB', 3)	8
lower(s)	same as input	convert string s to lower case	lower('MonetDB')	monetdb
lpad(s, int len)	same as input arg 1	fill up the string to length len by prepending spaces. If the string is already longer than length then it is truncated (on the right).	lpad('abc', 6)	abc
lpad(s1, int len, s2)	same as input arg 1	fill up the string s1 to length len by prepending the characters s2. If the string s1 is already longer than length then it is truncated (on the right).	lpad('abc', 6, 'DB')	DBDabc
ltrim(s)	varchar	remove leading whitespace characters from string s	ltrim(E' \t\fab \tdf ')	ab df
ltrim(s1, s2)	varchar	remove leading s2 strings from string s1	ltrim('aabcdaa', 'a')	bcdaa
sys.md5(s)	clob(32)	calculates the MD5 hash of string, returning the result as 32 character hex number	sys.md5('abc')	900150983cd24fb0 d6963f7d28e17f72
sys.ms_stuff(varchar(32) s1, int pos, int nr, varchar(32) s2)	varchar(32)	insert string s2 into string s1 at position pos replacing nr characters in string s1. If pos <= 0 or > length of s1 then an empty string is returned. nr may be negative.	sys.ms_stuff('abcdef', 2, 1, 'XY')	aXYcdef
octet_length(s)	int	number of bytes in UTF-8 string s	octet_length(R'2€')	4
patindex(pattern, s)	int	search for pattern in string s, when found return the position of the last matched character, when not found return 0	patindex('%bc%', 'abcd')	3
position(s1 IN s2)	int	search for string s1 in string s2, when found return the position in s2 of the first match, when not found return 0	position('DB' IN 'MonetDB')	6
repeat(s, nr)	same as input arg 1	repeat string s the specified nr of times	repeat('kiss,', 3)	kiss,kiss,kiss,
replace(s, srch, repl)	same as input arg 1	replace occurrences of string srch in string s by string repl	replace('abc', 'b', 'X')	aXc
right(s, int n)	same as input arg 1	return last n characters in the string	right('MonetDB', 4)	etDB
rpad(s, int)	same as input arg 1	fill up the string to length len by appending spaces. If the string is already longer than length then it is truncated (on the right).	rpad('abc', 6) \|\| 'End'	abc End
rpad(s1, int len, s2)	same as input arg 1	fill up the string s1 to length len by appending the characters s2. If the string s1 is already longer than length then it is truncated (on the right).	rpad('abc', 6, 'DB')	abcDBD
rtrim(s)	varchar	remove trailing whitespace characters from string s	rtrim(E'ab df \t\n')	ab df
rtrim(s1, s2)	varchar	remove trailing s2 strings from string s1	rtrim('aabcdaa', 'a')	aabcd
space(nr)	clob(nr)	generates a string with nr spaces	select 's' \|\| space(4) \|\| 'e'	s e
splitpart(s, sep, n)	same as input arg 1	extract substring from string s by spliting on separator string sep the field before nth occurrence. n must be greater than zero	splitpart('a\|a\|bc\|cd', '\|', 3)	bc
sql_max(s1, s2)	same as input	alphabetically ordered last string value of strings s1 and s2	sql_max('ab', 'ac')	ac
sql_min(s1, s2)	same as input	alphabetically ordered first string value of strings s1 and s2	sql_min('ab', 'ac')	ab
startswith(s1, s2)	boolean	whether string s1 starts with string s2. returns null when either s1 or s2 is null	startswith('bcd', 'bc')	true
startswith(s1, s2, icase)	boolean	whether string s1 starts with string s2 matching case insenstive when icase is true, case sensitive when icase is false. returns null when either s1 or s2 is null	startswith('bcd', 'BC', true)	true
substr(s, startpos)	same as input arg 1	extract substring starting from position startpos	substr('abCde', 3)	Cde
substr(s, startpos, count)	clob(count)	extract substring starting from position startpos for count characters	substr('abCde', 2, 3)	bCd
substring(s, startpos)	same as input arg 1	extract substring starting from position startpos	substring('abcde', 3)	cde
substring(s, startpos, count)	clob(count)	extract substring starting from position startpos for count characters	substring('abcde', 2, 2)	bc
trim(s)	varchar	remove leading and trailing whitespace characters from string s	trim(E' \t\fab df \t\n')	ab df
trim(s1, s2)	varchar	remove leading and trailing s2 strings from string s1	trim('aabcdaa', 'a')	bcd
"truncate"(s, n_chars)	same as input arg 1	truncate the string s to a maximum of n_chars characters. Note: the function name truncate need to be surrounded by double quotes.	"truncate"('Truncate', 4)	Trun
ucase(s)	same as input	convert string s to upper case	ucase('MonetDB')	MONETDB
upper(s)	same as input	convert string s to upper case	upper('MonetDB')	MONETDB

Text analysis functions

Function	Return type	Description	Example	Result
dameraulevenshtein(s1, s2, int insdel, int rep, int trans)	int	Compute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions/deletions, substitutions and transpositions) required to change one word into the other. Variable costs of insertions/deletions, substitutions and transpositions, with `insdel`, `rep` and `trans` respectively	dameraulevenshtein('MonetDB', 'DB3', 4, 2, 1)	8
dameraulevenshtein(s1, s2)	int	Compute the 'Damerau-Levenshtein distance' where insertion/deletion, replacement and transposition costs are all 1	dameraulevenshtein('MonetDB', 'DB3')	6
difference(s1, s2)	int	converts two strings s1 and s2 to their soundex codes and reports the number of different code positions. Since soundex codes have four characters, the result ranges from zero to four, with zero being an exact match and four being no match.	difference('MonetDB', 'DB3')	4
editdistance(s1, s2)	int	compute the 'distance' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest string	editdistance('MonetDB', 'DB3')	6
editdistance2(s1, s2)	int	compute the 'distance2' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest string	editdistance2('MonetDB', 'DB3')	6
jarowinkler(s1, s2)	double	compute the 'Jaro-Winkler similarity' between two strings returning a double between 0 (exact match) and 1 (no similarity)	jarowinkler('MonetDB', 'DB free')	0.428571429
levenshtein(s1, s2, int insdel, int rep)	int	Compute the 'Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions/deletions or substitutions) required to change one word into the other. Variable costs of insertions/deletions or substitutions, with `insdel` and `rep` respectively.	levenshtein('MonetDB', 'DB3', 4, 2)	9
levenshtein(s1, s2)	int	Compute the 'Levenshtein distance' where insertion/deletion and replacement costs are all 1	levenshtein('MonetDB', 'DB3')	6
levenshtein(s1, s2, int insdel, int rep, int trans)	int	compute the 'Damerau-Levenshtein distance' between two strings returning the minimum number of single-character edits (insertions, deletions, substitutions and transpositions) required to change one word into the other.	levenshtein('MonetDB', 'DB3', 4, 2, 1)	8
qgramnormalize(s)	same as input	normalisation of a string s. Converts letters to uppercase and removes all characters not in a..zA..Z0..9 or space	qgramnormalize(R'Prize €50!')	PRIZE 50
similarity(s1, s2)	double	computes the similarity between string s1 and s2 and returns a number between 0 (no similarity) and 1 (exact match). Warning: this function is marked as deprecated and will be removed in the next feature release.	similarity('MonetDB', 'DB free')	0.285714286
soundex(s)	varchar(4)	computes the Soundex code from string s using a phonetic algorithm	soundex('MonetDB')	M533

Tip: The text distance and similarity functions are grouped in the 'txtsim' module. To view them in your MonetDB server use query:

SELECT * FROM sys.functions WHERE mod = 'txtsim' AND type = 1 ORDER BY name;