Unicode string formatting

Did you know that in Python 2, if any of the values in a string formatting expression using the % operator is unicode, the result will also be unicode?

>>> "Hello, %s" % u"Alex"
u'Hello, Alex'
>>> "Hello, %s" % u"Алексей"
u'Hello, \u0410\u043b\u0435\u043a\u0441\u0435\u0439'

I usually work with the .format string method, and its behavior appeals to me more: the type of the template string is preserved, and if an argument contains non-ASCII characters while the template is a byte string, a UnicodeEncodeError is raised.

>>> "Hello, {0}".format(u"Alex")
'Hello, Alex'
>>> "Hello, {0}".format(u"Алексей")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)
>>> u"Hello, {0}".format(u"Алексей")
u'Hello, \u0410\u043b\u0435\u043a\u0441\u0435\u0439'
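If you prefer the .format behavior but still have code that uses %, one option is a small wrapper that refuses to let the result change type. The sketch below is hypothetical (strict_percent is not part of the standard library) and is written to run on both Python 2 and Python 3. On Python 3 the check should never fire, because there a bytes template combined with a str argument already raises TypeError and str templates always produce str.

```python
def strict_percent(template, *args):
    """Apply % formatting without letting the string type change.

    Hypothetical helper: in Python 2, "Hello, %s" % u"Alex" silently
    returns unicode; here that case raises TypeError instead.
    """
    result = template % args
    # bytes is an alias of str in Python 2.6+, so this compares
    # "byte string vs. text string" on both Python 2 and Python 3.
    if isinstance(template, bytes) != isinstance(result, bytes):
        raise TypeError(
            "formatting changed the string type from %s to %s"
            % (type(template).__name__, type(result).__name__)
        )
    return result
```

Note that this is stricter than .format: "Hello, {0}".format(u"Alex") succeeds and returns a str, while strict_percent("Hello, %s", u"Alex") raises on Python 2 even though "Alex" is plain ASCII.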

Does it really matter which string type is returned? Sometimes, yes. For example, when working with urlparse.parse_qs, the type of the string you pass in matters.

So it is worth keeping in mind that code like:

>>> "Hello, %s" % value

can return a unicode string if value is unicode.
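A defensive option is to normalize whatever the formatting produced to a single type before passing it on, for example to text. The helper below is a hypothetical sketch, again written to run on both Python 2 and Python 3, and it assumes that byte strings are UTF-8 encoded; adjust the encoding parameter for your data.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string.

    Hypothetical helper. In Python 2, "Hello, %s" % value may be either a
    byte string (str) or unicode depending on value; decoding byte strings
    here means callers always get text back. UTF-8 is an assumption.
    """
    if isinstance(value, bytes):
        # Byte string: decode it so the caller always receives text.
        return value.decode(encoding)
    # Already text (unicode in Python 2, str in Python 3).
    return value
```

Usage would look like greeting = to_text("Hello, %s" % value), after which greeting has the same type no matter what value was.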

Published: June 20, 2013