ubelt.util_str module

Functions for working with text and strings.

The ensure_unicode() function does its best to coerce Python 2/3 bytes and text into a consistent unicode text representation.

The codeblock() and paragraph() functions wrap multiline strings so you can write text blocks without disrupting the indentation of the surrounding code.

The hzcat() function horizontally concatenates multiline text.

The indent() function prefixes every line in a text block with a given prefix. By default, that prefix is 4 spaces.

ubelt.util_str.indent(text, prefix='    ')

Indents a block of text

Parameters

text (str) – text to indent
prefix (str, default='    ') – prefix to add to each line

Returns

indented text

Return type

str

Example

>>> import ubelt as ub
>>> NL = chr(10)  # newline character
>>> text = 'Lorem ipsum' + NL + 'dolor sit amet'
>>> prefix = '    '
>>> result = ub.indent(text, prefix)
>>> assert all(t.startswith(prefix) for t in result.split(NL))
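The behavior described above can be approximated with plain string operations. The following is a minimal sketch, not ubelt's own source. The name indent_sketch is hypothetical, and the sketch assumes that every line receives the prefix, blank lines included. For comparison, it also calls the standard library's textwrap.indent, which by default skips whitespace-only lines.

```python
import textwrap

def indent_sketch(text, prefix='    '):
    # Hypothetical helper: prefix the first line, then every line that
    # follows a newline (blank lines included).
    return prefix + text.replace('\n', '\n' + prefix)

text = 'Lorem ipsum\n\ndolor sit amet'
print(repr(indent_sketch(text, '> ')))
# textwrap.indent leaves the whitespace-only middle line unprefixed by default
print(repr(textwrap.indent(text, '> ')))
```

If blank lines should stay untouched, textwrap.indent is the closer match. The sketch instead puts a prefix on every line, which is the property the example above asserts.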

ubelt.util_str.codeblock(text)

Create a block of text that preserves all newlines and relative indentation

Wraps multiline string blocks and returns unindented code. Useful for templated code defined in indented parts of code.

Parameters: text (str) – typically a multiline string
Returns: the unindented string
Return type: str

Example

>>> import ubelt as ub
>>> # Simulate an indented part of code
>>> if True:
>>>     # notice that the indentation of this result will be normal
>>>     codeblock_version = ub.codeblock(
...             '''
...             def foo():
...                 return 'bar'
...             '''
...         )
>>>     # notice that the indentation and newlines of this result will be odd
>>>     normal_version = ('''
...         def foo():
...             return 'bar'
...     ''')
>>> assert normal_version != codeblock_version
>>> print('Without codeblock')
>>> print(normal_version)
>>> print('With codeblock')
>>> print(codeblock_version)
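Conceptually, codeblock combines dedenting with trimming the blank lines that triple-quoted strings introduce. Below is a minimal sketch of that idea using the standard library. The names codeblock_sketch and make_template are hypothetical, and stripping only newline characters from the ends is an assumption about the intended behavior.

```python
import textwrap

def codeblock_sketch(text):
    # Remove the indentation common to all non-blank lines, then drop the
    # newlines that the opening and closing triple quotes leave at the ends.
    return textwrap.dedent(text).strip('\n')

def make_template():
    # The literal is indented to match the surrounding function body
    return codeblock_sketch(
        '''
        def foo():
            return 'bar'
        ''')

print(make_template())
```

textwrap.dedent removes only the whitespace shared by every non-blank line, so the extra indentation of the return line relative to def foo() survives.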

ubelt.util_str.paragraph(text)

Wraps multi-line strings and restructures the text to remove all newlines as well as leading, trailing, and repeated spaces.

Useful for writing log messages.

Parameters: text (str) – typically a multiline string
Returns: the reduced text block
Return type: str

Example

>>> import ubelt as ub
>>> text = (
>>>     '''
>>>     Lorem ipsum dolor sit amet, consectetur adipiscing
>>>     elit, sed do eiusmod tempor incididunt ut labore et
>>>     dolore magna aliqua.
>>>     ''')
>>> out = ub.paragraph(text)
>>> assert chr(10) in text
>>> assert chr(10) not in out
>>> print('text = {!r}'.format(text))
>>> print('out = {!r}'.format(out))
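The restructuring described above amounts to two steps: collapse every run of whitespace (newlines included) into a single space, then trim the ends. The sketch below does this with a regular expression. The names paragraph_sketch and report are hypothetical, and treating tabs the same as spaces is an assumption.

```python
import re

def paragraph_sketch(text):
    # Replace each run of whitespace (spaces, tabs, newlines) with one
    # space, then remove the leading and trailing space.
    return re.sub(r'\s+', ' ', text).strip()

def report(n_items):
    # Hypothetical log message written across several indented source lines
    return paragraph_sketch(
        '''
        Finished processing {} items;
        see the log directory for details.
        ''').format(n_items)

print(report(3))
```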

ubelt.util_str.hzcat(args, sep='')

Horizontally concatenates strings, preserving indentation.

Concatenates a list of (possibly multiline) strings, ensuring that each item is placed entirely to the right of all previous items.

Parameters

args (List[str]) – strings to concatenate
sep (str, default='') – separator inserted between adjacent items on every line

Example1:

>>> import ubelt as ub
>>> B = ub.repr2([[1, 2], [3, 457]], nl=1, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 8]], nl=1, cbr=True, trailsep=False)
>>> args = ['A = ', B, ' * ', C]
>>> print(ub.hzcat(args))
A = [[1, 2],   * [[5, 6],
     [3, 457]]    [7, 8]]

Example2:

>>> import ubelt as ub
>>> import unicodedata
>>> aa = unicodedata.normalize('NFD', 'á')  # a unicode character with len 2
>>> B = ub.repr2([['θ', aa], [aa, aa, aa]], nl=1, si=True, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 'θ']], nl=1, si=True, cbr=True, trailsep=False)
>>> args = ['A', '=', B, '*', C]
>>> print(ub.hzcat(args, sep='｜'))
A｜=｜[[θ, á],   ｜*｜[[5, 6],
 ｜ ｜ [á, á, á]]｜ ｜ [7, θ]]
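The concatenation rule can be sketched in four steps. First, split each item into lines. Second, pad every line to its item's width. Third, pad shorter items with blank lines so that all items have the same height and stay top-aligned. Finally, join corresponding lines. This illustrates the idea and is not ubelt's implementation. The names hzcat_sketch and _display_width are hypothetical. The width rule is a simplifying assumption: combining marks take zero columns and every other character takes one. That rule covers the NFD á from Example2 but not wide East Asian characters.

```python
import unicodedata

def _display_width(line):
    # Assumption: combining marks (such as the accent of an NFD 'á') occupy
    # no column; every other code point occupies exactly one column.
    return sum(1 for ch in line if not unicodedata.combining(ch))

def hzcat_sketch(args, sep=''):
    # Split every item into its lines
    blocks = [arg.split('\n') for arg in args]
    height = max(len(block) for block in blocks)
    columns = []
    for block in blocks:
        width = max(_display_width(line) for line in block)
        # Right-pad each line to the item's width ...
        padded = [line + ' ' * (width - _display_width(line)) for line in block]
        # ... and add blank lines below shorter items (top alignment)
        padded += [' ' * width] * (height - len(block))
        columns.append(padded)
    # Join the i-th line of every item, separated by sep
    rows = [sep.join(column[i] for column in columns) for i in range(height)]
    return '\n'.join(rows)

B = '[[1, 2],\n [3, 457]]'
C = '[[5, 6],\n [7, 8]]'
print(hzcat_sketch(['A = ', B, ' * ', C]))
```

With these literal inputs, which stand in for the repr2 output in Example1, the printed result matches Example1.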

ubelt.util_str.ensure_unicode(text)

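Decodes bytes into unicode text using UTF-8 (mostly for Python 2 compatibility). Text that is already unicode is returned unchanged.

Parameters: text (str | bytes) – text to ensure is decoded as unicode
Returns: the text as a unicode string
Return type: str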

Example

>>> from ubelt.util_str import *
>>> import codecs  # NOQA
>>> assert ensure_unicode('my ünicôdé strįng') == 'my ünicôdé strįng'
>>> assert ensure_unicode('text1') == 'text1'
>>> assert ensure_unicode('text1'.encode('utf8')) == 'text1'
>>> assert ensure_unicode('ï»¿text1'.encode('utf8')) == 'ï»¿text1'
>>> assert (codecs.BOM_UTF8 + 'text»¿'.encode('utf8')).decode('utf8')
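On Python 3, the core of this conversion reduces to a type check plus a UTF-8 decode. The sketch below illustrates that under stated assumptions. The name ensure_unicode_sketch is hypothetical, and raising TypeError for other input types is an assumption, not documented ubelt behavior. The sketch also shows how a leading UTF-8 byte-order mark, as in the last assertion above, is handled. The 'utf8' codec keeps it as the character U+FEFF, whereas the standard 'utf-8-sig' codec strips it.

```python
import codecs

def ensure_unicode_sketch(text):
    # Text that is already decoded passes through unchanged
    if isinstance(text, str):
        return text
    # Bytes are assumed to be UTF-8 encoded
    if isinstance(text, bytes):
        return text.decode('utf8')
    # Assumption: reject any other input type loudly
    raise TypeError('expected str or bytes, got {}'.format(type(text).__name__))

raw = codecs.BOM_UTF8 + 'text1'.encode('utf8')
print(repr(ensure_unicode_sketch(raw)))  # BOM kept as U+FEFF
print(repr(raw.decode('utf-8-sig')))     # BOM stripped
```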