ubelt.util_str module
Functions for working with text and strings.
The ensure_unicode() function does its best to coerce Python 2/3 bytes and text into a consistent unicode text representation.
The codeblock() and paragraph() functions wrap multiline strings to help write text blocks without hindering the surrounding code indentation.
The hzcat() function horizontally concatenates multiline text.
The indent() function prefixes all lines in a text block with a given prefix. By default that prefix is 4 spaces.
- ubelt.util_str.indent(text, prefix='    ')
Indents a block of text
- Parameters
text (str) – text to indent
prefix (str, default='    ') – prefix to add to each line
- Returns
indented text
- Return type
str
Example
>>> import ubelt as ub
>>> NL = chr(10)  # newline character
>>> text = 'Lorem ipsum' + NL + 'dolor sit amet'
>>> prefix = '    '
>>> result = ub.indent(text, prefix)
>>> assert all(t.startswith(prefix) for t in result.split(NL))
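The description says indent() prefixes all lines. A minimal sketch of that behavior, assuming "all lines" includes blank ones, can be written with str.replace; `indent_all` is a hypothetical name, not ubelt's code. For contrast, the standard library's textwrap.indent skips whitespace-only lines unless given a predicate.

```python
import textwrap


def indent_all(text: str, prefix: str = '    ') -> str:
    """Hypothetical helper (not ubelt's code): prefix every line of text."""
    # Prefix the first line, then every line that starts after a newline.
    # Blank lines are prefixed too.
    return prefix + text.replace('\n', '\n' + prefix)


text = 'Lorem ipsum\n\ndolor sit amet'
print(repr(indent_all(text)))
# By default textwrap.indent only prefixes lines that are not whitespace-only
print(repr(textwrap.indent(text, '    ')))
# A predicate that always returns True prefixes the blank line as well
print(repr(textwrap.indent(text, '    ', predicate=lambda line: True)))
```

The two approaches differ on a trailing newline: the replace-based sketch leaves a dangling prefix after it, while textwrap.indent does not.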
- ubelt.util_str.codeblock(text)
Create a block of text that preserves all newlines and relative indentation
Wraps multiline string blocks and returns unindented code. Useful for templated code defined in indented parts of code.
- Parameters
text (str) – typically a multiline string
- Returns
the unindented string
- Return type
str
Example
>>> import ubelt as ub
>>> # Simulate an indented part of code
>>> if True:
>>>     # notice the indentation on this will be normal
>>>     codeblock_version = ub.codeblock(
...             '''
...             def foo():
...                 return 'bar'
...             '''
...         )
>>>     # notice the indentation and newlines on this will be odd
>>>     normal_version = ('''
...         def foo():
...             return 'bar'
...         ''')
>>> assert normal_version != codeblock_version
>>> print('Without codeblock')
>>> print(normal_version)
>>> print('With codeblock')
>>> print(codeblock_version)
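The effect shown in the example (common indentation removed, relative indentation kept, surrounding newlines dropped) can be approximated with the standard library. This is a sketch, not ubelt's implementation; `codeblock_sketch` is a hypothetical name.

```python
import textwrap


def codeblock_sketch(text: str) -> str:
    """Hypothetical helper (not ubelt's code) mimicking codeblock()."""
    # dedent() removes the whitespace prefix common to all non-blank lines;
    # strip('\n') then drops the newlines added by the opening and closing
    # triple quotes, while keeping relative indentation inside the block.
    return textwrap.dedent(text).strip('\n')


if True:
    # Deliberately indented, like template code living inside a function
    block = codeblock_sketch(
        '''
        def foo():
            return 'bar'
        ''')

print(block)
```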
- ubelt.util_str.paragraph(text)
Wraps multiline strings and restructures the text to remove all newlines as well as leading, trailing, and double spaces.
Useful for writing log messages.
- Parameters
text (str) – typically a multiline string
- Returns
the reduced text block
- Return type
str
Example
>>> import ubelt as ub
>>> text = (
>>>     '''
>>>     Lorem ipsum dolor sit amet, consectetur adipiscing
>>>     elit, sed do eiusmod tempor incididunt ut labore et
>>>     dolore magna aliqua.
>>>     ''')
>>> out = ub.paragraph(text)
>>> assert chr(10) in text
>>> assert chr(10) not in out
>>> print('text = {!r}'.format(text))
>>> print('out = {!r}'.format(out))
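A minimal sketch of the same reduction, assuming that every run of whitespace (including tabs) should collapse to a single space; `paragraph_sketch` is a hypothetical name and not ubelt's code.

```python
def paragraph_sketch(text: str) -> str:
    """Hypothetical helper (not ubelt's code) mimicking paragraph()."""
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and ignores leading/trailing whitespace,
    # so joining the pieces with one space gives a single clean line.
    return ' '.join(text.split())


msg = paragraph_sketch(
    '''
    Lorem ipsum dolor sit amet,  consectetur
    adipiscing elit.
    ''')
print(msg)
```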
- ubelt.util_str.hzcat(args, sep='')
Horizontally concatenates strings preserving indentation
Concatenates a list of objects ensuring that the next item in the list is all the way to the right of any previous items.
- Parameters
args (List[str]) – strings to concatenate
sep (str, default='') – separator placed between adjacent items on every line
- Example1:
>>> import ubelt as ub
>>> B = ub.repr2([[1, 2], [3, 457]], nl=1, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 8]], nl=1, cbr=True, trailsep=False)
>>> args = ['A = ', B, ' * ', C]
>>> print(ub.hzcat(args))
A = [[1, 2],   * [[5, 6],
     [3, 457]]    [7, 8]]
- Example2:
>>> import ubelt as ub
>>> import unicodedata
>>> aa = unicodedata.normalize('NFD', 'á')  # a unicode char with len 2
>>> B = ub.repr2([['θ', aa], [aa, aa, aa]], nl=1, si=True, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 'θ']], nl=1, si=True, cbr=True, trailsep=False)
>>> args = ['A', '=', B, '*', C]
>>> print(ub.hzcat(args, sep='|'))
A|=|[[θ, á],   |*|[[5, 6],
 | | [á, á, á]]| | [7, θ]]
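The examples above work by padding each item into a fixed-width column before joining rows. A minimal sketch of that idea follows; `hzcat_sketch` and `visual_width` are hypothetical names, and the width rule (combining marks take no column, every other character takes one) is an assumption chosen to reproduce Example2, not necessarily how ubelt measures width. Wide East Asian characters, for instance, are not handled.

```python
import unicodedata
from typing import List


def visual_width(line: str) -> int:
    # Assumption: combining marks (e.g. the accent in NFD 'á') take no
    # column and every other character takes exactly one.
    return sum(1 for ch in line if not unicodedata.combining(ch))


def hzcat_sketch(args: List[str], sep: str = '') -> str:
    """Hypothetical helper (not ubelt's code) mimicking hzcat()."""
    blocks = [arg.split('\n') for arg in args]
    height = max(len(lines) for lines in blocks)
    columns = []
    for lines in blocks:
        width = max(visual_width(line) for line in lines)
        # Pad each line to the block's width so later items start further right
        padded = [line + ' ' * (width - visual_width(line)) for line in lines]
        # Shorter blocks get space-filled rows at the bottom
        padded += [' ' * width] * (height - len(lines))
        columns.append(padded)
    # Row i of the result is row i of every block, joined by the separator
    rows = [sep.join(col[i] for col in columns) for i in range(height)]
    return '\n'.join(rows)


B = '[[1, 2],\n [3, 457]]'
C = '[[5, 6],\n [7, 8]]'
print(hzcat_sketch(['A = ', B, ' * ', C]))
```

Padding short blocks with blank rows is what keeps each later item "all the way to the right" of everything before it.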
- ubelt.util_str.ensure_unicode(text)
Decodes UTF-8 bytes into unicode text; text that is already unicode is returned unchanged (mostly for Python 2 compatibility).
- Parameters
text (str | bytes) – text to ensure is decoded as unicode
- Returns
str
Example
>>> from ubelt.util_str import *
>>> import codecs  # NOQA
>>> assert ensure_unicode('my ünicôdé strįng') == 'my ünicôdé strįng'
>>> assert ensure_unicode('text1') == 'text1'
>>> assert ensure_unicode('text1'.encode('utf8')) == 'text1'
>>> assert (codecs.BOM_UTF8 + 'text»¿'.encode('utf8')).decode('utf8')
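On Python 3 the behavior described above amounts to passing str through and decoding bytes. A minimal sketch, assuming byte input is UTF-8; `ensure_text` is a hypothetical name, and raising TypeError for other inputs is a choice of this sketch, not documented behavior of ensure_unicode(). Because the last doctest line involves the UTF-8 byte order mark, the sketch also shows how a plain decode differs from the standard 'utf-8-sig' codec.

```python
import codecs
from typing import Union


def ensure_text(text: Union[str, bytes]) -> str:
    """Hypothetical helper (not ubelt's code) mimicking ensure_unicode()."""
    if isinstance(text, str):
        # Already unicode text on Python 3: return it unchanged
        return text
    if isinstance(text, bytes):
        # Assumption: byte strings are UTF-8 encoded
        return text.decode('utf8')
    raise TypeError('expected str or bytes, got {!r}'.format(type(text)))


raw = codecs.BOM_UTF8 + 'text1'.encode('utf8')
# A plain UTF-8 decode keeps the byte order mark as the character U+FEFF
print(repr(ensure_text(raw)))
# The 'utf-8-sig' codec drops a leading BOM if one is present
print(repr(raw.decode('utf-8-sig')))
```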