A feature-complete Python text case conversion library.
Create and activate a virtual environment, and then install textcase.
You can convert strings into a case using the textcase.convert function:
from textcase import case, convert
print(convert("ronnie james dio", case.SNAKE)) # ronnie_james_dio
print(convert("Ronnie_James_dio", case.CONSTANT)) # RONNIE_JAMES_DIO
print(convert("RONNIE_JAMES_DIO", case.KEBAB)) # ronnie-james-dio
print(convert("RONNIE-JAMES-DIO", case.CAMEL)) # ronnieJamesDio
print(convert("ronnie-james-dio", case.PASCAL)) # RonnieJamesDio
print(convert("RONNIE JAMES DIO", case.LOWER)) # ronnie james dio
print(convert("ronnie james dio", case.UPPER)) # RONNIE JAMES DIO
print(convert("ronnie-james-dio", case.TITLE)) # Ronnie James Dio
print(convert("ronnie james dio", case.SENTENCE)) # Ronnie james dio
By default, textcase.convert and textcase.converter.CaseConverter.convert split along a set of default word boundaries, that is:
- underscores (_),
- hyphens (-),
- spaces,
- changes in capitalization from lowercase to uppercase (aA),
- adjacent digits and letters (a1, 1a, A1, 1A),
- and acronyms (AAa, as in HTTPRequest).
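The rules listed above can be approximated with a single regular expression. The following is only a minimal, ASCII-only sketch built on the standard library; DEFAULT_BOUNDARY and split_words are hypothetical names, and this is not how textcase itself is implemented.

```python
import re

# Hypothetical, ASCII-only approximation of the default word boundaries.
# This models the rules listed above; it is not textcase's implementation.
DEFAULT_BOUNDARY = re.compile(
    r"[_\- ]+"                     # underscores, hyphens, spaces (consumed)
    r"|(?<=[a-z])(?=[A-Z])"        # lowercase followed by uppercase: aA
    r"|(?<=[A-Z])(?=[A-Z][a-z])"   # end of an acronym: AAa
    r"|(?<=[A-Za-z])(?=[0-9])"     # letter followed by digit: a1, A1
    r"|(?<=[0-9])(?=[A-Za-z])"     # digit followed by letter: 1a, 1A
)


def split_words(text: str) -> list[str]:
    """Split text at every boundary and drop empty fragments."""
    return [word for word in DEFAULT_BOUNDARY.split(text) if word]


print(split_words("HTTPRequest"))      # ['HTTP', 'Request']
print(split_words("my_var-name 2Go"))  # ['my', 'var', 'name', '2', 'Go']
```

Delimiter characters are consumed by the split, while the capitalization and digit rules are zero-width, so no characters are lost at those positions.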
For more precision, you can specify boundaries to split based on the word boundaries of a particular case. For example, splitting from snake case will only use underscores as word boundaries:
from textcase import boundary, case, convert
print(convert("2020-04-16_my_cat_cali", case.TITLE)) # 2020 04 16 My Cat Cali
print(convert("2020-04-16_my_cat_cali", case.TITLE, (boundary.UNDERSCORE,))) # 2020-04-16 My Cat Cali
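To see why restricting the boundaries changes the result, here is a small standard-library sketch. The regex constants and the split_on and to_title helpers are hypothetical stand-ins, and the title pattern is assumed to capitalize each word and join with spaces.

```python
import re

# Hypothetical regex stand-ins for a few boundary constants (ASCII-only).
UNDERSCORE = r"_"
HYPHEN = r"-"
SPACE = r" "


def split_on(text: str, boundaries: tuple[str, ...]) -> list[str]:
    """Split text only at the given boundaries, dropping empty words."""
    pattern = re.compile("|".join(boundaries))
    return [word for word in pattern.split(text) if word]


def to_title(text: str, boundaries: tuple[str, ...]) -> str:
    # Assumed title pattern: capitalize each word, then join with spaces.
    return " ".join(word.capitalize() for word in split_on(text, boundaries))


print(to_title("2020-04-16_my_cat_cali", (UNDERSCORE, HYPHEN, SPACE)))  # 2020 04 16 My Cat Cali
print(to_title("2020-04-16_my_cat_cali", (UNDERSCORE,)))  # 2020-04-16 My Cat Cali
```

With only the underscore boundary active, the hyphens are ordinary characters inside the first word, so the date survives intact.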
This library can detect acronyms in camel-like strings. It also ignores any leading, trailing, or duplicate delimiters:
from textcase import case, convert
print(convert("IOStream", case.SNAKE)) # io_stream
print(convert("myJSONParser", case.SNAKE)) # my_json_parser
print(convert("__weird--var _name-", case.SNAKE)) # weird_var_name
It also works with non-ASCII characters. However, no inferences about the language itself are made. For instance, the digraph ij in Dutch will not be capitalized as a unit, because it is represented as two distinct Unicode characters. However, æ would be capitalized:
from textcase import case, convert
print(convert("GranatÄpfel", case.KEBAB)) # granat-äpfel
print(convert("ПЕРСПЕКТИВА24", case.TITLE)) # Перспектива 24
print(convert("ὈΔΥΣΣΕΎΣ", case.LOWER)) # ὀδυσσεύς
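The digraph behavior follows from case mapping that works one code point at a time. The sketch below uses only Python's built-in str methods and assumes, without confirmation from the library's source, that textcase relies on the same mappings; capitalize_word is a hypothetical helper.

```python
def capitalize_word(word: str) -> str:
    """Uppercase the first code point and lowercase the rest."""
    return word[:1].upper() + word[1:].lower()


# "ij" is two code points, so only the "i" is uppercased.
print(len("ij"), capitalize_word("ijsselmeer"))  # 2 Ijsselmeer

# "æ" is a single code point, so it is uppercased as a whole.
print(len("æ"), capitalize_word("æble"))  # 1 Æble
```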
By default, letters followed by digits, and vice versa, are considered word boundaries. In addition, any special ASCII characters (besides _ and -) are ignored:
from textcase import case, convert
print(convert("E5150", case.SNAKE)) # e_5150
print(convert("10,000Days", case.SNAKE)) # 10,000_days
print(convert("Hello, world!", case.UPPER)) # HELLO, WORLD!
print(convert("ONE\nTWO\nTHREE", case.TITLE)) # One\ntwo\nthree
You can also test what case a string is in:
from textcase import case, is_case
print(is_case("css-class-name", case.KEBAB)) # True
print(is_case("css-class-name", case.SNAKE)) # False
print(is_case("UPPER_CASE_VAR", case.SNAKE)) # False
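One plausible way to model such a check is a round trip: a string is in a case if converting it to that case leaves it unchanged. This is an assumption for illustration, not necessarily how textcase.is_case is implemented, and to_snake, to_kebab, and is_in_case are hypothetical helpers.

```python
import re
from typing import Callable

# Hypothetical ASCII-only splitter standing in for the default boundaries.
_BOUNDARY = re.compile(r"[_\- ]+|(?<=[a-z])(?=[A-Z])")


def to_snake(text: str) -> str:
    return "_".join(word.lower() for word in _BOUNDARY.split(text) if word)


def to_kebab(text: str) -> str:
    return "-".join(word.lower() for word in _BOUNDARY.split(text) if word)


def is_in_case(text: str, convert: Callable[[str], str]) -> bool:
    """A string is in a case if converting it to that case changes nothing."""
    return convert(text) == text


print(is_in_case("css-class-name", to_kebab))  # True
print(is_in_case("css-class-name", to_snake))  # False
print(is_in_case("UPPER_CASE_VAR", to_snake))  # False
```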
It can be difficult to determine how to split a string into words. That is why this library provides the textcase.convert and textcase.converter.CaseConverter.convert functionality, but sometimes that isn’t enough to meet a specific use case.
Say an identifier contains the word 2D, as in scale2D. Using textcase.convert or textcase.converter.CaseConverter.convert with the boundaries of a whole case will not be enough to solve the problem. In this case, we can further specify which boundaries to split the string on. This library provides some patterns for achieving this specificity. We can specify the boundaries we want to split on using instances of the textcase.boundary.Boundary class:
from textcase import boundary, case, convert
# Not quite what we want
print(convert("scale2D", case.SNAKE, case.CAMEL.boundaries)) # scale_2_d
# Write boundaries explicitly
print(convert("scale2D", case.SNAKE, (boundary.LOWER_DIGIT,))) # scale_2d
This library provides a number of constants for boundaries associated with common cases. But you can create your own boundary to split on other criteria:
from textcase import case, convert
from textcase.boundary import Boundary
# Not quite what we want
print(convert("coolers.revenge", case.TITLE)) # Coolers.revenge
# Define custom boundary
DOT = Boundary(
    satisfies=lambda text: text.startswith("."),
    length=1,
)
print(convert("coolers.revenge", case.TITLE, (DOT,))) # Coolers Revenge
# Define complex custom boundary
AT_LETTER = Boundary(
    satisfies=lambda text: (len(text) > 1 and text[0] == "@") and (text[1] == text[1].lower()),
    start=1,
    length=0,
)
print(convert("name@domain", case.TITLE, (AT_LETTER,))) # Name@ Domain
To learn more about building a boundary from scratch, take a look at the textcase.boundary.Boundary class.
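The roles of satisfies, start, and length can be illustrated with a small model. The sketch below assumes that satisfies is tested against the text remaining at each index, that start is the offset from that index to the split point, and that length is the number of characters dropped at the split. SketchBoundary and split are hypothetical names, and the real class may differ in detail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SketchBoundary:
    """Hypothetical model of a boundary; the field semantics are assumed."""

    satisfies: Callable[[str], bool]  # tested against the text remaining at each index
    start: int = 0   # offset from that index to where the split happens
    length: int = 0  # number of characters dropped at the split


def split(text: str, boundaries: tuple[SketchBoundary, ...]) -> list[str]:
    """Cut text wherever a boundary matches, then drop empty words."""
    words, last_end = [], 0
    for i in range(len(text)):
        for b in boundaries:
            if b.satisfies(text[i:]):
                cut = i + b.start
                words.append(text[last_end:cut])
                last_end = cut + b.length
                break
    words.append(text[last_end:])
    return [word for word in words if word]


DOT = SketchBoundary(satisfies=lambda t: t.startswith("."), length=1)
AT_LETTER = SketchBoundary(
    satisfies=lambda t: len(t) > 1 and t[0] == "@" and t[1] == t[1].lower(),
    start=1,
)

print(split("coolers.revenge", (DOT,)))    # ['coolers', 'revenge']
print(split("name@domain", (AT_LETTER,)))  # ['name@', 'domain']
```

Under these assumptions, DOT removes the dot because length is 1, while AT_LETTER keeps the @ with the first word because the split happens one character later and nothing is dropped.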
Similar to textcase.boundary.Boundary, there is textcase.case.Case, which exposes the three components necessary for case conversion. This allows you to define a custom case that behaves appropriately in the textcase.convert and textcase.converter.CaseConverter.convert functions:
from textcase import convert
from textcase.boundary import Boundary
from textcase.case import Case
from textcase.pattern import lower
# Define custom boundary
DOT = Boundary(
    satisfies=lambda text: text.startswith("."),
    length=1,
)
# Define custom case
DOT_CASE = Case(
    boundaries=(DOT,),
    pattern=lower,
    delimiter=".",
)
print(convert("Dot case var", DOT_CASE)) # dot.case.var
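How the three components of a case fit together can be sketched with the standard library alone. In this hypothetical model, boundaries are regex fragments, a pattern is a function from a list of words to a list of words, and the sketch's convert function falls back to a default boundary set when none is given. None of these names or representations are taken from the library's source.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical regex stand-ins for boundaries (ASCII-only).
DEFAULT = (r"[_\- ]", r"(?<=[a-z])(?=[A-Z])")
DOT = r"\."


def lower(words: list[str]) -> list[str]:
    """Assumed shape of a pattern: a function from words to words."""
    return [word.lower() for word in words]


@dataclass(frozen=True)
class SketchCase:
    boundaries: tuple[str, ...]                # where strings in this case split
    pattern: Callable[[list[str]], list[str]]  # how the words are mutated
    delimiter: str                             # what joins the words


def convert(text: str, case: SketchCase, boundaries: Optional[tuple[str, ...]] = None) -> str:
    splitter = re.compile("|".join(boundaries or DEFAULT))
    words = [word for word in splitter.split(text) if word]
    return case.delimiter.join(case.pattern(words))


DOT_CASE = SketchCase(boundaries=(DOT,), pattern=lower, delimiter=".")
SNAKE = SketchCase(boundaries=(r"_",), pattern=lower, delimiter="_")

print(convert("Dot case var", DOT_CASE))                    # dot.case.var
print(convert("dot.case.var", SNAKE, DOT_CASE.boundaries))  # dot_case_var
```

In this model the target case supplies the pattern and delimiter, while its boundaries only matter when a string already in that case needs to be split.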
Because we defined boundary conditions, textcase.is_case also behaves as expected:
from textcase import is_case
from textcase.boundary import Boundary
from textcase.case import Case
from textcase.pattern import lower
# Define custom boundary
DOT = Boundary(
    satisfies=lambda text: text.startswith("."),
    length=1,
)
# Define custom case
DOT_CASE = Case(
    boundaries=(DOT,),
    pattern=lower,
    delimiter=".",
)
print(is_case("dot.case.var", DOT_CASE)) # True
print(is_case("Dot case var", DOT_CASE)) # False
Case conversion takes place in two parts. The first splits an identifier into a series of words, and the second joins the words back together. Each of these steps is defined using the textcase.converter.CaseConverter.from_case and textcase.converter.CaseConverter.to_case methods, respectively.
CaseConverter is a class that encapsulates the boundaries used for splitting and the pattern and delimiter for mutating and joining. The convert method will apply the boundaries, pattern, and delimiter appropriately. This lets you define the parameters for case conversion upfront:
from textcase import CaseConverter, case, pattern
converter = CaseConverter()
converter.pattern = pattern.camel
converter.delimiter = "_"
print(converter.convert("My Special Case")) # my_Special_Case
converter.from_case(case.CAMEL)
converter.to_case(case.SNAKE)
print(converter.convert("mySpecialCase")) # my_special_case
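The two phases can be modeled with a small class that stores the split settings and the join settings separately. This is a standard-library sketch under stated assumptions: SketchConverter, camel, and the regex boundaries are hypothetical, and the real CaseConverter may be structured differently.

```python
import re

# Hypothetical regex stand-ins for boundary sets (ASCII-only).
DEFAULT_BOUNDARIES = (r"[_\- ]", r"(?<=[a-z])(?=[A-Z])")
LOWER_UPPER = r"(?<=[a-z])(?=[A-Z])"


def camel(words):
    """First word lowercase, later words capitalized."""
    return [w.lower() if i == 0 else w.capitalize() for i, w in enumerate(words)]


class SketchConverter:
    """Holds the split settings and the join settings for repeated use."""

    def __init__(self):
        self.boundaries = DEFAULT_BOUNDARIES
        self.pattern = None
        self.delimiter = ""

    def convert(self, text):
        # Phase 1: split the identifier into words.
        splitter = re.compile("|".join(self.boundaries))
        words = [w for w in splitter.split(text) if w]
        # Phase 2: mutate the words and join them back together.
        if self.pattern is not None:
            words = self.pattern(words)
        return self.delimiter.join(words)


converter = SketchConverter()
converter.pattern = camel
converter.delimiter = "_"
print(converter.convert("My Special Case"))  # my_Special_Case

# Splitting as if from camel case, joining as snake case.
converter.boundaries = (LOWER_UPPER,)
converter.pattern = lambda words: [w.lower() for w in words]
print(converter.convert("mySpecialCase"))  # my_special_case
```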
For more details on how strings are converted, see the docs for textcase.converter.CaseConverter.
Constant | Description |
---|---|
textcase.boundary.UNDERSCORE | Splits on _, consuming the character on segmentation. |
textcase.boundary.HYPHEN | Splits on -, consuming the character on segmentation. |
textcase.boundary.SPACE | Splits on space, consuming the character on segmentation. |
textcase.boundary.LOWER_UPPER | Splits where a lowercase letter is followed by an uppercase letter. |
textcase.boundary.UPPER_LOWER | Splits where an uppercase letter is followed by a lowercase letter. This is seldom used. |
textcase.boundary.ACRONYM | Splits where two uppercase letters are followed by a lowercase letter, identifying acronyms. |
textcase.boundary.LOWER_DIGIT | Splits where a lowercase letter is followed by a digit. |
textcase.boundary.UPPER_DIGIT | Splits where an uppercase letter is followed by a digit. |
textcase.boundary.DIGIT_LOWER | Splits where a digit is followed by a lowercase letter. |
textcase.boundary.DIGIT_UPPER | Splits where a digit is followed by an uppercase letter. |
textcase.boundary.DEFAULT_BOUNDARIES | Default boundaries used for splitting strings into words, including underscores, hyphens, spaces, and capitalization changes. |
textcase.case.SNAKE | Snake case strings are delimited by underscores _ and are all lowercase. |
textcase.case.CONSTANT | Constant case strings are delimited by underscores _ and are all uppercase. |
textcase.case.KEBAB | Kebab case strings are delimited by hyphens - and are all lowercase. |
textcase.case.CAMEL | Camel case strings are lowercase, but for every word except the first the first letter is capitalized. |
textcase.case.PASCAL | Pascal case strings are lowercase, but for every word the first letter is capitalized. |
textcase.case.LOWER | Lowercase strings are delimited by spaces and all characters are lowercase. |
textcase.case.UPPER | Uppercase strings are delimited by spaces and all characters are uppercase. |
textcase.case.TITLE | Title case strings are delimited by spaces. Only the leading character of each word is uppercase. |
textcase.case.SENTENCE | Sentence case strings are delimited by spaces. Only the leading character of the first word is uppercase. |