Show HN: Textcase: A Python Library for Text Case Conversion

Original link: https://github.com/zobweyt/textcase

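`textcase` is a Python library for converting text between case formats (snake_case, CONSTANT_CASE, kebab-case, camelCase, PascalCase, lowercase, UPPERCASE, Title Case, Sentence case). It provides a `convert` function for flexible case conversion. The library splits strings into words using default boundaries (underscores, hyphens, spaces, and changes in capitalization), and you can customize these boundaries as needed. `textcase` handles acronyms, non-ASCII characters, and delimiters. You can use `is_case` to check whether a string matches a given case format. For advanced scenarios, the `Boundary` and `Case` classes let you define custom boundaries and case formats, giving precise control over the splitting and joining logic. The `CaseConverter` class offers a more structured approach: it encapsulates boundaries, a pattern, and a delimiter as a reusable conversion configuration. The library also provides constants for common boundaries and case formats.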

Posted to Hacker News by zobweyt (github.com/zobweyt), 4 points, 1 hour ago.

Original text


A feature complete Python text case conversion library.

Create and activate a virtual environment, then install textcase from PyPI, for example with pip install textcase.

You can convert strings into a case using the textcase.convert function:

from textcase import case, convert

print(convert("ronnie james dio", case.SNAKE))     # ronnie_james_dio
print(convert("Ronnie_James_dio", case.CONSTANT))  # RONNIE_JAMES_DIO
print(convert("RONNIE_JAMES_DIO", case.KEBAB))     # ronnie-james-dio
print(convert("RONNIE-JAMES-DIO", case.CAMEL))     # ronnieJamesDio
print(convert("ronnie-james-dio", case.PASCAL))    # RonnieJamesDio
print(convert("RONNIE JAMES DIO", case.LOWER))     # ronnie james dio
print(convert("ronnie james dio", case.UPPER))     # RONNIE JAMES DIO
print(convert("ronnie-james-dio", case.TITLE))     # Ronnie James Dio
print(convert("ronnie james dio", case.SENTENCE))  # Ronnie james dio

By default, textcase.convert and textcase.converter.CaseConverter.convert split on a set of default word boundaries:

  • underscores _,
  • hyphens -,
  • spaces " ",
  • changes in capitalization from lowercase to uppercase aA,
  • adjacent digits and letters a1, 1a, A1, 1A,
  • and acronyms AAa (as in HTTPRequest).
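
The boundary list above can be approximated with a few regular expressions. This is a minimal sketch using only the standard library, not textcase's actual implementation: the helper name split_words is hypothetical, and the patterns assume ASCII letters only.

```python
import re

# Hypothetical helper, not part of textcase: approximates the default
# boundaries with regular expressions (ASCII letters only).
_DELIMITERS = re.compile(r"[_\- ]+")      # underscores, hyphens, spaces
_INNER = re.compile(
    r"(?<=[a-z])(?=[A-Z])"                # aA
    r"|(?<=[A-Z])(?=[A-Z][a-z])"          # AAa (end of an acronym)
    r"|(?<=[A-Za-z])(?=[0-9])"            # a1, A1
    r"|(?<=[0-9])(?=[A-Za-z])"            # 1a, 1A
)


def split_words(text: str) -> list[str]:
    """Split text into words on the boundaries sketched above."""
    words = []
    for chunk in _DELIMITERS.split(text):
        if chunk:  # skip empties left by leading or trailing delimiters
            words.extend(_INNER.split(chunk))
    return words


print(split_words("HTTPRequest"))   # ['HTTP', 'Request']
print(split_words("myJSONParser"))  # ['my', 'JSON', 'Parser']
print(split_words("E5150"))         # ['E', '5150']
```

Delimiter boundaries consume the matched characters, while the inner boundaries are zero-width, which is why no letters or digits are lost when splitting.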

For more precision, you can specify boundaries to split based on the word boundaries of a particular case. For example, splitting from snake case will only use underscores as word boundaries:

from textcase import boundary, case, convert

print(convert("2020-04-16_my_cat_cali", case.TITLE))                          # 2020 04 16 My Cat Cali
print(convert("2020-04-16_my_cat_cali", case.TITLE, (boundary.UNDERSCORE,)))  # 2020-04-16 My Cat Cali
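
The effect of narrowing the boundary set can be reproduced with plain string handling. The sketch below is only an illustration under simplifying assumptions: to_title is a hypothetical helper, and boundaries are modelled as a string of delimiter characters rather than textcase Boundary objects.

```python
import re


def to_title(text: str, delimiters: str) -> str:
    """Hypothetical helper: split on the given delimiter characters only."""
    words = [w for w in re.split(f"[{re.escape(delimiters)}]+", text) if w]
    return " ".join(w.capitalize() for w in words)


text = "2020-04-16_my_cat_cali"
print(to_title(text, "_- "))  # 2020 04 16 My Cat Cali
print(to_title(text, "_"))    # 2020-04-16 My Cat Cali
```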

This library can detect acronyms in camel-like strings. It also ignores any leading, trailing, or duplicate delimiters:

from textcase import case, convert

print(convert("IOStream", case.SNAKE))             # io_stream
print(convert("myJSONParser", case.SNAKE))         # my_json_parser
print(convert("__weird--var _name-", case.SNAKE))  # weird_var_name

It also works with non-ASCII characters. However, no inference about the language itself is made. For instance, the digraph ij in Dutch will not be capitalized as a unit, because it is represented as two distinct Unicode characters. However, æ would be capitalized:

from textcase import case, convert

print(convert("GranatÄpfel", case.KEBAB))    # granat-äpfel
print(convert("ПЕРСПЕКТИВА24", case.TITLE))  # Перспектива 24
print(convert("ὈΔΥΣΣΕΎΣ", case.LOWER))       # ὀδυσσεύς
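
The ij point can be seen with Python's built-in string methods alone. This sketch does not call textcase; it assumes the library relies on the same per-code-point case mappings that str provides, which the source does not state explicitly.

```python
# Plain str methods from the standard library, not textcase itself.
two_code_points = "ijsselmeer"   # "i" followed by "j": two code points
ligature = "\u0133sselmeer"      # starts with the single code point U+0133

# Only the first code point is capitalized, so the "j" stays lowercase.
print(two_code_points.capitalize())               # Ijsselmeer

# The single-code-point ligature maps to its uppercase form U+0132.
print(ligature.capitalize() == "\u0132sselmeer")  # True

# æ is one code point with its own uppercase mapping.
print("æble".capitalize())                        # Æble
```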

By default, letters followed by digits, and vice versa, are considered word boundaries. In addition, any special ASCII characters (besides _ and -) are left in place and do not act as word boundaries:

from textcase import case, convert

print(convert("E5150", case.SNAKE))              # e_5150
print(convert("10,000Days", case.SNAKE))         # 10,000_days
print(convert("Hello, world!", case.UPPER))      # HELLO, WORLD!
print(convert("ONE\nTWO\nTHREE", case.TITLE))    # One\ntwo\nthree

You can also test what case a string is in:

from textcase import case, is_case

print(is_case("css-class-name", case.KEBAB))  # True
print(is_case("css-class-name", case.SNAKE))  # False
print(is_case("UPPER_CASE_VAR", case.SNAKE))  # False
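
One plausible way to implement such a check is to convert the string to the target case and compare it with the input. Whether textcase does exactly this is an assumption; to_snake and is_snake below are hypothetical, ASCII-only stand-ins written with the standard library.

```python
import re


def to_snake(text: str) -> str:
    """Hypothetical stand-in for convert(text, case.SNAKE), ASCII only."""
    # Mark lower-to-upper transitions, then split on delimiter runs.
    marked = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", text)
    words = [w for w in re.split(r"[_\- ]+", marked) if w]
    return "_".join(w.lower() for w in words)


def is_snake(text: str) -> bool:
    # A string is in snake case if converting it to snake case is a no-op.
    return to_snake(text) == text


print(is_snake("css_class_name"))  # True
print(is_snake("css-class-name"))  # False
print(is_snake("UPPER_CASE_VAR"))  # False
```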

It can be difficult to determine how to split a string into words. That is why this library provides textcase.convert and textcase.converter.CaseConverter.convert, but sometimes that isn't enough to meet a specific use case.

Say an identifier contains the word 2D, such as scale2D. Using textcase.convert or textcase.converter.CaseConverter.convert with their default boundaries will not solve the problem. In this case we can further specify which boundaries to split the string on. We can specify the boundaries we want to split on using instances of the textcase.boundary.Boundary class:

from textcase import boundary, case, convert

# Not quite what we want
print(convert("scale2D", case.SNAKE, case.CAMEL.boundaries))    # scale_2_d

# Write boundaries explicitly
print(convert("scale2D", case.SNAKE, (boundary.LOWER_DIGIT,)))  # scale_2d

This library provides a number of constants for boundaries associated with common cases. But you can create your own boundary to split on other criteria:

from textcase import case, convert
from textcase.boundary import Boundary

# Not quite what we want
print(convert("coolers.revenge", case.TITLE))  # Coolers.revenge

# Define custom boundary
DOT = Boundary(
    satisfies=lambda text: text.startswith("."),
    length=1,
)

print(convert("coolers.revenge", case.TITLE, (DOT,)))  # Coolers Revenge

# Define complex custom boundary
AT_LETTER = Boundary(
    satisfies=lambda text: (len(text) > 1 and text[0] == "@") and (text[1] == text[1].lower()),
    start=1,
    length=0,
)

print(convert("name@domain", case.TITLE, (AT_LETTER,)))  # Name@ Domain

To learn more about building a boundary from scratch, take a look at the textcase.boundary.Boundary class.
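
To make the roles of satisfies, start, and length concrete, here is a minimal sketch of how such a boundary could drive splitting. The field names mirror the examples above, but SketchBoundary and split are hypothetical, and the loop is an assumption about the mechanics rather than the library's code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SketchBoundary:
    satisfies: Callable[[str], bool]  # tested against the text remaining at each index
    start: int = 0    # offset from the match index where the left word ends
    length: int = 0   # number of characters dropped between the two words


def split(text: str, boundaries: Iterable[SketchBoundary]) -> list[str]:
    boundaries = tuple(boundaries)
    words, last = [], 0
    for i in range(len(text)):
        tail = text[i:]
        for b in boundaries:
            if b.satisfies(tail):
                words.append(text[last:i + b.start])
                last = i + b.start + b.length  # skip the consumed characters
                break
    words.append(text[last:])
    return [w for w in words if w]  # drop empties from repeated delimiters


dot = SketchBoundary(satisfies=lambda t: t.startswith("."), length=1)
at_letter = SketchBoundary(
    satisfies=lambda t: len(t) > 1 and t[0] == "@" and t[1] == t[1].lower(),
    start=1,
)

print(split("coolers.revenge", (dot,)))    # ['coolers', 'revenge']
print(split("name@domain", (at_letter,)))  # ['name@', 'domain']
```

With length=1 the dot is consumed by the split, whereas start=1 with length=0 keeps the @ attached to the left-hand word, matching the Name@ Domain output shown earlier.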

Similar to textcase.boundary.Boundary, there is textcase.case.Case, which exposes the three components necessary for case conversion: boundaries, a pattern, and a delimiter. This allows you to define a custom case that behaves appropriately in the textcase.convert and textcase.converter.CaseConverter.convert functions:

from textcase import convert
from textcase.boundary import Boundary
from textcase.case import Case
from textcase.pattern import lower

# Define custom boundary
DOT = Boundary(
    satisfies=lambda text: text.startswith("."),
    length=1,
)

# Define custom case
DOT_CASE = Case(
    boundaries=(DOT,),
    pattern=lower,
    delimiter=".",
)

print(convert("Dot case var", DOT_CASE))  # dot.case.var

And because we defined boundary conditions, this means textcase.is_case should also behave as expected:

from textcase import is_case
from textcase.boundary import Boundary
from textcase.case import Case
from textcase.pattern import lower

# Define custom boundary
DOT = Boundary(
    satisfies=lambda text: text.startswith("."),
    length=1,
)

# Define custom case
DOT_CASE = Case(
    boundaries=(DOT,),
    pattern=lower,
    delimiter=".",
)

print(is_case("dot.case.var", DOT_CASE))  # True
print(is_case("Dot case var", DOT_CASE))  # False

Case conversion takes place in two steps. The first splits an identifier into a series of words, and the second joins the words back together. These steps are configured using the textcase.converter.CaseConverter.from_case and textcase.converter.CaseConverter.to_case methods, respectively.

CaseConverter is a class that encapsulates the boundaries used for splitting and the pattern and delimiter for mutating and joining. The convert method will apply the boundaries, pattern, and delimiter appropriately. This lets you define the parameters for case conversion upfront:

from textcase import CaseConverter, case, pattern

converter = CaseConverter()
converter.pattern = pattern.camel
converter.delimiter = "_"

print(converter.convert("My Special Case"))  # my_Special_Case

converter.from_case(case.CAMEL)
converter.to_case(case.SNAKE)

print(converter.convert("mySpecialCase"))  # my_special_case
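
The split, mutate, and join pipeline can be sketched in a few lines. Everything below is a hypothetical stand-in, not the library's code: SketchConverter models boundaries as a single regular expression, and the lower and camel functions are simplified versions of what a pattern might do.

```python
import re
from typing import Callable, List

Pattern = Callable[[List[str]], List[str]]


def lower(words: List[str]) -> List[str]:
    return [w.lower() for w in words]


def camel(words: List[str]) -> List[str]:
    # First word lowercase, every later word capitalized.
    return [w.lower() if i == 0 else w.capitalize() for i, w in enumerate(words)]


class SketchConverter:
    """Hypothetical stand-in for CaseConverter; boundaries are one regex here."""

    def __init__(self, boundaries: str, pattern: Pattern, delimiter: str) -> None:
        self.boundaries = boundaries
        self.pattern = pattern
        self.delimiter = delimiter

    def convert(self, text: str) -> str:
        words = [w for w in re.split(self.boundaries, text) if w]  # step 1: split
        return self.delimiter.join(self.pattern(words))            # step 2: mutate and join


# Camel pattern joined with "_" explains the mixed-looking result.
mixed = SketchConverter(r"[_\- ]+", camel, "_")
print(mixed.convert("My Special Case"))       # my_Special_Case

# Split on lower-to-upper transitions, then lowercase and join with "_".
camel_to_snake = SketchConverter(r"(?<=[a-z])(?=[A-Z])", lower, "_")
print(camel_to_snake.convert("mySpecialCase"))  # my_special_case
```

Keeping the pattern and the delimiter separate is what allows unusual combinations such as a camel pattern with an underscore delimiter.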

For more details on how strings are converted, see the docs for textcase.converter.CaseConverter.

Constants
textcase.boundary.UNDERSCORE: Splits on _, consuming the character on segmentation.
textcase.boundary.HYPHEN: Splits on -, consuming the character on segmentation.
textcase.boundary.SPACE: Splits on space, consuming the character on segmentation.
textcase.boundary.LOWER_UPPER: Splits where a lowercase letter is followed by an uppercase letter.
textcase.boundary.UPPER_LOWER: Splits where an uppercase letter is followed by a lowercase letter. This is seldom used.
textcase.boundary.ACRONYM: Splits where two uppercase letters are followed by a lowercase letter, identifying acronyms.
textcase.boundary.LOWER_DIGIT: Splits where a lowercase letter is followed by a digit.
textcase.boundary.UPPER_DIGIT: Splits where an uppercase letter is followed by a digit.
textcase.boundary.DIGIT_LOWER: Splits where a digit is followed by a lowercase letter.
textcase.boundary.DIGIT_UPPER: Splits where a digit is followed by an uppercase letter.
textcase.boundary.DEFAULT_BOUNDARIES: Default boundaries used for splitting strings into words, including underscores, hyphens, spaces, and capitalization changes.
textcase.case.SNAKE: Snake case strings are delimited by underscores _ and are all lowercase.
textcase.case.CONSTANT: Constant case strings are delimited by underscores _ and are all uppercase.
textcase.case.KEBAB: Kebab case strings are delimited by hyphens - and are all lowercase.
textcase.case.CAMEL: Camel case strings are lowercase, except that the first letter of every word after the first is capitalized.
textcase.case.PASCAL: Pascal case strings are lowercase, except that the first letter of every word is capitalized.
textcase.case.LOWER: Lowercase strings are delimited by spaces and all characters are lowercase.
textcase.case.UPPER: Uppercase strings are delimited by spaces and all characters are uppercase.
textcase.case.TITLE: Title case strings are delimited by spaces. Only the leading character of each word is uppercase.
textcase.case.SENTENCE: Sentence case strings are delimited by spaces. Only the leading character of the first word is uppercase.