Pandas 3.0

Original link: https://pandas.pydata.org/community/blog/pandas-3.0.html

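## pandas 3.0.0 released: major updates and breaking changes

pandas 3.0.0 is now available. It brings notable performance and consistency improvements, but also includes potentially breaking changes. The main updates include a **dedicated string data type (`str`) as the default**, replacing the old `object` dtype for strings, which improves performance and type safety (best speed requires the optional `pyarrow` package).

Another major change is **Copy-on-Write (CoW)**, which gives predictable copy/view behaviour and eliminates the `SettingWithCopyWarning`. As a result, chained assignment no longer works and modifications must be done in one step (`.loc` is recommended). In addition, the default datetime resolution is now microseconds, and the new `pd.col` syntax simplifies creating callables in `DataFrame.assign`.

**Upgrade with care:** functionality deprecated in earlier releases has been removed. It is recommended to upgrade to 2.3 first and resolve any warnings before migrating to 3.0. See the release notes for the full list of backwards-incompatible changes and for the migration guides for the string dtype and CoW.

Install with `pip install --upgrade pandas==3.0.*` or `conda install -c conda-forge pandas=3.0`.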


Original article

We're excited to announce the release of pandas 3.0.0. This major long-awaited release brings significant improvements to pandas, but also features some potentially breaking changes.

Highlights of pandas 3.0

pandas 3.0 introduces several major enhancements:

  • Dedicated string data type by default: string columns are now inferred as the new str dtype instead of object, providing better performance and type safety
  • Consistent copy/view behaviour with Copy-on-Write (CoW) (a.k.a. getting rid of the SettingWithCopyWarning): more predictable and consistent behavior for all operations, with improved performance through avoiding unnecessary copies
  • New default resolution for datetime-like data: no longer defaulting to nanoseconds, but generally microseconds (or the resolution of the input), when constructing datetime or timedelta data (avoiding out-of-bounds errors for dates with a year before 1678 or after 2262)
  • New pd.col syntax: initial support for pd.col() as a simplified syntax for creating callables in DataFrame.assign
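
The year limits quoted in the datetime highlight above (1678 and 2262) follow from counting nanoseconds since 1970 in a signed 64-bit integer. The arithmetic can be sketched with the standard library alone; this is an illustration of the range calculation, not pandas code, and the names `EPOCH`, `INT64_MAX`, and `SECONDS_PER_YEAR` are introduced here for the example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest tick count a signed 64-bit integer can hold

# Nanosecond ticks: convert to whole microseconds, because the standard
# library's timedelta has no nanosecond precision.
ns_upper = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
ns_lower = EPOCH - timedelta(microseconds=INT64_MAX // 1000)

# Microsecond ticks: the range exceeds what datetime can represent
# (years 1 to 9999), so only estimate the span in years on each side of 1970.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60
us_span_years = INT64_MAX / 1_000_000 / SECONDS_PER_YEAR

print("nanosecond range:", ns_lower.year, "to", ns_upper.year)
print("microsecond span in years (each direction):", round(us_span_years))
```

With nanosecond ticks the representable range ends in 1677 and 2262, while microsecond ticks extend it by a factor of 1000, which is why the new default avoids out-of-bounds errors for most historical dates.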

Further, pandas 3.0 includes a lot of other improvements and bug fixes. You can find the complete list of changes in the release notes.
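
To make the pd.col highlight listed above more concrete, the following sketch compares it with the callable form it is meant to simplify. It assumes that arithmetic between `pd.col` expressions is supported, and it guards the call with `hasattr` because `pd.col` is not available before pandas 3.0; the column names and the fallback branch are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Callable form: works on pandas 2.x and 3.0.
with_lambda = df.assign(total=lambda d: d["a"] + d["b"])

# Expression form: pd.col only exists in pandas 3.0+, so guard the call.
if hasattr(pd, "col"):
    with_col = df.assign(total=pd.col("a") + pd.col("b"))
else:
    # Fallback for older pandas; this branch does not exercise pd.col.
    with_col = with_lambda

print(with_col)
```

Both forms defer the column lookup until `assign` runs, so they also work in the middle of a method chain where the intermediate DataFrame has no name.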

Upgrading to pandas 3.0

The pandas 3.0 release removed functionality that was deprecated in previous releases (see here for an overview). It is recommended to first upgrade to pandas 2.3 and to ensure your code is working without warnings, before upgrading to pandas 3.0.
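
One way to check that code runs "without warnings" on pandas 2.3 is to escalate deprecation-style warnings to errors while exercising it. The sketch below uses only the standard `warnings` module; `legacy_call` is a hypothetical stand-in for your own code, and the assumption that the relevant warnings are `FutureWarning` or `DeprecationWarning` should be checked against what your code actually emits.

```python
import warnings


def legacy_call():
    # Hypothetical stand-in for code that triggers a deprecation in pandas 2.3.
    warnings.warn("this behaviour will change in a future version", FutureWarning)
    return "ok"


def run_strict(func):
    """Run func with deprecation-style warnings escalated to exceptions."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=FutureWarning)
        warnings.simplefilter("error", category=DeprecationWarning)
        return func()


try:
    run_strict(legacy_call)
    needs_fixing = False
except FutureWarning as exc:
    needs_fixing = True
    print(f"fix before upgrading: {exc}")
```

A test suite can get a similar effect with a warning filter such as `pytest -W error::FutureWarning`.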

Further, as a major release, pandas 3.0 includes some breaking changes that may require updates to your code. The two most significant changes are the new string dtype and the copy/view behaviour changes, detailed below. An overview of all potentially breaking changes can be found in the Backwards incompatible API changes section of the release notes.

1. Dedicated string data type by default

Starting with pandas 3.0, string columns are automatically inferred as str dtype instead of the numpy object (which can store any Python object).

Example:

# Old behavior (pandas < 3.0)
>>> import pandas as pd
>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: object  # <-- numpy object dtype

# New behavior (pandas 3.0)
>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: str  # <-- new string dtype

This change improves performance and type safety, but may require code updates, especially for library code that currently looks for "object" dtype when expecting string data.
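
For library code that currently compares the dtype against "object", one possible version-independent approach is to inspect the values instead of the dtype name. The helper below is a sketch, not the approach prescribed by the migration guide; `is_text_column` is a hypothetical name, and it relies on `pd.api.types.infer_dtype` reporting `"string"` for string values.

```python
import pandas as pd


def is_text_column(ser: pd.Series) -> bool:
    """Return True when the values are strings, whichever dtype stores them."""
    # infer_dtype looks at the values rather than comparing the dtype to "object".
    return pd.api.types.infer_dtype(ser, skipna=True) == "string"


names = pd.Series(["a", "b"])
counts = pd.Series([1, 2])

print(is_text_column(names), is_text_column(counts))
```

Checks written as `ser.dtype == object` are the ones most likely to change behaviour after the upgrade, since string columns no longer match them.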

For more details, see the migration guide for the new string data type.

This new data type will use the pyarrow library under the hood, if installed, to provide the performance improvements. We therefore strongly recommend installing pyarrow alongside pandas (pyarrow is not a required dependency and is not installed by default).
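
Because pyarrow is optional, it can be useful to check at startup whether it is present in the current environment. This is a minimal sketch using the standard library; `has_pyarrow` is a name introduced here, and the check only tells you whether the package can be found, not which code path pandas will pick.

```python
import importlib.util


def has_pyarrow() -> bool:
    """Report whether the optional pyarrow package can be found."""
    return importlib.util.find_spec("pyarrow") is not None


if has_pyarrow():
    print("pyarrow found")
else:
    print("pyarrow not installed; consider installing it alongside pandas")
```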

2. Consistent copy/view behaviour with Copy-on-Write (CoW)

Copy-on-Write is now the default and only mode in pandas 3.0. This makes behavior more consistent and predictable, and avoids a lot of defensive copying (improving performance), but requires updates to certain coding patterns.

The most impactful change is that chained assignment will no longer work. As a result, the SettingWithCopyWarning is also removed (since there is no longer any ambiguity about whether it would work), and defensive .copy() calls to silence the warning are no longer needed.

Example:

# Old behavior (pandas < 3.0) - chained assignment
df["foo"][df["bar"] > 5] = 100  # This might modify df (unpredictable)

# New behavior (pandas 3.0) - must do the modification in one step (e.g. with .loc)
df.loc[df["bar"] > 5, "foo"] = 100

In general, any result of an indexing operation or method now always behaves as if it were a copy, so modifications of the result won't affect the original DataFrame.
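
The "behaves as if it were a copy" rule can be illustrated with a small sketch. It assumes a pandas version that has the `mode.copy_on_write` option (the code switches it on for 2.x and leaves it untouched on 3.0, where Copy-on-Write is always active); the DataFrame contents are made up for the example.

```python
import pandas as pd

# Assumption: on pandas 2.x Copy-on-Write must be enabled explicitly;
# on 3.0 it is the only mode, so the option is not touched there.
if int(pd.__version__.split(".")[0]) < 3:
    pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 6, 8]})

# Selecting a column gives an object that behaves like a copy.
foo = df["foo"]
foo.iloc[0] = -1                   # changes only `foo`
original_first = df.loc[0, "foo"]  # the DataFrame still holds the old value

# To change the DataFrame itself, do the modification in one step.
df.loc[df["bar"] > 5, "foo"] = 100

print(df)
print(foo.tolist())
```

Under Copy-on-Write the data is only physically copied when `foo` is first written to, which is where the performance benefit over defensive copies comes from.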

For more details, see the Copy-on-Write migration guide.

Obtaining pandas 3.0

You can install the latest pandas 3.0 release from PyPI:

python -m pip install --upgrade pandas==3.0.*

Or from conda-forge using conda/mamba:

conda install -c conda-forge pandas=3.0

Running into an issue or regression?

Please report any problem you encounter with the release on the pandas issue tracker.

Thanks to all the contributors who made this release possible!
