[pandas library] Summary of Commonly Used Functions
Contents
1. pd.read_csv() 2. DataFrame.drop() 3. pd.get_dummies()
Official pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
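1. pd.read_csv()
pd.read_csv() reads a CSV (Comma Separated Values) file and converts it into a DataFrame object. CSV is a common data storage format in which data is stored as plain text: each line represents one record, and the fields within a line are separated by commas (or another delimiter). Basic usage: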
pd.read_csv(file_path, sep)
1) file_path: path to the file
2) sep: the delimiter used in the CSV file; defaults to a comma
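A minimal sketch of the basic call described above. To keep it self-contained, the CSV text is read from an in-memory buffer instead of a file on disk; the column names and values are invented for illustration, and a semicolon is assumed as the delimiter to show what sep does.

```python
import io

import pandas as pd

# Hypothetical CSV content that uses ';' instead of ',' as the delimiter.
csv_text = "name;score\nalice;90\nbob;85\n"

# read_csv accepts a path or any file-like object; sep tells it how fields are split.
df = pd.read_csv(io.StringIO(csv_text), sep=";")

print(df)
print(df.dtypes)
```

With a real file, the first argument would simply be the path string, for example pd.read_csv("data.csv", sep=";"), where data.csv is a placeholder name.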
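For more advanced usage, see https://blog.csdn.net/weixin_47139649/article/details/126744842
The signature below comes from a type stub for an older pandas release. In pandas itself the first parameter is named filepath_or_buffer, and squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines were removed in pandas 2.0.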
read_csv(
reader: FilePathOrBuffer, *,
sep: str = ...,
delimiter: str | None = ...,
header: int | Sequence[int] | str = ...,
names: Sequence[str] | None = ...,
index_col: int | str | Sequence | Literal[False] | None = ...,
usecols: int | str | Sequence | None = ...,
squeeze: bool = ...,
prefix: str | None = ...,
mangle_dupe_cols: bool = ...,
dtype: str | Mapping[str, Any] | None = ...,
engine: str | None = ...,
converters: Mapping[int | str, (*args, **kwargs) -> Any] | None = ...,
true_values: Sequence[Scalar] | None = ...,
false_values: Sequence[Scalar] | None = ...,
skipinitialspace: bool = ...,
skiprows: Sequence | int | (*args, **kwargs) -> Any | None = ...,
skipfooter: int = ...,
nrows: int | None = ...,
na_values=...,
keep_default_na: bool = ...,
na_filter: bool = ...,
verbose: bool = ...,
skip_blank_lines: bool = ...,
parse_dates: bool | List[int] | List[str] = ...,
infer_datetime_format: bool = ...,
keep_date_col: bool = ...,
date_parser: (*args, **kwargs) -> Any | None = ...,
dayfirst: bool = ...,
cache_dates: bool = ...,
iterator: Literal[True],
chunksize: int | None = ...,
compression: str | None = ...,
thousands: str | None = ...,
decimal: str | None = ...,
lineterminator: str | None = ...,
quotechar: str = ...,
quoting: int = ...,
doublequote: bool = ...,
escapechar: str | None = ...,
comment: str | None = ...,
encoding: str | None = ...,
dialect: str | None = ...,
error_bad_lines: bool = ...,
warn_bad_lines: bool = ...,
delim_whitespace: bool = ...,
low_memory: bool = ...,
memory_map: bool = ...,
float_precision: str | None = ...)
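Only a handful of these parameters are needed in everyday use. The following sketch combines a few of them (usecols, index_col, parse_dates, nrows); the data and column names are made up for illustration, and the text is again read from an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

# Hypothetical data: "NA" is one of the strings pandas treats as missing by default.
csv_text = (
    "id,date,price,note\n"
    "1,2024-01-01,10.5,ok\n"
    "2,2024-01-02,NA,missing\n"
    "3,2024-01-03,12.0,ok\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "date", "price"],  # load only these columns
    index_col="id",                   # use the id column as the row index
    parse_dates=["date"],             # convert the date column to datetime
    nrows=2,                          # read only the first two data rows
)

print(df)
print(df.dtypes)
```

Restricting usecols and nrows is a cheap way to inspect a large file before loading all of it.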
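2. DataFrame.drop()
Removes the specified rows or columns from a DataFrame (Series.drop() likewise removes elements from a Series by index label).
DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
1) labels: the column names or row index labels to drop; either a single label (int/str) or a list
2) axis: whether to drop rows or columns; 0 or 'index' drops rows, 1 or 'columns' drops columns
3) index: the row index labels to drop (index=labels is equivalent to labels, axis=0)
4) columns: the column names to drop (columns=labels is equivalent to labels, axis=1)
5) inplace: bool; True modifies the DataFrame in place (and returns None), False returns a new DataFrame; defaults to False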
For example:
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
# Drop column 'A'
df_dropped = df.drop('A', axis=1)
# This is equivalent to the following
df_dropped_equiv = df.drop(columns='A')
# Drop the row whose index label is 1
df_dropped_row = df.drop(1, axis=0)
# This is equivalent to the following
df_dropped_row_equiv = df.drop(index=1)
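The inplace and errors parameters are easiest to understand by running them. The sketch below reuses the same small DataFrame; the variable names (only_a, unchanged, result) and the non-existent column 'Z' are chosen purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Drop several columns at once by passing a list of labels.
only_a = df.drop(columns=["B", "C"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
unchanged = df.drop(columns=["Z"], errors="ignore")

# inplace=True modifies df itself, so the call returns None.
result = df.drop(index=[0, 2], inplace=True)

print(only_a)
print(unchanged)
print(result)
print(df)
```

Because inplace=True returns None, writing df = df.drop(..., inplace=True) would overwrite df with None; either assign the returned copy or use inplace=True, not both.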
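3. pd.get_dummies()
pd.get_dummies() converts categorical variables into one-hot (dummy) variables, that is, it performs one-hot encoding. It is typically used in data preprocessing; in recommender systems, for example, categorical variables can be passed on to an embedding step after being one-hot encoded.
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
1) data: the categorical data to convert; may be a Series or a DataFrame
2) prefix: str; the prefix for the names of the generated columns, as in the following example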
For example:
import numpy as np
import pandas as pd
data = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8)
})
# dtype=int keeps the 0/1 output shown here; pandas 2.0+ returns True/False by default
dummy_data = pd.get_dummies(data['A'], prefix='A', dtype=int)
'''
The resulting dummy_data is:
   A_bar  A_foo
0      0      1
1      1      0
2      0      1
3      1      0
4      0      1
5      1      0
6      0      1
7      0      1
'''
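The remaining parameters in the signature (columns, drop_first, dummy_na) can be illustrated with a short sketch. The DataFrame and its column names here are invented for the example, and dtype=int is passed only so that the output is 0/1 regardless of the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", None, "red"],
    "size": ["S", "M", "S", "L"],
    "price": [1.0, 2.0, 3.0, 4.0],
})

# columns= encodes only the listed columns and leaves the others untouched.
encoded = pd.get_dummies(df, columns=["size"], dtype=int)

# drop_first=True drops the first category level, giving k-1 columns for k categories.
reduced = pd.get_dummies(df["size"], drop_first=True, dtype=int)

# dummy_na=True adds an extra indicator column for missing values.
with_na = pd.get_dummies(df["color"], dummy_na=True, dtype=int)

print(encoded)
print(reduced)
print(with_na)
```

drop_first is commonly used to avoid perfectly collinear columns in linear models, since the dropped level is implied when all other indicators are 0.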