設定和排序 MultiIndex
此示例顯示如何使用列資料在 pandas.DataFrame
中設定 MultiIndex
。
In [1]: df = pd.DataFrame([['one', 'A', 100], ['two', 'A', 101], ['three', 'A', 102],
...: ['one', 'B', 103], ['two', 'B', 104], ['three', 'B', 105]],
...: columns=['c1', 'c2', 'c3'])
In [2]: df
Out[2]:
c1 c2 c3
0 one A 100
1 two A 101
2 three A 102
3 one B 103
4 two B 104
5 three B 105
In [3]: df.set_index(['c1', 'c2'])
Out[3]:
c3
c1 c2
one A 100
two A 101
three A 102
one B 103
two B 104
three B 105
你可以在設定後立即對索引進行排序:
In [4]: df.set_index(['c1', 'c2']).sort_index()
Out[4]:
c3
c1 c2
one A 100
B 103
three A 102
B 105
two A 101
B 104
具有排序索引將導致第一級上的查詢效率略高:
In [5]: df_01 = df.set_index(['c1', 'c2'])
In [6]: %timeit df_01.loc['one']
1000 loops, best of 3: 607 µs per loop
In [7]: df_02 = df.set_index(['c1', 'c2']).sort_index()
In [8]: %timeit df_02.loc['one']
1000 loops, best of 3: 413 µs per loop
設定索引後,你可以對特定記錄或記錄組執行查詢:
In [9]: df_indexed = df.set_index(['c1', 'c2']).sort_index()
In [10]: df_indexed.loc['one']
Out[10]:
c3
c2
A 100
B 103
In [11]: df_indexed.loc['one', 'A']
Out[11]:
c3 100
Name: (one, A), dtype: int64
In [12]: df_indexed.xs((slice(None), 'A'))
Out[12]:
c3
c1
one 100
three 102
two 101