17  A Quick Introduction to pandas

These lecture notes are based on the official pandas tutorial 10 Minutes to pandas, with annotations and explanations adapted for Chinese-speaking readers.

17.1 Contents

  • Object creation
  • Viewing data
  • Selection
  • Missing data
  • Operations
  • Apply
  • Histogramming
  • String methods
  • Merging
  • Grouping
  • Reshaping
  • Time series
  • Categoricals
  • Plotting
  • Getting data in/out

17.2 Importing pandas and numpy

import pandas as pd
import numpy as np

17.3 Object creation

17.3.1 Creating a Series

s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

17.3.2 Creating a DataFrame

dates = pd.date_range("20130101", periods=6)
dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list("ABCD"))
df
                   A         B         C         D
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496
2013-01-02  0.312901  1.507186 -0.652097  1.137791
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387
2013-01-04 -0.488614 -0.179749  0.023712  1.182711
2013-01-05  0.237155  0.709768 -0.467694 -0.855296
2013-01-06 -0.257989 -1.401546  1.692584 -0.928318

17.3.3 Creating a DataFrame from a dict

df2 = pd.DataFrame({
    "A": 1.,
    "B": pd.Timestamp('20130102'),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo"
})
df2
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

17.3.4 Inspecting column dtypes

df2.dtypes
A          float64
B    datetime64[s]
C          float32
D            int32
E         category
F           object
dtype: object
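
A column can be converted to another dtype with astype when needed; a minimal sketch (the float64 target here is just an illustration):

df2["D"].astype("float64")
0    3.0
1    3.0
2    3.0
3    3.0
Name: D, dtype: float64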

17.4 Viewing data

17.4.1 Viewing the top and bottom rows

df.head()
                   A         B         C         D
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496
2013-01-02  0.312901  1.507186 -0.652097  1.137791
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387
2013-01-04 -0.488614 -0.179749  0.023712  1.182711
2013-01-05  0.237155  0.709768 -0.467694 -0.855296
df.tail(3)
                   A         B         C         D
2013-01-04 -0.488614 -0.179749  0.023712  1.182711
2013-01-05  0.237155  0.709768 -0.467694 -0.855296
2013-01-06 -0.257989 -1.401546  1.692584 -0.928318

17.4.2 Displaying the index, columns, and underlying numpy data

df.index
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
df.to_numpy()
array([[-0.52058527,  0.48153608, -0.35133101, -1.36249633],
       [ 0.31290141,  1.50718565, -0.65209664,  1.13779112],
       [-0.87323811, -1.8934827 ,  1.22485201, -0.11938684],
       [-0.48861365, -0.17974916,  0.02371156,  1.18271133],
       [ 0.23715514,  0.70976824, -0.46769407, -0.8552958 ],
       [-0.25798935, -1.40154591,  1.69258428, -0.92831789]])
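
Note that to_numpy() does not include the index or column labels in its output. For df, whose four columns share a single float64 dtype, the conversion is cheap; for the mixed-dtype df2 it has to fall back to a Python object array:

df2.to_numpy()
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)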

17.4.3 Summary statistics

df.describe()
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean  -0.265062 -0.129381  0.245004 -0.157499
std    0.462917  1.302976  0.977027  1.096220
min   -0.873238 -1.893483 -0.652097 -1.362496
25%   -0.512592 -1.096097 -0.438603 -0.910062
50%   -0.373301  0.150893 -0.163810 -0.487341
75%    0.113369  0.652710  0.924567  0.823497
max    0.312901  1.507186  1.692584  1.182711

17.4.4 Transposing the data

df.T
   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
A   -0.520585    0.312901   -0.873238   -0.488614    0.237155   -0.257989
B    0.481536    1.507186   -1.893483   -0.179749    0.709768   -1.401546
C   -0.351331   -0.652097    1.224852    0.023712   -0.467694    1.692584
D   -1.362496    1.137791   -0.119387    1.182711   -0.855296   -0.928318

17.4.5 Sorting by axis and by values

df.sort_index(axis=1, ascending=False)
                   D         C         B         A
2013-01-01 -1.362496 -0.351331  0.481536 -0.520585
2013-01-02  1.137791 -0.652097  1.507186  0.312901
2013-01-03 -0.119387  1.224852 -1.893483 -0.873238
2013-01-04  1.182711  0.023712 -0.179749 -0.488614
2013-01-05 -0.855296 -0.467694  0.709768  0.237155
2013-01-06 -0.928318  1.692584 -1.401546 -0.257989
df.sort_values(by="B")
                   A         B         C         D
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387
2013-01-06 -0.257989 -1.401546  1.692584 -0.928318
2013-01-04 -0.488614 -0.179749  0.023712  1.182711
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496
2013-01-05  0.237155  0.709768 -0.467694 -0.855296
2013-01-02  0.312901  1.507186 -0.652097  1.137791

17.5 Selection

17.5.1 Selecting a single column

df["A"]
2013-01-01   -0.520585
2013-01-02    0.312901
2013-01-03   -0.873238
2013-01-04   -0.488614
2013-01-05    0.237155
2013-01-06   -0.257989
Freq: D, Name: A, dtype: float64

17.5.2 Selecting rows by slicing (row labels)

df[0:3]
                   A         B         C         D
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496
2013-01-02  0.312901  1.507186 -0.652097  1.137791
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387
df["20130102":"20130104"]
                   A         B         C         D
2013-01-02  0.312901  1.507186 -0.652097  1.137791
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387
2013-01-04 -0.488614 -0.179749  0.023712  1.182711

17.5.3 Selection by label (loc)

  • Select rows and columns precisely by label
df.loc[dates[0]]
A   -0.520585
B    0.481536
C   -0.351331
D   -1.362496
Name: 2013-01-01 00:00:00, dtype: float64
df.loc[:, ["A", "B"]]
                   A         B
2013-01-01 -0.520585  0.481536
2013-01-02  0.312901  1.507186
2013-01-03 -0.873238 -1.893483
2013-01-04 -0.488614 -0.179749
2013-01-05  0.237155  0.709768
2013-01-06 -0.257989 -1.401546
df.loc["20130102":"20130104", ["A", "B"]]
                   A         B
2013-01-02  0.312901  1.507186
2013-01-03 -0.873238 -1.893483
2013-01-04 -0.488614 -0.179749
df.loc[dates[0], ["A", "B"]]
A   -0.520585
B    0.481536
Name: 2013-01-01 00:00:00, dtype: float64
df.loc[dates[0], "A"]
-0.5205852732718299
df.at[dates[0], "A"]
-0.5205852732718299
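
Note that label slices such as df.loc["20130102":"20130104"] include both endpoints, unlike Python-style positional slices, which exclude the stop. A small comparison (iloc is introduced in the next subsection):

df.loc["20130102":"20130104"]   # label slice: both endpoints included, 3 rows
df.iloc[1:4]                    # the same 3 rows by position: stop excluded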

17.5.4 Selection by position (iloc)

  • Similar to numpy-style integer-position indexing
df.iloc[3]
A   -0.488614
B   -0.179749
C    0.023712
D    1.182711
Name: 2013-01-04 00:00:00, dtype: float64
df.iloc[3:5, 0:2]
                   A         B
2013-01-04 -0.488614 -0.179749
2013-01-05  0.237155  0.709768
df.iloc[[1,2,4], [0,2]]
                   A         C
2013-01-02  0.312901 -0.652097
2013-01-03 -0.873238  1.224852
2013-01-05  0.237155 -0.467694
df.iloc[1:3, :]
df.iloc[:, 1:3]
                   B         C
2013-01-01  0.481536 -0.351331
2013-01-02  1.507186 -0.652097
2013-01-03 -1.893483  1.224852
2013-01-04 -0.179749  0.023712
2013-01-05  0.709768 -0.467694
2013-01-06 -1.401546  1.692584
df.iat[1, 1]
1.5071856526966947

17.5.5 Boolean indexing

  • Filter rows by a boolean condition (combining several conditions is sketched at the end of this subsection)
df[df["A"] > 0]
                   A         B         C         D
2013-01-02  0.312901  1.507186 -0.652097  1.137791
2013-01-05  0.237155  0.709768 -0.467694 -0.855296
df[df > 0]
                   A         B         C         D
2013-01-01       NaN  0.481536       NaN       NaN
2013-01-02  0.312901  1.507186       NaN  1.137791
2013-01-03       NaN       NaN  1.224852       NaN
2013-01-04       NaN       NaN  0.023712  1.182711
2013-01-05  0.237155  0.709768       NaN       NaN
2013-01-06       NaN       NaN  1.692584       NaN
df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]
df2
                   A         B         C         D      E
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496    one
2013-01-02  0.312901  1.507186 -0.652097  1.137791    one
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387    two
2013-01-04 -0.488614 -0.179749  0.023712  1.182711  three
2013-01-05  0.237155  0.709768 -0.467694 -0.855296   four
2013-01-06 -0.257989 -1.401546  1.692584 -0.928318  three
df2[df2["E"].isin(["two", "four"])]
                   A         B         C         D     E
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387   two
2013-01-05  0.237155  0.709768 -0.467694 -0.855296  four
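
Several conditions can be combined with the element-wise operators & and |, wrapping each condition in parentheses; a minimal sketch:

df[(df["A"] > 0) & (df["D"] > 0)]
                   A         B         C         D
2013-01-02  0.312901  1.507186 -0.652097  1.137791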

17.6 Missing data

pandas uses np.nan to represent missing data. Rows containing missing values can be dropped with dropna, or the missing entries can be filled with fillna.

df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1.loc[dates[0]:dates[1], "E"] = 1
df1
                   A         B         C         D    E
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496  1.0
2013-01-02  0.312901  1.507186 -0.652097  1.137791  1.0
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387  NaN
2013-01-04 -0.488614 -0.179749  0.023712  1.182711  NaN
df1.dropna(how="any")
                   A         B         C         D    E
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496  1.0
2013-01-02  0.312901  1.507186 -0.652097  1.137791  1.0
df1.fillna(value=5)
                   A         B         C         D    E
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496  1.0
2013-01-02  0.312901  1.507186 -0.652097  1.137791  1.0
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387  5.0
2013-01-04 -0.488614 -0.179749  0.023712  1.182711  5.0
pd.isna(df1)
                A      B      C      D      E
2013-01-01  False  False  False  False  False
2013-01-02  False  False  False  False  False
2013-01-03  False  False  False  False   True
2013-01-04  False  False  False  False   True
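
Besides filling with a constant, missing values can also be propagated forward from the last valid observation; a small sketch using ffill:

df1.ffill()
                   A         B         C         D    E
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496  1.0
2013-01-02  0.312901  1.507186 -0.652097  1.137791  1.0
2013-01-03 -0.873238 -1.893483  1.224852 -0.119387  1.0
2013-01-04 -0.488614 -0.179749  0.023712  1.182711  1.0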

17.7 Operations

pandas supports numpy-style arithmetic operations and automatically aligns operands on their indexes.

df.mean()
A   -0.265062
B   -0.129381
C    0.245004
D   -0.157499
dtype: float64
df.mean(axis=1)
2013-01-01   -0.438219
2013-01-02    0.576445
2013-01-03   -0.415314
2013-01-04    0.134515
2013-01-05   -0.094017
2013-01-06   -0.223817
Freq: D, dtype: float64
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
df.sub(s, axis="index")
                   A         B         C         D
2013-01-01       NaN       NaN       NaN       NaN
2013-01-02       NaN       NaN       NaN       NaN
2013-01-03 -1.873238 -2.893483  0.224852 -1.119387
2013-01-04 -3.488614 -3.179749 -2.976288 -1.817289
2013-01-05 -4.762845 -4.290232 -5.467694 -5.855296
2013-01-06       NaN       NaN       NaN       NaN
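
The automatic index alignment mentioned at the start of this section is easiest to see with two Series whose indexes only partially overlap; labels present in only one operand produce NaN:

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
s1 + s2
a     NaN
b    12.0
c    23.0
d     NaN
dtype: float64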

17.8 Apply

  • Apply a user-defined function to each column or row (a row-wise sketch follows the examples below)
df.apply(np.cumsum)
                   A         B         C         D
2013-01-01 -0.520585  0.481536 -0.351331 -1.362496
2013-01-02 -0.207684  1.988722 -1.003428 -0.224705
2013-01-03 -1.080922  0.095239  0.221424 -0.344092
2013-01-04 -1.569536 -0.084510  0.245136  0.838619
2013-01-05 -1.332380  0.625258 -0.222558 -0.016677
2013-01-06 -1.590370 -0.776288  1.470026 -0.944994
df.apply(lambda x: x.max() - x.min())
A    1.186140
B    3.400668
C    2.344681
D    2.545208
dtype: float64
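
apply works column-wise by default; passing axis=1 applies the function to each row instead. A small sketch (the particular combination of columns is arbitrary):

df.apply(lambda row: row["A"] - row["D"], axis=1)
2013-01-01    0.841911
2013-01-02   -0.824890
2013-01-03   -0.753851
2013-01-04   -1.671325
2013-01-05    1.092451
2013-01-06    0.670329
Freq: D, dtype: float64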

17.9 Histogramming

s = pd.Series(np.random.randint(0, 7, size=10))
s.value_counts()
6    5
5    2
3    1
1    1
4    1
Name: count, dtype: int64
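
Passing normalize=True reports relative frequencies instead of raw counts; a minimal sketch (in recent pandas versions the result is named proportion):

s.value_counts(normalize=True)
6    0.5
5    0.2
3    0.1
1    0.1
4    0.1
Name: proportion, dtype: float64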

17.10 String methods

  • Series string methods are vectorized and propagate missing values automatically.
s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
s.str.lower()
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
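
Other vectorized string methods follow the same pattern; str.contains, for example, also propagates NaN unless a fill value is supplied through the na parameter:

s.str.contains("a", na=False)
0    False
1    False
2    False
3     True
4     True
5    False
6    False
7    False
8     True
dtype: bool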

17.11 Merging

17.11.1 Concatenation

df = pd.DataFrame(np.random.randn(10, 4))
pieces = [df[:3], df[3:7], df[7:]]
pd.concat(pieces)
          0         1         2         3
0  0.149451 -1.716458  1.540956 -1.478438
1 -0.788580  0.942535  0.860695  1.277974
2  1.529843  0.978873  1.681723  0.128001
3 -1.031891 -0.645625 -0.101714 -0.894796
4 -1.341019 -0.400039  0.293592 -1.754958
5 -0.018467 -0.688119  0.063590  0.081667
6  0.042298  0.433534  0.509931  0.708851
7  0.631839  1.985047  0.612073  0.114068
8  1.433401 -0.619146  1.581860 -1.031054
9  1.029629  0.266897 -0.649012  0.456147

17.11.2 merge (SQL-style join)

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
pd.merge(left, right, on='key')
   key  lval  rval
0  foo     1     4
1  bar     2     5
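
When the key columns do not match exactly, the how parameter controls which rows are kept, mirroring SQL's join types; a small sketch with a left join on hypothetical frames:

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'baz'], 'rval': [4, 5]})
pd.merge(left, right, on='key', how='left')
   key  lval  rval
0  foo     1   4.0
1  bar     2   NaN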

17.12 Grouping with groupby

  • Split-apply-combine: split the data into groups, apply a function to each group, and combine the results
  • Statistics, aggregation, and transformation (an agg sketch follows the examples below)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8)
})
df.groupby('A').sum()
                     B         C         D
A
bar        onethreetwo -2.510037  1.483046
foo  onetwotwoonethree  2.931827 -2.711364
df.groupby(['A', 'B']).sum()
                  C         D
A   B
bar one   -1.279562 -0.243084
    three -0.183928 -0.697683
    two   -1.046547  2.423813
foo one    1.735463 -3.088498
    three  0.289137  0.215364
    two    0.907227  0.161770
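
As mentioned in the bullet list above, aggregation is not limited to sum: agg applies one or several functions per group. A minimal sketch (output omitted; it has a two-level column index with a mean and a sum sub-column for each of C and D):

df.groupby('A')[['C', 'D']].agg(['mean', 'sum'])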

17.13 Reshaping

  • Stack/unstack
  • Pivot tables
tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
                    'foo', 'foo', 'qux', 'qux'],
                   ['one', 'two', 'one', 'two',
                    'one', 'two', 'one', 'two']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
df2 = df[:4]
df2
                     A         B
first second
bar   one     1.185040  0.718365
      two     0.680732  0.058235
baz   one     1.726562 -0.828623
      two    -0.362360  0.716046
stacked = df2.stack()
stacked
first  second   
bar    one     A    1.185040
               B    0.718365
       two     A    0.680732
               B    0.058235
baz    one     A    1.726562
               B   -0.828623
       two     A   -0.362360
               B    0.716046
dtype: float64
stacked.unstack()
                     A         B
first second
bar   one     1.185040  0.718365
      two     0.680732  0.058235
baz   one     1.726562 -0.828623
      two    -0.362360  0.716046
stacked.unstack(1)
second        one       two
first
bar   A  1.185040  0.680732
      B  0.718365  0.058235
baz   A  1.726562 -0.362360
      B -0.828623  0.716046
stacked.unstack(0)
first          bar       baz
second
one    A  1.185040  1.726562
       B  0.718365 -0.828623
two    A  0.680732 -0.362360
       B  0.058235  0.716046

17.13.1 Pivot tables

df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 3,
    "B": ["A", "B", "C"] * 4,
    "C": np.random.randn(12),
    "D": np.random.randn(12)
})
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
C       -1.269507 -1.163112 -0.359637 -0.353797 -0.100903  0.162224  0.385348  0.504784  0.538580  0.647929  1.090241  2.017582
A     B
one   A       NaN       NaN -0.374013       NaN  0.191885       NaN       NaN       NaN       NaN       NaN       NaN       NaN
      B       NaN       NaN       NaN       NaN       NaN       NaN -0.837664       NaN       NaN       NaN  -0.53223       NaN
      C  0.069504       NaN       NaN       NaN       NaN       NaN       NaN       NaN  1.184217       NaN       NaN       NaN
three A       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  0.025075       NaN       NaN
      B       NaN       NaN       NaN       NaN       NaN       NaN       NaN -0.050209       NaN       NaN       NaN       NaN
      C       NaN       NaN       NaN       NaN       NaN -0.813397       NaN       NaN       NaN       NaN       NaN       NaN
two   A       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  1.881732
      B       NaN -1.001614       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
      C       NaN       NaN       NaN -0.216438       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
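
Because C above contains continuous random numbers, every distinct value becomes its own column and the table is almost entirely NaN. In practice the columns argument is usually a low-cardinality (categorical) column; a hypothetical sketch (C2 is an invented column, output omitted):

df["C2"] = ["foo", "foo", "bar"] * 4   # hypothetical categorical column
pd.pivot_table(df, values="D", index=["A", "B"], columns=["C2"], aggfunc="mean")
# one column per distinct value of C2, indexed by the (A, B) pairs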

17.14 Time series

rng = pd.date_range('1/1/2012', periods=100, freq='s')
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.resample('5Min').sum()
2012-01-01    24577
Freq: 5min, dtype: int32
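
Any reduction can be applied to the resampled object, not only sum; with the same 100 one-second observations, the 5-minute mean is simply the sum above divided by 100 (a sketch):

ts.resample('5Min').mean()
2012-01-01    245.77
Freq: 5min, dtype: float64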
rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
ts = pd.Series(np.random.randn(len(rng)), rng)
ts_utc = ts.tz_localize('UTC')
ts_utc
2012-03-06 00:00:00+00:00   -0.960792
2012-03-07 00:00:00+00:00   -2.119477
2012-03-08 00:00:00+00:00   -0.038033
2012-03-09 00:00:00+00:00   -0.125605
2012-03-10 00:00:00+00:00   -0.893517
Freq: D, dtype: float64
ts_utc.tz_convert('US/Eastern')
2012-03-05 19:00:00-05:00   -0.960792
2012-03-06 19:00:00-05:00   -2.119477
2012-03-07 19:00:00-05:00   -0.038033
2012-03-08 19:00:00-05:00   -0.125605
2012-03-09 19:00:00-05:00   -0.893517
Freq: D, dtype: float64

17.14.1 Period representation and frequency conversion

rng = pd.date_range('1/1/2012', periods=5, freq='ME')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
2012-01-31   -0.054797
2012-02-29    1.442388
2012-03-31    0.846941
2012-04-30    0.750070
2012-05-31   -0.466189
Freq: ME, dtype: float64
ps = ts.to_period()
ps
2012-01   -0.054797
2012-02    1.442388
2012-03    0.846941
2012-04    0.750070
2012-05   -0.466189
Freq: M, dtype: float64
ps.to_timestamp()
2012-01-01   -0.054797
2012-02-01    1.442388
2012-03-01    0.846941
2012-04-01    0.750070
2012-05-01   -0.466189
Freq: MS, dtype: float64
prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
ts = pd.Series(np.random.randn(len(prng)), index=prng)
ts.head()
1990Q1   -0.275693
1990Q2   -1.237097
1990Q3    0.617560
1990Q4    1.263013
1991Q1   -1.938149
Freq: Q-NOV, dtype: float64
ts.index = ts.index.asfreq('M', how='end')
ts.head()
1990-02   -0.275693
1990-05   -1.237097
1990-08    0.617560
1990-11    1.263013
1991-02   -1.938149
Freq: M, dtype: float64

17.15 Categoricals

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = df["raw_grade"].astype("category")
df["grade"]
0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): ['a', 'b', 'e']
df["grade"] = df["grade"].cat.set_categories(["very good", "good", "very bad"])
df["grade"]
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
Name: grade, dtype: category
Categories (3, object): ['very good', 'good', 'very bad']
df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
df["grade"]
0    very good
1         good
2         good
3    very good
4    very good
5     very bad
Name: grade, dtype: category
Categories (5, object): ['very bad', 'bad', 'medium', 'good', 'very good']
df.sort_values(by="grade")
   id raw_grade      grade
5   6         e   very bad
1   2         b       good
2   3         b       good
0   1         a  very good
3   4         a  very good
4   5         a  very good
Note that sorting follows the order of the categories, not lexical order, which is why "very bad" comes first.
df.groupby("grade").size()
C:\Users\Administrator\AppData\Local\Temp\ipykernel_56080\3951628473.py:1: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  df.groupby("grade").size()
grade
very bad     0
bad          0
medium       0
good         0
very good    0
dtype: int64
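
The integer codes that back a categorical column are available through the cat accessor; with the five categories defined above, very bad maps to 0 and very good to 4 (a minimal sketch):

df["grade"].cat.codes
0    4
1    3
2    3
3    4
4    4
5    0
dtype: int8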

17.16 Plotting

pandas integrates with Matplotlib, so a quick chart can be produced directly with .plot().

import matplotlib.pyplot as plt
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
    ts = ts.cumsum()
    ts.plot()
    plt.show()

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list("ABCD"))
df = df.cumsum()
df.plot()
plt.show()
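
Other plot kinds are available through the same interface; for example, a histogram of a single column (a minimal sketch):

df["A"].plot.hist(bins=30)
plt.show()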

17.17 Getting data in/out

  • pandas can read and write many formats: CSV, HDF5, Excel, SQL, JSON, and more
# Write to a CSV file
df.to_csv('foo.csv')
# Read from a CSV file
pd.read_csv('foo.csv').head()
   Unnamed: 0         A         B         C         D
0  2000-01-01 -0.332295  0.167448  1.819971  0.262984
1  2000-01-02 -1.154339 -0.795782  1.476187 -0.507846
2  2000-01-03 -0.130460  0.149241  1.797390  0.417354
3  2000-01-04 -1.573087  0.423807  1.720803 -0.492286
4  2000-01-05 -1.842675  0.541527  1.728333 -1.907499
# Write to an Excel file
df.to_excel('foo.xlsx', sheet_name='Sheet1')
# Read from an Excel file
pd.read_excel('foo.xlsx', 'Sheet1').head()
   Unnamed: 0         A         B         C         D
0  2000-01-01 -0.332295  0.167448  1.819971  0.262984
1  2000-01-02 -1.154339 -0.795782  1.476187 -0.507846
2  2000-01-03 -0.130460  0.149241  1.797390  0.417354
3  2000-01-04 -1.573087  0.423807  1.720803 -0.492286
4  2000-01-05 -1.842675  0.541527  1.728333 -1.907499
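
Columnar binary formats work the same way; for example Parquet, which requires the optional pyarrow or fastparquet dependency (a minimal sketch):

# Write to a Parquet file
df.to_parquet('foo.parquet')
# Read from a Parquet file
pd.read_parquet('foo.parquet').head()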