Descriptive Statistics
In [ ]:
import warnings
warnings.filterwarnings('ignore')  # hide warning messages (e.g. deprecation warnings) in the notebook output
Load 'tips.csv' Data
In [ ]:
import seaborn as sns
# load_dataset downloads seaborn's example 'tips' dataset (tips.csv) and caches it locally
DF = sns.load_dataset('tips')
I. pandas
1) DataFrame Information
In [ ]:
DF.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   total_bill  244 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB
In [ ]:
DF.head()
Out[ ]:
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
2) .describe( )
In [ ]:
DF.tip.describe()
Out[ ]:
count    244.000000
mean       2.998279
std        1.383638
min        1.000000
25%        2.000000
50%        2.900000
75%        3.562500
max       10.000000
Name: tip, dtype: float64
3) .sum( )
In [ ]:
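# IPython magic: display plain Python floats with 5 decimal places.
# The NumPy float64 returned by DF.tip.sum() keeps its full repr, as the output below shows.
%precision 5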
DF.tip.sum()
Out[ ]:
731.5799999999999
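The trailing digits in 731.5799999999999 come from binary floating-point rounding, not from the data: values such as 1.01 cannot be stored exactly, and the tiny errors accumulate over many additions. A minimal sketch on hypothetical values (ten 0.10 amounts, not taken from tips.csv) shows the same effect, plus two common remedies: `math.fsum`, which compensates for lost low-order bits, and rounding only for display.

```python
import math

# Hypothetical values: ten amounts of 0.10 each (not from tips.csv)
values = [0.1] * 10

naive = sum(values)         # plain left-to-right float addition
exact = math.fsum(values)   # error-compensated summation from the standard library

print(naive)                # 0.9999999999999999
print(exact)                # 1.0
print(round(naive, 2))      # 1.0  (rounding only changes how the value is shown)
```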
4) .mean( )
In [ ]:
DF.tip.mean()
Out[ ]:
2.99827868852459
5) .min( )
In [ ]:
DF.tip.min()
Out[ ]:
1.0
6) .quantile(q = 0.25)
In [ ]:
DF.tip.quantile(q = 0.25)
Out[ ]:
2.0
7) .median( )
In [ ]:
DF.tip.median()
Out[ ]:
2.9
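With an even number of observations (244 tips) there is no single middle value, so the median is the mean of the two middle values after sorting (here the 122nd and 123rd). A minimal sketch of that rule, using a hypothetical helper `median` and hypothetical data, checked against the standard library's `statistics.median`:

```python
import statistics

def median(values):
    """Hypothetical helper: middle value of the sorted data,
    or the mean of the two middle values when the count is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

# Hypothetical tip amounts (not from tips.csv)
odd = [3.0, 1.0, 2.0]
even = [4.0, 1.0, 3.0, 2.0]

print(median(odd))    # 2.0
print(median(even))   # 2.5  (mean of 2.0 and 3.0)
print(statistics.median(even))
```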
8) .quantile(q = 0.75)
In [ ]:
DF.tip.quantile(q = 0.75)
Out[ ]:
3.5625
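A value like 3.5625 appears because pandas' `quantile` interpolates linearly by default (`interpolation='linear'`): the target position in the sorted data is (n - 1) * q, and a fractional position is interpolated between its two neighbours. For the 244 tips and q = 0.75 that position is 243 * 0.75 = 182.25 (0-based). A minimal sketch of the rule with a hypothetical helper `quantile_linear` on hypothetical data, compared with `numpy.quantile`, whose default method is also linear:

```python
import math
import numpy as np

def quantile_linear(values, q):
    """Hypothetical helper: quantile with linear interpolation
    between the two closest ranks of the sorted data."""
    data = sorted(values)
    pos = (len(data) - 1) * q   # fractional 0-based index into sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac

sample = [1.0, 2.0, 4.0, 8.0]   # hypothetical values

for q in (0.25, 0.5, 0.75):
    print(q, quantile_linear(sample, q), np.quantile(sample, q))
# q = 0.25 -> position 0.75 -> 1.0 + (2.0 - 1.0) * 0.75 = 1.75
```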
9) .max( )
In [ ]:
DF.tip.max()
Out[ ]:
10.0
10) .var(ddof = 0)
ddof: Delta Degrees of Freedom (pandas default = 1); the divisor is N - ddof
0: population
1: sample
In [ ]:
DF.tip.var(ddof = 0)
Out[ ]:
1.9066085124966412
In [ ]:
DF.tip.var(ddof = 1)
Out[ ]:
1.914454638062471
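The ddof value is subtracted from the number of observations in the divisor: variance = sum((x - mean)**2) / (n - ddof). ddof = 0 treats the data as the whole population; ddof = 1 gives the unbiased sample estimate. A minimal sketch of that formula with a hypothetical helper `variance` on hypothetical data, checked against pandas:

```python
import pandas as pd

def variance(values, ddof=1):
    """Hypothetical helper: variance with divisor n - ddof
    (ddof=0: population, ddof=1: sample)."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    return squared_dev / (n - ddof)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values, mean = 5.0
s = pd.Series(sample)

print(variance(sample, ddof=0), s.var(ddof=0))  # 4.0 4.0  (32 / 8)
print(variance(sample, ddof=1), s.var(ddof=1))  # both 32 / 7
```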
11) .std(ddof = 0)
In [ ]:
DF.tip.std(ddof = 0)
Out[ ]:
1.3807999538298954
In [ ]:
DF.tip.std(ddof = 1)
Out[ ]:
1.3836381890011822
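The two sets of results above differ only in the divisor. With n = 244, var(ddof = 1) = var(ddof = 0) * n / (n - 1), and each standard deviation is the square root of the matching variance. A quick check using only the outputs printed above:

```python
import math

n = 244                          # number of rows in the tips data
var_pop = 1.9066085124966412     # DF.tip.var(ddof = 0)
var_sample = 1.914454638062471   # DF.tip.var(ddof = 1)

# Rescale the population variance to the sample variance with n / (n - 1)
print(math.isclose(var_pop * n / (n - 1), var_sample))       # True

# Standard deviation = square root of the variance
print(math.isclose(math.sqrt(var_pop), 1.3807999538298954))     # True
print(math.isclose(math.sqrt(var_sample), 1.3836381890011822))  # True
```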
II. numpy
In [ ]:
import numpy as np
1) Converting to an Array
In [ ]:
AR = np.array(DF.tip)
In [ ]:
AR[:5]
Out[ ]:
array([1.01, 1.66, 3.5 , 3.31, 3.61])
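`np.array(DF.tip)` copies the column's values into a plain NumPy array, dropping the pandas index and the column name. pandas also provides `Series.to_numpy()` for the same conversion. A minimal sketch with a small hypothetical Series standing in for `DF.tip`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.01, 1.66, 3.50], name="tip")  # hypothetical stand-in for DF.tip

a1 = np.array(s)    # conversion used in the cell above
a2 = s.to_numpy()   # pandas' own conversion method

print(type(a1).__name__, a1.dtype)   # ndarray float64
print(np.array_equal(a1, a2))        # True
```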
2) .sum( )
In [ ]:
AR.sum()
Out[ ]:
731.5799999999999
3) .mean( )
In [ ]:
AR.mean()
Out[ ]:
2.99827868852459
4) .min( )
In [ ]:
AR.min()
Out[ ]:
1.0
5) .max( )
In [ ]:
AR.max()
Out[ ]:
10.0
6) .var(ddof = 0)
- NumPy default: ddof = 0 (pandas defaults to ddof = 1)
In [ ]:
AR.var(ddof = 0)
Out[ ]:
1.9066085124966412
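Because pandas' `Series.var`/`std` default to ddof = 1 while NumPy's `ndarray.var`/`std` default to ddof = 0, calling both without `ddof` on the same data gives different answers; passing `ddof` explicitly, as these cells do, avoids the mismatch. A minimal sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
s = pd.Series(values)
a = np.array(values)

print(s.var())   # pandas default ddof=1 -> 32 / 7
print(a.var())   # NumPy default ddof=0 -> 4.0

# Same result once ddof is set explicitly
print(np.isclose(a.var(ddof=1), s.var()))   # True
```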
7) .std(ddof = 0)
In [ ]:
AR.std(ddof = 0)
Out[ ]:
1.3807999538298954
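III. scipy
Note: sp.sum, sp.mean, sp.amin, sp.median, sp.amax, sp.var and sp.std in this section are NumPy functions that older SciPy versions re-exported in the top-level scipy namespace. They are deprecated and have been removed from recent SciPy versions; np.sum, np.mean, np.amin, np.median, np.amax, np.var and np.std return the same results. sp.stats.mode is SciPy's own function.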
In [ ]:
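import scipy as sp
import scipy.stats  # makes sp.stats available (older SciPy versions do not import subpackages automatically)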
1) .sum( )
In [ ]:
sp.sum(DF['tip'])
Out[ ]:
731.5799999999999
2) .mean( )
In [ ]:
sp.mean(DF['tip'])
Out[ ]:
2.99827868852459
3) .amin( )
In [ ]:
sp.amin(DF['tip'])
Out[ ]:
1.0
4) .median( )
In [ ]:
sp.median(DF['tip'])
Out[ ]:
2.9
5) .amax( )
In [ ]:
sp.amax(DF['tip'])
Out[ ]:
10.0
6) .var(ddof = 0)
- default: ddof = 0 (same as NumPy)
In [ ]:
sp.var(DF['tip'], ddof = 0)
Out[ ]:
1.9066085124966412
7) .std(ddof = 0)
In [ ]:
sp.std(DF['tip'], ddof = 0)
Out[ ]:
1.3807999538298954
8) .stats.mode( )
- Cross-check by counting each value with DF.tip.value_counts( )
In [ ]:
DF.tip.value_counts()
Out[ ]:
2.00    33
3.00    23
4.00    12
5.00    10
2.50    10
        ..
4.34     1
1.56     1
5.20     1
2.60     1
1.75     1
Name: tip, Length: 123, dtype: int64
In [ ]:
sp.stats.mode(DF['tip'])
Out[ ]:
ModeResult(mode=array([2.]), count=array([33]))
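The value 2 (a $2.00 tip) occurs most often, 33 times, so it is the mode, one of the representative values (measures of central tendency).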
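The mode can also be read directly from pandas without SciPy: `value_counts()` counts each distinct value, so `idxmax()` on the counts returns the most frequent value, and `Series.mode()` returns every value tied for the highest count. Note that on SciPy 1.11 and later, `scipy.stats.mode` returns scalars by default instead of the one-element arrays shown above. A minimal sketch on hypothetical tip amounts:

```python
from collections import Counter

import pandas as pd

tips = pd.Series([2.0, 3.0, 2.0, 5.0, 2.0, 3.0])  # hypothetical tip amounts

counts = tips.value_counts()          # frequency of each distinct value
print(counts.idxmax(), counts.max())  # 2.0 3

print(tips.mode().tolist())           # [2.0]

# Standard-library equivalent
print(Counter(tips).most_common(1))
```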