Data Analysis (Pandas)#

Pandas merupakan salah satu library Python yang sangat berguna untuk menangani data tabular seperti file CSV, Excel. Pada bagian ini akan ditunjukkan dasar pandas antara lain: -Import pandas -Read a CSV/Excel file -Explore data -Simple plots and data visualizations -Write a CSV/Excel file

Dataset#

PC Games Sales Dataset, diambil dari kaggle yang merupakan kumpulan data game pc terlaris.

Import Pandas#

Untuk menggunakan libary pandas, terlebih dahulu harus import.

import pandas as pd

Read CSV/Excel File#

namafile="../GamesSales.xlsx"
dataframe=pd.read_excel(namafile)

Explore data#

Display Data#

Data dapat ditampilkan berupa dataframe. Catatan: nilai NaN merupakan data kosong (missing data).

dataframe
Name Sales Series Release Genre Developer Publisher
0 7 Days to Die 2.0 7 Days 2016-06-01 Survival horror The Fun Pimps The Fun Pimps
1 Age of Empires 3.0 Age of Empires 1997-10-01 Real-time strategy Ensemble Studios Microsoft
2 Age of Empires II: The Age of Kings 2.0 Age of Empires 1999-09-01 Real-time strategy Ensemble Studios Microsoft
3 Age of Empires III 2.0 Age of Empires 2005-10-01 Real-time strategy Ensemble Studios Microsoft
4 Age of Mythology 1.0 Age of Empires 2002-10-01 Real-time strategy Ensemble Studios Microsoft
... ... ... ... ... ... ... ...
172 Where in the World Is Carmen Sandiego? 4.0 Carmen Sandiego 1985-06-01 Educational Broderbund Broderbund
173 Who Wants to Be a Millionaire? 1.0 NaN 1999-11-01 Trivia game Jellyvision Disney Interactive Studios
174 Wing Commander 3: Heart of the Tiger 1.0 Wing Commander 1994-12-01 Space combat simulation Origin Systems Electronic Arts
175 World of Warcraft 14.0 Warcraft 2004-11-01 MMORPG Blizzard Entertainment Blizzard Entertainment
176 Zoo Tycoon 1.0 Zoo Tycoon 2001-10-01 Business simulation Microsoft Blue Fang Games

177 rows × 7 columns

Ada beberapa hal penting tentang DataFrame yang ditampilkan di sini:

  1. Indeks Angka yang dicetak di kolom paling kiri dari DataFrame disebut Indeks Pandas. Secara default, Indeks adalah urutan angka yang dimulai dengan nol.

  2. Baris x Kolom Pandas akan menampilkan berapa banyak baris dan kolom dalam kumpulan data ini di bagian bawah output (177 rows x 7 columns).

Shape#

untuk memeriksa berapa banyak baris dan kolom pada dataset dapat menggunakan method .shape

dataframe.shape
(177, 7)

Data Types#

Dalam pandas akan otomatis ditetapkan ke kolom ketika membaca file, dan dalam pandas dapat memeriksa tipe data setiap kolom dengan menggunakan method .dypes.

Pandas Tipe Data object = string float64 = float datetime64 = date time int64 = integer

dataframe.dtypes
Name                 object
Sales               float64
Series               object
Release      datetime64[ns]
Genre                object
Developer            object
Publisher            object
dtype: object

Display First n Rows/Last n Rows#

Untuk melihat n baris pertama dalam DataFrame, dapat menggunakan method bernama .head(). Untuk melihat n baris terakhir dalam DataFrame, dapat menggunakan method bernama .tail().

dataframe.head(2)
Name Sales Series Release Genre Developer Publisher
0 7 Days to Die 2.0 7 Days 2016-06-01 Survival horror The Fun Pimps The Fun Pimps
1 Age of Empires 3.0 Age of Empires 1997-10-01 Real-time strategy Ensemble Studios Microsoft
dataframe.tail(3)
Name Sales Series Release Genre Developer Publisher
174 Wing Commander 3: Heart of the Tiger 1.0 Wing Commander 1994-12-01 Space combat simulation Origin Systems Electronic Arts
175 World of Warcraft 14.0 Warcraft 2004-11-01 MMORPG Blizzard Entertainment Blizzard Entertainment
176 Zoo Tycoon 1.0 Zoo Tycoon 2001-10-01 Business simulation Microsoft Blue Fang Games

Get Info#

Untuk mendapatkan info semua kolom dalam DataFrame dapat menggunakan method .info()

dataframe.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 177 entries, 0 to 176
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Name       177 non-null    object        
 1   Sales      177 non-null    float64       
 2   Series     141 non-null    object        
 3   Release    177 non-null    datetime64[ns]
 4   Genre      177 non-null    object        
 5   Developer  177 non-null    object        
 6   Publisher  177 non-null    object        
dtypes: datetime64[ns](1), float64(1), object(5)
memory usage: 9.8+ KB

Calculate Summary Statistics#

Untuk menghitung ringkasan statistik pada setiap kolom di DataFrame dapat menggunakan method .describe()

dataframe.describe()
Sales
count 177.000000
mean 3.116949
std 4.937466
min 1.000000
25% 1.000000
50% 1.500000
75% 3.000000
max 42.000000

Select Columns#

Secara teknis satu kolom dataframe termasuk series object

dataframe['Name']
0                               7 Days to Die
1                              Age of Empires
2         Age of Empires II: The Age of Kings
3                          Age of Empires III
4                            Age of Mythology
                        ...                  
172    Where in the World Is Carmen Sandiego?
173            Who Wants to Be a Millionaire?
174      Wing Commander 3: Heart of the Tiger
175                         World of Warcraft
176                                Zoo Tycoon
Name: Name, Length: 177, dtype: object

Jika ingin menampilkan kolom sebagai dataframe harus menggunakan dua tanda kurung siku.

dataframe[['Name']]
Name
0 7 Days to Die
1 Age of Empires
2 Age of Empires II: The Age of Kings
3 Age of Empires III
4 Age of Mythology
... ...
172 Where in the World Is Carmen Sandiego?
173 Who Wants to Be a Millionaire?
174 Wing Commander 3: Heart of the Tiger
175 World of Warcraft
176 Zoo Tycoon

177 rows × 1 columns

dataframe[['Name','Publisher']]
Name Publisher
0 7 Days to Die The Fun Pimps
1 Age of Empires Microsoft
2 Age of Empires II: The Age of Kings Microsoft
3 Age of Empires III Microsoft
4 Age of Mythology Microsoft
... ... ...
172 Where in the World Is Carmen Sandiego? Broderbund
173 Who Wants to Be a Millionaire? Disney Interactive Studios
174 Wing Commander 3: Heart of the Tiger Electronic Arts
175 World of Warcraft Blizzard Entertainment
176 Zoo Tycoon Blue Fang Games

177 rows × 2 columns

Count Values#

dataframe['Publisher'].value_counts()
Electronic Arts           19
Blizzard Entertainment    10
Paradox Interactive        7
Activision                 6
GT Interactive             5
                          ..
Tripwire Interactive       1
Warhorse Studios           1
Impressions Game           1
Amanita Design             1
Blue Fang Games            1
Name: Publisher, Length: 96, dtype: int64

Untuk memilih 10 nilai teratas yang paling sering muncul dapat dikombinasikan menggunakan value_counts() dengan python list slicing

dataframe['Publisher'].value_counts()[:10]
Electronic Arts            19
Blizzard Entertainment     10
Paradox Interactive         7
Activision                  6
GT Interactive              5
Microsoft                   4
Gathering of Developers     4
Capcom                      4
Virgin Interactive          3
THQ                         3
Name: Publisher, dtype: int64

Filter Data#

Dalam Pandas DataFrame dapat memfilter nilai pada kolom tertentu sesuai yang ingin ditampilkan.

dataframe[dataframe['Publisher'] == 'Electronic Arts']
Name Sales Series Release Genre Developer Publisher
5 American McGee's Alice 1.0 Alice 2000-10-01 Action-adventure, platformer Rogue Entertainment Electronic Arts
13 Battlefield 1942 2.0 Battlefield 2002-09-01 First-person shooter EA DICE Electronic Arts
14 Battlefield Vietnam 1.0 Battlefield 2004-03-01 First-person shooter EA DICE Electronic Arts
26 Command & Conquer 3: Tiberium Wars 1.0 Command & Conquer 2007-03-01 Real-time strategy EA Los Angeles Electronic Arts
28 Command & Conquer: Red Alert 2 1.0 Command & Conquer 2000-10-01 Real-time strategy Westwood Pacific Electronic Arts
29 Command & Conquer: Tiberian Sun 1.0 Command & Conquer 1999-08-01 Real-time strategy Westwood Studios Electronic Arts
36 Crysis 3.0 Crysis 2007-11-01 First-person shooter Crytek Electronic Arts
37 Crysis Warhead 1.0 Crysis 2008-09-01 First-person shooter Crytek Budapest Electronic Arts
75 Harry Potter and the Philosopher's Stone 1.0 Harry Potter 2001-11-01 Action-adventure KnowWonder Electronic Arts
107 Populous 4.0 Populous 1989-06-01 God game Bullfrog Productions Electronic Arts
127 SimCity 2.0 SimCity 2013-03-01 City-building Electronic Arts Electronic Arts
128 SimCity 3000 5.0 SimCity 1999-01-01 City-building Maxis Electronic Arts
131 Spore 2.0 Spore 2008-09-01 God game Maxis Electronic Arts
153 The Sims 11.0 The Sims 2000-02-01 Life simulation Maxis Electronic Arts
154 The Sims 2 6.0 The Sims 2004-09-01 Life simulation Maxis Electronic Arts
155 The Sims 3 7.0 The Sims 2009-06-01 Life simulation Maxis Electronic Arts
160 Theme Park 3.0 Theme Park 2005-06-01 Construction and management simulation Bullfrog Productions Electronic Arts
171 Warhammer Online: Age of Reckoning 1.0 Warhammer 2008-09-01 MMORPG Mythic Entertainment Electronic Arts
174 Wing Commander 3: Heart of the Tiger 1.0 Wing Commander 1994-12-01 Space combat simulation Origin Systems Electronic Arts

Missing Data#

Analisis data pastinya menjadi familiar dengan missing value. Missing value adalah hilangnya beberapa data yang diperoleh. Untuk mendapatkan data yang hilang dapat menggunakan method .isna() / .notna(). Panda memiliki cara untuk menangani data yang hilang, seperti yang sudah diketahui, nilai kosong di file yang telah diimport akan muncul sebagai NaN di Pandas DataFrame.

dataframe['Series'].isna().value_counts()
False    141
True      36
Name: Series, dtype: int64

Add Columns#

Untuk menambahkan kolom baru, cukup memasukan nama kolom baru dalam tanda kurung siku

dataframe['dev'] = dataframe['Developer']
dataframe
Name Sales Series Release Genre Developer Publisher dev
0 7 Days to Die 2.0 7 Days 2016-06-01 Survival horror The Fun Pimps The Fun Pimps The Fun Pimps
1 Age of Empires 3.0 Age of Empires 1997-10-01 Real-time strategy Ensemble Studios Microsoft Ensemble Studios
2 Age of Empires II: The Age of Kings 2.0 Age of Empires 1999-09-01 Real-time strategy Ensemble Studios Microsoft Ensemble Studios
3 Age of Empires III 2.0 Age of Empires 2005-10-01 Real-time strategy Ensemble Studios Microsoft Ensemble Studios
4 Age of Mythology 1.0 Age of Empires 2002-10-01 Real-time strategy Ensemble Studios Microsoft Ensemble Studios
... ... ... ... ... ... ... ... ...
172 Where in the World Is Carmen Sandiego? 4.0 Carmen Sandiego 1985-06-01 Educational Broderbund Broderbund Broderbund
173 Who Wants to Be a Millionaire? 1.0 NaN 1999-11-01 Trivia game Jellyvision Disney Interactive Studios Jellyvision
174 Wing Commander 3: Heart of the Tiger 1.0 Wing Commander 1994-12-01 Space combat simulation Origin Systems Electronic Arts Origin Systems
175 World of Warcraft 14.0 Warcraft 2004-11-01 MMORPG Blizzard Entertainment Blizzard Entertainment Blizzard Entertainment
176 Zoo Tycoon 1.0 Zoo Tycoon 2001-10-01 Business simulation Microsoft Blue Fang Games Microsoft

177 rows × 8 columns

Drop Columns#

Untuk menghapus kolom dalam dataframe dapat menggunakan method .drop()

dataframe = dataframe.drop(columns="dev")
dataframe
Name Sales Series Release Genre Developer Publisher
0 7 Days to Die 2.0 7 Days 2016-06-01 Survival horror The Fun Pimps The Fun Pimps
1 Age of Empires 3.0 Age of Empires 1997-10-01 Real-time strategy Ensemble Studios Microsoft
2 Age of Empires II: The Age of Kings 2.0 Age of Empires 1999-09-01 Real-time strategy Ensemble Studios Microsoft
3 Age of Empires III 2.0 Age of Empires 2005-10-01 Real-time strategy Ensemble Studios Microsoft
4 Age of Mythology 1.0 Age of Empires 2002-10-01 Real-time strategy Ensemble Studios Microsoft
... ... ... ... ... ... ... ...
172 Where in the World Is Carmen Sandiego? 4.0 Carmen Sandiego 1985-06-01 Educational Broderbund Broderbund
173 Who Wants to Be a Millionaire? 1.0 NaN 1999-11-01 Trivia game Jellyvision Disney Interactive Studios
174 Wing Commander 3: Heart of the Tiger 1.0 Wing Commander 1994-12-01 Space combat simulation Origin Systems Electronic Arts
175 World of Warcraft 14.0 Warcraft 2004-11-01 MMORPG Blizzard Entertainment Blizzard Entertainment
176 Zoo Tycoon 1.0 Zoo Tycoon 2001-10-01 Business simulation Microsoft Blue Fang Games

177 rows × 7 columns

Sort Columns#

Kolom dalam dataframe dapat diurutkan menggunakan method .sort_values sesuai dengan parameter by= yang menunjukkan nama kolom yang ingin diurutkan disertai tanda kutip. Misalkan data kolom “Sales” diurutkan berdasarkan nilai yang paling besar ke terkecil seperti dibawah ini.

dataframe.sort_values(by='Sales', ascending=False)
Name Sales Series Release Genre Developer Publisher
105 PlayerUnknown's Battlegrounds 42.0 NaN 2017-12-01 Battle royale PUBG Studios Krafton
95 Minecraft 33.0 Minecraft 2011-11-01 Sandbox, survival Mojang Studios Mojang Studios
67 Garry's Mod 20.0 NaN 2006-11-01 Sandbox Facepunch Studios Valve
50 Diablo III 20.0 Diablo 2012-05-01 Action role-playing Blizzard Entertainment Blizzard Entertainment
146 Terraria 17.2 NaN 2011-05-01 Action-adventure Re-Logic Re-Logic
... ... ... ... ... ... ... ...
41 Danganronpa: Trigger Happy Havoc 1.0 Danganronpa 2016-02-01 Visual novel, adventure Spike Chunsoft Spike Chunsoft
45 Daryl F. Gates' Police Quest: SWAT 1.0 Police Quest 1995-09-01 Interactive movie Sierra Online Sierra Online
47 Deer Hunter 1.0 Deer Hunter 1997-11-01 Sports Sunstorm Interactive WizardWorks
103 Phantasmagoria 1.0 Phantasmagoria 1995-07-01 Interactive movie Sierra Online Sierra Online
176 Zoo Tycoon 1.0 Zoo Tycoon 2001-10-01 Business simulation Microsoft Blue Fang Games

177 rows × 7 columns

Check for Duplicates#

Untuk mengecek duplikasi yang ada dalam database dapat menggunakan method .duplicated()

dataframe[dataframe.duplicated(keep=False)]
Name Sales Series Release Genre Developer Publisher
135 StarCraft II: Heart of the Swarm 1.0 StarCraft 2013-03-01 Real-time strategy Blizzard Entertainment Blizzard Entertainment
136 StarCraft II: Heart of the Swarm 1.0 StarCraft 2013-03-01 Real-time strategy Blizzard Entertainment Blizzard Entertainment
137 StarCraft II: Legacy of the Void 1.0 StarCraft 2015-11-01 Real-time strategy Blizzard Entertainment Blizzard Entertainment
138 StarCraft II: Legacy of the Void 1.0 StarCraft 2015-11-01 Real-time strategy Blizzard Entertainment Blizzard Entertainment

Calculate Columns#

Dalam pandas dapat dilakukan perhitungan berbeda pada kolom dengan fungsi bawaan, dan perhitungan ini akan mengabaikan nilai NaN. Antara lain:

Method Perhitungan Pandas

Penjelasan

.count()

Jumlah data dalam tabel

.sum()

Jumlah nilai

.mean()

Nilai rata-rata

.median()

Nilai Median/nilai tengah

.min()

Nilai minimum/terkecil

.max()

Nilai maksimum/terbesar

.std()

standard deviation/simpangan baku

contohnya:

dataframe['Sales'].mean()
3.116949152542373
dataframe['Sales'].max()
42.0

Clean and Transform Data#

Pembersihan data merupakan proses di mana data yang tidak akurat, diformat dengan kurang baik, atau berantakan akan diatur dan diperbaiki. Transformasi data adalah proses mengubah data dari satu format atau struktur ke format atau struktur lainnya. Beberapa method untuk cleaning and transforming data yang dapat digunakan dalam pandas antara lain:

Pandas .str Methods#

Method String Pandas

Penjelasan

df[‘column_name’].str.lower()

membuat string di setiap baris menjadi huruf kecil

df[‘column_name’].str.upper()

membuat string di setiap baris menjadi huruf besar

df[‘column_name’].str.title()

Mengonversi karakter pertama setiap kata menjadi huruf besar dan sisanya menjadi huruf kecil.

df[‘column_name’].str.replace(‘old string’, ‘new string’)

Ganti setiap kemunculan pola/regex di Seri/Indeks.

df[‘column_name’].str.contains(‘pattern/string’)

Mengecek apakah setiap baris berisi pola/string tersebut

df[‘column_name’].str.split(‘delim’)

Mengembalikan substring yang dipisahkan oleh batas yang diberikan

df[‘column_name’].str.join(list)

Kebalikan dari .split(), ini menggabungkan elemen dalam list berupa string

Contoh:

dataframe['Series'].str.upper()
0               7 DAYS
1       AGE OF EMPIRES
2       AGE OF EMPIRES
3       AGE OF EMPIRES
4       AGE OF EMPIRES
            ...       
172    CARMEN SANDIEGO
173                NaN
174     WING COMMANDER
175           WARCRAFT
176         ZOO TYCOON
Name: Series, Length: 177, dtype: object
dataframe[dataframe['Genre'].str.contains('strategy')].head(10)
Name Sales Series Release Genre Developer Publisher
1 Age of Empires 3.0 Age of Empires 1997-10-01 Real-time strategy Ensemble Studios Microsoft
2 Age of Empires II: The Age of Kings 2.0 Age of Empires 1999-09-01 Real-time strategy Ensemble Studios Microsoft
3 Age of Empires III 2.0 Age of Empires 2005-10-01 Real-time strategy Ensemble Studios Microsoft
4 Age of Mythology 1.0 Age of Empires 2002-10-01 Real-time strategy Ensemble Studios Microsoft
21 Civilization II 1.0 Civilization 1996-02-01 Turn-based strategy, 4X MicroProse MicroProse
22 Civilization III 2.0 Civilization 2001-10-01 Turn-based strategy, 4X Firaxis Games Infogrames
23 Civilization IV 3.0 Civilization 2005-10-01 Turn-based strategy, 4X Firaxis Games 2K Games & Aspyr
24 Civilization V 8.0 Civilization 2010-09-01 Turn-based strategy, 4X Firaxis Games 2K Games & Aspyr
25 Command & Conquer 3.0 Command & Conquer 1995-08-01 Real-time strategy Westwood Studios Virgin Interactive
26 Command & Conquer 3: Tiberium Wars 1.0 Command & Conquer 2007-03-01 Real-time strategy EA Los Angeles Electronic Arts

Applying Functions#

Dengan menggunakan method .apply(), dapat menjalankan sebuah fungsi pada setiap baris dalam kolom pada dataframe.

def splitText(text):
    splitdulu = text.split()
    return splitdulu

dataframe['Name'].apply(splitText)
0                                  [7, Days, to, Die]
1                                  [Age, of, Empires]
2        [Age, of, Empires, II:, The, Age, of, Kings]
3                             [Age, of, Empires, III]
4                                [Age, of, Mythology]
                            ...                      
172    [Where, in, the, World, Is, Carmen, Sandiego?]
173             [Who, Wants, to, Be, a, Millionaire?]
174      [Wing, Commander, 3:, Heart, of, the, Tiger]
175                             [World, of, Warcraft]
176                                     [Zoo, Tycoon]
Name: Name, Length: 177, dtype: object

Merge Dataframe#

Jika ada dua kumpulan dataset yang terbagi, dalam pandas dapat juga digabungkan menjadi satu dengan menggunakan method .merge().

namafile="Games Sales- Dev.xlsx"
dataframe=pd.read_excel(namafile)
namafile2="Games Sales- Publisher.xlsx"
dataframe2=pd.read_excel(namafile2)

dataframe
Name Sales Series Release Genre Developer
0 7 Days to Die 2.0 7 Days 2016-06-01 Survival horror The Fun Pimps
1 Age of Empires 3.0 Age of Empires 1997-10-01 Real-time strategy Ensemble Studios
2 Age of Empires II: The Age of Kings 2.0 Age of Empires 1999-09-01 Real-time strategy Ensemble Studios
3 Age of Empires III 2.0 Age of Empires 2005-10-01 Real-time strategy Ensemble Studios
4 Age of Mythology 1.0 Age of Empires 2002-10-01 Real-time strategy Ensemble Studios
... ... ... ... ... ... ...
172 Where in the World Is Carmen Sandiego? 4.0 Carmen Sandiego 1985-06-01 Educational Broderbund
173 Who Wants to Be a Millionaire? 1.0 NaN 1999-11-01 Trivia game Jellyvision
174 Wing Commander 3: Heart of the Tiger 1.0 Wing Commander 1994-12-01 Space combat simulation Origin Systems
175 World of Warcraft 14.0 Warcraft 2004-11-01 MMORPG Blizzard Entertainment
176 Zoo Tycoon 1.0 Zoo Tycoon 2001-10-01 Business simulation Microsoft

177 rows × 6 columns

dataframe2
Name Publisher
0 7 Days to Die The Fun Pimps
1 Age of Empires Microsoft
2 Age of Empires II: The Age of Kings Microsoft
3 Age of Empires III Microsoft
4 Age of Mythology Microsoft
... ... ...
172 Where in the World Is Carmen Sandiego? Broderbund
173 Who Wants to Be a Millionaire? Disney Interactive Studios
174 Wing Commander 3: Heart of the Tiger Electronic Arts
175 World of Warcraft Blizzard Entertainment
176 Zoo Tycoon Blue Fang Games

177 rows × 2 columns

Karena dua dataset memiliki satu kolom yang sama yaitu kolom “Name”, maka dapat digabung berdasarkan kolom “Name” tersebut

dataframe.merge(dataframe2,on='Name')
Name Sales Series Release Genre Developer Publisher
0 7 Days to Die 2.0 7 Days 2016-06-01 Survival horror The Fun Pimps The Fun Pimps
1 Age of Empires 3.0 Age of Empires 1997-10-01 Real-time strategy Ensemble Studios Microsoft
2 Age of Empires II: The Age of Kings 2.0 Age of Empires 1999-09-01 Real-time strategy Ensemble Studios Microsoft
3 Age of Empires III 2.0 Age of Empires 2005-10-01 Real-time strategy Ensemble Studios Microsoft
4 Age of Mythology 1.0 Age of Empires 2002-10-01 Real-time strategy Ensemble Studios Microsoft
... ... ... ... ... ... ... ...
176 Where in the World Is Carmen Sandiego? 4.0 Carmen Sandiego 1985-06-01 Educational Broderbund Broderbund
177 Who Wants to Be a Millionaire? 1.0 NaN 1999-11-01 Trivia game Jellyvision Disney Interactive Studios
178 Wing Commander 3: Heart of the Tiger 1.0 Wing Commander 1994-12-01 Space combat simulation Origin Systems Electronic Arts
179 World of Warcraft 14.0 Warcraft 2004-11-01 MMORPG Blizzard Entertainment Blizzard Entertainment
180 Zoo Tycoon 1.0 Zoo Tycoon 2001-10-01 Business simulation Microsoft Blue Fang Games

181 rows × 7 columns

Plots and Data Visualizations#

Panda memudahkan dalam pembuatan plot dan visualisasi data. Kita dapat membuat plot sederhana dengan menambahkan .plot() ke objek DataFrame yang memiliki data numerik yang sesuai.

dataframe['Publisher'].value_counts()[:10].plot(kind='bar', title='Game Sales:\nMost Frequent "Publisher"')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3801             try:
-> 3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Publisher'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_16644\1727840546.py in <cell line: 0>()
----> 1 dataframe['Publisher'].value_counts()[:10].plot(kind='bar', title='Game Sales:\nMost Frequent "Publisher"')

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3805             if self.columns.nlevels > 1:
   3806                 return self._getitem_multilevel(key)
-> 3807             indexer = self.columns.get_loc(key)
   3808             if is_integer(indexer):
   3809                 indexer = [indexer]

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:
-> 3804                 raise KeyError(key) from err
   3805             except TypeError:
   3806                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'Publisher'
dataframe['Publisher'].value_counts()[:10].plot(kind='pie', title='Game Sales:\nMost Frequent "Publisher"')
<AxesSubplot:title={'center':'Game Sales:\nMost Frequent "Publisher"'}, ylabel='Publisher'>
../_images/defeca9c1305c81955b421703d871aad418c736084179160b1d064f5e698393f.png

Write CSV File#

Untuk mengeluarkan file CSV baru, dapat menggunakan method .to_csv diikuti dengan nama file dalam tanda kutip. Sedangkan untuk excel dapat menggunakan .to_excel

dataframe.to_csv("newGameSales.csv", encoding='utf-8', index=False)