Data Analysis (Pandas)#
Pandas merupakan salah satu library Python yang sangat berguna untuk menangani data tabular seperti file CSV, Excel. Pada bagian ini akan ditunjukkan dasar pandas antara lain: -Import pandas -Read a CSV/Excel file -Explore data -Simple plots and data visualizations -Write a CSV/Excel file
Dataset#
PC Games Sales Dataset, diambil dari kaggle yang merupakan kumpulan data game pc terlaris.
Import Pandas#
Untuk menggunakan libary pandas, terlebih dahulu harus import.
import pandas as pd
Read CSV/Excel File#
namafile="../GamesSales.xlsx"
dataframe=pd.read_excel(namafile)
Explore data#
Display Data#
Data dapat ditampilkan berupa dataframe. Catatan: nilai NaN merupakan data kosong (missing data).
dataframe
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
0 | 7 Days to Die | 2.0 | 7 Days | 2016-06-01 | Survival horror | The Fun Pimps | The Fun Pimps |
1 | Age of Empires | 3.0 | Age of Empires | 1997-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
2 | Age of Empires II: The Age of Kings | 2.0 | Age of Empires | 1999-09-01 | Real-time strategy | Ensemble Studios | Microsoft |
3 | Age of Empires III | 2.0 | Age of Empires | 2005-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
4 | Age of Mythology | 1.0 | Age of Empires | 2002-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
... | ... | ... | ... | ... | ... | ... | ... |
172 | Where in the World Is Carmen Sandiego? | 4.0 | Carmen Sandiego | 1985-06-01 | Educational | Broderbund | Broderbund |
173 | Who Wants to Be a Millionaire? | 1.0 | NaN | 1999-11-01 | Trivia game | Jellyvision | Disney Interactive Studios |
174 | Wing Commander 3: Heart of the Tiger | 1.0 | Wing Commander | 1994-12-01 | Space combat simulation | Origin Systems | Electronic Arts |
175 | World of Warcraft | 14.0 | Warcraft | 2004-11-01 | MMORPG | Blizzard Entertainment | Blizzard Entertainment |
176 | Zoo Tycoon | 1.0 | Zoo Tycoon | 2001-10-01 | Business simulation | Microsoft | Blue Fang Games |
177 rows × 7 columns
Ada beberapa hal penting tentang DataFrame yang ditampilkan di sini:
Indeks Angka yang dicetak di kolom paling kiri dari DataFrame disebut Indeks Pandas. Secara default, Indeks adalah urutan angka yang dimulai dengan nol.
Baris x Kolom Pandas akan menampilkan berapa banyak baris dan kolom dalam kumpulan data ini di bagian bawah output (177 rows x 7 columns).
Shape#
untuk memeriksa berapa banyak baris dan kolom pada dataset dapat menggunakan method .shape
dataframe.shape
(177, 7)
Data Types#
Dalam pandas akan otomatis ditetapkan ke kolom ketika membaca file, dan dalam pandas dapat memeriksa tipe data setiap kolom dengan menggunakan method .dypes.
Pandas Tipe Data object = string float64 = float datetime64 = date time int64 = integer
dataframe.dtypes
Name object
Sales float64
Series object
Release datetime64[ns]
Genre object
Developer object
Publisher object
dtype: object
Display First n Rows/Last n Rows#
Untuk melihat n baris pertama dalam DataFrame, dapat menggunakan method bernama .head(). Untuk melihat n baris terakhir dalam DataFrame, dapat menggunakan method bernama .tail().
dataframe.head(2)
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
0 | 7 Days to Die | 2.0 | 7 Days | 2016-06-01 | Survival horror | The Fun Pimps | The Fun Pimps |
1 | Age of Empires | 3.0 | Age of Empires | 1997-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
dataframe.tail(3)
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
174 | Wing Commander 3: Heart of the Tiger | 1.0 | Wing Commander | 1994-12-01 | Space combat simulation | Origin Systems | Electronic Arts |
175 | World of Warcraft | 14.0 | Warcraft | 2004-11-01 | MMORPG | Blizzard Entertainment | Blizzard Entertainment |
176 | Zoo Tycoon | 1.0 | Zoo Tycoon | 2001-10-01 | Business simulation | Microsoft | Blue Fang Games |
Get Info#
Untuk mendapatkan info semua kolom dalam DataFrame dapat menggunakan method .info()
dataframe.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 177 entries, 0 to 176
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 177 non-null object
1 Sales 177 non-null float64
2 Series 141 non-null object
3 Release 177 non-null datetime64[ns]
4 Genre 177 non-null object
5 Developer 177 non-null object
6 Publisher 177 non-null object
dtypes: datetime64[ns](1), float64(1), object(5)
memory usage: 9.8+ KB
Calculate Summary Statistics#
Untuk menghitung ringkasan statistik pada setiap kolom di DataFrame dapat menggunakan method .describe()
dataframe.describe()
Sales | |
---|---|
count | 177.000000 |
mean | 3.116949 |
std | 4.937466 |
min | 1.000000 |
25% | 1.000000 |
50% | 1.500000 |
75% | 3.000000 |
max | 42.000000 |
Select Columns#
Secara teknis satu kolom dataframe termasuk series object
dataframe['Name']
0 7 Days to Die
1 Age of Empires
2 Age of Empires II: The Age of Kings
3 Age of Empires III
4 Age of Mythology
...
172 Where in the World Is Carmen Sandiego?
173 Who Wants to Be a Millionaire?
174 Wing Commander 3: Heart of the Tiger
175 World of Warcraft
176 Zoo Tycoon
Name: Name, Length: 177, dtype: object
Jika ingin menampilkan kolom sebagai dataframe harus menggunakan dua tanda kurung siku.
dataframe[['Name']]
Name | |
---|---|
0 | 7 Days to Die |
1 | Age of Empires |
2 | Age of Empires II: The Age of Kings |
3 | Age of Empires III |
4 | Age of Mythology |
... | ... |
172 | Where in the World Is Carmen Sandiego? |
173 | Who Wants to Be a Millionaire? |
174 | Wing Commander 3: Heart of the Tiger |
175 | World of Warcraft |
176 | Zoo Tycoon |
177 rows × 1 columns
dataframe[['Name','Publisher']]
Name | Publisher | |
---|---|---|
0 | 7 Days to Die | The Fun Pimps |
1 | Age of Empires | Microsoft |
2 | Age of Empires II: The Age of Kings | Microsoft |
3 | Age of Empires III | Microsoft |
4 | Age of Mythology | Microsoft |
... | ... | ... |
172 | Where in the World Is Carmen Sandiego? | Broderbund |
173 | Who Wants to Be a Millionaire? | Disney Interactive Studios |
174 | Wing Commander 3: Heart of the Tiger | Electronic Arts |
175 | World of Warcraft | Blizzard Entertainment |
176 | Zoo Tycoon | Blue Fang Games |
177 rows × 2 columns
Count Values#
dataframe['Publisher'].value_counts()
Electronic Arts 19
Blizzard Entertainment 10
Paradox Interactive 7
Activision 6
GT Interactive 5
..
Tripwire Interactive 1
Warhorse Studios 1
Impressions Game 1
Amanita Design 1
Blue Fang Games 1
Name: Publisher, Length: 96, dtype: int64
Untuk memilih 10 nilai teratas yang paling sering muncul dapat dikombinasikan menggunakan value_counts() dengan python list slicing
dataframe['Publisher'].value_counts()[:10]
Electronic Arts 19
Blizzard Entertainment 10
Paradox Interactive 7
Activision 6
GT Interactive 5
Microsoft 4
Gathering of Developers 4
Capcom 4
Virgin Interactive 3
THQ 3
Name: Publisher, dtype: int64
Filter Data#
Dalam Pandas DataFrame dapat memfilter nilai pada kolom tertentu sesuai yang ingin ditampilkan.
dataframe[dataframe['Publisher'] == 'Electronic Arts']
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
5 | American McGee's Alice | 1.0 | Alice | 2000-10-01 | Action-adventure, platformer | Rogue Entertainment | Electronic Arts |
13 | Battlefield 1942 | 2.0 | Battlefield | 2002-09-01 | First-person shooter | EA DICE | Electronic Arts |
14 | Battlefield Vietnam | 1.0 | Battlefield | 2004-03-01 | First-person shooter | EA DICE | Electronic Arts |
26 | Command & Conquer 3: Tiberium Wars | 1.0 | Command & Conquer | 2007-03-01 | Real-time strategy | EA Los Angeles | Electronic Arts |
28 | Command & Conquer: Red Alert 2 | 1.0 | Command & Conquer | 2000-10-01 | Real-time strategy | Westwood Pacific | Electronic Arts |
29 | Command & Conquer: Tiberian Sun | 1.0 | Command & Conquer | 1999-08-01 | Real-time strategy | Westwood Studios | Electronic Arts |
36 | Crysis | 3.0 | Crysis | 2007-11-01 | First-person shooter | Crytek | Electronic Arts |
37 | Crysis Warhead | 1.0 | Crysis | 2008-09-01 | First-person shooter | Crytek Budapest | Electronic Arts |
75 | Harry Potter and the Philosopher's Stone | 1.0 | Harry Potter | 2001-11-01 | Action-adventure | KnowWonder | Electronic Arts |
107 | Populous | 4.0 | Populous | 1989-06-01 | God game | Bullfrog Productions | Electronic Arts |
127 | SimCity | 2.0 | SimCity | 2013-03-01 | City-building | Electronic Arts | Electronic Arts |
128 | SimCity 3000 | 5.0 | SimCity | 1999-01-01 | City-building | Maxis | Electronic Arts |
131 | Spore | 2.0 | Spore | 2008-09-01 | God game | Maxis | Electronic Arts |
153 | The Sims | 11.0 | The Sims | 2000-02-01 | Life simulation | Maxis | Electronic Arts |
154 | The Sims 2 | 6.0 | The Sims | 2004-09-01 | Life simulation | Maxis | Electronic Arts |
155 | The Sims 3 | 7.0 | The Sims | 2009-06-01 | Life simulation | Maxis | Electronic Arts |
160 | Theme Park | 3.0 | Theme Park | 2005-06-01 | Construction and management simulation | Bullfrog Productions | Electronic Arts |
171 | Warhammer Online: Age of Reckoning | 1.0 | Warhammer | 2008-09-01 | MMORPG | Mythic Entertainment | Electronic Arts |
174 | Wing Commander 3: Heart of the Tiger | 1.0 | Wing Commander | 1994-12-01 | Space combat simulation | Origin Systems | Electronic Arts |
Missing Data#
Analisis data pastinya menjadi familiar dengan missing value. Missing value adalah hilangnya beberapa data yang diperoleh. Untuk mendapatkan data yang hilang dapat menggunakan method .isna() / .notna(). Panda memiliki cara untuk menangani data yang hilang, seperti yang sudah diketahui, nilai kosong di file yang telah diimport akan muncul sebagai NaN di Pandas DataFrame.
dataframe['Series'].isna().value_counts()
False 141
True 36
Name: Series, dtype: int64
Add Columns#
Untuk menambahkan kolom baru, cukup memasukan nama kolom baru dalam tanda kurung siku
dataframe['dev'] = dataframe['Developer']
dataframe
Name | Sales | Series | Release | Genre | Developer | Publisher | dev | |
---|---|---|---|---|---|---|---|---|
0 | 7 Days to Die | 2.0 | 7 Days | 2016-06-01 | Survival horror | The Fun Pimps | The Fun Pimps | The Fun Pimps |
1 | Age of Empires | 3.0 | Age of Empires | 1997-10-01 | Real-time strategy | Ensemble Studios | Microsoft | Ensemble Studios |
2 | Age of Empires II: The Age of Kings | 2.0 | Age of Empires | 1999-09-01 | Real-time strategy | Ensemble Studios | Microsoft | Ensemble Studios |
3 | Age of Empires III | 2.0 | Age of Empires | 2005-10-01 | Real-time strategy | Ensemble Studios | Microsoft | Ensemble Studios |
4 | Age of Mythology | 1.0 | Age of Empires | 2002-10-01 | Real-time strategy | Ensemble Studios | Microsoft | Ensemble Studios |
... | ... | ... | ... | ... | ... | ... | ... | ... |
172 | Where in the World Is Carmen Sandiego? | 4.0 | Carmen Sandiego | 1985-06-01 | Educational | Broderbund | Broderbund | Broderbund |
173 | Who Wants to Be a Millionaire? | 1.0 | NaN | 1999-11-01 | Trivia game | Jellyvision | Disney Interactive Studios | Jellyvision |
174 | Wing Commander 3: Heart of the Tiger | 1.0 | Wing Commander | 1994-12-01 | Space combat simulation | Origin Systems | Electronic Arts | Origin Systems |
175 | World of Warcraft | 14.0 | Warcraft | 2004-11-01 | MMORPG | Blizzard Entertainment | Blizzard Entertainment | Blizzard Entertainment |
176 | Zoo Tycoon | 1.0 | Zoo Tycoon | 2001-10-01 | Business simulation | Microsoft | Blue Fang Games | Microsoft |
177 rows × 8 columns
Drop Columns#
Untuk menghapus kolom dalam dataframe dapat menggunakan method .drop()
dataframe = dataframe.drop(columns="dev")
dataframe
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
0 | 7 Days to Die | 2.0 | 7 Days | 2016-06-01 | Survival horror | The Fun Pimps | The Fun Pimps |
1 | Age of Empires | 3.0 | Age of Empires | 1997-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
2 | Age of Empires II: The Age of Kings | 2.0 | Age of Empires | 1999-09-01 | Real-time strategy | Ensemble Studios | Microsoft |
3 | Age of Empires III | 2.0 | Age of Empires | 2005-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
4 | Age of Mythology | 1.0 | Age of Empires | 2002-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
... | ... | ... | ... | ... | ... | ... | ... |
172 | Where in the World Is Carmen Sandiego? | 4.0 | Carmen Sandiego | 1985-06-01 | Educational | Broderbund | Broderbund |
173 | Who Wants to Be a Millionaire? | 1.0 | NaN | 1999-11-01 | Trivia game | Jellyvision | Disney Interactive Studios |
174 | Wing Commander 3: Heart of the Tiger | 1.0 | Wing Commander | 1994-12-01 | Space combat simulation | Origin Systems | Electronic Arts |
175 | World of Warcraft | 14.0 | Warcraft | 2004-11-01 | MMORPG | Blizzard Entertainment | Blizzard Entertainment |
176 | Zoo Tycoon | 1.0 | Zoo Tycoon | 2001-10-01 | Business simulation | Microsoft | Blue Fang Games |
177 rows × 7 columns
Sort Columns#
Kolom dalam dataframe dapat diurutkan menggunakan method .sort_values sesuai dengan parameter by= yang menunjukkan nama kolom yang ingin diurutkan disertai tanda kutip. Misalkan data kolom “Sales” diurutkan berdasarkan nilai yang paling besar ke terkecil seperti dibawah ini.
dataframe.sort_values(by='Sales', ascending=False)
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
105 | PlayerUnknown's Battlegrounds | 42.0 | NaN | 2017-12-01 | Battle royale | PUBG Studios | Krafton |
95 | Minecraft | 33.0 | Minecraft | 2011-11-01 | Sandbox, survival | Mojang Studios | Mojang Studios |
67 | Garry's Mod | 20.0 | NaN | 2006-11-01 | Sandbox | Facepunch Studios | Valve |
50 | Diablo III | 20.0 | Diablo | 2012-05-01 | Action role-playing | Blizzard Entertainment | Blizzard Entertainment |
146 | Terraria | 17.2 | NaN | 2011-05-01 | Action-adventure | Re-Logic | Re-Logic |
... | ... | ... | ... | ... | ... | ... | ... |
41 | Danganronpa: Trigger Happy Havoc | 1.0 | Danganronpa | 2016-02-01 | Visual novel, adventure | Spike Chunsoft | Spike Chunsoft |
45 | Daryl F. Gates' Police Quest: SWAT | 1.0 | Police Quest | 1995-09-01 | Interactive movie | Sierra Online | Sierra Online |
47 | Deer Hunter | 1.0 | Deer Hunter | 1997-11-01 | Sports | Sunstorm Interactive | WizardWorks |
103 | Phantasmagoria | 1.0 | Phantasmagoria | 1995-07-01 | Interactive movie | Sierra Online | Sierra Online |
176 | Zoo Tycoon | 1.0 | Zoo Tycoon | 2001-10-01 | Business simulation | Microsoft | Blue Fang Games |
177 rows × 7 columns
Check for Duplicates#
Untuk mengecek duplikasi yang ada dalam database dapat menggunakan method .duplicated()
dataframe[dataframe.duplicated(keep=False)]
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
135 | StarCraft II: Heart of the Swarm | 1.0 | StarCraft | 2013-03-01 | Real-time strategy | Blizzard Entertainment | Blizzard Entertainment |
136 | StarCraft II: Heart of the Swarm | 1.0 | StarCraft | 2013-03-01 | Real-time strategy | Blizzard Entertainment | Blizzard Entertainment |
137 | StarCraft II: Legacy of the Void | 1.0 | StarCraft | 2015-11-01 | Real-time strategy | Blizzard Entertainment | Blizzard Entertainment |
138 | StarCraft II: Legacy of the Void | 1.0 | StarCraft | 2015-11-01 | Real-time strategy | Blizzard Entertainment | Blizzard Entertainment |
Calculate Columns#
Dalam pandas dapat dilakukan perhitungan berbeda pada kolom dengan fungsi bawaan, dan perhitungan ini akan mengabaikan nilai NaN. Antara lain:
Method Perhitungan Pandas |
Penjelasan |
---|---|
.count() |
Jumlah data dalam tabel |
.sum() |
Jumlah nilai |
.mean() |
Nilai rata-rata |
.median() |
Nilai Median/nilai tengah |
.min() |
Nilai minimum/terkecil |
.max() |
Nilai maksimum/terbesar |
.std() |
standard deviation/simpangan baku |
contohnya:
dataframe['Sales'].mean()
3.116949152542373
dataframe['Sales'].max()
42.0
Clean and Transform Data#
Pembersihan data merupakan proses di mana data yang tidak akurat, diformat dengan kurang baik, atau berantakan akan diatur dan diperbaiki. Transformasi data adalah proses mengubah data dari satu format atau struktur ke format atau struktur lainnya. Beberapa method untuk cleaning and transforming data yang dapat digunakan dalam pandas antara lain:
Pandas .str Methods#
Method String Pandas |
Penjelasan |
---|---|
df[‘column_name’].str.lower() |
membuat string di setiap baris menjadi huruf kecil |
df[‘column_name’].str.upper() |
membuat string di setiap baris menjadi huruf besar |
df[‘column_name’].str.title() |
Mengonversi karakter pertama setiap kata menjadi huruf besar dan sisanya menjadi huruf kecil. |
df[‘column_name’].str.replace(‘old string’, ‘new string’) |
Ganti setiap kemunculan pola/regex di Seri/Indeks. |
df[‘column_name’].str.contains(‘pattern/string’) |
Mengecek apakah setiap baris berisi pola/string tersebut |
df[‘column_name’].str.split(‘delim’) |
Mengembalikan substring yang dipisahkan oleh batas yang diberikan |
df[‘column_name’].str.join(list) |
Kebalikan dari .split(), ini menggabungkan elemen dalam list berupa string |
Contoh:
dataframe['Series'].str.upper()
0 7 DAYS
1 AGE OF EMPIRES
2 AGE OF EMPIRES
3 AGE OF EMPIRES
4 AGE OF EMPIRES
...
172 CARMEN SANDIEGO
173 NaN
174 WING COMMANDER
175 WARCRAFT
176 ZOO TYCOON
Name: Series, Length: 177, dtype: object
dataframe[dataframe['Genre'].str.contains('strategy')].head(10)
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
1 | Age of Empires | 3.0 | Age of Empires | 1997-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
2 | Age of Empires II: The Age of Kings | 2.0 | Age of Empires | 1999-09-01 | Real-time strategy | Ensemble Studios | Microsoft |
3 | Age of Empires III | 2.0 | Age of Empires | 2005-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
4 | Age of Mythology | 1.0 | Age of Empires | 2002-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
21 | Civilization II | 1.0 | Civilization | 1996-02-01 | Turn-based strategy, 4X | MicroProse | MicroProse |
22 | Civilization III | 2.0 | Civilization | 2001-10-01 | Turn-based strategy, 4X | Firaxis Games | Infogrames |
23 | Civilization IV | 3.0 | Civilization | 2005-10-01 | Turn-based strategy, 4X | Firaxis Games | 2K Games & Aspyr |
24 | Civilization V | 8.0 | Civilization | 2010-09-01 | Turn-based strategy, 4X | Firaxis Games | 2K Games & Aspyr |
25 | Command & Conquer | 3.0 | Command & Conquer | 1995-08-01 | Real-time strategy | Westwood Studios | Virgin Interactive |
26 | Command & Conquer 3: Tiberium Wars | 1.0 | Command & Conquer | 2007-03-01 | Real-time strategy | EA Los Angeles | Electronic Arts |
Applying Functions#
Dengan menggunakan method .apply(), dapat menjalankan sebuah fungsi pada setiap baris dalam kolom pada dataframe.
def splitText(text):
splitdulu = text.split()
return splitdulu
dataframe['Name'].apply(splitText)
0 [7, Days, to, Die]
1 [Age, of, Empires]
2 [Age, of, Empires, II:, The, Age, of, Kings]
3 [Age, of, Empires, III]
4 [Age, of, Mythology]
...
172 [Where, in, the, World, Is, Carmen, Sandiego?]
173 [Who, Wants, to, Be, a, Millionaire?]
174 [Wing, Commander, 3:, Heart, of, the, Tiger]
175 [World, of, Warcraft]
176 [Zoo, Tycoon]
Name: Name, Length: 177, dtype: object
Merge Dataframe#
Jika ada dua kumpulan dataset yang terbagi, dalam pandas dapat juga digabungkan menjadi satu dengan menggunakan method .merge().
namafile="Games Sales- Dev.xlsx"
dataframe=pd.read_excel(namafile)
namafile2="Games Sales- Publisher.xlsx"
dataframe2=pd.read_excel(namafile2)
dataframe
Name | Sales | Series | Release | Genre | Developer | |
---|---|---|---|---|---|---|
0 | 7 Days to Die | 2.0 | 7 Days | 2016-06-01 | Survival horror | The Fun Pimps |
1 | Age of Empires | 3.0 | Age of Empires | 1997-10-01 | Real-time strategy | Ensemble Studios |
2 | Age of Empires II: The Age of Kings | 2.0 | Age of Empires | 1999-09-01 | Real-time strategy | Ensemble Studios |
3 | Age of Empires III | 2.0 | Age of Empires | 2005-10-01 | Real-time strategy | Ensemble Studios |
4 | Age of Mythology | 1.0 | Age of Empires | 2002-10-01 | Real-time strategy | Ensemble Studios |
... | ... | ... | ... | ... | ... | ... |
172 | Where in the World Is Carmen Sandiego? | 4.0 | Carmen Sandiego | 1985-06-01 | Educational | Broderbund |
173 | Who Wants to Be a Millionaire? | 1.0 | NaN | 1999-11-01 | Trivia game | Jellyvision |
174 | Wing Commander 3: Heart of the Tiger | 1.0 | Wing Commander | 1994-12-01 | Space combat simulation | Origin Systems |
175 | World of Warcraft | 14.0 | Warcraft | 2004-11-01 | MMORPG | Blizzard Entertainment |
176 | Zoo Tycoon | 1.0 | Zoo Tycoon | 2001-10-01 | Business simulation | Microsoft |
177 rows × 6 columns
dataframe2
Name | Publisher | |
---|---|---|
0 | 7 Days to Die | The Fun Pimps |
1 | Age of Empires | Microsoft |
2 | Age of Empires II: The Age of Kings | Microsoft |
3 | Age of Empires III | Microsoft |
4 | Age of Mythology | Microsoft |
... | ... | ... |
172 | Where in the World Is Carmen Sandiego? | Broderbund |
173 | Who Wants to Be a Millionaire? | Disney Interactive Studios |
174 | Wing Commander 3: Heart of the Tiger | Electronic Arts |
175 | World of Warcraft | Blizzard Entertainment |
176 | Zoo Tycoon | Blue Fang Games |
177 rows × 2 columns
Karena dua dataset memiliki satu kolom yang sama yaitu kolom “Name”, maka dapat digabung berdasarkan kolom “Name” tersebut
dataframe.merge(dataframe2,on='Name')
Name | Sales | Series | Release | Genre | Developer | Publisher | |
---|---|---|---|---|---|---|---|
0 | 7 Days to Die | 2.0 | 7 Days | 2016-06-01 | Survival horror | The Fun Pimps | The Fun Pimps |
1 | Age of Empires | 3.0 | Age of Empires | 1997-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
2 | Age of Empires II: The Age of Kings | 2.0 | Age of Empires | 1999-09-01 | Real-time strategy | Ensemble Studios | Microsoft |
3 | Age of Empires III | 2.0 | Age of Empires | 2005-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
4 | Age of Mythology | 1.0 | Age of Empires | 2002-10-01 | Real-time strategy | Ensemble Studios | Microsoft |
... | ... | ... | ... | ... | ... | ... | ... |
176 | Where in the World Is Carmen Sandiego? | 4.0 | Carmen Sandiego | 1985-06-01 | Educational | Broderbund | Broderbund |
177 | Who Wants to Be a Millionaire? | 1.0 | NaN | 1999-11-01 | Trivia game | Jellyvision | Disney Interactive Studios |
178 | Wing Commander 3: Heart of the Tiger | 1.0 | Wing Commander | 1994-12-01 | Space combat simulation | Origin Systems | Electronic Arts |
179 | World of Warcraft | 14.0 | Warcraft | 2004-11-01 | MMORPG | Blizzard Entertainment | Blizzard Entertainment |
180 | Zoo Tycoon | 1.0 | Zoo Tycoon | 2001-10-01 | Business simulation | Microsoft | Blue Fang Games |
181 rows × 7 columns
Plots and Data Visualizations#
Panda memudahkan dalam pembuatan plot dan visualisasi data. Kita dapat membuat plot sederhana dengan menambahkan .plot() ke objek DataFrame yang memiliki data numerik yang sesuai.
dataframe['Publisher'].value_counts()[:10].plot(kind='bar', title='Game Sales:\nMost Frequent "Publisher"')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3801 try:
-> 3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Publisher'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_16644\1727840546.py in <cell line: 0>()
----> 1 dataframe['Publisher'].value_counts()[:10].plot(kind='bar', title='Game Sales:\nMost Frequent "Publisher"')
~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3805 if self.columns.nlevels > 1:
3806 return self._getitem_multilevel(key)
-> 3807 indexer = self.columns.get_loc(key)
3808 if is_integer(indexer):
3809 indexer = [indexer]
~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
-> 3804 raise KeyError(key) from err
3805 except TypeError:
3806 # If we have a listlike key, _check_indexing_error will raise
KeyError: 'Publisher'
dataframe['Publisher'].value_counts()[:10].plot(kind='pie', title='Game Sales:\nMost Frequent "Publisher"')
<AxesSubplot:title={'center':'Game Sales:\nMost Frequent "Publisher"'}, ylabel='Publisher'>

Write CSV File#
Untuk mengeluarkan file CSV baru, dapat menggunakan method .to_csv diikuti dengan nama file dalam tanda kutip. Sedangkan untuk excel dapat menggunakan .to_excel
dataframe.to_csv("newGameSales.csv", encoding='utf-8', index=False)