A CSV (Comma Separated Values) file is a file with values seperated by a comma. Its often used to import and export with databases and spreadsheets.
Values are mostly seperated by comma. Sometimes another character is used like a semicolon, the seperation character is called a delimiter.
Related Course: Data Analysis with Pandas and Python
What is a CSV file?
A CSV file is a type of plain text file that contains values that are separated by a delimiter. The .csv
extension is short for comma separated value
, because the delimter is often a comma.
Excel can open CSV files. Web Apps allow export of data into a CSV file.
By default a csv file has rows and columns, as it’s a representation of tabular data. You can see every value as a cell and every line as a row.
CSV Sample File
You can represent a table in a CSV (comma separated values) file. The text is the tabular data. Each line of the csv file is a line inside the table, where every value (seperated by comma) is a delimiter (a comma, a semicolon or another delimiter*).
Because CSV files are plain text data, almost all programming languages support it. You can easily parse it in Python.
Table Data
You could have this table:
Name | Age | Salary |
---|---|---|
Chris | 20 | $3600 |
Harry | 25 | $3200 |
Barry | 30 | $3000 |
CSV data
And represent the same data as a .csv
file.
Chris,20,3600
Harry,25,3200
Barry,30,3000
Here each row in the file matches a row in the table, and each value is a cell in the table.
Read CSV
In Python, there are two common ways to read csv files:
- read csv with the csv module
- read csv with the pandas module (see bottom)
Python CSV Module
Python comes with a module to parse csv files, the csv
module. You can use this module to read and write data, without having to do string operations and the like.
Read a CSV File
Lets get into how to read a csv file. You can use the csv module. The module is already installed, just import it with import csv
.
Then you’ll want to open the csv file, you can with:
with open('office.csv') as csvfile: |
Then create a reader object csv.reader()
where the parameters are the filename and the delimiter.
This sounds hard, but it is as simple as:
csvReader = csv.reader(csvfile, delimiter=',') |
Then you can loop over the rows and parse them or display them.
import csv |
When you run the program, it will show you every row as a list
➜ ~ python3 program.py
['Chris', '20', '3600']
['Harry', '25', '3200']
['Barry', '30', '3000']
Because it is a list, you can access cells using square brackets.
The first cell is row[0]
, the second cell row[1]
etcetera.
for row in csvReader: |
Read a CSV as a Dict
If you want to read the data into a dictionary instead of a list, you can do that.
The csv module comes with a DictReader
. This lets you read a csv file as dictionary.
If you want to read it as a dictionary, make sure to include a header because that will be included in the key-value mapping.
name,age,salary
Chris,20,3600
Harry,25,3200
Barry,30,3000
Then your program can read the csv with csv.DictReader(open(filename))
import csv |
The program then outputs dictionaries:
OrderedDict([('name', 'Chris'), ('age', '20'), ('salary', '3600')])
OrderedDict([('name', 'Harry'), ('age', '25'), ('salary', '3200')])
OrderedDict([('name', 'Barry'), ('age', '30'), ('salary', '3000')])
An OrderedDict
functions exactly the same as normal dict
.
Related Course: Data Analysis with Pandas and Python
CSV Module Functions
The csv
module comes with many different functions:
- csv.field_size_limit – return maximum field size
- csv.get_dialect – get the dialect which is associated with the name
- csv.list_dialects – show all registered dialects
- csv.reader – read data from a csv file
- csv.register_dialect - associate dialect with name
- csv.writer – write data to a csv file
- csv.unregister_dialect - delete the dialect associated with the name the dialect registry
- csv.QUOTE_ALL - Quote everything, regardless of type.
- csv.QUOTE_MINIMAL - Quote fields with special characters
- csv.QUOTE_NONNUMERIC - Quote all fields that aren’t numbers value
- csv.QUOTE_NONE – Don’t quote anything in output
This article focuses only on the csv.reader
, that lets you read a file.
Read CSV with Pandas
Pandas is a data analysis library. It often used in data science. If you work with data a lot, using the pandas
module is way better.
First we start with some data. Lets say you have a csv file containing nation statistics, nations.csv
:
United States, Washington, English, US dollar
Canada, Ottawa, English and French, Canadian dollar
Germany, Berlin, German, Euro
By default, the pandas
module is not installed. You can install it with Python package manager pip. After installation, load it like this:
import pandas as pd |
Pandas has a function read csv files, .read_csv(filename)
.
This loads the csv file into a Pandas data frame.
df = pd.read_csv('nations.csv') |
Pandas works with dataframes which hold all data. Data frames are really cool data structures, they let you grab an entire row at once, by using it’s header name. (The header was the first line in the csv file)
df['Country'] |
Example
The program below reads a csv file with pandas.:
import pandas as pd |
This outputs both the dataframe print(df)
and a row df['Country']
➜ ~ python3 sample.py
Country Capital Language Currency
0 United States Washington English US dollar
1 Canada Ottawa English and French Canadian dollar
2 Germany Berlin German Euro
United States
Canada
Germany
You can iterate row by row like this:
import pandas as pd |
Delimiter
If you have another delimter than the default comma, say a pipe then you an use the parameter sep =
.
import pandas as pd |
Header
If your csv file does not include a header, you can either remove it from the file or change the program to have the parameter header = None
.
import pandas as pd |
If it’s on another line (say the 2nd line) you can use:
import pandas as pd |
pandas names
If your csv data does not have a header, don’t worry. You can define the columns in code while opening the file:
import pandas as pd |
pandas use_cols
If you only want to load specific columns, you can specify the parameter usecols
.
This is useful if you have a large csv with a lot of columns. You can define one ore more columns:
import pandas as pd |
Related Course: Data Analysis with Pandas and Python