A CSV (Comma Separated Values) file is a file with values seperated by a comma. Its often used to import and export with databases and spreadsheets.

Values are mostly seperated by comma. Sometimes another character is used like a semicolon, the seperation character is called a delimiter.

Related Course: Data Analysis with Pandas and Python

What is a CSV file?

A CSV file is a type of plain text file that contains values that are separated by a delimiter. The .csv extension is short for comma separated value, because the delimter is often a comma.

Excel can open CSV files. Web Apps allow export of data into a CSV file.

By default a csv file has rows and columns, as it’s a representation of tabular data. You can see every value as a cell and every line as a row.

CSV Sample File

You can represent a table in a CSV (comma separated values) file. The text is the tabular data. Each line of the csv file is a line inside the table, where every value (seperated by comma) is a delimiter (a comma, a semicolon or another delimiter*).

Because CSV files are plain text data, almost all programming languages support it. You can easily parse it in Python.

Table Data

You could have this table:

NameAgeSalary
Chris20$3600
Harry25$3200
Barry30$3000

CSV data

And represent the same data as a .csv file.

Chris,20,3600
Harry,25,3200
Barry,30,3000

Here each row in the file matches a row in the table, and each value is a cell in the table.

Read CSV

In Python, there are two common ways to read csv files:

  • read csv with the csv module
  • read csv with the pandas module (see bottom)

Python CSV Module

Python comes with a module to parse csv files, the csv module. You can use this module to read and write data, without having to do string operations and the like.

Read a CSV File

Lets get into how to read a csv file. You can use the csv module. The module is already installed, just import it with import csv.

Then you’ll want to open the csv file, you can with:

with open('office.csv') as csvfile:

Then create a reader object csv.reader() where the parameters are the filename and the delimiter.

This sounds hard, but it is as simple as:

csvReader = csv.reader(csvfile, delimiter=',')

Then you can loop over the rows and parse them or display them.

import csv

with open('office.csv') as csvfile:
csvReader = csv.reader(csvfile, delimiter=',')
for row in csvReader:
print(row)

When you run the program, it will show you every row as a list

➜  ~ python3 program.py
['Chris', '20', '3600']
['Harry', '25', '3200']
['Barry', '30', '3000']

Because it is a list, you can access cells using square brackets.
The first cell is row[0], the second cell row[1] etcetera.

for row in csvReader:
print(row[0])
print(row[1])

Read a CSV as a Dict

If you want to read the data into a dictionary instead of a list, you can do that.
The csv module comes with a DictReader. This lets you read a csv file as dictionary.

If you want to read it as a dictionary, make sure to include a header because that will be included in the key-value mapping.

name,age,salary
Chris,20,3600
Harry,25,3200
Barry,30,3000

Then your program can read the csv with csv.DictReader(open(filename))

import csv

csvReader = csv.DictReader(open("students.csv"))
for row in csvReader:
print(row)

The program then outputs dictionaries:

OrderedDict([('name', 'Chris'), ('age', '20'), ('salary', '3600')])
OrderedDict([('name', 'Harry'), ('age', '25'), ('salary', '3200')])
OrderedDict([('name', 'Barry'), ('age', '30'), ('salary', '3000')])

An OrderedDict functions exactly the same as normal dict.

Related Course: Data Analysis with Pandas and Python

CSV Module Functions

The csv module comes with many different functions:

  • csv.field_size_limit – return maximum field size
  • csv.get_dialect – get the dialect which is associated with the name
  • csv.list_dialects – show all registered dialects
  • csv.reader – read data from a csv file
  • csv.register_dialect - associate dialect with name
  • csv.writer – write data to a csv file
  • csv.unregister_dialect - delete the dialect associated with the name the dialect registry
  • csv.QUOTE_ALL - Quote everything, regardless of type.
  • csv.QUOTE_MINIMAL - Quote fields with special characters
  • csv.QUOTE_NONNUMERIC - Quote all fields that aren’t numbers value
  • csv.QUOTE_NONE – Don’t quote anything in output

This article focuses only on the csv.reader, that lets you read a file.

Read CSV with Pandas

python pandas read excel

Pandas is a data analysis library. It often used in data science. If you work with data a lot, using the pandas module is way better.

First we start with some data. Lets say you have a csv file containing nation statistics, nations.csv:

Country,Capital,Language,Currency
United States, Washington, English, US dollar
Canada, Ottawa, English and French, Canadian dollar
Germany, Berlin, German, Euro

By default, the pandas module is not installed. You can install it with Python package manager pip. After installation, load it like this:

import pandas as pd

Pandas has a function read csv files, .read_csv(filename).

This loads the csv file into a Pandas data frame.

df = pd.read_csv('nations.csv')

Pandas works with dataframes which hold all data. Data frames are really cool data structures, they let you grab an entire row at once, by using it’s header name. (The header was the first line in the csv file)

df['Country']

Example

The program below reads a csv file with pandas.:

import pandas as pd
import numpy as np

df = pd.read_csv('nations.csv')
print(df)
print('\n')

for country in df['Country']:
print(country)

This outputs both the dataframe print(df) and a row df['Country']

➜  ~ python3 sample.py
         Country      Capital             Language          Currency
0  United States   Washington              English         US dollar
1         Canada       Ottawa   English and French   Canadian dollar
2        Germany       Berlin               German              Euro


United States
Canada
Germany

You can iterate row by row like this:

import pandas as pd

df = pd.read_csv('nations.csv')
for index, row in df.iterrows():
print(row['Country'], row['Capital'],row['Language'])

Delimiter

If you have another delimter than the default comma, say a pipe then you an use the parameter sep =.

import pandas as pd
df = pd.read_csv("data.csv", sep="|")

If your csv file does not include a header, you can either remove it from the file or change the program to have the parameter header = None.

import pandas as pd
df = pd.read_csv("data.csv", header = None)

If it’s on another line (say the 2nd line) you can use:

import pandas as pd
df = pd.read_csv("data.csv", header = 1)

pandas names

If your csv data does not have a header, don’t worry. You can define the columns in code while opening the file:

import pandas as pd
df = pd.read_csv('nations.csv', names=['Country','Capital','Language','Currency'])

pandas use_cols

If you only want to load specific columns, you can specify the parameter usecols.
This is useful if you have a large csv with a lot of columns. You can define one ore more columns:

import pandas as pd
df = pd.read_csv("nations.csv", usecols = ["Country", "Currency"])

Related Course: Data Analysis with Pandas and Python