The what:
I have a file as below:
$ cat sample.csv
DEPT_ID,EMP_NAME
100,Keshava
100,Madhava
100,Narayana
100,Govinda
100,Vishnu
110,Madhusudana
110,Trivikrama
110,Vamana
110,Shridhara
110,Hrishikesha
110,Padmanabha
120,Damodara
120,Sankashana
120,Vasudeva
120,Pradyumna
130,Aniruddha
130,Purushottama
130,Adhokshaja
130,Narasimha
140,Achyuta
140,Janardana
140,Upendra
140,Hari
140,Krishna
The challenge is to make it look like below:
$ cat sample.csv
DEPT_ID: EMP_NAME (SEPARATED BY COMMA)
100: Keshava, Madhava, Narayana, Govinda, Vishnu
110: Madhusudana, Trivikrama, Vamana, Shridhara, Hrishikesha, Padmanabha
120: Damodara, Sankashana, Vasudeva, Pradyumna, Aniruddha
130: Purushottama, Adhokshaja, Narasimha
140: Achyuta, Janardana, Upendra, Hari, Krishna
The How:
Method 1 - Using pure shell commands
tail -n +2 sample.csv | cut -d "," -f1 | sort -u | while read dept_id; do emp_names=$(grep "${dept_id}" sample.csv | cut -d "," -f2 | tr '\n' ','); echo $dept_id: $emp_names; done | sed -e 's/,$//g'
100: Keshava,Madhava,Narayana,Govinda,Vishnu
110: Madhusudana,Trivikrama,Vamana,Shridhara,Hrishikesha,Padmanabha
120: Damodara,Sankashana,Vasudeva,Pradyumna
130: Aniruddha,Purushottama,Adhokshaja,Narasimha
140: Achyuta,Janardana,Upendra,Hari,Krishna
Explanation:
The logic:
- Get the unique dept_id values
- Using the unique dept_id values, get the employee names who belong to one particular department
1. tail -n +2 sample.csv ==> Print from the second line of the file (since the first line contains the column names, the first line can be ignored).
2. cut -d"," -f1 ==> Get the first column of the CSV file by splitting it up and printing only the first column
3. sort -u ==> Sort the resultant output (by default, alphabetically) and print only the unique values (-u)
### From Steps 1-3, I have obtained the unique dept_id values. Now, for each dept_id, get the list of employees (EMP_NAME column) and separate them using comma
4. while read dept_id ==> Looping through the unique values of the dept_id to retrieve the EMP_NAME
5. emp_names=$(grep "${dept_id}" sample.csv | cut -d "," -f2 | tr '\n' ',') ==> Using variable emp_names for the sake of simplicity
grep "${dept_id}" sample.csv | cut -d "," -f2
==> The command "grep" gets the dept_id and emp_name for a particular dept_id and "cut" command will get the list of all EMP_NAME columns for the given dept_id
tr '\n' ',' ==> This command converts the emp_names from
n
newline (\n)
separated to comma separated.
6. echo $dept_id: $emp_names ==> prints out the dept_id and the related emp_names
Method 2 - Using Python
# Import the csv package
import csv
# Declare source and target files as variables
source_file_name = '/full/path/of/source/file/sample.csv'
# Open the file for "read"
with open(source_file_name, 'r') as fh:
lines = fh.readlines()
# Print the lines to ensure that it's read properly. Comment out after testing.
# print(type(lines), "\n", lines[1:])
# Store the contents of the converted file in a dict object
single_line = {}
# Loop through the source file
for line in lines[1:]:
# Split the contents of each line based on comma. The first column would be dept_id and the 2nd - emp_name
dept_id, emp_name = str(line).split(',')
# Strip each column value to ensure that there are no special/extra characters.
dept_id = str(dept_id).strip('\r').strip()
emp_name = str(emp_name).strip('\r').strip()
# Check if dept_id already exists in the single_line dictionary. If it does, then append the emp_name to the
# existing value. If it doesn't, then enter a new value with the dept_id as key and emp_name as value
if dept_id in single_line:
single_line[dept_id].append(emp_name)
else:
single_line[dept_id] = [emp_name]
# Print the dict object to ensure that it's converted properly. Comment out after testing.
print("\nsingle_line Dict object Contents:\n-----------------------------\n",
single_line, "\n\nOutput in Custom Format:\n-----------------------------\n")
# Loop through the dictionary object - single_line and print the values.
for key, v in single_line.items():
# print(key)
# Concatenate all the values of emp_name, separated by comma, for one dept_id into a single variable
val = ','.join(v)
print(key,":",val)
print("\n\n")
Output:
$ python csv_to_custom_format.py
single_line Dict object Contents:
-----------------------------
{'100': ['Keshava', 'Madhava', 'Narayana', 'Govinda', 'Vishnu'], '110': ['Madhusudana', 'Trivikrama', 'Vamana', 'Shridhara', 'Hrishikesha', 'Padmanabha'], '120': ['Damodara', 'Sankashana', 'Vasudeva', 'Pradyumna'], '130': ['Aniruddha', 'Purushottama', 'Adhokshaja', 'Narasimha'], '140': ['Achyuta', 'Janardana', 'Upendra', 'Hari', 'Krishna']}
Output in Custom Format:
-----------------------------
100 : Keshava,Madhava,Narayana,Govinda,Vishnu
110 : Madhusudana,Trivikrama,Vamana,Shridhara,Hrishikesha,Padmanabha
120 : Damodara,Sankashana,Vasudeva,Pradyumna
130 : Aniruddha,Purushottama,Adhokshaja,Narasimha
140 : Achyuta,Janardana,Upendra,Hari,Krishna
Comments