Raspberry Pi_Eng_10.9.4 “uniq” Command


Published Book on Amazon


All of IOT Starting with the Latest Raspberry Pi from Beginner to Advanced – Volume 1
All of IOT Starting with the Latest Raspberry Pi from Beginner to Advanced – Volume 2


Published Korean Edition


최신 라즈베리파이(Raspberry Pi)로 시작하는 사물인터넷(IOT)의 모든 것 – 초보에서 고급까지 (상) [Korean edition of Volume 1]
최신 라즈베리파이(Raspberry Pi)로 시작하는 사물인터넷(IOT)의 모든 것 – 초보에서 고급까지 (하) [Korean edition of Volume 2]


Original Book Contents


10.9.4  "uniq" Command

 

This command removes adjacent duplicate lines from its input and writes the remaining lines to its output.

 

[Command Format]

uniq [option] [input [output]]
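Both file arguments are optional, and an output file can only be given together with an input file. The following is a minimal sketch of the three forms. It assumes GNU coreutils "uniq" (as shipped with Raspberry Pi OS) and uses the hypothetical file names "sorted.txt" and "deduped.txt" inside a temporary scratch directory.

```shell
# Work in a throwaway directory so no existing files are touched
cd "$(mktemp -d)" || exit 1

# Hypothetical input file: already sorted, with one duplicate line
printf 'IBM\nIBM\nLG\n' > sorted.txt

# Input and output given: the result is written to deduped.txt, nothing is printed
uniq sorted.txt deduped.txt
cat deduped.txt

# Only the input given: the result is printed to standard output
uniq sorted.txt

# No file given: uniq reads standard input, so it can sit at the end of a pipe
printf 'IBM\nIBM\nLG\n' | uniq
```

Each of the three forms produces the same two lines, "IBM" and "LG".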

 

[Command Overview]             

   This removes adjacent duplicate lines from the input and writes the result to the output (standard output if no output file is given).

   User privilege          -- Normal user.

 

[Detailed Description]

   Because only adjacent lines are compared, duplicate lines that are not next to each other are not removed when the data is not sorted.

   Of each group of adjacent duplicate lines, only the first line is kept. To remove all duplicates regardless of their position, it is common to sort the data with the "sort" command first and then run this command.
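Both rules can be checked with a few lines typed directly into the shell. This sketch assumes GNU "uniq" and uses made-up sample lines. The second part borrows the "-i" (ignore case) option from the option list below, so that lines which are not byte-for-byte identical still form one group, which makes it visible which line of the group is kept.

```shell
# 1) Only adjacent lines are compared: the two "LG" lines are separated by "Sony"
printf 'LG\nSony\nLG\n' | uniq           # prints LG, Sony, LG
printf 'LG\nSony\nLG\n' | sort | uniq    # prints LG, Sony

# 2) The first line of each group is kept: with -i these three spellings form
#    one group, and the first one, "Samsung", is the line that is printed
printf 'Samsung\nSAMSUNG\nsamsung\nSony\n' | uniq -i   # prints Samsung, Sony
```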

 

[Main Options]

-c, --count

prefix lines by the number of occurrences

-d, --repeated

only print duplicate lines, one for each group

-f, --skip-fields=N

avoid comparing the first N fields

-i, --ignore-case

ignore differences in case when comparing

-s, --skip-chars=N

avoid comparing the first N characters

-u, --unique

only print unique lines
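The "-f" and "-s" options are useful when each line begins with something that should not take part in the comparison, such as a time stamp or a short prefix. The sketch below uses made-up log and code lines and assumes GNU "uniq", where a field is a run of blanks followed by non-blank characters, and fields are skipped before characters.

```shell
# -f 1: skip the first field (the time), so the two "error disk" lines are duplicates
printf '10:00 error disk\n10:05 error disk\n10:07 warn cpu\n' | uniq -f 1
# prints:
# 10:00 error disk
# 10:07 warn cpu

# -s 2: skip the first two characters, so "A-001" and "B-001" both compare as "001"
printf 'A-001\nB-001\nC-002\n' | uniq -s 2
# prints:
# A-001
# C-002
```

In both cases the printed line is the first line of its group, so the time stamp of the earliest "error disk" entry is the one that survives.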

 

[Usage Example]

Suppose the "testdata" directory of the "pi" account contains a file named "customer_list_dup.txt" with the following contents.

 

pi@raspberrypi ~/testdata $ cat customer_list_dup.txt
Microsoft
Google
IBM
Samsung
Samsung
Facebook
LG
Microsoft
Samsung
Sony
Hewlett-Packard

 

In this data, "Microsoft" appears twice and "Samsung" appears three times. Now run the "uniq" command on this file.

 

pi@raspberrypi ~/testdata $ uniq customer_list_dup.txt
Microsoft
Google
IBM
Samsung
Facebook
LG
Microsoft
Samsung
Sony
Hewlett-Packard

 

In this result, the two adjacent "Samsung" lines have been merged into one, but the later "Samsung" line and both "Microsoft" lines are still displayed, because none of them is next to an identical line. This is because the "uniq" command only compares adjacent lines.

To solve this problem, first sort the data with the "sort" command and then pass the result to the "uniq" command through a pipe, as follows. This time "Microsoft" and "Samsung" each appear only once, like every other company.

 

pi@raspberrypi ~/testdata $ sort customer_list_dup.txt | uniq
Facebook
Google
Hewlett-Packard
IBM
LG
Microsoft
Samsung
Sony

 


 

The "uniq" command can provide more information through its options. For example, the "-c" option prefixes each line with the number of times it occurs.

 

pi@raspberrypi ~/testdata $ sort customer_list_dup.txt | uniq -c
      1 Facebook
      1 Google
      1 Hewlett-Packard
      1 IBM
      1 LG
      2 Microsoft
      3 Samsung
      1 Sony
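The "-d" and "-u" options filter the same sorted data in the other direction. The sketch below recreates "customer_list_dup.txt" in a temporary directory so it can be run anywhere. It then uses "-d" to list only the names that occur more than once and "-u" to list only the names that occur exactly once. It finishes with the common idiom of piping the "-c" output into "sort -rn" to rank names by frequency. It assumes GNU coreutils. "LC_ALL=C" is set only to make the sort order predictable.

```shell
# Scratch directory and a copy of the sample file used in this section
cd "$(mktemp -d)" || exit 1
printf '%s\n' Microsoft Google IBM Samsung Samsung Facebook LG \
    Microsoft Samsung Sony Hewlett-Packard > customer_list_dup.txt

# Fix the collation order so the output is the same on every system
export LC_ALL=C

# -d: one line for each name that appears more than once
sort customer_list_dup.txt | uniq -d
# Microsoft
# Samsung

# -u: only the names that appear exactly once
sort customer_list_dup.txt | uniq -u
# Facebook
# Google
# Hewlett-Packard
# IBM
# LG
# Sony

# Rank by frequency: count with -c, then sort the counts numerically in reverse
# (the first two lines are Samsung with 3 and Microsoft with 2)
sort customer_list_dup.txt | uniq -c | sort -rn | head -n 2
```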