uniq Command – Remove Duplicate Lines from a File in Linux

As Linux users, we interact with various types of files on a regular basis. One of the most common file types on any computer system is the plain text file, and we often need to find specific text in these files.

However, this simple task quickly becomes annoying if the file contains duplicate entries. In such cases, we can use the uniq command to filter duplicate text efficiently.

The uniq command comes in handy when we want to list or remove duplicate lines that are adjacent to each other.

Apart from this, we can also use the uniq command to count duplicate entries. Note that the uniq command detects duplicates only when they are on adjacent lines.

In this simple guide, we will discuss the uniq command in depth with practical examples in Linux.

uniq Command Syntax

The syntax of the uniq command is very easy to understand and is similar to other Linux commands:

$ uniq [OPTIONS] [INPUT] [OUTPUT]

Note that all the options and parameters of the uniq command are optional. Without INPUT, uniq reads from standard input, and without OUTPUT, it writes to standard output.
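As a rough illustration of how the optional INPUT and OUTPUT arguments behave, consider the sketch below. The file names fruits.txt and fruits-unique.txt and their contents are hypothetical and used only for this example.

```shell
# Hypothetical sample data; both file names are placeholders.
printf '%s\n' apple apple banana > fruits.txt

# INPUT and OUTPUT given: the result is written to fruits-unique.txt
# (an existing fruits-unique.txt would be overwritten).
uniq fruits.txt fruits-unique.txt
cat fruits-unique.txt
# apple
# banana

# No INPUT and no OUTPUT: uniq reads standard input and writes to standard output.
printf '%s\n' apple apple banana | uniq
# apple
# banana
```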

Creating a Sample Text File

To begin, let’s create a simple text file with the vi editor and add some content in which the duplicate entries sit on adjacent lines.

$ vi linux-distributions.txt
$ cat linux-distributions.txt
Create Text File in Linux
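For readers who prefer to create test data without an editor, a minimal sketch follows. The file name distros-sample.txt and its contents are assumptions, chosen to match the distribution names this guide refers to (Ubuntu, UBUNTU, CentOS, CENTOS, Alpine). They are not necessarily the exact contents of linux-distributions.txt.

```shell
# Assumed sample contents; distros-sample.txt is a hypothetical file name.
cat > distros-sample.txt <<'EOF'
Ubuntu
Ubuntu
UBUNTU
CentOS
CentOS
CENTOS
Alpine
EOF

# Display the file to confirm what was written.
cat distros-sample.txt
```

To try the commands in this guide against this data, substitute distros-sample.txt for linux-distributions.txt.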

Now, let’s use this file to understand the usage of the uniq command.

1. Remove Duplicate Lines from a Text File

One of the common uses of the uniq command is to remove the adjacent duplicate lines from the text file as shown.

$ uniq linux-distributions.txt
Remove Duplicated Lines in a Text File

In the above output, we can see that the uniq command has successfully eliminated the duplicated lines.
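The behavior can be sketched with a small, assumed data set. The file name distros-sample.txt and its contents are hypothetical, so the output below belongs to this sample only.

```shell
# Assumed sample contents (hypothetical file name).
printf '%s\n' Ubuntu Ubuntu UBUNTU CentOS CentOS CENTOS Alpine > distros-sample.txt

# Adjacent identical lines collapse into one; the comparison is case-sensitive.
uniq distros-sample.txt
# Ubuntu
# UBUNTU
# CentOS
# CENTOS
# Alpine
```

Note that uniq only prints the result. The input file itself is left unchanged.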

2. Count Duplicated Lines in a Text File

In the previous example, we saw how to remove duplicate lines. However, sometimes we also want to know how many times the duplicate line appears.

We can achieve this using the -c option as shown in the example below:

$ uniq -c linux-distributions.txt
Count Duplicate Lines in File

In the above output, the first column shows how many times each line occurs consecutively.
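A minimal sketch of the -c option on assumed data is shown here. The file name and contents are hypothetical, and the padding in front of the count is what GNU uniq prints. Other implementations may align it differently.

```shell
# Assumed sample contents (hypothetical file name).
printf '%s\n' Ubuntu Ubuntu UBUNTU CentOS CentOS CENTOS Alpine > distros-sample.txt

# Prefix each output line with the number of consecutive occurrences.
uniq -c distros-sample.txt
#       2 Ubuntu
#       1 UBUNTU
#       2 CentOS
#       1 CENTOS
#       1 Alpine
```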

3. Remove Duplicates Ignoring Case

By default, the uniq command works in a case-sensitive way. However, we can disable this default behavior by using the -i option as shown.

$ uniq -i linux-distributions.txt
Remove Duplicate Lines Ignoring Case

In this example, we can observe that the strings Ubuntu and UBUNTU are now treated as the same. The same applies to the strings CentOS and CENTOS.
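The effect of -i can be sketched on assumed data as follows. The file name and contents are hypothetical.

```shell
# Assumed sample contents (hypothetical file name).
printf '%s\n' Ubuntu Ubuntu UBUNTU CentOS CentOS CENTOS Alpine > distros-sample.txt

# Ignore case when comparing; the first line of each group is the one printed.
uniq -i distros-sample.txt
# Ubuntu
# CentOS
# Alpine
```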

4. Print Only Duplicate Lines from a File

Sometimes, we just want to print the duplicate lines from a text file. In that case, we can use the -d option as shown.

$ uniq -d linux-distributions.txt
Print Only Duplicate Lines from File

In the above output, we can see that the uniq command shows one copy of the duplicated entry from each group.
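A short sketch of -d on assumed data follows. The file name and contents are hypothetical.

```shell
# Assumed sample contents (hypothetical file name).
printf '%s\n' Ubuntu Ubuntu UBUNTU CentOS CentOS CENTOS Alpine > distros-sample.txt

# Print one line for each group of adjacent duplicates; lines that occur once are dropped.
uniq -d distros-sample.txt
# Ubuntu
# CentOS
```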

5. Print All Duplicate Lines from a File

In the previous example, we saw how to display one duplicate line from each group. In a similar way, we can also show all the duplicate lines using the -D option:

$ uniq -D linux-distributions.txt
Print All Duplicate Lines from File

The above output doesn’t show the text UBUNTU, CENTOS, and Alpine as those are unique lines.
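The following sketch shows -D on assumed data. The file name and contents are hypothetical, and the example assumes GNU uniq, where -D is available as an extension.

```shell
# Assumed sample contents (hypothetical file name).
printf '%s\n' Ubuntu Ubuntu UBUNTU CentOS CentOS CENTOS Alpine > distros-sample.txt

# Print every line that belongs to a group of adjacent duplicates.
uniq -D distros-sample.txt
# Ubuntu
# Ubuntu
# CentOS
# CentOS
```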

6. Show Duplicate Lines in Groups Separated by a Blank Line

In the previous example, we printed all duplicate lines. However, we can make the same output more readable by separating each group with a blank line.

Let’s use the --all-repeated=separate option to achieve this:

$ uniq --all-repeated=separate linux-distributions.txt
Show Duplicate Lines By Groups

In the above output, we can see that the repeated groups are separated from each other by a blank line.
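A minimal sketch of the grouped output on assumed data is shown here. The file name and contents are hypothetical, and --all-repeated is the long form of -D in GNU uniq.

```shell
# Assumed sample contents (hypothetical file name).
printf '%s\n' Ubuntu Ubuntu UBUNTU CentOS CentOS CENTOS Alpine > distros-sample.txt

# Print all duplicated lines, with an empty line between groups.
uniq --all-repeated=separate distros-sample.txt
# Ubuntu
# Ubuntu
#
# CentOS
# CentOS
```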

7. Print Only Unique Lines from a File

In previous examples, we saw how to print duplicate lines. Similarly, we can also instruct the uniq command to print non-duplicate lines only.

Now, let’s use the -u option to print unique lines only:

$ uniq -u linux-distributions.txt
Find Unique Lines in File

Here, we can see that the uniq command displays the lines that aren’t duplicated.
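The -u option can be sketched on assumed data as follows. The file name and contents are hypothetical.

```shell
# Assumed sample contents (hypothetical file name).
printf '%s\n' Ubuntu Ubuntu UBUNTU CentOS CentOS CENTOS Alpine > distros-sample.txt

# Print only the lines that have no adjacent duplicate.
uniq -u distros-sample.txt
# UBUNTU
# CENTOS
# Alpine
```

In this sample, the output is the complement of what -D prints for the same file.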

8. Remove Non-adjacent Duplicate Lines in File

One notable limitation of the uniq command is that it only removes adjacent duplicate entries. However, sometimes we want to remove the duplicate entries regardless of their position in the given file.

In such cases, we can first sort the file contents and then pipe that output to the uniq command as shown.

$ sort linux-distributions.txt | uniq
Remove Non-adjacent Duplicate Lines in File

In this example, we used the sort and uniq commands without any options. However, we can also combine other supported options with these commands.
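To illustrate combining the two commands and their options, here is a sketch on assumed data. The file name distros-unsorted.txt and its contents are hypothetical.

```shell
# Assumed sample with duplicates that are NOT adjacent (hypothetical file name).
printf '%s\n' Ubuntu CentOS Ubuntu Alpine CentOS Ubuntu > distros-unsorted.txt

# uniq alone removes nothing here, because no two equal lines are adjacent.
uniq distros-unsorted.txt | wc -l
# prints 6

# Sorting first makes equal lines adjacent, so uniq can collapse them.
sort distros-unsorted.txt | uniq
# Alpine
# CentOS
# Ubuntu

# sort -u gives the same result for plain whole-line deduplication.
sort -u distros-unsorted.txt

# Count occurrences, then order by frequency (highest first).
sort distros-unsorted.txt | uniq -c | sort -rn
#       3 Ubuntu
#       2 CentOS
#       1 Alpine
```

Keep in mind that sorting changes the original order of the lines, which may or may not matter for a given file.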

Conclusion

In this guide, we learned how to use the uniq command through practical examples. Do you know of any other good examples of the uniq command in Linux? Let us know your views in the comments below.
