What happens when you type “ls -l *.c” in a Linux command-line shell?

Francisco Calixto
Apr 12, 2021

Giving commands to a computer simply means telling it to perform a specific task, and there are different ways to give a computer these instructions. In this blog we are going to talk about how a command-line shell works, and what exactly happens internally when the user types ‘ls -l *.c’ in a Linux command line.

First, we need to define a few important terms that form the backbone of this blog.

What is Linux, and what is an operating system?

As we mentioned in the introduction, this blog focuses on Linux, a Unix-like operating system. An operating system is software that manages all of the hardware resources of your computer, whether it is a desktop or a laptop. To put it simply, the operating system manages the communication between your software and your hardware. Without the operating system (OS), your applications would not be able to run.

What is a command-line shell?

In computer science, a shell is a program that exposes the services of the operating system to a human user or to other programs. The shell works as a ‘portal’ between the user and the kernel (the core of the Linux OS). In general, operating systems provide a GUI (graphical user interface), a CLI (command-line interface), or both. As we mentioned before, we are focusing on the Linux command-line shell.

Example of a GUI (graphical user interface)
Example of a CLI (command-line interface)

If you have made it this far, you are probably still interested in knowing what exactly happens when `ls -l *.c` is typed in a Linux command-line shell. The short answer to this question is simple, but as with almost everything, there is also a deeper and more complex explanation.

The following flowchart shows the main logic of a working shell. We will explain what happens at each step when you type ‘ls -l *.c’ in a command-line shell.

Simplified command line shell flowchart.

As you can see, the first major thing that happens when you start a command-line shell is that the program waits for input from the user. Usually the program first displays a prompt ending with a special symbol (either $ or #). This prompt can contain useful information such as the username, the current directory and the name of the machine. The ‘$’ is the symbol for normal users without administrative privileges. The ‘#’ appears at the end of the prompt when the administrator, or ‘root’ user, is logged in.

Prompt example.

The program displays this prompt through system calls, and many other parts of the shell work the same way. System calls are the interface between an application and the Linux kernel: through them, a program asks the operating system to perform a task on its behalf. To display the prompt mentioned earlier, the shell uses the ‘write’ system call. This system call writes data from a buffer to a file descriptor. In this case that file descriptor is the standard output, where the user will read it.

After the prompt has been displayed, the user types a command (plus options). The shell captures it with the ‘getline’ C library function, which reads from standard input using the ‘read’ system call. ‘getline’ reads one whole line from the standard input, including the newline character produced by pressing Enter, and stores it in a single string, which we will call ‘line’.

$ ls -l *.c

Now the program has a string containing whatever the user wants to execute. The next step is to split it into individual ‘tokens’, one per argument; this step is known as tokenizing. The most common way to do this in C is with the library function ‘strtok’, which takes the string to tokenize and a set of delimiter characters. In this case the delimiters are the spaces separating the arguments and the newline character (\n) at the end of the string. Note that ‘strtok’ modifies the string it is given by writing a null byte at the end of each token.

Process of tokenizing the line passed by the user into separate strings.

The next step is to check whether the first token is a built-in. A built-in is a command implemented inside the shell itself: nothing else needs to be installed for it to work, and running it does not require creating a new process. In the case of ‘ls -l *.c’, ‘ls’ is not a built-in. The program therefore needs to create a child process, which inherits many attributes from the parent process (such as its environment and open file descriptors). The child can then finish without affecting the parent process.

The child process is created with the ‘fork’ system call. Meanwhile, the parent process calls ‘wait’ (or ‘waitpid’), which blocks until the child has finished its task and terminated, and then collects the child's exit status.

Since the first argument is not a built-in, the shell now has to find the program to run. If the command already contains a ‘/’, it is treated as a path to an executable (for example /bin/ls or ./a.out). Otherwise, the shell looks for it in the directories listed in the PATH. If neither option leads to an executable, the shell has no choice but to print an error message.

In the case of ‘ls’, the name contains no ‘/’, so the program searches for the command in the PATH. Shells such as bash do not look in the current directory for a bare command name unless ‘.’ is listed in PATH, and adding it is generally considered bad practice.

In Linux, PATH is an environment variable that tells the shell which directories to search for executable programs.

PATH example; the ‘:’ characters separate the directories.

The shell then searches for the command in each directory listed in PATH, in order, and uses the first match. In our case ‘ls’ is found in the directory /bin (on some distributions it lives in /usr/bin). Running ‘which ls’ confirms this by printing the full path of the ‘ls’ executable found through the PATH.

Once the directory containing ‘ls’ is found, the next step is to call the ‘execve’ system call. This powerful system call takes three parameters. The first is the full pathname of the program. The second is an array of pointers to strings, where each string is one of the tokens obtained by tokenizing the line read by ‘getline’. By convention the first element is the command name itself, and the array must end with a NULL pointer. The third is another NULL-terminated array of pointers to strings, this time of the form ‘key=value’, where each key is the name of an environment variable and the value is its value.

Environment variables are a set of dynamic named values that programs launched from shells or subshells can use. They are stored in each process's environment and inherited by its child processes. These variables allow you to customize how the system works and how applications behave. PATH, for instance, is an environment variable.

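The execve prototype, declared in <unistd.h>:

int execve(const char *pathname, char *const argv[], char *const envp[]);

In our example the call becomes execve("/bin/ls", argv, environment), where argv holds the command's arguments followed by a NULL pointer.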

The execve function is then called with the arguments defined above. This system call replaces the program currently running in the process (here, the child created by fork) with a new program. The process ID stays the same, but the process gets a new stack, heap, and (initialized and uninitialized) data segments. If execve succeeds, it never returns.

Internally, the command specified by its full pathname is executed with all of the arguments given to it: in this example, the ‘-l’ option and the ‘*.c’ pattern. The first is simply an option that tells ‘ls’ to list files in its long format (hence the ‘l’). The second specifies what the command should be applied to. Here we meet the ‘*’ special character, one of the Linux shell wildcards. A ‘*’ matches any sequence of characters, including none. So ‘*.c’ matches every file in the current directory whose name ends in ‘.c’ (names starting with a dot are skipped by default). The wildcard can also appear at the end of a pattern, to match files starting with a given prefix, or in the middle of a word.

Importantly, it is the shell, not ‘ls’, that expands the wildcard. Before calling execve, the shell replaces ‘*.c’ with the sorted list of matching filenames, a step called pathname expansion or globbing. As a result, ‘ls’ receives arguments such as ‘main.c’ and ‘utils.c’ rather than the literal ‘*.c’. If nothing matches, a POSIX shell passes the pattern through unchanged, and ‘ls’ reports an error because no file has that literal name.

The new program (here, ‘ls’) now runs with all of the arguments passed to it, and its result is printed to the standard output. When it finishes, the child process terminates, and the shell, which has been waiting for it in the parent process, moves on. If execve cannot run the command, it returns -1 and an error message is printed, normally to the standard error rather than the standard output. In our case nothing should go wrong, so no error message will be seen.

The output looks something like this: all .c files listed in long format.

After this, the shell program returns to where it started, as we explained at the beginning of the blog: it displays the prompt again and waits for the next command. It keeps repeating this loop until it reaches the end of its input (for example, when the user presses Ctrl+D), the user runs the ‘exit’ built-in, or the program is killed.

In this blog we saw that, as simple as it may sound, executing a command in a command-line shell involves a lot going on behind the scenes. We hope we have covered every step as clearly as possible.

Thank you for your attention.

Francisco Calixto && Diego Varela.
