Python, indentation and white-space


so at the time of the last update i was able to do basic indentation whenever a start and an end indent specifier was provided, this time around i’m working on stuff when the end-indentation specifier is not provided,  for example languages like python

def func(x):

here we can see that there is no specifier that an unindent is going to occur, so how do i figure out what all lines are a part of one block?

Well the answer is very simple actually, i look for the start indent specifier which in case of python it is the very famous: ‘ : ‘. Now after i find the start of indent specifier, the next step is to find an unindent, in the previous example the line containing ‘indent-level2’ unindents, and voila we have our block, starting from the indent-specifier to the first unindent, easy right? The answer to that is NO, nothing’s that easy.

python doesn’t care about white-space:

well as we all know this isn’t true, python does care about white-space, but not as much as we thought. Python only cares about white-space to figure out indentation, anything else is pretty much useless to it,  for example:

def func(x):
    a = [1, 2,
3, 4, 5]
    if x in a:

this is a pretty valid python code, which prints x if x is an integer between 1 to 5.  What is odd about this examples is, that as we know in python everything has to be indented right? and this breaks that rule! go ahead try this on your own, it works! So no even in python not everything has to be indented, a simpler example could have been:

def func(x):
    a = 1
# This comment is not indented

does it matter if this comment is not indented?  absolutely not! this is a very valid python code as well.

The Problem:

so how is all this related to my algorithm?  as you can see in the second example, the line   ‘# This comment is not indented’ unindents and my algorithm is searching for unindents, hence breaking my algorithm, as it would think that block starts from ‘def func(x):’ and end at ‘# This comment is not indented’, also in the first example it would find that the line    ‘3, 4, 5] ‘ unindents which would again break the algorithm.

The Solution:

The Solution is quite simple in theory: Just be aware of these cases. But that changes the algorithm completely it goes from:

  • check first unindent
  • Report block as line containing specifier to the line which unindents


  • check if case of unindent
  • check if this line is a comment
  • check this if line is inside a multiline-comment
  • check if this line is inside paranthesis() or square-brackets []
  • If true repeat from 1
  • else report block.

So the final algorithm is my working solution as long as we are not able to find some problem in that as well. You can follow all the code  related to this algorithm in my PR.

Next steps:

Next steps are absolute indentation, hanging indents, keyword indents and an all new bear in the form of the LineLengthBear.

All of this looks really exciting, as i see my once planned Project come to life, i really hope all of this is useful someday and people actually use my code to solve their indentation problems :).


Exploring the world of indents

My Project so far:

The coding phase has begun and unfortunately i wasn’t able to start right away, but I had already done some part in the community bonding period and was able to work on another aspect of my project before the week got over, so now finally let me introduce you to my project which is *drumrolls* Generic Spacing Correction. I know it doesn’t sound very exciting right?But believe me the “Generic” makes it pretty exciting.

Demystifying the name, my project aims to do what you already see in most of your editor programs, be it natively or through plugins; which is automatically indenting your code. It doesn’t stop there though, it indents your code but it’ll indent it regardless the language you use! So you basically indent your C files and python files by the same algorithm! Bye-Bye ctrl + I? Maybe but there’s a lot of work to do.

See, every language style guide follows some basic rules when it comes to indentation, one of them being: there are levels of indentation and each level “means” being part of the same context.

So I just have to identify these levels. Easy. Right? Well for a particular language yes it’s not so difficult, but when it comes to managing the same for all languages out there, then the task becomes a little daunting.

Here’s how i plan to tackle these problems:

First of all my algorithm doesn’t support all languages from the start, it’d support basic languages like C, C++, JAVA maybe even ruby. This is because unlike languages like python, these languages have markers to specify when to start and end indentation.

So far i’ve been able to come up with a very basic implementation, though it still lacks features of indents based on keywords and also absolute Indentation, Like:



int some_function(param1,

other than it works for basic indentation of blocks specified by those indent markers.       my PR has all the code to this basic indentation algorithm and is still under-review/wip.

Later on i’d like to make this algorithm configurable to the extreme.

Apart from the basic functionalities like hanging indents and whatnot, it’d be nice to have an algorithm which is configurable to all styles of indentation.

What is Style of indentation?

Apparently there are many ways of correctly indenting your code, and it’s upto
the Community what type of indentation they want to follow.

For example:

    // code

    // code

    // code

all of these are ways indenting an if block. None of these is wrong and it’s
entirely up to you which style you prefer, a generic and versatile algorithm would support all three via configurations.
Sadly my basic implementation doesn’t support the third kind yet.

All in all it’s just the beginning and i’m really excited on how the project develops and what type of algorithms i’m able to deliver, hopefully i’ll be back next time with a more functional and useful algorithm.