Final Submission

This program was one of the best learning experiences I’ve ever had. During the entire GSoC phase i was able to contribute to mainly two repositories of the coala organisation, A sub org under the Python Software Foundation.

These include commits on the coala repository:

https://github.com/coala-analyzer/coala/commits?author=abhsag24

and the coala-bears repository:

https://github.com/coala-analyzer/coala-bears/commits?author=abhsag24

coala-bears also contains a branch which hasn’t been merged into master yet, All my commits on this branch can be found here:

https://github.com/coala-analyzer/coala-bears/pull/653/commits.

I’ve been really grateful to my mentor Fabian, and the admin of my sub-org, Lasse, I’ve had the most enriching discussions with these people 😀 , Cheers to coala, FOSS and the entire GSoC experience,  i’ll definitely look forward to working beside these people after the program as well.

Breaking Lines

As the GsoC period comes to an end i have only two major tasks left to do, one being making a LineBreakBear, which suggests line breaks, when the user has Lines of code that are more than a max_line_length setting, and the Other is adding Indents based on Keywords in the IndentationBear. 

This week i was able to device a simple algorithm to suggest line breaks. If you’ve followed my blogs you’d know that there’s something I like to call an encapsulator :P, it’s a fancy name for different but not all types of brackets. So the algorithm is as follows:

  1. Get all occurences of lines which exceed max_line_length
  2. Check wether these lines have an encapsulator which starts before the limit
  3. Find the last encapsulator started in this line before the limit.
  4. Suggest a line break at that point with the new line being indented in accordance with the indentation_width setting.

Now this algorithm is really simple and does not consider border cases such as hanging-indents.

Hopefully by the next blog posts i’d have completed my Project. I’ll have lots to share about my experience this year 🙂

tabs spaces tabs

Last time i was able to come up with an algorithm to indent python code or basically code without  un-indent specifiers.  This time the challenge was tackling hanging indentation.

Now hanging indents in terms of word processing occur when all the lines except the first line are indented. In code terms it is something like this:

some_function(
    param1,
    param2,
    param3)

here param1, param2, param3 are indented while some_function is not.

Again the algorithm i use do hanging indentation is pretty straight forward, in simple terms it is:

  • Check if there is text to the right of paranthesis.
  • If there is, indent all lines till closing paranthesis that the column right after paranthesis.
  • Otherwise calculate the indentation relative to some_function and indent all later prams to that level.

This is the broad algorithm i use.

Though the difficult part was actually aligning in the file. N0w i align my files in the correct indentation levels(barring hanging indents) by:

  • remove all whitespaces to the left of every line.

I have a list called indentation_levels  which has indentation corresponding to line number, and also a variable insert which is either ‘\t’ or  tab_width*’ ‘  where tab_width is the number of spaces to indent

  • to each line add insert*indentation_level[line] to the left of the line.

basically line -> insert*indentation_level[line] + line .

now the problem was to insert absolute_indentation_levels in between normal indentation.

So whats the problem? just add number of spaces along with the normal spaces right? WRONG!

there’s a difference between

\t        \t
and 
        \t\t
are different

real life example:

class {
\tfunction(param1{
\t        \t do_something();}
\t         param2)
}

i was able to accomplish this by:

  • storing the previous indentation.of a block
  • then adding previous indent + hanging indent + indent level of that line.

do tell me if you find any fault in these algorithms, my work has recently been merged and can be found as the IndentationBear in the master branch of the coala-bears repositiory.

 

Python, indentation and white-space

Updates:

so at the time of the last update i was able to do basic indentation whenever a start and an end indent specifier was provided, this time around i’m working on stuff when the end-indentation specifier is not provided,  for example languages like python

def func(x):
    indent-level1
indent-level2

here we can see that there is no specifier that an unindent is going to occur, so how do i figure out what all lines are a part of one block?

Well the answer is very simple actually, i look for the start indent specifier which in case of python it is the very famous: ‘ : ‘. Now after i find the start of indent specifier, the next step is to find an unindent, in the previous example the line containing ‘indent-level2’ unindents, and voila we have our block, starting from the indent-specifier to the first unindent, easy right? The answer to that is NO, nothing’s that easy.

python doesn’t care about white-space:

well as we all know this isn’t true, python does care about white-space, but not as much as we thought. Python only cares about white-space to figure out indentation, anything else is pretty much useless to it,  for example:

def func(x):
    a = [1, 2,
3, 4, 5]
    if x in a:
        print(x)

this is a pretty valid python code, which prints x if x is an integer between 1 to 5.  What is odd about this examples is, that as we know in python everything has to be indented right? and this breaks that rule! go ahead try this on your own, it works! So no even in python not everything has to be indented, a simpler example could have been:

def func(x):
    a = 1
# This comment is not indented
    print(a)

does it matter if this comment is not indented?  absolutely not! this is a very valid python code as well.

The Problem:

so how is all this related to my algorithm?  as you can see in the second example, the line   ‘# This comment is not indented’ unindents and my algorithm is searching for unindents, hence breaking my algorithm, as it would think that block starts from ‘def func(x):’ and end at ‘# This comment is not indented’, also in the first example it would find that the line    ‘3, 4, 5] ‘ unindents which would again break the algorithm.

The Solution:

The Solution is quite simple in theory: Just be aware of these cases. But that changes the algorithm completely it goes from:

  • check first unindent
  • Report block as line containing specifier to the line which unindents

To

  • check if case of unindent
  • check if this line is a comment
  • check this if line is inside a multiline-comment
  • check if this line is inside paranthesis() or square-brackets []
  • If true repeat from 1
  • else report block.

So the final algorithm is my working solution as long as we are not able to find some problem in that as well. You can follow all the code  related to this algorithm in my PR.

Next steps:

Next steps are absolute indentation, hanging indents, keyword indents and an all new bear in the form of the LineLengthBear.

All of this looks really exciting, as i see my once planned Project come to life, i really hope all of this is useful someday and people actually use my code to solve their indentation problems :).

GSoC: My first blog ever!

 

My proposal for GSoC ’16 has been selected and so my journey towards an awesome summer has begun.

When i started contributing to open source i had only the bookish knowledge learned from school and college and only a little bit of practical experience, i had no idea how a real life software was run and maintained, then i found syncplay . It was a software I used daily and got to know after a little while that it was Open Source, by that time i knew what Open Source meant but I hadn’t dove deep into it yet, so i began making some changes to it’s code-base(all hail open-source!).

A week or two later i had learned a lot, i learned about decorators, socket-programming, the twisted framework, utf-8 encoding and also some of the nitty-gritties of coding. I had known about the GSoC Program(one of my friends had participated earlier), and got to know(from the contributors) that syncplay wouldn’t be participating, so i started searching for organisations that’d be looking to participate and stumbled upon coala which is a static-code analyzer(though saying just this is doing it injustice).

the coala-community in one word is awesome! i have never seen a community in my little experience that is so helpful to newcomers! it took a little time getting used to learning how to operate the software at first but once i got used to it, it was and is still an  amazing experience. Working beside sils, AbdealiJK, Makman2 and all of the coala community has been an awesome learning/entertaining experience till now and i can’t wait to imagine what it would be like once the summer gets started!.

As far as my Project goes it deals with creating Indentation algorithms that would be language independent. It would automatically correct indentation(thanks to awesome diff management by coala) and would also look for cases when lines get too long and would even break the lines once it gets completed.

Open source has been a really fascinating experience so far. Apart from GSoC i’m looking to contribute to a lot of open source in my spare time i’d  like to finish my PR for syncplay, and find some other projects to contribute, i have a few ideas myself, let’s hope they see  the light of day ;).

All in all I’ve learned a lot from these few months contributing. I’ve developed habits like watching talks, reading blogs, all of them being so informative! I’ve learnt a lot about programming writing not only code, but writing efficient, well formatted code. I’ve had glimpses of frameworks, learnt new types of software. Being an IT student was never as exciting as it is now.