A few weeks ago I spent a weekend building a programming language classifier with Python and scikit-learn. Given an arbitrary program, it identifies which of these 15 programming languages the program is written in:

  • c
  • c#
  • clojure
  • common_lisp
  • haskell
  • java
  • javascript
  • ocaml
  • perl
  • php
  • python
  • ruby
  • scala
  • scheme
  • tcl
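
For a rough idea of how a classifier like this can be put together with scikit-learn, here is a minimal sketch. It is not the code from my repository: the toy training snippets, the character n-gram features, the choice of logistic regression, and the helper names build_classifier and train are all assumptions made for illustration, and it only covers three of the languages.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set of (source snippet, language label) pairs.
# A real classifier would need many files per language.
TRAINING_SNIPPETS = [
    ("def main():\n    print('hello')\n", "python"),
    ("import random\nfor x in range(5):\n    print(x)\n", "python"),
    ('#include <stdio.h>\nint main(void) { printf("hi\\n"); return 0; }\n', "c"),
    ("int add(int a, int b) { return a + b; }\n", "c"),
    ("(define (square x) (* x x))\n", "scheme"),
    ('(define (main) (display "hello") (newline))\n', "scheme"),
]


def build_classifier():
    """Assumed model: TF-IDF over character n-grams fed to logistic regression."""
    return make_pipeline(
        # Character n-grams pick up punctuation and keyword fragments,
        # which differ a lot between languages.
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
        LogisticRegression(max_iter=1000),
    )


def train(snippets):
    """Fit the pipeline on (source, label) pairs and return it."""
    sources = [source for source, _ in snippets]
    labels = [label for _, label in snippets]
    model = build_classifier()
    model.fit(sources, labels)
    return model


if __name__ == "__main__":
    model = train(TRAINING_SNIPPETS)
    sample = "def total(values):\n    return sum(values)\n"
    # Prints one of the labels seen during training.
    print(model.predict([sample])[0])
```

With only six training snippets the predictions are not reliable; the point is just the shape of the pipeline: turn source text into features, fit a classifier, then call predict on new source text.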

Once you’ve cloned the repository, you can run the classifier against your own programs. Run python3 classifier.py filename from your command line, where filename is the program you want to classify.

I wrote this sample program in Python to illustrate the process:

import random


def sum_5_random_integers():
    return sum([random.randint(1, 101) for x in range(5)])


def main():
    print(sum_5_random_integers())


if __name__ == '__main__':
    main()

You can see that it predicts correctly: [image: Sample.py output]

Clone the repo and try it yourself. It’s kinda fun.