A pretty smart dumb programming language classifier
A few weeks ago I spent a weekend building a programming language classifier with Python and scikit-learn. Given an arbitrary program, it identifies which of these 15 programming languages the program is written in:
- c
- c#
- clojure
- common_lisp
- haskell
- java
- javascript
- ocaml
- perl
- php
- python
- ruby
- scala
- scheme
- tcl
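The post doesn't show the model itself, so here is a minimal sketch of how a classifier like this could be put together with scikit-learn. It is not the code from the repository: the bag-of-tokens features, the naive Bayes model, the tiny three-language training set, and the names TRAINING_SNIPPETS, build_classifier and predict_language are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data: (language, source) pairs.
# A real classifier would need many more samples per language.
TRAINING_SNIPPETS = [
    ("python", "def add(a, b):\n    return a + b\n"),
    ("python", "import sys\nfor line in sys.stdin:\n    print(line)\n"),
    ("c", '#include <stdio.h>\nint main(void) { printf("hi\\n"); return 0; }\n'),
    ("c", "int add(int a, int b) { return a + b; }\n"),
    ("ruby", "def add(a, b)\n  a + b\nend\n"),
    ("ruby", "puts [1, 2, 3].map { |x| x * 2 }\n"),
]


def build_classifier(samples):
    """Fit a bag-of-tokens naive Bayes model on (label, source) pairs."""
    labels = [label for label, _ in samples]
    sources = [source for _, source in samples]
    model = make_pipeline(
        # A token is either a run of letters/underscores or a single
        # punctuation character; digits and whitespace are ignored.
        CountVectorizer(token_pattern=r"[A-Za-z_]+|[^\sA-Za-z_0-9]"),
        MultinomialNB(),
    )
    model.fit(sources, labels)
    return model


def predict_language(model, source):
    """Return the predicted language label for one source string."""
    return str(model.predict([source])[0])


if __name__ == "__main__":
    model = build_classifier(TRAINING_SNIPPETS)
    print(predict_language(model, "import os\nfor name in os.listdir():\n    print(name)\n"))
```

Counting punctuation as tokens is a deliberate choice in this sketch: characters such as braces, semicolons and colons often say more about a language than its identifiers do.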
Once you’ve cloned the repository, you can run it against your own programs by running this from your command line:
python3 classifier.py filename
I wrote this sample program in Python to illustrate the process:
import random


def sum_5_random_integers():
    return sum([random.randint(1, 101) for x in range(5)])


def main():
    print(sum_5_random_integers())


if __name__ == '__main__':
    main()
The classifier correctly predicts that it is Python.
Clone the repo and try it yourself. It’s kinda fun.