
Better Living Through Function Detection

I've submitted a pull request to the nucleus developers to support output for Binary Ninja. If you're not familiar with their research, or why you might need it, I have a short story for you...

UPDATE (2017-04-13): Patch was accepted! You can now get a Binary Ninja script out of Nucleus using the -n option. I've removed the reference to my fork below as a result.

The other day, I needed to reverse engineer a particular shared library. So, like most days, I fired up Binary Ninja. Unfortunately, Binary Ninja's function detection apparently isn't ideal for this kind of thing:

>>> len(bv.functions)
590

I happen to know that this particular executable has a large number of functions - 590 is nowhere near all of them. Begrudgingly, I fired up IDA Pro 6.95. IDA was much better about function detection:

Python>from idaapi import *
Python>from idautils import *
Python>from idc import *
Python>i = 0
Python>for seg in Segments():
Python>  if segtype(SegStart(seg)) != SEG_CODE:
Python>    continue
Python>  for f in Functions(SegStart(seg), SegEnd(seg)):
Python>    i += 1

Unfortunately, I quickly came across functions that weren't defined. IDA still missed things!

That's when I remembered seeing the Compiler-Agnostic Function Detection in Binaries paper earlier this year. Their website says they haven't presented their research yet. But, unlike most academics, they have an open-source prototype, the prototype is available before the presentation, and it works! Here's what nucleus has to say about my shared library:

$ ./nucleus -d linear -f -e some.dll | wc -l
ERROR: failed to read dynamic symtab (Invalid operation)

Not entirely sure what the ERROR is about, but that's more like it! With the -i script.py flag, I can even get an IDA Python script that will remove all of IDA's automatically-defined functions and define all of the ones nucleus found instead. Brilliant.

Since I really wanted to use Binary Ninja, though, I added a -n <file> option (-b was already in use) that will output a script that does the same thing in Binary Ninja. Their code is organized very well, so adding it in wasn't hard at all.
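To give a rough idea of what such a generated script involves, here is a minimal Python sketch that turns a list of function entry points into a Binary Ninja script. This is not the code from the patch (nucleus itself is not written in Python), and `render_binja_script` is a hypothetical helper made up for illustration. The emitted calls, `bv.remove_user_function` and `bv.create_user_function`, are assumed to match the Binary Ninja Python API in your version; their signatures have changed between releases, so check before relying on them.

```python
def render_binja_script(entry_points):
    """Return the text of a script that redefines functions in Binary Ninja.

    entry_points: iterable of integer addresses (hypothetical input; in
    practice these would come from whatever the function detector found).
    """
    lines = [
        "# Run from Binary Ninja's Python console, where `bv` is already defined.",
        "# Assumption: these BinaryView methods exist with these signatures.",
        "for func in list(bv.functions):",
        "    bv.remove_user_function(func)",
    ]
    # De-duplicate and sort so the generated script is deterministic.
    for addr in sorted(set(entry_points)):
        lines.append("bv.create_user_function(0x%x)" % addr)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example addresses are made up.
    print(render_binja_script([0x401000, 0x401230]))
```

Writing the returned text to a file gives you something you can load from the Binary Ninja console the same way as any other script.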

Finally, my Binary Ninja database has these functions defined:

>>> execfile("/home/fuzyll/Code/nucleus/script.py")
# ...lots of output from defining functions...
>>> len(bv.functions)

If you're interested in how this actually works, I highly recommend reading their paper (linked above). Academic research can sometimes be daunting and/or tedious to read through, but I found this one to be well-written and understandable. And now, back to reverse engineering!