UPDATE (2017-04-13): Patch was accepted! You can now get a Binary Ninja script out of Nucleus using the -n <file> option. I've removed the reference to my fork below as a result.
The other day, I needed to reverse engineer a particular shared library. So, like most days, I fired up Binary Ninja. Unfortunately, Binary Ninja's function detection is... apparently not ideal for this kind of thing:
    >>> len(bv.functions)
    590
I happen to know that this particular executable has a large number of functions - 590 is nowhere near all of them. Begrudgingly, I fired up IDA Pro 6.95. IDA was much better about function detection:
    Python>from idaapi import *
    Python>from idautils import *
    Python>from idc import *
    Python>i = 0
    Python>for seg in Segments():
    Python>    if segtype(SegStart(seg)) != SEG_CODE:
    Python>        continue
    Python>    for f in Functions(SegStart(seg), SegEnd(seg)):
    Python>        i += 1
    Python>i
    13211
Unfortunately, I quickly came across functions that weren't defined. IDA still missed things!
That's when I remembered seeing the Compiler-Agnostic Function Detection in Binaries paper earlier this year. Their website says they haven't presented their research yet. But, unlike most academics, they have an open-source prototype, the prototype is available before the presentation, and it works! Here's what nucleus has to say about my shared library:
    $ ./nucleus -d linear -f -e some.dll | wc -l
    ERROR: failed to read dynamic symtab (Invalid operation)
    19966
Not entirely sure what the ERROR is about, but that's more like it! With the -i script.py flag, I can even get an IDA Python script that will remove all of IDA's automatically-defined functions and define all of the ones nucleus found instead. Brilliant.
Since I really wanted to use Binary Ninja, though, I added a -n <file> option (-b was already in use) that will output a script that does the same thing in Binary Ninja. Their code is organized very well, so adding it in wasn't hard at all.
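
For a rough idea of what a generated script like this could look like, here is a minimal sketch in Python. It is not nucleus's actual output, and it is not the code from my patch (that lives in nucleus's C++ exporter). The helper name make_binja_script and the script layout are hypothetical. The emitted text assumes it runs in Binary Ninja's Python console with a BinaryView bound to bv, and that bv.remove_function and bv.create_user_function behave as documented in the Binary Ninja API.

```python
def make_binja_script(function_starts):
    """Return the text of a Binary Ninja script that defines the given functions.

    Hypothetical helper for illustration only. The emitted script expects a
    BinaryView named `bv`, as provided by Binary Ninja's Python console.
    """
    lines = [
        "# Illustrative layout only; not nucleus's actual output.",
        "# Remove the functions Binary Ninja's own analysis defined.",
        "for func in list(bv.functions):",
        "    bv.remove_function(func)",
        "# Define the functions reported by the external detector.",
    ]
    # Sort and de-duplicate the start addresses so the output is stable.
    for addr in sorted(set(function_starts)):
        lines.append("bv.create_user_function(0x%x)" % addr)
    return "\n".join(lines) + "\n"
```

Calling make_binja_script([0x401000, 0x401230]) returns the script as a string, which can be written to a file and loaded from the Binary Ninja console. Whether Binary Ninja's analysis later re-adds functions that were removed this way is something I have not checked here, so treat the removal step as an assumption too.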
Finally, my Binary Ninja database has these functions defined:
    >>> execfile("/home/fuzyll/Code/nucleus/script.py")
    # ...lots of output from defining functions...
    >>> len(bv.functions)
    19966
If you're interested in how this actually works, I highly recommend reading their paper (linked above). Academic research can sometimes be daunting and/or tedious to read through, but I found this one to be well-written and understandable. And now, back to reverse engineering!