Internet Scale Malware Analysis BH US2015
Video: Black Hat USA 2015 - Internet Scale File Analysis
- Washington infosec group
- PhD students, Zachary Hanif
- Skald: framework / blueprint for a micro-services architecture for data gathering / analysis
- Incorporate existing data
- Totem: Internet-wide static analysis
- Drakvuf: dynamic analysis
Problem being solved: “Current malware analysis is broken”:
- Slow, resource-intensive RE analysis of malware binaries: Creating “new” binaries is quick + cheap.
- Defensive tools are mostly signature-based: Vendors can’t keep up with the ever-changing polymorphic, encrypted, cloaked malware.
- Large malware vendors’ legacy systems can’t scale for historical investigation of malware. One vendor can’t go more than a week in the past.
The Novetta “Answer”: Leverage Big Data to analyze existing data
What it is
- But CRITs distributes all malware samples to all analysis modules, so it does not scale well.
- Totem individually routes samples depending on dynamic criteria
- Leverage open source malware investigation engines
- RESTful wrappers around existing open source tools
- HTTP/S communication (explicit choice for ease of integration of new analysis services)
- Chose Scala / JVM over Python (yech!!!) These guys are typical hard-core “Big Data” people.
- Dynamic routing of analysis to determine which analytics are run
- Textbook cloud architecture
- Micro-services to scale up/down. Legacy systems / clusters built for peak loads.
- No shared state
- REST APIs
- Individual analytic work units are messages that are serialized, persisted.
- Recovery after failure: just requeuing remaining units of work for processing.
- Submit to VirusTotal, other 3rd-party services
- CRITs services parity
- Resource extraction: PE32, pdf
- Lab: use the open-source Radare RE framework for further analysis
- Proprietary component!!!! Aha!
- Have crippled community version
- Proprietary $$ version
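The routing and recovery ideas noted above for Totem (per-sample routing, work units as serialized messages, requeue on failure) can be sketched roughly as follows. This is a toy in-process illustration in Python, not Totem's actual Scala / JVM code: `WorkUnit`, `choose_services`, `run_queue`, the service names and the routing criteria are all hypothetical, and a JSON string in a `deque` stands in for a persisted message queue.

```python
import json
from collections import deque
from dataclasses import dataclass, asdict


@dataclass
class WorkUnit:
    sample_id: str
    file_type: str
    service: str
    attempts: int = 0


def choose_services(file_type):
    """Hypothetical routing table: pick analytics per sample instead of running all."""
    routes = {
        "pe32": ["pe_resources", "av_lookup"],
        "pdf": ["pdf_streams", "av_lookup"],
    }
    return routes.get(file_type, ["av_lookup"])


def plan(sample_id, file_type):
    """Expand one sample into one work unit per selected service."""
    return [WorkUnit(sample_id, file_type, s) for s in choose_services(file_type)]


def run_queue(units, handlers, max_attempts=3):
    """Process serialized work units; failed units are requeued, not lost."""
    queue = deque(json.dumps(asdict(u)) for u in units)  # stand-in for persistence
    results, dead = [], []
    while queue:
        unit = WorkUnit(**json.loads(queue.popleft()))
        try:
            results.append((unit.service, handlers[unit.service](unit)))
        except Exception:
            unit.attempts += 1
            if unit.attempts < max_attempts:
                queue.append(json.dumps(asdict(unit)))  # requeue remaining work
            else:
                dead.append(unit)
    return results, dead
```

Because each unit carries everything needed to run it, a worker holds no shared state, and recovery reduces to putting the unfinished messages back on the queue.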
What it is
- based on LibVMI
- Will be drop-in replacement++ for Cuckoo (open-source malware sandbox).
- Generates a huge log text file of kernel level events as malware executes
What it isn’t
- Currently a prototype for the author’s PhD thesis. Open-source.
- Hitch: Most cloud providers don’t enable Xen Security Modules, which is a basic function needed for isolation of tenants
- Does not support KVM
Technology under the covers
Some interesting ideas re technology approaches:
- Monitoring via Xen’s Virtual Machine Introspection (VMI) subsystem. Uses Intel virtualization extensions
- Copy on write disk and memory for fast scaling to spin up VMs quickly.
- Hijacks an existing process to start the malware. Gives external cmd interface to run inside VM.
- Currently doesn’t inject keyboard / mouse events
- Put breakpoints into kernel to cause traps in hypervisor. Breakpoints are invisible from inside the VM. E.g. trap on:
- low-level system call interface
- heap allocation
- file object allocations
- Since Xen also supports ARM, they have implemented basic functionality to watch memory management.
- Use Big Data analysis tools to identify families, actors, infrastructure
- Structure data for ML
- Persist data
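The breakpoint idea in the notes above (a trap planted in guest code, handled outside the VM, and not visible from inside it) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyGuestMemory` and its methods are hypothetical, a `bytearray` stands in for guest memory, every byte is treated as a one-byte "instruction", and a plain dictionary mimics the hiding that the real system gets from hypervisor features. It uses no LibVMI or Xen API.

```python
INT3 = 0xCC  # x86 software breakpoint opcode


class ToyGuestMemory:
    def __init__(self, code: bytes):
        self.mem = bytearray(code)
        self.saved = {}  # address -> original byte, kept on the monitor's side

    def set_breakpoint(self, addr):
        """Overwrite one byte with INT3, remembering what was there."""
        self.saved[addr] = self.mem[addr]
        self.mem[addr] = INT3

    def guest_read(self, addr):
        """What the guest would see: the original byte, not the breakpoint."""
        return self.saved.get(addr, self.mem[addr])

    def execute(self, on_trap):
        """Walk the bytes in order; call the monitor when a planted INT3 is hit."""
        trace = []
        for addr in range(len(self.mem)):
            if self.mem[addr] == INT3 and addr in self.saved:
                on_trap(addr)            # trap delivered to the external monitor
                op = self.saved[addr]    # then run the original instruction
            else:
                op = self.mem[addr]
            trace.append(op)
        return trace
```

A usage example: plant a breakpoint at offset 1, run, and the monitor callback sees the hit while the executed trace and the guest's view of memory are unchanged.

```python
guest = ToyGuestMemory(bytes([0x90, 0x50, 0xC3]))
guest.set_breakpoint(1)
hits = []
trace = guest.execute(hits.append)
```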
“Blockbuster” group’s analysis of Sony malware
- Essentially Novetta Solutions consulting group + “other parties”
- Demonstrate capabilities of Novetta’s shiny new Big Data Static Malware analysis system?
- Generate publicity for Novetta?
- Indirect attempt to further blame North Korea?
“While the SPE attack occurred over a year ago, we are releasing this report now to detail our technical findings, clarify details surrounding the SPE hack, and profile the Lazarus Group”
“we were able to scan signatures over hundreds of millions of samples… From the billions of files scanned, Novetta’s signatures produced approximately 2000 samples, of which 1000 were manually vetted and catalogued as belonging to the Lazarus Group …”
Note that “Lazarus Group” is Novetta’s name for the threat actors behind the Sony Incident.
- Publication of Novetta research to “correct” / extend what was said previously about Sony.
“Lazarus Group” activities
Section 3 describing the “Lazarus Group” activities is rife with disclaimers:
“However, Novetta’s analysis and findings suggest that the SPE attack was one of several attacks”
“Although Novetta is unable to determine via technical malware analysis whether or not the SPE attack was carried out by an identified nation-state …”
“these numbers should not be considered reflective of the totality of Lazarus Group tools detected in this Operation, due to the nature of our approach”
“These linked cyber operations over several years … suggest actions of a single group, or perhaps very close groups with similar goals who share tools, methods, taskings, and even operational duties”
In addition, the report associates observations
“The above example is a media report discussing the May 2015 South Korean parliamentary election, which included candidates for the Saenuri Party, South Korea’s ruling party since 2008 …”
with possibly unrelated context:
“Saenuri has taken a much stronger stance toward North Korea aggressions … Saenuri is also a major advocate of cyber security and the National Intelligence Service …”
This sort of tactic can quickly fall into “guilt by association” / “argumentum ad hominem” fallacies.
Binary Code analysis
On the other hand, section 4 “Malware Tooling” is a really interesting read.
The analysis presents a detailed list of code characteristics that identify common code reused across the various malware samples. The authors develop a compelling argument for an underlying malware framework including wrappers:
- to import external code
- C&C using P2P technology.
The code characteristics enumerated include:
- Widespread use of an obscure stream cipher named Caracachs
- Standardized XOR obfuscation techniques
- Special text encoding technique for dynamically loaded Windows APIs
- Shared RSA encryption key
- Fake TLS traffic stream to hide C&C communications
- Abstraction of data transmission / receiving
- Distinctive “suicide” scripts
- Method for securely deleting files
- Target file identification using extensions.
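To make the XOR item in the list above concrete, here is a generic sketch of XOR obfuscation and of why it is weak once an analyst can guess a fragment of the plaintext. It does not reproduce the specific schemes described in the report: the key value, the sample string, and the helper names `xor_bytes` and `guess_single_byte_key` are assumptions chosen for illustration.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def guess_single_byte_key(blob: bytes, known: bytes):
    """Brute-force a one-byte key by looking for a known plaintext fragment."""
    for k in range(256):
        if known in xor_bytes(blob, bytes([k])):
            return k
    return None
```

For example, an API name hidden with a one-byte key is recovered by trying all 256 keys:

```python
hidden = xor_bytes(b"LoadLibraryA", b"\x5a")
key = guess_single_byte_key(hidden, b"LoadLibrary")
plain = xor_bytes(hidden, bytes([key]))
```

The same property is what makes a reused XOR routine useful as an identifying characteristic: the constants and loop shape tend to survive from one binary to the next.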
This base analysis is used to deduce the various malware making up the attack toolset. The malware is classified in families.
All in all, this particular section is a fascinating read.
 Pg 6 Executive Summary
 Cf pg 47: “First, the order of the extensions is constant. Second, the function has a typo where the file extension .hwp is checked for twice in a row”
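The observation quoted above (constant extension order plus a repeated `.hwp` check) suggests a simple way to compare samples: treat the ordered list of checked extensions, quirks included, as a fingerprint. The sketch below assumes the extension lists have already been extracted from each binary; the lists themselves are made up for illustration (only the doubled `.hwp` comes from the report), and `extension_fingerprint` and `same_lineage` are hypothetical helper names.

```python
def extension_fingerprint(exts):
    """Return the ordered extension tuple and any extension checked twice in a row."""
    normalized = tuple(e.lower() for e in exts)
    repeated = sorted(
        {e for i, e in enumerate(normalized[1:], 1) if normalized[i - 1] == e}
    )
    return normalized, repeated


def same_lineage(a, b):
    """Two samples match only if order and quirks (such as repeats) are identical."""
    return extension_fingerprint(a) == extension_fingerprint(b)
```

```python
sample_a = [".doc", ".hwp", ".hwp", ".pdf"]   # hypothetical list with the typo
sample_b = [".DOC", ".hwp", ".hwp", ".pdf"]   # same order, same typo
sample_c = [".doc", ".hwp", ".pdf"]           # typo absent
```

Here `same_lineage(sample_a, sample_b)` holds while `same_lineage(sample_a, sample_c)` does not, which is the sense in which an accidental typo is stronger evidence of shared code than the extension set alone.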