1 of 12

zeek2es.py - An Application to Make Your Zeek Logs Elastic!

Keith J. Jones, Sr. Security Researcher @ Corelight Labs

Feb 11, 2022 - 23:00 UTC | 18:00 EST

Feb 12, 2022 - 00:00 CET

2 of 12

What is Zeek?

  • https://zeek.org/
  • “An Open Source Network Security Monitoring Tool”
  • It’s an environment that lets you code around streaming network data. It natively outputs ASCII TSV or JSON logs of network events and compresses them as *.gz if running in cluster mode.
  • The security industry regularly uses Zeek to detect and respond to various network threats.
  • Vern Paxson, now at Corelight Labs, is the originator of Zeek (formerly known as Bro).

3 of 12

My Problems

  1. Terabytes of ASCII TSV.gz Zeek logs at several remote locations with only SSH access. This is Zeek’s default logging format and SSH only allows for console access or port tunneling to applications that open ports. (ssh -L localhost:5601:localhost:5601 hostname)
  2. I need to sort, search, and analyze the logs every day, several times a day. I don’t always know what I’m looking for, so I run a lot of queries. The more searches I can run in a day, the more research I can produce.

4 of 12

My Problems

  1. Data cannot leave the remote locations!!!
  2. Log formats will change, sometimes quite often. Expect conn.log to change over time. Researchers do research.
  3. New logs will appear, sometimes quite often. Expect to index new formats I have never seen before.
  4. I should not have to install anything more than I need. Simplicity. Storage, CPU, and memory are all shared with others. Running as non-root would be ideal.

5 of 12

Current Elastic Zeek Support

6 of 12

Filebeat Connection Log

https://github.com/elastic/beats/blob/master/x-pack/filebeat/module/zeek/connection/

There is a lot of field management here for each expected log, which can be error-prone and makes schema updates difficult (for my research purposes).

I tried to minimize the chance of introducing errors myself by reading the #fields and #types lines from the Zeek log to build the ES mappings…

7 of 12

zeek2es.py Open Source Elastic Support

https://github.com/corelight/zeek2es

  1. Reads the #fields and #types lines from Zeek’s logs:
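
A minimal sketch of that header-reading step, written for illustration rather than taken from zeek2es.py. The sample log text and the read_zeek_header name are hypothetical; the only assumption about the format is that the header lines start with "#" and are tab-separated.

```python
import io

# Hypothetical sample of a Zeek ASCII TSV log (header plus one data row).
SAMPLE_LOG = (
    "#separator \\x09\n"
    "#set_separator\t,\n"
    "#empty_field\t(empty)\n"
    "#unset_field\t-\n"
    "#path\tconn\n"
    "#fields\tts\tuid\tid.orig_h\tid.resp_p\tproto\torig_bytes\n"
    "#types\ttime\tstring\taddr\tport\tenum\tcount\n"
    "1644620400.000000\tCabc123\t192.168.1.10\t443\ttcp\t1024\n"
)


def read_zeek_header(stream):
    """Return (fields, types) taken from the '#fields' and '#types' lines."""
    fields, types = [], []
    for line in stream:
        if not line.startswith("#"):
            break  # first data row reached: the header is over
        key, _, rest = line.rstrip("\n").partition("\t")
        if key == "#fields":
            fields = rest.split("\t")
        elif key == "#types":
            types = rest.split("\t")
    if len(fields) != len(types):
        raise ValueError("#fields and #types have different lengths")
    return fields, types


fields, types = read_zeek_header(io.StringIO(SAMPLE_LOG))
schema = dict(zip(fields, types))  # e.g. schema["id.orig_h"] is "addr"
```

For a compressed log, the same function could be handed a text-mode handle from gzip.open(path, "rt") instead of the in-memory sample.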

8 of 12

zeek2es.py Open Source Elastic Support

  1. Creates mappings according to #fields and #types.
    • Makes all connections to the Elasticsearch server via Python’s requests library (no other Python ES libraries to install!)
  2. Uses the bulk upload API to move the Zeek data into Elasticsearch.
    • Still using Python’s requests library!
  3. I execute this on each file using find, awk, and parallel:
    • time find /usr/local/var/logs | awk '/^.*\/.*\.log\.gz$/' | parallel -j 10 python ~/zeek2es.py {} -g :::: -
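
The mapping and bulk-upload steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in zeek2es.py: the ZEEK_TO_ES type table, the function names, and the choice to convert timestamps to epoch milliseconds are all assumptions made for the example. Only the request bodies are built here; the commented lines at the end show where the requests library would come in, with es_url and index as placeholder names.

```python
import json

# Assumed Zeek-to-Elasticsearch type table; unlisted types fall back to "text".
ZEEK_TO_ES = {
    "time": "date",
    "addr": "ip",
    "port": "integer",
    "count": "long",
    "int": "long",
    "double": "double",
    "interval": "double",
    "bool": "boolean",
}


def build_mappings(fields, types):
    """Build an index-creation body from the #fields and #types lists."""
    properties = {
        name: {"type": ZEEK_TO_ES.get(ztype, "text")}
        for name, ztype in zip(fields, types)
    }
    return {"mappings": {"properties": properties}}


def build_bulk_body(rows, fields, types):
    """Build a newline-delimited JSON body for the bulk API."""
    lines = []
    for row in rows:
        doc = {}
        for name, ztype, value in zip(fields, types, row):
            if value in ("-", "(empty)"):
                continue  # Zeek's default unset and empty markers
            if ztype == "time":
                doc[name] = round(float(value) * 1000)  # seconds to epoch millis
            elif ztype in ("count", "int", "port"):
                doc[name] = int(value)
            elif ztype in ("double", "interval"):
                doc[name] = float(value)
            elif ztype == "bool":
                doc[name] = value == "T"
            else:
                doc[name] = value
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # document line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


fields = ["ts", "id.orig_h", "id.resp_p", "orig_bytes"]
types = ["time", "addr", "port", "count"]
rows = [["1644620400.500000", "192.168.1.10", "443", "-"]]

mappings = build_mappings(fields, types)
body = build_bulk_body(rows, fields, types)

# With a reachable server (es_url and index are placeholders):
# import requests
# requests.put(f"{es_url}/{index}", json=mappings)
# requests.post(f"{es_url}/{index}/_bulk", data=body,
#               headers={"Content-Type": "application/x-ndjson"})
```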

9 of 12

Buy today for $0, I’ll also throw in…

  • Helper scripts (process_logs_as_datastream.sh & process_log.sh)
  • Geolocation and service splitting via an ingestion pipeline (-g option)
  • Add keyword fields to any text field (-k option)
  • Output filtering via Python’s lambda functions! (-a and -f options)
    • lambda x: len(x.get('service', '')) > 0 and x['orig_bytes'] > 0 and x['resp_bytes'] > 0
    • lambda x: 'id.orig_h' in x and ipaddress.ip_address(x['id.orig_h']) in ipaddress.ip_network('192.168.0.0/16')
  • Key filtering (-o and -e options)
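
To show what such lambda filters do to a set of records, here is a small sketch. It assumes the filter arrives as text and is turned into a callable with eval(); that is one plausible approach, not a description of zeek2es.py internals. The compile_filter name and the sample records are hypothetical.

```python
import ipaddress


def compile_filter(source):
    """Turn a lambda given as text into a callable.

    eval() executes arbitrary code, so this is only suitable for
    filter text you wrote yourself.
    """
    func = eval(source, {"ipaddress": ipaddress})
    if not callable(func):
        raise TypeError("filter text must evaluate to a callable")
    return func


# Hypothetical parsed connection records.
records = [
    {"id.orig_h": "192.168.1.10", "service": "dns", "orig_bytes": 60, "resp_bytes": 120},
    {"id.orig_h": "10.0.0.5", "service": "http", "orig_bytes": 500, "resp_bytes": 0},
    {"id.orig_h": "192.168.7.7", "orig_bytes": 0, "resp_bytes": 0},
]

in_private_net = compile_filter(
    "lambda x: 'id.orig_h' in x and "
    "ipaddress.ip_address(x['id.orig_h']) in ipaddress.ip_network('192.168.0.0/16')"
)
has_traffic = compile_filter(
    "lambda x: len(x.get('service', '')) > 0 "
    "and x['orig_bytes'] > 0 and x['resp_bytes'] > 0"
)

# Keep only the records for which the filter returns True.
from_private_net = [r for r in records if in_private_net(r)]
with_traffic = [r for r in records if has_traffic(r)]
```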

10 of 12

Buy today for $0, I’ll also throw in…

  • JSON to stdout (-s and -b options)
  • Data streams (-d option)

11 of 12

Zeek -> Elasticsearch -> Kibana Demo

12 of 12

?