TrustedSec - Working with data in JSON format
What is JSON?
JSON is an acronym for JavaScript Object Notation. For years, it has been a common serialization format for APIs across the web. It has also gained favor as a format for logging (particularly structured logging). More recently, it has become common for command line applications to use JSON to serialize general output.
JSON can be used to serialize data into common object and value types: key-value pairs, arrays, strings, numbers, Boolean values, and null. However, it is not without its limitations. The first limitation concerns parsing. Because of how JSON is structured, an entire JSON object must generally be loaded completely before it can be parsed. In most cases, this means the entire output of a command line tool or web API must be obtained before processing can begin. The second limitation is the restriction to the types above (key-value, list, true, false, null, number, and string). This leads to some lossy conversion during serialization. For example, if you need to represent a date and time, that information must be converted to a string. There are many ways to address these limitations, but that is beyond the scope of this post.
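To make the date and time example concrete, here is a minimal sketch using only Python's standard library. The event data and the helper name encode_extra are made up for illustration; the point is only that the timestamp comes back as a plain string and the reader has to know to convert it.

```python
import json
from datetime import datetime, timezone


def encode_extra(value):
    """Fallback for values the json module cannot serialize on its own."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


# Hypothetical event; JSON has no native type for the timestamp.
event = {"user": "alice", "seen": datetime(2022, 8, 24, 0, 0, 18, tzinfo=timezone.utc)}

encoded = json.dumps(event, default=encode_extra)
decoded = json.loads(encoded)

# After decoding, the timestamp is just a str until it is parsed again.
restored = datetime.fromisoformat(decoded["seen"])

print(encoded)
print(type(decoded["seen"]).__name__, restored == event["seen"])
```

Nothing in the encoded text marks the value as a timestamp, which is the lossy part: the convention has to live in the code (or documentation) on both ends.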
Examples
Here are a few examples of where you might encounter JSON-formatted data:
- MITRE ATT&CK STIX data
- Output from certipy
- Output from CrackMapExec spider (plus)
- Google IP address ranges
- Microsoft Office 365 IP address ranges
Tooling
The following is a breakdown of some tooling that may help when working with JSON-formatted data:
gron
The tool gron flattens JSON into keys and values. It can also take the flattened version and reconstitute it into JSON. The keys are complex and encode the absolute path to the value. This makes it easier to work on the data with other tools that are targeted toward line-delimited data, such as grep, sed, awk, Bash, etc.
Available for most platforms, gron is written in Go, which provides a high degree of portability. Installation is straightforward with pre-compiled binaries available for Windows, Linux, macOS, and FreeBSD. It may also be available via your favorite package manager. If none of the above cover your platform, you may also use Go to install it.
Here is a brief example showing how to grep for email addresses within a nested JSON object:
$ wget -qO - 'https://jsonplaceholder.typicode.com/users' | gron | grep -F '.email'
json[0].email = "Sincere@april.biz";
json[1].email = "Shanna@melissa.tv";
json[2].email = "Nathan@yesenia.net";
json[3].email = "Julianne.OConner@kory.org";
json[4].email = "Lucio_Hettinger@annie.ca";
json[5].email = "Karley_Dach@jasper.info";
json[6].email = "Telly.Hoeger@billy.biz";
json[7].email = "Sherwood@rosamond.me";
json[8].email = "Chaim_McDermott@dana.io";
json[9].email = "Rey.Padberg@karina.biz";
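The flattening idea can be sketched in a few lines of Python. This is not gron's implementation, only an approximation of its output format as seen above; the function name flatten and the sample record (apart from the email address) are assumptions for illustration.

```python
import json
import re

# Keys that look like plain identifiers are written with dot notation.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node, in the spirit of gron's output."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            step = f".{key}" if IDENTIFIER.fullmatch(key) else f"[{json.dumps(key)}]"
            yield from flatten(child, path + step)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are rendered as JSON literals (quoted strings, true, false, null).
        yield f"{path} = {json.dumps(value)};"


# Illustrative sample, loosely shaped like the users data queried above.
sample = json.loads('[{"name": "Leanne", "email": "Sincere@april.biz", "geo": {"lat": "-37.3"}}]')
lines = list(flatten(sample))
print(*(line for line in lines if ".email" in line), sep="\n")
```

Because every line carries the absolute path to its value, a plain substring match is enough to pull out a field regardless of how deeply it is nested.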
jless
The tool jless allows interactive viewing of JSON-formatted data in a text user interface. It applies syntax highlighting and allows you to navigate through a complex, nested JSON object and search with regex. This makes it easy to explore data in JSON format, especially if it is minified. It will even allow you to copy a jq-style path, which is useful when trying to create and debug jq filters. This is done with the keybinding yq, which is one of many available keybindings.
Installing jless can be done via your favorite package manager, and pre-compiled binaries are available for macOS and Linux. Because jless is written in Rust, installing via Cargo is also an option for platforms that have neither a package providing it nor a pre-compiled binary.
Here is a brief example showing jless, searching for 123, and copying the jq path to the clipboard:
dataclasses-json
The Python package dataclasses-json facilitates parsing JSON-formatted data into simple classes that are easy to work with in Python. Under the hood, it uses the Python package marshmallow to provide deserialization and parsing beyond the basics of Python’s built-in JSON module. This makes it an excellent tool to keep in your kit for working with more complex data in JSON format. Installing dataclasses-json is done with pip (or another Python package manager such as pipenv or poetry).
Here is a brief example showing how to use dataclasses-json to load data into a dataclass:
import sys
import fileinput
from typing import List
from dataclasses import field
from dataclasses import dataclass

from dataclasses_json import LetterCase
from dataclasses_json import DataClassJsonMixin
from dataclasses_json import config


@dataclass(frozen=True)
class Company(DataClassJsonMixin):
    bs: str
    catch_phrase: str = field(metadata=config(letter_case=LetterCase.CAMEL))
    name: str


@dataclass(frozen=True)
class GeographicCoordinates(DataClassJsonMixin):
    latitude: str = field(metadata=config(field_name="lat"))
    longitude: str = field(metadata=config(field_name="lng"))


@dataclass(frozen=True)
class Address(DataClassJsonMixin):
    street: str
    suite: str
    city: str
    zipcode: str
    geo: GeographicCoordinates


@dataclass(frozen=True)
class Person(DataClassJsonMixin):
    index: int = field(metadata=config(field_name="id"))
    name: str
    phone: str
    username: str
    email: str
    website: str
    company: Company
    address: Address


def cli(*args: List[str]) -> int:
    # read the JSON document from the files named on the command line, or from standard in
    json_input = '\n'.join(list(fileinput.input(encoding="utf-8")))
    print(
        *map(
            lambda p: f'{p.index:>3} {p.name:<25} {p.email}',
            Person.schema().loads(json_input, many=True)
        ),
        sep='\n',
        end='\n\n'
    )
    return 0


if ('__main__' == __name__):
    sys.exit(cli(sys.argv))
JSON diff and patch
The command line tool (and Go library) jd can be used to diff and patch data in JSON format. If you have two similar files with JSON-formatted data and wish to isolate the differences, jd is the perfect tool for the job. It can be useful when you want to replace something nested deep within some JSON-formatted data.
Installation may be a little more complex than other tools listed here. It is available via the brew package manager for macOS or via the Go package manager. It can also be executed as a docker container. And finally, it is available on the web.
Here is a brief example showing what a diff from jd looks like:
$ cat jd-first.json
[
  { "id": 1, "text": "buy milk", "done": false },
  { "id": 1, "text": "learn italian", "done": false },
  { "id": 1, "text": "clean oven", "done": false }
]
$ cat jd-final.json
[
  { "id": 1, "text": "buy milk", "done": true },
  { "id": 1, "text": "learn italian", "done": false },
  { "id": 1, "text": "clean oven", "done": false },
  { "id": 1, "text": "make pickles", "done": false }
]
$ jd jd-first.json jd-final.json
@ [0,"done"]
- false
+ true
@ [-1]
+ {"done":false,"id":1,"text":"make pickles"}
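To show what a structural diff involves, here is a minimal Python sketch that walks two parsed documents and reports each path whose value differs. It is not jd's algorithm (lists are simply compared index by index, so an appended item is reported at its index rather than as -1), and the names diff and MISSING are illustrative.

```python
import json

MISSING = object()  # marks a value that exists on only one side


def diff(old, new, path=()):
    """Yield (path, old_value, new_value) for every position that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            yield from diff(old.get(key, MISSING), new.get(key, MISSING), path + (key,))
    elif isinstance(old, list) and isinstance(new, list):
        for index in range(max(len(old), len(new))):
            left = old[index] if index < len(old) else MISSING
            right = new[index] if index < len(new) else MISSING
            yield from diff(left, right, path + (index,))
    elif old != new:
        yield path, old, new


first = [{"id": 1, "text": "buy milk", "done": False}]
final = [
    {"id": 1, "text": "buy milk", "done": True},
    {"id": 1, "text": "make pickles", "done": False},
]

changes = list(diff(first, final))
for path, before, after in changes:
    print("@", json.dumps(list(path)))
    if before is not MISSING:
        print("-", json.dumps(before))
    if after is not MISSING:
        print("+", json.dumps(after))
```

The path-plus-old-plus-new shape mirrors the jd output above, which is what makes such a diff usable as a patch.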
jq
The tool jq supports complex filtering and transforming of data in JSON format. It has a huge feature set and is able to cut through a large file of JSON-formatted data and mutate it into another layout.
Installation is very straightforward because jq is a single binary with no dependencies. Pre-compiled binaries are available from the website. It is also available via many package managers.
If you do not want to install it, there is also an online version available.
Bonus
The complexity of jq can make working on a filter a steep climb. To help iterate on a filter, it can be helpful to have a plugin for your editor. Visual Studio Code jq playground is a great example. It allows you to work on one or more filters, rendering the output as you go.
Below is a brief example showing what can be done with jq (The screenshot is of Visual Studio Code, but running jq from the command line is not any different.):
jq (Python package)
The Python package jq provides bindings to the jq command line tool above. It allows you to use the best of both jq and Python to construct more complex processing of data in JSON format. Installing the jq package is done with pip (or another Python package manager like pipenv or poetry).
An example of using this package can be found below.
Practical Examples
Filtering for scopes in Google IP Address ranges
This example will use the Google IP address ranges data set.
The data set provides a nested array of JSON objects. We will use jq to filter and flatten it into a single array of IP v4 prefixes.
$ wget -qO - 'https://www.gstatic.com/ipranges/goog.json' | jq '[ .prefixes[] | .ipv4Prefix | select( . != null ) ] | unique'
[
  "104.154.0.0/15",
  "104.196.0.0/14",
  "104.237.160.0/19",
  "107.167.160.0/19",
  "107.178.192.0/18",
  "108.170.192.0/18",
  "108.177.0.0/17",
  "108.59.80.0/20",
  "130.211.0.0/16",
  "136.112.0.0/12",
  "142.250.0.0/15",
  "146.148.0.0/17",
  "162.216.148.0/22",
  "162.222.176.0/21",
  "172.110.32.0/21",
  "172.217.0.0/16",
  "172.253.0.0/16",
  "173.194.0.0/16",
  "173.255.112.0/20",
  "192.158.28.0/22",
  "192.178.0.0/15",
  "193.186.4.0/24",
  "199.192.112.0/22",
  "199.223.232.0/21",
  "199.36.154.0/23",
  "199.36.156.0/24",
  "207.223.160.0/20",
  "208.117.224.0/19",
  "208.65.152.0/22",
  "208.68.108.0/22",
  "208.81.188.0/22",
  "209.85.128.0/17",
  "216.239.32.0/19",
  "216.58.192.0/19",
  "216.73.80.0/20",
  "23.236.48.0/20",
  "23.251.128.0/19",
  "34.0.0.0/15",
  "34.128.0.0/10",
  "34.16.0.0/12",
  "34.2.0.0/16",
  "34.3.0.0/23",
  "34.3.128.0/17",
  "34.3.16.0/20",
  "34.3.3.0/24",
  "34.3.32.0/19",
  "34.3.4.0/24",
  "34.3.64.0/18",
  "34.3.8.0/21",
  "34.32.0.0/11",
  "34.4.0.0/14",
  "34.64.0.0/10",
  "34.8.0.0/13",
  "35.184.0.0/13",
  "35.192.0.0/14",
  "35.196.0.0/15",
  "35.198.0.0/16",
  "35.199.0.0/17",
  "35.199.128.0/18",
  "35.200.0.0/13",
  "35.208.0.0/12",
  "35.224.0.0/12",
  "35.240.0.0/13",
  "64.15.112.0/20",
  "64.233.160.0/19",
  "66.102.0.0/20",
  "66.22.228.0/23",
  "66.249.64.0/19",
  "70.32.128.0/19",
  "72.14.192.0/18",
  "74.114.24.0/21",
  "74.125.0.0/16",
  "8.34.208.0/20",
  "8.35.192.0/20",
  "8.8.4.0/24",
  "8.8.8.0/24"
]
Now, to put this to use: say you wanted to allow traffic from all of these ranges. To do so, you could change the query slightly to produce a line-delimited list that can be passed to xargs and then on to iptables.
$ wget -qO - 'https://www.gstatic.com/ipranges/goog.json' | jq -r '[ .prefixes[] | .ipv4Prefix | select( . != null ) ] | unique | join("\n")' | xargs -I{} iptables -I INPUT -s {} -j ACCEPT
Explanation
- wget downloads the file, set to quiet mode with -q (meaning it does not output info other than the file), and the output -O is set to standard out with the - character.
- jq receives the piped output and:
  - creates a new array by:
    - filtering for the prefixes member as an array
    - filtering for the ipv4Prefix member of each item from the array
    - filtering for values that are not null
  - filters the new array to ensure each entry in it is unique
  - joins the new array with newlines to output the new array’s items one per line
- xargs receives the piped output, then calls iptables for each line, adding allow rules to the beginning of the input chain.
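For readers more comfortable in Python, the jq steps above can be approximated with the standard library alone. The inline document is a trimmed, illustrative stand-in for goog.json, not the real file; only the prefixes, ipv4Prefix, and ipv6Prefix key names are taken from the data set's layout.

```python
import json

# Illustrative stand-in for goog.json: a few entries, one of them IPv6-only.
document = json.loads("""
{
  "prefixes": [
    {"ipv4Prefix": "8.8.8.0/24"},
    {"ipv6Prefix": "2001:4860::/32"},
    {"ipv4Prefix": "8.8.4.0/24"},
    {"ipv4Prefix": "8.8.8.0/24"}
  ]
}
""")

# .prefixes[] | .ipv4Prefix | select(. != null), then unique (which also sorts).
ipv4_prefixes = sorted({
    entry["ipv4Prefix"]
    for entry in document["prefixes"]
    if entry.get("ipv4Prefix") is not None
})

# join("\n"): one prefix per line, ready for line-oriented tools.
print("\n".join(ipv4_prefixes))
```

Both versions sort the prefixes as strings rather than numerically, which is why 8.8.4.0/24 lands at the end of the jq output above.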
Back-correlating IP addresses from firewall logs to their Microsoft Office 365 domain
In this example, we will use the Microsoft Office 365 IP address ranges data set, a fake access log file, and a Python script that uses the jq binding package.
The data provides a list of objects that describe the various service areas and their URLs and IP address ranges (in CIDR form).
#!/usr/bin/env python3
import sys
import json
import fileinput
import ipaddress
from pathlib import Path
from functools import reduce
from functools import partial

import jq


def search_for_address(lookup_table, result_dict, line):
    # take the address from the front of the line
    address = ipaddress.IPv4Address(line.split(' ', 1)[0])
    result_found = False
    for name, list_of_networks in lookup_table.items():
        if result_found:
            break
        for network_str in list_of_networks:
            if (address not in ipaddress.IPv4Network(network_str)):
                # address is not one of interest for this range, continue on
                continue
            else:
                # add or update dict entry with the line
                value = result_dict.get(name, [])
                value.append(line.rstrip())
                result_dict.update({name: value})
                # set flag to true in order to break out of the outer loop
                result_found = True
                break
    return result_dict


def cli(*args: str) -> int:
    try:
        filter_text = '''\
.[]
| select( .ips != null )
| { (.serviceAreaDisplayName): .ips | map(select(contains(":") | not)) }
'''
        # read in and remove carriage returns
        filter_input = json.loads(''.join(list(
            map(
                lambda x: x.rstrip(),
                fileinput.input(encoding="utf-8")
            )
        )))
        # run jq filter against the input
        filter_output = jq.compile(filter_text).input(filter_input).all()
        names_to_address_ranges = dict()
        # flatten the jq filter output (list of dict into a single dict with a combined list of unique networks)
        for x in filter_output:
            for name, v in x.items():
                names_to_address_ranges.update({
                    name: list(set(names_to_address_ranges.get(name, []) + v))
                })
        access_log = (Path() / 'access.log').resolve()
        with access_log.open(mode='rt', encoding='utf-8') as handle:
            # call search_for_address against every line, combining the result into a single dict
            result = reduce(partial(search_for_address, names_to_address_ranges), handle, dict())
    except Exception as ex:
        print(ex, file=sys.stderr)
        return 1
    else:
        for k, v in result.items():
            print(k, "=" * len(k), *v, sep='\n', end='\n\n')
        return 0


if ('__main__' == __name__):
    sys.exit(cli(*sys.argv))
$ wget -qO - 'https://endpoints.office.com/endpoints/worldwide?clientrequestid=b10c5ed1-bad1-445f-b386-b919946339a7' | python -u correlate.py
Exchange Online
===============
52.103.78.154 - - [24/Aug/2022:00:00:00 ] "GET /index.html HTTP/1.1" 302 706 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
52.101.250.115 - - [24/Aug/2022:00:00:18 ] "TRACE /eligendi/voluptate/asperiores/in/quis/perferendis/pariatur HTTP/1.1" 500 538 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2"
40.93.149.94 - - [24/Aug/2022:00:03:03 ] "PUT /index.html HTTP/1.1" 500 708 "-" "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11"
52.101.166.116 - - [24/Aug/2022:00:05:51 ] "DELETE /velit/dolores/at/aperiam/quaerat/quibusdam/enim HTTP/1.1" 500 445 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)"
104.47.111.206 - - [24/Aug/2022:00:11:05 ] "DELETE /dolore/adipisci/minus/laudantium/veritatis/et/repudiandae/qui HTTP/1.1" 500 191 "-" "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);"
40.107.183.115 - - [24/Aug/2022:00:18:16 ] "DELETE /amet/et/alias/quod/numquam/libero/nihil HTTP/1.1" 200 177 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"
40.92.113.173 - - [24/Aug/2022:00:23:04 ] "OPTIONS /amet/et/alias/quod/numquam/libero/nihil HTTP/1.1" 200 76 "-" "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);"
40.93.27.4 - - [24/Aug/2022:00:23:12 ] "POST /in/ipsum/possimus/voluptate/consequatur HTTP/1.1" 202 304 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)"
40.92.108.212 - - [24/Aug/2022:00:26:40 ] "PUT /quas/repellendus/voluptatem/rerum/quis/ut/itaque/accusamus HTTP/1.1" 302 526 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2"
40.107.168.15 - - [24/Aug/2022:00:27:18 ] "HEAD / HTTP/1.1" 500 871 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1"

Microsoft 365 Common and Office Online
======================================
52.109.14.150 - - [24/Aug/2022:00:07:24 ] "POST /dolore/adipisci/minus/laudantium/veritatis/et/repudiandae/qui HTTP/1.1" 302 626 "-" "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)"
52.109.57.226 - - [24/Aug/2022:00:15:05 ] "CONNECT /labore/non/eum/quasi/sapiente HTTP/1.1" 302 598 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)"
52.109.181.82 - - [24/Aug/2022:00:16:48 ] "HEAD /quis/quis/iure/vero/eaque/nisi/ad/molestiae HTTP/1.1" 302 406 "-" "Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11"
52.111.130.50 - - [24/Aug/2022:00:18:05 ] "OPTIONS /iure/recusandae/tempora/similique/sequi/culpa HTTP/1.1" 404 355 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13"
52.110.169.110 - - [24/Aug/2022:00:20:03 ] "POST /index.html HTTP/1.1" 500 1024 "-" "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25"
52.111.141.135 - - [24/Aug/2022:00:25:36 ] "HEAD /velit/dolores/at/aperiam/quaerat/quibusdam/enim HTTP/1.1" 200 1009 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1"
52.109.246.226 - - [24/Aug/2022:00:38:21 ] "HEAD /ea/consequatur/omnis/voluptatibus/autem/earum/ut/quis/doloremque/ut/quaerat/in HTTP/1.1" 400 456 "-" "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25"
52.111.114.23 - - [24/Aug/2022:00:44:45 ] "OPTIONS /eligendi/voluptate/asperiores/in/quis/perferendis/pariatur HTTP/1.1" 202 326 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13"
Explanation
(More details can be found in comments in the Python script.)
- wget downloads the file, set to quiet mode with -q (meaning it does not output info other than the file), and the output -O is set to standard out with the - character.
- The Python script receives the piped output and:
  - applies the jq filter to get a list of objects with the name and a list of IP address ranges
  - combines the list of dicts into a single dict
  - reads the access log
  - searches for and collects matches in another dict
  - prints the lists
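The combine and search steps can be illustrated in isolation with the standard library. The sketch below is a variation, not the script above: it parses each CIDR string once up front rather than once per log line. The filter_output sample and the function name service_for are assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict

# Illustrative stand-in for the jq filter output: one single-key dict per endpoint set.
filter_output = [
    {"Exchange Online": ["52.100.0.0/14", "40.92.0.0/15"]},
    {"Microsoft 365 Common and Office Online": ["52.108.0.0/14"]},
    {"Exchange Online": ["40.92.0.0/15", "40.107.0.0/16"]},
]

# Combine the list of dicts into a single dict of unique, pre-parsed networks.
networks_by_service = defaultdict(set)
for entry in filter_output:
    for service, cidrs in entry.items():
        networks_by_service[service].update(ipaddress.IPv4Network(c) for c in cidrs)


def service_for(address_text):
    """Return the first service whose ranges contain the address, else None."""
    address = ipaddress.IPv4Address(address_text)
    for service, networks in networks_by_service.items():
        if any(address in network for network in networks):
            return service
    return None


for candidate in ("52.103.78.154", "52.109.14.150", "192.0.2.1"):
    print(candidate, "->", service_for(candidate))
```

Using a set per service removes duplicate ranges that appear in more than one endpoint set, which is the same effect the script achieves with list(set(...)).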
Filtering certipy output for Misconfigured Certificate Templates
This example will use the technique described for ESC1 in Certified Pre-Owned, certipy JSON output captured in a file, gron, and a Bash script for processing.
#!/usr/bin/env bash

# printf is used to preserve newlines (unlike some implementations of echo)
# printf %s is used to preserve \\ so that output may be un-gronned
# comm expects sorted input, so each get-* helper sorts its output

function get-enabled {
    printf %s "$1" | grep -F '.Enabled = true;' | sed 's/^json\(.*\)\.Enabled = true\;/\1/;s/\(.*\)E$/\1/;s/\(.*\)\.$/\1/' | sort
}

# ."Client Authentication"==true?
function get-requires-manager-approval {
    printf %s "$1" | grep -F '["Requires Manager Approval"] = false;' | sed 's/^json\(.*\)\["Requires Manager Approval"] = false\;/\1/;s/\(.*\)E$/\1/;s/\(.*\)\.$/\1/' | sort
}

function get-enrollee-supplies-subject {
    printf %s "$1" | grep -F '["Enrollee Supplies Subject"] = true;' | sed 's/^json\(.*\)\["Enrollee Supplies Subject"] = true\;/\1/;s/\(.*\)E$/\1/;s/\(.*\)\.$/\1/' | sort
}

function get-client-authentication {
    printf %s "$1" | grep -F '["Client Authentication"] = true;' | sed 's/^json\(.*\)\["Client Authentication"] = true\;/\1/;s/\(.*\)E$/\1/;s/\(.*\)\.$/\1/' | sort
}

function has-rights {
    printf %s "$1" | grep -F "$2" | grep -F '.Permissions["Enrollment Permissions"]["Enrollment Rights"]' \
        | grep -E 'Authenticated Users|Domain Computers|Domain Users' >/dev/null 2>&1
    return $?
}

GRON_FORMATTED=$(gron -m --no-sort "$1")

# intersect the four lists of template identifiers
INTERESTING_TEMPLATES=$(
    comm -12 <( get-enabled "${GRON_FORMATTED}") <(
        comm -12 <( get-requires-manager-approval "${GRON_FORMATTED}") <(
            comm -12 <( get-enrollee-supplies-subject "${GRON_FORMATTED}") <( get-client-authentication "${GRON_FORMATTED}" )))
)

RESULT=""
while IFS= read -r x
do
    if has-rights "${GRON_FORMATTED}" "${x}"
    then
        # command substitution strips the trailing newline, so add one back between templates
        RESULT="${RESULT}$(printf %s "${GRON_FORMATTED}" | grep -F "${x}")"$'\n'
    fi
done <<< "${INTERESTING_TEMPLATES}"

printf %s "${RESULT}" | gron --no-sort --ungron
$ ./esc1_filter_from_certipy.sh 20220802122105_Certipy4_SecLab.json
{
  "Certificate Templates": {
    "0": {
      "Any Purpose": false,
      "Authorized Signatures Required": 0,
      "Certificate Name Flag": [ "SubjectAltRequireDomainDns", "EnrolleeSuppliesSubject" ],
      "Client Authentication": true,
      "Display Name": "ESC1(4096)",
      "Enabled": true,
      "Enrollee Supplies Subject": true,
      "Enrollment Agent": false,
      "Enrollment Flag": [ "PublishToDs" ],
      "Extended Key Usage": [ "Client Authentication", "Server Authentication", "Smart Card Logon", "KDC Authentication" ],
      "Permissions": {
        "Enrollment Permissions": {
          "Enrollment Rights": [ "SECLAB.TEST.LOCAL\\Authenticated Users" ]
        },
        "Object Control Permissions": {
          "Owner": "SECLAB.TEST.LOCAL\\Admin Mario",
          "Write Dacl Principals": [ "SECLAB.TEST.LOCAL\\Admin Mario" ],
          "Write Owner Principals": [ "SECLAB.TEST.LOCAL\\Admin Mario" ],
          "Write Property Principals": [ "SECLAB.TEST.LOCAL\\Admin Mario" ]
        }
      },
      "Private Key Flag": [ "16777216", "65536" ],
      "Renewal Period": "6 weeks",
      "Requires Key Archival": false,
      "Requires Manager Approval": false,
      "Template Name": "ESC1(4096)",
      "Validity Period": "1 year",
      "[!] Vulnerabilities": {
        "ESC1": "'SECLAB.TEST.LOCAL\\\\Authenticated Users' can enroll, enrollee supplies subject and template allows client authentication"
      }
    },
    "3": {
      "Any Purpose": false,
      "Authorized Signatures Required": 0,
      "Certificate Name Flag": [ "SubjectAltRequireDomainDns", "EnrolleeSuppliesSubject" ],
      "Client Authentication": true,
      "Display Name": "ESC1",
      "Enabled": true,
      "Enrollee Supplies Subject": true,
      "Enrollment Agent": false,
      "Enrollment Flag": [ "PublishToDs" ],
      "Extended Key Usage": [ "KDC Authentication", "Smart Card Logon", "Server Authentication", "Client Authentication" ],
      "Permissions": {
        "Enrollment Permissions": {
          "Enrollment Rights": [ "SECLAB.TEST.LOCAL\\Authenticated Users" ]
        },
        "Object Control Permissions": {
          "Owner": "SECLAB.TEST.LOCAL\\Admin Mario",
          "Write Dacl Principals": [ "SECLAB.TEST.LOCAL\\Admin Mario" ],
          "Write Owner Principals": [ "SECLAB.TEST.LOCAL\\Admin Mario" ],
          "Write Property Principals": [ "SECLAB.TEST.LOCAL\\Admin Mario" ]
        }
      },
      "Private Key Flag": [ "16777216", "65536" ],
      "Renewal Period": "6 weeks",
      "Requires Key Archival": false,
      "Requires Manager Approval": false,
      "Template Name": "ESC1",
      "Validity Period": "1 year",
      "[!] Vulnerabilities": {
        "ESC1": "'SECLAB.TEST.LOCAL\\\\Authenticated Users' can enroll, enrollee supplies subject and template allows client authentication"
      }
    }
  }
}
Explanation
- gron is used to turn the JSON-formatted data into a line-delimited format, which is stored in a variable.
- That variable is run through a series of Bash functions, each of which outputs the identifiers of templates that match one criterion (enabled, no manager approval required, enrollee supplies subject, client authentication).
- The template identifiers are then compared with comm to filter down to the ones that meet all of the criteria.
- The identified templates are then checked against a single, final criterion: enrollment rights granted to Authenticated Users, Domain Computers, or Domain Users.
- If a template matches, its lines are appended to a result variable.
- gron is used to turn the final result variable back into JSON.
Note: This code should not be considered a replacement for a good understanding of ESC1. It is an example of how to use gron to then leverage other command line applications in order to facilitate handling data in JSON format.
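For comparison, the same criteria can be expressed directly against the parsed JSON in Python. This is a sketch of an alternative approach, not a port of the Bash script: the key names come from the certipy output shown above, while the trimmed report, the second template, and the function name looks_like_esc1 are made up for illustration.

```python
import json

BROAD_GROUPS = ("Authenticated Users", "Domain Computers", "Domain Users")


def looks_like_esc1(template):
    """Apply the same five checks the Bash functions perform with grep."""
    rights = (
        template.get("Permissions", {})
        .get("Enrollment Permissions", {})
        .get("Enrollment Rights", [])
    )
    return (
        template.get("Enabled") is True
        and template.get("Requires Manager Approval") is False
        and template.get("Enrollee Supplies Subject") is True
        and template.get("Client Authentication") is True
        and any(principal.endswith(group) for principal in rights for group in BROAD_GROUPS)
    )


# Trimmed, partly hypothetical sample; template "1" exists only to show a non-match.
report = json.loads(r"""
{
  "Certificate Templates": {
    "0": {
      "Template Name": "ESC1(4096)",
      "Enabled": true,
      "Requires Manager Approval": false,
      "Enrollee Supplies Subject": true,
      "Client Authentication": true,
      "Permissions": {"Enrollment Permissions": {"Enrollment Rights": ["SECLAB.TEST.LOCAL\\Authenticated Users"]}}
    },
    "1": {
      "Template Name": "NeedsApproval",
      "Enabled": true,
      "Requires Manager Approval": true,
      "Enrollee Supplies Subject": true,
      "Client Authentication": true,
      "Permissions": {"Enrollment Permissions": {"Enrollment Rights": ["SECLAB.TEST.LOCAL\\Domain Users"]}}
    }
  }
}
""")

flagged = {
    key: template["Template Name"]
    for key, template in report["Certificate Templates"].items()
    if looks_like_esc1(template)
}
print(flagged)
```

The same caveat applies here as to the Bash version: this is a filtering exercise and not a substitute for understanding ESC1.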
Deserialize complex data sets such as MITRE ATT&CK enterprise STIX data
This example will use the MITRE ATT&CK STIX data, the dataclasses-json Python package, and a Python script. It will process the large data set and, in this case, filter it for all the names of identified threat actors documented within the MITRE ATT&CK STIX data.
#!/usr/bin/env python3
import json
from typing import Set
from typing import Dict
from typing import List
from typing import Optional
from pathlib import Path
from functools import reduce
from dataclasses import field
from dataclasses import dataclass

from dataclasses_json import Undefined
from dataclasses_json import config
from dataclasses_json import dataclass_json


@dataclass_json(undefined=Undefined.EXCLUDE)
@dataclass(frozen=True)
class MitreAttackObject:
    type: str
    stix_id: str = field(metadata=config(field_name='id'))
    name: Optional[str] = None


@dataclass_json(undefined=Undefined.EXCLUDE)
@dataclass(frozen=True)
class MitreAttackIntrusionSet(MitreAttackObject):
    aliases: List[str] = field(default_factory=list)
    deprecated: bool = field(default=False, metadata=config(field_name='x_mitre_deprecated'))


def mitre_attack_object_from_dict(d: Dict) -> MitreAttackObject:
    # pick the dataclass that matches the object's type
    object_type = d.get('type')
    if ('intrusion-set' == object_type):
        return MitreAttackIntrusionSet.from_dict(d)
    else:
        return MitreAttackObject.from_dict(d)


def collect_types(collection, object_):
    # count how many objects of each type have been seen
    collection.update({object_.type: 1 + collection.get(object_.type, 0)})
    return collection


def type_is_intrusion_set(dataclass_object: MitreAttackObject) -> bool:
    return ('intrusion-set' == dataclass_object.type)


def collect_names(collection: Set[str], dataclass_object: MitreAttackObject) -> Set[str]:
    # gather the primary name and every alias
    collection.add(dataclass_object.name)
    collection.update(dataclass_object.aliases)
    return collection


json_fullname = (Path(__file__).parent / 'enterprise-attack-11.3.json').resolve()
with json_fullname.open(encoding='utf-8') as handle:
    objects = list(map(mitre_attack_object_from_dict, json.load(handle)['objects']))

object_types = reduce(collect_types, objects, dict())
print(f'file contains {object_types.get("intrusion-set"):,d} intrusion-sets')
print(
    *list(sorted(
        reduce(
            collect_names,
            filter(type_is_intrusion_set, objects),
            set()
        ),
        key=str.casefold
    )),
    sep='\n',
    end='\n\n'
)
Explanation
- The large file containing JSON-formatted data is read into memory and parsed with Python’s built-in json module.
- A function is used to map the objects within the data set into dataclasses.
- The dataclasses are used to create a dict with the names of the types and the number of objects of each type.
- The number of intrusion sets (obtained from the dict above) is written to standard out.
- The names and aliases of the intrusion sets are collected from the dataclass objects (filtered by type), sorted without case sensitivity, and printed.
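The counting and name-collection steps can also be sketched with only the standard library, using collections.Counter in place of reduce. This is a simplified variation, not the script above: the StixObject class, the from_dict helper, and every object in the sample bundle (including the placeholder identifiers and names) are hypothetical.

```python
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class StixObject:
    type: str
    stix_id: str
    name: Optional[str] = None
    aliases: List[str] = field(default_factory=list)


def from_dict(raw):
    """Keep only the keys modelled above; all other keys are ignored."""
    return StixObject(
        type=raw["type"],
        stix_id=raw["id"],
        name=raw.get("name"),
        aliases=list(raw.get("aliases", [])),
    )


# Hypothetical miniature bundle with placeholder identifiers and names.
bundle = json.loads("""
{"objects": [
  {"type": "intrusion-set", "id": "intrusion-set--example-1", "name": "Example Group", "aliases": ["Example Group", "EG-1"]},
  {"type": "malware", "id": "malware--example-2", "name": "ExampleRAT"},
  {"type": "intrusion-set", "id": "intrusion-set--example-3", "name": "another group"}
]}
""")

objects = [from_dict(raw) for raw in bundle["objects"]]
type_counts = Counter(obj.type for obj in objects)

names = set()
for obj in objects:
    if obj.type == "intrusion-set":
        names.add(obj.name)
        names.update(obj.aliases)

sorted_names = sorted(names, key=str.casefold)
print(f'{type_counts["intrusion-set"]:,d} intrusion-sets')
print(*sorted_names, sep="\n")
```

Sorting with str.casefold keeps differently capitalized names together, which is the same reason the script above uses it.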
Look for modification time changes in CrackMapExec spider (plus) output over time
This example will use a couple of files of CrackMapExec spider (plus) JSON-formatted output and jd. With jd, we can quickly isolate and view the differences between the two scans:
$ jd cme_output_20220801.json cme_output_20220802.json
@ ["Share1","HelpDesk/Passwords.txt","mtime_epoch"]
- "2022-08-01 20:19:11"
+ "2022-08-02 08:32:12"
@ ["Share1","HelpDesk/Passwords.txt","size"]
- "1.27 KB"
+ "1.28 KB"
Closing Thoughts
Working with JSON-formatted data need not be a chore. There are many great tools to assist with processing and utilizing it in numerous ways. Above, I have attempted to present my favorite tools for working with JSON-formatted data as well as some practical examples to better illustrate their uses. As evidenced, JSON-formatted data is already prevalent on the web and is making its way as a standardized output format for many other tools. I hope some of the examples above prompt you to find new ways to work with JSON-formatted data and to embrace it as the useful container that it can be.
Thanks
This blog would not have been possible without the following people:
- Adam Compton @tatanus
- Julie Daymut
- Justin Bollinger @bandrel
- Larry Spohn @Spoonman1091
- Lou Scicchitano @LouScicchitano
- Mike Spitzer
Reference
- Certified Pre-Owned by Will Schroeder from SpecterOps
The post Working with data in JSON format appeared first on TrustedSec.
from TrustedSec https://www.trustedsec.com/blog/working-with-data-in-json-format/