YARA Rules Guide: Learning this Malware Research Tool

YARA rules are used to classify and identify malware samples by creating descriptions of malware families based on textual or binary patterns.

In this article I will cover:

How YARA rules function
Use cases for YARA
Elements you will need to know about YARA
How to write YARA rules

How Do YARA Rules Function?

YARA rules work somewhat like a small programming language. They define a number of variables that contain patterns found in a malware sample, plus a set of conditions. If some or all of the conditions are met, depending on the rule, the rule can be used to identify a piece of malware.
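To make the idea concrete, here is a minimal sketch of a rule with two pattern variables and one condition. The rule name and both patterns are hypothetical and only illustrate the structure.

```yara
// Hypothetical minimal rule: two pattern variables plus one condition
rule Example_Minimal
{
    strings:
        $s1 = "malwarestring"          // text pattern (hypothetical)
        $s2 = { 5C 70 68 6F 74 6F }    // hex bytes for "\photo" (hypothetical)

    condition:
        $s1 or $s2                     // match if either pattern is present
}
```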

When analyzing a piece of malware, researchers identify unique patterns and strings within it that allow them to attribute the sample to a threat group and malware family. By creating a YARA rule from several samples of the same malware family, it is possible to identify multiple samples that may be associated with the same campaign or threat actor.

When investigating a new sample, an analyst may create a YARA rule for it. This rule could then be used to search their own private malware database, or online repositories such as VirusTotal, for similar samples.

If the malware analyst works for an organization that deploys an IPS or another YARA-supported platform that is used for malware protection, then YARA rules can be used as an incident response tool to detect malicious binaries within the organization.

Use Cases

YARA has proven to be extremely popular within the infosec community because it has a number of use cases:

Identify and classify malware
Find new samples based on family-specific patterns
Incident responders can deploy YARA rules to identify samples and compromised devices
Proactive deployment of custom YARA rules can increase an organization’s defenses


YARA Elements to Know

In order to build a useful YARA rule, you will need to know the various elements that can be used to build your own custom YARA rule.

Metadata

Metadata doesn’t affect what the YARA rule will search for; instead, it provides useful information about the rule itself.

Author – Name, email address, Twitter handle.
Date – Date rule was created.
Version – The version number of the YARA rule for tracking amendments.
Reference – A link to an article about the sample or a download of it; this provides relevant information on the malware the rule is designed to detect.
Description – A brief overview of the rule’s purpose and malware it aims to detect.
Hash – A list of sample hashes that were used to create the YARA rule.
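As a sketch, a meta section using the fields above might look like the following. All values and the rule name are hypothetical placeholders. The meta block has no effect on what the rule matches.

```yara
rule Example_Metadata
{
    meta:
        // All values below are hypothetical placeholders
        author      = "Analyst Name"
        date        = "2021-01-01"
        version     = "1.0"
        reference   = "https://example.com/report"
        description = "Illustrates the meta section; it does not affect matching"
        hash        = "<sha256 of the sample used>"

    strings:
        $a = "malwarestring"

    condition:
        $a
}
```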

Strings

It is common to find unique and interesting strings within a malware sample, and these are ideal for building out a YARA rule. To use a string within a rule, it needs to be declared as a variable in the rule's strings section.

$a="string from malware sample"

In addition to declaring a string, we can also append modifiers after the declared string to fine-tune the search.

$a="malwarestring" fullword – This modifier only matches the string when it is delimited by non-alphanumeric characters. For example, ‘www.malwarestring.com’ would return a match, but ‘www.abcmalwarestring.com’ would not.
$a="malwarestring" wide – This matches strings encoded with two bytes per character, as in Windows Unicode strings, where each ASCII character is followed by a null byte. In a hex editor this looks like ‘w.w.w...m.a.l.w.a.r.e.s.t.r.i.n.g...c.o.m.’, with each null byte displayed as a dot. On its own, wide does not match the ASCII form.
$a="malwarestring" wide ascii – This allows the rule to match both the wide and the ASCII form of the string.
$a="MalwareString" nocase – The rule will match the string regardless of case.
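The modifiers above can be combined in a single rule. This is a sketch with a hypothetical rule name that reuses the example strings from the list.

```yara
rule Example_String_Modifiers
{
    strings:
        $full  = "malwarestring" fullword    // must be delimited by non-alphanumeric characters
        $wide  = "malwarestring" wide        // only the two-bytes-per-character form
        $both  = "malwarestring" wide ascii  // either the ASCII or the wide form
        $ncase = "MalwareString" nocase      // any mix of upper and lower case

    condition:
        any of them
}
```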

In the image below I have used HxD, a hex editor, to view some of the strings within the sample.

I have highlighted the ASCII string ‘\photo.png’ and the corresponding hexadecimal representation is also highlighted. Using this information you can declare a hex string within a YARA rule.

$a={5C 70 68 6F 74 6F 2E 70 6E 67} – Note the use of curly braces instead of double quotes.
$a={5C 70 68 6F ?? ?F 2E 70 6E 67} – Question marks can be used as wildcards if you have detected a slight variation of a hex pattern within multiple samples: ?? matches any byte, and ?F matches any byte whose low nibble is F.
$a={5C [2-10] 6F 74 6F 2E 70 6E 67} – In this example, the pattern starts with the byte 5C, followed by a jump of 2 to 10 arbitrary bytes, after which the rest of the pattern must match.
$a={5C (01 02 | 03 04) 6F 2E 70 6E 67} – In this example, I have stated that the bytes at this position can be either 01 02 or 03 04.
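Placed in a complete rule, the four hex-string forms above would look roughly like this. The rule name is hypothetical, and the patterns are the ones from the list.

```yara
rule Example_Hex_Strings
{
    strings:
        $exact = { 5C 70 68 6F 74 6F 2E 70 6E 67 }        // "\photo.png"
        $wild  = { 5C 70 68 6F ?? ?F 2E 70 6E 67 }        // ?? = any byte, ?F = any byte ending in F
        $jump  = { 5C [2-10] 6F 74 6F 2E 70 6E 67 }       // 2 to 10 arbitrary bytes after 5C
        $alt   = { 5C ( 01 02 | 03 04 ) 6F 2E 70 6E 67 }  // either 01 02 or 03 04

    condition:
        any of them
}
```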

Some strings and unique identifiers that are great for YARA rules:

Mutexes – Often unique to a malware family. Malware checks for the presence of its mutex to determine whether a device has already been compromised.
Rare and unusual user agents – Identified when malware communicates with its C2 infrastructure.
Registry keys – Often created by malware as a persistence mechanism.
PDB paths – PDB stands for Program Database, and a PDB file contains debugging information about an executable. It is very unlikely you will have the PDB file for a piece of malware, but the PDB path is often embedded in the binary and can be used in a YARA rule, e.g. c:\users\user\desktop\vc++ 6\6.2.20\scrollerctrl_demo\scrollertest\release\scrollertest.pdb.
Encrypted config strings – Malware will often encrypt its config which contains useful IOCs such as IP addresses and domains. If you have the reverse engineering skills to identify this encrypted data then it can be used within a YARA rule.
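As a sketch, several of these identifiers can be combined in one rule. The PDB path is the one quoted above. The mutex name is hypothetical. The Run key is a common persistence location and is paired with the mutex because it is too generic to use alone. Backslashes must be escaped inside YARA text strings.

```yara
rule Example_Identifiers
{
    strings:
        // PDB path from the text; each backslash is escaped as \\
        $pdb   = "c:\\users\\user\\desktop\\vc++ 6\\6.2.20\\scrollerctrl_demo\\scrollertest\\release\\scrollertest.pdb" nocase
        $mutex = "Global\\ExampleMutex123" wide ascii    // hypothetical mutex name
        $reg   = "Software\\Microsoft\\Windows\\CurrentVersion\\Run" wide ascii nocase

    condition:
        $pdb or ($mutex and $reg)
}
```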

Conditions

The strings section defines what a YARA rule searches for, while the condition section defines the criteria that must be met for the rule to trigger a match. There are multiple conditions that can be used, which I will outline below.

uint16(0) == 0x5A4D – Checking the header of a file is a great condition to include in your YARA rules. This condition stipulates that the file must be a Windows executable (EXE or DLL), because every Windows PE file begins with the bytes 4D 5A (‘MZ’). The value is written reversed in YARA because uint16 reads the two bytes as a little-endian integer.
uint32(0) == 0x464c457f – Used to identify Linux (ELF) binaries, which begin with the bytes 7F 45 4C 46.
(uint32(0) == 0xfeedfacf) or (uint32(0) == 0xcffaedfe) or (uint32(0) == 0xfeedface) or (uint32(0) == 0xcefaedfe) – Used to identify macOS (Mach-O) binaries by checking the file header for the 32-bit and 64-bit Mach-O magic values in either byte order.
(#a == 6) – The string $a occurs exactly 6 times.
(#a > 6) – The string $a occurs more than 6 times.
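Here is a sketch of the header and count checks as standalone rules. The rule names and the string are hypothetical.

```yara
rule Example_PE_With_Count
{
    strings:
        $a = "malwarestring"          // hypothetical

    condition:
        uint16(0) == 0x5A4D and       // bytes 4D 5A ("MZ") read little-endian
        #a > 6                        // $a must occur more than six times
}

rule Example_ELF_Header
{
    condition:
        uint32(0) == 0x464C457F       // bytes 7F 45 4C 46 ("\x7fELF")
}

rule Example_MachO_Header
{
    condition:
        uint32(0) == 0xFEEDFACE or uint32(0) == 0xCEFAEDFE or   // 32-bit Mach-O, either byte order
        uint32(0) == 0xFEEDFACF or uint32(0) == 0xCFFAEDFE      // 64-bit Mach-O, either byte order
}
```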

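There are a few different ways to specify the file size condition. Note that YARA's KB and MB suffixes are multiples of 1024, so filesize<5MB means fewer than 5,242,880 bytes rather than 5,000,000.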

(filesize>512)
(filesize<5000000)
(filesize<5MB)

Once the strings have been declared within a rule, you can specify how many of them must match for the rule to trigger.

2 of ($a,$b,$c)
3 of them
4 of ($a*)
all of them
any of them
$a and not $b

Where possible, combine two or three groups of conditions, for example file type, file size and string matches, to reduce false positives and make the rule more reliable.
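A sketch of that advice, with three condition groups joined by and. The rule name and strings are hypothetical.

```yara
rule Example_Condition_Groups
{
    strings:
        $a1 = "malwarestring"         // hypothetical strings throughout
        $a2 = "anotherstring"
        $b  = "legitimate-vendor"     // a string that should NOT be present

    condition:
        uint16(0) == 0x5A4D and       // group 1: file type (Windows PE)
        filesize < 5MB and            // group 2: size (5MB = 5 * 1024 * 1024 bytes)
        (all of ($a*) and not $b)     // group 3: string logic
}
```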

Imports

Imports are a great way to add further conditions to your YARA rules. In this article I will cover some examples of how to use the PE module.

PE Module:

Adding the statement ‘import "pe"’ to the top of a YARA rule file gives you access to YARA's PE module, which parses the structure of Windows executables. This is useful if you cannot identify any unique strings.

Exports are great additions to a YARA rule. Exports are functions that the malware author has created, so be sure to make use of their unique names. In the image below I have identified some exports of a DLL that was dropped by a piece of Formbook malware.

pe.exports("Botanist") and pe.exports("Chechako") and pe.exports("Originator") and pe.exports("Repressions") – pe.exports() takes a single function name, so each export is checked separately.

In the image below I have identified an interesting DLL that is used for HTTP connectivity, winhttp.dll:

We can also see that the sample imports a number of interesting APIs from this library that could be included within a rule.

pe.imports("winhttp.dll", "WinHttpConnect")
pe.machine == pe.MACHINE_AMD64 – Used for checking machine type.
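Here is a sketch of these PE checks inside complete rules. The exports and the import pair are the ones quoted above. They are kept in separate rules because they appear to come from different samples. The rule names are hypothetical.

```yara
import "pe"

rule Example_PE_Exports
{
    condition:
        uint16(0) == 0x5A4D and
        // pe.exports() takes one name per call, so the exports are ANDed together
        pe.exports("Botanist") and
        pe.exports("Chechako") and
        pe.exports("Originator") and
        pe.exports("Repressions")
}

rule Example_PE_Imports
{
    condition:
        uint16(0) == 0x5A4D and
        pe.imports("winhttp.dll", "WinHttpConnect") and   // (library, function) pair
        pe.machine == pe.MACHINE_AMD64                     // 64-bit x86 binary
}
```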

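An imphash is an MD5 hash of the malware’s import table (the imported libraries and function names), which we identified in the previous image using PEStudio. Samples from the same malware family are often built with the same imports, so using the imphash in a YARA rule may detect similar samples.

pe.imphash() == "0e18f33408be6e4cb217f0266066c51c" – pe.imphash() returns the hash as lowercase hex and the comparison is case-sensitive, so the hash must be written in lowercase.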

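To use a file’s timestamp in a YARA rule, it must be expressed as a Unix epoch timestamp (seconds since 1 January 1970 UTC), which is how the compile time is stored in the PE header. Tools such as PEStudio display it as a readable date, so it needs converting back. In the image below I have identified when the malware was compiled.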

Using the syntax ‘//’ allows comments to be made within the rule, so below I am able to add a comment which specifies what the epoch timestamp is.

pe.timestamp == 1607450336 // Tue Dec 08 17:58:56 2020 UTC

The version section of PEStudio shows that this sample of Lokibot has some unique version identifiers. Using the pe.version_info dictionary, we can specify which version properties to match, such as the ‘CompanyName’ field.

pe.version_info["CompanyName"] contains "AmAZon.cOm"
pe.language(0x0804) // Chinese (Simplified, PRC) – The languages of a sample's resources can be matched by specifying the Microsoft language identifier.
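As a sketch, here are the version-info and language checks in complete rules. The rule names are hypothetical. Note that the version-info value must be a quoted string, and contains is case-sensitive, so the unusual casing is part of what makes it distinctive.

```yara
import "pe"

rule Example_PE_Version_Info
{
    condition:
        uint16(0) == 0x5A4D and
        pe.version_info["CompanyName"] contains "AmAZon.cOm"   // case-sensitive match
}

rule Example_PE_Language
{
    condition:
        uint16(0) == 0x5A4D and
        pe.language(0x0804)   // true if a resource uses language ID 0x0804 (Chinese, PRC)
}
```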

In the image below I have identified a number of sections in the malware that aren’t commonly found in other Windows executables I have analyzed. Using this information, I can match specific section names at a given section index.

Note that the sections are zero-indexed, so the first section is index 0, the second is index 1, and so on. In the example below, I have used the section named ‘BSS’, which is at index 2 (the third section).

pe.sections[2].name == "BSS"
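Here is a sketch of the fixed-index check next to a loop that accepts the section at any position. The loop may be more robust if the section order varies between samples. The rule names are hypothetical.

```yara
import "pe"

rule Example_PE_Section_By_Index
{
    condition:
        uint16(0) == 0x5A4D and
        pe.sections[2].name == "BSS"             // index 2 = third section
}

rule Example_PE_Section_Anywhere
{
    condition:
        uint16(0) == 0x5A4D and
        for any i in (0 .. pe.number_of_sections - 1) : (
            pe.sections[i].name == "BSS"         // "BSS" at any section index
        )
}
```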

How to Write YARA Rules

The image below is an example YARA rule I have created based on a sample of Redline malware:

Start of Rule

The YARA rule begins with the keyword ‘rule’ followed by the name of the rule. The body of the rule is then enclosed in curly braces, ‘{’ and ‘}’.

Just above this, I have imported the PE module using the statement ‘import "pe"’. This functionality is used in the condition section of the rule.

Metadata

In the example rule, I have included the author, file type of the malware, date the rule was written, rule version, a reference to where I got the sample from, and also a hash of the malware. This gives some contextual information to anybody else who may use the rule or may even be of use to the author when they revisit the rule at a later point in time.

Declaring Strings

Next, I have specified some strings that I found in the malware sample. These are declared as variables within the rule and are used to search for files with similar content.

The strings I have used were identified using PEStudio and are a mix of interesting Windows API names and strings that I think will be unique to this malware family.

Conditions

The condition section is where the rule declares what must be true for the YARA rule to trigger a match. The first condition I have stipulated is that the file must be a Windows executable. This is done by checking for the hex values found at the start of every Windows executable’s header. In the image below you can see how this is identified using a hex editor.

The file version within the malware also struck me as something that may be unique to the malware so I included this within the rule – “Versium Research 5 Installation”.

I have also specified that three imports, which PEStudio flagged as suspicious, must be present.

With time and experience, you will be able to spot suspicious sections within samples. Some examples of common sections you will see are ‘.data’, ‘.reloc’, and ‘.rsrc’. In this sample, I have found a few sections which don’t fit this pattern so my YARA rule is looking for the sections named ‘CODE’ and ‘BSS’.

The remaining conditions are that the first string, declared as ‘$a1’, must be present, or three of the ‘$b’ strings, or one of the ‘$c’ strings. Finally, the file size must be less than 50,000 bytes.
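The rule itself appears only in the image, so here is a skeleton of the structure described above rather than the author's actual rule. Every meta value, string and the rule name are hypothetical placeholders. The three suspicious imports are not named in the text, so they are left as a comment. The version-info key is also not stated; "FileDescription" is an assumption.

```yara
import "pe"

// Skeleton only: all values are PLACEHOLDERS, not the real Redline indicators
rule Redline_Skeleton
{
    meta:
        author    = "<author>"
        filetype  = "<file type>"
        date      = "<date written>"
        version   = "1.0"
        reference = "<where the sample came from>"
        hash      = "<sample hash>"

    strings:
        $a1 = "PLACEHOLDER_A1"
        $b1 = "PLACEHOLDER_B1"
        $b2 = "PLACEHOLDER_B2"
        $b3 = "PLACEHOLDER_B3"
        $c1 = "PLACEHOLDER_C1"

    condition:
        uint16(0) == 0x5A4D and
        // ASSUMPTION: "FileDescription" is a guessed key; the text does not name it
        pe.version_info["FileDescription"] contains "Versium Research 5 Installation" and
        // The three imports flagged by PEStudio would be added here as
        // pe.imports("<library>", "<function>") checks (not named in the text)
        (for any i in (0 .. pe.number_of_sections - 1) : (pe.sections[i].name == "CODE")) and
        (for any i in (0 .. pe.number_of_sections - 1) : (pe.sections[i].name == "BSS")) and
        ($a1 or 3 of ($b*) or 1 of ($c*)) and
        filesize < 50000
}
```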

This rule can now be used to start hunting for additional Redline samples.

Final Thoughts:

You now have the knowledge to start building out your own YARA rules to start hunting out new samples for analysis or alternatively start implementing some proactive detections within your organization.

If you are looking to mature your organization’s security posture then check out Varonis’ Edge: Perimeter Detection and Data Security Platform.


Neil Fox

Neil is a cyber security professional specializing in incident response and malware analysis. He also creates cyber security content for his YouTube channel and blog at 0xf0x.com.
