How is rule strength measured?

Strength is measured as a combination of:

  • The rule strength figure, which indicates the number of messages captured by the rule across all systems (comparatively) on a logarithmic scale from 0.0 to 5.0. Rule strength figures at or above 3.0 are typically considered "strong" because they indicate that the rule is capturing a substantial number of messages. Conversely, removing such a rule would likely cause the filtering system to "leak" a significant number of messages that would normally be considered spam.
  • In addition, the lifetime of the rule is considered. If the rule has been active for a significant amount of time with a moderate rule strength figure (capture rate), this indicates that substantially all users of the system are satisfied that the rule is not creating false positives on their systems.
  • Finally, the source of the rule is considered. We apply two sets of criteria when evaluating spam rules: one for spamtrap sources and one for user sources. Spamtrap sources are clean sources where there is no chance of human error. Messages sourced through spamtraps arrived because they were sent to illegitimate mailbox addresses that were never active (or through similar mechanisms). User sources indicate messages that arrived because a human being reviewed the message, decided it was spam, and submitted it to us. We treat user-sourced spam very conservatively in order to avoid false positives. We treat spamtrap-sourced spam less conservatively, but we are still careful to avoid creating rules that will cause false positives for other reasons or due to artifacts in the process. For example, we avoid creating rules that would capture legitimate automated responses, while we do attempt to create rules that capture automated responses generated by spam or malware (backscatter).

For example, a strong rule would be one that has been in place for more than a year (372 days), has a substantial capture rate (3.288), and was created from messages sourced from a clean spamtrap. Put another way, the rule was created from a message that arrived at an illegitimate email address, and it has been capturing a substantial number of messages for more than a year without any complaints.
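
The three factors can be illustrated with a small Python sketch. This is only an illustration, not the way strength is actually computed: the names RuleInfo and supporting_factors are hypothetical, the 3.0 threshold comes from the description above, and the 365-day cutoff is an assumption based on the "more than a year" example. How the factors are weighed against each other is not specified here, so the sketch simply lists which ones support a rule.

```python
from dataclasses import dataclass
from typing import List

# From the text: figures at or above 3.0 are typically considered "strong".
STRONG_FIGURE = 3.0
# Assumption: "more than a year" from the example, taken as 365 days.
LONG_LIVED_DAYS = 365


@dataclass
class RuleInfo:
    """Hypothetical record holding the three inputs described above."""
    strength: float  # rule strength figure on the 0.0 to 5.0 scale
    age_days: int    # how long the rule has been active, in days
    source: str      # "spamtrap" or "user"


def supporting_factors(rule: RuleInfo) -> List[str]:
    """List the factors that count in favor of treating the rule as strong."""
    factors = []
    if rule.strength >= STRONG_FIGURE:
        factors.append("high capture rate")
    if rule.age_days > LONG_LIVED_DAYS:
        factors.append("long lifetime")
    if rule.source == "spamtrap":
        factors.append("clean spamtrap source")
    return factors


# The example rule from the text: 372 days old, strength 3.288, spamtrap sourced.
example = RuleInfo(strength=3.288, age_days=372, source="spamtrap")
print(supporting_factors(example))
```

For the example rule, all three factors are listed. A young, user-sourced rule with a low strength figure would return an empty list.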

Related Topics