On “Normalizing” or “Scaling” Cybersecurity Metrics and Measuring The Right Thing For The Right Entities

Introduction

One challenge of cyber security is deciding where to prioritize one’s limited cyber security resources. Who’s doing “okay?” Who desperately needs help? Who should we just quarantine until they can get their chaotic users under control? At a minimum, can we at least rank countries according to who’s worst/best? We need to have some basis for prioritizing our limited time and resources.

Focusing On Botted Hosts Sending Spam

To make this problem concrete, consider the terrific spam-related metrics shared by the Spamhaus Composite Blocking List (CBL) [1]. On Wednesday, March 9th, 2016, the CBL knew about 9,121,043 IP addresses associated with spam-sending botted hosts. That’s a LOT of malware-infected systems!

As shown in Table 1, just three countries – India, Vietnam and China – account for roughly one third of all CBL listings, and just 10% of all countries (20 out of 200) collectively account for about three quarters of all CBL listings:

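Table 1. CBL Listings By Country, Top 20, March 9th, 2016

| Country | Listings | % Total Listings | % Cumulative Total Listings | Rank |
|---|---|---|---|---|
| Total | 9,121,043 | | | |
| IN | 1,182,291 | 12.96 | 12.96 | 1 |
| VN | 998,743 | 10.95 | 23.91 | 2 |
| CN | 766,659 | 8.41 | 32.32 | 3 |
| RU | 511,132 | 5.60 | 37.92 | 4 |
| BR | 437,531 | 4.80 | 42.72 | 5 |
| ID | 424,372 | 4.65 | 47.37 | 6 |
| IR | 347,215 | 3.81 | 51.18 | 7 |
| US | 227,315 | 2.49 | 53.67 | 8 |
| TH | 218,422 | 2.39 | 56.06 | 9 |
| MX | 201,497 | 2.21 | 58.27 | 10 |
| PK | 189,572 | 2.08 | 60.35 | 11 |
| IT | 178,898 | 1.96 | 62.31 | 12 |
| AR | 173,898 | 1.91 | 64.22 | 13 |
| TW | 163,190 | 1.79 | 66.00 | 14 |
| JP | 150,565 | 1.65 | 67.65 | 15 |
| DE | 147,173 | 1.61 | 69.27 | 16 |
| TR | 137,183 | 1.50 | 70.77 | 17 |
| EG | 133,232 | 1.46 | 72.23 | 18 |
| AU | 130,114 | 1.43 | 73.66 | 19 |
| VE | 118,022 | 1.29 | 74.95 | 20 |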
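The percentage and cumulative percentage columns of Table 1 can be reproduced from the raw listing counts. The following is a minimal Python sketch using only the first five rows; the function name `share_table` is illustrative and is not something the CBL provides.

```python
# Listing counts copied from the first five rows of Table 1.
TOTAL_LISTINGS = 9_121_043
LISTINGS = [
    ("IN", 1_182_291),
    ("VN", 998_743),
    ("CN", 766_659),
    ("RU", 511_132),
    ("BR", 437_531),
]


def share_table(rows, total):
    """Return (country, count, percent, cumulative percent, rank) tuples.

    Rows are ranked by descending count; percentages are relative to
    `total`, which may include countries that are not in `rows`.
    """
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    table = []
    running = 0
    for rank, (country, count) in enumerate(ranked, start=1):
        running += count
        percent = round(100 * count / total, 2)
        cumulative = round(100 * running / total, 2)
        table.append((country, count, percent, cumulative, rank))
    return table


table = share_table(LISTINGS, TOTAL_LISTINGS)
for country, count, percent, cumulative, rank in table:
    print(f"{rank:>2} {country} {count:>10,} {percent:6.2f} {cumulative:6.2f}")
```

Because each cumulative figure here is computed from the running count rather than by adding up already-rounded percentages, it can differ from a hand-summed column by 0.01 in places.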

Surely we can all agree that those 20 countries represent the “worst of the worst” when it comes to malware-infected systems used to send spam? Surprisingly, no.

Normalizing By Country Population

Representatives of some of those “top” countries, perhaps feeling a bit defensive, may be quick to point out that they’ve got huge populations, so it really isn’t “fair” to just compare “raw counts” between countries. E.g., India has 1,182,291 CBL listings, but spread over a population of over 1.25 billion Indians, that’s a rate of just (1,182,291 / 1,250,000,000) * 100 = 0.0946%.

By comparison, Italy has 178,898 CBL listings, but a population of only 60 million, which yields a rate of (178,898 / 60,000,000) * 100 = 0.2982%.

Dividing Italy’s 0.2982% by India’s 0.0946%, we can see that Italy is currently about 3.15 times “more infested” than India on a per-capita basis.
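The per-capita arithmetic above can be checked with a few lines of Python. This is only a sketch: the population figures are the rounded values used in the text (1.25 billion and 60 million), and the helper name `listings_per_100_people` is made up for illustration.

```python
def listings_per_100_people(listings, population):
    """CBL listings expressed as a percentage of the population."""
    return 100 * listings / population


# Listing counts from Table 1; populations are the rounded figures in the text.
india_rate = listings_per_100_people(1_182_291, 1_250_000_000)
italy_rate = listings_per_100_people(178_898, 60_000_000)
ratio = italy_rate / india_rate

print(f"India: {india_rate:.4f}%")
print(f"Italy: {italy_rate:.4f}%")
print(f"Italy / India: {ratio:.2f}")
```

Keep in mind that a listing is an IP address, not a person or a device, so the result is best read as a rough density rather than a true per-person infection rate.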

Arguably, then, Italy should “obviously” be prioritized ahead of India when it comes to any hypothetical anti-bot “clean up campaign,” right? Actually, no. Malware infections per capita represent a measure of infection density. Infection density is important if you’re thinking about efficient infection cleanup, but largely irrelevant if your goal is to reduce the impact of the bots on mail servers and their admins.

Ranking According to Actual Pain Delivered Toward A Target

Mail admins running mail servers under siege from spam really don’t care about “infection rates per capita.” They care about the spam traffic they’re seeing. Fortunately, the CBL has data about that as well. Focusing on spam that’s being delivered (vs. botted hosts potentially able to send spam) changes the picture dramatically:

Table 2. Spam Sent To One CBL Spamtrap, By Top 20 Origin Countries, Past Three Days

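| Country | Traffic | % Traffic | % Cumulative Traffic | Traffic Rank | Bot Rank | Spams/Bots |
|---|---|---|---|---|---|---|
| Total | 158,082,638 | 100.00 | | | | |
| US | 52,440,285 | 33.17 | 33.17 | 1 | 8 | 231 |
| BR | 27,016,447 | 17.09 | 50.26 | 2 | 5 | 62 |
| VN | 15,119,324 | 9.56 | 59.83 | 3 | 2 | 15 |
| RU | 7,248,463 | 4.59 | 64.41 | 4 | 4 | 14 |
| IN | 5,384,253 | 3.41 | 67.82 | 5 | 1 | 4 |
| MX | 4,685,557 | 2.96 | 70.78 | 6 | 10 | 23 |
| AR | 3,257,637 | 2.06 | 72.84 | 7 | 13 | 18 |
| PL | 2,638,323 | 1.67 | 74.51 | 8 | 22 | 22 |
| UA | 2,557,023 | 1.62 | 76.13 | 9 | 23 | 22 |
| ES | 2,177,246 | 1.38 | 77.51 | 10 | 21 | 18 |
| CO | 1,639,560 | 1.04 | 78.54 | 11 | 34 | 34 |
| IT | 1,546,022 | 0.98 | 79.52 | 12 | 12 | 8 |
| TR | 1,360,135 | 0.86 | 80.38 | 13 | 17 | 9 |
| TW | 1,230,689 | 0.78 | 81.16 | 14 | 14 | 7 |
| ID | 1,212,763 | 0.77 | 81.93 | 15 | 6 | 2 |
| IQ | 1,179,500 | 0.75 | 82.67 | 16 | 36 | 28 |
| CN | 1,170,168 | 0.74 | 83.41 | 17 | 3 | 1 |
| AT | 1,034,575 | 0.65 | 84.07 | 18 | 38 | 25 |
| RO | 983,530 | 0.62 | 84.69 | 19 | 25 | 10 |
| DE | 969,088 | 0.61 | 85.30 | 20 | 16 | 6 |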

Now the problem is clearly not the huge number of botted hosts in India, but the huge volumes of spam coming from botted systems in the United States. That is, while there are a relatively small number of botted hosts in the United States, those botted hosts are typically particularly aggressive, sending an average of 231 spam/bot, while by comparison, the botted hosts in India send a measly 4 spam/bot. Treating all botted hosts as if they were essentially equivalent is clearly unwarranted. Some of those infected systems are heavy artillery, while others are mere cap guns.

[We must also remember to take the above numbers with a “grain of salt,” since the statistics in the above table are based on a single CBL spamtrap site. Although they are among the best data currently available, they may not be reflective of global spam flows overall.]
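To illustrate how the ranking flips depending on the metric, here is a minimal Python sketch that combines listing counts from Table 1 with traffic counts from Table 2 for five countries. The function name `spam_per_bot` is illustrative, and the ratio mixes a one-day listing snapshot with three days of traffic from one spamtrap, so it is a rough indicator only.

```python
# Botted-host listings (Table 1) and spamtrap traffic (Table 2).
LISTINGS = {
    "IN": 1_182_291,
    "VN": 998_743,
    "CN": 766_659,
    "BR": 437_531,
    "US": 227_315,
}
TRAFFIC = {
    "US": 52_440_285,
    "BR": 27_016_447,
    "VN": 15_119_324,
    "IN": 5_384_253,
    "CN": 1_170_168,
}


def spam_per_bot(traffic, listings):
    """Average spam messages seen per listed host, by country code."""
    return {
        country: traffic[country] / listings[country]
        for country in traffic
        if listings.get(country)  # skip countries with no (or zero) listings
    }


rates = spam_per_bot(TRAFFIC, LISTINGS)

# Three different orderings of the same five countries.
by_bots = sorted(LISTINGS, key=LISTINGS.get, reverse=True)
by_traffic = sorted(TRAFFIC, key=TRAFFIC.get, reverse=True)
by_rate = sorted(rates, key=rates.get, reverse=True)

print("ranked by botted hosts:", by_bots)
print("ranked by spam traffic:", by_traffic)
print("ranked by spam per bot:", by_rate)
```

India leads the first ordering, while the United States leads the other two, which is the point the table makes.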

Teasing Apart US Spam Traffic Sources

Let’s now drill down on spam traffic from US ISPs. Where does the majority of US-origin spam traffic actually come from?

For the purpose of this analysis, ISPs are identified by their Autonomous System Number, or “ASN.” Many ISPs use only a single ASN, but some (such as rr.com) are associated with multiple ASNs, often because legacy networks were absorbed through mergers and acquisitions. When multiple ASNs are associated with the same ISP, we’ve aggregated them for the purposes of this analysis.

Table 3. US ASNs, Ranked By Traffic/ASN, Last Three Days, 500K+ spam/ASN

| ASN | Listings | % Total | Traffic | % Traffic | Rank | Spams/Bots |
|---|---|---|---|---|---|---|
| Total | 9,098,790 | 100.00 | 158,082,638 | | | 17.37 |
| Total rr.com (sum of all rr.com ASNs shown) | 11,944 | 0.13 | 6,093,048 | 3.85 | | |
| AS20001 rr.com US | 2,427 | 0.03 | 1,595,433 | 1.01 | 7 | 657 |
| AS10796 rr.com US | 2,714 | 0.03 | 1,361,222 | 0.86 | 10 | 501 |
| AS11427 rr.com US | 2,180 | 0.02 | 1,068,145 | 0.68 | 14 | 489 |
| AS11426 rr.com US | 1,808 | 0.02 | 799,539 | 0.51 | 26 | 442 |
| AS11351 rr.com US | 1,583 | 0.02 | 747,733 | 0.47 | 28 | 472 |
| AS12271 rr.com US | 1,232 | 0.01 | 520,966 | 0.33 | 41 | 422 |
| AS20115 charter.net US | 4,998 | 0.05 | 5,322,681 | 3.37 | 3 | 1,064 |
| AS46892 Winnebago US | 102 | 0.00 | 2,609,218 | 1.65 | 5 | 25,580 |
| AS12083 knology.net US | 945 | 0.01 | 1,308,289 | 0.83 | 11 | 1,384 |
| AS11232 midco.net US | 308 | 0.00 | 1,025,612 | 0.65 | 15 | 3,329 |
| AS33548 unwiredbb.com US | 91 | 0.00 | 996,577 | 0.63 | 16 | 10,951 |
| AS7922 comcast.net US | 19,793 | 0.22 | 908,030 | 0.57 | 19 | 45 |
| AS11979 blue.net US | 175 | 0.00 | 883,478 | 0.56 | 21 | 5,048 |
| AS30036 fortrex.com US | 790 | 0.01 | 827,467 | 0.52 | 22 | 1,047 |
| AS33363 mybrighthouse.com US | 2,123 | 0.02 | 819,728 | 0.52 | 23 | 386 |
| AS10835 vcn.com US | 85 | 0.00 | 816,028 | 0.52 | 25 | 9,600 |
| AS19108 suddenlink.net US | 1,083 | 0.01 | 711,898 | 0.45 | 30 | 657 |
| AS46606 Unified Layer, US | 855 | 0.01 | 656,722 | 0.42 | 31 | 768 |
| AS33588 bresnan.net US | 309 | 0.00 | 626,872 | 0.40 | 32 | 2,028 |
| AS174 cogentco.com US | 1,436 | 0.02 | 566,408 | 0.36 | 37 | 394 |
| AS5056 netins.net US | 262 | 0.00 | 514,315 | 0.33 | 43 | 1,963 |
| AS22773 cox.net US | 2,575 | 0.03 | 510,427 | 0.32 | 44 | 198 |
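The aggregation of multiple ASNs into a single ISP, as done for rr.com in Table 3, can be sketched in Python as follows. The rows are copied from Table 3; the ASN-to-ISP grouping is simply the ISP name shown in each row, and the function name `aggregate_by_isp` is illustrative rather than part of any CBL tooling.

```python
from collections import defaultdict

# (ASN, ISP name, listings, traffic) rows copied from Table 3.
ROWS = [
    ("AS20001", "rr.com", 2_427, 1_595_433),
    ("AS10796", "rr.com", 2_714, 1_361_222),
    ("AS11427", "rr.com", 2_180, 1_068_145),
    ("AS11426", "rr.com", 1_808, 799_539),
    ("AS11351", "rr.com", 1_583, 747_733),
    ("AS12271", "rr.com", 1_232, 520_966),
    ("AS20115", "charter.net", 4_998, 5_322_681),
    ("AS7922", "comcast.net", 19_793, 908_030),
]


def aggregate_by_isp(rows):
    """Sum listings and traffic over every ASN belonging to the same ISP."""
    totals = defaultdict(lambda: {"asns": [], "listings": 0, "traffic": 0})
    for asn, isp, listings, traffic in rows:
        entry = totals[isp]
        entry["asns"].append(asn)
        entry["listings"] += listings
        entry["traffic"] += traffic
    for entry in totals.values():
        # Guard against division by zero for ISPs with no listed hosts.
        entry["spam_per_bot"] = (
            entry["traffic"] / entry["listings"] if entry["listings"] else None
        )
    return dict(totals)


isps = aggregate_by_isp(ROWS)
for name, entry in sorted(isps.items(), key=lambda kv: kv[1]["traffic"], reverse=True):
    print(f"{name:<12} {len(entry['asns'])} ASN(s) "
          f"{entry['listings']:>7,} listings {entry['traffic']:>10,} spams")
```

Ranking ISPs on the aggregated totals, rather than on individual ASNs, keeps a provider with many small ASNs from looking better than it is.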

Unfiltered Pain

There is one other reality that we must remember: the spam that the CBL sees all gets blocked (at least if you’re using the CBL as part of your spam filtering, as many sites do).

Thus, ironically, if we were to prioritize working on the ISPs that are most broadly represented in the CBL, we’d (in some ways) be wasting our time: the spam from those hosts is already getting blocked, at least at sites that use the CBL.

So now you can see the problem. We need to identify the hosts that are successfully delivering spam IN SPITE of block list entries and other anti-spam heuristics.

Documenting the “false negatives” that get through filtering is a hard and largely thankless job, and one that relies on inherently error-prone mechanisms such as users pushing a “this is spam” button, or perhaps the processing of mail streams by multiple categorization engines.
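As a rough illustration of what counting those “false negatives” might look like, here is a minimal Python sketch. Everything in it is hypothetical: the message records, the `reported_spam` flag (standing in for a user pressing a “this is spam” button), the blocklist contents, and the function name. The IP addresses come from ranges reserved for documentation.

```python
from collections import Counter

# Hypothetical blocklist snapshot (documentation-range addresses).
BLOCKLIST = {"192.0.2.10", "192.0.2.11"}

# Hypothetical delivery log: where each message came from and whether a
# user later flagged it as spam.
MESSAGES = [
    {"source_ip": "192.0.2.10", "reported_spam": True},     # listed host
    {"source_ip": "198.51.100.7", "reported_spam": True},   # unlisted, flagged
    {"source_ip": "198.51.100.7", "reported_spam": True},   # unlisted, flagged
    {"source_ip": "203.0.113.5", "reported_spam": False},   # unlisted, not flagged
    {"source_ip": "203.0.113.9", "reported_spam": True},    # unlisted, flagged
]


def unlisted_spam_sources(messages, blocklist):
    """Count user-reported spam per source IP that is NOT on the blocklist."""
    misses = Counter()
    for message in messages:
        if message["reported_spam"] and message["source_ip"] not in blocklist:
            misses[message["source_ip"]] += 1
    return misses


misses = unlisted_spam_sources(MESSAGES, BLOCKLIST)
for source_ip, count in misses.most_common():
    print(source_ip, count)
```

User reports are noisy, so counts like these would need a sanity check (for example, a minimum number of independent reporters) before being used to prioritize anything.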

Author: Joe St Sauver, Ph.D., Scientist, Farsight Security, Inc., Member of CyberGreen’s Statistics Experts Group

1 http://www.abuseat.org/



