1 of 25

Data transfer cost optimization

2 of 25

What is Data Transfer Cost?

3 of 25

The problem of ever-increasing Data Transfer cost

  • DT cost is charged per GB, based on the amount of data your infrastructure sends and receives over the internet or within your own infrastructure.
  • As we need scalability and availability, our infrastructure grows:
    • Increased Instances
    • Supporting more AZs
    • Distributed Backups
    • More services communicating with each other
  • This growth visibly increases the cost of running your infrastructure, but the DT cost it adds often goes unaccounted for.

4 of 25

Tracking DT costs

  • DT costs can be tracked by querying the AWS Cost and Usage Reports.
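As a rough sketch of such a query, the snippet below sums cost per usage type for data-transfer line items in a report exported as CSV. The column names (`lineItem/UsageType`, `lineItem/UnblendedCost`, `product/productFamily`) and the `Data Transfer` product family value are assumed to match the report layout, and the sample rows and amounts are made up for illustration; verify both against your own report.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for a Cost and Usage Report CSV export.
SAMPLE_CUR = """lineItem/UsageType,lineItem/UnblendedCost,product/productFamily
DataTransfer-Regional-Bytes,1.50,Data Transfer
DataTransfer-Out-Bytes,4.25,Data Transfer
BoxUsage:t3.micro,7.00,Compute Instance
DataTransfer-Regional-Bytes,0.50,Data Transfer
"""


def data_transfer_cost_by_usage_type(cur_csv):
    """Sum unblended cost per usage type, keeping only data-transfer rows."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(cur_csv)):
        # Assumption: data-transfer line items carry this product family.
        if row["product/productFamily"] == "Data Transfer":
            cost = float(row["lineItem/UnblendedCost"])
            totals[row["lineItem/UsageType"]] += cost
    return dict(totals)


totals = data_transfer_cost_by_usage_type(SAMPLE_CUR)
# Print the most expensive usage types first.
for usage_type, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{usage_type}: {cost:.2f}")
```

The same grouping could be expressed as a SQL query if the report is loaded into a query engine; the point is only that data-transfer rows can be isolated and ranked by cost.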

5 of 25

6 of 25

7 of 25

Key Takeaways to handle DT cost

8 of 25

Data transfer over public IP

9 of 25

Serving static, high-availability content

10 of 25

Data transfer over NAT gateway

11 of 25

Data transfer over VPC endpoints

Gateway endpoints currently support only S3 and DynamoDB

12 of 25

Other points to keep in mind

  • Inter-Region Data Transfer
  • Inter-AZ Data Transfer
  • Resources placed in expensive Regions
  • Tagging your resources for cost-allocation

13 of 25

What is still missing

What have we achieved?

  • Basic overview of Data Transfer in your VPC
  • Common Architectural misconfigurations

What is still missing?

  • Identifying the source and destination resources of these transfers
  • What are the protocols of these transfers?
  • What was the size of these transfers?

14 of 25

VPC Flow Logs

15 of 25

VPC Flow Record

Source

  • IP Address
  • Port

Destination

  • IP Address
  • Port

Protocol, Bytes, Packets . . .

16 of 25

Setup VPC Flow Logs

17 of 25

Extract

18 of 25

VPC Flow Record

Version - 2

Account Id - 123456789010

Network Interface Id - eni-1235b8ca123456789

Source IP - 172.31.16.139

Destination IP - 172.31.16.21

Source Port - 20641

Destination Port - 22

Protocol - 6

Packets - 20

Size (Bytes) - 4249

Start time - 1418530010

End time - 1418530070

Action - ACCEPT

Log Status - OK
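The record above corresponds to one space-separated line in the default (version 2) flow log format. A minimal sketch of the extract step, assuming that default field order, could parse each line into a dictionary as follows. The function name `parse_flow_record` and the dictionary keys are illustrative choices, not part of any AWS library.

```python
# Field order of the default (version 2) VPC flow log format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = {
    "version", "srcport", "dstport", "protocol",
    "packets", "bytes", "start", "end",
}


def parse_flow_record(line):
    """Parse one default-format flow log line into a dict."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for name in INT_FIELDS:
        # Records with no data use "-" for missing fields; map those to None.
        record[name] = None if record[name] == "-" else int(record[name])
    return record


line = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
        "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_flow_record(line)
print(record["srcaddr"], "->", record["dstaddr"], record["bytes"], "bytes")
```

Protocol 6 is TCP and destination port 22 is SSH, so this record describes an accepted SSH connection that moved 4249 bytes over a 60-second capture window.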

19 of 25

Transform

20 of 25

Enriched flow logs

Before enrichment:

Version - 2

Account Id - 123456789010

Network Interface Id - eni-1235b8ca123456789

Source IP - 172.31.16.139

Destination IP - 172.31.16.21

Source Port - 20641

Destination Port - 22

Protocol - 6

Packets - 20

Size (Bytes) - 4249

Start time - 1418530010

End time - 1418530070

Action - ACCEPT

Log Status - OK

After enrichment:

Version - 2

Account Id - 123456789010

Network Interface Id - eni-1235b8ca123456789

Source IP - 172.31.16.139

Source IP type - private

Source resource Id - i-34j23j40fsdhf

Source Region - us-east-1

Source AZ - us-east-1a

Source VPC Id - vpc-342j432jh

Source Subnet Id - subnet-1

Source Service - AmazonEC2

Destination IP - 172.31.16.21

Destination IP type - private

Destination resource Id - i-sfwe4534fsd3

Destination Region - us-west-1

Destination AZ - us-west-1a

Destination VPC Id - vpc-345fgd45

Destination Subnet Id - subnet-4

Destination Service - AmazonEC2

Source Port - 20641

Destination Port - 22

Protocol - 6

Packets - 20

Size (Bytes) - 4249

Transfer Type - Inter Region

. . .

. . .
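One way to picture the transform step is a lookup that attaches resource metadata to each IP address and then derives the transfer type. The sketch below is an assumption about how this could work, not the pipeline used in the talk: the `INVENTORY` dictionary is a hypothetical stand-in for your own resource metadata, and the labels other than "Inter Region" are illustrative.

```python
import ipaddress

# Hypothetical inventory keyed by IP address; in practice this would be
# built from your own resource metadata.
INVENTORY = {
    "172.31.16.139": {"resource_id": "i-34j23j40fsdhf",
                      "region": "us-east-1", "az": "us-east-1a"},
    "172.31.16.21": {"resource_id": "i-sfwe4534fsd3",
                     "region": "us-west-1", "az": "us-west-1a"},
}


def ip_type(ip):
    """Classify an address as 'private' or 'public'."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def transfer_type(src, dst):
    """Derive the transfer type from the metadata of both endpoints."""
    if src is None or dst is None:
        return "Unknown"  # at least one endpoint is not in the inventory
    if src["region"] != dst["region"]:
        return "Inter Region"
    if src["az"] != dst["az"]:
        return "Inter AZ"
    return "Intra AZ"


def enrich(record, inventory):
    """Return a copy of a flow record with endpoint metadata attached."""
    src = inventory.get(record["srcaddr"])
    dst = inventory.get(record["dstaddr"])
    enriched = dict(record)
    enriched["src_ip_type"] = ip_type(record["srcaddr"])
    enriched["dst_ip_type"] = ip_type(record["dstaddr"])
    enriched["src_resource_id"] = src["resource_id"] if src else None
    enriched["dst_resource_id"] = dst["resource_id"] if dst else None
    enriched["transfer_type"] = transfer_type(src, dst)
    return enriched


raw = {"srcaddr": "172.31.16.139", "dstaddr": "172.31.16.21", "bytes": 4249}
enriched = enrich(raw, INVENTORY)
print(enriched["transfer_type"])
```

Keeping the inventory lookup separate from the classification makes it easier to handle endpoints that cannot be resolved, which fall through to "Unknown" here.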

21 of 25

Load

22 of 25

Analysing data
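Once records are enriched, a simple analysis is to total the bytes per transfer type or per resource pair. The sketch below assumes enriched records shaped as dictionaries with hypothetical keys (`transfer_type`, `src_resource_id`, `dst_resource_id`, `bytes`); the sample values are made up for illustration.

```python
from collections import Counter

# Made-up enriched records; the keys are illustrative assumptions.
ENRICHED = [
    {"src_resource_id": "i-aaa", "dst_resource_id": "i-bbb",
     "transfer_type": "Inter Region", "bytes": 4249},
    {"src_resource_id": "i-aaa", "dst_resource_id": "i-ccc",
     "transfer_type": "Inter AZ", "bytes": 1000},
    {"src_resource_id": "i-aaa", "dst_resource_id": "i-bbb",
     "transfer_type": "Inter Region", "bytes": 751},
]


def bytes_by(records, key_fn):
    """Total transferred bytes grouped by the key that key_fn extracts."""
    totals = Counter()
    for record in records:
        totals[key_fn(record)] += record["bytes"]
    return totals


by_type = bytes_by(ENRICHED, lambda r: r["transfer_type"])
by_pair = bytes_by(
    ENRICHED, lambda r: (r["src_resource_id"], r["dst_resource_id"])
)

# Largest contributors first.
for transfer_type, total in by_type.most_common():
    print(f"{transfer_type}: {total} bytes")
```

Grouping by resource pair instead of transfer type gives the more granular view: which two resources are responsible for the most expensive traffic.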

23 of 25

Granular data view

24 of 25

Future Work

  • More edge cases have to be handled to gain 100% visibility (e.g. when an ELB is involved).
  • Ability to get even more granular information by attaching details of the applications running on the instances, for large and dynamic infrastructure.
  • Ability to analyse current data transfer routes and recommend more cost-efficient routes based on pricing, without affecting availability or scalability.
  • Ability to identify resources with anomalous data transfer.

25 of 25

Thank you