linux – nivas,b:=log() (https://www.nivas.hr/blog): This is a blog from the Nivas.hr crew to the galaxy of unknown.

Measuring Disk IO Performance on MacOS
https://www.nivas.hr/blog/2017/09/19/measuring-disk-io-performance-macos/
Tue, 19 Sep 2017

Over time and numerous hardware updates around the office, I collected a vast number of 2.5″ HDDs in my “hardware junk” box. The other day, I noticed two Kingston SSDNow V200 128GB SSDs just sitting there doing nothing, so I decided to make them usable again. I have a really BAD track record with non-SSD 2.5″ external travel disks: 99% of them broke or started showing serious problems just after the first year of use (traveling with them along with my notebook). I wanted to see how an SSD would behave under the same conditions.

I visited my local hardware store to get a USB3 2.5″ HDD enclosure. Being a geek, I had done my homework and decided on a no-name enclosure for 15 EUR with semi-rubber protection.
The good lady at the counter suggested that instead of the 15 EUR one, I get a 13 EUR no-name enclosure since “it was better”.

Sceptical as I am, I bought both and decided to run a test to prove her wrong. The more expensive one had to be better. :)

After fitting the disks in the enclosures, the first issue I stumbled upon was the lack of a disk benchmarking tool on MacOS. On Windows I had used HD Tune for ages and was happy with it. On MacOS, however, Blackmagic Disk Speed Test from the Mac App Store did not inspire confidence in me (Blackmagic, c'mon?), nor did the 11-year-old Xbench or jDiskMark beta (written in Java).

In Ubuntu/Debian/RHEL land I’ve benchmarked device IO before and had a good experience with FIO, a popular tool for measuring IOPS on Linux servers.


Do not make the mistake of benchmarking (or, e.g., using dd on) a /dev/disk device.
On MacOS you should always use the /dev/rdisk device.

/dev/disk – buffered access used for kernel filesystem calls; I/O is broken into 4KB chunks and takes a more expensive route.
/dev/rdisk – “raw” in the BSD sense; forces block-aligned I/O. These devices are closer to the physical disk than the buffered ones.
If you do a read or write larger than one sector to /dev/rdisk, that request will be passed straight through. The lower layers may break it up (e.g., USB breaks it up into 128KB pieces due to the maximum payload size in the USB protocol), but you can generally get bigger and more efficient I/Os. When streaming, as with dd, 128KB to 1MB are pretty good sizes to get near-optimal performance on current non-RAID hardware. (source)
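To make the naming convention and the "block-aligned" rule concrete, here is a small sketch. The helper names are hypothetical, the 512-byte sector size is an assumption (some drives use 4096-byte logical sectors), and the 128KB split is the USB payload figure quoted above.

```python
import re

# Hypothetical helper: /dev/diskN (or /dev/diskNsM) -> /dev/rdiskN (/dev/rdiskNsM).
_DISK_RE = re.compile(r"^/dev/(r?)disk(\d+(?:s\d+)?)$")


def raw_device(path: str) -> str:
    """Return the raw (character) device path for a macOS disk device path."""
    m = _DISK_RE.match(path)
    if not m:
        raise ValueError(f"not a /dev/diskN path: {path!r}")
    return f"/dev/rdisk{m.group(2)}"


def is_block_aligned(offset: int, length: int, sector: int = 512) -> bool:
    """Raw-device I/O should start and end on sector boundaries (512 B assumed)."""
    return length > 0 and offset % sector == 0 and length % sector == 0


def usb_pieces(request: int, max_payload: int = 128 * 1024) -> int:
    """How many transfers a single request is split into at a given max payload."""
    return -(-request // max_payload)  # ceiling division
```

For example, `raw_device("/dev/disk2")` gives `/dev/rdisk2`, and a 1 MiB request to a USB-attached raw device would be split into 8 pieces of 128KB by the lower layers.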

1. Install FIO

brew install fio

2. Check the correct disk number

diskutil list

Everything from this step forward can and will delete data on your disk, so BE VERY CAREFUL about which disk you use. You have been warned.
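If you script this step, it may be safer to ask diskutil only for external disks rather than eyeballing the full list. A minimal sketch, assuming `diskutil list -plist external` (macOS 10.11+) and that its plist output has a `WholeDisks` key with names like `disk2`; the function names are hypothetical, and you should still confirm the target with `diskutil info` before writing anything.

```python
import plistlib
import subprocess
from typing import List


def external_whole_disks(plist_bytes: bytes) -> List[str]:
    """Turn diskutil plist output into raw device paths (assumes a 'WholeDisks' key)."""
    info = plistlib.loads(plist_bytes)
    return [f"/dev/r{name}" for name in info.get("WholeDisks", [])]


def list_external_disks() -> List[str]:
    # macOS only: ask diskutil for external disks only, in plist form.
    out = subprocess.run(
        ["diskutil", "list", "-plist", "external"],
        check=True,
        capture_output=True,
    ).stdout
    return external_whole_disks(out)
```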

3. Precondition the SSD
We precondition each drive the same way before each measurement, bringing the drive to the same performance state so that the test process is repeatable.

diskutil unmountDisk /dev/disk2
sudo dd if=/dev/zero of=/dev/rdisk2 bs=1m
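What dd does here is simply write zero-filled 1 MiB blocks (`bs=1m`) until the device is full. The sketch below does the same in Python with an explicit byte count; the function name is hypothetical, and for a raw device the count should be a whole number of sectors. Try it on a scratch file first: pointed at a real /dev/rdiskN (as root, with the disk unmounted) it destroys everything on that disk.

```python
import os


def zero_fill(path: str, total_bytes: int, block: int = 1024 * 1024) -> int:
    """Write zeros to an existing file or device in `block`-sized chunks,
    like `dd if=/dev/zero of=PATH bs=1m`, stopping after `total_bytes`.
    Returns the number of bytes written."""
    zeros = bytes(block)
    written = 0
    # No O_CREAT/O_TRUNC: we overwrite in place, as dd does on a device.
    fd = os.open(path, os.O_WRONLY)
    try:
        while written < total_bytes:
            chunk = zeros[: min(block, total_bytes - written)]
            written += os.write(fd, chunk)  # os.write may write fewer bytes; loop continues
    finally:
        os.close(fd)
    return written
```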

4. Running tests

Random read/write performance

fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

Random read performance

fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread

Random write performance

fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

(On MacOS we use the posixaio ioengine, since libaio is Linux-only. If you are running a different flavour of Unix, just replace --ioengine=posixaio with, e.g., --ioengine=libaio on Ubuntu.)
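The three commands above differ only in `--readwrite` (plus `--rwmixread=75` for the mixed test), and the ioengine depends on the OS. A small sketch that builds the same argument lists; the function and constant names are hypothetical, and it picks libaio on Linux and posixaio everywhere else, as the note above suggests.

```python
import sys
from typing import List

# Options shared by all three fio invocations above.
COMMON = ["--randrepeat=1", "--direct=1", "--gtod_reduce=1", "--name=test",
          "--bs=4k", "--iodepth=64", "--size=4G"]


def default_ioengine(platform: str = sys.platform) -> str:
    # libaio on Linux; posixaio otherwise (including macOS, where libaio is unavailable).
    return "libaio" if platform.startswith("linux") else "posixaio"


def fio_command(mode: str, filename: str = "test",
                platform: str = sys.platform) -> List[str]:
    if mode not in ("randrw", "randread", "randwrite"):
        raise ValueError(f"unsupported mode: {mode}")
    args = ["fio", f"--ioengine={default_ioengine(platform)}",
            f"--filename={filename}", *COMMON, f"--readwrite={mode}"]
    if mode == "randrw":
        args.append("--rwmixread=75")  # 75% reads, 25% writes
    return args
```

Running one is then `subprocess.run(fio_command("randread"), check=True)`.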

5. The results

The lady at the store was right! Using the same drives, the cheaper enclosure gave better results. It was faster by almost 35%.

Tray                   Read MiB/s   Write MiB/s   Read IOPS   Write IOPS
ASMT (/dev/disk)       10.9         11.9          86          94
ASMT (/dev/rdisk)      69.7         72.8          552         576
PATRIOT (/dev/rdisk)   92.4         93.5          738         747
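The percentages quoted in this post can be recomputed from the table. A quick sketch using the live fio readings above (the names are only for illustration): by these numbers the PATRIOT enclosure is about 28-34% faster than ASMT depending on the column, and ASMT on /dev/rdisk is about 6.1-6.4 times as fast as on /dev/disk.

```python
# Live fio readings from the table: (read MiB/s, write MiB/s, read IOPS, write IOPS)
RESULTS = {
    "ASMT (/dev/disk)": (10.9, 11.9, 86, 94),
    "ASMT (/dev/rdisk)": (69.7, 72.8, 552, 576),
    "PATRIOT (/dev/rdisk)": (92.4, 93.5, 738, 747),
}


def speedup_percent(new, old):
    """Per-column improvement of `new` over `old`, in percent."""
    return [(n / o - 1) * 100 for n, o in zip(new, old)]


if __name__ == "__main__":
    base = RESULTS["ASMT (/dev/rdisk)"]
    for name, values in RESULTS.items():
        print(name, [f"{x:+.1f}%" for x in speedup_percent(values, base)])
```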

If you are interested in the detailed values I got, here they are.

The first set of benchmarks (done on the buffered /dev/disk device) revealed really poor performance: [r=10.9MiB/s,w=11.9MiB/s][r=86,w=94 IOPS].

sudo fio --filename=/dev/disk2 --direct=1 --rw=randrw --rwmixwrite=50 --refill_buffers --norandommap --randrepeat=0 --ioengine=posixaio --bs=128k --rate_iops=1280  --iodepth=16 --numjobs=1 --time_based --runtime=86400 --group_reporting --name=benchtest
fio-2.18
Starting 1 thread
^Cbs: 1 (f=1), 0-2560 IOPS: [m(1)][0.5%][r=10.9MiB/s,w=11.9MiB/s][r=86,w=94 IOPS][eta 23h:52m:35s]
fio: terminating on signal 2

benchtest: (groupid=0, jobs=1): err= 0: pid=3075: Fri Mar 24 20:14:55 2017
   read: IOPS=94, BW=11.8MiB/s (12.4MB/s)(5234MiB/445379msec)
    slat (usec): min=0, max=303, avg= 0.40, stdev= 2.28
    clat (msec): min=47, max=228, avg=100.40, stdev=14.81
     lat (msec): min=47, max=228, avg=100.40, stdev=14.81
    clat percentiles (msec):
     |  1.00th=[   74],  5.00th=[   82], 10.00th=[   85], 20.00th=[   90],
     | 30.00th=[   93], 40.00th=[   96], 50.00th=[   98], 60.00th=[  102],
     | 70.00th=[  105], 80.00th=[  111], 90.00th=[  119], 95.00th=[  127],
     | 99.00th=[  151], 99.50th=[  161], 99.90th=[  184], 99.95th=[  192],
     | 99.99th=[  208]
  write: IOPS=94, BW=11.8MiB/s (12.4MB/s)(5237MiB/445379msec)
    slat (usec): min=0, max=296, avg= 0.53, stdev= 2.81
    clat (msec): min=25, max=177, avg=69.66, stdev= 9.52
     lat (msec): min=25, max=177, avg=69.66, stdev= 9.52
    clat percentiles (msec):
     |  1.00th=[   51],  5.00th=[   58], 10.00th=[   61], 20.00th=[   63],
     | 30.00th=[   66], 40.00th=[   68], 50.00th=[   69], 60.00th=[   71],
     | 70.00th=[   73], 80.00th=[   76], 90.00th=[   80], 95.00th=[   86],
     | 99.00th=[  105], 99.50th=[  114], 99.90th=[  133], 99.95th=[  137],
     | 99.99th=[  151]
    lat (msec) : 50=0.44%, 100=76.81%, 250=22.76%
  cpu          : usr=0.46%, sys=0.41%, ctx=283619, majf=3, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=50.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.3%, 8=1.7%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=41875,41894,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=11.8MiB/s (12.4MB/s), 11.8MiB/s-11.8MiB/s (12.4MB/s-12.4MB/s), io=5234MiB (5489MB), run=445379-445379msec
  WRITE: bw=11.8MiB/s (12.4MB/s), 11.8MiB/s-11.8MiB/s (12.4MB/s-12.4MB/s), io=5237MiB (5491MB), run=445379-445379msec

Repeating the benchmark on the same enclosure, but using the raw device (/dev/rdisk), revealed much nicer numbers, roughly six times faster than the buffered device: [r=69.7MiB/s,w=72.8MiB/s][r=552,w=576 IOPS].

sudo fio --filename=/dev/rdisk2 --direct=1 --rw=randrw --rwmixwrite=50 --refill_buffers --norandommap --randrepeat=0 --ioengine=posixaio --bs=128k --rate_iops=1280  --iodepth=16 --numjobs=1 --time_based --runtime=86400 --group_reporting --name=benchtest
fio-2.18
Starting 1 thread
^Cbs: 1 (f=1), 0-2560 IOPS: [m(1)][0.3%][r=69.7MiB/s,w=72.8MiB/s][r=552,w=576 IOPS][eta 23h:55m:54s]
fio: terminating on signal 2

benchtest: (groupid=0, jobs=1): err= 0: pid=3075: Fri Mar 24 21:13:39 2017
   read: IOPS=538, BW=67.3MiB/s (70.6MB/s)(16.2GiB/245308msec)
    slat (usec): min=0, max=47, avg= 0.45, stdev= 1.02
    clat (msec): min=8, max=45, avg=15.05, stdev= 2.70
     lat (msec): min=8, max=45, avg=15.05, stdev= 2.70
    clat percentiles (usec):
     |  1.00th=[11200],  5.00th=[12224], 10.00th=[12736], 20.00th=[13376],
     | 30.00th=[13888], 40.00th=[14400], 50.00th=[14784], 60.00th=[15168],
     | 70.00th=[15680], 80.00th=[16320], 90.00th=[17280], 95.00th=[18048],
     | 99.00th=[23936], 99.50th=[36608], 99.90th=[39680], 99.95th=[40192],
     | 99.99th=[42240]
  write: IOPS=538, BW=67.4MiB/s (70.7MB/s)(16.2GiB/245308msec)
    slat (usec): min=0, max=65, avg= 0.46, stdev= 0.67
    clat (msec): min=6, max=45, avg=14.56, stdev= 2.71
     lat (msec): min=6, max=45, avg=14.57, stdev= 2.71
    clat percentiles (usec):
     |  1.00th=[10560],  5.00th=[11712], 10.00th=[12224], 20.00th=[12864],
     | 30.00th=[13376], 40.00th=[13888], 50.00th=[14272], 60.00th=[14784],
     | 70.00th=[15168], 80.00th=[15808], 90.00th=[16768], 95.00th=[17536],
     | 99.00th=[23680], 99.50th=[36096], 99.90th=[39168], 99.95th=[40192],
     | 99.99th=[42240]
    lat (msec) : 10=0.22%, 20=98.34%, 50=1.44%
  cpu          : usr=3.48%, sys=2.40%, ctx=531264, majf=3, minf=5
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=50.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.9%, 8=1.8%, 16=0.3%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=132027,132160,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=67.3MiB/s (70.6MB/s), 67.3MiB/s-67.3MiB/s (70.6MB/s-70.6MB/s), io=16.2GiB (17.4GB), run=245308-245308msec
  WRITE: bw=67.4MiB/s (70.7MB/s), 67.4MiB/s-67.4MiB/s (70.7MB/s-70.7MB/s), io=16.2GiB (17.4GB), run=245308-245308msec

Finally, the second HDD tray I benchmarked (PATRIOT) revealed the best results, almost 35% faster than the first enclosure (ASMT): [r=92.4MiB/s,w=93.5MiB/s][r=738,w=747 IOPS].

sudo fio --filename=/dev/rdisk3 --direct=1 --rw=randrw --rwmixwrite=50 --refill_buffers --norandommap --randrepeat=0 --ioengine=posixaio --bs=128k --rate_iops=1280  --iodepth=16 --numjobs=1 --time_based --runtime=86400 --group_reporting --name=benchtest
fio-2.18
Starting 1 thread
^Cbs: 1 (f=1), 0-2560 IOPS: [m(1)][0.5%][r=92.4MiB/s,w=93.5MiB/s][r=738,w=747 IOPS][eta 23h:52m:50s]
fio: terminating on signal 2

benchtest: (groupid=0, jobs=1): err= 0: pid=3075: Fri Mar 24 20:37:26 2017
   read: IOPS=761, BW=95.2MiB/s (99.8MB/s)(39.2GiB/430198msec)
    slat (usec): min=0, max=310, avg= 0.55, stdev= 2.23
    clat (msec): min=1, max=48, avg=11.43, stdev= 2.84
     lat (msec): min=1, max=48, avg=11.43, stdev= 2.84
    clat percentiles (usec):
     |  1.00th=[ 6880],  5.00th=[ 8256], 10.00th=[ 8896], 20.00th=[ 9536],
     | 30.00th=[10048], 40.00th=[10560], 50.00th=[11072], 60.00th=[11584],
     | 70.00th=[12224], 80.00th=[12864], 90.00th=[14016], 95.00th=[15296],
     | 99.00th=[22912], 99.50th=[28800], 99.90th=[35584], 99.95th=[37120],
     | 99.99th=[40704]
  write: IOPS=762, BW=95.3MiB/s (99.9MB/s)(40.3GiB/430198msec)
    slat (usec): min=0, max=767, avg= 0.96, stdev= 3.58
    clat (usec): min=492, max=45310, avg=9422.63, stdev=2869.71
     lat (usec): min=493, max=45311, avg=9423.59, stdev=2869.68
    clat percentiles (usec):
     |  1.00th=[ 5024],  5.00th=[ 6240], 10.00th=[ 6944], 20.00th=[ 7712],
     | 30.00th=[ 8256], 40.00th=[ 8640], 50.00th=[ 9024], 60.00th=[ 9536],
     | 70.00th=[10048], 80.00th=[10688], 90.00th=[11712], 95.00th=[13120],
     | 99.00th=[21888], 99.50th=[27264], 99.90th=[35072], 99.95th=[37120],
     | 99.99th=[40704]
    lat (usec) : 500=0.01%
    lat (msec) : 2=0.01%, 4=0.08%, 10=49.48%, 20=49.08%, 50=1.35%
  cpu          : usr=4.59%, sys=2.86%, ctx=1256049, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=57.4%, 16=42.6%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.2%, 8=1.8%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=327551,327861,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=95.2MiB/s (99.8MB/s), 95.2MiB/s-95.2MiB/s (99.8MB/s-99.8MB/s), io=39.2GiB (42.1GB), run=430198-430198msec
  WRITE: bw=95.3MiB/s (99.9MB/s), 95.3MiB/s-95.3MiB/s (99.9MB/s-99.9MB/s), io=40.3GiB (42.1GB), run=430198-430198msec
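Two bits of arithmetic help read these logs. With a fixed `--bs=128k`, bandwidth is just IOPS times block size (538 IOPS × 128 KiB ≈ 67.3 MiB/s). And with `--iodepth=16` I/Os in flight, Little's law predicts total IOPS ≈ depth / mean completion latency, which lines up with all three runs to within about 1%. This is an approximation, not something fio reports: the IO depth histograms show the queue was not always full. The names below are illustrative; the figures are copied from the fio summaries above.

```python
BLOCK_KIB = 128  # --bs=128k
IODEPTH = 16     # --iodepth=16

# (read IOPS, write IOPS, read clat avg ms, write clat avg ms) from the fio summaries
RUNS = {
    "ASMT /dev/disk2": (94, 94, 100.40, 69.66),
    "ASMT /dev/rdisk2": (538, 538, 15.05, 14.56),
    "PATRIOT /dev/rdisk3": (761, 762, 11.43, 9.42),
}


def bandwidth_mib(iops: float, block_kib: int = BLOCK_KIB) -> float:
    """Throughput in MiB/s implied by an IOPS figure at a fixed block size."""
    return iops * block_kib / 1024


def littles_law_iops(avg_latency_ms: float, depth: int = IODEPTH) -> float:
    """Approximate total IOPS when `depth` I/Os are kept in flight."""
    return depth / (avg_latency_ms / 1000)


def predicted_vs_measured(run):
    r_iops, w_iops, r_lat, w_lat = run
    # IOPS-weighted mean completion latency across reads and writes.
    mean_lat = (r_iops * r_lat + w_iops * w_lat) / (r_iops + w_iops)
    return littles_law_iops(mean_lat), r_iops + w_iops


if __name__ == "__main__":
    for name, run in RUNS.items():
        predicted, measured = predicted_vs_measured(run)
        print(f"{name}: predicted {predicted:.0f} IOPS, measured {measured}")
```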

Conclusion
fio is a pretty robust utility for IO testing. Beware of the quality of the onboard electronics when buying HDD trays: trays within the same price range can vary 15-30% in speed.

Apache sending “Vary: Host” making things uncacheable for Varnish
https://www.nivas.hr/blog/2017/02/13/apache-sending-vary-host-making-things-uncacheable-varnish/
Mon, 13 Feb 2017

TL;DR:
Using %{HTTP_HOST} in .htaccess will cause Apache to include a “Vary: Host” field in the response.
Subsequently, the “Vary: Host” header from Apache will force Varnish not to cache otherwise cacheable content.

HTTP Vary is not a trivial concept. It is by far the most misunderstood HTTP header. (Varnish Docs)

On a project I’ve been working on, I could not make Varnish hard-cache the site no matter what I did.
It was a mostly read-only site, and I wanted to make sure that in case of backend failures the site would still run from the cache. To avoid surprises, I was using my own configuration template, which was working exactly as I wanted on a different project.

But no matter what I did, Varnish was not caching everything. Instead I got a ‘per-user’ cache: if a user visited, e.g., the homepage, then in case of backend failure that user could reload the homepage and Varnish would serve it from the stale cache. Visiting any other page the user had not visited before the backend failure (and which was therefore not in their cache) would result in Varnish complaining that the backend is down.

What I noticed was a strangely large Vary header in the Varnish response:

Vary:Host,Accept-Encoding,User-Agent

After spending hours tunneling, debugging the Varnish configuration, analysing response headers, comparing server configurations and hunting for cookies that could have somehow magically slipped through… the last place I looked was my main .htaccess. I was using mod_deflate there, but as it turned out, everything was fine with it.

On a side note, I had been experimenting with per-host configuration in .htaccess, so that Basic Authentication would kick in only on the dev environment. It had been working great up until then, and it goes something like this:

<If "%{HTTP_HOST} == 'dev.nivas.hr'">

Require valid-user
...
Allow from facebook.com

</If>

What I had to find out the hard way is that if the special Apache variable %{HTTP_HOST} is used in, e.g., .htaccess, Apache changes the response headers. The server was returning a header that included a “Vary: Host” field, which means that the server didn’t serve a regular static page, but a reply that depends on the “Host” field in the HTTP request. Browsers interpret this as “the content returned is dynamic, don’t cache it” (source).

curl -I http://localhost/
HTTP/1.1 200 OK
Date: Mon, 13 Feb 2017 14:48:14 GMT
Server: Apache/2.4.6 (CentOS)
Vary: Host,User-Agent
Content-Type: text/html; charset=UTF-8

It is really strange that I did not hit this before, because RewriteCond can add “Host” to the Vary header as well: e.g., a RewriteCond that evaluates %{HTTP_HOST} automatically adds “Host” to the Vary header. This is unnecessary, since RFC 7231 (https://tools.ietf.org/html/rfc7231#section-7.1.4) defines Vary as describing the parts of a request other than the method, the Host header field and the request target. The issue was reported and has been sitting in the Apache bug tracker for a while without a clear resolution.
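A quick way to spot the redundant field in a response like the one above is to split the Vary value and look for Host. A minimal sketch with hypothetical helper names; header field names are case-insensitive, so they are lower-cased first.

```python
from typing import List


def vary_fields(vary: str) -> List[str]:
    """Split a Vary header value into lower-cased field names."""
    return [f.strip().lower() for f in vary.split(",") if f.strip()]


def redundant_vary_fields(vary: str) -> List[str]:
    # RFC 7231 section 7.1.4: Vary describes request parts other than the
    # method, Host and request target, so listing Host adds nothing.
    return [f for f in vary_fields(vary) if f == "host"]
```

For the header seen earlier, `redundant_vary_fields("Host,Accept-Encoding,User-Agent")` returns `["host"]`.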

I can understand why Varnish is not caching, but I cannot understand Apache’s logic. I do have a VirtualHost defined, so my responses already vary by Host. There is no need to force this out in the response.

After removing %{HTTP_HOST} from .htaccess, the site was cacheable as we wanted.

HTTP/1.1 200 OK
Date: Mon, 13 Feb 2017 22:18:33 GMT
Content-Type: text/html; charset=UTF-8
Vary: Accept-Encoding
Age: 3707
X-Nivas-Crew: loves you :) https://www.nivas.hr
X-Backend: backend_app1
X-Cache: HIT
X-Cache-Hits: 786332
X-Vudu-Url-Cache: hit

Don’t forget to normalize the request headers your Vary depends on (such as Accept-Encoding) in Varnish; chances are that without normalization it will never see a cache hit.
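Why normalization matters: a cache keeps one variant per distinct combination of the header values listed in Vary. The sketch below models that in Python, in the spirit of the common Varnish recipe that collapses Accept-Encoding to gzip, deflate or nothing. It is an illustration of the logic, not VCL and not Varnish's actual hashing; the function names are hypothetical, and the substring check ignores q-values such as `gzip;q=0`.

```python
from typing import Dict, Optional


def normalize_accept_encoding(value: Optional[str]) -> str:
    """Collapse the many Accept-Encoding strings browsers send into a few buckets."""
    if not value:
        return ""
    v = value.lower()
    if "gzip" in v:
        return "gzip"
    if "deflate" in v:
        return "deflate"
    return ""


def variant_key(vary: str, headers: Dict[str, str]) -> tuple:
    """Hypothetical cache-variant key: the (normalized) value of every header in Vary."""
    lower = {k.lower(): v for k, v in headers.items()}
    key = []
    for field in (f.strip().lower() for f in vary.split(",") if f.strip()):
        value = lower.get(field, "")
        if field == "accept-encoding":
            value = normalize_accept_encoding(value)
        key.append((field, value))
    return tuple(key)
```

Two browsers sending `gzip, deflate, br` and `gzip,deflate` share one variant under `Vary: Accept-Encoding`, but as soon as User-Agent is in Vary, every distinct browser string gets its own cache entry.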

Happy caching!

Have a cool Varnish project you need help on? Contact us.
