Work-around for page heap "reallocate in place" false positive.

TL;DR: verifier.dll has a bug where it falsely reports heap corruption. I've found a work-around.

While testing a new fuzzer in Chrome, verifier.dll started reporting heap corruption when a memory chunk was being freed. The report stated that a DWORD at 14 bytes before the start of the chunk was corrupted. When running with page heap enabled, each heap chunk gets a header that contains some information about the chunk. Whenever a heap function is called, the code in verifier.dll does a sanity check on this is information to see if it has been corrupted. This chunk was failing that check.

My repro could reliably trigger this issue and the chunk in which the corruption happened was always the same. I therefore assumed this was an out-of-bounds write in Chrome caused by a negative offset, so I tried to determine where in the Chrome code the instruction that corrupted this memory was. To that effect, I set a memory write breakpoint immediately after the chunk was allocated and output the value that was stored there. But when I ran Chrome the breakpoint did not trigger and the value was not changed even though verifier.dll was still reporting corruption when the chunk was freed.

So, I Googled the error code verifier.dll gave me and found that someone had reported a bug in verifier.dll two years ago that looked like it might be what was going on here: the value stored at the "corrupted" location contains the size of the memory allocation, which consists of whole pages (0x1000 bytes in my case). Stored immediately before this location is the size in bytes of the memory chunk. Verifier.dll checks that rounding the size in bytes value up to whole pages is the same as the size in pages value. So, if you allocate 0x5050 bytes, these two values would be 0x5050 and 0x6000, which would pass this test. However, if you reallocate this chunk in place to a smaller size (say 0x4000), page heap does not shrink the memory allocation and only updates the size in bytes. The values will now be 0x4000 and 0x6000, which will fail this test and Verifier.dll incorrectly reports that the header has been corrupted.

To work around this issue, simply updating the size of the memory allocation in pages after a reallocation seems to work. To do so, I created a WinDBG command that sets a breakpoint on the reallocation handling code in verifier.dll. When the breakpoint is hit, it checks if the reallocation will shrink the chunk. If it will, it continues the reallocation function until it returns. The function will return the address of the reallocated memory chunk in @eax. If this address is the same as the one passed in the arguments, the reallocation was in place. In this case, the heap allocation size in the header is updated to prevent triggering this bug. This means that this value is no longer correct in that the actual allocation is larger, but AFAIK this does not cause any problems.

It would more sense to fix this with a shim, but I haven't the time to develop one myself. But please do let me know if you create one!

Here is the WinDBG work around:

bp verifier!AVrfDebugPageHeapReAllocate "r $t0=poi(@esp+0xC);.if (dwo(@$t0-18) > dwo(@esp+10)) {.printf \"Reallocate shrink %X -> %X @ %X\\r\\n\",dwo(@$t0-18),dwo(@esp+10),@$t0;~.gu;.if(@$t0==@eax) {r @$t1=(dwo(@$t0-18)+0xfff)&0xFFFFF000;.printf \"Update page heap data %X->%X\\r\\n\",dwo(@$t0-14),@$t1;ed @$t0-14 @$t1};g}.else{g}"
Update:The original work-around used "gu", which might cause issues in multi-threaded processes. Replacing this with "~.gu" appears to fix the issue, as this prevents other threads from running, and thus interfering with the execution.

If you have a minute to spare, please up-vote this bug to put some pressure on Microsoft to fix it.