PL - you asked “what 64 bit chipset uses 32 bit addressing ? That is by definition not a 64 bit chipset then”
Plenty of systems ship 64-bit processors with 32-bit chipsets. So you’re asking the wrong question. You’re right that no 64-bit chipsets use 32-bit addressing. But that doesn’t stop vendors wiring a 64-bit CPU into a 32-bit chipset. (In just the same way that the presence of a 36-bit address bus on most of the CPUs Intel has shipped since the Pentium Pro hasn’t stopped vendors from providing only 32-bit addressing in their chipsets.)
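To put numbers on those address widths, here is a small sketch of the arithmetic. It assumes byte-addressable memory, and the function name is just an illustration, not anything from a real API.

```python
GIB = 1 << 30  # bytes in one GiB


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with the given number of address lines (byte-addressable memory assumed)."""
    return 1 << address_bits


if __name__ == "__main__":
    for bits in (32, 36):
        print(f"{bits} address bits reach {addressable_bytes(bits) // GIB} GiB")
```

So a 32-bit chipset tops out at 4 GiB of physical address space however wide the CPU is, while the 36 address lines on the CPU could in principle reach 64 GiB.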
Matti - you said “Not really knowing what the 32-bit Windows can or can not do, probably it does not know how to use memory at above 4 GB physical address,”. Not true. The 32-bit server versions of Windows have been able to use more than 4GB for years. (Although not all editions supported it.) It was possible even back on Windows 2000. (Maybe even NT 4 - I can’t remember exactly when they brought in this support.)
Incidentally, the ‘double buffering’ DMA issue has been with Windows since NT 3.1 - I know this because I used to write device drivers for NT back then… You had to do it for the various RISC systems (Alpha, MIPS, PPC) because most of those systems did something completely different from the x86: the physical addresses seen by the CPU were, in general, not the same as the physical addresses presented on the peripheral buses like PCI. They had a mapping layer in there of exactly the same kind that you need for a 32-bit PCI device to be able to DMA into a 64-bit address space. Drivers are supposed to use the APIs that work with this mapping layer on all hardware. (The exact details of the mapping layer, and whether it was even present, were dealt with by the HAL - the Hardware Abstraction Layer. On typical pre-PAE x86 hardware, the HAL would provide a ‘do-nothing’ mapping layer.)
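As a rough illustration of that mapping layer, here is a toy model in Python. It is only a sketch: the class and method names are made up for this example and are not the real NT HAL or DDK interfaces, and the bounce buffer is reduced to a single fixed low address.

```python
BUS_LIMIT = 1 << 32  # a 32-bit bus master can only emit addresses below 4 GiB


class IdentityHal:
    """Toy model of a pre-PAE x86 HAL: the 'do-nothing' mapping layer."""

    def map_transfer(self, cpu_phys: int, length: int) -> int:
        # Bus address and CPU physical address are the same thing here.
        return cpu_phys


class BouncingHal:
    """Toy model of a HAL that double-buffers memory the device cannot reach."""

    def __init__(self, bounce_base: int, bounce_size: int) -> None:
        self.bounce_base = bounce_base  # hypothetical buffer reserved below 4 GiB
        self.bounce_size = bounce_size
        self.copies = 0                 # how many transfers needed the extra copy

    def map_transfer(self, cpu_phys: int, length: int) -> int:
        if cpu_phys + length <= BUS_LIMIT:
            return cpu_phys             # device can reach it directly
        if length > self.bounce_size:
            raise ValueError("transfer larger than the bounce buffer")
        self.copies += 1                # data is staged through low memory
        return self.bounce_base


if __name__ == "__main__":
    hal = BouncingHal(bounce_base=0x0010_0000, bounce_size=0x1_0000)
    print(hex(hal.map_transfer(0x2000, 4096)))
    print(hex(hal.map_transfer(5 << 30, 4096)))
```

The point of the model is that the driver asks the same question on every platform and only the answer changes: on the identity HAL it gets its own address back, on the other it gets a bus-reachable address and the HAL does the copying.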
Device drivers that are written correctly should just work - you don’t need to do anything different to enable a device on a 32-bit bus to DMA into the higher ranges of a 64-bit physical address space. The Windows NT memory APIs have supported the necessary mechanisms since day 1, and a driver written back in, say, 1995 that used the APIs correctly should work today with 4GB on a /PAE-enabled system.
However, lazy device driver writers may have noticed that the relevant APIs for mapping from CPU physical addresses to bus-specific physical addresses are effectively a NOP on 32-bit x86 systems running without /PAE, as are the APIs for creating DMA mapping ranges. So if they decide not to bother calling the appropriate APIs, those drivers will stop working when you enable /PAE. This has never been ‘correct’ - it’s in clear violation of what the documentation tells you to do. However, it happens to work on the vast majority of x86 systems, so you just know there will be some drivers out there that do this. (I would hope that any that have been through the WHQL certification won’t have this problem, but loads of drivers aren’t certified.)
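Here is a toy sketch of how that shortcut goes wrong. Everything in it is hypothetical: the function names are invented, the HAL is reduced to a one-line stand-in, and the device is modelled as nothing more than a 32-bit DMA address register.

```python
DEVICE_ADDRESS_MASK = 0xFFFF_FFFF  # the device's DMA address register is 32 bits wide
BOUNCE_BASE = 0x0010_0000          # hypothetical bounce buffer below 4 GiB


def hal_map(cpu_phys: int) -> int:
    """Stand-in for the mapping call: identity below 4 GiB, bounce buffer above."""
    return cpu_phys if cpu_phys <= DEVICE_ADDRESS_MASK else BOUNCE_BASE


def lazy_driver_dma_target(cpu_phys: int) -> int:
    # Skips the mapping call and hands the CPU physical address straight to the
    # device, which silently drops the bits that do not fit in its register.
    return cpu_phys & DEVICE_ADDRESS_MASK


def correct_driver_dma_target(cpu_phys: int) -> int:
    # Asks the mapping layer first, then programs the device with the result.
    return hal_map(cpu_phys) & DEVICE_ADDRESS_MASK


if __name__ == "__main__":
    for buf in (0x2000, (4 << 30) + 0x2000):
        print(hex(buf), hex(lazy_driver_dma_target(buf)), hex(correct_driver_dma_target(buf)))
```

For a buffer below 4 GiB the two drivers program the same address, which is why the lazy one appears to work. For a buffer at 4 GiB + 0x2000 the lazy driver ends up pointing the device at 0x2000, someone else's memory, while the correct one is steered to the bounce buffer.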
My understanding is that this is the issue that makes /PAE a potential non-starter even where your motherboard happens to support it. The reason switching to a 64-bit OS makes this a non-problem is that the driver writer will have had no option but to write the driver correctly, whereas if you’re using 32-bit drivers, you have to hope that they were written correctly. And there are a lot of crappy drivers out there.