NTFS short file name generation

During the week, I discovered that the product I was working on appeared to slow down dramatically after the dataset reached a certain size (approximately 150,000 items). This was worrying, as we’re aiming to be “enterprise” ready and need to be able to process millions of items. On top of that, we already had customers with far more items than that, and yet they had not reported such an issue.

After some in-depth investigation involving a profiler, it turned out that our web service was spending more than 90% of its time on a single line of code: the line that created a file in a directory to store some binary data sent from the client. It wasn’t an I/O issue, as the disk was hardly thrashing, and the next line of code, which actually wrote the data to the file, was fast (less than 0.001% of CPU time).

It turned out that this was due to the way the file system worked. Windows automatically creates an MS-DOS (8.3) short file name to go with each file. When there are many files in a single directory, the time it takes to generate a unique short file name grows. As we had more than 200,000 files in the directory, it seemed clear that the hashing algorithm was struggling to come up with names that were not already taken. This would explain the huge CPU usage and the long file creation time.
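For anyone wanting to check whether a directory is suffering from the same slowdown, here is a minimal Python sketch that times file creation in batches. It is not our production code: the function name `time_file_creation`, the `binary-payload-…` file name pattern and the batch sizes are all made up for illustration. If creating each batch takes noticeably longer than the last while the disk stays quiet, short file name generation is a plausible suspect.

```python
import os
import tempfile
import time


def time_file_creation(directory, total_files, batch_size):
    """Create empty files in `directory`; return the seconds taken per batch."""
    timings = []
    created = 0
    while created < total_files:
        count = min(batch_size, total_files - created)
        start = time.perf_counter()
        for i in range(created, created + count):
            # Hypothetical naming scheme: long names that share a prefix,
            # so none of them fits the 8.3 format as-is.
            path = os.path.join(directory, f"binary-payload-{i:09d}.dat")
            with open(path, "xb"):
                pass  # only creation is timed; nothing is written
        timings.append(time.perf_counter() - start)
        created += count
    return timings


if __name__ == "__main__":
    # Point this at a directory on the volume you want to test;
    # a temporary directory is used here only to keep the sketch tidy.
    with tempfile.TemporaryDirectory() as scratch:
        results = time_file_creation(scratch, 50_000, 10_000)
        for batch, seconds in enumerate(results, start=1):
            print(f"batch {batch}: {seconds:.2f}s")
```

Running it once before and once after changing the short file name setting should give a rough feel for the difference, although the exact numbers will depend on the machine and the volume.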

To fix this, we disabled short file name generation by opening a command prompt with administrator rights and running the following command:

fsutil.exe behavior set disable8dot3 1

Microsoft cautions that this could break compatibility with older 16-bit applications that rely on short file names. Given that most machines are 64-bit capable and the days of 32-bit applications are numbered, I think the risk is sufficiently minimal 🙂