##
## Original mail thread
## http://lists.samba.org/archive/samba/2005-February/099744.html
##

List:       samba
Subject:    [Samba] Large numbers of files in a directory - take #2 :-)
From:       Jeremy Allison
Date:       2005-02-03 19:22:27
Message-ID: <20050203192227.GG4381 () legion ! cup ! hp ! com>

Ok, second attempt now I'm sure the code is working :-). JohnT - if you want to turn this into a HOWTO or part of the book, be my guest. Remember it'll be in 3.0.12, not 3.0.11 or below.

-----------------------------------------------------------

I've been working (inspired by James Peach of SGI) on the problem of using Samba3 with applications that need large numbers of files (100,000 or more) per directory.

I think the current code in SVN in the SAMBA_3_0 branch may hold the fix for this problem, so I'd like to request people who need this functionality to give it a try.

The key was fixing the directory handling to read only the current list requested instead of the old (up to 3.0.11) behaviour of reading the entire directory into memory before doling out names. Normally this would have broken OS/2 applications which have *very* strange delete semantics :-), but by stealing logic from Samba4 (thanks tridge) I think the current code in SVN handles this correctly.

So here's how to set up an application that needs large numbers of files per directory in a way that doesn't damage performance.

Firstly, you need to canonicalize all the files in the directory to have one case, upper or lower - take your pick (I chose upper as all my files were already upper case names). Then set up a new custom share for the application as follows:

    [bigshare]
        path = /home/jeremy/tmp/manyfilesdir
        read only = no
        case sensitive = True
        default case = upper
        preserve case = no
        short preserve case = no

Of course, use your own path and settings, but set the case options to match the case of all the files in your directory.
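
The canonicalization step mentioned above is not spelled out in the mail. As a rough sketch only (this script is not part of Samba, and the function name uppercase_tree is hypothetical), something like the following Python could rename everything under the share path to upper case before the share is enabled. It assumes a case-sensitive Unix filesystem and that you have a backup; it stops with an error instead of overwriting when two names would collide after upper-casing, and any renames done before the error stay in place.

```python
import os


def uppercase_tree(root):
    """Rename every file and directory below root to its upper-case name.

    The root directory itself is left alone. Returns a list of
    (old_path, new_path) pairs for the entries that were renamed.
    Raises FileExistsError if upper-casing would make two names collide.
    """
    renamed = []
    # Walk bottom-up so the contents of a directory are handled before
    # the directory itself is renamed from within its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        # Names currently present in this directory, kept up to date
        # as we rename, so collisions are caught without touching disk.
        taken = set(dirnames) | set(filenames)
        for name in dirnames + filenames:
            upper = name.upper()
            if upper == name:
                continue  # already in the canonical case
            if upper in taken:
                raise FileExistsError(
                    f"{os.path.join(dirpath, upper)} already exists; "
                    "resolve this collision by hand"
                )
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, upper)
            os.rename(old, new)
            taken.discard(name)
            taken.add(upper)
            renamed.append((old, new))
    return renamed
```

Calling uppercase_tree("/home/jeremy/tmp/manyfilesdir") (substitute your own path) would match the "default case = upper" setting in the stanza; for a lower-case share the same idea applies with name.lower().
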
The path should point at the large directory needed for the application - any new files created in there and in any paths under it will be forced by smbd into upper case - but smbd will no longer have to scan the directory for names - it knows that if a file doesn't exist in upper case then it doesn't exist at all.

The secret to this is really in the "case sensitive = True" line - it tells smbd never to scan for case-insensitive versions of names. So if an application asks for a file called "FOO", and it can't be found by a simple stat call, then smbd will return file not found immediately without scanning the containing directory for a version of a different case. The other "xxx case xxx" lines make this work by forcing a consistent case on all files created by smbd.

Remember, all files and directories under the "path" directory must be in upper case with this smb.conf stanza as smbd won't be able to find lower case filenames with these settings.

Also note this is done on a per-share basis, allowing this to be set only for a share servicing an application with this problematic behaviour (using large numbers of entries in a directory) - the rest of your smbd shares don't need to be affected.

This makes smbd *much* faster when dealing with large directories. My test case has over 100,000 files and smbd now deals with this very efficiently.

So please give this a test if you have problems with Samba and large sized directories. Remember this is in SVN code only, it isn't in the 3.0.11 pre releases or rc candidates, as we need to ensure this new code is correct. If you can help me test it it'll be in 3.0.12 (security problems notwithstanding :-).

Cheers,

	Jeremy.