Stripping out illegal characters from filenames

Joined: Dec 20 2003 - 10:38
Posts: 851

Does anybody have a script *that actually works* that I can run on my OS X machine to traverse a hierarchy of folders, find all the files with naughty characters in their filenames that Linux doesn't like, and rename those files to remove said naughty characters? I'm trying to copy roughly 40,000 files to my new Linux-based server via Samba and it keeps barfing on bad filenames.

I've already searched for scripts online and none of the ones I've downloaded actually worked.

Alternatively, does anybody know how to set a Linux Samba server to automatically remove illegal characters from filenames when someone tries to copy files onto the server from a remote client? This would be a better solution if possible.

Thanks

Eudimorphodon
Joined: Dec 21 2003 - 14:14
Posts: 1207
Lovely.

If this is a one-time file transfer, would it be the better part of valor to use an alternate file transfer protocol, like rsync-over-ssh, or possibly NFS? (Or Netatalk, I suppose, but I've found it pretty fragile lately talking to 10.4.)

I'm pretty sure Samba can translate illegal filenames to "safe" combinations when *sharing* a file that already resides on the server, but I don't think you can configure it to accept a file creation request containing bad characters. (Samba imitates Windows by design, and the same filenames would break a Windows machine.)

No denying it would be nice if the SMB filesystem mount shim in OS X would take care of that for you.

--Peace

Joined: Dec 20 2003 - 10:38
Posts: 851
For the most part, this is a one-time file transfer, but this issue could easily pop up in the future whenever a Mac user makes a contribution to my archive.

Also, using another protocol wouldn't work because the characters are illegal for the destination filesystem.

Eudimorphodon
Joined: Dec 21 2003 - 14:14
Posts: 1207
"Also, using another protocol wouldn't work because the characters are illegal for the destination filesystem"

I'm pretty sure you can use almost *any* UTF-8 character in a Linux filename (only "/" and the NUL byte are off-limits). The problem is that such characters break shell expansions, pipes, and so on.

(Example, try this:

# touch " This is an evil filename:<> ! * # ; ?"

Linux is perfectly happy to make it. Try copying it to a Samba server, however, and:

smb: \> mput *
Put file This is an evil filename:<> ! * # ; ?? y
NT_STATUS_OBJECT_NAME_INVALID opening remote file \ This is an evil filename:<>! * # ; ?

This is with smbclient, to show the error, but it doesn't work if you do this either:

# mount -t smbfs -o username=whoever //otherserver/share /mnt
# cp \ This\ is\ an\ evil\ filename\:\<\>\ \!\ \*\ #\ \;\ \? /mnt/
cp: cannot create regular file `/mnt/ This is an evil filename:<> ! * # ; ?': No such file or directory

Woot.)

Unfortunately, pretty much the only ongoing solution is to slap users who try to contribute files with non-Windows-compatible filenames, if Samba's your filesharing poison of choice.
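
For a gentler option than slapping, something like the following could flag offending names before they get anywhere near the share. This is only a rough sketch in Python: the character set is my assumption of what Windows-style servers reject (the usual reserved punctuation plus control characters), names ending in a space or dot are included because they tend to cause trouble too, and `is_windows_safe` and `find_unsafe` are made-up helper names, not part of Samba or any library.

```python
import os

# Assumed set of characters a Windows-style (SMB) server refuses in a name.
# Control characters (code points 0-31) are checked separately below.
WINDOWS_ILLEGAL = set('<>:"/\\|?*')


def is_windows_safe(name):
    """Return True if a single file or folder name looks Windows-compatible."""
    if not name or name in (".", ".."):
        return False
    if any(ch in WINDOWS_ILLEGAL or ord(ch) < 32 for ch in name):
        return False
    # Names ending in a space or a dot are treated as unsafe as well.
    return not name.endswith((" ", "."))


def find_unsafe(root):
    """Yield the path of every file or folder under root that fails the check."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_windows_safe(name):
                yield os.path.join(dirpath, name)
```

Looping over `find_unsafe("/path/to/incoming")` (a placeholder path) and printing each result gives a list of names to fix or send back. It only reports; it doesn't rename anything.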

For reference, here's my favorite way of transferring huge wads of files between unixoid systems:

# cd (parent directory of what you want to transfer)
# tar -cf - * | ssh root@remotehost "( cd destdir && tar -xpvf - )"

Works like a charm. The only Mac filename character I think it *might* choke on is an embedded carriage return.

--Peace

Joined: Dec 20 2003 - 10:38
Posts: 851
Well, the funny thing is, I'm actually copying the files from my Mac to my Linux share with Apple's SMB client, and it refuses to copy the files due to illegal filenames.

However, if I get a script that can traverse the directory on my mac and fix the filenames, then I can process the batches I receive from others before uploading. None of my users will have actual write permissions to this archive so I have to touch the files first anyway.
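
A script along those lines might look roughly like the sketch below, written in Python. Treat it as a starting point under stated assumptions, not a tested tool for this archive: the set of "bad" characters is assumed to be the Windows-reserved punctuation plus control characters, the replacement character `_` is arbitrary, and `clean_name`, `unique_name`, and `sanitize_tree` are hypothetical helper names. It walks the tree bottom-up so files are renamed before the folders that contain them, and it defaults to a dry run that changes nothing.

```python
import os
import re

# Assumed set of troublesome characters: Windows-reserved punctuation
# plus control characters. Adjust to taste.
BAD_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def clean_name(name, replacement="_"):
    """Replace bad characters, then trim surrounding whitespace and trailing dots."""
    cleaned = BAD_CHARS.sub(replacement, name)
    cleaned = cleaned.strip().rstrip(". ")
    return cleaned or replacement


def unique_name(directory, name):
    """Append -1, -2, ... before the extension until the name is unused."""
    base, ext = os.path.splitext(name)
    candidate = name
    counter = 1
    while os.path.lexists(os.path.join(directory, candidate)):
        candidate = "%s-%d%s" % (base, counter, ext)
        counter += 1
    return candidate


def sanitize_tree(root, dry_run=True):
    """Rename offending files and folders under root; return (old, new) pairs.

    With dry_run=True nothing is renamed, so the list is only a preview.
    """
    renamed = []
    # topdown=False visits children first, so renaming a folder never
    # invalidates a path that is still waiting to be processed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new_name = clean_name(name)
            if new_name == name:
                continue
            new_name = unique_name(dirpath, new_name)
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if not dry_run:
                os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Calling `sanitize_tree("/path/to/batch")` (a placeholder path) returns the planned renames for review; passing `dry_run=False` performs them. One caveat: in a dry run, two different bad names that clean up to the same result are not detected as a collision, because nothing has been renamed yet. Try it on a copy of the data first.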
