Tapina @ 2006-03-23 10:18:00
Command-line file searching
Just found an excellent article on using Mac OS X’s Spotlight file index from the command line. Basically, instead of the very slow:
find / -name '*Oracle*'
or the not-up-to-date:
locate Oracle
Both of these just work by name. If I don’t know for sure that there’s an Oracle in the name then I have to resort to:
grep -r Oracle /*
That sucks. But on Mac OS X you can just do:
mdfind Oracle
Now, I’ve done a lot of work with Oracle, so this still takes a while (5 seconds on a second run), but the biggest problem is that it returns 942 records, including text files, Excel spreadsheets and e-mail messages. I could pipe the results through grep to narrow them down by filename, but if I know I’m looking for a spreadsheet I can get a lot cleverer (and here’s where find and locate just don’t cut it).
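For the record, the grep approach might look something like the sketch below. The helper name only_spreadsheets is made up for illustration, and it assumes the spreadsheets I want all end in .xls, which is exactly the sort of name-based guesswork I’d rather avoid.

```shell
# Keep only paths ending in .xls (case-insensitive); reads paths on stdin.
# only_spreadsheets is a made-up helper name, not part of Spotlight.
only_spreadsheets() {
    grep -i '\.xls$'
}

# Only run the search where mdfind actually exists (i.e. on Mac OS X).
if command -v mdfind >/dev/null 2>&1; then
    mdfind Oracle | only_spreadsheets || true   # grep exits non-zero on no match
fi
```

This only looks at the path, so a spreadsheet saved without the extension would slip through.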
mdfind works with the whole file: not only its content but also metadata specific to the file format. So I can look for spreadsheets that I wrote. But how do I find out what the metadata fields are? The best way is by example, so I take an existing Excel file that I know I wrote and run it through mdls:
kMDItemAttributeChangeDate = 2006-03-21 17:56:41 +0000
kMDItemAuthors = ("Gareth Boden")
kMDItemContentCreationDate = 2006-01-20 07:12:01 +0000
kMDItemContentModificationDate = 2006-03-21 17:56:40 +0000
kMDItemContentType = "com.microsoft.excel.xls"
kMDItemContentTypeTree = ("com.microsoft.excel.xls", "public.data", "public.item")
...
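If all you want is the list of field names to query on, something like this sketch would trim the mdls output down. The function name md_attr_names is my own invention, and it assumes every attribute line starts with kMDItem, as in the listing above.

```shell
# Print just the attribute names from mdls output read on stdin.
# md_attr_names is a made-up helper; it assumes attribute lines
# begin with "kMDItem" and ignores everything else.
md_attr_names() {
    awk '/^kMDItem/ { print $1 }'
}

# Usage on Mac OS X (the path is a placeholder, substitute a real file):
#   mdls /path/to/some-spreadsheet.xls | md_attr_names
```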
OK, so now I revise my Spotlight search to look for spreadsheets I created in my home directory that mention Oracle:
mdfind -onlyin /Users/gareth "kMDItemAuthors = 'Gareth Boden' && kMDItemContentType = 'com.microsoft.excel.xls' && kMDItemTextContent = 'Oracle'"
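That query gets unwieldy on one line. A possible way to keep it readable is to build it from separate clauses, as in this sketch. md_and is a hypothetical helper of mine, not an mdfind feature; it just joins its arguments with &&.

```shell
# Join query clauses with " && "; each argument is one clause.
# md_and is a made-up helper name, not part of mdfind.
md_and() {
    local query="" clause
    for clause in "$@"; do
        if [ -z "$query" ]; then
            query="$clause"
        else
            query="$query && $clause"
        fi
    done
    printf '%s\n' "$query"
}

# The same three conditions as the one-liner above.
query=$(md_and \
    "kMDItemAuthors = 'Gareth Boden'" \
    "kMDItemContentType = 'com.microsoft.excel.xls'" \
    "kMDItemTextContent = 'Oracle'")

# Only run the search where mdfind actually exists (i.e. on Mac OS X).
if command -v mdfind >/dev/null 2>&1; then
    mdfind -onlyin "$HOME" "$query"
fi
```

Using $HOME rather than a hard-coded /Users/gareth is my assumption that the two are the same directory.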
Bingo! Which reminds me of another neat feature in Mac OS X—content types.
Not content with MIME types and file extensions and creator and file types, Apple decided to invent yet another classification scheme. But this time, they did a good job. Rather like MIME, it’s hierarchical (but to more than two levels, which is better). Unlike MIME (and like other smart ideas such as Java package names), it piggybacks on DNS registration to avoid the need for a central registry. Apple manage the public namespace (and the com.apple one) but the rest is up for grabs. From my example before, you can see that an Excel spreadsheet is also a public.data and a public.item content type. My AAC grab of Gorillaz’ DARE is:
"public.mpeg-4-audio",
"public.audio",
"public.audiovisual-content",
"public.data",
"public.item",
"public.content"
And I can use those to search for all MPEG 4 audio, all audio files, etc etc. Very cool.
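A search by content type might look like the sketch below. It assumes that a query against kMDItemContentTypeTree matches a file if any entry in its tree equals the given type, which is how it appears to behave; type_query is a name I made up.

```shell
# Build a query that matches any file whose content type tree
# contains the given type. type_query is a made-up helper name.
type_query() {
    printf "kMDItemContentTypeTree = '%s'\n" "$1"
}

# Only run the search where mdfind actually exists (i.e. on Mac OS X).
if command -v mdfind >/dev/null 2>&1; then
    # All audio files under the home directory, whatever the format.
    mdfind -onlyin "$HOME" "$(type_query public.audio)"
fi
```

Swapping public.audio for public.mpeg-4-audio would narrow it to MPEG 4 audio only.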
Is there anything like Spotlight for Linux? Is it part of Darwin’s open source stuff?