googleimagesdownload is dead. Long live bingimagesdownload…
How to download hundreds of images at once with a script and include them in your iOS project
For quite some time, I have been using a very handy tool called googleimagesdownload, created by Hardik Vasa. It lets you download multiple images from Google at once from the command line. For example, the following command downloads 10 cat images:
% googleimagesdownload -k cat -l 10
That worked… until a month ago, unfortunately 😢 (it is Mar 1, 2020 at the time of writing). Apparently, Google has changed the format of its image search results page, which made scraping hard. Now the above command mercilessly fails with a single line:
Unfortunately all 10 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!
Fortunately, a community member quickly created a similar tool that downloads images from Bing instead of Google. Today, I will
- summarise how to use the new variation of the tool, and
- explain, in a few short steps, how to convert the downloaded images into an iOS-friendly format.
The latter part is specific to iOS developers, but the idea applies to any software development that handles many images. Also, I have only confirmed the technique on Mac OS X, but it should work on any Linux-based system as well.
Download and use the script
Go to the gist and download the script. Save it somewhere convenient.
You might think you could now just use it like this:
% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -k cat -l 10
but it is not that easy… (at least not as of this writing).
You first need to go to Bing and run an image search, and then copy the URL from the address bar.
Then pass the copied URL to the script with the -u option, like this:
% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -u "https://www.bing.com/images/search?q=cat&search=&scope=images&form=QBLH&sp=-1&pq=ca&sc=8-2&qs=n&sk=&cvid=7D17745015D14FA3A9AEB35EE4248448" -l 10
Download more than 100 items
If you want to download more than 100 items, you need chromedriver. Download it and save it somewhere convenient. Then specify the executable with the -cd option, like this:
% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -u "https://www.bing.com/images/search?q=cat&search=&scope=images&form=QBLH&sp=-1&pq=ca&sc=8-2&qs=n&sk=&cvid=7D17745015D14FA3A9AEB35EE4248448" -l 300 -cd ${PATH_TO_THE_CHROMEDRIVER}
Now you get the same kind of results from Bing that you used to get from Google with googleimagesdownload.
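Typing the long command each time gets tedious, so here is a minimal bash sketch of a wrapper that adds -cd only when you ask for more than 100 images. The function name bing_download and the environment variables BING_SCRIPT and CHROMEDRIVER are hypothetical names made up for this sketch; the script itself only receives the -u, -l and -cd options shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around bing_images_download.py (all names here are assumptions):
#   BING_SCRIPT  - path to bing_images_download.py
#   CHROMEDRIVER - path to the chromedriver executable (only needed above 100 images)
bing_download() {
  local url=$1
  local limit=${2:-10}                  # number of images, defaults to 10
  local args=(-u "$url" -l "$limit")
  if (( limit > 100 )); then
    # More than 100 images requires chromedriver, passed with -cd
    args+=(-cd "${CHROMEDRIVER:?set CHROMEDRIVER to the chromedriver path}")
  fi
  python "${BING_SCRIPT:?set BING_SCRIPT to the path of bing_images_download.py}" "${args[@]}"
}
```

For example, after `% export BING_SCRIPT=${PATH_TO_THE_SCRIPT}/bing_images_download.py` and `% export CHROMEDRIVER=${PATH_TO_THE_CHROMEDRIVER}`, running `% bing_download "${COPIED_BING_URL}" 300` expands to the same command as above.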
Using the images for development
OK, the first part is done. Next, let's format the file names so they are convenient for iOS development.
Both googleimagesdownload and bing_images_download.py keep the original file names of the downloaded images and simply prepend a sequence number, like this:
1.Animals_Cats_Small_cat_005241_.jpg
2.funny-cat-pictures-047-001.jpg
3.Cat-05.jpg
4.o-CAT-ATTACK-facebook.jpg
5.Beautiful+Cats+Hd+Wallpapers_7.jpg
...
This is already good enough for many uses, such as machine learning training and testing. But when you use these images in an app, you often want to access them by a pattern, like this:
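import UIKit

// The downloaded files are numbered from 1 (1.jpg, 2.jpg, ...), so start at 1.
for i in 1...100 {
    let imageName = String(i)  // "1", "2", ...
    guard let image = UIImage(named: imageName) else { continue }  // nil if no image has this name
    // ... use image here
}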
In order to enable this, we need to tweak the file names a little bit.
Also, the downloaded images do not share the same dimensions. You can filter by a size group in the Bing search, but that will not give you exactly identical dimensions.
When you want to show the images in identical slots within a list or grid, it is most convenient to have a set of images with exactly the same dimensions.
Rename file names
First, let’s convert the file names from
1.Animals_Cats_Small_cat_005241_.jpg
2.funny-cat-pictures-047-001.jpg
3.Cat-05.jpg
4.o-CAT-ATTACK-facebook.jpg
5.Beautiful+Cats+Hd+Wallpapers_7.jpg
...
to
1.jpg
2.jpg
3.jpg
4.jpg
5.jpg
...
We can do this by running the following shell pipeline in the directory containing the images:
ls | sed -E 'p;s/([0-9]*)\..*\.(jpg|jpeg|png|JPG|JPEG|PNG)/\1.\2/' | xargs -n2 mv
When executed in the directory of the downloaded images, ls outputs the image names, and the sed command outputs a list of strings after applying a substring replacement. The sed part looks a little complex, so let's break it down.
- The -E option tells sed to use extended (modern) regular expressions.
- p; inside the single quotes prints the original line (p stands for "print"). sed then applies the substitution and automatically prints the modified line at the end of each cycle, so every file name produces two lines: the original name followed by the new one.
- The s/XXX/YYY/ part is the standard substitution syntax in sed: if sed finds XXX in the original string, it replaces the matched section with YYY.
- \1 and \2 are called backreferences. They refer to the sections matched by the parenthesised groups (...). So, in this case, \1 refers to the text matched by [0-9]* and \2 refers to the text matched by jpg|jpeg|png|JPG|JPEG|PNG. The \..*\. in between matches everything from the first dot up to the dot before the extension, which is the part that gets dropped.
With this information, you can see that the first part of the script
ls | sed -E 'p;s/([0-9]*)\..*\.(jpg|jpeg|png|JPG|JPEG|PNG)/\1.\2/'
outputs something like this:
1.Animals_Cats_Small_cat_005241_.jpg
1.jpg
10.cats.jpg
10.jpg
11.shocked_cat.jpg
11.jpg
...
Since xargs -n2 takes two items at a time, ... | xargs -n2 mv generates the following sequence of commands:
% mv 1.Animals_Cats_Small_cat_005241_.jpg 1.jpg
% mv 10.cats.jpg 10.jpg
% mv 11.shocked_cat.jpg 11.jpg
...
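Because mv overwrites files without asking, it can be worth previewing the renames first. Here is a minimal sketch (preview_renames is a hypothetical name): it runs the same ls | sed pipeline but puts echo in front of mv, so xargs prints each command instead of executing it.

```shell
#!/usr/bin/env bash
# Dry run of the rename pipeline: the ls | sed part is unchanged, but `echo mv`
# makes xargs print each "mv OLD NEW" command instead of running it.
# preview_renames is a hypothetical helper name.
preview_renames() {
  ls | sed -E 'p;s/([0-9]*)\..*\.(jpg|jpeg|png|JPG|JPEG|PNG)/\1.\2/' | xargs -n2 echo mv
}
```

Run `% preview_renames` in the image directory; if the printed commands look right, run the real pipeline.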
As a result, we get the desired file names:
1.jpg
2.jpg
3.jpg
4.jpg
5.jpg
...
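One caveat: xargs splits its input on whitespace, so if a downloaded name ever contains a space (say "5.Beautiful Cats.jpg"), it is broken into separate items and xargs -n2 pairs up the wrong arguments. A plain shell loop avoids this. The sketch below is a different technique from the sed pipeline above, written as a hypothetical bash function rename_by_number that keeps the same rule: the leading number plus a jpg/jpeg/png extension.

```shell
#!/usr/bin/env bash
# Alternative to the ls | sed | xargs pipeline: a plain bash loop that also copes
# with spaces in file names. rename_by_number is a hypothetical helper name.
rename_by_number() {
  local file num ext
  for file in *; do
    [ -f "$file" ] || continue                 # skip directories such as output/
    num=${file%%.*}                            # text before the first "."
    ext=${file##*.}                            # text after the last "."
    case $num in ''|*[!0-9]*) continue ;; esac # must start with "digits."
    case $ext in jpg|jpeg|png|JPG|JPEG|PNG) ;; *) continue ;; esac
    [ "$file" = "$num.$ext" ] && continue      # already renamed
    mv -- "$file" "$num.$ext"
  done
}
```

Run `% rename_by_number` in the image directory; files that do not start with a number or have another extension are left untouched.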
Resizing images
To manipulate images from the command line, you need to install imagemagick, a very powerful, de facto standard tool for editing images from the command line. If you didn't know about it until today, I recommend starting to use it now! Installation instructions for different environments are on the imagemagick website.
After you have installed imagemagick, create a directory for the resized images, because we don't want to mix them with the original images.
% mkdir output
Now, let's say we want 80pt x 80pt thumbnail images. An @3x image has three times as many pixels per side (80 x 3 = 240), so we can generate the @3x images with the following command:
ls -p | grep -v / | while read file; do convert $file -resize 240x240 -size 240x240 xc:none +swap -gravity center -composite "output/${file/./@3x.}"; done
This is another complex-looking shell script if you are not used to it, so let's go through it piece by piece.
ls -p | grep -v / is a handy trick to list the files in the directory while excluding subdirectories. The -p option appends a / to each directory name, and grep -v / excludes every line that contains a /. As a result, we get only the files, without directories such as output/.
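To see the filter in isolation, here is a tiny sketch (list_files is a hypothetical name) that you can run against any directory: regular files are listed, while a subdirectory such as output is dropped because ls -p prints it as output/.

```shell
#!/usr/bin/env bash
# List only the entries of a directory that are not directories.
# `ls -p` appends "/" to directory names; `grep -v /` drops every line containing "/".
# list_files is a hypothetical helper name; it defaults to the current directory.
list_files() {
  ls -p "${1:-.}" | grep -v /
}
```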
while read file; do ...; done iterates over the list, setting each item in a variable named file. The ... part is the heart of the script, where we manipulate each file:
convert $file -resize 240x240 -size 240x240 xc:none +swap -gravity center -composite "output/${file/./@3x.}"
convert is the imagemagick command for manipulating images. Each option means the following:
- -resize and -size look self-explanatory, but why do we need both? The -resize option resizes the image itself, keeping its aspect ratio so that it fits within 240x240. The -size option sets the size of a separate background image that imagemagick generates. Because not all images have the same aspect ratio as the specified dimensions, we need padding, and this background image provides it.
- xc:none creates that background image filled with a transparent color (it shows up as black in a JPEG, which has no transparency).
- +swap swaps the order of the two images, so that the background comes first and the resized image is placed on top of it. The uncovered area becomes letterbox padding.
- -gravity center sets where the original image is placed within the padded area. For example, with -gravity north the image is anchored to the top-center instead.
- -composite is needed because otherwise convert outputs the background (the black background) and the foreground image as separate images, which we don't want in most cases.
- The "output/${file/./@3x.}" part is a shell feature (parameter expansion) that generates a string based on a variable. ${var/str1/str2} means: if str1 is found in $var, replace its first occurrence with str2. So, when $file is "1.jpg", "output/${file/./@3x.}" expands to "output/1@3x.jpg".
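The same substitution can be tried on its own. The sketch below (scaled_name is a hypothetical helper) builds the file name for a given scale factor. Note that ${file/./...} replaces only the first dot, which is one reason to rename the files first: an unrenamed name like 1.Animals.jpg would become 1@3x.Animals.jpg.

```shell
#!/usr/bin/env bash
# Build the asset-catalog file name for a scale factor.
# ${file/./@3x.} replaces the FIRST "." in $file, so "1.jpg" becomes "1@3x.jpg".
# scaled_name is a hypothetical helper name.
scaled_name() {
  local file=$1 scale=$2
  if [ "$scale" -eq 1 ]; then
    printf '%s\n' "$file"                      # @1x images keep the plain name
  else
    printf '%s\n' "${file/./@${scale}x.}"
  fi
}
```

For example, `scaled_name 1.jpg 3` prints 1@3x.jpg and `scaled_name 12.png 2` prints 12@2x.png.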
Putting it all together,
ls -p | grep -v / | while read file; do convert $file -resize 240x240 -size 240x240 xc:none +swap -gravity center -composite "output/${file/./@3x.}"; done
outputs
output/1@3x.jpg
output/2@3x.jpg
output/3@3x.jpg
...
and these files are all 240px x 240px.
What remains is to generate the @2x and @1x images as well. Just follow the same pattern with different dimensions (160px for @2x, 80px for @1x) and destination names:
ls -p | grep -v / | while read file; do convert $file -resize 160x160 -size 160x160 xc:none +swap -gravity center -composite "output/${file/./@2x.}"; done
and
ls -p | grep -v / | while read file; do convert $file -resize 80x80 -size 80x80 xc:none +swap -gravity center -composite "output/${file}"; done
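Instead of running three nearly identical commands, you could loop over the scale factors. This is a minimal bash sketch, not the commands above verbatim: make_thumbnails is a hypothetical name, it iterates with a glob plus a -f test instead of ls -p | grep -v /, and it computes the pixel size as points x scale (80 x 1 = 80, 80 x 2 = 160, 80 x 3 = 240). It assumes imagemagick's convert is on your PATH and that the output directory already exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create @1x, @2x and @3x square thumbnails for every file in
# the current directory, using the same convert options as the commands above.
# Assumes `convert` (imagemagick) is installed and ./output already exists.
make_thumbnails() {
  local points=${1:-80}                        # thumbnail size in points
  local file scale px out
  for file in *; do
    [ -f "$file" ] || continue                 # skip directories such as output/
    for scale in 1 2 3; do
      px=$(( points * scale ))                 # pixels = points x scale factor
      if [ "$scale" -eq 1 ]; then
        out="output/$file"
      else
        out="output/${file/./@${scale}x.}"     # "1.jpg" -> "1@2x.jpg", "1@3x.jpg"
      fi
      convert "$file" -resize "${px}x${px}" -size "${px}x${px}" xc:none \
        +swap -gravity center -composite "$out"
    done
  done
}
```

Run `% make_thumbnails 80` in the directory with the renamed images.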
Copy images into the asset catalog
The rest is easy. Just drag and drop all the images in the output folder into Xcode's asset catalog! Xcode should automatically group them into image sets named 1, 2, 3, and so on, each with its @1x, @2x and @3x variants.
Summary
Steps to use bing_images_download.py:
1. Go to Bing and run an image search.
2. Copy the search URL from the address bar and run the script:
% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -u "https://www.bing.com/images/search?q=cat&search=&scope=images&form=QBLH&sp=-1&pq=ca&sc=8-2&qs=n&sk=&cvid=7D17745015D14FA3A9AEB35EE4248448" -l 10
3. If you need to download more than 100 images, download chromedriver and specify it with the -cd option:
% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -u "https://www.bing.com/images/search?q=cat&search=&scope=images&form=QBLH&sp=-1&pq=ca&sc=8-2&qs=n&sk=&cvid=7D17745015D14FA3A9AEB35EE4248448" -l 300 -cd ${PATH_TO_THE_CHROMEDRIVER}
Steps to reformat the downloaded files:
1. Rename the files to the [0-9]*.extension format:
% ls | sed -E 'p;s/([0-9]*)\..*\.(jpg|jpeg|png|JPG|JPEG|PNG)/\1.\2/' | xargs -n2 mv
2. Create an output directory for the resized images:
% mkdir output
3. Run imagemagick to generate the resized images:
% ls -p | grep -v / | while read file; do convert $file -resize 240x240 -size 240x240 xc:none +swap -gravity center -composite "output/${file/./@3x.}"; done
% ls -p | grep -v / | while read file; do convert $file -resize 160x160 -size 160x160 xc:none +swap -gravity center -composite "output/${file/./@2x.}"; done
% ls -p | grep -v / | while read file; do convert $file -resize 80x80 -size 80x80 xc:none +swap -gravity center -composite "output/${file}"; done
That’s it, hope you enjoyed!! Happy coding!!