googleimagesdownload is dead. Long live bingimagesdownload…

Yuichi Fujiki
8 min read · Mar 2, 2020

How you can download hundreds of images via script at once + include them in your iOS project

For quite some time, I have been using a very handy tool created by Hardik Vasa called googleimagesdownload. This tool lets you download multiple images from Google at once from the command line. For example, the following command downloads 10 cat images.

% googleimagesdownload -k cat -l 10

That was… until a month ago, unfortunately 😢 (it is Mar 1, 2020 at the time of writing). Apparently, Google changed the format of the Google Images search results page, and scraping became hard. Now the above command mercilessly fails with a single line:

Unfortunately all 10 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

However, fortunately, a community member has quickly created a similar tool which downloads images from Bing, instead of Google. Today, I will

  • summarise how you can use the new variation of the tool, and
  • also explain how you can convert the downloaded images into an iOS-friendly format in a few short steps.

The latter part is aimed at iOS developers, but the idea could apply to any software development that handles multiple images. Also, I have confirmed the technique on Mac OS X only, but it should work on Linux-based systems as well.

Download and use the script

Go to the gist and download the script. Save it somewhere convenient.

Now you would think you can just use it like

% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -k cat -l 10

but it is not that easy… (at least as of now)

You first need to go to Bing and run the search yourself.

Search for “cat” in bing.com/images

And then copy the URL from the address bar.

Address bar after the search

Then we can specify the copied URL as a parameter to the script like this:

% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -u "https://www.bing.com/images/search?q=cat&search=&scope=images&form=QBLH&sp=-1&pq=ca&sc=8-2&qs=n&sk=&cvid=7D17745015D14FA3A9AEB35EE4248448" -l 10

Download more than 100 items

If you want to download more than 100 items, you need to use chromedriver. Download it from here and save it somewhere convenient. Then you can specify the executable with the “-cd” option like this:

% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -u "https://www.bing.com/images/search?q=cat&search=&scope=images&form=QBLH&sp=-1&pq=ca&sc=8-2&qs=n&sk=&cvid=7D17745015D14FA3A9AEB35EE4248448" -l 300 -cd ${PATH_TO_THE_CHROMEDRIVER}

Now, you can get the exact same type of results from Bing as you used to get from Google via googleimagesdownload .

Using the images for development

OK, the first part is over. Next, we make the file names convenient for iOS development.

Both googleimagesdownload and bing_images_download.py keep the original file names of the downloaded images and only prepend a sequence number, like this:

1.Animals_Cats_Small_cat_005241_.jpg
2.funny-cat-pictures-047-001.jpg
3.Cat-05.jpg
4.o-CAT-ATTACK-facebook.jpg
5.Beautiful+Cats+Hd+Wallpapers_7.jpg
...

This is already good enough for many situations, such as machine learning training/testing. But if you are using these images for software development, you often want to access them by a pattern, like

for i in 1...100 {
    // The renamed files start at 1.jpg, so the index starts at 1
    let imageName = String(format: "%d", i)
    let image = UIImage(named: imageName)
    ...
}

In order to enable this, we need to tweak the file names a little bit.

Also, the downloaded images do not have uniform dimensions. You can specify a size group in the Bing search, but that will not yield exactly the same dimensions.

Specify the image size in Bing search. This will only define size “groups”, not the exact size.

In many cases where we want to show the images in exactly the same place within a list or grid, it is most convenient to have a set of images with exactly the same dimensions.

Rename file names

First, let’s convert the file names from

1.Animals_Cats_Small_cat_005241_.jpg
2.funny-cat-pictures-047-001.jpg
3.Cat-05.jpg
4.o-CAT-ATTACK-facebook.jpg
5.Beautiful+Cats+Hd+Wallpapers_7.jpg
...

to

1.jpg
2.jpg
3.jpg
4.jpg
5.jpg
...

We can achieve this by executing a piped shell script in the directory with images :

ls | sed -E 'p;s/([0-9]*)\..*\.(jpg|jpeg|png|JPG|JPEG|PNG)/\1.\2/' | xargs -n2 mv

When executed in the directory of the downloaded images, ls outputs the image names and the sed command outputs a list of strings after applying a substring replacement. The sed part looks a little complex, so let’s break it down.

  • The -E option tells sed to use extended (modern) regular expressions.
  • p; inside the single quotes is sed's print command: it outputs the original line as-is. So, with this, each original string is output right before its replaced version.
  • The s/XXX/YYY/ part is the standard substitution format in sed . It means that if sed finds XXX in the original string, it replaces the matched section with YYY. \1 and \2 are called backreferences. They refer to the sections matched by the parenthesised groups (...). So, in this case, \1 refers to the [0-9]* part and \2 refers to the jpg|jpeg|png|JPG|JPEG|PNG part.

With this information, hopefully you understand that the first part of the script

ls | sed -E 'p;s/([0-9]*)\..*\.(jpg|jpeg|png|JPG|JPEG|PNG)/\1.\2/'

outputs something like this :

1.Animals_Cats_Small_cat_005241_.jpg
1.jpg
10.cats.jpg
10.jpg
11.shocked_cat.jpg
11.jpg
...

Piping the list to xargs -n2 takes two items at a time, so ... | xargs -n2 mv generates the following sequence of commands:

% mv 1.Animals_Cats_Small_cat_005241_.jpg 1.jpg
% mv 10.cats.jpg 10.jpg
% mv 11.shocked_cat.jpg 11.jpg
...
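
Before renaming anything for real, it can help to preview the commands. The sketch below swaps mv for echo mv, so the pipeline only prints what it would do. The function name preview_renames is made up for this illustration, and, like the original one-liner, it assumes file names without spaces or quotes.

```shell
# Print the mv commands the rename pipeline would run, without renaming anything.
# preview_renames is a hypothetical helper name, not part of any tool.
preview_renames() {
  sed -E 'p;s/([0-9]*)\..*\.(jpg|jpeg|png|JPG|JPEG|PNG)/\1.\2/' | xargs -n2 echo mv
}

# Feed it two sample names instead of a real directory listing.
printf '%s\n' '1.Animals_Cats_Small_cat_005241_.jpg' '10.cats.jpg' | preview_renames
# prints:
# mv 1.Animals_Cats_Small_cat_005241_.jpg 1.jpg
# mv 10.cats.jpg 10.jpg
```

In the image directory you would run ls | preview_renames and, once the output looks right, switch back to the real mv version.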

As a result, we get the desired output:

1.jpg
2.jpg
3.jpg
4.jpg
5.jpg
...

Resizing images

To manipulate images from the command line, you need to install imagemagick. It is a very powerful, de facto standard tool for editing images from the command line. So if you didn’t know about it yesterday, I recommend you start using it today! You can find installation instructions for different environments on the website.

After you have installed imagemagick , let’s make a directory for the resized images, because we don’t want to contaminate the original images.

% mkdir output

Now, let’s say we want 80pt × 80pt thumbnail images. We can use the following command to generate the 3x images:

ls -p | grep -v / | while read file; do convert $file -resize 240x240 -size 240x240 xc:none +swap -gravity center -composite "output/${file/./@3x.}"; done

This is another complex-looking shell script if you are not used to it, but let’s go through it piece by piece.

ls -p | grep -v / is a trick to list the files in the directory while excluding (sub)directories. The -p option appends a / to the end of each directory name, and grep -v / excludes any line that contains / . So, as a result, we get the listing without directories.
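
To see the filter in action without touching real images, here is a small sketch that builds a throwaway directory and lists only its files. The helper name list_files_only and the temporary directory are made up for this illustration.

```shell
# List only the regular entries of a directory, skipping subdirectories.
# list_files_only is a hypothetical helper name used for this demo.
list_files_only() {
  # ls -p marks directories with a trailing "/"; grep -v / drops those lines
  (cd "$1" && ls -p | grep -v /)
}

# Throwaway directory with two files and one subdirectory
demo_dir=$(mktemp -d)
touch "$demo_dir/1.jpg" "$demo_dir/2.jpg"
mkdir "$demo_dir/output"

list_files_only "$demo_dir"   # prints 1.jpg and 2.jpg, but not output/

rm -r "$demo_dir"
```

This is why the resize loop can safely run in a directory that already contains the output folder.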

while read file; do ...; done

reads the list line by line and stores each item in a variable named file. The ... part is the heart of the script, where we do the image manipulation on each file:

convert $file -resize 240x240 -size 240x240 xc:none +swap -gravity center -composite "output/${file/./@3x.}"

convert is the imagemagick command for manipulating images. Each option means the following.

  • -resize and -size options are self-explanatory, but why do we need both? The -resize option resizes the image itself. The -size option sets the size of a separate background image. Because not all images have the same aspect ratio as the specified dimensions, we need to apply padding. To achieve the padding, imagemagick generates a separate background image of the specified size.
  • xc:none creates the aforementioned background image with a transparent color (it appears black in formats without transparency, such as JPEG). The +swap option swaps the last two images so that the background comes first and the resized image ends up on top of it, which produces the letterbox padding. Like this:
There is a letterbox at the top/bottom of the original image to fill the specified dimensions
  • -gravity center option sets where the original image is placed when there is letterbox padding. For example, you get this when you specify -gravity north.
With -gravity north option, the foreground image is located at the very top.
  • -composite option is needed. Otherwise, convert will output the background image and the foreground image separately. We don’t want this in most cases.
  • "output/${file/./@3x.}" part is a shell feature that generates a string based on the specified variable. ${var/str1/str2} means that if str1 is found in $var, its first occurrence is replaced with str2 . So, when $file refers to "1.jpg" , "output/${file/./@3x.}" produces "output/1@3x.jpg"
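
The ${var/str1/str2} form works in bash and zsh but is not part of plain POSIX sh. As a sketch of the same file name mapping that should also run in a stricter shell, the snippet below splits the name at its first dot instead. The function name scaled_name is made up for this illustration.

```shell
# scaled_name is a hypothetical helper: it inserts "@<scale>" before the first dot.
scaled_name() {
  file=$1
  scale=$2
  # ${file%%.*} keeps everything before the first dot,
  # ${file#*.} keeps everything after it.
  printf '%s@%s.%s\n' "${file%%.*}" "$scale" "${file#*.}"
}

scaled_name 1.jpg 3x    # prints 1@3x.jpg
scaled_name 10.png 2x   # prints 10@2x.png
```

For names with a single dot, such as the renamed 1.jpg, this gives the same result as ${file/./@3x.}.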

Now, I hope you understand

ls -p | grep -v / | while read file; do convert $file -resize 240x240 -size 240x240 xc:none +swap -gravity center -composite "output/${file/./@3x.}"; done

will output

output/1@3x.jpg
output/2@3x.jpg
output/3@3x.jpg
...

and these files are all 240px * 240px.

The rest is to generate @2x/@1x images as well. You just follow the same pattern with different dimensions and destinations.

ls -p | grep -v / | while read file; do convert $file -resize 160x160 -size 160x160 xc:none +swap -gravity center -composite "output/${file/./@2x.}"; done

and

ls -p | grep -v / | while read file; do convert $file -resize 80x80 -size 80x80 xc:none +swap -gravity center -composite "output/${file}"; done
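
If you prefer one pass over three separate commands, the loop can be sketched as a small set of shell functions. This is only an illustration under a few assumptions: the names pixels_for, output_name and resize_all are made up here, imagemagick's convert is installed, and the output directory already exists.

```shell
# Pixel size for a given point size and scale factor (80pt at 3x is 240px).
pixels_for() {
  echo $(( $1 * $2 ))
}

# Output name for a scale: 1x keeps the name, 2x/3x get an @2x/@3x suffix.
output_name() {
  case $2 in
    1) echo "$1" ;;
    *) echo "${1%%.*}@${2}x.${1#*.}" ;;
  esac
}

# Generate 1x/2x/3x variants of every regular file in the current directory.
# Assumes convert (imagemagick) is installed and ./output exists.
resize_all() {
  points=$1
  for file in *; do
    [ -f "$file" ] || continue   # skip directories such as output/
    for scale in 1 2 3; do
      px=$(pixels_for "$points" "$scale")
      convert "$file" -resize "${px}x${px}" -size "${px}x${px}" xc:none +swap \
        -gravity center -composite "output/$(output_name "$file" "$scale")"
    done
  done
}

# Usage, from the directory with the renamed images:
#   resize_all 80
```

Quoting "$file" also keeps the loop working if a file name happens to contain spaces.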

Copy images into the asset catalog

The rest is easy. Just drag and drop all the images in the output folder to Xcode’s asset catalog! You should automatically see something like this :

Asset catalog after copying all the images (I didn’t generate 1x image for this)

Summary

Steps to use bing_images_download.py:

  1. Go to Bing and run an image search
  2. Copy the search URL from the address bar and pass it to the script
% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -u "https://www.bing.com/images/search?q=cat&search=&scope=images&form=QBLH&sp=-1&pq=ca&sc=8-2&qs=n&sk=&cvid=7D17745015D14FA3A9AEB35EE4248448" -l 10

  3. If you need to download more than 100 images, download chromedriver and specify it with the -cd option.

% python ${PATH_TO_THE_SCRIPT}/bing_images_download.py -u "https://www.bing.com/images/search?q=cat&search=&scope=images&form=QBLH&sp=-1&pq=ca&sc=8-2&qs=n&sk=&cvid=7D17745015D14FA3A9AEB35EE4248448" -l 300 -cd ${PATH_TO_THE_CHROMEDRIVER}

Steps to reformat the downloaded files:

  1. Rename files to [0-9]*.extension format.
% ls | sed -E 'p;s/([0-9]*)\..*\.(jpg|jpeg|png|JPG|JPEG|PNG)/\1.\2/' | xargs -n2 mv

  2. Create output directory for resized images

% mkdir output

  3. Run imagemagick to generate resized images

% ls -p | grep -v / | while read file; do convert $file -resize 240x240 -size 240x240 xc:none +swap -gravity center -composite "output/${file/./@3x.}"; done
% ls -p | grep -v / | while read file; do convert $file -resize 160x160 -size 160x160 xc:none +swap -gravity center -composite "output/${file/./@2x.}"; done
% ls -p | grep -v / | while read file; do convert $file -resize 80x80 -size 80x80 xc:none +swap -gravity center -composite "output/${file}"; done

That’s it, hope you enjoyed!! Happy coding!!

Yuichi Fujiki

Technical director, Freelance developer, a Dad, a Quadriplegic, Life of Rehab