The 773 Million Record “Collection #1” Data Breach

Collection #1 is a set of email addresses and passwords totalling 2,692,818,238 rows. It’s made up of many different individual data breaches from literally thousands of different sources. (And yes, fellow techies, that’s a sizeable amount more than a signed 32-bit integer can hold.)
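To put numbers on that aside: a signed 32-bit integer tops out at 2,147,483,647, so a row counter of that type would overflow on this dataset. An unsigned 32-bit counter, with a ceiling of 4,294,967,295, would still just fit. Python’s own integers never overflow, so the sketch below simulates two’s-complement wraparound with a small illustrative helper, `to_int32`, that is not from any particular library.

```python
INT32_MAX = 2**31 - 1    # 2,147,483,647: largest signed 32-bit value
UINT32_MAX = 2**32 - 1   # 4,294,967,295: largest unsigned 32-bit value
ROWS = 2_692_818_238     # total rows in Collection #1

def to_int32(n: int) -> int:
    """Reinterpret n as a signed 32-bit two's-complement value (wraps on overflow)."""
    return (n + 2**31) % 2**32 - 2**31

print(ROWS > INT32_MAX)    # True: too big for a signed 32-bit int
print(ROWS <= UINT32_MAX)  # True: still fits in an unsigned 32-bit int
print(to_int32(ROWS))      # -1602149058: what a wrapped int32 counter would report
```

A counter that silently goes negative like this is exactly the kind of bug that surfaces only once a dataset crosses the two-billion mark.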

In total, there are 1,160,253,228 unique combinations of email addresses and passwords. This count treats the password as case-sensitive but the email address as case-insensitive. It also includes some junk because, hackers being hackers, they don’t always neatly format their data dumps in an easily consumable fashion. (I found a mix of delimiter types including colons, semicolons and spaces, and indeed a mix of file types such as delimited text files, files containing SQL statements and other compressed archives.)
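A minimal sketch of that counting rule, assuming each line holds an email and a password separated by the first colon, semicolon or space. This is not the actual HIBP processing pipeline, and it ignores the SQL dumps and archives mentioned above; `parse_line` and `count_unique` are hypothetical helper names. The key design choice is that the email is lower-cased before deduplication while the password is left untouched, which is what makes the count case-insensitive on one side only.

```python
import re

# Assumption: each line is "email<delim>password" where <delim> is a colon,
# semicolon or space. Email addresses rarely contain these characters, so we
# split on the first one only and keep the rest of the line as the password.
DELIMITER = re.compile(r"[:; ]")

def parse_line(line: str):
    """Return (email, password), or None if the line doesn't look like a combo."""
    line = line.rstrip("\r\n")
    parts = DELIMITER.split(line, maxsplit=1)
    if len(parts) != 2 or "@" not in parts[0] or not parts[1]:
        return None  # junk: no delimiter, no email, or empty password
    return parts[0], parts[1]

def count_unique(lines):
    """Return (unique email/password combinations, unique email addresses)."""
    combos = set()
    emails = set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        email, password = parsed
        key_email = email.strip().lower()   # email: case-insensitive
        combos.add((key_email, password))   # password: case-sensitive
        emails.add(key_email)
    return len(combos), len(emails)

sample = [
    "Alice@Example.com:Secret1",   # colon-delimited
    "alice@example.com;Secret1",   # same combo, semicolon-delimited
    "alice@example.com Secret1!",  # space-delimited, different password
    "alice@example.com:secret1",   # differs only in password case: counted
    "bob@example.com:pa:ss",       # password itself contains a colon
    "not a combo line",            # junk
]
print(count_unique(sample))  # (4, 2)
```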

The unique email addresses totalled 772,904,991. This is the headline figure you’re seeing, as it’s the volume of data that has now been loaded into Have I Been Pwned (HIBP). It’s the count after as much clean-up as I could reasonably do since, per the previous paragraph, the source data was presented in a variety of different formats and levels of “cleanliness”. This number makes it the single largest breach ever to be loaded into HIBP.

There are 21,222,975 unique passwords. As with the email addresses, this was after implementing a bunch of rules to do as much clean-up as I could including stripping out passwords that were still in hashed form, ignoring strings that contained control characters and those that were obviously fragments of SQL statements. Regardless of best efforts, the end result is not perfect nor does it need to be. It’ll be 99.x% perfect though and that x% has very little bearing on the practical use of this data. And yes, they’re all now in Pwned Passwords, more on that soon.
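The exact clean-up rules aren’t spelled out here, so the patterns below are illustrative assumptions rather than the rules actually used; `keep_password` is a hypothetical helper. It approximates the three filters described: drop strings that look like common hash formats, strings containing control characters, and strings that are obviously SQL fragments.

```python
import re

# Hypothetical heuristics approximating the three rules described above.
HASH_PATTERNS = [
    re.compile(r"[0-9a-fA-F]{32}"),                     # MD5-length hex digest
    re.compile(r"[0-9a-fA-F]{40}"),                     # SHA-1-length hex digest
    re.compile(r"[0-9a-fA-F]{64}"),                     # SHA-256-length hex digest
    re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}"),  # bcrypt ($2a$/$2b$/$2y$)
]
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
SQL_FRAGMENT = re.compile(
    r"\b(?:INSERT\s+INTO|VALUES\s*\(|CREATE\s+TABLE|DROP\s+TABLE)", re.IGNORECASE
)

def keep_password(pw: str) -> bool:
    """Return False for strings that look like hashes, control junk or SQL."""
    if any(p.fullmatch(pw) for p in HASH_PATTERNS):
        return False  # still in hashed form
    if CONTROL_CHARS.search(pw):
        return False  # contains control characters
    if SQL_FRAGMENT.search(pw):
        return False  # obviously a fragment of a SQL statement
    return True

print(keep_password("hunter2"))                           # True
print(keep_password("5f4dcc3b5aa765d61d8327deb882cf99"))  # False
```

Heuristics like these trade a few false positives (a genuine 32-character hex password would be dropped) for a much cleaner corpus, which fits the “99.x% perfect” goal.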

Those are the numbers; let’s move on to where the data has actually come from.

Data Origins

Last week, multiple people reached out and directed me to a large collection of files on the popular cloud service, MEGA (the data has since been removed from the service). The collection totalled over 12,000 separate files and more than 87GB of data. One of my contacts pointed me to a popular hacking forum where the data was being socialised, complete with the following image:

As you can see at the top left of the image, the root folder is called “Collection #1” hence the name I’ve given this breach. The expanded folders and file listing give you a bit of a sense of the nature of the data (I’ll come back to the word “combo” later), and as you can see, it’s (allegedly) from many different sources. The post on the forum referenced “a collection of 2000+ dehashed databases and Combos stored by topic” and provided a directory listing of 2,890 of the files which I’ve reproduced here. This gives you a sense of the origins of the data but again, I need to stress “allegedly”. I’ve written before about what’s involved in verifying data breaches and it’s often a non-trivial exercise. Whilst there are many legitimate breaches that I recognise in that list, that’s the extent of my verification efforts and it’s entirely possible that some of them refer to services that haven’t actually been involved in a data breach at all.

However, what I can say is that my own personal data is in there and it’s accurate: the right email address and a password I used many years ago. Like many of you reading this, I’ve been in multiple data breaches before which have resulted in my email addresses and yes, my passwords, circulating in public. Fortunately, they’re only passwords that are no longer in use, but I still feel the same sense of dismay that many people reading this will when I see them pop up again. They’re also ones that were stored as cryptographic hashes in the source data breaches (at least the ones that I’ve personally seen and verified), but per the quoted sentence above, the data contains “dehashed” passwords which have been cracked and converted back to plain text. (There’s an entirely different technical discussion about what makes a good hashing algorithm and why the likes of salted SHA1 are as good as useless.) In short, if you’re in this breach, one or more passwords you’ve previously used are floating around for others to see.
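To illustrate why a salt alone doesn’t save a weak password: the salt is stored right next to the hash, so an attacker simply hashes each guess with it. A salt only defeats precomputed lookup tables; it does nothing to slow down a dictionary attack, and SHA-1 is designed to be fast. The sketch below assumes one common construction, SHA-1 over salt followed by password (real sites varied), and uses hypothetical helper names. Deliberately slow algorithms such as bcrypt, scrypt, Argon2 or high-iteration PBKDF2 are the usual remedy.

```python
import hashlib
import os

def salted_sha1(salt: bytes, password: str) -> str:
    # Assumed construction: SHA-1(salt || password), hex-encoded.
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()

def crack(salt: bytes, target_hex: str, wordlist):
    """Dictionary attack: hash every candidate with the known salt and compare."""
    for candidate in wordlist:
        if salted_sha1(salt, candidate) == target_hex:
            return candidate  # "dehashed" back to plain text
    return None

salt = os.urandom(16)                   # random salt, stored alongside the hash
stored = salted_sha1(salt, "letmein")   # what a breached database would hold
wordlist = ["123456", "password", "qwerty", "letmein", "dragon"]
print(crack(salt, stored, wordlist))    # letmein
```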

So that’s where the data has come from; now let me talk about how to assess your own personal exposure.

Japan satellite blasts into space to deliver artificial meteors

A rocket carrying a satellite on a mission to deliver the world’s first artificial meteor shower blasted into space on Friday, Japanese scientists said.

A start-up based in Tokyo developed the micro-satellite for the celestial show over Hiroshima early next year as the initial experiment for what it calls a “shooting stars on demand” service.

The satellite is to release tiny balls that glow brightly as they hurtle through the atmosphere, simulating a meteor shower.

It hitched a ride on the small-size Epsilon-4 rocket that was launched from the Uchinoura space centre by the Japan Aerospace Exploration Agency (JAXA) on Friday morning.

[…]

The company ALE Co. Ltd plans to deliver its first out-of-this-world show over Hiroshima in the spring of 2020.

Lena Okajima, CEO of space technology venture ALE, is hoping to deliver shooting stars on demand and choreograph the cosmos

The satellite launched Friday carries 400 tiny balls whose chemical formula is a closely guarded secret.

That should be enough for 20-30 events, as one shower will involve up to 20 stars, according to the company.

ALE’s satellite, released 500 kilometres (310 miles) above the Earth, will gradually descend to 400 kilometres over the coming year as it orbits the Earth.

Worldwide meteor shower shows

The company plans to launch a second satellite on a private-sector rocket in mid-2019.

ALE says it is targeting “the whole world” with its products and plans to build a stockpile of shooting stars in space that can be delivered across the world.

When its two satellites are in orbit, they can be used separately or in tandem, and will be programmed to eject the balls at the right location, speed and direction to put on a show for viewers on the ground.

Tinkering with the ingredients in the balls should mean that it is possible to change the colours they glow, offering the possibility of a multi-coloured flotilla of shooting stars.

Read more at: https://phys.org/news/2019-01-japan-satellite-blasts-space-artificial.html#jCp

Source: Japan satellite blasts into space to deliver artificial meteors

Watch an AI robot program itself to pick things up and push them around

Robots normally need to be programmed in order to get them to perform a particular task, but they can be coaxed into writing the instructions themselves with the help of machine learning, according to research published in Science.

Engineers at Vicarious AI, a robotics startup based in California, USA, have built what they call a “visual cognitive computer” (VCC), a software platform connected to a camera system and a robot gripper. Given a set of visual clues, the VCC writes a short program of instructions to be followed by the robot so it knows how to move its gripper to do simple tasks.

“Humans are good at inferring the concepts conveyed in a pair of images and then applying them in a completely different setting,” the paper states.

“The human-inferred concepts are at a sufficiently high level to be effortlessly applied in situations that look very different, a capacity so natural that it is used by IKEA and LEGO to make language-independent assembly instructions.”

Don’t get your hopes up, however: these robots can’t put your flat-pack table or chair together for you quite yet. But they can do very basic jobs, like moving a block backwards and forwards.

It works like this. First, an input and output image are given to the system. The input image is a jumble of colored objects of various shapes and sizes, and the output image is an ordered arrangement of the objects. For example, the input image could be a number of red blocks and the output image is all the red blocks ordered to form a circle. Think of it a bit like a before and after image.

The VCC works out what commands the robot needs to perform to organise the objects in front of it, based on the difference between the ‘before’ and ‘after’ images. The system learns which action corresponds to which command through supervised learning.
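The VCC’s actual commands and learning method aren’t reproduced in this excerpt, so the sketch below swaps in a much simpler technique: brute-force enumeration of short programs over a handful of hypothetical primitives (`shift_up`, `align_left` and so on), keeping the shortest program that turns every ‘before’ scene into its ‘after’ scene. It only shows the core idea of inferring a reusable program from example pairs and applying it to a new scene; it is not how Vicarious does it.

```python
from itertools import product

# A toy scene is a tuple of (x, y) grid positions, one per object.

def shift(dx, dy):
    """Build a command that moves every object by (dx, dy)."""
    def command(scene):
        return tuple((x + dx, y + dy) for x, y in scene)
    return command

def align_left(scene):
    """Push every object to the left edge (x = 0), keeping its row."""
    return tuple((0, y) for _, y in scene)

def align_bottom(scene):
    """Drop every object to the bottom edge (y = 0), keeping its column."""
    return tuple((x, 0) for x, _ in scene)

# Hypothetical command set; the real VCC's 24 commands are not listed here.
COMMANDS = {
    "shift_left": shift(-1, 0),
    "shift_right": shift(1, 0),
    "shift_up": shift(0, 1),
    "shift_down": shift(0, -1),
    "align_left": align_left,
    "align_bottom": align_bottom,
}

def run(program, scene):
    """Execute a list of command names in order."""
    for name in program:
        scene = COMMANDS[name](scene)
    return scene

def induce(examples, max_len=3):
    """Return the shortest program consistent with every (before, after) pair."""
    for length in range(max_len + 1):
        for program in product(COMMANDS, repeat=length):
            if all(run(program, before) == after for before, after in examples):
                return list(program)
    return None

# One training pair: two blocks moved to the left edge and up one row.
examples = [(((2, 3), (5, 1)), ((0, 4), (0, 2)))]
program = induce(examples)
print(program)                  # ['shift_up', 'align_left']
print(run(program, ((7, 0),)))  # ((0, 1),): the inferred concept on a new scene
```

Enumeration blows up exponentially with program length, which is one reason a learned model that proposes likely commands matters once the command set grows.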

Dileep George, cofounder of Vicarious, explained to The Register, “up to ten pairs [of images are used] for training, and ten pairs for testing. Most concepts are learned with only about five examples.”

Here’s a diagram of how it works:

A: A graph describing the robot’s components. B: The list of commands the VCC can use. Image credit: Vicarious AI

The left-hand side is a schematic of all the different parts that control the robot. The visual hierarchy looks at the objects in front of the camera and categorizes them by shape and colour. The attention controller decides which objects to focus on, whilst the fixation controller directs the robot’s gaze to those objects before the hand controller operates the robot’s arms to move them about.

The robot doesn’t need many training examples because the VCC controller has only 24 commands to choose from, listed on the right-hand side of the diagram.

Source: Watch an AI robot program itself to, er, pick things up and push them around • The Register

NL judge says doc’s official warning needs removing from Google

An official warning issued by the Dutch doctors’ guild to a practising doctor must be removed from Google’s search results: the judge ruled that the doctor’s privacy outweighs the public benefit of people being warned that this doctor has in some way misbehaved.

As a result of this landmark case, a whole line of doctors are now requesting to be removed from Google.

Link is in Dutch.

Source: Google moet berispte arts verwijderen uit zoekmachine (Google must remove reprimanded doctor from search engine) | TROUW