Socialarks leaks 400GB of scraped data, exposing 200+ million Facebook, Instagram and LinkedIn users. Again.

High-flying and rapidly growing Chinese social media management company Socialarks has suffered a huge data leak, exposing over 400GB of personal data, including that of several high-profile celebrities and social media influencers.

The company’s unsecured ElasticSearch database contained personally identifiable information (PII) from at least 214 million social media users around the world, drawn from both popular consumer platforms such as Facebook and Instagram and professional networks such as LinkedIn.

The Elastic instance was discovered as part of Safety Detectives’ cybersecurity mission to uncover online vulnerabilities that could pose risks to the general public. Once the owner of the data is identified, our team informs the affected parties as soon as possible to mitigate the risk of any cybersecurity breaches and server leaks.

In Socialarks’ case, our team found the ElasticSearch server to be publicly exposed, without password protection or encryption, during routine IP address checks on potentially unsecured databases.

The lack of any security controls on the company’s server meant that anyone in possession of the server’s IP address could have accessed a database containing millions of people’s private information.
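For administrators who want to confirm that their own deployment is not exposed in the same way, the check can be sketched roughly as follows. This is only an illustrative sketch, not the method our team used: the function names are hypothetical, and it assumes the common behaviour where an instance without authentication answers a plain HTTP request to its root URL with status 200, while an instance with security enabled answers 401 or 403. It should only be run against servers you operate.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code from the root endpoint to a rough verdict."""
    if status == 200:
        return "open"  # answered without asking for credentials
    if status in (401, 403):
        return "auth-required"  # credentials are being enforced
    return "unknown"


def check_own_instance(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to base_url and classify the response.

    base_url is a placeholder for a server you control,
    for example "http://localhost:9200/" (9200 is the usual default port).
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

Calling check_own_instance("http://localhost:9200/") on a test machine would return "open" only if the server answers without credentials. A result of "auth-required" or "unreachable" from the public internet is what an operator would want to see.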

According to Anurag Sen, head of the Safety Detectives cybersecurity team, the affected database contained a “huge trove” of sensitive personal information to the tune of 408GB and more than 318 million records in total.

Given the sheer size of the data leak, it has been extremely challenging for our team to establish the full extent of the potential damage.

Our research team was able to determine that the entirety of the leaked data was “scraped” from social media platforms, which is both unethical and a violation of Facebook’s, Instagram’s and LinkedIn’s terms of service.

Moreover, it is important to note that Socialarks suffered a similar data breach in August 2020, which exposed data from 150 million LinkedIn, Facebook and Instagram users.

Almost as a carbon-copy, August’s database breach revealed reams of personal data from 66 million LinkedIn users, 11.6 million Instagram accounts and 81.5 million Facebook accounts.

From the leaked data we discovered, it was possible to determine people’s full names, country of residence, place of work, position, subscriber data and contact information, as well as direct links to their profiles.


The database contained more than 408GB of data and more than 318 million records.

What was leaked?

On the server, which had no protection whatsoever, our research team discovered the following:

  • 11,651,162 Instagram user profiles
  • 66,117,839 LinkedIn user profiles
  • 81,551,567 Facebook user profiles
  • a further 55,300,000 Facebook profiles, which were summarily deleted within a few hours of our team first discovering the server and its vulnerability.
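As a quick sanity check, the counts listed above can be added up to show how they relate to the "at least 214 million" headline figure. The tally below is purely illustrative arithmetic on the numbers in this article; the variable names are our own.

```python
# Profile counts exactly as listed in the article.
profile_counts = {
    "Instagram": 11_651_162,
    "LinkedIn": 66_117_839,
    "Facebook": 81_551_567,
    "Facebook (deleted shortly after discovery)": 55_300_000,
}

# Profiles that remained on the server after the deletion.
remaining = sum(
    count for name, count in profile_counts.items() if "deleted" not in name
)

# Every profile observed, including the batch that was later removed.
total = sum(profile_counts.values())

print(f"Profiles still present: {remaining:,}")  # 159,320,568
print(f"All profiles observed:  {total:,}")      # 214,620,568
```

The second figure is where the 214 million total comes from: the three per-platform indices plus the Facebook batch that disappeared shortly after discovery.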

Surprisingly, the numbers of profiles affected in the data leak found by our team are the same as the numbers reported for the August data leak. However, there were significant differences, such as the size of the database, the companies hosting the servers and the number of indices.

The affected server, hosted by Tencent, was segmented into indices, each storing the data obtained from one social media source. Our team discovered records from three major social media platforms: Instagram, Facebook and LinkedIn.

Instagram data

The Instagram index contained various popular personalities and online celebrities.

Our team discovered several high-profile influencers in the exposed database, including prominent food bloggers, celebrities and other social media influencers.

Celebrity Instagram profile including phone number and email address.

Every record contained public data scraped from influencer Instagram accounts, including their biographies, profile pictures, follower totals, location settings as well as personal information such as contact details in the form of email addresses and phone numbers.

The Instagram records exposed the following details:

  • Full name
  • Phone numbers for 6+ million users
  • Email addresses for all 11+ million users
  • Profile link
  • Username
  • Profile picture
  • Profile description
  • Average comment count
  • Number of followers and following count
  • Country of location
  • Specific locality in some cases
  • Frequently used hashtags

Facebook data

As mentioned above, the leak exposed 81.5 million Facebook user profiles with over 40 million exposed phone numbers and a further 32 million email address entries. Notably, most of the phone numbers our team discovered originated from pages and not individuals.

The Facebook records exposed the following details:

  • Full name
  • ‘About’ text
  • Email addresses
  • Phone numbers
  • Country of location
  • Like, Follow and Rating count
  • Messenger ID
  • Facebook link with profile pictures
  • Website link
  • Profile description

LinkedIn data

Finally, our team discovered 66.1 million LinkedIn user profiles with as many as 31 million leaked email addresses (not disclosed in the profile but obtained through other, as yet unknown, sources).

The LinkedIn records exposed the following details:

  • Full name
  • Email addresses
  • Job profile including job title and seniority level
  • LinkedIn profile link
  • User tags
  • Domain name
  • Connected social media account login names e.g., Twitter
  • Company name and revenue margin

Database search showing 66 million LinkedIn profile results including personal information such as job title, name and email address.

The chart below shows a sample breakdown of user-profiles, sorted by country, from a sample of 42 million records.

[Chart: LinkedIn user profiles by country]

Unexplained presence of Instagram and LinkedIn personal data

Socialarks’ database contained scraped data including personal information, although the user data was only partially complete.

However, according to our findings, Socialarks’ database stored personal data for Instagram and LinkedIn users such as private phone numbers and email addresses for users that did not divulge such information publicly on their accounts. How Socialarks could possibly have access to such data in the first place remains unknown.

Also, the fact that such a large, active, and data-rich database was left completely unsecured (probably for a second time) is astonishing.

It remains unclear how the company managed to obtain private data from multiple secure sources.

Instagram profile showing email and phone number despite information not being provided to Instagram.

It is also worth noting that Socialarks is based in China and was founded with private venture capital in 2014, while the vulnerable server is located in Hong Kong.

Source: Chinese start-up leaked 400GB of scraped data exposing 200+ million Facebook, Instagram and LinkedIn users
