Mozilla installs Scheduled Telemetry Task on Windows with Firefox 75 – if you have telemetry on

Observant Firefox users on Windows who have updated the web browser to Firefox 75 may have noticed that the upgrade brought along a new scheduled task. The task is also added when Firefox 75 is installed fresh on a Windows device.

The task’s name is Firefox Default Browser Agent and it is set to run once per day. Mozilla published a post on its official blog that explains what the task is and why it was created.


According to Mozilla, the task has been created to help the organization “understand changes in default browser settings”. At its core, it is a Telemetry task that collects information and sends the data to Mozilla.

Here are the details:

  • The task is only created if Telemetry is enabled. If Telemetry is set to off (in the most recently used Firefox profile), the task is not created and no data is sent. The same applies if Telemetry is disabled via Enterprise policies. Update: some users report that the task was created even though Telemetry was set to off on their machine.
  • Mozilla collects information “related to the system’s current and previous default browser setting, as well as the operating system locale and version”.
  • Mozilla notes that the data cannot be “associated with regular profile based telemetry data”.
  • The data is sent to Mozilla every 24 hours using the scheduled task.

Mozilla added the file default-browser-agent.exe to the Firefox installation folder on Windows which defaults to C:\Program Files\Mozilla Firefox\.

Firefox users have the following options if they don’t want the data sent to Mozilla:

  • Firefox users who opted out of Telemetry are fine; they don’t need to make any changes, as the new Telemetry data is not sent to Mozilla. This applies both to users who opted out of Telemetry in Firefox and to those who used Enterprise policies to do so.
  • Firefox users who have Telemetry enabled can either opt out of Telemetry or deal with the task/executable that is responsible.
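For managed deployments, the Enterprise-policy opt-out mentioned above can be expressed in a policies.json file placed in a distribution folder inside the Firefox installation directory. A minimal sketch follows; the keys come from Mozilla’s enterprise policy templates, and note that the dedicated DisableDefaultBrowserAgent key was introduced around this Firefox release, so treat its availability as version-dependent:

```json
{
  "policies": {
    "DisableTelemetry": true,
    "DisableDefaultBrowserAgent": true
  }
}
```

With this file in place, both regular Telemetry and the Default Browser Agent task should be disabled for all profiles on the machine.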

Disable the Firefox Default Browser Agent task


Here is how you disable the task:

  1. Open Start on the Windows machine and type Task Scheduler.
  2. Open the Task Scheduler and go to Task Scheduler Library > Mozilla.
  3. There you should find listed the Firefox Default Browser Agent task.
  4. Right-click on the task and select Disable.
  5. Note: Nightly users may see a Firefox Nightly Default Browser Agent task there as well and can disable it in the same way.

Once disabled, the task will no longer run.
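The same steps can also be scripted. Here is a rough sketch using the built-in PowerShell ScheduledTasks cmdlets; note that the exact task name is an assumption — on real installs it typically carries an install-specific suffix — which is why the sketch matches by pattern rather than exact name:

```powershell
# List the tasks Mozilla has registered (names vary per install,
# often with an install-specific suffix after "Firefox Default Browser Agent")
Get-ScheduledTask -TaskPath "\Mozilla\"

# Disable every Default Browser Agent task found in that folder
Get-ScheduledTask -TaskPath "\Mozilla\" |
    Where-Object TaskName -like "*Default Browser Agent*" |
    Disable-ScheduledTask
```

Run from an elevated PowerShell prompt; re-enabling later is a matter of piping the same query into Enable-ScheduledTask.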

Closing Words

The new Telemetry task is only introduced on Windows and runs only if Telemetry is enabled (which it is by default [NOTE: Is it? I don’t think so! It asks at install!]). Mozilla is transparent about the introduction, and while that is good, I’d have preferred if the company had informed users about it in the browser, after the upgrade to or installation of Firefox 75 and before the task ran for the first time.

Source: Mozilla installs Scheduled Telemetry Task on Windows with Firefox 75 – gHacks Tech News

Go to about:telemetry in Firefox to see what it’s collecting. In my case this showed nothing, because when Firefox was installed it asked me whether I wanted Telemetry on or off, and I said off.

Creepy Face Recognition Firm Clearview AI Sure Has a Lot of Ties to the Far Right – the extremist kinds of far right

Clearview AI, the dystopian face recognition company that claims to have amassed a database of billions of photos, signed contracts with hundreds of law enforcement agencies, and shopped its app around to the rich and powerful, has extensive links to the far right, according to a Huffington Post investigation. In fact, one of its associates claimed to have been working on a face recognition product explicitly designed to be useful for mass deportations.

Founder Hoan Ton-That has links to the far-right movement that move right past suspicious into obvious, according to HuffPo. He reportedly attended a 2016 dinner with white supremacist Richard Spencer organized by alt-right financier Jeff Giesea, an associate of Palantir founder and Trump-supporting billionaire Peter Thiel. (Thiel secretly bankrolled a lawsuit that bankrupted Gizmodo’s former parent company, Gawker Media.) Ton-That was also a member of a Slack channel run by professional troll Chuck Johnson for his now-defunct WeSearchr, a crowdfunding platform primarily used by white supremacists; that channel included people like Andrew Auernheimer, webmaster of the neo-Nazi site Daily Stormer, and conspiracy theorist Mike Cernovich.

Per HuffPo, in January 2017 Johnson posted on Facebook that he was working on “building algorithms to ID all the illegal immigrants for the deportation squads.” Another source told HuffPo that they had seen him bragging about that work to “a whole bunch of really important people” at Trump’s DC hotel that spring, introducing them to a man the source identified as almost certainly being Ton-That.

Johnson, who was involved with Trump’s transition team, also hit up then-Breitbart employee Katie McHugh, who at that time was a white supremacist but has since left the movement. McHugh told HuffPo that Johnson asked to be put in contact with ghoulish Trump adviser Stephen Miller so he could tout a “way to identify every illegal alien in the country.” (It’s unclear whether that happened, but Clearview’s clients include Immigration and Customs Enforcement and the FBI.) That same year, Thiel invested $200,000 in Clearview.

The labor pool of Smartcheckr, as Clearview was previously known, also included many ethnonationalists who believe in purging the U.S. of nonwhites, according to HuffPo. One of those was hardcore racist and Johnson associate Tyler Bass, who described himself as an “investigator” doing “remote software testing” for the app and whose LinkedIn posts suggest he may have had access to law enforcement data associated with criminal investigations as late as 2018. Bass also claimed to McHugh to have attended the disastrous far-right rally in Charlottesville, Virginia in 2017, where a neo-Nazi terror attack killed protester Heather Heyer and wounded scores of others.

Another was Douglass Mackey, who ran a vast online racist propaganda operation under the moniker “Ricky Vaughn” and worked as a contract consultant for Smartcheckr. While there, he touted the use of its face recognition tools to anti-Semitic congressional candidate Paul Nehlen for extreme campaign opposition research. (Ton-That told HuffPo that Mackey was only a contractor for three weeks and that his offer to Nehlen was unauthorized, though Smartcheckr employees took steps to distance themselves from Mackey after he was outed as “Ricky Vaughn” in 2018.)

There was also Marko Jukic, HuffPo wrote, a Clearview AI employee who marketed its products to police departments and had a history as a prolific contributor to extremist blogs, including a post where he advocated “segregation and separation” of Jews. One of Clearview’s lawyers, Tor Ekeland, is best known for representing far-right provocateurs and racists like Auernheimer.

Johnson appears to have had access to Clearview’s technology until at least January 2020, when he showed a fellow passenger on a flight to Boston a powerful face recognition app on his phone, according to a BuzzFeed report. In a statement to HuffPo, Ton-That denied that Johnson was an “executive, employee, consultant” or board member of Clearview, though he didn’t clarify whether Johnson holds equity in the company. He also told the site that Clearview has severed ties with Bass and Jukic, claiming he was “shocked by and completely unaware of Marko Jukic’s online writings under a different name.” (Jukic used the same pseudonym to talk with Ton-That on Slack and email that he did in his racist blog posts, HuffPo noted.)

Ton-That also told the site that he grew up on the internet, which has “not always served me well,” adding: “There was a period when I explored a range of ideas—not out of belief in any of them, but out of a desire to search for self and place in the world. I have finally found it, and the mission to help make America a safer place. To those who have read my words in the Huffington Post article, I deeply apologize for them.”

Clearview built its face recognition database by scraping photos en masse from public social media posts. While scraping is technically legal, it could expose the company to significant civil liability from rights holders, and Clearview’s business practices have resulted in cease-and-desists from Silicon Valley giants like Google and may have run afoul of other laws. The state attorney general of Vermont filed a lawsuit against the company last month alleging violations of the Vermont Consumer Protection Act and a state data broker law, while the AG of New Jersey ordered all police in the state to stop using Clearview products. Canadian privacy commissioners are investigating the company, and it is also facing two class action lawsuits, one of which alleges that the company violated Illinois biometrics laws.

Source: Creepy Face Recognition Firm Clearview AI Sure Has a Lot of Ties to the Far Right

Facebook asks users about coronavirus symptoms, releases friendship data to researchers

Facebook Inc said on Monday it would start surveying some U.S. users about their health as part of a Carnegie Mellon University research project aimed at generating “heat maps” of self-reported coronavirus infections.

The social media giant will display a link at the top of users’ News Feeds directing them to the survey, which the researchers say will help them predict where medical resources are needed. Facebook said it may make surveys available to users in other countries too, if the approach is successful.

Alphabet Inc’s Google, Facebook’s rival in mobile advertising, began querying users for the Carnegie Mellon project last month through its Opinion Rewards app, which exchanges responses to surveys from Google and its clients for app store credit.

Facebook said in a blog post that the Carnegie Mellon researchers “won’t share individual survey responses with Facebook, and Facebook won’t share information about who you are with the researchers.”

The company also said it would begin making new categories of data available to epidemiologists through its Disease Prevention Maps program, which is sharing aggregated location data with partners in 40 countries working on COVID-19 response.

Researchers use the data to provide daily updates on how people are moving around in different areas to authorities in those countries, along with officials in a handful of U.S. cities and states.

In addition to location data, the company will begin making available a “social connectedness index” showing the probability that people in different locations are Facebook friends, aggregated at the zip code level.

Laura McGorman, who runs Facebook’s Data for Good program, said the index could be used to assess the economic impact of the new coronavirus, revealing which communities are most likely to get help from neighboring areas and others that may need more targeted support.

New “co-location maps” can similarly reveal the probability that people in one area will come in contact with people in another, Facebook said.

Source: Facebook asks users about coronavirus symptoms, releases friendship data to researchers – Reuters

This might actually be a good way to use all that privacy-invading data

Dr. Drew Pinsky Played Down COVID-19, Then Tries To DMCA Away The Evidence

Update: The full video is now back up and it’s even worse than the original clip we posted. It’s unclear if it went back up thanks to YouTube deciding it was fair use, or Pinsky removing the bogus takedown. Either way, watch it here:

Copyright system supporters keep insisting to me that copyright is never used for censorship, and yet over and over again we keep seeing examples that prove them wrong. The latest is Dr. Drew Pinsky, the somewhat infamous doctor and media personality, who has been one of the more vocal people in the media playing down the impact of the coronavirus. A video that went viral on Twitter and YouTube showed many, many, many clips of Dr. Drew insisting that COVID-19 was similar to the flu and that it wouldn’t be that bad. Assuming it hasn’t been taken down due to a bogus copyright claim, you can hopefully see it below:

As you can see, for well over a month, deep into March when it was blatantly obvious how serious COVID-19 was, he was playing down the threat. Beyond incorrectly comparing it to the flu (saying that it’s “way less virulent than the flu” on February 4th — by which time it was clearly way more virulent than the flu in China), he said the headlines should say “way less serious than influenza,” and he insisted that the lethality rate was probably around “0.02%” rather than the 2% being reported. On February 7th, he said your probability of “dying from coronavirus — much higher being hit by an asteroid.” He also mocked government officials for telling people to stay home, even at one point in March saying he was “angry” about a “press-induced panic.” On March 16th, the same day that the Bay Area in California shut down, he insisted that if you’re under 65 you have nothing to worry about, saying “it’s just like the flu.” This was not in the distant past. At one point, a caller to his show, again on March 16th, said that because it’s called COVID-19 there must have been at least 18 others, and that’s why no one should worry — and Drew appeared to agree, making it appear he didn’t even know that the 19 refers to the year, not the number of coronaviruses. And while there are other coronaviruses out there, this one is way more infectious and deadly, so the caller’s point doesn’t matter anyway.

To give him a tiny bit of credit, on Saturday Pinsky posted a series of choppy videos on Twitter in which he flat out said that he was wrong, apologized, and said he regretted his earlier statements. He also claimed that he had signed up to help in California and NY if he was needed. But even that apology seems weak in the face of what else he said in those videos… and, more importantly, his actions. In terms of what he said, he kept claiming that he had always said to listen to Dr. Fauci and to your public health officials. Amazingly, at one point in his apology video, he insists that the real reason New York got hit so hard is its hallways and trains. Yet, in the video above, at one point he literally mocks NYC Mayor de Blasio for telling people to avoid crowded trains, saying: “de Blasio told them not to ride the trains! So they’re not riding the trains! So I am! [guffaw] I mean, it’s ridiculous.”

Given that, it’s a bit difficult to take him seriously when he claims that all along he always said to listen to your public officials, when just a few weeks ago he was mocking them. Indeed, as multiple people have pointed out, the issue here isn’t so much that Pinsky was wrong — in the early days, when there wasn’t as much info, lots of people got things wrong about COVID-19 (though Pinsky kept it up way way after most others recognized how serious it was), but that he acted so totally sure about his opinions that this was nothing to worry about. It was the certainty with which he said what he said that was so much of the problem, including deep into it already being a pandemic with local officials warning people to stay home.

But, even worse, just as he was doing the right thing and mostly apologizing… he was trying to hide those earlier clips that made him look so, so, so bad. His organization began sending out DMCA notices. If you went to the original YouTube upload you got this:

That says: “This video is no longer available due to a copyright claim by Drew Pinsky Inc.” Now, some might argue that it was just some clueless staffer working for Dr. Drew sending off bogus DMCAs, or maybe an automated bot… but nope. Drew himself started tweeting nonsense about copyright law at people. I originally linked to that tweet, but sometime on Sunday, after thousands of people — including some of the most famous lawyers in the country — explained to him why it was nonsense, he deleted it. But I kept a screenshot:

That says, amazingly:

Infringing copywrite laws is a crime. Hang onto your retweets. Or erase to be safe.

The wrongness-to-words ratio in that tweet is pretty fucking astounding. First of all, the layup: it’s copyright, Drew, not copywrite. Make sure you know the name of the fucking law you’re abusing to censor someone before tossing it out there. Second, no, infringing copyright is not a crime. Yes, there is such a thing as criminal copyright infringement, but this ain’t it. Someone posting a video of you would be, at best, civil infringement. For it to be criminal, someone would have to be making copies for profit — like running a bootleg DVD factory or something. Someone posting a 2 minute clip of your nonsense is not that.

Most important, however, this isn’t even civil infringement, thanks to fair use. Putting up a 2 minute video showing a dozen or so clips of Drew making an ass of himself is not infringing. It’s classic fair use — especially given the topic at hand.

So it’s really difficult to believe that Drew is really owning up to his mistakes when at the same time he says he’s sorry, he’s actively working to abuse the law to try to silence people from highlighting his previous comments. Also, someone should point him to Lenz v. Universal in which a court said that before sending a takedown, you need to take fair use into consideration. It certainly appears that Drew hasn’t the foggiest idea how copyright law works, so it seems unlikely he considered fair use at all.

I certainly understand that he likely regrets his earlier comments. And I appreciate his willingness to admit that he was wrong. But to really take ownership of your previous errors, you shouldn’t then be working doubletime to try to delete them from the internet and hide them from view. That’s not taking ownership of your mistakes, that’s trying to sweep them under the rug.

Source: Dr. Drew Pinsky Played Down COVID-19, Then Tries To DMCA Away The Evidence | Techdirt

For the past five years, every FBI secret spy court request to snoop on Americans has sucked, says watchdog

Analysis The FBI has not followed internal rules when applying to spy on US citizens for at least five years, according to an extraordinary report [PDF] by the Department of Justice’s inspector general.

The failure to follow so-called Woods Procedures, designed to make sure the FBI’s submissions for secret spying are correct, puts a question mark over more than 700 approved applications to intercept and log every phone call and email made by named individuals.

Under the current system, the Feds apply to the Foreign Intelligence Surveillance Court (FISC), which can then grant the investigative agency extraordinary spying powers. These can also be granted retroactively if the agency needs to move quickly.

Back in 2001, however, a number of FISA warrants were found to have been granted on unverified information, driving the creation of the Woods Procedures, named after the FBI official who drew them up, Michael Woods.

Following a review last year of one of those successful applications that targeted a Trump campaign staffer called Carter Page, the FBI was found to have made “fundamental and serious errors” in its application. Inspector general Michael Horowitz then expanded his review to another 29 applications dated from October 2014 to September 2019 out of a pool of over 700 and found the same problems in every single other case he looked at, pointing to a systemic problem.

As a result, more than five years’ worth of secret spying activities by the US government may be illegitimate. Horowitz found the same “basic and fundamental errors” in every application.

Unaccountable

The FISA Court has long been highlighted by critics as an unaccountable body with extraordinary powers. Except on very rare occasions, only one side – the government – can present its case to the judges, and as a result the court has approved almost every application. The process is wide open to abuse, critics have argued, and so it has turned out to be.

The Woods Procedures include things like sufficient supporting documentation of any assertions, a second review of any facts and assertions, and a re-verification of facts whenever an extension is applied for. They are a check and balance on power.

“We do not have confidence that the FBI has executed its Woods Procedures in compliance with FBI policy,” the report states.

The report says the inspector general couldn’t review files for four of the 29 selected FISA applications because the FBI was not able to locate them and, in three of those instances, did not know if the files ever existed.

All of the 25 applications reviewed had “inadequately supported facts,” and “FBI and NSD officials we interviewed indicated to us that there were no efforts by the FBI to use existing FBI and NSD oversight mechanisms.”

Ah yeah but it’s all fixed now

Somewhat amazingly, the FBI doesn’t dispute the findings. The inspector general provided his report to the FBI and prosecutors for their feedback, and appended their responses to the report.

Neither the Feds nor the Dept of Justice denies the assertion that the FBI has not followed its own rules. And both argue that recent proposed changes, prompted solely by the inspector general’s previous report and which critics assert do not go far enough, have effectively fixed the issues.

There is no mention in either response or in the inspector general’s report of what the implications are for the hundreds of people that have been subject to secret spying orders that allow federal agents to track everything that person does and says.

But then, there may not be any implications because under the FISA rules, the person subjected to the spying is not informed of the order against them, even when the spying is over. And they are not even entitled to know or see any evidence compiled against them as a result of the spying operation, even if they are charged as a result of the spying.

It is, in short, a sign that the FBI cannot be trusted to follow its own rules even when those rules apply to the most invasive powers it can be given.

Source: For the past five years, every FBI secret spy court request to snoop on Americans has sucked, says watchdog • The Register

Amazon says it fired a guy for breaking pandemic rules. Same guy who organized a staff protest over a lack of coronavirus protection

On Monday, Amazon fired Chris Smalls, a worker at its Staten Island, New York, warehouse, who had organized a protest demanding more protection for workers amid the coronavirus outbreak.

Smalls, in a statement, said, “Amazon would rather fire workers than face up to its total failure to do what it should to keep us, our families, and our communities safe. I am outraged and disappointed but I am not shocked. As usual, Amazon would rather sweep a problem under the rug than act to keep workers and working communities safe.”

Amazon spokesperson Kristen Kish denied the firing had anything to do with protected labor activity. “We did not terminate Mr Smalls employment for organizing a 15-person protest,” she said in an emailed statement. “We terminated his employment for putting the health and safety of others at risk and violations of his terms of his employment.”

Strike organizers have disputed Amazon’s attendance figures, claiming about 50 people walked out.

Kish said Smalls had received multiple warnings for violating social distancing guidelines and had been asked to remain home with pay for two weeks because he had been in the proximity of another worker confirmed to have COVID-19. By ignoring that instruction and coming on-site, she said, he was putting colleagues at risk.

Concern about health safety has spread across Amazon’s workforce. Workers at Amazon’s Whole Foods grocery chain on Tuesday staged a sick-out, demanding 2x hazard pay for working in stores where they may be exposed to coronavirus.

The company last month boosted pay for Amazon and Whole Foods hourly employees in the US and Canada by $2 an hour and £2 per hour for employees in the UK during the month of April. And it said it would double its hourly base rate – ranging from $17.50 to $23/hour at JFK8, its Staten Island warehouse – for overtime from March 16, 2020 through May 3, 2020. The company has also offered two weeks of pay for workers quarantined for coronavirus.

Source: Amazon says it fired a guy for breaking pandemic rules. Same guy who organized a staff protest over a lack of coronavirus protection • The Register

A Feature on Zoom Secretly Displayed Data From People’s LinkedIn Profiles

But what many people may not know is that, until Thursday, a data-mining feature on Zoom allowed some participants to surreptitiously access LinkedIn profile data about other users — without Zoom asking for their permission during the meeting or even notifying them that someone else was snooping on them.

The undisclosed data mining adds to growing concerns about Zoom’s business practices at a moment when public schools, health providers, employers, fitness trainers, prime ministers and queer dance parties are embracing the platform.

An analysis by The New York Times found that when people signed in to a meeting, Zoom’s software automatically sent their names and email addresses to a company system it used to match them with their LinkedIn profiles.

The data-mining feature was available to Zoom users who subscribed to a LinkedIn service for sales prospecting, called LinkedIn Sales Navigator. Once a Zoom user enabled the feature, that person could quickly and covertly view LinkedIn profile data — like locations, employer names and job titles — for people in the Zoom meeting by clicking on a LinkedIn icon next to their names.

The system did not simply automate the manual process of one user looking up the name of another participant on LinkedIn during a Zoom meeting. In tests conducted last week, The Times found that even when a reporter signed in to a Zoom meeting under pseudonyms — “Anonymous” and “I am not here” — the data-mining tool was able to instantly match him to his LinkedIn profile. In doing so, Zoom disclosed the reporter’s real name to another user, overriding his efforts to keep it private.

Reporters also found that Zoom automatically sent participants’ personal information to its data-mining tool even when no one in a meeting had activated it. This week, for instance, as high school students in Colorado signed in to a mandatory video meeting for a class, Zoom readied the full names and email addresses of at least six students — and their teacher — for possible use by its LinkedIn profile-matching tool, according to a Times analysis of the data traffic that Zoom sent to a student’s account.

The discoveries about Zoom’s data-mining feature echo what users have learned about the surveillance practices of other popular tech platforms over the last few years. The video-meeting platform that has offered a welcome window on American resiliency during the coronavirus — providing a virtual peek into colleagues’ living rooms, classmates’ kitchens and friends’ birthday celebrations — can reveal more about its users than they may realize.

“People don’t know this is happening, and that’s just completely unfair and deceptive,” Josh Golin, the executive director of the Campaign for a Commercial-Free Childhood, a nonprofit group in Boston, said of the data-mining feature. He added that storing the personal details of schoolchildren for nonschool purposes, without alerting them or obtaining a parent’s permission, was particularly troubling.

Source: A Feature on Zoom Secretly Displayed Data From People’s LinkedIn Profiles – The New York Times

Thousands of recorded Zoom Video Calls Left Exposed on Open Web

Thousands of personal Zoom videos have been left viewable on the open Web, highlighting the privacy risks to millions of Americans as they shift many of their personal interactions to video calls in an age of social distancing. From a report: Many of the videos appear to have been recorded through Zoom’s software and saved onto separate online storage space without a password. But because Zoom names every video recording in an identical way, a simple online search can reveal a long stream of videos that anyone can download and watch. Zoom videos are not recorded by default, though call hosts can choose to save them to Zoom servers or their own computers. There’s no indication that live-streamed videos or videos saved onto Zoom’s servers are publicly visible. But many participants in Zoom calls may be surprised to find their faces, voices and personal information exposed because a call host can record a large group call without participants’ consent.

Source: Thousands of Zoom Video Calls Left Exposed on Open Web – Slashdot

NSO Group: Facebook tried to license our spyware to snoop on its own addicts – the same spyware it’s suing us over

NSO Group – sued by Facebook for developing Pegasus spyware that targeted WhatsApp users – this week claimed Facebook tried to license the very same surveillance software to snoop on its own social-media addicts.

The Israeli spyware maker’s CEO Shalev Hulio alleged in a statement [PDF] to a US federal district court that in 2017 he was approached by Facebook reps who wanted to use NSO’s Pegasus technology in Facebook’s controversial Onavo Protect app to track mobile users.

Pegasus is designed to, once installed on a device, harvest its text messages, gather information about its apps, eavesdrop on calls, track its location, and harvest passwords, among other things.

Onavo Protect, acquired by Facebook in 2013, was available for Android and iOS. It used VPN tunneling to wrap users’ internet connections in encryption, shielding their information as it traveled over untrusted and insecure Wi-Fi networks and the like. The iOS version also blocked harmful websites. However, the software blabbed telemetry about its users to Facebook as well as routed connections through Onavo servers, which could monitor people’s online activities. The application was forced out of the Apple iOS store in 2018 for siphoning information about other programs installed on devices, and discontinued in May 2019.

According to the NSO chief exec, Onavo Protect needed more surveillance powers on iOS handhelds, and so Facebook turned to the spyware maker for its technology.

“The Facebook representatives stated that Facebook was concerned that its method for gathering user data through Onavo Protect was less effective on Apple devices than on Android devices,” Hulio alleged.

“The Facebook representatives also stated that Facebook wanted to use purported capabilities of Pegasus to monitor users on Apple devices and were willing to pay for the ability to monitor Onavo Protect users.”

Because NSO only sells to governments and not private companies, Hulio claimed, he turned down the Facebook licensing offer.

Facebook, in a statement to The Register, characterized the allegations as a distraction from its legal battle against NSO, which kicked off in October 2019. The web giant claims NSO, working on behalf of its customers, illegally hacked targets via security vulnerabilities in Facebook-owned WhatsApp’s code to install Pegasus on devices.

“NSO is trying to distract from the facts Facebook and WhatsApp filed in court nearly six months ago. Their attempt to avoid responsibility includes inaccurate representations about both their spyware and a discussion with people who work at Facebook,” a Facebook spokesperson said.

“Our lawsuit describes how NSO is responsible for attacking over 100 human rights activists and journalists around the world. NSO CEO Shalev Hulio has admitted his company can attack devices without a user knowing and he can see who has been targeted with Pegasus. We look forward to proving our case against NSO in court and seeking accountability for their actions.”

The case has been unusual from the start, with Facebook filing suit after first deleting NSO workers’ personal Facebook accounts. The spyware maker then missed its scheduled court appearance because, it was alleged, Facebook did not properly serve its paperwork.

NSO reckons Facebook’s accusations are baseless because it only sells its software to government departments and agencies, and does not operate the tools itself. Thus, we’re told, it didn’t hack anyone itself, and it cannot be held accountable for the actions of its customers. NSO also noted it only deals with governments allowed under Israeli export laws.

Further, NSO contended the court, in Oakland, California, does not have jurisdiction to hear this case due to America’s Foreign Sovereign Immunities Act, and it argued that the actions described in the lawsuit wouldn’t even run afoul of its spyware’s terms of service.

Source: NSO Group: Facebook tried to license our spyware to snoop on its own addicts – the same spyware it’s suing us over • The Register

Someone Convinced Google To Delist Our Entire Right To Be Forgotten Tag In The EU For Searches On Their Name, which means we can’t tell if they are abusing the system

The very fact that the tag delisted when searching for this unnamed individual is the “right to be forgotten” tag shows that whoever this person is, they are not so much trying to cover up the record of, say, an FTC case against them from… oh, let’s just say 2003… as trying to cover up their current effort to abuse the right to be forgotten process.

Anyway, in theory (purely in theory, of course) if someone in the EU searched for the name of anyone, it might be helpful to know if the Director of the FTC’s Bureau of Consumer Protection once called him a “spam scammer” who “conned consumers in two ways.” But, apparently, in the EU, that sort of information is no longer useful. And you also can’t find out that he’s been using the right to be forgotten process to further cover his tracks. That seems unfortunate, and entirely against the supposed principle behind the “right to be forgotten.” No one is trying to violate anyone’s “privacy” here. We’re talking about public court records, and an FTC complaint and later settlement on a fairly serious crime that took place not all that long ago. That ain’t private information. And, even more to the point, the much more recent efforts by that individual to then hide all the details of this public record.

Source: Someone Convinced Google To Delist Our Entire Right To Be Forgotten Tag In The EU For Searches On Their Name | Techdirt

US Officials Use Mobile Ad Location Data to Study How COVID-19 Spreads, not cellphone tower data

Government officials across the U.S. are using location data from millions of cellphones in a bid to better understand the movements of Americans during the coronavirus pandemic and how they may be affecting the spread of the disease…

The data comes from the mobile advertising industry rather than cellphone carriers. The aim is to create a portal for federal, state and local officials that contains geolocation data in what could be as many as 500 cities across the U.S., one of the people said, to help plan the epidemic response… It shows which retail establishments, parks and other public spaces are still drawing crowds that could risk accelerating the transmission of the virus, according to people familiar with the matter… The data can also reveal general levels of compliance with stay-at-home or shelter-in-place orders, according to experts inside and outside government, and help measure the pandemic’s economic impact by revealing the drop-off in retail customers at stores, decreases in automobile miles driven and other economic metrics.

The CDC has started to get analyses based on location data through an ad hoc coalition of tech companies and data providers — all working in conjunction with the White House and others in government, people said.

The CDC and the White House didn’t respond to requests for comment.

It’s the cellphone carriers turning over pandemic-fighting data in Germany, Austria, Spain, Belgium, and the U.K., according to the article, while Israel mapped infections using its intelligence agencies’ antiterrorism phone-tracking. But so far in the U.S., “the data being used has largely been drawn from the advertising industry.

“The mobile marketing industry has billions of geographic data points on hundreds of millions of U.S. cell mobile devices…”

Source: US Officials Use Mobile Ad Location Data to Study How COVID-19 Spreads – Slashdot

I am unsure whether this says more about the legality of the move or about the decentralisation of cellphone tower data making it technically difficult to track the whole population.

Israel uses anti-terrorist tech to monitor phones of virus patients

Israel has long been known for its use of technology to track the movements of Palestinian militants. Now, Prime Minister Benjamin Netanyahu wants to use similar technology to stop the movement of the coronavirus.

Netanyahu’s Cabinet on Sunday authorized the Shin Bet security agency to use its phone-snooping tactics on coronavirus patients, an official confirmed, despite concerns from civil-liberties advocates that the practice would raise serious privacy issues. The official spoke on condition of anonymity pending an official announcement.

Netanyahu announced his plan in a televised address late Saturday, telling the nation that the drastic steps would protect the public’s health, though it would also “entail a certain degree of violation of privacy.”

Israel has identified more than 200 cases of the coronavirus. Based on interviews with these patients about their movements, health officials have put out public advisories ordering tens of thousands of people who may have come into contact with them into protective home quarantine.

The new plan would use mobile-phone tracking technology to give a far more precise history of an infected person’s movements before they were diagnosed and identify people who might have been exposed.

In his address, Netanyahu acknowledged the technology had never been used on civilians. But he said the unprecedented health threat posed by the virus justified its use. For most people, the coronavirus causes only mild or moderate symptoms. But for some, especially older adults and people with existing health problems, it can cause more severe illness.

“They are not minor measures. They entail a certain degree of violation of the privacy of those same people, who we will check to see whom they came into contact with while sick and what preceded that. This is an effective tool for locating the virus,” Netanyahu said.

The proposal sparked a heated debate over the use of sensitive security technology, who would have access to the information and what exactly would be done with it.

Nitzan Horowitz, leader of the liberal opposition party Meretz, said that tracking citizens “using databases and sophisticated technological means are liable to result in a severe violation of privacy and basic civil liberties.” He said any use of the technology must be supervised, with “clear rules” for the use of the information.

Netanyahu led a series of discussions Sunday with security and health officials to discuss the matter. Responding to privacy concerns, he said late Sunday he had ordered a number of changes in the plan, including reducing the scope of data that would be gathered and limiting the number of people who could see the information, to protect against misuse.

Source: Israel takes step toward monitoring phones of virus patients – ABC News

What I’m missing is a maximum duration for these powers to be used.

Zoom Removes Code That Sends Data to Facebook – but there is still plenty of nasty stuff in there

On Friday video-conferencing software Zoom issued an update to its iOS app which stops it sending certain pieces of data to Facebook. The move comes after a Motherboard analysis of the app found it sent information such as when a user opened the app, their timezone, city, and device details to the social network giant.

When Motherboard analyzed the app, Zoom’s privacy policy did not make the data transfer to Facebook clear.

“Zoom takes its users’ privacy extremely seriously. We originally implemented the ‘Login with Facebook’ feature using the Facebook SDK in order to provide our users with another convenient way to access our platform. However, we were recently made aware that the Facebook SDK was collecting unnecessary device data,” Zoom told Motherboard in a statement on Friday.

Source: Zoom Removes Code That Sends Data to Facebook – VICE

But there is still plenty of data being hoovered up by Zoom:
Yeah, that Zoom app you’re trusting with work chatter? It lives with ‘vampires feeding on the blood of human data’

As the global coronavirus pandemic pushes the popularity of videoconferencing app Zoom to new heights, one web veteran has sounded the alarm over its “creepily chummy” relationship with tracking-based advertisers.

Doc Searls, co-author of the influential internet marketing book The Cluetrain Manifesto last century, today warned [cached] Zoom not only has the right to extract data from its users and their meetings, it can work with Google and other ad networks to turn this personal information into targeted ads that follow them across the web.

This personal info includes, and is not limited to, names, addresses and any other identifying data, job titles and employers, Facebook profiles, and device specifications. Crucially, it also includes “the content contained in cloud recordings, and instant messages, files, whiteboards … shared while using the service.”

Searls said reports outlining how Zoom was collecting and sharing user data with advertisers, marketers, and other companies, prompted him to pore over the software maker’s privacy policy to see how it processes calls, messages, and transcripts.

And he concluded: “Zoom is in the advertising business, and in the worst end of it: the one that lives off harvested personal data.

“What makes this extra creepy is that Zoom is in a position to gather plenty of personal data, some of it very intimate (for example with a shrink talking to a patient) without anyone in the conversation knowing about it. (Unless, of course, they see an ad somewhere that looks like it was informed by a private conversation on Zoom.)”

The privacy policy, as of March 18, lumps together a lot of different types of personal information, from contact details to meeting contents, and says this info may be used, one way or another, to personalize web ads to suit your interests.

“Zoom does use certain standard advertising tools which require personal data,” the fine-print states. “We use these tools to help us improve your advertising experience (such as serving advertisements on our behalf across the internet, serving personalized ads on our website, and providing analytics services) … For example, Google may use this data to improve its advertising services for all companies who use their services.”

Searls, a former Harvard Berkman Fellow, said netizens are likely unaware their information could be harvested from their Zoom accounts and video conferences for advertising and tracking across the internet: “A person whose personal data is being shed on Zoom doesn’t know that’s happening because Zoom doesn’t tell them. There’s no red light, like the one you see when a session is being recorded.

“Nobody goes to Zoom for an ‘advertising experience,’ personalized or not. And nobody wants ads aimed at their eyeballs elsewhere on the ‘net by third parties using personal information leaked out through Zoom.”

Speaking of Zoom…

Zoom’s iOS app sent analytics data to Facebook even if you didn’t use Facebook, due to the application’s use of the social network’s Graph API, Vice discovered. The privacy policy stated the software collects profile information when a Facebook account is used to sign into Zoom, though it didn’t say anything about what happens if you don’t use Facebook. Zoom has since corrected its code to not send analytics in these circumstances.

It should go without saying but don’t share your Zoom meeting ID and password in public, such as on social media, as miscreants will spot it, hijack it, and bomb it with garbage. And don’t forget to set a strong password, too. Zoom had to beef up its meeting security after Check Point found a bunch of weaknesses, such as the fact it was easy to guess or brute-force meeting IDs.

Source: Yeah, that Zoom app you’re trusting with work chatter? It lives with ‘vampires feeding on the blood of human data’ • The Register

Android Apps Are Transmitting which other apps you have ever installed to marketing people

At this point we’re all familiar with apps of all sorts tracking our every move and sharing that info with pretty much every third party imaginable. But it actually may not be as simple as tracking where you go and what you do in an app: It turns out that these apps might be dropping details about the other programs you’ve installed on your phone, too.

This news comes courtesy of a new paper out from a team of European researchers who found that some of the most popular apps in the Google Play store were bundled with certain bits of software that pull details of any apps that were ever downloaded onto a person’s phone.

Before you immediately chuck your Android device out the window in some combination of fear and disgust, we need to clarify a few things. First, these bits of software—called IAMs, or “installed application methods”—have some decent uses. A photography app might need to check the surrounding environment to make sure you have a camera installed somewhere on your phone. If another app immediately glitches out in the presence of an on-phone camera, knowing the environment—and the reason for that glitch—can help a developer know which part of their app to tinker with to keep that from happening in the future.

Because these IAM-specific calls are technically for debugging purposes, they generally don’t need to secure permissions the way an app usually would when, say, asking for your location. Android has actually gotten better about clamping down on that form of invasive tracking after struggling with it for years, with Google recently announcing that Android 11 will formally require devs to apply for location permissions access before Google grants it.

But at the same time, surveying the apps on a given phone can go the invasive route very easily: The apps we download can tip developers off about our incomes, our sexualities, and some of our deepest fears.

The research team found that, of the roughly 4,200 commercial apps it surveyed making these IAM calls, almost half were strictly grabbing details on the surrounding apps. For context, most other calls—which were for monitoring details about the app like available updates, or the current app version—together made up less than one percent of all calls they observed.

There are a few reasons for the prevalence of this errant app-sniffing behavior, but for the most part it boils down to one thing: money. A lot of these IAMs come from apps that are on-boarding software from adtech companies offering developers an easy way to make quick cash off their free product. That’s probably why the lion’s share—more than 83%—of these calls were being made on behalf of third-party code that the dev onboarded for their commercially available app, rather than code that was baked into that app by design.

And for the most part, these third parties are—as you might have suspected—companies that specialize in targeted advertising. Looking over the top 20 libraries that pull some kind of data via IAMs, some of the top contenders, like ironSource or AppNext, are in the business of getting the right ads in front of the right player at the right time, offering the developer the right price for their effort.

And because app developers—like most people in the publishing space—are often hard-up for cash, they’ll onboard these money-making tools without asking how they make that money in the first place. This kind of daisy-chaining is the same reason we see trackers of every shape and size running across every site in the modern ecosystem, at times without the people actually behind the site having any idea.

Source: Android Apps May Be Snooping on You More Than You Realize

Ring, maker of corporate surveillance doorbells, Continues To Insist Its Cameras Reduce Crime, But Crime Data Doesn’t Back Those Claims Up

Despite evidence to the contrary, Amazon’s Ring is still insisting it’s the best thing people can put on their front doors — an IoT camera with PD hookups that will magically reduce crime in their neighborhoods simply by being a mute witness to criminal acts.

Boasting over 1,000 law enforcement partnerships, Ring talks a good game about crime reduction, but its products haven’t proven to be any better than those offered by competitors — cameras that don’t come with law enforcement strings attached.

Last month, Cyrus Farivar undid a bit of Ring’s PR song-and-dance by using public records requests and conversations with law enforcement agencies to show any claim Ring makes about crime reduction probably (and in some cases definitely) can’t be linked to the presence of Ring’s doorbell cameras.

CNET has done the same thing and come to the same conclusion: the deployment of Ring cameras rarely results in any notable change in property crime rates. That runs contrary to the talking points deployed by Dave Limp — Amazon’s hardware chief — who “believes” adding Rings to neighborhoods makes neighborhoods safer. Limp needs to keep hedging.

CNET obtained property-crime statistics from three of Ring’s earliest police partners, examining the monthly theft rates from the 12 months before those partners signed up to work with the company, and the 12 months after the relationships began, and found minimal impact from the technology.

The data shows that crime continued to fluctuate, and analysts said that while many factors affect crime rates, such as demographics, median income and weather, Ring’s technology likely wasn’t one of them.

Worse for Ring — which has used its partnerships with law enforcement agencies to corner the market for doorbell cameras — law enforcement agencies are saying the same thing: Ring isn’t having any measurable impact on crime.

“In 2019, we saw a 6% decrease in property crime,” said Kevin Warych, police patrol commander in Green Bay, Wisconsin, but he noted, “there’s no causation with the Ring partnership.”

[…]

“I can’t put numbers on it specifically, if it works or if it doesn’t reduce crime,” [Aurora PD public information officer Paris] Lewbel said.

But maybe it doesn’t really matter to Ring if law enforcement agencies believe the crime reduction sales pitch. What ultimately matters is that end users might. After all, these cameras are installed on homes, not police departments. As long as potential customers believe crime in their area (or at least on their front doorstep) will be reduced by the presence of a camera, Ring can continue to increase market share.

But the spin is, at best, inaccurate. Crime rates in cities where Ring has partnered with law enforcement agencies continue to fluctuate. Meanwhile, Ring has fortuitously begun its mass deployment during a time of historically-low crime rates which have dropped steadily for more than 20 years. Hitting the market when things are good and keep getting better makes for pretty good PR, especially when company reps are willing to convert correlation to causation to sell devices.

Source: Ring Continues To Insist Its Cameras Reduce Crime, But Crime Data Doesn’t Back Those Claims Up | Techdirt

After 450 years, the tiny feudal Channel island of Sark will finally earn the right to exist on the internet with a domain

The island of Sark, a United Kingdom royal fiefdom located in the Channel Islands and measuring just two square miles (517 hectares), has succeeded in its 20-year quest to be officially recognized by the International Organization for Standardization (ISO).

The decision will lead to the creation of a new two-letter code for the island and an addition to the internet’s country codes: the .sk code is already taken by Slovakia, so Sark may end up with .cq, in reference to the original Norman-dialect spelling of the island’s name – Sercq.

That’s something that Sark has been desperate to achieve thanks to the ever-growing impact of the internet on modern life. “In today’s connected world, business and personal matters are increasingly transacted online,” reads a quote at the start of the 54-page submission [PDF] to the ISO, written by the secretary of the group that has spent 21 years trying to make recognition a reality.

“In such a world, it makes it even more important for a small island like ours to have the ability to promote and protect its identity,” Conseiller Nicolas Moloney states.

Even though Sark controls its own budget, taxation, waters, medical register, vehicle registration, licensing, legislature and fishing rights, it doesn’t exist online. Instead everything is currently routed through the nearby island of Guernsey, since Sark is officially part of the Bailiwick of Guernsey and has been since 1204 (it’s historically complicated). Guernsey is a 45-minute boat ride away, with its own .gg notation.

With every online form in the world using the ISO’s 3166 list to populate its dropdown list of territories, if you aren’t on that list, you effectively don’t exist on the internet. For an island strongly dependent on tourism, that is a major problem. “Our future depends on this and we therefore request support for our identity so we can be recognised correctly by the world,” its petition reads.

Banking, shipping addresses of goods bought on the internet and geographical identity for trade, tourism and travel are all largely dependent these days on having a unique online identifier. Without it, Sark faced an existential threat.

A determined no

But despite the full backing of the UK government, reams of evidence of its autonomy, the European Court of Human Rights specifically recognizing Sark as a dependent territory, and Sark’s application fulfilling every criterion necessary to get onto the official ISO 3166 list, it has gone back and forth with the committee that decides the list for 21 years. At one point the committee even changed its own rules to prevent Sark from being recognized.

In the end, the man behind the push, Register reader Mike Locke, realized that they were never going to get anywhere by going to the same committee over and over again and went above their heads. A meeting of the ISO’s Technical Management Board, in Oslo, Norway, at the end of February heard Sark’s appeal [PDF], presented by the UK government’s British Standards Institution (BSI). Its decision was only announced late on Thursday last week. It reads [PDF, resolution 15]:

Noting the appeal received by BSI on 12 August 2018 against the ISO 3166/MA decision on the Sark request for an alpha code, and having reviewed the process and criteria for assignment of codes, and
Noting that there are islands that are not member states of the UN but have been assigned a code,
Supports the request from Sark, and
Requests the ISO3166/MA to assign Sark the requested code.

On Sark itself, the committee that has spent innumerable hours since 1999 trying to get approval proudly told the Chief Pleas (the parliament of Sark), that: “After much hard work both on and off island the Special Committee for the Top Level Domain is very pleased to announce that the ISO Technical Board has accepted the application and recommended approval of a Country Code for Sark and inclusion on the ISO 3166 Standard.”

Shortly after, the island went into a lockdown over the novel coronavirus.

Source: After 450 years, the tiny feudal Channel island of Sark will finally earn the right to exist on the internet with a domain • The Register

The rest of the story is a bizarre tale of the ISO committee refusing to change an inane decision again and again and again.

HP printers try to send loads of data back to HP about your devices and what you print

NB: you can block the software’s outgoing communication on public networks using Windows Defender Firewall, following the instructions here (HP).

They come down to opening Windows Defender Firewall, choosing “Allow an app or feature through Windows Defender Firewall”, searching for HP, and then deselecting the Public zone.
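The same block can be scripted instead of clicked through. Here is a minimal sketch that drives Windows’ built-in `netsh advfirewall` interface from Python — note that the HP executable path below is a placeholder assumption, not the real install path, so point it at the actual binary on your machine:

```python
# Sketch: add an outbound block rule for the HP software on the Public
# firewall profile via netsh. Must be run from an elevated prompt on Windows.
import subprocess

# Placeholder path -- adjust to wherever the HP executable actually lives.
HP_EXE = r"C:\Program Files\HP\hp-agent.exe"

def build_block_rule(program, name="Block HP telemetry"):
    """Assemble the netsh command that creates the outbound block rule."""
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={name}",
        "dir=out",
        f"program={program}",
        "profile=public",
        "action=block",
    ]

def apply_block_rule(program=HP_EXE):
    # check=True raises CalledProcessError if netsh reports failure
    # (for example, when the prompt is not elevated).
    subprocess.run(build_block_rule(program), check=True)
```

Undoing it later is `netsh advfirewall firewall delete rule name="Block HP telemetry"`.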

At first the setup process was so simple that even a computer programmer could do it. But then, after I had finished removing pieces of cardboard and blue tape from the various drawers of the machine, I noticed that the final step required the downloading of an app of some sort onto a phone or computer. This set off my crapware detector.

It’s possible that I was being too cynical. I suppose that it was theoretically possible that the app could have been a thoughtfully-constructed wizard, which did nothing more than gently guide non-technical users through the sometimes-harrowing process of installing and testing printer drivers. It was at least conceivable that it could then quietly uninstall itself, satisfied with a simple job well done.

Of course, in reality it was a way to try and get people to sign up for expensive ink subscriptions and/or hand over their email addresses, plus something even more nefarious that we’ll talk about shortly (there were also some instructions for how to download a printer driver tacked onto the end). This was a shame, but not unexpected. I’m sure that the HP ink department is saddled with aggressive sales quotas, and no doubt the only way to hit them is to ruthlessly exploit people who don’t know that third-party cartridges are just as good as HP’s and are much cheaper. Fortunately, the careful user can still emerge unscathed from this phase of the setup process by gingerly navigating the UI patterns that presumably do fool some people who aren’t paying attention.

But it is only then, once the user has found the combination of “Next” and “Cancel” buttons that lead out of the swamp of hard sells and bad deals, that they are confronted with their biggest test: the “Data Collection Notice & Settings”.

In summary, HP wants its printer to collect all kinds of data that a reasonable person would never expect it to. This includes metadata about your devices, as well as information about all the documents that you print, including timestamps, number of pages, and the application doing the printing (HP state that they do stop short of looking at the contents of your documents). From the HP privacy policy, linked to from the setup program:

Product Usage Data – We collect product usage data such as pages printed, print mode, media used, ink or toner brand, file type printed (.pdf, .jpg, etc.), application used for printing (Word, Excel, Adobe Photoshop, etc.), file size, time stamp, and usage and status of other printer supplies. We do not scan or collect the content of any file or information that might be displayed by an application.

Device Data – We collect information about your computer, printer and/or device such as operating system, firmware, amount of memory, region, language, time zone, model number, first start date, age of device, device manufacture date, browser version, device manufacturer, connection port, warranty status, unique device identifiers, advertising identifiers and additional technical information that varies by product.

HP wants to use the data they collect for a wide range of purposes, the most eyebrow-raising of which is for serving advertising. Note the last column in this “Privacy Matrix”, which states that “Product Usage Data” and “Device Data” (amongst many other types of data) are collected and shared with “service providers” for purposes of advertising.

HP delicately balances short-term profits with reasonable-man-ethics by only half-obscuring the checkboxes and language in this part of the setup.

At this point everything has become clear – the job of this setup app is not only to sell expensive ink subscriptions; it’s also to collect what apparently passes for informed consent in a court of law. I clicked the boxes to indicate “Jesus Christ no, obviously not, why would anyone ever knowingly consent to that”, and then spent 5 minutes Googling how to make sure that this setting was disabled. My research suggests that it’s controlled by an item in the settings menu of the printer itself labelled “Store anonymous usage information”. However, I don’t think any reasonable person would think that the meaning of “Store anonymous usage information” includes “send analytics data back to HP’s servers so that it can be used for targeted advertising”, so either HP is being deliberately coy or there’s another option that disables sending your data that I haven’t found yet.

I bet there’s also a vigorous debate to be had over whether HP’s definition of “anonymous” is the same as mine.

I imagine that a user’s data is exfiltrated back to HP by the printer itself, rather than any client-side software. Once HP has a user’s data then I don’t know what they do with it. Maybe if they can see that you are printing documents from Photoshop then they can send you spam for photo paper? I also don’t know anything about how much a user’s data is worth. My guess is that it’s depressingly little. I’d almost prefer it if HP was snatching highly valuable information that was worth making a high-risk, high-reward play for. But I can’t help but feel like they’re just grabbing whatever data is lying around because they might as well, it might be worth a few cents, and they (correctly) don’t anticipate any real risk to their reputation and bottom line from doing so.

Recommended for who?

Source: HP printers try to send data back to HP about your devices and what you print | Robert Heaton

NASA makes their entire media library publicly accessible and copyright free

No matter if you enjoy taking or just looking at images of space, NASA has a treat for you. They have made their entire collection of images, sounds, and videos available and publicly searchable online. That’s 140,000 photos and other resources available for you to see, or even download and use in any way you like.

You can type in the term you want to search for and browse through the database of stunning images of outer space. There are also images of astronauts, rocket launches, events at NASA, and other interesting material. What’s also interesting is that almost every image comes with EXIF data, which could be useful for astrophotography enthusiasts.
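The search box is backed by NASA’s public image API at images-api.nasa.gov; assuming the documented `q`, `media_type`, and `year_start` parameters, a query can be sketched like this (verify the specifics against NASA’s current API docs before relying on them):

```python
# Sketch: query NASA's image and video library search API.
# Endpoint and parameter names follow NASA's published API docs.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://images-api.nasa.gov/search"

def build_search_url(query, media_type="image", year_start=None):
    """Build a search URL; the API answers in Collection+JSON."""
    params = {"q": query, "media_type": media_type}
    if year_start:
        params["year_start"] = year_start
    return f"{API_BASE}?{urlencode(params)}"

def search(query, **kwargs):
    """Fetch results over the network and return the list of result items."""
    with urlopen(build_search_url(query, **kwargs)) as resp:
        return json.load(resp)["collection"]["items"]
```

`build_search_url("apollo 11", year_start="1969")` yields a URL you can also open directly in a browser.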

When you browse through the gallery, you can choose to see images, videos or audio. Another cool feature I noticed is that you can narrow down the results by the year. Of course, I used some of my time today to browse through the gallery, and here are some of the space photos you can find:

What I love about NASA is that they make interesting content for average Internet users. They make us feel closer and more familiar with their work and with the secrets of outer space. For instance, they recently launched a GIPHY account full of awesome animated gifs. It’s also great that photography is an important part of their missions, and it was even before “pics or it didn’t happen” became the rule. The vast media library they have now published is available to everyone, free of charge and free of copyright. Therefore, you can take a peek at the fascinating mysteries of space, check out what it’s like inside NASA’s premises, or download the images to make something awesome from them. Either way, you’ll enjoy it.

[NASA Image and Video Gallery via SLR Lounge; Credit: NASA/JPL-Caltech]

Source: NASA makes their entire media library publicly accessible and copyright free – DIY Photography

Private By Design: Free and Private Voice Assistants

Science fiction has whetted our imagination for helpful voice assistants. Whether it’s JARVIS from Iron Man, KITT from Knight Rider, or Computer from Star Trek, many of us harbor a desire for a voice assistant to manage the minutiae of our daily lives. Speech recognition and voice technologies have advanced rapidly in recent years, particularly with the adoption of Siri, Alexa, and Google Home.

However, many in the maker community are concerned — rightly — about the privacy implications of using commercial solutions. Just how much data do you give away every time you speak with a proprietary voice assistant? Just what are they storing in the cloud? What free, private, and open source options are available? Is it possible to have a voice stack that doesn’t share data across the internet?

Yes, it is. In this article, I’ll walk you through the options.

WHAT’S IN A VOICE STACK?

Some voice assistants offer a whole stack of software, but you may prefer to pick and choose which layers to use.

» WAKE WORD SPOTTER — This layer is constantly listening until it hears the wake word or hot word, at which point it will activate the speech-to-text layer. “Alexa,” “Jarvis,” and “OK Google” are wake words you may know.

» SPEECH TO TEXT (STT) — Also called automatic speech recognition (ASR). Once activated by the wake word, the job of the STT layer is just that: to recognize what you’re saying and turn it into written form. Your spoken phrase is called an utterance.

» INTENT PARSER — Also called natural language processing (NLP) or natural language understanding (NLU). The job of this layer is to take the text from STT and determine what action you would like to take. It often does this by recognizing entities — such as a time, date, or object — in the utterance.

» SKILL — Once the intent parser has determined what you’d like to do, an application or handler is triggered. This is usually called a skill or application. The computer may also create a reply in human-readable language, using natural language generation (NLG).

» TEXT TO SPEECH — Once the skill has completed its task, the voice assistant may acknowledge or respond using a synthesized voice.
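The five layers above chain together into a single pipeline. Here is a toy sketch in Python; every function is a hand-written stand-in for the real engine at that layer, and the hard-coded phrases and replies are illustrative assumptions, not any project's actual API:

```python
# Toy sketch of the five-layer voice stack: wake word -> STT -> intent
# parser -> skill -> TTS. Each layer is a trivial stand-in.

def wake_word_spotter(audio_text, wake_word="jarvis"):
    """Constantly 'listening'; returns True when the wake word is heard."""
    return wake_word in audio_text.lower()

def speech_to_text(audio_text):
    """Stand-in for an STT engine: here the 'audio' is already text."""
    return audio_text

def intent_parser(utterance):
    """Map an utterance to an intent (plus any recognized entities)."""
    if "weather" in utterance.lower():
        return {"intent": "get_weather", "entities": {}}
    return {"intent": "unknown", "entities": {}}

def skill(intent):
    """Handle the intent and produce a human-readable reply (NLG)."""
    if intent["intent"] == "get_weather":
        return "It is sunny today."
    return "Sorry, I didn't understand that."

def text_to_speech(reply):
    """Stand-in for a TTS engine: return the text it would speak."""
    return f"<speaking> {reply}"

def assistant(audio_text):
    if not wake_word_spotter(audio_text):
        return None  # no wake word: keep listening, send nothing anywhere
    utterance = speech_to_text(audio_text)
    intent = intent_parser(utterance)
    return text_to_speech(skill(intent))

print(assistant("Jarvis, what's the weather like?"))
# → <speaking> It is sunny today.
```

Note that nothing here touches the network: a pipeline built from on-device layers processes the whole interaction locally, which is exactly the privacy property discussed below.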

Some layers work on device, meaning they don’t need an internet connection. These are a good option for those concerned about privacy, because they don’t share your data across the internet. Others do require an internet connection because they offload processing to cloud servers; these can be more of a privacy risk.

Before you pick a voice stack for your project, you’ll need to ask key questions such as:

• What’s the interface of the software like — how easy is it to install and configure, and what support is available?

• What sort of assurances do you have around the software? How accurate is it? Does it recognize your accent well? Is it well tested? Does it make the right decisions about your intended actions?

• What sort of context, or use case, do you have? Do you want your data going across the internet or being stored on cloud servers? Is your hardware constrained in terms of memory or CPU? Do you need to support languages other than English?

ALL-IN-ONE VOICE SOLUTIONS

If you’re looking for an easy option to start with, you might want to try an all-in-one voice solution. These products often package other software together in a way that’s easy to install. They’ll get your DIY voice project up and running the fastest.

Jasper is designed from the ground up for makers, and is intended to run on a Raspberry Pi. It’s a great first step for integrating voice into your projects. With Jasper, you choose which software components you want to use and write your own skills, and it’s possible to configure it so that it doesn’t need an internet connection to function.

Rhasspy also uses a modular framework and can be run without an internet connection. It’s designed to run under Docker and has integrations for NodeRED and for Home Assistant, a popular open source home automation software.

Mycroft is modular too, but by default it requires an internet connection. Skills in Mycroft are easy to develop and are written in Python 3; existing skills include integrations with Home Assistant and Mozilla WebThings. Mycroft also builds open-source hardware voice assistants similar to Amazon Echo and Google Home. And it has a distribution called Picroft specifically for the Raspberry Pi 3B and above.

Almond is a privacy-preserving voice assistant from Stanford that’s available as a web app, for Android, or for the GNOME Linux desktop. Almond is very new on the scene, but already has an integration with Home Assistant. It also has options that allow it to run on the command line, so it could be installed on a Raspberry Pi (with some effort).

The languages supported by all-in-one voice solutions are dependent on what software options are selected, but by default they use English. Other languages require additional configuration.

WAKE WORD SPOTTERS

PocketSphinx is a great option for wake word spotting. It’s available for Linux, Mac, and Windows, as well as Android and iOS; however, installation can be involved. PocketSphinx works on-device, by recognizing phonemes, which are the smallest units of sound that make up a word.

For example, hello and world each have four phonemes:

hello H EH L OW

world W ER L D
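In the simplest terms, a phoneme-based spotter decodes incoming audio into a phoneme sequence and compares it against the wake word's known pronunciation. The two-entry pronunciation dictionary below is hand-written for illustration; real systems like PocketSphinx use the full CMU Pronouncing Dictionary plus acoustic models:

```python
# Sketch of phoneme-based wake word matching. The dictionary below is a
# made-up miniature; a real system would look pronunciations up in the
# CMU Pronouncing Dictionary and decode phonemes from audio.

PHONEME_DICT = {
    "hello": ["H", "EH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def phonemes_for(word):
    """Look up a word's pronunciation as a phoneme sequence."""
    return PHONEME_DICT.get(word.lower())

def is_wake_word(heard_phonemes, wake_word="hello"):
    """Compare a decoded phoneme sequence against the wake word's entry."""
    return heard_phonemes == phonemes_for(wake_word)

print(is_wake_word(["H", "EH", "L", "OW"]))  # True
```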

The downside of PocketSphinx is that its core developers appear to have moved on to a for-profit company, so it’s not clear how long PocketSphinx or its parent CMU Sphinx will be around.

Precise by Mycroft.AI uses a recurrent neural network to learn what are and are not wake words. You can train your own wake words with Precise, but it does take a lot of training to get accurate results.

Snowboy is free for makers to train your own wake word, using Kitt.AI’s (proprietary) training, but also comes with several pre-trained models, and wrappers for several programming languages including Python and Go. Once you’ve got your trained wake word, you no longer need an internet connection. It’s an easier option for beginners than Precise or PocketSphinx, and has a very small CPU footprint, which makes it ideal for embedded electronics. Kitt.AI was acquired by Chinese giant Baidu in 2017, although to date it appears to remain as its own entity.

Porcupine from Picovoice is designed specifically for embedded applications. It comes in two variants: a complete model with higher accuracy, and a compressed model with slightly lower accuracy but a much smaller CPU and memory footprint. It provides examples for integration with several common programming languages. Ada, the voice assistant recently released by Home Assistant, uses Porcupine under the hood.

SPEECH TO TEXT

Kaldi has for years been the go-to open source speech-to-text engine. Models are available for several languages, including Mandarin. It works on-device but is notoriously difficult to set up, and is not recommended for beginners. You can use Kaldi to train your own speech-to-text model if you have spoken phrases and recordings, for example in another language. Researchers at the Australian Centre for the Dynamics of Language have recently developed Elpis, a wrapper for Kaldi that makes transcription a lot easier. It’s aimed at linguists who need to transcribe lots of recordings.

CMU Sphinx, like its child PocketSphinx, is based on phoneme recognition, works on-device, and is complex for beginners.

DeepSpeech, part of Mozilla’s Common Voice project, is another major player in the open source space that’s been gaining momentum. DeepSpeech comes with a pre-trained English model but can be trained on other data sets — this requires a compatible GPU. Trained models can be exported using TensorFlow Lite for inference, and it’s been tested on a RasPi 4, where it comfortably performs real-time transcription. Again, it’s complex for beginners.

INTENT PARSING AND ENTITY RECOGNITION

There are two general approaches to intent parsing and entity recognition: neural networks and slot matching. The neural network is trained on a set of phrases, and can usually match an utterance that “sounds like” an intent that should trigger an action. In the slot matching approach, your utterance needs to closely match a set of predefined “slots,” such as “play the song [songname] using [streaming service].” If you say “play Blur,” the utterance won’t match the intent.

Padatious is Mycroft’s new intent parser, which uses a neural network. They also developed Adapt, which uses the slot matching approach.

For those who use Python and want to dig a little deeper into the structure of language, the Natural Language Toolkit is a powerful tool, and can do “part of speech” tagging as well as named entity recognition — for example, recognizing the names of places.

Rasa is a set of tools for conversational applications, such as chatbots, and includes a robust intent parser. Rasa makes predictions about intent based on the entire context of a conversation. Rasa also has a training tool called Rasa X, which helps you train the conversational agent to your particular context. Rasa X comes in both an open source community edition and a licensed enterprise edition.

Picovoice also has Rhino, which comes with pre-trained intent parsing models for free. However, customization of models — for specific contexts like medical or industrial applications — requires a commercial license.

TEXT TO SPEECH

Just like speech-to-text models need to be “trained” for a particular language or dialect, so too do text-to-speech models. However, text to speech is usually trained on a single voice, such as “British Male” or “American Female.”

eSpeak is perhaps the best-known open source text-to-speech engine. It supports over 100 languages and accents, although the quality of the voice varies between languages. eSpeak supports the Speech Synthesis Markup Language format, which can be used to add inflection and emphasis to spoken language. It is available for Linux, Windows, Mac, and Android systems, and it works on-device, so it can be used without an internet connection, making it ideal for maker projects.

Festival is now quite dated, and needs to be compiled from source for Linux, but does have around 15 American English voices available. It works on-device. It’s mentioned here out of respect; for over a decade it was considered the premier open source text-to-speech engine.

Mimic2 is a Tacotron fork from Mycroft AI, who have also released training tools that allow you to build your own text-to-speech voices. Getting a high-quality voice requires up to 100 hours of “clean” speech, and Mimic2 is too large to work on-device, so you need to host it on your own server or connect your device to the Mycroft Mimic2 server. Currently it only has a pre-trained voice for American English.

Mycroft’s earlier Mimic TTS can work on-device, even on a Raspberry Pi, and is another good candidate for maker projects. It’s a fork of CMU Flite.

Mary Text to Speech supports several, mainly European languages, and has tools for synthesizing new voices. It runs on Java, so can be complex to install.

So, that’s a map of the current landscape in open source voice assistants and software layers. You can compare all these layers in the chart at the end of this article. Whatever your voice project, you’re likely to find something here that will do the job well — and will keep your voice and your data private from Big Tech.

WHAT’S NEXT FOR OPEN SOURCE VOICE?

As machine learning and natural language processing continue to advance rapidly, we’ve seen the decline of the major open source voice tools. CMU Sphinx, Festival, and eSpeak have become outdated as their supporters have adopted other tools, or maintainers have gone into private industry and startups.

We’re going to see more software that’s free for personal use but requires a commercial license for enterprise, as Rasa and Picovoice do today. And it’s understandable; dealing with voice in an era of machine learning is data intensive, a poor fit for the open source model of volunteer development. Instead, companies are driven to commercialize by monetizing a centralized “platform as a service.”

Another trajectory this might take is some form of value exchange. Training all those neural networks and machine learning models — for STT, intent parsing, and TTS — takes vast volumes of data. More companies may provide software on an open source basis and in return ask users to donate voice samples to improve the data sets. Mozilla’s Common Voice follows this model.

Another trend is voice moving on-device. The newer, machine-learning-driven speech tools originally were too computationally intensive to run on low-end hardware like the Raspberry Pi. But with DeepSpeech now running on a RasPi 4, it’s only a matter of time before the newer TTS tools can too.

We’re also seeing a stronger focus on personalization, with the ability to customize both speech-to-text and text-to-speech software.

WHAT WE STILL NEED

What’s lacking across all these open source tools are user-friendly interfaces to capture recordings and train models. Open source products must continue to improve their UIs to attract both developer and user communities; failure to do so will see more widespread adoption of proprietary and “freemium” tools.

As always in emerging technologies, standards remain elusive. For example, skills have to be rewritten for different voice assistants. Device manufacturers, particularly for smart home appliances, won’t want to develop and maintain integrations for multiple assistants; much of this will fall to an already-stretched open source community until mechanisms for interoperability are found. Mozilla’s WebThings ecosystem (see page 50) may plug the interoperability gap if it can garner enough developer support.

Regardless, the burden rests with the open source community to find ways to connect to proprietary systems (see page 46 for a fun example) because there’s no incentive for manufacturers to do the converse.

The future of open source rests in your hands! Experiment and provide feedback, issues, pull requests, data, ideas, and bugs. With your help, open source can continue to have a strong voice.


Source: Private By Design: Free and Private Voice Assistants

Pervasive digital locational surveillance of citizens deployed in COVID-19 fight

Pervasive surveillance through digital technologies is the business model of Facebook and Google. And now governments are considering the web giants’ tools to track COVID-19 carriers for the public good.

Among democracies, Israel appears to have gone first: prime minister Benjamin Netanyahu has announced “emergency regulations that will enable the use of digital means in the war on Corona. These means will greatly assist us in locating patients and thereby stop the spread of the virus.”

Speaking elsewhere, Netanyahu said the digital tools are those used by Israeli security agency Shin Bet to observe terrorists. He added that the tools mean the government “will be able to see who they [people infected with the virus] were with, what happened before and after [they became infected].”

Strict oversight and a thirty-day limit on the use of the tools is promised. But the tools’ use was announced as a fait accompli before Israel’s Parliament or the relevant committee could properly authorise their use. And that during a time of caretaker government!

The idea of using tech to spy on COVID-carriers may now be catching.

The Washington Post has reported that the White House has held talks with Google and Facebook about how the data they hold could contribute to analysis of the virus’ spread. Both companies already share some anonymised location data with researchers. The Post suggested anonymised location data could be used by government agencies to understand how people are behaving.

Thailand recently added a COVID-19-screening form to the Airports of Thailand app. While the feature is a digital replica of a paper registration form offered to incoming travellers, the app asks for location permission and tries to turn on Bluetooth every time it is activated. The Register has asked the app’s developers to explain the permissions it seeks, but has not received a reply in 48 hours.

Nariman Gharib, chief incident response officer at the Computer Emergency Response Team in Farsi, has claimed that the Iranian government’s COVID-diagnosis app tracks its users.

China has admitted it’s using whatever it wants to track its people – the genie has been out of the bottle there for years.

If other nations follow suit, will it be possible to put the genie back in?

Probably not: plenty of us give away our location data to exercise-tracking apps for the sheer fun of it, and government agencies gleefully hoover up what they call “open source intelligence”.

Source: Pervasive digital surveillance of citizens deployed in COVID-19 fight, with rules that send genie back to bottle • The Register

Brave Browser Delivers on Promise, Files GDPR Complaint Against Google

Earlier today, March 16, Brave filed a formal complaint against Google with the lead General Data Protection Regulation (GDPR) enforcer in Europe.

In a February Cointelegraph interview, Dr. Johnny Ryan, Brave’s chief policy and industry relations officer, explained that Google is abusing its power by sharing user data collected by dozens of its distinct services, creating a “free for all” data warehouse. According to Ryan, this was a clear violation of the GDPR.

Aggravated with the situation and the lack of enforcement against the giant, Ryan promised to take Google to court if things don’t change for the better.

Complaint against Google

Now, the complaint is with the Irish Data Protection Commission. It accuses Google of violating Article 5(1)b of the GDPR. Dublin is Google’s European headquarters and, as Dr. Ryan explained to Cointelegraph, the Commission “is responsible for regulating Google’s data protection across the European Economic Area”.

Article 5(1)b of the GDPR requires that data be “collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes”. According to Dr. Ryan:

“Enforcement of Brave’s GDPR ‘purpose limitation’ complaint against Google would be tantamount to a functional separation, giving everyone the power to decide what parts of Google they chose to reward with their data.”

Google is a “black box”

Dr. Ryan has spent six months trying, to no avail, to elicit a response from Google to a basic question: “What do you do with my data?”

Alongside the complaint, Brave released a study called “Inside the Black Box” that:

“Examines a diverse set of documents written for Google’s business clients, technology partners, developers, lawmakers, and users. It reveals that Google collects personal data from integrations with websites, apps, and operating systems, for hundreds of ill-defined processing purposes.”

Brave does not need regulators to compete with Google

Cointelegraph asked Dr. Ryan how Google’s treatment of user data frustrates Brave as a competitor, to which Dr. Ryan replied:

“The question is not relevant. Brave does not —  as far as I am aware — have direct frustrations with Google. Brave is growing nicely by being a particularly fast, excellent, and private browser. (It doesn’t need regulators to help it grow.)”

A recent privacy study indicated that Brave protects user privacy much better than Google Chrome or any other major browser.

In addition to filing a formal complaint with the Irish Data Protection Commission, Brave has reportedly written to the European Commission, German Bundeskartellamt, UK Competition & Markets Authority, and French Autorité de la concurrence.

If none of these regulatory bodies take action against Google, Brave has suggested that it may take the tech giant to court itself.

Source: Brave Browser Delivers on Promise, Files GDPR Complaint Against Google

Data of millions of eBay and Amazon shoppers exposed by VAT-analysing third party

Researchers have discovered another big database containing millions of European customer records left unsecured on Amazon Web Services (AWS) for anyone to find using a search engine.

A total of eight million records were involved, collected via marketplace and payment system APIs belonging to companies including Amazon, eBay, Shopify, PayPal, and Stripe.

Discovered by Comparitech’s noted breach hunter Bob Diachenko, the AWS instance containing the MongoDB database became visible on 3 February and remained indexable by search engines for five days.

Data in the records included names, shipping addresses, email addresses, phone numbers, items purchased, payments, order IDs, links to Stripe and Shopify invoices, and partially redacted credit cards.

Also included were thousands of Amazon Marketplace Web Services (MWS) queries, an MWS authentication token, and an AWS access key ID.

Because a single customer might generate multiple records, Comparitech wasn’t able to estimate how many customers might be affected.

About half of the customers whose records were leaked are from the UK; as far as we can tell, most if not all of the rest are from elsewhere in Europe.

How did this happen?

According to Comparitech, the unnamed company involved was a third party conducting cross-border value-added tax (VAT) analysis.

That is, a company none of the affected customers would have heard of or have any relationship with:

This exposure exemplifies how, when handing over personal and payment details to a company online, that info often passes through the hands of various third parties contracted to process, organize, and analyze it. Rarely are such tasks handled solely in house.

The exposed queries and credentials could be used to query the MWS API, Comparitech said, potentially allowing an attacker to request records from sales databases. For that reason, it recommended that the companies involved immediately change their passwords and keys.

Why are workers getting smaller pieces of the pie?

It’s one of the biggest economic changes in recent decades: Workers get a smaller slice of company revenue, while a larger share is paid to capital owners and distributed as profits. Or, as economists like to say, there has been a fall in labor’s share of gross domestic product, or GDP.

A new study co-authored by MIT economists uncovers a major reason for this trend: Big companies that spend more on capital and less on workers are gaining market share, while smaller firms that spend more on workers and less on capital are losing market share. That change, the researchers say, is a key reason why the labor share of GDP in the U.S. has dropped from around 67 percent in 1980 to 59 percent today, following decades of stability.

“To understand this phenomenon, you need to understand the reallocation of economic activity across firms,” says MIT economist David Autor, co-author of the paper. “That’s our key point.”

To be sure, many economists have suggested other hypotheses, including new generations of software and machines that substitute directly for workers, the effects of international trade and outsourcing, and the decline of labor union power. The current study does not entirely rule out all of those explanations, but it does highlight the importance of what the researchers term “superstar firms” as a primary factor.

“We feel this is an incredibly important and robust fact pattern that you have to grapple with,” adds Autor, the Ford Professor of Economics in MIT’s Department of Economics.

The paper, “The Fall of the Labor Share and the Rise of Superstar Firms,” appears in advance online form in the Quarterly Journal of Economics.

[…]

For much of the 20th century, labor’s share of GDP was notably consistent. As the authors note, John Maynard Keynes once called it “something of a miracle” in the face of economic changes, and the British economist Nicholas Kaldor included labor’s steady portion of GDP as one of his often-cited six “stylized facts” of growth.

To conduct the study, the researchers scrutinized data for the U.S. and other countries in the Organisation for Economic Co-operation and Development (OECD). The scholars used U.S. Economic Census data from 1982 to 2012 to study six economic sectors that account for about 80 percent of employment and GDP: manufacturing, retail trade, wholesale trade, services, utilities and transportation, and finance. The data includes payroll, total output, and total employment.

The researchers also used information from the EU KLEMS database, housed at the Vienna Institute for International Economic Studies, to examine the other OECD countries.

The increase in market dominance for highly competitive top firms in many of those sectors is evident in the data. In the retail trade, for instance, the top four firms accounted for just under 15 percent of sales in 1981, but that grew to around 30 percent of sales in 2011. In utilities and transportation, those figures moved from 29 percent to 41 percent in the same time frame. In manufacturing, this top-four sales concentration grew from 39 percent in 1981 to almost 44 percent in 2011.

At the same time, the average payroll-to-sales ratio declined in five of those sectors—with finance being the one exception. In manufacturing, the payroll-to-sales ratio decreased from roughly 18 percent in 1981 to about 12 percent in 2011. On aggregate, the labor share of GDP declined at most times except the period from 1997 to 2002, the final years of an economic expansion with high employment.

But surprisingly, labor’s share is not falling at the typical firm. Rather, reallocation between firms is the key. In general, says Autor, the picture is of a “winner-take-most setting, where a smaller number of firms are accounting for a larger amount of economic activity, and those are firms where workers historically got a smaller share of the pie.”
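The reallocation mechanism can be made concrete with a toy calculation. The firm labor shares and market weights below are made-up numbers, chosen only to show that the aggregate labor share can fall even when no individual firm's labor share changes:

```python
# Toy illustration (invented numbers) of the reallocation effect: each
# firm's own labor share is constant, yet the aggregate falls as the
# low-labor-share "superstar" firm gains market share.

def aggregate_labor_share(shares_and_weights):
    """Market-share-weighted average of firm-level labor shares."""
    return sum(share * weight for share, weight in shares_and_weights)

# Before: the high-labor-share firm (0.70) holds 80% of the market.
before = aggregate_labor_share([(0.70, 0.8), (0.50, 0.2)])
# After: the low-labor-share superstar (0.50) has taken 70% of the market.
after = aggregate_labor_share([(0.70, 0.3), (0.50, 0.7)])

print(round(before, 2), round(after, 2))  # 0.66 0.56 - aggregate falls,
# even though neither firm's own labor share moved at all
```

The invented endpoints (0.66 down to 0.56) are chosen to echo the 67-to-59-percent decline reported in the study, purely as an illustration of the composition effect.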

A key insight provided by the study is that the dynamics within industry sectors have powered the drop in the labor share of GDP. The overall change is not just the result of, say, an increase in the deployment of technology in manufacturing, which some economists have suggested. While manufacturing is important to the big picture, the same phenomenon is unfolding across and within many sectors of the economy.

As far as testing the remaining alternate hypotheses, the study found no special pattern within industries linked to changes in trade policy—a subject Autor has studied extensively in the past. And while the decline in union power cannot be ruled out as a cause, the drop in labor share of GDP occurs even in countries where unions remain relatively stronger than they do in the U.S.

Source: Why are workers getting smaller pieces of the pie?

He then goes on to say:

“We shouldn’t presume that just because a market is concentrated—with a few leading firms accounting for a large fraction of sales—it’s a market with low productivity and high prices,” Autor says. “It might be a market where you have some very productive leading firms.” Today, he adds, “more competition is platform-based competition, as opposed to simple price competition. Walmart is a platform business. Amazon is a platform business. Many tech companies are platform businesses. Many financial services companies are platform businesses. You have to make some huge investment to create a sophisticated service or set of offerings. Once that’s in place, it’s hard for your competitors to replicate.”

With this in mind, Autor says we may want to distinguish whether market concentration is “the bad kind, where lazy monopolists are jacking up prices, or the good kind, where the more competitive firms are getting a larger share. To the best we can distinguish, the rise of superstar firms appears more the latter than the former. These firms are in more innovative industries—their productivity growth has developed faster, they make more investment, they patent more. It looks like this is happening more in the frontier sectors than the laggard sectors.”

Still, Autor adds, the paper does contain policy implications for regulators.

“Once a firm is that far ahead, there’s potential for abuse,” he notes. “Maybe Facebook shouldn’t be allowed to buy all its competitors. Maybe Amazon shouldn’t be both the host of a market and a competitor in that market. This potentially creates regulatory issues we should be looking at. There’s nothing in this paper that says everyone should just take a few years off and not worry about the issue.”

I’d completely disagree – platform businesses are behaving like monopolists, but you need to look beyond product price to understand that selling at a loss is called undercutting, and there are many, many other reasons that monopoly is a bad thing, as I explain below.

Banjo, the company that will use AI to spy on all of Utah through its cams, used a secret company and fake apps to scrape social media

Banjo, an artificial intelligence firm that works with police, used a shadow company to create an array of Android and iOS apps that looked innocuous but were specifically designed to secretly scrape social media, Motherboard has learned.

The news signifies an abuse of data by a government contractor, with Banjo going far beyond what companies which scrape social networks usually do. Banjo created a secret company named Pink Unicorn Labs, according to three former Banjo employees, with two of them adding that the company developed the apps. This was done to avoid detection by social networks, two of the former employees said.

Three of the apps created by Pink Unicorn Labs were called “One Direction Fan App,” “EDM Fan App,” and “Formula Racing App.” Motherboard found these three apps on archive sites and downloaded and analyzed them, as did an independent expert. The apps—which appear to have been originally compiled in 2015 and were on the Play Store until 2016 according to Google—outwardly had no connection to Banjo, but an analysis of their code indicates connections to the company. This aspect of Banjo’s operation has some similarities with the Cambridge Analytica scandal, with multiple sources comparing the two incidents.

“Banjo was doing exactly the same thing but more nefariously, arguably,” a former Banjo employee said, referring to how seemingly unrelated apps were helping to feed the activities of the company’s main business.

[…]

Last year Banjo signed a $20.7 million contract with Utah that granted the company access to the state’s traffic, CCTV, and public safety cameras. Banjo promises to combine that input with a range of other data such as satellites and social media posts to create a system that it claims alerts law enforcement of crimes or events in real-time.

“We essentially do most of what Palantir does, we just do it live,” Banjo’s top lobbyist Bryan Smith previously told police chiefs and 911 dispatch officials when pitching the company’s services.

[…]

Motherboard found the apps developed by Pink Unicorn Labs included code mentioning signing into Facebook, Twitter, Instagram, Russian social media app VK, FourSquare, Google Plus, and Chinese social network Sina Weibo.

[…]

One of the former employees said they saw one of the apps when it was still working and it had a high number of logins.

“It was all major social media platforms,” they added. The particular versions of the apps Motherboard obtained, when opened, asked a user to sign in with Instagram.

Business records for Pink Unicorn Labs show the company was originally incorporated by Banjo CEO Damien Patton. Banjo employees worked directly on Pink Unicorn Labs projects from Banjo’s offices, several of the former employees said, though they added that Patton made it clear in recent years that Banjo needed to wind down Pink Unicorn Labs’ work and not be linked to the firm.

“There was something about Pink Unicorn that was important for Damien to distance himself from,” another former employee told Motherboard.

[…]

Some similar companies, like Dataminr, have permission from social media sites to use large amounts of data; Twitter, which owns a stake in Dataminr, gives the firm exclusive access to its so-called “fire hose” of public posts.

Banjo did not have that sort of data access. So it created Pink Unicorn Labs, which one former employee described as a “shadow company,” that developed apps to harvest social media data.

“They were shitty little apps that took advantage of some of the data that we had but the catch was that they had a ton of OAuth providers,” one of the former employees said. OAuth providers are methods for signing into apps or websites via another service, such as Facebook’s “Facebook Connect,” Twitter’s “Sign In With Twitter,” or Google’s “Google Sign-In.” These providers mean a user doesn’t have to create a new account for each site or app they want to use, and can instead log in via their already established social media identity.
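For readers unfamiliar with how these sign-in providers work under the hood: most of them implement the OAuth 2.0 authorization-code flow, in which the app first sends the user to the provider's login page via a specially constructed URL. The sketch below shows that first step in Python; the endpoint, client ID, redirect URI, and scope names are all hypothetical placeholders, not anything taken from Banjo's apps.

```python
from urllib.parse import urlencode

def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scopes):
    """Build the URL for step one of an OAuth 2.0 authorization-code flow:
    the app redirects the user here so they can log in with the provider
    and approve the requested scopes."""
    params = {
        "response_type": "code",        # ask the provider for an authorization code
        "client_id": client_id,         # identifies the app to the provider
        "redirect_uri": redirect_uri,   # where the provider sends the user back
        "scope": " ".join(scopes),      # data the app is asking permission for
    }
    return f"{authorize_endpoint}?{urlencode(params)}"

# Hypothetical values for illustration only.
url = build_authorization_url(
    "https://provider.example/oauth/authorize",
    client_id="fan-app-client-id",
    redirect_uri="https://fan-app.example/callback",
    scopes=["public_profile", "user_friends"],
)
print(url)
```

After the user approves, the provider redirects back to the app's `redirect_uri` with a short-lived code, which the app exchanges server-side for an access token. It is that token, granting ongoing access to the user's profile and friends data, that the former employees quoted below describe the apps capturing.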

But once users logged into the innocent looking apps via a social network OAuth provider, Banjo saved the login credentials, according to two former employees and an expert analysis of the apps performed by Kasra Rahjerdi, who has been an Android developer since the original Android project was launched. Banjo then scraped social media content, those two former employees added. The app also contained nonstandard code written by Pink Unicorn Labs: “The biggest red flag for me is that all the code related to grabbing Facebook friends, photos, location history, etc. is directly from their own codebase,” Rahjerdi said.

[…]

“Banjo was secretly farming peoples’ user tokens via these shadow apps,” one of the former employees said. “That was the entire point and plan,” they added when asked if the apps were specifically designed to steal users’ login tokens.

[…]

The apps request a wide range of permissions, such as access to location data, the ability to create accounts and set passwords, and find accounts on the device.

Multiple sources said Banjo tried to keep Pink Unicorn Labs a secret, but Motherboard found several links between the two. An analysis of the Android apps revealed all three had code that contained web links to Banjo’s website; each app contained a set of identical data that appeared to be pulled from social network sites, including repeatedly the Twitter profile of Jennifer Peck, who works for Banjo and is also married to Banjo’s Patton. In registration records for the two companies, both Banjo and Pink Unicorn Labs shared the same address in Redwood, California; and Patton is listed as the creator of Pink Unicorn Labs in that firm’s own public records.

Source: Surveillance Firm Banjo Used a Secret Company and Fake Apps to Scrape Social Media – VICE