
Public DeepSeek AI database exposes API keys and other user data



PETER CATTERALL/Contributor/Getty Images

Barely a week into its new-found fame, DeepSeek — and the story about its development — is evolving at breakneck speed.

The Chinese AI startup made waves last week when it released the full version of R1, the company’s open-source reasoning model that can outperform OpenAI’s o1. On Monday, App Store downloads of DeepSeek’s AI assistant, which runs V3, a model DeepSeek released in December, topped ChatGPT, which had previously been the most downloaded free app. 

Also: Apple researchers reveal the secret sauce behind DeepSeek AI

DeepSeek R1 climbed to third place overall on Hugging Face's Chatbot Arena, battling several Gemini models and ChatGPT-4o, while the company also released a promising new image model.

However, it’s not all good news — numerous security concerns have surfaced about the model. Here’s what you need to know.


DeepSeek’s chat page at the time of writing.

Screenshot by Radhika Rajkumar/ZDNET

What is DeepSeek?

Founded by Liang Wenfeng in May 2023 (and thus not even two years old), the Chinese startup has challenged established AI companies with its open-source approach. According to Forbes, DeepSeek's edge may lie in the fact that it is funded only by High-Flyer, a hedge fund also run by Liang, giving the company a funding model that supports fast growth and research. 

Also: Perplexity lets you try DeepSeek R1 without the security risk, but it’s still censored

The company's ability to create successful models while using older chips — a result of the export ban on US-made chips, including those from Nvidia — is impressive by industry standards. 

What is DeepSeek R1?

Released in full last week, R1 is DeepSeek’s flagship reasoning model, which performs at or above OpenAI’s lauded o1 model on several math, coding, and reasoning benchmarks. 

Built on V3, with distilled versions based on Alibaba's Qwen and Meta's Llama, R1 is notable because, unlike most other top models from tech giants, it's open source, meaning anyone can download and use it. That said, DeepSeek has not disclosed R1's training dataset. So far, all of the company's other released models are open source as well. 

Also: I tested DeepSeek’s R1 and V3 coding skills – and we’re not all doomed (yet)

DeepSeek is cheaper than comparable US models. For reference, R1 API access starts at $0.14 per million tokens, a fraction of the $7.50 that OpenAI charges for the equivalent tier. 
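A quick back-of-the-envelope calculation puts those per-token rates in perspective. The rates below are the ones quoted above; the 10-million-token workload is an illustrative assumption, and real pricing varies by tier, token type, and caching:

```python
# Back-of-the-envelope API cost comparison using the rates quoted above.
# The token count is an illustrative assumption, not a real workload.

DEEPSEEK_R1_PER_MILLION = 0.14  # USD per 1M tokens (as quoted)
OPENAI_O1_PER_MILLION = 7.50    # USD per 1M tokens, equivalent tier (as quoted)

def api_cost(tokens: int, rate_per_million: float) -> float:
    """Cost in USD for a given token count at a per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million

tokens = 10_000_000  # hypothetical monthly usage

deepseek_cost = api_cost(tokens, DEEPSEEK_R1_PER_MILLION)
openai_cost = api_cost(tokens, OPENAI_O1_PER_MILLION)

print(f"DeepSeek R1: ${deepseek_cost:.2f}")          # $1.40
print(f"OpenAI o1:   ${openai_cost:.2f}")            # $75.00
print(f"Ratio: {openai_cost / deepseek_cost:.1f}x")  # 53.6x
```

At the quoted rates, the same workload costs roughly 50 times more on OpenAI's equivalent tier.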

DeepSeek claims in a company research paper that its V3 model, which can be compared to a standard chatbot model like Claude, cost $5.6 million to train, a number that has circulated (and been disputed) as the entire development cost of the model. As the AP reported, some lab experts believe the paper refers only to the final training run for V3, not its entire development cost; even so, the figure is a fraction of what tech giants have spent to build competitive models. Some experts suggest DeepSeek's number doesn't include earlier infrastructure, R&D, data, and personnel costs.

One drawback that could impact the model's long-term competitiveness with o1 and US-made alternatives is censorship. Chinese models often include blocks on certain subject matter, meaning that while they function comparably to other models, they may not answer some queries (see how DeepSeek's AI assistant responds to questions about Tiananmen Square and Taiwan here). As DeepSeek use increases, some are concerned that its models' stringent Chinese guardrails and systemic biases could become embedded across all kinds of infrastructure. 

Even as platforms like Perplexity add access to DeepSeek and claim to have stripped out its censorship, the model refused to answer my question about Tiananmen Square as of Thursday afternoon. 

Also: Is DeepSeek’s new image model another win for cheaper AI?

In December, ZDNET's Tiernan Ray compared R1-Lite's ability to explain its chain of thought to that of o1, and the results were mixed. That said, DeepSeek's AI assistant reveals its train of thought to the user during queries, a novel experience for many chatbot users, given that ChatGPT does not externalize its reasoning. 

Of course, all popular models undergo red-teaming and ship with community guidelines and content guardrails. However, at least at this stage, American-made chatbots are unlikely to refuse to answer queries about historical events. 

Privacy and security red flags

Data privacy worries that have long circulated around TikTok, the Chinese-owned social media app now under a partial ban in the US, are also cropping up around DeepSeek. 

On Wednesday, cloud security firm Wiz revealed that it had found an internal DeepSeek database publicly accessible "within minutes" of beginning a security check. The "completely open and unauthenticated" database contained chat histories, user API keys, and other sensitive data.

“More critically, the exposure allowed for full database control and potential privilege escalation within the DeepSeek environment, without any authentication or defense mechanism to the outside world,” Wiz’s report explains.

According to Wired, which initially published the research, though Wiz did not receive a response from DeepSeek, the database appeared to be taken down within 30 minutes of Wiz notifying the company. It’s unclear how long it was accessible or if any other entity discovered it before it was taken down. 

Even without this alarming development, DeepSeek’s privacy policy raises some flags. “The personal information we collect from you may be stored on a server located outside of the country where you live,” it states. “We store the information we collect in secure servers located in the People’s Republic of China.”

Also: ‘Humanity’s Last Exam’ benchmark is stumping top AI models – can you do any better?

The policy outlines that DeepSeek collects plenty of information, including but not limited to:

  • “IP address, unique device identifiers, and cookies”
  • “date of birth (where applicable), username, email address and/or telephone number, and password”
  • “your text or audio input, prompt, uploaded files, feedback, chat history, or other content that you provide to our model and Services”
  • “proof of identity or age, feedback or inquiries about your use of the Service,” if you contact DeepSeek

The policy continues: “Where we transfer any personal information out of the country where you live, including for one or more of the purposes as set out in this Policy, we will do so in accordance with the requirements of applicable data protection laws.” The policy does not mention GDPR compliance.

Also: How to protect your privacy from Facebook – and what doesn’t work

“Users need to be aware that any data shared with the platform could be subject to government access under China’s cybersecurity laws, which mandate that companies provide access to data upon request by authorities,” Adrianus Warmenhoven, a member of NordVPN‘s security advisory board, told ZDNET via email.

According to some observers, the fact that R1 is open source means increased transparency, allowing users to inspect the model’s source code for signs of privacy-related activity. 

Additionally, DeepSeek has released smaller versions of R1 that can be downloaded and run locally, avoiding any concern about data being sent back to the company (as opposed to accessing the chatbot online). 
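For example, one common way to run a distilled R1 variant locally is through a runtime like Ollama. The model tag below comes from Ollama's public model library and is shown as an illustration; smaller tags (such as 1.5b) also exist for lower-end hardware:

```shell
# Pull and run a distilled DeepSeek R1 model entirely on your own machine
# (requires Ollama to be installed; "deepseek-r1:7b" is a tag from
# Ollama's public model library).
ollama pull deepseek-r1:7b   # downloads the model weights once
ollama run deepseek-r1:7b    # starts an interactive local chat session
```

Because inference happens entirely on the local machine, prompts and chat history never leave the device.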

Also: ChatGPT privacy tips: Two important ways to limit the data you share with OpenAI

All chatbots, including ChatGPT, collect some degree of user data when queried via the browser. 

Safety concerns

AI safety researchers have long been concerned that powerful open-source models could be applied in dangerous and unregulated ways once out in the wild. Tests by AI safety firm Chatterbox Labs found that DeepSeek R1 has "safety issues across the board." 

Also: We’re losing the battle against complexity, and AI may or may not help

To varying degrees, US AI companies employ some form of safety oversight team. DeepSeek has not said whether it has a safety research team, and it has not responded to ZDNET's request for comment on the matter.

“Most companies will keep racing to build the strongest AI they can, irrespective of the risks, and will see enhanced algorithmic efficiency as a way to achieve higher performance faster,” said Peter Slattery, a researcher on MIT’s FutureTech team who led its Risk Repository project. “That leaves us even less time to address the safety, governance, and societal challenges that will come with increasingly advanced AI systems.”

“DeepSeek’s breakthrough in training efficiency also means we should soon expect to see a large number of local, specialized ‘wrappers’ — apps built on top of DeepSeek R1 engine — which will each introduce their own privacy risks, and which could each be misused if they fell into the wrong hands,” added Ryan Fedasiuk, director of US AI governance at The Future Society, an AI policy nonprofit. 

Energy efficiency claims

Some analysts note that DeepSeek's lower-compute approach is more energy efficient than that of US AI giants. 

“DeepSeek’s new AI model likely does use less energy to train and run than larger competitors’ models,” said Slattery. “However, I doubt this marks the start of a long-term trend in lower energy consumption. AI’s power stems from data, algorithms, and compute — which rely on ever-improving chips. When developers have previously found ways to be more efficient, they have typically reinvested those gains into making even bigger, more powerful models, rather than reducing overall energy usage.”

“DeepSeek isn’t the only AI company that has made extraordinary gains in computational efficiency. In recent months, U.S.-based Anthropic and Google Gemini have boasted similar performance improvements,” Fedasiuk said. 

Also: $450 and 19 hours is all it takes to rival OpenAI’s o1-preview

“DeepSeek’s achievements are remarkable in that they seem to have independently engineered breakthroughs that promise to make large language models much more efficient and less expensive, sooner than many industry professionals were expecting — but in a field as dynamic as AI, it’s hard to predict just how long the company will be able to bask in the limelight.” 

How will DeepSeek affect the AI industry?  

R1’s success highlights a sea change in AI that could empower smaller labs and researchers to create competitive models and diversify the options. For example, organizations without the funding or staff of OpenAI can download R1 and fine-tune it to compete with models like o1. Just before R1’s release, researchers at UC Berkeley created an open-source model on par with o1-preview, an early version of o1, in just 19 hours and for roughly $450. 

Given how exorbitant AI investment has become, many experts speculate that this development could burst the AI bubble (the stock market certainly panicked). Some see DeepSeek's success as debunking the idea that cutting-edge development requires massive models and spending. It also casts Stargate, the $500 billion infrastructure initiative spearheaded by several AI giants, in a new light, prompting speculation about whether competitive AI requires the energy and scale of the initiative's proposed data centers. 

DeepSeek's ascent comes at a critical time for Chinese-American tech relations, just days after the long-fought TikTok ban went into partial effect. Ironically, DeepSeek lays out in plain language the very security concerns that the US struggled to prove about TikTok during its prolonged effort to enact the ban. The US Navy banned the use of DeepSeek last week.


