Translation software enables efficient storage of large amounts of data in DNA


ADS Codex converts binary data into nucleotide sequences that can be written into molecules as files for later retrieval, with potential cost savings and compact “cold storage.”

In support of a major collaborative project to store large amounts of data in DNA molecules, a team led by Los Alamos National Laboratory has developed a key enabling technology that converts digital binary files into the four-letter genetic alphabet required for molecular storage.

“Our software, the Adaptive DNA Storage Codec (ADS Codex), translates data files from what a computer understands into what biology understands,” said Latchesar Ionkov, a computer scientist at Los Alamos and a senior researcher on the project. “It’s like translating from English to Chinese, only harder.”

This work is a key part of the Intelligence Advanced Research Projects Activity (IARPA) Molecular Information Storage (MIST) program, which aims to bring cheaper, larger, and longer-lasting storage to big-data operations in government and the private sector. MIST’s short-term goal is to write 1 terabyte (1 trillion bytes) and read 10 terabytes within 24 hours for $1,000. Other teams are refining the writing (DNA synthesis) and retrieval (DNA sequencing) components of the initiative, while Los Alamos is working on coding and decoding.

Bradley Settlemyer, a storage systems researcher and systems programmer specializing in high-performance computing at Los Alamos, said: “DNA storage could disrupt the way we think about archival storage, because its data retention is very long and its data density is very high. You could store all of YouTube in your refrigerator instead of in acres and acres of data centers. But researchers first have to clear some difficult technical hurdles related to integrating different technologies.”

Not lost in translation

Compared to the traditional long-term storage method, which uses pizza-sized reels of tape, DNA storage is potentially cheaper, far more physically compact, more energy efficient, and longer lasting. DNA survives for hundreds of years and requires no maintenance. Files stored in DNA can also be copied very easily at very low cost.

The storage density of DNA is astounding. Consider this: humanity will produce an estimated 33 zettabytes of data by 2025. That is 33 followed by 21 zeros. All that information would fit comfortably in a ping-pong ball. The Library of Congress holds about 74 terabytes, or 74 trillion bytes, of information. Six thousand such libraries would fit in a DNA archive the size of a poppy seed. Facebook’s 300 petabytes (300,000 terabytes) could be stored in half a poppy seed.
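The comparisons above can be cross-checked with a little arithmetic. The sketch below assumes decimal (SI) units and uses only the figures quoted in this article; the variable names are illustrative. It shows that the two poppy-seed comparisons imply capacities of the same order of magnitude.

```python
# Decimal (SI) storage units, in bytes. Using SI units is an assumption.
TB = 10**12
PB = 10**15
ZB = 10**21

library_of_congress = 74 * TB   # figure quoted in the article
facebook = 300 * PB             # figure quoted in the article
humanity_2025 = 33 * ZB         # figure quoted in the article

# Capacity of one poppy-seed-sized archive implied by each comparison.
seed_from_libraries = 6_000 * library_of_congress  # 6,000 libraries per seed
seed_from_facebook = 2 * facebook                  # 300 PB fills half a seed

print(seed_from_libraries // PB)  # 444 (petabytes per seed)
print(seed_from_facebook // PB)   # 600 (petabytes per seed)

# Poppy-seed-sized archives needed for 33 zettabytes, using the lower estimate.
print(humanity_2025 // seed_from_libraries)  # 74324
```

The two estimates differ by less than a factor of two, which is reasonable for rounded, popular-science comparisons.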

Encoding a binary file into a molecule is done by DNA synthesis. Synthesis, a fairly well-understood technology, organizes the building blocks of DNA into various arrangements, which are represented by sequences of the letters A, C, G, and T. These letters are the basis of all DNA code and provide the instructions for building every living thing on Earth.

The Los Alamos team’s ADS Codex specifies exactly how to translate binary data (all 0s and 1s) into sequences built from the four letters A, C, G, and T. The Codex also handles decoding back into binary. DNA can be synthesized in several ways, and ADS Codex can accommodate all of them. The Los Alamos team has completed version 1.0 of ADS Codex and plans to use it in November 2021 to evaluate the storage and retrieval systems developed by other MIST teams.
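To make the idea of translating 0s and 1s into A, C, G, and T concrete, here is a minimal sketch of the simplest possible mapping, in which every pair of bits becomes one nucleotide. This is an assumption for illustration only: the article does not describe ADS Codex's actual code tables, and the function names `encode` and `decode` are hypothetical.

```python
# Hypothetical 2-bits-per-nucleotide mapping. This is NOT ADS Codex's scheme,
# only the most basic way to express binary data in a four-letter alphabet.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of nucleotide letters."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a string of nucleotide letters back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

A practical codec would add more on top of such a mapping, for example the error detection discussed later in this article.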

Unfortunately, DNA synthesis can make coding mistakes, so ADS Codex addresses two major obstacles to creating DNA data files.

First, the error rate when writing to molecular storage is very high compared to traditional digital systems, so the team had to come up with a new error correction strategy. Second, errors in DNA storage come from different sources than errors in the digital world, which makes them harder to correct.

“On a digital hard disk, binary errors occur when a 0 flips to a 1 or vice versa, but with DNA there are additional problems from insertion and deletion errors,” said Ionkov. “You’re writing A, C, G, and T, but sometimes you try to write A and nothing appears, so the sequence of letters shifts to the left, or it writes AAA. Normal error correction codes don’t work well with that.”

ADS Codex adds extra information, called error detection codes, that can be used to validate the data. When the software converts the data back to binary, it tests whether the codes match. If they do not, ACOMA tries removing or adding nucleotides until the validation succeeds.
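The validate-then-repair loop described above can be sketched as a brute-force search. The example below is an illustration under stated assumptions, not the project's algorithm: it uses a CRC-32 from Python's `zlib` module as the error detection code, writes it as 16 nucleotides at the end of the strand, and handles only a single inserted or deleted nucleotide. The names `check_code`, `protect`, `is_valid`, and `repair` are hypothetical.

```python
import zlib
from typing import Iterator, Optional

BASES = "ACGT"
CHECK_LEN = 16  # a 32-bit CRC written at 2 bits per nucleotide (assumption)


def check_code(payload: str) -> str:
    """Express the CRC-32 of the payload as 16 nucleotide letters."""
    crc = zlib.crc32(payload.encode("ascii"))
    return "".join(BASES[(crc >> shift) & 0b11] for shift in range(30, -2, -2))


def protect(payload: str) -> str:
    """Append the error detection code to the payload."""
    return payload + check_code(payload)


def is_valid(strand: str) -> bool:
    """Test whether the trailing code matches the rest of the strand."""
    payload, code = strand[:-CHECK_LEN], strand[-CHECK_LEN:]
    return len(strand) > CHECK_LEN and check_code(payload) == code


def candidates(strand: str) -> Iterator[str]:
    """Yield the strand, then every single-deletion and single-insertion variant."""
    yield strand
    for i in range(len(strand)):
        yield strand[:i] + strand[i + 1:]             # remove one nucleotide
    for i in range(len(strand) + 1):
        for base in BASES:
            yield strand[:i] + base + strand[i:]      # add one nucleotide


def repair(strand: str) -> Optional[str]:
    """Return the first candidate that validates, or None if none does."""
    for candidate in candidates(strand):
        if is_valid(candidate):
            return candidate
    return None


original = protect("ACGTTGCAAC")
damaged = original[:3] + original[4:]  # simulate a deletion during synthesis
print(repair(damaged) == original)
```

Trying every possible edit grows expensive as strands get longer or errors multiply, so a production decoder would need a smarter search; this sketch only shows why a detection code makes insertion and deletion errors recoverable at all.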

Smart scale-up

Today’s largest data centers are housed in large warehouses, with storage at the exabyte scale: a million trillion bytes or more. Building, powering, and operating this type of digital data center can cost billions of dollars, so it may not be the best option as the need for data storage continues to grow exponentially.

Long-term storage on cheaper media is important for national security missions such as that of Los Alamos. “Los Alamos has some of the oldest digital-only data and some of the largest data stores, dating back to the 1940s,” says Settlemyer. “It’s still of tremendous value. Because we keep data forever, we’ve been at the tip of the spear for a long time when it comes to finding a cold-storage solution.”

Settlemyer said DNA storage has the potential to be a disruptive technology because it cuts across fields that are ripe for innovation. The MIST project is inspiring a new coalition among legacy storage vendors that manufacture tape, DNA synthesis companies, DNA sequencing companies, and high-performance computing organizations like Los Alamos, which are pushing computers to an unprecedented scale of science-based simulation that yields enormous amounts of data to analyze.

Dig deeper into DNA

When most people think of DNA, they think of life, not computers. However, DNA is itself a four-letter code that conveys information about living things. DNA molecules are made up of four bases or nucleotides, adenine (A), thymine (T), guanine (G), and cytosine (C), each identified by a letter.

These bases are wound into twisted chains (the familiar double helix) to form the molecule. Arranging these letters into sequences creates a code that tells an organism how to form. The complete set of DNA molecules makes up the genome, the blueprint of your body.

By synthesizing DNA molecules from scratch, researchers have found that they can specify, or write, long strings of the letters A, C, G, and T and then read those sequences back. This process is analogous to how a computer uses 0s and 1s to store information. Although this method has proven to work, reading and writing DNA-encoded files currently takes a long time, Ionkov said.

“Adding a single nucleotide to DNA is very slow. It takes a minute,” says Ionkov. “Imagine writing a file to a hard drive taking more than a decade. So that problem is solved by going massively parallel. You write tens of millions of molecules at the same time to speed it up.”
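A back-of-the-envelope calculation shows why parallel writing matters. The sketch below takes the one-minute-per-nucleotide figure from the quote and assumes, purely for illustration, that each nucleotide carries 2 bits and that work divides evenly across molecules with no overhead. The function names are hypothetical.

```python
MINUTES_PER_NUCLEOTIDE = 1         # figure from the quote above
BITS_PER_NUCLEOTIDE = 2            # assumption: idealized encoding density
MINUTES_PER_YEAR = 60 * 24 * 365


def nucleotides_needed(num_bytes: int) -> int:
    """Number of nucleotides required to hold num_bytes at the assumed density."""
    return num_bytes * 8 // BITS_PER_NUCLEOTIDE


def serial_write_years(num_bytes: int) -> float:
    """Years needed to write the data one nucleotide at a time on one molecule."""
    return nucleotides_needed(num_bytes) * MINUTES_PER_NUCLEOTIDE / MINUTES_PER_YEAR


def parallel_write_minutes(num_bytes: int, molecules: int) -> int:
    """Minutes needed when the data is split evenly across many molecules."""
    per_molecule = -(-nucleotides_needed(num_bytes) // molecules)  # ceiling division
    return per_molecule * MINUTES_PER_NUCLEOTIDE


# A 2-megabyte file written serially: roughly 15 years.
print(round(serial_write_years(2_000_000), 1))         # 15.2

# One gigabyte spread across 10 million molecules: 400 minutes.
print(parallel_write_minutes(10**9, 10_000_000))       # 400
```

Under these assumptions even a small file takes more than a decade to write serially, while tens of millions of molecules written at once bring a gigabyte down to a matter of hours.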

While different companies are working on different synthesis methods to address this issue, ADS Codex can adapt to any approach.

Funding for ADS Codex was provided by the Intelligence Advanced Research Projects Activity (IARPA), a research agency within the Office of the Director of National Intelligence.
