Harnessing eukaryotic retroelement proteins for transgene insertion into human safe-harbor loci


Sequences

Construct sequences used in this work are provided in Supplementary Table 1a. Constructs for producing R2 proteins and GFP transgene template RNA will be available from Addgene. Codon-optimized ORFs and other DNA modules were purchased from GenScript. The ZoAl-RTD mutation is DD644-645AA, EN-dead double mutation is D1041A and D1054A, and EN-low mutations are H1006A, Y1077A and R1103A numbered from a chosen start site for synthetic ZoAl ORF. The TaGu-RTD mutation is DD660-661AA, EN-dead double mutation is D1057A and D1070A, and EN-low mutations are H1022A, Y1093A and R1119A numbered from a chosen start site for TaGu ORF. PCR product sequences used for transcription of short template RNAs are listed in Supplementary Table 1a. Oligonucleotide sequences are listed in Supplementary Table 1b. Design of the minimal polyA signal (minPA) used insights from previous work70. The CMV immediate–early promoter has a single base-pair substitution intended to reduce transcriptional silencing by DNA methylation, based on previous work71. The SV40 immediate–early promoter was also modified, including a more optimal TATA box designed using insights from previous work72.

Cell culture

RPE and ARPE-19 cells were grown in DMEM/F12 (Gibco) supplemented with 10% fetal bovine serum (FBS) (Seradigm) and 100 μg ml−1 Primocin (InvivoGen). HEK293T, HeLa, IMR-90, MRC-5 and C2C12 cells were grown in DMEM (Gibco) supplemented with 10% FBS. Vero cells were cultivated in DMEM supplemented with 10% FBS and 1% nonessential amino acids (Gibco). All cells were cultured at 37 °C under 5% CO2 and tested for mycoplasma contamination. Human cell lines were validated by short tandem repeat profiling (Promega, catalog no. B9510).

Protein expression and purification for biochemical assays

HEK293T cells were transiently transfected with pcDNA3.1 plasmids encoding proteins N-terminally tagged with a single FLAG peptide, unless stated otherwise. Cells at 80% confluency in a 10 cm plate were reverse transfected with 12 μg DNA using Lipofectamine 3000. After 16–24 h, cells were trypsinized, resuspended in 5 ml of media and pelleted at roughly 2,000g for 3 min in 15 ml conical tubes. Pelleted cells were washed by resuspension in 0.5 ml of chilled PBS containing 1 mM phenylmethylsulfonyl fluoride (PMSF), transferred to a 1.5 ml tube and repelleted at roughly 2,000g for 1 min at 4 °C. Cell pellets were resuspended in 4× pellet volume of 1× HLB (20 mM HEPES pH 8.0, 2 mM MgCl2, 200 μM EGTA, 10% glycerol, 1 mM DTT, 0.2% protease inhibitor cocktail (Sigma, catalog no. P8340), 1 mM PMSF) and set on ice for 5 min. Cells were then lysed by three cycles of snap freezing in liquid nitrogen and thawing in a room temperature water bath. Samples were then brought to 400 mM NaCl, gently vortexed and placed on ice for an additional 5 min. Lysed cells were spun at 17,000g for 5 min at 4 °C. The supernatant was collected and the concentration of NaCl lowered to 200 mM and NP-40 added by the addition of an equal volume of 1× HLB containing 0.2% NP-40. Samples were vortexed gently and spun at 17,000g for 10 min at 4 °C to clarify the supernatant.
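The salt and detergent adjustments in this lysis procedure are volume-weighted dilutions. As a minimal sketch (the helper name `mix_concentration` is hypothetical and not part of the original protocol), mixing a 400 mM NaCl lysate with an equal volume of 1× HLB, which contains no NaCl, supplemented with 0.2% NP-40 gives 200 mM NaCl and 0.1% NP-40, the same composition as the immunoprecipitation buffer:

```python
def mix_concentration(parts):
    """Return the final concentration after combining solutions.

    parts: list of (volume, concentration) pairs in consistent units.
    """
    total_volume = sum(volume for volume, _ in parts)
    return sum(volume * conc for volume, conc in parts) / total_volume


# Equal volumes of lysate (400 mM NaCl, no NP-40)
# and 1x HLB (no NaCl, 0.2% NP-40).
nacl_mM = mix_concentration([(1.0, 400.0), (1.0, 0.0)])
np40_percent = mix_concentration([(1.0, 0.0), (1.0, 0.2)])
print(nacl_mM, np40_percent)
```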

For affinity purification, 20 μl of FLAG resin per sample (Sigma, catalog no. A2220) was equilibrated and blocked with 1 μg μl−1 molecular grade BSA and 1 μg μl−1 yeast tRNA (Calbiochem, catalog no. 55714) in 200 μl of immunoprecipitation buffer (1× HLB, 200 mM NaCl, 0.1% NP-40) for 30 min at 4 °C. Blocked resin was washed 2× with immunoprecipitation buffer and resuspended in 100 μl of immunoprecipitation buffer per sample, which was added to 700 μl of clarified cell lysate. Binding reactions were rotated at 4 °C for 2 h and then washed with immunoprecipitation buffer four times (two quick washes, two with 5 min of rotation at 4 °C). All buffer was removed with a 30G needle before bound resin was resuspended in 40 μl of immunoprecipitation buffer with 50 ng μl−1 3× FLAG peptide (Sigma, catalog no. F4799) and incubated at room temperature for 1 h. The slurry was aliquoted, snap frozen and stored at −80 °C. Immunoblots used 0.45 μm nitrocellulose membrane (Bio-Rad, catalog no. 1620115) blocked in TBST (10 mM Tris-Cl pH 7.5, 150 mM NaCl, 0.1% Tween 20, 0.02% sodium azide) with 5% BSA and probed in the same buffer with anti-FLAG antibody (Sigma, catalog no. F1804, 1:3,000) and then Alexa Fluor 680 antimouse secondary (Thermo Fisher, catalog no. A21057, 1:2,000). Detection was by LI-COR Odyssey. Coomassie staining of affinity-purified proteins resolved by SDS–PAGE used recombinant MBP-BoMoC as a protein standard73.

IVT

Template RNAs were transcribed with the HiScribe T7 Kit (NEB, catalog no. E2040S) according to the manufacturer’s instructions. RNA templates for biochemical assays of TPRT and for A-tract length change were made using 1 μg of PCR-amplified transcription template per 20 μl of reaction. Transgene template RNAs and mRNAs for cellular transfection were made using 1 μg per 20 μl plasmid fully linearized with Bbs I (NEB) for 4 h at 37 °C and then purified with PCR purification kit (QIAGEN, catalog no. 28106). Templates with TiGu 3′ UTR were instead digested using Sap I (NEB), due to an internal Bbs I site, and gel-purified with QIAEX II Gel Extraction Kit (QIAGEN, catalog no. 20021). R2 protein mRNAs were made with AG Clean cap (TriLink, catalog no. N-7113) per the manufacturer’s protocol74 using UTR sequences and DNA-templated, linker (L)-containing poly-adenosine tail A30L10A70 from the BioNTech COVID-19 vaccine mRNA75. Canonical ribonucleotides were purchased from NEB and uridine analogs were purchased from TriLink or APExBIO. Transcription reactions were incubated at 37 °C for 2 h, followed by addition of 1 μl of DNase RQ1 (Promega, catalog no. M610A) or 2 μl RNase-free DNase I (Thermo Fisher, catalog no. FEREN0521). Product RNA was purified by desalting with a quick-spin column (Roche, catalog no. 11814397001) or illustra ProbeQuant G-50 Micro Column (Cytiva, catalog no. 28903408) followed by phenol–chloroform–isoamyl alcohol (PCI; Thermo Fisher, catalog no. BP1752I-100) purification and precipitation with final concentration of 2.5 M LiCl or with final concentration 0.3 M sodium acetate (pH 5.2) and 3 volumes of 100% ethanol. After washing with 70% ethanol 2–3 times, RNAs were resuspended in 1 mM sodium citrate (pH 6.5) or in water for RNAs used only for biochemical assays. Concentration was determined by NanoDrop and integrity verified by denaturing urea-PAGE with direct staining using SYBR Gold (Thermo Fisher, catalog no. S11494).

TPRT assays with affinity-purified protein

The primer strand of target-site duplex was 5′ radiolabeled with 32P γ-ATP using T4 PNK (NEB, catalog no. M0201L). Unlabeled nucleotides were removed by spin column (Roche, catalog no. 11814397001 or Cytiva, catalog no. 27-5325-01). Complementary strands were annealed by heating to 95 °C and cooling by 1 °C per min. Unless indicated otherwise, TPRT template RNA was GeFo 3′ UTR_R4A22 with unmodified uridine. TPRT reactions were assembled in 20 μl with final concentrations of 25 mM Tris-HCl pH 7.5, 75 mM KCl, 5 mM MgCl2, 10 mM DTT, 2% PEG-6K, 5 nM target-site duplex, 0.6 μM template RNA, 0.5 mM dNTPs and approximately 10 nM R2 protein in immunoprecipitation elution buffer and then incubated at 37 °C for 30 min before heat inactivation at 70 °C for 5 min and dilution with 80 μl of stop solution (50 mM Tris-HCl pH 7.5, 20 mM EDTA, 0.2% SDS) spiked with 5′ radiolabeled 100-nt loading control oligonucleotide. Nucleic acid was purified by PCI extraction and ethanol precipitated in a dry ice ethanol bath. Samples were then pelleted at roughly 18,000g for 20 min at room temperature and pellets washed with 70% ethanol, resuspended in 5 μl of water and supplemented with 5 μl of formamide loading dye (95% deionized formamide, 0.025% w/v bromophenol blue, 0.025% w/v xylene cyanol, 5 mM EDTA pH 8.0). The sample was heated to 95 °C for 3 min and then placed on ice before loading half of the sample on a 9% urea-PAGE gel. After electrophoresis the gel was dried, exposed to a phosphoimaging screen and imaged by Typhoon Trio (Cytiva).

PRINT by delivery of protein-encoding plasmid and template RNA

HEK293T cells were plated at 2.5 million cells per well in six-well plates and reverse-transfected with 1 µg plasmid using Lipofectamine 3000 at 1/2 mass/volume ratio as per the manufacturer’s instructions. On the next day, cells were split at a 1/2 ratio, keeping half. On the subsequent day, each well was reverse-transfected with 2 µg template RNA using Lipofectamine MessengerMAX (Thermo Fisher, catalog no. LMRNA015) at 1/2 mass/volume ratio as per the manufacturer’s instructions. Cells were collected 1 d after RNA transfection and the cell pellets were stored at −80 °C after snap freezing in liquid nitrogen.

PRINT by 2-RNA delivery

RPE cells in log-phase growth at 50% confluency were replated at 0.75–1 million cells per well in six-well plates. Cells were reverse-transfected with mRNA and template RNA using Lipofectamine MessengerMAX at 1/2 mass/volume ratio as per the manufacturer’s instructions. For some experiments, ~0.03 µg of 5moU mCherry mRNA (TriLink, catalog no. L-7203) per 1 µg total RNA mixture was used as a spike-in transfection control. Cells were collected 20–24 h (1 d) after transfection unless noted otherwise. The same transfection protocol was used for other cell lines except with different cell density per well of a six-well plate: ARPE-19, HeLa, IMR-90, MRC-5 and Vero cells were plated at 1 million per well; C2C12 was plated at 0.5 million per well and HEK293T was plated at 2 million per well. Unless noted otherwise, 2.5 µg RNA was transfected per well of six-well plate and mRNA/template molar ratio was 1/3. For transfections followed by sorting to single cells or graded GFP intensity cell pools, RNA dose ranged from 1–1.5 µg with 1/2 or 1/3 MessengerMAX.
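Because the dose is given as a total RNA mass but the mRNA/template proportion is a molar ratio, the two masses depend on the RNA lengths. The following is a minimal sketch of that conversion, assuming the same average mass per nucleotide for both RNAs; the function name and the example lengths (4,000 nt and 2,000 nt) are hypothetical placeholders, and actual lengths follow from the sequences in Supplementary Table 1a.

```python
def split_rna_mass(total_ug, mrna_nt, template_nt, mrna_parts=1, template_parts=3):
    """Split a total RNA mass into mRNA and template masses for a molar ratio.

    Mass is taken as proportional to (moles x length in nt), assuming the
    same average mass per nucleotide for both RNAs.
    """
    mrna_weight = mrna_parts * mrna_nt
    template_weight = template_parts * template_nt
    mrna_ug = total_ug * mrna_weight / (mrna_weight + template_weight)
    return mrna_ug, total_ug - mrna_ug


# Hypothetical lengths, for illustration only.
mrna_ug, template_ug = split_rna_mass(2.5, mrna_nt=4000, template_nt=2000)
print(mrna_ug, template_ug)
```

With these placeholder lengths, a 1/3 molar ratio at 2.5 µg total corresponds to 1.0 µg mRNA and 1.5 µg template RNA.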

Sequences for mRNA and template RNA transcription are provided in Supplementary Table 1a. Unless noted otherwise, the mRNAs encoding R2 proteins had 100% uridine substitution with 1mψ or 5moU. Protein expression used the FLAG-tagged ORF unless noted otherwise, except ZoAl-RTD was untagged. The template RNA 5′ module was either a minimal ribozyme (TCARZ) or a slightly longer 5′ UTR region (TCA5). In Extended Data Fig. 2, template RNAs were unmodified uridine with TCA5, the indicated 3′ UTR, and R4A22. In Extended Data Fig. 3d, template RNAs were unmodified uridine with TCA5, GeFo 3′ UTR, R4 and the indicated A-tract. Other template RNA transcripts were unmodified uridine with hairpinleader_TCA5_CBh_ORF_SV40PA_GeFo3′ UTR_R4A22 (Fig. 2c–g and Extended Data Fig. 3a,b) or pseudouridine with rRNAleader_TCARZ_CMV_ORF_minPA_GeFo3′ UTR_R4A22 (Figs. 2h–k and 3; CMV-promoter transgene WSG; Extended Data Figs. 3c,f,g,i–n, 4–6 and 9b). In Extended Data Fig. 3e,h and CBh promoter transgene WSG, template RNA was hairpinleader_TCARZ_CBh_ORF_SV40PA_GeFo3′ UTR_R4A22. In Extended Data Fig. 3f,g, all template RNAs had the TCARZ 5′ module for consistency; the CBh promoter had SV40PA and the CMV and SV40 promoters had minPA.

Flow cytometry and cell sorting

Cells were trypsinized to collect, and trypsin was inactivated by addition of the cell-appropriate medium with 5% FBS. Cell samples were analyzed by Attune NxT Flow Cytometer (Thermo Fisher) under the voltage setting of FSC 70V, SSC 280V, BL1 250V (for GFP) and YL2 250V (for mCherry). Cell sorting was done on Sony Sorter LE-SH800 equipped with 488 and 561 nm lasers using the 130 µm chip under ultra-purity sorting mode. Data analysis was performed in FlowJo (v.10.8.1). When gating for GFP+ or mCherry+ population, cells transfected with only template RNA or template RNA and ZoAl-RTD were used as negative controls. The %GFP+ was calculated by subtracting template-alone %GFP+ from the parallel 2-RNA transfection %GFP+. Median GFP intensity was determined using only the GFP+ cells in a population. For overlaid histograms of GFP intensity profiles, ‘Normalized to mode’ was used to scale the y axis for better cross-comparison. Error bars are from three technical replicates. Every assay had independent experimental replicates. Gating strategy for flow cytometry and cell sorting is visualized in Extended Data Fig. 10.
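The background correction of %GFP+ can be sketched as follows. This is an illustrative example only: the replicate values are invented, subtracting the mean of the template-alone replicates is one possible reading of the procedure, and reporting the sample standard deviation is an assumption (the text states only that error bars come from three technical replicates).

```python
from statistics import mean, stdev


def corrected_gfp_percent(two_rna, template_alone):
    """Subtract template-alone %GFP+ background from 2-RNA %GFP+ replicates.

    Returns (mean, sample standard deviation) of the corrected values.
    """
    background = mean(template_alone)
    corrected = [value - background for value in two_rna]
    return mean(corrected), stdev(corrected)


# Hypothetical technical triplicates (%GFP+).
avg, sd = corrected_gfp_percent([12.0, 13.0, 14.0], [0.4, 0.5, 0.6])
print(avg, sd)
```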

To make clonal cell lines, transfections used an RNA dose of 1–1.5 µg with 1/2 or 1/3 MessengerMAX. Single GFP+ cells were sorted to 96-well microtiter plates 1 d after transfection. Cells were allowed to proliferate for approximately 3 weeks before screening for GFP expression: 24% of expanded cell lines for ZoAl-WT were GFP+, whereas 94% of expanded cell lines for ZoAl-ENT were GFP+. At 6–7 weeks postsorting, cells were used for genotyping and ddPCR. For GFP intensity stability measurements, the time points were postsorting roughly 7 and 15 weeks of proliferation. Cells for the early time point were frozen at around 5 weeks and then returned to culture for 2 weeks before the 15 week time point to be able to measure GFP intensity on the same day.

gDNA purification and junction PCR

Frozen cell pellets were thawed on ice and resuspended in 200 µl of RIPA lysis buffer (150 mM NaCl, 50 mM Tris-HCl pH 7.5, 1 mM EDTA, 1% Tx-100, 0.5% sodium deoxycholate, 0.1% SDS, 1 mM DTT). Each 200 µl of lysate was treated with 10 µl of 10 mg ml−1 RNase A (Thermo Fisher, catalog no. FEREN0531) at 37 °C for 30–60 min, followed by incubation with 5 µl of 20 mg ml−1 Proteinase K (Thermo Fisher, catalog no. FEREO0491) at 50 °C overnight. gDNA was then isolated by extraction with PCI and ethanol precipitation. After centrifugation, the aqueous layer was transferred to a fresh tube containing 50 µg glycogen, to which 1/10 volume 5 M NaCl and 3 vol 100% ethanol were added. gDNA was precipitated at −20 °C for at least 30 min. After a 30 min spin, gDNA pellets were washed 2–3 times with 75% ethanol, air-dried and resuspended in TE (10 mM Tris-HCl pH 8.0, 1 mM EDTA). gDNA prepared for WGS was instead dissolved in nuclease-free water. For PCR, 100–250 ng gDNA was used in a 25 µl reaction with Q5 DNA polymerase (NEB). PCR primer sequences are listed in Supplementary Table 1b. PCR cycling was as follows: 98 °C for 3 min; five cycles of 98 °C for 10 s, 65 °C for 30 s and 72 °C for 40 s per 1 kb, with the annealing temperature decreasing by 1 °C per cycle; 25 cycles of 98 °C for 10 s, 60 °C for 30 s and 72 °C for 40 s per 1 kb; and a final 72 °C for 20 s. PCR products were analyzed on 1–2% agarose gels containing ethidium bromide and imaged using the Bio-Rad gel doc XR+ imaging system.
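The touchdown cycling conditions can be written out explicitly. The sketch below is one reading of the program, assuming the five touchdown cycles anneal at 65, 64, 63, 62 and 61 °C before the 25 cycles at 60 °C; the function name and the 1.5 kb example amplicon are hypothetical.

```python
def touchdown_program(amplicon_kb):
    """Return the cycling program as a list of (temperature_C, seconds) steps."""
    extension_s = round(40 * amplicon_kb)  # 40 s per 1 kb at 72 C
    steps = [(98, 180)]  # initial denaturation, 3 min
    # Five touchdown cycles (65 down to 61 C), then 25 cycles at 60 C.
    annealing_temps = [65 - i for i in range(5)] + [60] * 25
    for anneal_c in annealing_temps:
        steps += [(98, 10), (anneal_c, 30), (72, extension_s)]
    steps.append((72, 20))  # final 72 C step
    return steps


program = touchdown_program(1.5)  # hypothetical 1.5 kb amplicon
print(len(program), program[:4])
```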

Telomerase activity assays

One day after transfecting ARPE-19 cells with mRNA and RNA template, cells were collected for protein extraction. RNA dose was 1.5 μg with 6 μl of MessengerMAX. Cell extract was prepared by hypotonic freeze–thaw lysis as described above, except with a final concentration of 150 mM NaCl. Quantitative telomeric repeat amplification protocol was performed using 2 µl of approximately 2 mg ml−1 cell extract by standard protocol76 with iTaq universal SYBR green Supermix (Bio-Rad) and a CFX96 Touch Real-Time PCR machine (Bio-Rad). Radiolabeled-nucleotide telomeric repeat amplification protocol assays were performed using 5 µl of 3-, 9- and 27-fold extract dilutions using standard protocol77 of primer extension followed by PCR, with imaging by Typhoon Trio (Cytiva).

DNA damage assays

For relevant samples, drug treatment began 12 h before transfection. Medium was not changed and no additional drug was added at later time points. At indicated time points after 2-RNA delivery, cells were washed in PBS and trypsinized using minimal amounts of trypsin. Cells were resuspended in full-serum medium and allowed to recover for 20 min at 37 °C and 5% CO2. Cells were pelleted and washed in ice-cold PBS, and then resuspended in ice-cold Annexin binding buffer (10 mM HEPES pH 7.4, 140 mM NaCl, 2.5 mM CaCl2). A fraction of cells was subjected to Annexin V-AF594 (Invitrogen, catalog no. A13202) and SYTOX Blue (Thermo Fisher, catalog no. S34587) staining at room temperature for 15 min and then diluted for flow cytometry analysis. Collected data were analyzed according to R. Duggan’s method from the University of Chicago Flow Cytometry Core (https://voices.uchicago.edu/ucflow/2012/07/08/my-3-step-approach-to-gating-annexin-v-data-appropriately/). The double-negative fraction was gated for debris by very low forward and side scatter, well resolved from the live cell double-negative population.

For immunoblot analysis, 6 μg total of ZoAl mRNA and template RNA was transfected per 10 cm dish of RPE cells. At the indicated time points post-transfection, cells were washed, trypsinized and lysed as described above for protein purification. The supernatant was collected, and samples were normalized by total protein using Protein Assay Dye (Bio-Rad, catalog no. 5000006). Next, 60 μg of total protein was loaded in each lane of precast 4–15% TGX gels (Bio-Rad, catalog no. 4561084). Protein was transferred to 0.2 μm nitrocellulose membrane (Bio-Rad, catalog no. 1620147), blocked in TBST (10 mM Tris-Cl pH 7.5, 150 mM NaCl, 0.1% Tween 20, 0.02% sodium azide) with 5% BSA and probed in the same buffer with rabbit anti-phospho-P53 (Ser15) (Invitrogen, catalog no. 14H61L24, 1:1,000), mouse anti-tubulin (Abcam, catalog no. ab44928, 1:1,000), or mouse anti-phospho-histone H2A.X (Ser139) (Invitrogen, catalog no. 6T2311, 1:1,000), followed by appropriate secondary, either Alexa Fluor 680 goat anti-rabbit (Invitrogen, catalog no. A21109, 1:2,000) or Alexa Fluor Plus 800 goat anti-mouse (Invitrogen, catalog no. A32730, 1:2,000). Detection was by LI-COR Odyssey. Because p-P53 and tubulin migrate similarly in SDS–PAGE, p-P53 was probed first and then tubulin.

ddPCR

gDNA was digested overnight with Bam HI and Xmn I (NEB). Multiplex 24 µl ddPCR reactions were prepared by mixing 12 µl of ddPCR supermix (no dUTP; Bio-Rad, catalog no. 1863024), forward and reverse primers for target and reference genes (IDT, 833 nM final concentration each), probes complementary to target and reference amplicons (IDT; FAM for target and HEX for reference, 250 nM final concentration each) and digested gDNA at 1–5 ng µl−1 final concentration. Oligonucleotide sequences are listed in Supplementary Table 1b. Reaction mix was transferred to a DG8 cartridge (Bio-Rad, catalog no. 1864007) along with 70 µl of droplet generation oil (Bio-Rad, catalog no. 1863005), and droplets were generated in a Bio-Rad QX200 Droplet Generator. Following droplet generation, 40 µl was transferred into a 96-well plate and heat-sealed with pierceable foil. The droplets were thermal-cycled under the manufacturer’s recommended conditions with an annealing and/or extension temperature of 56 °C and analyzed using QX Manager software with default settings.

RPP30 was used as the reference gene for all copy number analysis experiments. The copy number of RPP30 in each cell line was inferred using a panel of additional reference genes (ALB, MRTFB and RPPH1). We discovered that RPE and HeLa cells have an RPP30 copy number per genome of three, whereas ARPE-19, 293T, IMR-90, MRC-5 and monkey Vero cells have an RPP30 copy number of two. We were unable to determine RPP30 copy number in mouse C2C12 cells, so quantification assumed a copy number of two per genome. Primers to detect RPP30, ALB, MRTFB and RPPH1, and rDNA were adapted from sequences previously described78,79,80,81,82. Inferred transgene copy number was adjusted to an integer assuming slight under-replication of rDNA relative to reference genes in the asynchronous cell populations.
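The arithmetic behind a reference-normalized ddPCR copy number can be sketched as follows. This assumes the standard Poisson correction for droplet occupancy; it is not a description of the QX Manager implementation, the droplet counts are invented, and the function names are hypothetical. The integer adjustment for rDNA under-replication is not modeled here.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean copies per droplet from droplet counts."""
    return -math.log(1 - positive / total)


def copies_per_genome(target_pos, target_total, ref_pos, ref_total, ref_copies):
    """Target copies per genome, scaled by the reference gene copy number."""
    ratio = (copies_per_droplet(target_pos, target_total)
             / copies_per_droplet(ref_pos, ref_total))
    return ratio * ref_copies


# Hypothetical droplet counts; RPP30 copy number of 2 per genome.
estimate = copies_per_genome(1500, 15000, 3000, 15000, ref_copies=2)
print(round(estimate, 2))
```

For a cell line with three RPP30 copies per genome, such as RPE or HeLa, `ref_copies=3` would be used instead.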

Genome sequencing and analysis

Cells were collected 1 d post-transfection. Purified gDNA was fragmented to 400–500 bp by Covaris shearing as part of Illumina library construction and NovaSeq 6000 PE150 sequencing performed by QB3 genomics facilities at UC Berkeley. Bioinformatic analyses were performed on the Berkeley Research Computing Savio cluster with SLURM job scheduling or on an Apple M1 Max processor. PCR and optical duplicates were removed with BBMap v.38.97 (https://sourceforge.net/projects/bbmap/) and reads were trimmed for quality with Trimmomatic v.0.39 (ref. 83). Reads shorter than 36 bp or with an overall PHRED quality less than 30 were discarded. All alignments were performed with bwa mem v.0.7.17 using default parameters84. Paired reads were aligned to transgene sequence precisely inserted between flanking 840 bp tracts of rDNA. Unmapped mates or portions of reads exceeding 20 bp were aligned to a complete rDNA unit using a consensus rDNA scaffold (GenBank KY962518.1). Read portions remaining unaligned were then mapped to the T2T-CHM13v2.0 human genome reference85. Finally, still-unaligned portions of reads too short for alignment by bwa mem were aligned to the rDNA reference or transgene template sequence with approximate string matching using fuzzysearch (https://github.com/taleinat/fuzzysearch). The following reads were then discarded: mate pairs without both reads mapped and spurious transgene-aligned reads (for example, reads aligning better to the human genome than to the transgene). To detect contaminating genetic material from pooled sequencing, reads were mapped to a curated list of observed contaminants, including the SARS-CoV-2 genome; reads mapping to these nonhuman sequences were discarded.
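The read-level discard criterion can be illustrated with a short sketch. It is not the Trimmomatic implementation: it assumes that "overall PHRED quality" means the mean per-base score of a read and that qualities use the Phred+33 encoding; the function names are hypothetical.

```python
MIN_LENGTH = 36       # reads shorter than this are discarded
MIN_MEAN_PHRED = 30   # reads with a lower mean quality are discarded


def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def keep_read(sequence, quality_string):
    """Return True if the read passes the length and mean-quality filters."""
    if len(sequence) < MIN_LENGTH:
        return False
    return mean_phred(quality_string) >= MIN_MEAN_PHRED


print(keep_read("A" * 40, "I" * 40))  # 'I' encodes Q40
print(keep_read("A" * 40, "5" * 40))  # '5' encodes Q20
```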

On-target reads were defined as those with transgene sequence and downstream rDNA beginning within 3 bp of the target-site nick. Off-target reads were defined as those with transgene sequence and (1) rDNA sequence not at the target site, or (2) downstream sequence mapping elsewhere in the human genome. Loci of putative off-target insertions were aligned to the reference target site with T-Coffee on the EMBL-EBI webserver86. The base frequencies at each position across aligned candidate TaGu off-target insertion sites were tallied and depicted with visualization tools from DeepLIFT87. To determine the initiation site of TPRT within on-target reads, fuzzysearch was used to find the 3′ end of transgene sequence (query sequence TGTTCGG on top strand after second-strand synthesis) and downstream rDNA sequence (query sequence TAGCCAA) within the read. The intervening sequence was used to infer nicking and initiation of TPRT.
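Extraction of the sequence between the transgene 3′ end and the downstream rDNA can be sketched as below. For simplicity this uses exact string matching from the standard library rather than the approximate matching provided by fuzzysearch, so it would miss reads with sequencing errors in either query; the function name and example read are hypothetical.

```python
TRANSGENE_3P_QUERY = "TGTTCGG"      # 3' end of transgene sequence (top strand)
RDNA_DOWNSTREAM_QUERY = "TAGCCAA"   # downstream rDNA sequence


def intervening_sequence(read):
    """Return the sequence between the two queries, or None if either is absent."""
    hit = read.find(TRANSGENE_3P_QUERY)
    if hit == -1:
        return None
    start = hit + len(TRANSGENE_3P_QUERY)
    end = read.find(RDNA_DOWNSTREAM_QUERY, start)
    if end == -1:
        return None
    return read[start:end]


# Hypothetical read with four bases between the two query sequences.
example = "ACGT" + TRANSGENE_3P_QUERY + "AAAA" + RDNA_DOWNSTREAM_QUERY + "GG"
print(intervening_sequence(example))
```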

Determination of the rDNA position of 5′ junction formation used the join category of junctions because anneal junctions are not informative. The 5′ junction category snap-back reads were identified by transgene-adjacent sequence mapping to the opposite strand of the transgene or rDNA scaffold. The 5′ junction category ‘other’ contained upstream sequences mapping somewhere in the genome other than rDNA joined to a transgene 5′ end. If sequence upstream of a 5′ transgene junction did not map, it was not classified. Only a strict subset of these reads was reclassified to the 5′ junction category ‘extra’ template, if by manual evaluation NCBI BLAST revealed that (1) the sequence mapped unambiguously to a single transcript or class of transcripts, (2) the insertion had correct strandedness for reverse transcribing an RNA and (3) reverse transcription began near or at the 3′ end of the annotated RNA transcript. Reads in the 5′ junction category tandem insertion were defined by the presence of upstream sequence mapping to the 3′ end of a transgene and downstream sequence mapping to the 5′ end of a full-length or truncated template cDNA. Any 5′ junction transgene reads without a mapping portion were excluded. Internally gapped transgenes were identified by reads with upstream and downstream portions mapping noncontiguously to the same strand of the transgene reference. Microhomology at the junction was identified by comparing whether the last base on the upstream-aligned portion of the read matched the base on the reference sequence immediately before the downstream-aligned portion of the read. This procedure was repeated iteratively until the first nonmatching base was found. The same procedure was repeated for the other side of the junction, beginning with the first downstream-aligned base and the base on the reference sequence immediately after the upstream-aligned portion of the read. The sum of these two iterative matching procedures was considered the maximum possible microhomology.
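The two-sided iterative matching used to score microhomology can be sketched as follows. This is an illustrative implementation under stated assumptions: the junction is given as an index into the read, the aligned coordinates on each reference are already known, and all names and example sequences are hypothetical.

```python
def max_microhomology(read, junction, ref_up, up_end, ref_down, down_start):
    """Maximum possible microhomology at a junction.

    read[:junction] aligns to ref_up and ends just before ref_up[up_end];
    read[junction:] aligns to ref_down starting at ref_down[down_start].
    """
    # Walk backward from the last upstream-aligned read base, comparing with
    # reference bases immediately before the downstream-aligned portion.
    back = 0
    while (junction - 1 - back >= 0 and down_start - 1 - back >= 0
           and read[junction - 1 - back] == ref_down[down_start - 1 - back]):
        back += 1
    # Walk forward from the first downstream-aligned read base, comparing with
    # reference bases immediately after the upstream-aligned portion.
    fwd = 0
    while (junction + fwd < len(read) and up_end + fwd < len(ref_up)
           and read[junction + fwd] == ref_up[up_end + fwd]):
        fwd += 1
    return back + fwd


# Hypothetical junction: "GGGATC" from ref_up joined to "GGGG" from ref_down.
score = max_microhomology("GGGATCGGGG", 6, "GGGATCGAA", 6, "TTATCGGGG", 5)
print(score)
```

In this example the upstream walk matches three bases (ATC) and the downstream walk matches one base (G), giving a maximum possible microhomology of 4.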

Plasmid insertion assays

Target plasmid backbone was pRSF-1, which confers kanamycin resistance. The added rDNA target site was composed of rDNA sequence −43 to +21 relative to the initial nick. Template RNA was made with unmodified uridine and had TCARZ 5′ module, chloramphenicol acetyltransferase promoter and ORF, a termination signal for E. coli RNAP and 3′ module GeFo 3′ UTR_R4A22. RPE cells, 1 million per treatment, were reverse-transfected in six-well plates with 1.5 µg RNA at a 1/3 molar ratio mRNA/template and 1 µg target plasmid. RNAs and DNA were added together to Lipofectamine 3000; then that mixture was added to cells. Cells were collected 1 d post-transfection and plasmids were separated from chromosomal DNA largely as described88,89. Cells were washed twice with Dulbecco’s PBS (Thermo Fisher, catalog no. J67802) and then lysed in the dish by incubation with 400 µl of lysis buffer (10 mM Tris-HCl pH 8.0, 10 mM EDTA, 0.6% SDS) for 5 min at room temperature. Lysates were transferred into 1.5 ml tubes followed by addition of 1/4 volume 5 M NaCl and overnight incubation at 4 °C to precipitate gDNA. Lysates were then spun at 18,000g for 30 min and plasmid DNA in the supernatant was purified using PCI followed by a chloroform back-extraction and ethanol precipitation. Pellets were resuspended in 7 µl of nuclease-free water and 1 µl was electroporated into 20 µl of ElectroMAX DH10B competent cells (Thermo Fisher, catalog no. 18290015) following the recommended settings of the manufacturer. Following a 2 h recovery period shaking at 37 °C, 1/30 of the transformation was plated on Luria-Bertani agar plates containing kanamycin and chloramphenicol. Colonies were manually counted and picked at random for full-plasmid nanopore sequencing (Primordium Laboratories).

AB1 files were converted to fastq format with biopython90 (v.1.79) and then aligned with minimap2 (refs. 91,92) to a reference sequence containing the transgene precisely inserted at the target site. Unmapped portions of reads exceeding 20 nt were aligned again to the reference plasmid (using bwa mem v.0.7.17) to map any duplicated segments. Portions of reads remaining unaligned were then investigated manually using NCBI BLAST. Plasmids with inferred recombination during E. coli growth or inverted transgene insertions were excluded from further analysis. To estimate the error rate of transgene sequence insertion, individual plasmid consensus sequences were aligned in a pairwise fashion to the reference plasmid using biopython pairwise2.align.globalms with match score of 2, mismatch penalty of −1, gap opening penalty of −2 and gap extension penalty of −1. From pairwise alignments, the number of substitutions or additional nucleotides was counted. Because homopolymer sequences are a known source of error for full-plasmid sequencing, any changes within homopolymers were excluded from analysis. The error rate reported is the ratio of observed substitutions (1) to the total number of sequenced 3′ UTR bp (13,872). As a control, the same procedure was used to search for substitutions in the plasmid backbone, with none found.
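The substitution count with homopolymer exclusion and the resulting error rate can be illustrated with a simplified sketch. It assumes a gap-free, equal-length alignment rather than the output of `pairwise2.align.globalms`, and the minimum homopolymer run length of 3 is a hypothetical threshold, since the text does not state one; the example sequences are invented.

```python
def homopolymer_mask(ref, min_run=3):
    """Mark reference positions that fall inside runs of >= min_run identical bases."""
    mask = [False] * len(ref)
    i = 0
    while i < len(ref):
        j = i
        while j < len(ref) and ref[j] == ref[i]:
            j += 1
        if j - i >= min_run:
            for k in range(i, j):
                mask[k] = True
        i = j
    return mask


def count_substitutions(ref, query, min_run=3):
    """Count mismatches outside homopolymers (gap-free alignment assumed)."""
    mask = homopolymer_mask(ref, min_run)
    return sum(1 for r, q, m in zip(ref, query, mask) if r != q and not m)


# Invented example: one countable substitution, one inside an AAAA run.
n_subs = count_substitutions("ACGTAAAACGT", "ACCTAAGACGT")

# Reported values: 1 substitution across 13,872 sequenced 3' UTR bp.
error_rate = 1 / 13872
print(n_subs, f"{error_rate:.2e}")
```

The reported ratio corresponds to roughly 7.2 × 10−5 substitutions per sequenced base pair.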

Statistics and reproducibility

Each experiment described in this paper was repeated with at least one biological replicate, with similar results. This includes all experiments for which a representative gel is shown, as well as bar graphs providing results from triplicate technical assays.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


