Retrosynthesis versus forward synthesis
Traditionally, chemists are accustomed to analysing how to make desired target molecules (retrosynthesis) rather than what molecules can be made from a given set of substrates (forward synthesis). However, a computerized retrosynthesis approach25,26,27,28,29 is ill suited for our purpose because it is not known a priori which valuable products are synthesizable from the waste substrates: if a retrosynthetic search for a given target does not terminate after a long time, it is impossible to distinguish whether it simply needs more iterations28 or whether the drug molecule cannot be traced back to waste precursors (in which case the search will never terminate). By contrast, forward searches can exhaustively delineate the networks of molecules synthesizable from a given set of substrates, including those (and only those) valuable products that are makeable from waste. Moreover, such networks are highly interconnected16, ensuring that large numbers of possible synthetic solutions can be identified.
Choice of substrates
As ‘chemical waste’, we considered 189 small molecules that we identified as waste by-products of large-scale industrial processes. Within this ‘basic set’, we further identified a ‘commercial’ subset of 56 molecules that are recycled from chemical waste or biomass and are available commercially from companies located mostly in Asia, North America and Europe (see coloured star markers in Fig. 1 and full list in Supplementary Information section 1). For example, the Chinese company Jiangsu Kesheng Chemical Machinery makes resorcinol as part of its aramid fibre production process; USA-based BioCellection produces succinic, glutaric and adipic acids from plastic wastes; and the European conglomerate Global Industrial Dynamics offers ethylene derived from waste biomass. All of these molecules are pre-loaded into the Allchemy software (https://waste.allchemy.net) and additional entities can be proposed via the https://wastedb.allchemy.net portal (for details, see Supplementary Fig. 12). We note that although some of the ‘wastes’ are widely used as solvents, we are not interested in such uses; here, they are to be used as reaction substrates. In some searches, we also consider auxiliary sets, notably the 1,000 basic reagents most often used (as quantified in ref. 34) in literature-reported syntheses, which include molecules such as nitromethane, phthalimide and di-tert-butyl dicarbonate (for full list, see https://github.com/rmrmg/wasteRepo/blob/main/popular_reagents.smi).
Definitions of process variables X
Detailed definitions of the process variables discussed in the text are as follows.
X1 is a penalty assigned to reactions using harmful reagents as defined by GSK criteria17,18. GSK’s original scores are rescaled to the range 0–10 (10 = most harmful). In most cases, alternative reagents are also suggested, and the final value is calculated as a weighted average of the ‘primary’ and alternative conditions (0.3:0.7 weights).
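The weighting of primary and alternative conditions can be illustrated with a minimal sketch. It assumes that both scores are already on the rescaled scale, that the 0.3 weight applies to the primary conditions and 0.7 to the alternative ones (following the order in which they are listed), and that a reaction with no suggested alternative simply keeps its primary score. The function name and these conventions are illustrative assumptions, not details taken from Allchemy.

```python
from typing import Optional


def x1_penalty(primary: float, alternative: Optional[float] = None,
               w_primary: float = 0.3, w_alternative: float = 0.7) -> float:
    """Weighted average of reagent-harm scores (hypothetical helper).

    Assumption: when no alternative reagent is suggested, the primary
    score is returned unchanged.
    """
    if alternative is None:
        return primary
    return w_primary * primary + w_alternative * alternative


# A harmful primary reagent (10) with a benign alternative (0)
example = x1_penalty(10.0, 0.0)
```

Under these assumptions, a benign alternative pulls the penalty well below the score of the primary reagent.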
X2 penalizes problematic solvents as defined by GSK19. The specific value is assigned on the 0–10 scale as for X1.
X3 assigns a +10 penalty for extreme reaction temperatures below −20 °C or above 150 °C.
X4 expresses a penalty that is linearly proportional to the exothermicity, ΔH/2, or endothermicity, ΔH/5, of reactions. The penalty is bounded to +10; ΔH is calculated using Benson’s group contributions method and is expressed in kcal mol−1.
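A minimal sketch of the two thermal penalties, X3 and X4, follows. It assumes that the exothermic branch uses the magnitude of ΔH (so that the penalty is positive), that temperatures exactly at −20 °C or 150 °C are not penalized, and that ΔH has already been estimated elsewhere (Benson's group contributions are not implemented here). The function names are hypothetical.

```python
def x3_penalty(temperature_c: float) -> float:
    """+10 for temperatures below -20 C or above 150 C, otherwise 0."""
    return 10.0 if temperature_c < -20.0 or temperature_c > 150.0 else 0.0


def x4_penalty(delta_h_kcal_mol: float) -> float:
    """Penalty linear in reaction enthalpy (kcal/mol), capped at 10.

    Assumption: exothermic reactions (negative dH) are scored as |dH|/2,
    endothermic ones (positive dH) as dH/5.
    """
    if delta_h_kcal_mol < 0:
        raw = -delta_h_kcal_mol / 2.0
    else:
        raw = delta_h_kcal_mol / 5.0
    return min(raw, 10.0)
```

With these definitions, exothermic steps reach the cap at ΔH = −20 kcal mol−1, whereas endothermic steps reach it only at +50 kcal mol−1.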
X5 assigns a +10 ‘cost’ for executing each reaction step (this variable simply promotes shorter pathways). If consecutive steps can be performed in the same solvent (one pot), the penalty is reduced to 3.
X6 penalizes reactions that are characterized by low atom economy, defined as in ref. 32, and takes into account both substrates and reagents. Its role is to promote reactions that produce the least amount of by-products and/or waste. Each reaction gets a score ranging from 0 to 10.
X7 promotes convergent rather than linear pathways. This variable is defined to account for the position of the convergence point, and is expressed as an average of two terms, (linearity penalty + convergence location)/2. In this expression, the ‘linearity penalty’ is defined by the ratio of the longest linear sequence to the total number of reactions. The ‘convergence location’ term promotes routes in which convergence point(s) are closer to the final product, and is expressed as \(1-\exp (-0.1\times \sum_i \mathrm{avgYield}^{-N_i})\), where avgYield is the average yield of a typical organic reaction (taken here as 75%)38, \(N_i\) is the distance, measured in synthetic steps, from substrate i to the target, and the sum is over all substrates. The average of the two terms is multiplied by 10 to give a final score of a pathway between 0 and 10 (for examples of this scoring scheme for different pathway structures, see Supplementary Information section 4.3).
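The X7 definition can be written out as a short sketch. The function and argument names are hypothetical, and the two example routes are invented solely to show how the terms combine; worked examples from the authors are in Supplementary Information section 4.3.

```python
import math
from typing import Sequence


def x7_score(longest_linear: int, n_reactions: int,
             substrate_distances: Sequence[int],
             avg_yield: float = 0.75) -> float:
    """Convergence score on a 0-10 scale, following the X7 definition.

    longest_linear      -- length of the longest linear sequence (steps)
    n_reactions         -- total number of reactions in the pathway
    substrate_distances -- N_i, steps from each substrate i to the target
    """
    linearity = longest_linear / n_reactions
    location = 1.0 - math.exp(
        -0.1 * sum(avg_yield ** (-n) for n in substrate_distances)
    )
    return 10.0 * (linearity + location) / 2.0


# Invented examples: a fully linear three-step route from one substrate,
# and a three-reaction convergent route whose longest branch is two steps.
linear_route = x7_score(3, 3, [3])
convergent_route = x7_score(2, 3, [2, 2, 1])
```

For these two invented routes the convergent one receives the lower (better) value, mainly because its linearity term drops from 1 to 2/3.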
X8 is a ‘geolocation’ variable that assigns a penalty to pathways in which the waste substrates come from different continents (see the stars in Fig. 1), implying increased transportation costs and/or longer delivery times. The overall pathway score is divided by a coefficient >1 if all ‘waste’ substrates are on the same continent. Here we promote such pathways by up to 20% (coefficient 1.25). If, for the substrates we considered, the location of production could not be determined, the geolocation was assigned to the company’s country of origin (although, in the Allchemy web application, the variable can also be calculated for user-defined locations, see Supplementary Fig. 6).
X9 penalizes pathways with high estimated cumulative PMI, calculated based on a previous methodology39 and using tables40 of PMI values for individual reactions. The raw value of cumulative PMI is rescaled to a range 1–1.5 based on the user-selected purification method. The overall pathway score is then multiplied by \(X_9^{w_9}\), promoting pathways with the lowest cumulative PMI (for calculation details see Supplementary Information section 4.1).
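How the two pathway-level variables, X8 and X9, act on an already computed pathway score can be sketched as follows. The base score is taken as an input because its exact form is not given in this section; the function name, the argument names and the default exponent w9 = 1 are assumptions made for illustration. Lower scores are treated as better, consistent with the variables being penalties.

```python
def adjust_pathway_score(base_score: float, same_continent: bool,
                         x9: float, w9: float = 1.0,
                         geo_coefficient: float = 1.25) -> float:
    """Apply the geolocation (X8) and PMI (X9) corrections to a pathway score.

    Assumptions: base_score already combines the step-level penalties,
    and x9 is the cumulative PMI rescaled to the range 1-1.5.
    """
    if not 1.0 <= x9 <= 1.5:
        raise ValueError("x9 is expected in the rescaled range 1-1.5")
    score = base_score
    if same_continent:
        # Dividing by 1.25 lowers the penalty score by 20%
        score /= geo_coefficient
    return score * x9 ** w9


same = adjust_pathway_score(100.0, same_continent=True, x9=1.0)
mixed = adjust_pathway_score(100.0, same_continent=False, x9=1.0)
```

Here `same` is 20% lower than `mixed`, which corresponds to the coefficient of 1.25 quoted for X8.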
Allchemy is a software platform for forward synthesis—that is, for iterative generation of synthetically plausible products and synthetic routes starting from arbitrary, user-defined substrates. The software can be run in either batch or web application modes; the web app can be used to visualize pathways obtained via both of these modalities. Allchemy’s web-app is based on the Django (https://www.djangoproject.com) framework and uses the d3.js library (https://d3js.org) for graph representation. Substrates can be input as SMILES or drawn in Chemwriter (https://chemwriter.com). Results of synthetic calculations are stored using PostgreSQL (https://postgresql.org). Communication between the web app and Allchemy’s backend is supported by Redis (https://redis.io) and RQ queue systems (https://python-rq.org).
The software has different modules focused on various aspects of forward synthesis: from the generation and exploration of networks created by prebiotic chemistries16, to in silico combinatorial chemistry and scaffold optimization, to targeted searches towards specific molecules (here, drugs and agrochemicals). The prebiotic-chemistry module is based on ~600 reaction rules generally accepted as plausible under conditions of primitive Earth; other modules are based on ~10,000 rules covering reactions commonly used in pharmaceutical chemistry (including stereoselective ones) as well as those most capable of generating molecular diversity in as few synthetic generations as possible (multicomponent reactions, rearrangements). All rules are coded in the SMARTS notation and each has a much broader scope than any particular literature precedent underlying it (see section ‘Reaction rules’ and references16,23,25,26).
In the ‘targeted’ searches implemented in this work, at each synthetic generation (Fig. 2a, b), the rules are applied to the original substrates and to the subset of intermediates retained (that is, those that can still serve as useful building blocks and those above a certain similarity threshold to the ‘target’ molecules). A molecule is deemed suitable for a given reaction if it contains the core of at least one substrate as defined by the reaction rule but, at the same time, does not contain any groups incompatible with the reaction. These matching conditions are evaluated using the ‘GetSubstructMatches’ function from the RDKit library (www.rdkit.org). Reactions are executed using the ‘RunReactants’ function from the ChemicalReaction class of the RDKit library with in-house enhancements to enforce proper stereochemistry and/or tautomeric forms. If a reaction template matches more than one locus on the substrate, RunReactants is executed at each and all of them. The products generated by RunReactants are filtered by algorithms developed in-house to recognize and eliminate chemically invalid molecules (for example, those violating Bredt’s rules) as well as molecules that do not satisfy user-specified constraints (for example, those exceeding a certain allowed molecular mass). As the network of reactions is being generated, reaction paths leading to each molecule are stored as an ordered list of reaction steps, each of which is a tuple of reaction SMILES and reaction name.
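The generation-by-generation expansion and the path bookkeeping described above can be sketched in plain Python. This is not Allchemy's implementation: the SMARTS-based rules and the RDKit calls are replaced by a hypothetical lookup table of known transformations, the `keep` predicate stands in for the validity, constraint and similarity filters, and histories of multi-substrate reactions are merged by simple concatenation, which is an assumption. The SMILES strings are illustrative.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Step = Tuple[str, str]                      # (reaction SMILES, reaction name)
Hit = Tuple[Tuple[str, ...], str, str]      # (reactants, product, reaction name)
Rule = Callable[[Set[str]], Iterable[Hit]]  # hypothetical stand-in for a SMARTS rule


def expand(substrates: Iterable[str], rules: List[Rule],
           keep: Callable[[str], bool], generations: int) -> Dict[str, List[Step]]:
    """Grow a forward network and record one ordered path per molecule."""
    paths: Dict[str, List[Step]] = {s: [] for s in substrates}
    for _ in range(generations):
        pool = set(paths)                   # substrates plus retained intermediates
        found: Dict[str, List[Step]] = {}
        for rule in rules:
            for reactants, product, name in rule(pool):
                if product in paths or product in found or not keep(product):
                    continue                # already reached, or filtered out
                path: List[Step] = []
                for r in reactants:         # merge the reactants' own histories
                    path += [s for s in paths[r] if s not in path]
                path.append((".".join(reactants) + ">>" + product, name))
                found[product] = path
        if not found:                       # nothing new: the network is exhausted
            break
        paths.update(found)
    return paths


def lookup_rule(table: Dict[Tuple[Tuple[str, ...], str], str]) -> Rule:
    """Toy rule: fires when all listed reactants are present in the pool."""
    def rule(pool: Set[str]) -> Iterable[Hit]:
        for (reactants, name), product in table.items():
            if all(r in pool for r in reactants):
                yield reactants, product, name
    return rule


ACID = "C=CCOc1ccc(cc1)C(=O)O"              # 4-(allyloxy)benzoic acid
ESTER = "C=CCOc1ccc(cc1)C(=O)OCC"           # its ethyl ester
ALCOHOL = "C=CCOc1ccc(CO)cc1"               # 4-allyloxybenzyl alcohol
toy = lookup_rule({
    ((ACID, "CCO"), "esterification"): ESTER,
    ((ESTER,), "ester reduction"): ALCOHOL,
})
network = expand([ACID, "CCO"], [toy],
                 keep=lambda smi: len(smi) < 40, generations=3)
```

In this toy run the alcohol is reached in the second generation, and its stored path lists the esterification followed by the reduction; the loop stops early once a generation yields no new molecules, which is what makes a bounded forward search terminate.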
With reference to Fig. 4, we first considered synthesis of the antibiotic dapsone (Extended Data Fig. 4, bottom) from lactic acid and phenol. Unlike in a traditional route based on double aromatic nucleophilic substitution of 4-chloronitrobenzene with sodium sulfide, this synthesis relies on the Smiles rearrangement involving bisphenol S 1 and 2-bromopropionamide 2, the latter prepared from lactic acid as described previously53. We validated this transformation, which is to our knowledge previously unreported, under benign conditions (K2CO3, KI, 50 °C in DMSO followed by NaOH, 130 °C in DMSO), achieving 82% yield (Fig. 4a, starred step I).
The second example was the synthesis of carvedilol, which is used to treat high blood pressure, congestive heart failure and left ventricular dysfunction. Its proposed waste-to-drug synthesis (starting from aniline from biomass, guaiacol from lignin waste and resorcinol from the textile industry) features only one previously undescribed reaction, the reductive amination of 2-(2-methoxyphenoxy)acetaldehyde 4. We carried out this transformation, denoted by star II in Fig. 4b, in 86% yield using a previously proposed environmentally friendly approach54 (Rh/Al2O3 catalyst and 25% aqueous solution of ammonia).
In the synthesis of the heart medication bisoprolol, four steps, denoted by stars III–VI in Fig. 4c, lacked direct literature precedent. Straightforward esterification of 4-(allyloxy)benzoic acid 6 (from 4-hydroxybenzoic acid recyclable from lignin processing) proceeded in 72% yield (star III), followed by quantitative reduction of ethyl 4-allyloxybenzoate 7 (star IV). Subsequent conversion of the resulting alcohol 8 to the corresponding 4-allyloxybenzyl chloride 9 was based on a published procedure and also proceeded in quantitative yield. This chloride was then used to alkylate 2-isopropoxyethanol 10 (under phase-transfer catalysis conditions with 50% aqueous NaOH) to give the allyl ether of 4-(2-isopropoxyethoxymethyl)phenol 11 in 85% yield (star V). Finally, the unsaturated product was treated with Oxone in aqueous phosphate buffer, giving 4-(2-isopropoxyethoxymethyl)phenyl glycidyl ether 12 in 81% yield (star VI).
In the synthesis of the topical anaesthetic proxymetacaine (starting from p-hydroxybenzoic acid from lignin waste and four other waste substrates: propanol, formaldehyde, acetaldehyde and acetonitrile; see Supplementary Table 1), three steps required experimental validation. With reference to Fig. 4d, 2-(diethylamino)ethanol 15 was obtained from 1,4-dioxane-2,5-diol (the dimer of 14) and diethylamine 13 in 48% yield (star VII) via reductive amination in ethyl acetate using NaBH(OAc)3 as the reducing agent. Esterification of 4-hydroxy-3-nitrobenzoic acid 16 with 2-(diethylamino)ethanol 15 in dry toluene in the presence of a catalytic amount of HCl then gave 2-(diethylamino)ethyl 4-hydroxy-3-nitrobenzoate 17 in 67% yield (star VIII). Subsequently, this product was alkylated with n-propyl chloride 18 in acetonitrile, providing 2-(diethylamino)ethyl 3-nitro-4-propoxybenzoate 19 in 89% yield, or in 54% yield in the greener solvent acetone (star IX). Further synthetic details of this and other routes discussed in this section are provided in Supplementary Information section 5.
Regarding larger-scale validations, the processes for cisatracurium, midazolam, and propofol precursors were all conducted on ODP’s reconfigurable platforms. Sub-kits utilized plug flow reactors with perfluoroalkoxy tubing flow paths, commercial continuous stirred tank reactors, and in-house designed filter–washer–dryers that have been described previously20. Reagents were purchased from their respective vendors and used as is without any need for additional purification. Simulated waste streams were created as described in Supplementary Information section 6, and analysis was carried out through HPLC versus a commercial standard.