0024 - OptoGPT Close Reading: A Foundation Model for Inverse Design of Optical Multilayer Thin Films

I. Key Points at a Glance

1. Basic Information

Item	Details
Title	OptoGPT: A foundation model for inverse design in optical multilayer thin film structures
Authors	Taigao Ma, Haozhu Wang, L. Jay Guo
Journal	Opto-Electronic Advances
Year	2024
Volume / Issue / Article No.	Vol. 7, No. 7, 240062
DOI	https://doi.org/10.29026/oea.2024.240062
Received / Accepted / Published online	2024-03-19 / 2024-06-03 / 2024-07-10

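2. Key Conclusions

A unified framework for multiple structure types: the paper proposes "structure tokens" and "structure serialization", which for the first time let a single model handle inverse design of multilayer thin films with different material combinations and different numbers of layers, removing the fixed-output-dimension limitation of earlier deep learning methods.
Strong generalization and design accuracy: on 1000 targets from the validation set, the mean absolute error (MAE) averages 0.0258, better than the 0.0296 of the closest structures in the training set; thickness finetuning lowers the MAE to 0.0192. Each design takes about 0.1 s, comparable to one forward TMM simulation.
Diverse and flexible design: because each layer is drawn by probability sampling, repeated design runs output multiple distinct structures that meet the same target; "probability resampling" imposes material and thickness constraints without retraining the model, so designs can be adapted to different fabrication requirements.
Extension to angle and polarization: starting from the pretrained model, finetuning with only about 1% of the original compute adapts it to other incident angles and polarization states; "mixed sampling" enables simultaneous design for multiple angles and polarizations.
Physics learned automatically: t-SNE visualization shows that the model implicitly learned to separate low- and high-refractive-index dielectrics, to distinguish metals from non-metals, and that thin layers of different materials behave similarly, all without refractive index data as explicit input.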

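3. Core Value

Innovations:

First application of a decoder-only Transformer architecture to inverse design of optical multilayer thin films
Proposes the structure token representation, which fuses material and thickness information into a single sequence
Develops a joint design paradigm of probability sampling plus finetuning that balances design diversity and accuracy

Contributions to the field:

Establishes a foundation model for inverse design of optical multilayer thin films that can serve as a general design platform for the field
Addresses two bottlenecks at once: conventional optimization methods must be rerun for every target, and conventional deep learning methods are restricted to fixed structure types
Presents a complete inverse design methodology, from theory to practice

Applicable research scenarios:

Structural color design
Spectrum filter design
Perfect absorber design
Distributed Bragg reflector (DBR) design
Fabry-Pérot (FP) resonator design
Multilayer thin film applications such as photovoltaics and radiative cooling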

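4. Research Method

In one sentence: built on a decoder-only Transformer, the method uses structure tokens and structure serialization to represent multilayer thin films as variable-length sequences, pretrains on 10 million structure-spectrum pairs, generates material-thickness sequences conditioned on a target spectrum through autoregressive probability sampling, and extends to angle, polarization, and multi-target design through finetuning and constrained sampling.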

II. Full Text with English-Chinese Parallel Translation

OptoGPT: A foundation model for inverse design in optical multilayer thin film structures
OptoGPT：光学多层薄膜结构逆向设计的基础模型

Optical multilayer thin film structures have been widely used in numerous photonic applications. However, existing inverse design methods have many drawbacks because they either fail to adapt quickly to different design targets or are difficult to apply to different types of structures, e.g., designing for different materials at each layer. These methods also cannot accommodate versatile design situations under different angles and polarizations. In addition, how to benefit practical fabrication and manufacturing has not been extensively considered yet. In this work, we introduce OptoGPT (Opto Generative Pretrained Transformer), a decoder-only transformer, to solve all these drawbacks and issues simultaneously.
光学多层薄膜结构已被广泛应用于众多光子学应用中。然而，现有的逆向设计方法存在诸多缺陷，因为它们要么无法快速适应不同的设计目标，要么难以适用于不同类型的结构，例如为每层设计不同的材料。这些方法也无法适应不同角度和偏振下的多样化设计场景。此外，如何有利于实际制造和加工尚未得到充分考虑。在本工作中，我们引入了OptoGPT（光学生成式预训练Transformer），一种仅解码器的Transformer，来同时解决所有这些缺陷和问题。

Introduction / 引言

Optical multilayer thin film structure is one of the most vital photonic structures widely used in many applications, including structural color, filters, absorbers, distributed Bragg reflectors (DBR), Fabry–Pérot (FP) resonators, photovoltaic and radiative cooling, among others. Inverse design seeks to identify the best material arrangements and obtain thickness combinations to achieve user-desired optical targets, which is critical to enable many of the above applications.
光学多层薄膜结构是最重要的光子结构之一，广泛应用于许多应用领域，包括结构色、滤波器、吸收体、分布布拉格反射器（DBR）、法布里-珀罗（FP）谐振器、光伏和辐射制冷等。逆向设计旨在确定最佳的材料排列并获得厚度组合，以实现用户期望的光学目标，这对实现上述许多应用至关重要。

Currently, there are two types of mainstream inverse design methods: 1) optimization-based methods, which rely on numerical simulations and iterative searches to minimize the difference between designed and targeted optical responses; and 2) deep learning-based methods, which use neural networks to learn a general mapping from the space of target responses to the space of optical multilayer thin film structures after training on a large dataset.
目前，主流的逆向设计方法有两种类型：1）基于优化的方法，依赖数值仿真和迭代搜索来最小化设计与目标光学响应之间的差异；2）基于深度学习的方法，使用神经网络在大规模数据集上训练后学习从目标响应空间到光学多层薄膜结构空间的通用映射。

Although widely used, both methods have their own limitations, either from the perspective of design targets or types of designed structures. Optimization-based methods require running the algorithm from scratch when given a new or a different design target, which can be time-consuming. Deep learning-based methods are versatile for design targets, but existing works lack the ability to design for different types of structures (e.g., different material combinations at each layer; different total number of layers, etc.). In addition, both methods seldom examine how to expand the inverse design capabilities to angled incidence with different polarizations, which is important for many applications, or to simultaneous design under multiple conditions, which certain applications require.
尽管被广泛使用，两种方法都有各自的局限性，无论从设计目标还是设计结构的类型角度来看。基于优化的方法在面对新的或不同的设计目标时需要从头运行算法，这可能非常耗时。基于深度学习的方法对设计目标具有通用性，但现有工作缺乏设计不同类型结构的能力（例如，每层不同的材料组合；不同的总层数等）。此外，两种方法都很少研究如何扩展不同偏振斜入射的逆向设计能力，这对许多应用很重要，以及某些应用所需的多种条件下的同时设计。

In addition to the above drawbacks, both methods also fail to accommodate the following two features that are vital for practical fabrications: diversity and flexibility. By diversity we mean that a single method can output multiple designs so that researchers can select for their fabrication based on the availability of materials and deposition methods, while flexibility allows researchers to arbitrarily impose restrictions on the material selection and thickness range at any layers for their fabrication or design needs. An inverse design method that can effectively meet these requirements will significantly bridge the gap between design and fabrication, making the design algorithm more practical.
除了上述缺陷外，两种方法还都无法满足对实际制造至关重要的两个特性：多样性和灵活性。多样性是指单一方法可以输出多个设计方案，以便研究人员可以根据材料可用性和沉积方法选择用于制造；灵活性允许研究人员根据制造或设计需求对任意层的材料选择和厚度范围施加任意限制。能够有效满足这些要求的逆向设计方法将显著弥合设计与制造之间的差距，使设计算法更具实用性。

In this work, we propose OptoGPT (Opto Generative Pretrained Transformer), a decoder-only transformer model that can potentially address all these issues and unify the multilayer structure inverse design. To do so, first, we introduce “structure token” to fuse the representation of material and thickness and “structure serialization” to unify different types of structures. Next, we propose several techniques to unify the design target in different tasks as a combined reflection and transmission spectrum target. Further, a series of techniques based on “finetuning” and “probability sampling” are developed to unify the design under different angles and polarization, simultaneous design under multiple incident angles, as well as achieving diversity and flexibility for structure fabrication. Based on the empirical results demonstrated, we believe that OptoGPT can serve as a foundation model for the design of optical multilayer thin films across a diverse array of applications.
在本工作中，我们提出了OptoGPT（光学生成式预训练Transformer），一种仅解码器的Transformer模型，可以潜在地解决所有这些问题并统一多层结构逆向设计。为此，首先，我们引入了"结构标记"来融合材料和厚度的表示，并引入"结构序列化"来统一不同类型的结构。接下来，我们提出了几种技术，将不同任务中的设计目标统一为组合的反射和透射光谱目标。此外，开发了一系列基于"微调"和"概率采样"的技术，以统一不同角度和偏振下的设计、多个入射角下的同时设计，以及实现结构制造的多样性和灵活性。基于所展示的经验结果，我们相信OptoGPT可以作为基础模型，服务于各种应用中光学多层薄膜的设计。

Methods / 方法

Designing a multilayer structure involves determining the material choice at each layer and the corresponding thicknesses of these layers. The major reason that existing deep learning-based methods cannot deal with different types of structures is that the output of these neural networks has a fixed size that corresponds to a pre-defined structure, e.g., the three-layer structure of Ag/SiO2/Ag, the six-layer structure of MgF2/SiO2/Al2O3/TiO2/Si/Ge, and the twenty-layer structure of alternating SiO2/Si3N4, etc. Therefore, these models can only design the thickness of each layer and do not allow different material choices. This also makes these models fail to accommodate structures with different numbers of layers.
设计多层结构涉及确定每层的材料选择以及这些层的相应厚度。现有的基于深度学习的方法无法处理不同类型结构的主要原因是这些神经网络的输出具有固定大小，对应于预定义的结构，例如Ag/SiO2/Ag的三层结构、MgF2/SiO2/Al2O3/TiO2/Si/Ge的六层结构、以及交替SiO2/Si3N4的二十层结构等。因此，这些模型只能为每层设计厚度，不允许不同的材料选择。这也使这些模型无法适应不同层数的结构。

Here, we propose structure tokens and structure serialization to obtain the collaborative representation of materials and their thicknesses on the same footing, and treat the inverse design task as a conditional sequence generation problem.
在这里，我们提出了结构标记和结构序列化，以获得材料及其厚度的协同表示，并将逆向设计任务视为条件序列生成问题。

Structure tokens and structure serialization / 结构标记与结构序列化

To address the aforementioned issues of existing approaches, we propose to treat material and thickness equally by concatenating them together to form a “structure token”. Adding these tokens one by one, we can convert a multilayer structure into a sequence, which will be referred to as “structure serialization”. Figure 1(d) gives one example of serializing an N-layer structure on the glass substrate using a sequence with N + 1 tokens. The first N tokens describe the material and thickness at each layer and the last token is a special ‘EoS’ token that denotes the end of the sequence. Utilizing this approach, we can remove the limitation of fixed output size in the previous work and represent different types of structures (e.g., different material combinations at each layer; different total number of layers) in a unified approach.
为解决现有方法的上述问题，我们建议通过将材料和厚度连接在一起形成"结构标记"来同等对待它们。逐个添加这些标记，我们可以将多层结构转换为序列，这被称为"结构序列化"。图1(d)给出了使用N+1个标记序列化玻璃基底上N层结构的一个示例。前N个标记描述每层的材料和厚度，最后一个标记是特殊的‘EoS’标记，表示序列的结束。利用这种方法，我们可以消除先前工作中固定输出大小的限制，并以统一的方式表示不同类型的结构（例如，每层不同的材料组合；不同的总层数）。

In this work, we consider 18 different materials, and discretize the thickness in the range of [10, 500] nm with a step size of 10 nm. Although reducing the step size to a smaller value can improve the design performance, a significantly larger dataset is needed for training. Also, considering the fabrication variation during the layered deposition process, we choose a 10 nm step size by balancing these factors. Note that we can always use thickness finetuning (discussed later) based on the designed structure from our model to remove such restrictions and obtain a more accurate structure. Therefore, for each layer, there are 18 × 50 + 1 = 901 possible tokens, corresponding to 900 different combinations of material and thickness plus one special ‘EoS’ token. We set the maximum number of layers to be 20, making the total number of multilayer structures under design consideration (901)^20 ∼ 10^59.
在本工作中，我们考虑了18种不同的材料，并将厚度在[10, 500] nm范围内以10 nm为步长离散化。虽然将步长减小到更小值可以改善设计性能，但需要大得多的数据集进行训练。此外，考虑到层状沉积过程中的制造变化，我们通过平衡这些因素选择10 nm步长。请注意，我们可以始终基于模型设计的结构使用厚度微调（稍后讨论）来消除此类限制并获得更精确的结构。因此，对于每层，有18 × 50 + 1 = 901个可能的标记，对应于900种不同的材料和厚度组合加上一个特殊的‘EoS’标记。我们将最大层数设置为20，使得设计考虑中的多层结构总数达到(901)^20 ∼ 10^59。
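
The token vocabulary and the serialization step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Material_Thickness` token format follows the 'TiO2_200' style used in the glossary of these notes, and because the text names only some of the 18 materials, the entries `Mat10` to `Mat18` are hypothetical placeholders.

```python
import math

# Materials named in the text, plus hypothetical placeholders to reach 18.
NAMED = ["Ag", "Al", "SiO2", "TiO2", "Si3N4", "MgF2", "Al2O3", "Si", "Ge"]
MATERIALS = NAMED + [f"Mat{i}" for i in range(len(NAMED) + 1, 19)]
THICKNESSES_NM = list(range(10, 501, 10))  # 10, 20, ..., 500 nm: 50 values
EOS = "EoS"

# One token per (material, thickness) pair, plus the end-of-sequence token.
VOCAB = [f"{m}_{t}" for m in MATERIALS for t in THICKNESSES_NM] + [EOS]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}


def serialize(layers):
    """Turn [(material, thickness_nm), ...] into a token list ending in EoS."""
    tokens = [f"{material}_{thickness}" for material, thickness in layers]
    unknown = [t for t in tokens if t not in TOKEN_TO_ID]
    if unknown:
        raise ValueError(f"not in vocabulary: {unknown}")
    return tokens + [EOS]


def deserialize(tokens):
    """Invert serialize(): read layers up to (not including) the first EoS."""
    layers = []
    for token in tokens:
        if token == EOS:
            break
        material, thickness = token.rsplit("_", 1)
        layers.append((material, int(thickness)))
    return layers


# The three-layer FP resonator mentioned in the paper: Ag / SiO2 / Ag.
fp_tokens = serialize([("Ag", 20), ("SiO2", 150), ("Ag", 50)])
print(fp_tokens)                               # N = 3 layers give N + 1 tokens
print(len(VOCAB))                              # 18 * 50 + 1
print(round(20 * math.log10(len(VOCAB)), 1))   # base-10 exponent of 901**20
```

The last line prints the exponent behind the (901)^20 ∼ 10^59 estimate, since 20 · log10(901) is just above 59.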

Conditional sequence generation / 条件序列生成

Since the output format is now a sequence of tokens, the inverse design problem is equivalent to the sequence generation problem conditioned on the input of design targets. This is a problem that has been extensively researched and resolved in the Natural Language Processing (NLP) field using the Generative Pretrained Transformer (GPT) model, especially the widely known ChatGPT. Given some texts as input, e.g., a question or task description, GPT models can generate and output a text sequence that relates to the input. We propose to use similar GPT models to solve our problem. Differently, the input is the optical targets in the form of optical spectra (as a function of wavelength) while the output is the serialized physical structure of material and thickness.
由于输出格式现在是标记序列，逆向设计问题等价于以设计目标输入为条件的序列生成问题。这是一个在NLP领域使用生成式预训练Transformer（GPT）模型已被广泛研究和解决的问题，尤其是广为人知的ChatGPT。给定一些文本作为输入，例如问题或任务描述，GPT模型可以生成并输出与输入相关的文本序列。我们建议使用类似的GPT模型来解决我们的问题。不同的是，输入是以光谱形式（作为波长的函数）的光学目标，而输出是材料和厚度的序列化物理结构。

In this work, we set the reflection and transmission spectrum under normal incidence as our design target. The wavelength range covers the whole visible and near-infrared (NIR) region, spanning from 400 nm to 1100 nm with 10 nm step. We further propose a series of techniques that can expand the design target to absorption spectrum, and reflective/transmissive structural color with minimal adaptations.
在本工作中，我们将正入射下的反射和透射光谱设置为我们的设计目标。波长范围覆盖整个可见光和近红外（NIR）区域，从400 nm到1100 nm，步长为10 nm。我们进一步提出了一系列技术，可以通过最小的调整将设计目标扩展到吸收光谱以及反射/透射结构色。

Model architecture / 模型架构

Figure 2(a) shows the architecture of our OptoGPT model. For the input, the spectrum target will go through a spectrum embedding to obtain its high-dimensional hidden representation. For the output, the structure tokens will first go through a physical embedding layer to obtain their high-dimensional hidden representation and then go through positional embeddings to obtain the relative position of each token inside this sequence. After that, both hidden representations of the input spectrum and output structures will go through a series of decoder blocks which contain attention layers, the major working mechanism behind GPT. The first self-attention layer is used to learn the relationship between layered structures, while the second cross-attention layer can capture the relationship between the input spectrum and the multilayer structure. Their output will further be used to give a probability distribution over all tokens. Our model is trained for ~200 epochs based on “next-word prediction” using this probability output.
图2(a)展示了我们OptoGPT模型的架构。对于输入，光谱目标将通过光谱嵌入以获得其高维隐藏表示。对于输出，结构标记将首先通过物理嵌入层以获得其高维隐藏表示，然后通过位置嵌入以获得序列中每个标记的相对位置。之后，输入光谱和输出结构的隐藏表示都将通过一系列包含注意力层的解码器块，这是GPT背后的主要工作机制。第一个自注意力层用于学习分层结构之间的关系，而第二个交叉注意力层可以捕获输入光谱与多层结构之间的关系。它们的输出将进一步用于给出所有标记上的概率分布。我们的模型基于此概率输出，通过"下一个词预测"训练约200个epoch。

We generate a large training dataset with 10 million samples and a validation dataset with 1 million samples. The total number of samples is only ∼ 1/10^52 of the possible structures. Each sample is a pair of a randomly sampled multilayer thin film structure on a glass substrate and the corresponding spectra simulated using the Transfer Matrix Method (TMM). Our model is trained on a single NVIDIA 3090 GPU for roughly two weeks.
我们生成了一个包含1000万个样本的大型训练数据集和一个包含100万个样本的验证数据集。数据集的总数仅为可能结构的∼ 1/10^52。每个样本是一对随机采样的玻璃基底上的多层薄膜结构和使用转移矩阵方法（TMM）模拟的相应光谱。我们的模型在单个NVIDIA 3090 GPU上训练了大约两周。
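
To make the role of TMM concrete, the sketch below computes reflection and transmission at normal incidence with the standard characteristic-matrix formulation. It is an illustration, not the authors' simulation code, and it rests on assumptions that are not in the paper: refractive indices are constant over wavelength (the values 1.38 for MgF2 and 1.5 for glass are illustrative), absorbing materials use the n − ik sign convention, and the glass substrate is treated as semi-infinite.

```python
import cmath
import math


def tmm_normal_incidence(layers, wavelength_nm, n_incident=1.0, n_substrate=1.5):
    """Return (R, T) for a stack at normal incidence.

    layers: list of (complex_refractive_index, thickness_nm), ordered from
    the incident side toward the substrate. Absorbing layers use n - 1j*k.
    """
    m11, m12, m21, m22 = 1 + 0j, 0j, 0j, 1 + 0j  # identity matrix
    for n_layer, d_nm in layers:
        delta = 2 * math.pi * n_layer * d_nm / wavelength_nm  # phase thickness
        c, s = cmath.cos(delta), cmath.sin(delta)
        a11, a12, a21, a22 = c, 1j * s / n_layer, 1j * n_layer * s, c
        # Multiply the running product by this layer's characteristic matrix.
        m11, m12, m21, m22 = (
            m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
            m21 * a11 + m22 * a21, m21 * a12 + m22 * a22,
        )
    b = m11 + m12 * n_substrate
    c_field = m21 + m22 * n_substrate
    denom = n_incident * b + c_field
    reflectance = abs((n_incident * b - c_field) / denom) ** 2
    transmittance = 4 * n_incident * complex(n_substrate).real / abs(denom) ** 2
    return reflectance, transmittance


def spectrum(layers, wavelengths_nm=range(400, 1101, 10)):
    """Reflection and transmission on the 400-1100 nm grid used in the paper."""
    pairs = [tmm_normal_incidence(layers, w) for w in wavelengths_nm]
    return [r for r, _ in pairs], [t for _, t in pairs]


# Quarter-wave MgF2 antireflection layer on glass, designed for 550 nm.
quarter_wave = [(1.38, 550.0 / (4 * 1.38))]
print(tmm_normal_incidence([], 550.0))            # bare glass
print(tmm_normal_incidence(quarter_wave, 550.0))  # coated glass reflects less
```

A training sample in the sense of the paragraph above would pair a randomly drawn structure with the two lists returned by `spectrum`, computed with measured, wavelength-dependent refractive indices instead of the constants assumed here.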

Inverse design / 逆向设计

Once trained, our model can be used to design for a given input spectrum target. Specifically, our model finishes the design layer-by-layer in an auto-regressive way. When designing the ith layer, our model takes in the target spectrum together with the previously designed i − 1 tokens, and outputs a probability distribution for all 900+1 tokens. Sampling from this distribution gives the design at the ith layer. These tokens will again be used as the input when designing the (i + 1)th layer. This design process will keep going until the maximum of 20 layers is reached or ‘EoS’ is sampled.
训练完成后，我们的模型可用于为给定的输入光谱目标进行设计。具体而言，我们的模型以自回归方式逐层完成设计。在设计第i层时，我们的模型接收目标光谱以及先前设计的i-1个标记，并输出所有900+1个标记的概率分布。从该分布采样得到第i层的设计。这些标记将在设计第(i+1)层时再次用作输入。这个设计过程将持续进行，直到达到最大层数20或采样到‘EoS’。

This probability sampling has many advantages. First, because of the randomness during sampling, running each separate design process can output different structures. Therefore, the method inherently introduces diversity in the designed structure and is capable of outputting multiple structures that satisfy the design target. In addition, it enables our model to design structures with different numbers of layers. For example, when ‘EoS’ is sampled at the fifth layer, our model terminates the design process and outputs the existing four-layer structure. The probability sampling will also be used to handle design with constraints in a later section.
这种概率采样有许多优点。首先，由于采样过程中的随机性，运行每个独立的设计过程可以输出不同的结构。因此，该方法固有地在设计的结构中引入了多样性，能够输出多个满足设计目标的结构。此外，它使我们的模型能够设计不同层数的结构。例如，当在第五层采样到‘EoS’时，我们的模型终止设计过程并输出现有的四层结构。概率采样还将在后文用于处理带约束的设计。
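
The layer-by-layer sampling loop can be sketched as follows. The trained network is replaced by `dummy_next_token_probs`, a hypothetical stand-in that ignores the target spectrum and returns a fixed distribution, so the example shows only the control flow (sample a token, stop on EoS or at 20 layers), not real design quality.

```python
import random

VOCAB_SIZE = 901            # 900 structure tokens + EoS
EOS_ID = VOCAB_SIZE - 1     # assumption: EoS is the last token id
MAX_LAYERS = 20


def dummy_next_token_probs(target_spectrum, prefix):
    """Hypothetical stand-in for the trained model (ignores the target)."""
    p_eos = 0.0 if not prefix else 0.2   # assumption: at least one layer
    p_other = (1.0 - p_eos) / (VOCAB_SIZE - 1)
    return [p_other] * (VOCAB_SIZE - 1) + [p_eos]


def design(target_spectrum, next_token_probs, rng, max_layers=MAX_LAYERS):
    """Sample one structure token per layer until EoS or max_layers."""
    tokens = []
    while len(tokens) < max_layers:
        probs = next_token_probs(target_spectrum, tokens)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if token == EOS_ID:
            break                        # EoS ends the design, it is not a layer
        tokens.append(token)
    return tokens


target = [0.0] * 142  # 71 reflection + 71 transmission values (placeholder)
designs = [design(target, dummy_next_token_probs, random.Random(seed))
           for seed in range(5)]
for d in designs:
    print(len(d), d[:3])
```

Running the loop with different seeds yields structures of different lengths and contents, which is the source of the diversity described above.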

Results / 结果

Visualization of structure tokens / 结构标记可视化

Before presenting the training and inverse design results, it is instructive to examine if the proposed structure tokens can capture the material and thickness information. We use the t-distributed stochastic neighbor embedding (t-SNE) to visualize their hidden embeddings by reducing the high-dimensional embeddings to 2-D.
在展示训练和逆向设计结果之前，检验所提出的结构标记是否能够捕获材料和厚度信息是有指导意义的。我们使用t分布随机邻域嵌入（t-SNE）通过将高维嵌入降维到2维来可视化它们的隐藏嵌入。

We visualize the result of dimension reduction in Fig. 3(b). Several interesting features are immediately observed. First, the physical structures (colored traces consisting of individual dots representing the structure token) and optical spectra responses (encircled cluster of green crosses) are well separated in this 2-D representation, even though they were fed into training on an equal footing. This demonstrates that our model has learned to distinguish the attributes of material structures and optical spectra while mapping them into the same hidden representation space.
我们在图3(b)中可视化降维结果。几个有趣的特征立即被观察到。首先，物理结构（由表示结构标记的单个彩色点组成的轨迹）和光谱响应（绿色十字的聚类）在这种2维表示中很好地分离，尽管它们在训练中被平等对待。这证明了我们的模型已经学会区分材料结构和光谱的属性，同时将它们映射到相同的隐藏表示空间。

Second, the 900 structure tokens are easily distinguishable, either as colored curves (the starting and ending points correspond to thicknesses of 500 nm and 10 nm, respectively) or as clusters of dots, with no overlap between different materials. Upon close examination, it is clear that our model has intelligently separated the low refractive index dielectrics from the high refractive index dielectrics. Within these two groups, all curves converge to the center region representing the lowest thickness of 10 nm. This is anticipated from optical physics: when the dielectric layer thickness is reduced to the minimum, all materials behave similarly because they contribute negligible optical phase change or optical absorption. In other words, our model has learned the fact that thin dielectric layers of different materials all have a similar effect on light propagation in multilayer thin films. Equally interesting is that all the metals cluster into their own territories in this 2-D map. This can be understood because once the metal layer thickness is greater than the optical penetration depth, its contribution to the optical response has little dependence on the thickness. These observations demonstrate that even though our model does not directly take in any refractive index or thickness values, it can capture this information and learn hidden representations from a large dataset, validating the usage of structure serialization and spectrum embedding.
其次，900个结构标记很容易区分，无论是作为彩色曲线（起点和终点分别对应500 nm和10 nm的厚度）还是点簇，不同材料之间没有重叠。仔细观察可以清楚地看到，我们的模型已经智能地将低折射率电介质与高折射率电介质分开。在这两组中，所有曲线都收敛到代表最小厚度10 nm的中心区域。根据光学物理这是预料之中的：当电介质层厚度减到最小时，所有材料的表现都类似，因为它们对光学相位变化或光吸收的贡献可以忽略不计。换句话说，我们的模型已经学到不同材料的薄电介质层对多层薄膜中光传播都有相似效果这一事实。同样有趣的是，所有金属在这种2维地图中都聚集到它们自己的区域。这可以理解，因为当金属层厚度大于光穿透深度时，其对光学响应的贡献与厚度关系很小。这些观察结果表明，即使我们的模型不直接接收任何折射率或厚度信息，它也能够从大规模数据集中捕获这些信息并学习隐藏表示，验证了结构序列化和光谱嵌入的使用。

Inverse design performance / 逆向设计性能

Now we will examine our model’s inverse design performance in different application situations. We want to mention that in this section, our model will be fixed and all these design tasks can be finished instantaneously by feeding different inputs of target optical response into our model. However, in case higher accuracy is required, we run a thickness finetuning to improve the performance because the 10 nm discretization of thickness may lead to sub-optimal performance for certain materials.
现在我们将检验模型在不同应用场景下的逆向设计性能。我们要提到的是，在本节中，我们的模型将被固定，所有这些设计任务都可以通过向我们的模型输入不同的目标光学响应瞬时完成。然而，如果需要更高的精度，我们运行厚度微调来提高性能，因为10 nm的厚度离散化可能导致某些材料的次优性能。

Performance on the validation dataset / 验证集性能

Here, we evaluate the averaged inverse design performance on 1000 spectra targets randomly selected from the validation dataset. Based on the multilayer design output from our model, we simulate their corresponding spectrum using TMM and calculate the Mean Absolute Error (MAE) between the input spectrum and the simulated spectrum to quantify the design accuracy. The closest spectrum with smallest MAE in the training dataset is treated as the design baseline, i.e., the best spectrum we could get by simply referring to the training dataset. A good machine learning model should be able to learn from and outperform the training dataset.
在这里，我们评估从验证集中随机选择的1000个光谱目标上的平均逆向设计性能。基于模型输出的多层设计，我们使用TMM模拟它们相应的光谱，并计算输入光谱与模拟光谱之间的平均绝对误差（MAE）以量化设计精度。训练集中MAE最小的最接近光谱被视为设计基线，即仅通过参考训练数据集我们能得到的最佳光谱。一个好的机器学习模型应该能够从训练数据集中学习并超越它。
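
The accuracy metric and the training-set baseline described above amount to the following. This is a minimal sketch with toy three-point spectra invented for illustration; in the paper each spectrum covers the full wavelength grid and the search runs over the 10 million training samples.

```python
def mean_absolute_error(target, simulated):
    """MAE between two spectra sampled on the same wavelength grid."""
    if len(target) != len(simulated):
        raise ValueError("spectra must have the same length")
    return sum(abs(t - s) for t, s in zip(target, simulated)) / len(target)


def closest_in_training_set(target, training_spectra):
    """Baseline: index and MAE of the training spectrum nearest the target."""
    errors = [mean_absolute_error(target, s) for s in training_spectra]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


# Toy data, made up for this example.
target = [0.1, 0.5, 0.9]
training_spectra = [[0.0, 0.5, 1.0], [0.2, 0.2, 0.2]]
best_index, baseline_mae = closest_in_training_set(target, training_spectra)
print(best_index, round(baseline_mae, 4))
```

A designed structure beats the baseline when the MAE of its simulated spectrum is lower than `baseline_mae` for the same target.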

In Fig. 4(a), we compare the MAEs of the closest structures in the training dataset (orange dots), designed structures (blue dots), and finetuned structures (red dots). On average, the MAE of the designed structures is 0.0258, which is lower than the MAE of the closest structures (0.0296) in the training set; finetuning the thickness can further reduce the MAE to 0.0192 (~24% reduction).
在图4(a)中，我们比较了训练集中最接近结构（橙色点）、设计结构（蓝色点）和微调结构（红色点）的MAE。平均而言，设计结构的MAE为0.0258，低于训练集中最接近结构的MAE（0.0296）；微调厚度可以进一步将MAE降低到0.0192（~24%减少）。

In Fig. 4(b), we compare the number of layers in the target structure vs. the number of layers in the designed structure. The all-zero upper triangle of this matrix implies that our model learns to solve design tasks using a simplified structure with fewer layers (~6 layers on average), which can facilitate the fabrication process as structures with fewer layers are easier to make.
在图4(b)中，我们比较了目标结构中的层数与设计结构中的层数。矩阵的上三角部分全为零，这意味着我们的模型学会使用具有更少层数的简化结构（平均约6层）来解决设计任务，这可以促进制造过程，因为层数更少的结构更容易制作。

Finally, we record the time consumption in Fig. 4(c). On average, our model completes each design within 0.1 s, which is comparable to running a TMM simulation.
最后，我们在图4(c)中记录了时间消耗。平均而言，我们的模型在0.1秒内完成每次设计，与运行TMM仿真的速度相当。

Spectrum filter / 光谱滤波器

Now we will evaluate our model on practical inverse design tasks. One such application is the spectrum filter, which is used to selectively reflect or transmit specific bands of light. Many deep learning-based methods have been proposed to inverse design these filters. Here, several examples are tested: a band-notch filter at 550 nm, a band-notch filter at 700 nm, high reflection in NIR, double high reflection in 500–600 nm and 800–1000 nm, etc. We set the input to be the perfect rectangular spectrum, which has 0% transmission in the desired region and 100% transmission in the remaining region. For all these artificial spectrum design targets, our model can output designs that outperform the training dataset. Thickness finetuning can further improve the accuracy.
现在我们将在实际逆向设计任务上评估我们的模型。其中一个应用是光谱滤波器，用于选择性反射或透射特定波段的光。已经提出了许多基于深度学习的方法来逆向设计这些滤波器。在这里，测试了几个示例：550 nm处的带阻滤波器、700 nm处的带阻滤波器、NIR中的高反射、500-600 nm和800-1000 nm的双高反射等。我们将输入设置为完美矩形光谱，在期望区域具有0%透射，在其余区域具有100%透射。在所有这些人造光谱设计目标中，我们的模型可以输出超越训练数据集的设计。厚度微调可以进一步提高精度。
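
A rectangular filter target of this kind can be built directly on the 400 to 1100 nm grid. The sketch below is an illustration under two assumptions that the text does not state: the reflection target is taken as one minus the transmission (an ideal lossless filter), and the model input is the reflection values followed by the transmission values.

```python
WAVELENGTHS_NM = list(range(400, 1101, 10))  # 71 points, 10 nm step


def rectangular_target(stop_bands):
    """Build an ideal filter target: T = 0 inside stop bands, T = 1 elsewhere.

    stop_bands: list of (low_nm, high_nm) intervals, bounds inclusive.
    Returns reflection values followed by transmission values.
    """
    transmission = [
        0.0 if any(low <= w <= high for low, high in stop_bands) else 1.0
        for w in WAVELENGTHS_NM
    ]
    reflection = [1.0 - t for t in transmission]  # assumption: no absorption
    return reflection + transmission


# The "double high reflection" example: 500-600 nm and 800-1000 nm.
double_band = rectangular_target([(500, 600), (800, 1000)])
print(len(double_band))                       # 71 reflection + 71 transmission
print(sum(double_band[len(WAVELENGTHS_NM):])) # number of fully transmitting points
```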

Absorber / 吸收体

Perfect absorbers have been widely used in photovoltaics, radiative cooling, detecting and solar-thermal harvesting, etc. Although our model is trained on reflection and transmission spectrum, it also demonstrates good performance for perfect absorbers. This is done by simply setting the input spectrum as zero for both reflection and transmission. Apart from perfect absorbers, our model can also design for arbitrary absorption. Since energy conservation guarantees that reflection + transmission + absorption = 1, we can tailor the input spectrum by setting reflection to be one minus absorption and setting transmission to be zero.
完美吸收体已广泛应用于光伏、辐射制冷、探测和太阳能热收集等领域。虽然我们的模型是在反射和透射光谱上训练的，但它对完美吸收体也表现出良好的性能。这是通过简单地将反射和透射的输入光谱都设置为零来实现的。除了完美吸收体外，我们的模型还可以设计任意吸收。由于能量守恒保证反射+透射+吸收=1，我们可以通过将反射设置为1减去吸收并将透射设置为零来调整输入光谱。
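
The energy-conservation trick for absorption targets takes only a few lines. In this sketch the function name and the ordering of the output (reflection values first, then transmission) are assumptions made for illustration.

```python
def absorption_target(absorption):
    """Convert a desired absorption spectrum into a (reflection, transmission)
    target using R + T + A = 1 with T fixed to zero."""
    if any(not 0.0 <= a <= 1.0 for a in absorption):
        raise ValueError("absorption values must lie in [0, 1]")
    reflection = [1.0 - a for a in absorption]
    transmission = [0.0] * len(absorption)
    return reflection + transmission


n_points = 71  # 400-1100 nm in 10 nm steps
perfect_absorber = absorption_target([1.0] * n_points)
print(set(perfect_absorber))   # every reflection and transmission value is zero

# Arbitrary absorption: strong in the visible (first 31 points, 400-700 nm),
# weak beyond. The 0.9 / 0.1 levels are illustrative values.
partial = absorption_target([0.9] * 31 + [0.1] * 40)
```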

Structural color / 结构色

Compared to dyes and chemical pigments, structural colors exhibit unique advantages in resolution, stability and sustainability and have been widely used in color printing, information encryption, sensors, etc. Usually, colors can be represented by a three-dimensional color coordinate, e.g., Lab, RGB or xyY values. In order to make our model well suited to this application, here, we propose an algorithm that can convert a color coordinate into a continuous spectrum in a generalized way. These converted spectra can be pre-calculated and will not impact the design process.
与染料和化学颜料相比，结构色在高分辨率、稳定性和可持续性方面表现出独特优势，已广泛应用于彩色打印、信息加密、传感器等领域。通常，颜色可以用三维颜色坐标表示，例如Lab、RGB或xyY值。为了使我们的模型很好地适用于这一应用，在这里，我们提出了一种能够以通用方式将颜色坐标转换为连续光谱的算法。这些转换后的光谱可以预先计算，不会影响设计过程。

Design flexibility / 设计灵活性

Design flexibility adds extra freedom to the design process because researchers can impose restrictions on the material selection and thickness range for any specific layer to meet the fabrication or design needs. We propose and apply a fast but still generalized method of “probability resampling” to impose restrictions in the design process. As illustrated in Fig. 6(a), this is done by removing the candidates that do not satisfy the constraints from probability sampling. Since this method is independent of spectra targets, it can be used to design for any application.
设计灵活性为设计过程增加了额外的自由度，因为研究人员可以对任何特定层的材料选择和厚度范围施加限制，以满足制造或设计需求。我们提出并应用了一种快速但仍通用的"概率重采样"方法，在设计过程中施加限制。如图6(a)所示，这是通过从概率采样中移除不满足约束的结构来完成的。由于这种方法与光谱目标无关，它可以用于任何应用的设计。

As an example, we use our model to inverse design an FP resonator. Here, the target spectrum has a resonance absorption at 610 nm and corresponds to a three-layer 20 nm Ag/150 nm SiO2/50 nm Ag resonator on a glass substrate. We consider adding four different constraints separately: 1) Fix the first layer to be 100 nm SiO2; 2) Remove Ag in the third layer; 3) Limit the thickness of the first layer within [10, 150] nm range and remove Ag/Al in the first layer; 4) Specify the material arrangement to be a three-layer Ag/Si3N4/Ag structure and design the thickness only. The design results demonstrate that our model can finish designs that satisfy desired constraints while still guaranteeing spectrum performance.
作为示例，我们使用我们的模型来逆向设计FP谐振器。这里，目标光谱在610 nm处有共振吸收，对应于玻璃基底上的三层20 nm Ag/150 nm SiO2/50 nm Ag谐振器。我们分别考虑添加四种不同的约束：1）将第一层固定为100 nm SiO2；2）在第三层中移除Ag；3）将第一层厚度限制在[10, 150] nm范围内并在第一层中移除Ag/Al；4）指定材料排列为三层Ag/Si3N4/Ag结构并仅设计厚度。设计结果表明，我们的模型能够完成满足期望约束的设计，同时仍能保证光谱性能。
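
One plausible reading of probability resampling is to zero out the probability of every token that violates the constraint, renormalize, and then sample as usual. The sketch below implements that reading for constraint 3 (first layer within [10, 150] nm, no Ag or Al). The four-material vocabulary and the uniform "model output" are stand-ins, and whether the authors renormalize in exactly this way is an assumption.

```python
import random

MATERIALS = ["Ag", "Al", "SiO2", "Si3N4"]   # small illustrative subset
THICKNESSES_NM = range(10, 501, 10)
EOS = "EoS"
VOCAB = [(m, t) for m in MATERIALS for t in THICKNESSES_NM] + [EOS]


def resample(probs, allowed):
    """Zero out disallowed tokens and renormalize the remaining probability."""
    masked = [p if allowed(token) else 0.0 for p, token in zip(probs, VOCAB)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("constraints exclude every token")
    return [p / total for p in masked]


def first_layer_constraint(token):
    """Constraint 3: thickness in [10, 150] nm and no Ag/Al in layer one."""
    if token == EOS:
        return False                 # assumption: the first layer must exist
    material, thickness = token
    return material not in ("Ag", "Al") and 10 <= thickness <= 150


model_probs = [1.0 / len(VOCAB)] * len(VOCAB)   # stand-in for the model output
constrained = resample(model_probs, first_layer_constraint)
first_layer = random.Random(0).choices(VOCAB, weights=constrained, k=1)[0]
print(first_layer)
```

Because the mask only touches the output distribution, the same trained model serves every constraint, which matches the claim that no retraining is needed.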

Generalization ability / 泛化能力

Although our model is trained on normal incident spectrum, its strong generalization ability enables the design towards different angles and polarization states, expanding allowable applications significantly. This is achieved through finetuning our entire model on a small dataset. We further propose the idea of “mixed sampling” to design structures that satisfy multiple requirements simultaneously.
虽然我们的模型是在正入射光谱上训练的，但其强大的泛化能力使其能够设计不同角度和偏振态，显著扩展了允许的应用。这是通过在小数据集上微调我们的整个模型来实现的。我们进一步提出了"混合采样"的思想，以设计同时满足多个要求的结构。

Finetuning / 微调

Starting with the OptoGPT model trained on a 10 M dataset, we can finetune this model on a smaller dataset to suit light incidence at different angles and polarization states. For example, in order to design for an s-polarized spectrum under a 20° incident angle, we first prepare a small 1 M dataset with such spectra and then update the entire model for 10 epochs. This only requires 1% of the computing resources compared with training the entire model from scratch.
从在1000万数据集上训练的OptoGPT模型开始，我们可以在较小的数据集上微调该模型，以适应不同角度和偏振态的光入射。例如，为了设计20°入射角下的s偏振光谱，我们首先准备一个具有这种光谱的100万小数据集，然后通过10个epoch更新整个模型。与从头训练整个模型相比，这只需要1%的计算资源。

Mixed sampling / 混合采样

Instead of designing the spectrum for a specific angle/polarization, in some situations we hope the designed structure can simultaneously realize multiple spectra under different incident angles/polarizations, which has not been extensively explored. Benefiting from our model’s probability output, we can simply add up these outputs from multiple models that are specific to each situation, and then do a probability sampling based on this mixed output. This is called “mixed sampling”.
与为特定角度/偏振设计光谱不同，在某些情况下，我们希望设计的结构能够在不同入射角/偏振下同时实现多个光谱，这一点尚未得到广泛探索。得益于我们模型的概率输出，我们可以简单地将来自针对每种情况的多个模型的这些输出相加，然后基于这个混合输出进行概率采样。这被称为"混合采样"。
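
Mixed sampling can be sketched as an element-wise sum of the per-condition distributions followed by ordinary sampling. The three-token distributions below are invented for illustration, and the explicit renormalization is an added convenience: the text only says the outputs are added up.

```python
import random


def mix(distributions):
    """Add up token distributions from several models and renormalize."""
    summed = [sum(column) for column in zip(*distributions)]
    total = sum(summed)
    return [p / total for p in summed]


# Hypothetical next-token outputs of two finetuned models over 3 tokens,
# e.g. one for s-polarization and one for p-polarization at the same angle.
p_model_s = [0.7, 0.2, 0.1]
p_model_p = [0.1, 0.2, 0.7]

mixed = mix([p_model_s, p_model_p])
token = random.Random(0).choices(range(len(mixed)), weights=mixed, k=1)[0]
print([round(p, 3) for p in mixed], token)
```

A plain sum keeps any token that at least one model rates highly, so a candidate favored by one condition and nearly ruled out by the other can still be drawn; the sampled structure then has to be checked against every target.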

Discussion and conclusion / 讨论与结论

By converting the multilayer structure into a sequence using structure tokens and structure serialization, we propose OptoGPT to effectively deal with the non-trivial inverse design problem in multilayer structures. Combined with the proposed techniques, our model can unify the inverse design for different types of input targets under different incident angles/polarizations, be versatile to different types of structures, and facilitate the fabrication process by providing diversity and flexibility.
通过使用结构标记和结构序列化将多层结构转换为序列，我们提出了OptoGPT来有效处理多层结构中复杂的逆向设计问题。结合许多提出的技术，我们的模型可以统一不同入射角/偏振下不同类型输入目标的逆向设计，对不同类型的结构具有通用性，并通过提供多样性和灵活性来促进制造过程。

The interesting findings of the hidden representations of OptoGPT suggest that it has acquired domain-specific knowledge pertaining to optical multilayer structures through the training process. Furthermore, the model has demonstrated the capacity to apply this acquired knowledge effectively in the inverse design process. However, the current framework still lacks explainability and does not allow users to directly understand the physical principles involved in its designs.
关于OptoGPT隐藏表示的有趣发现表明，它通过训练过程获得了与光学多层结构相关的领域特定知识。此外，该模型展示了在逆向设计过程中有效应用所获得知识的能力。然而，当前框架仍然缺乏可解释性，不允许用户直接理解其设计中涉及的物理原理。

In addition, our model can be expanded towards higher-dimensional, more complicated photonic structures, e.g., 2D metasurfaces or 3D waveguides, using a tokenization method similar to that in Vision Transformer. However, one limitation is that our model requires a large dataset for training, which is also a common criticism of many GPT models. For example, ChatGPT is trained on billions of tokens using ~10000 GPUs, costing ~ $10M for a single training. In this work, because of the constraint on computation resources, we need to simplify our design problems, including using limited types of materials, limited spectrum range, thickness discretization as well as the maximum number of layers that can be designed, all of which can be extended with more computation resources.
此外，使用类似的方法，我们的模型可以使用Vision Transformer中类似的标记化方法扩展到高维复杂光子结构，例如二维超表面或三维波导。然而，一个限制是我们的模型需要大规模数据集进行训练，这也是许多GPT模型的常见批评。例如，ChatGPT使用约10000个GPU在数十亿个标记上进行训练，单次训练成本约1000万美元。在本工作中，由于计算资源的限制，我们需要简化我们的设计问题，包括使用有限类型的材料、有限的光谱范围、厚度离散化以及可以设计的最大层数，所有这些都可以通过更多的计算资源来扩展。

Despite using a large-scale dataset with 10 million samples for training, it is important to recognize that this dataset only covers a small fraction (∼ 10^-52) of the expansive and complex design space associated with optical multilayer thin film structures. Due to this limitation of its training dataset, OptoGPT may fail to find a design that lies outside the boundaries of the sampled design space. Close collaboration across multiple research groups is needed to obtain a better model for a more general and better photonic inverse design that expands to more complicated structures.
尽管使用了1000万个样本的大规模数据集进行训练，但重要的是要认识到该数据集仅覆盖了与光学多层薄膜结构相关的广阔而复杂的设计空间的一小部分（∼ 10^-52）。由于其训练数据集的这一限制，OptoGPT可能无法找到位于采样设计空间边界之外的设计。需要多个研究小组之间的密切合作，以获得更好的模型，实现更通用、更好的光子逆向设计，扩展到更复杂的结构。

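III. Language Learning and Writing Notes

1. Advanced General Vocabulary and High-Frequency IELTS Words

Listed in order of appearance in the original text:

Word	IPA	Meaning in context (Chinese)	Part of speech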
`multilayer`	/ˌmʌltiˈleɪər/	多层的	adj.
`numerous`	/ˈnjuːmərəs/	众多的，许多的	adj.
`drawback`	/ˈdrɔːbæk/	缺点，不利条件	n.
`simultaneously`	/ˌsɪmlˈteɪniəsli/	同时地	adv.
`vital`	/ˈvaɪtl/	至关重要的	adj.
`photovoltaic`	/ˌfəʊtəʊvɒlˈteɪɪk/	光伏的	adj.
`iterative`	/ˈɪtərətɪv/	迭代的	adj.
`versatile`	/ˈvɜːsətaɪl/	多功能的，通用的	adj.
`polarization`	/ˌpəʊləraɪˈzeɪʃn/	偏振，极化	n.
`fabrication`	/ˌfæbrɪˈkeɪʃn/	制造，加工	n.
`impose`	/ɪmˈpəʊz/	施加，强加	v.
`arbitrarily`	/ˌɑːbɪˈtrerəli/	任意地	adv.
`potentially`	/pəˈtenʃəli/	潜在地	adv.
`unify`	/ˈjuːnɪfaɪ/	统一，使一致	v.
`concatenate`	/kɒnˈkætɪneɪt/	连接，串联	v.
`aforementioned`	/əˈfɔːmenʃnd/	前述的，上述的	adj.
`discretize`	/dɪˈskriːtaɪz/	离散化	v.
`negligible`	/ˈneɡlɪdʒəbl/	可忽略的，微不足道的	adj.
`penetration`	/ˌpenɪˈtreɪʃn/	穿透，渗透	n.
`empirical`	/ɪmˈpɪrɪkl/	经验的，实证的	adj.
`instantaneously`	/ˌɪnstənˈteɪniəsli/	瞬时地	adv.
`correspond`	/ˌkɒrəˈspɒnd/	对应，符合	v.
`quantify`	/ˈkwɒntɪfaɪ/	量化	v.
`baseline`	/ˈbeɪslaɪn/	基线，基准	n.
`facilitate`	/fəˈsɪlɪteɪt/	促进，使便利	v.
`illustrate`	/ˈɪləstreɪt/	说明，阐明	v.
`arbitrary`	/ˈɑːbɪtrəri/	任意的	adj.
`conservation`	/ˌkɒnsəˈveɪʃn/	守恒，保护	n.
`robust`	/rəʊˈbʌst/	稳健的，鲁棒的	adj.
`pretrained`	/ˌpriːˈtreɪnd/	预训练的	adj.
`constraint`	/kənˈstreɪnt/	约束，限制	n.
`designate`	/ˈdezɪɡneɪt/	指定，标明	v.
`scenario`	/səˈnɑːriəʊ/	场景，情景	n.
`collaboration`	/kəˌlæbəˈreɪʃn/	合作，协作	n.
`expansive`	/ɪkˈspænsɪv/	广阔的，扩展的	adj.

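2. [Core Technical Terms]

English term	Standard Chinese translation	Brief explanation
`inverse design`	逆向设计	Designing by working backward from a target optical response to the physical structure parameters; the reverse of conventional forward simulation.
`multilayer thin film structure`	多层薄膜结构	A layered optical structure formed by alternately stacking layers of different materials; each layer is typically nanometers to micrometers thick.
`transfer matrix method (TMM)`	转移矩阵法	A numerical method for computing the optical response of a multilayer film; it multiplies one matrix per layer to propagate the electromagnetic field through the stack.
`structure token`	结构标记	A concept introduced in this paper: a material name and a thickness value encoded together as one discrete token (e.g., "TiO2_200" for a 200 nm TiO2 layer).
`structure serialization`	结构序列化	Converting a multilayer structure into a sequence of tokens, so that structures of varying length share one representation.
`decoder-only transformer`	仅解码器Transformer	The GPT-style model architecture, which generates an output sequence autoregressively using self-attention.
`self-attention`	自注意力	The core Transformer mechanism; it lets the model learn dependencies between positions in a sequence.
`cross-attention`	交叉注意力	An attention mechanism that links different modalities (here, spectrum and structure) to enable conditional generation.
`autoregressive generation`	自回归生成	Generating a sequence one element at a time, with each new element conditioned on those already generated.
`probability sampling`	概率采样	Drawing discrete tokens at random from the probability distribution the model outputs.
`finetuning`	微调	Further training a pretrained model on a small task-specific dataset to adapt it to a new setting.
`distributed Bragg reflector (DBR)`	分布布拉格反射器	A multilayer of alternating high- and low-index materials that reflects strongly at specific wavelengths.
`Fabry-Pérot (FP) resonator`	法布里-珀罗谐振器	An optical cavity formed by two parallel mirrors.
`t-SNE`	t分布随机邻域嵌入	A nonlinear dimensionality-reduction algorithm commonly used to visualize high-dimensional data.
`structural color`	结构色	Color produced by physical mechanisms such as scattering and interference in micro/nanostructures, as opposed to chemical pigments.
`refractive index`	折射率	The physical quantity describing how the speed of light changes in a medium; a key parameter in optical design.
`radiative cooling`	辐射制冷	Passive cooling achieved by radiating heat to outer space.
`photonic`	光子的，光子学	Relating to photonics, the field concerned with generating, manipulating, and detecting light.
`band-notch filter`	带阻滤波器	An optical device that blocks a specific wavelength range while transmitting the others.
`polarization state`	偏振态	The oscillation direction of an electromagnetic wave's electric-field vector (s-polarized, p-polarized, or unpolarized).
`penetration depth`	穿透深度	The distance light travels in an absorbing material before its intensity falls to 1/e of its value at the surface.
`foundation model`	基础模型	A general-purpose AI model pretrained on large-scale data and transferable to many downstream tasks.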
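
The `structure token`, `transfer matrix method (TMM)`, and `distributed Bragg reflector (DBR)` entries can be tied together in a small sketch: parse tokens of the "Material_thickness" form into layers, then multiply one characteristic matrix per layer to get the reflectance. This is a minimal illustration, not the paper's implementation. It assumes normal incidence, lossless layers, constant (dispersion-free) refractive indices for TiO2 and SiO2, air above and a glass-like substrate (n = 1.5) below; the function names and the 8-pair example stack are hypothetical.

```python
import numpy as np

# ASSUMPTION: constant, real refractive indices chosen for illustration.
# Real materials are dispersive and may absorb.
N_INDEX = {"TiO2": 2.4, "SiO2": 1.46}


def parse_tokens(tokens):
    """Split tokens such as 'TiO2_60' into (material, thickness_nm) pairs."""
    layers = []
    for tok in tokens:
        material, thickness = tok.rsplit("_", 1)
        layers.append((material, float(thickness)))
    return layers


def reflectance(tokens, wavelength_nm, n_in=1.0, n_sub=1.5):
    """Normal-incidence reflectance of a lossless stack.

    The first token is the layer facing the incident medium (index n_in);
    the last token sits on the substrate (index n_sub, assumed value).
    """
    m = np.eye(2, dtype=complex)
    for material, d in parse_tokens(tokens):
        n = N_INDEX[material]
        delta = 2 * np.pi * n * d / wavelength_nm  # phase thickness of the layer
        layer = np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                          [1j * n * np.sin(delta), np.cos(delta)]])
        m = m @ layer  # multiply in order from the incident side
    b, c = m @ np.array([1.0, n_sub])
    r = (n_in * b - c) / (n_in * b + c)  # amplitude reflection coefficient
    return float(abs(r) ** 2)


# Hypothetical DBR: 8 TiO2/SiO2 pairs, each layer close to a quarter-wave
# at 550 nm after rounding the thickness to a 10 nm grid.
dbr = ["TiO2_60", "SiO2_90"] * 8

for wl in (450, 550, 650, 900):
    print(wl, "nm ->", round(reflectance(dbr, wl), 4))
```

Because the stack is just a list of tokens, changing the number of layers or the material at any position only changes the list, which is the property that lets one sequence model cover structures of different lengths.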

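3. [Idiomatic Academic Expressions & Reusable Patterns/Phrases]

English expression	Chinese gloss	When to use it
`have been widely used in numerous applications`	已被广泛应用于众多领域	Stating, as background, how widespread a technique or method is
`seek to identify`	旨在确定/寻找	Stating a research goal or purpose
`critical to enable`	对实现…至关重要	Emphasizing the importance of a factor
`rely on... to minimize the difference between... and...`	依赖…来最小化…与…之间的差异	Describing the core mechanism of an optimization method
`have their own limitations, either from the perspective of... or...`	无论从…角度还是…角度都有其局限性	Analyzing the shortcomings of existing methods along several dimensions
`solve all these drawbacks and issues simultaneously`	同时解决所有这些缺陷和问题	Emphasizing how comprehensive a method is
`To do so, first... Next... Further...`	为此，首先…接下来…进一步…	Laying out the steps of a method in order
`on an equal footing`	在同一起跑线上，平等地	Describing factors that a model treats equally
`Upon close examination, it is clear that...`	仔细观察可以清楚地看到…	Introducing a closer analysis of results
`This is anticipated from...`	根据…这是预料之中的	Explaining an observation with theory
`In other words,...`	换句话说…	Restating the preceding point to make it easier to follow
`validating the usage of...`	验证了…的使用	Showing that results support a method
`serving as a foundation model for...`	作为…的基础模型	Describing a model's generality and foundational role
`outperform the training dataset`	超越训练数据集	Showing that a model generalizes rather than memorizes
`facilitate the fabrication process`	促进制造过程	Connecting a design to its practical application
`be comparable to...`	与…相当/可比	Comparing performance metrics
`with minimal adaptations`	只需最小调整	Emphasizing generality and ease of extension
`demonstrating that our model can finish designs that satisfy... while still guaranteeing...`	证明我们的模型能够完成满足…的设计，同时仍能保证…	Showing that several objectives are met at once
`This is achieved through...`	这是通过…实现的	Explaining the technique behind a capability
`significantly expand the allowable applications`	显著扩展允许的应用范围	Describing the new possibilities a method opens up
`can be extended with more computation resources`	可以通过更多计算资源扩展	Describing a method's scalability
`lies outside the boundaries of the sampled design space`	位于采样设计空间的边界之外	Describing the design-space limits of a model
`Close collaboration across multiple research groups is needed to...`	需要多个研究小组之间的密切合作来…	Pointing to the collaboration that future work requires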
