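lyd186568 posted on 2020-10-29 23:05:05

How do I read specific information from a TXT file and export it to Excel?

Last edited by lyd186568 on 2020-10-29 23:59

Hi everyone, I'm a complete Python beginner. I'd like to use Python to extract the data that follows certain keywords in a local TXT file (the file contains over 2,000 records in a fixed format) and export it to Excel. How should I go about this?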

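疾风怪盗 posted on 2020-10-29 23:28:36

Open and read the txt with open, then write to Excel in a loop with openpyxl. That's all there is to it.

Your question is too vague. Either write it yourself and ask again when you hit a problem, or upload the file so we can take a look.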

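lyd186568 posted on 2020-10-29 23:50:55

疾风怪盗 posted on 2020-10-29 23:28
Open and read the txt with open, then write to Excel in a loop with openpyxl. That's all there is to it.

Your question is too vague. Either write it yourself and ask again when you hit a problem, or ...

Hello, the file's fixed format is shown below. The uppercase letters at the start of each line are fixed field tags. I need to read the content that follows the tags I need (AU, EM, FU) and write it to an Excel file. I'm mainly not sure how to write the code. Thank you.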
PT J
AU Fu, L
   Engqvist, H
   Xia, W
AF Fu, Le
   Engqvist, Hakan
   Xia, Wei
TI Glass-Ceramics in Dentistry: A Review
SO MATERIALS
LA English
DT Review
DE glass-ceramics; dental prostheses; strength; translucency; strengthening
   mechanisms
ID X-RAY-DIFFRACTION; MECHANICAL-PROPERTIES; LI2O-AL2O3-SIO2 GLASS;
   FRACTURE-TOUGHNESS; FLEXURAL STRENGTH; NUCLEATING-AGENT;
   CRYSTALLIZATION; MICROSTRUCTURE; ZIRCONIA; DENSIFICATION
AB In this review, we first briefly introduce the general knowledge of glass-ceramics, including the discovery and development, the application, the microstructure, and the manufacturing of glass-ceramics. Second, the review presents a detailed description of glass-ceramics in dentistry. In this part, the history, property requirements, and manufacturing techniques of dental glass-ceramics are reviewed. The review provided a brief description of the most prevalent clinically used examples of dental glass-ceramics, namely, mica, leucite, and lithium disilicate glass-ceramics. In addition, we also introduce the newly developed ZrO2-SiO2 nanocrystalline glass-ceramics that show great potential as a new generation of dental glass-ceramics. Traditional strengthening mechanisms of glass-ceramics, including interlocking, ZrO2-reinforced, and thermal residual stress effects, are discussed. Finally, a perspective and outlook for future directions in developing new dental glass-ceramics is provided to offer inspiration to the dental materials community.
C1 Cent South Univ, Sch Mat Sci & Engn, Changsha 410083, Peoples R China.
    Uppsala Univ, Appl Mat Sci, Dept Engn Sci, S-75121 Uppsala, Sweden.
RP Fu, L (corresponding author), Cent South Univ, Sch Mat Sci & Engn, Changsha 410083, Peoples R China.; Xia, W (corresponding author), Uppsala Univ, Appl Mat Sci, Dept Engn Sci, S-75121 Uppsala, Sweden.
EM fule2019@csu.edu.cn; Hakan.Engqvist@angstrom.uu.se;
   wei.xia@angstrom.uu.se
FU Central South University
FX This research is funded by the start-up funding of Central South
   University.
NR 88
TC 2
Z9 2
U1 6
U2 7
PU MDPI
PI BASEL
PA ST ALBAN-ANLAGE 66, CH-4052 BASEL, SWITZERLAND
EI 1996-1944
J9 MATERIALS
JI Materials
PD MAR
PY 2020
VL 13
IS 5
AR 1049
DI 10.3390/ma13051049
PG 22
WC Materials Science, Multidisciplinary
SC Materials Science
GA LA6ML
UT WOS:000524060200027
PM 32110874
OA DOAJ Gold, Green Published
DA 2020-10-26
ER

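疾风怪盗 posted on 2020-10-30 00:09:37

Last edited by 疾风怪盗 on 2020-10-30 00:23

AU Fu, L
EM fule2019@csu.edu.cn; Hakan.Engqvist@angstrom.uu.se;
FU Central South University

So basically you only need to extract these three lines and write them out? The simplest way is something like this: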

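from openpyxl import load_workbook

# Read every line of the text file
with open('html.txt', 'r', encoding='utf-8') as f:
    data = f.readlines()

# Keep only the lines that start with one of the wanted tags
result = []
for line in data:
    if line[:2] in ['AU', 'EM', 'FU']:
        result.append(line.rstrip('\n'))
print(result)

# result.xlsx must already exist, otherwise load_workbook raises an error
wb = load_workbook('result.xlsx')
ws = wb.active
for j, line in enumerate(result):
    # Write one matched line per row in column A
    ws.cell(row=j + 1, column=1).value = line
wb.save('result.xlsx')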
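
The code above keeps only lines that begin with a tag, so the indented continuation lines in the sample (the second and third authors, the wrapped e-mail line) are not picked up. A minimal sketch of one way to collect them, assuming every record ends with an ER line and every continuation line starts with spaces, as in the sample. The names parse_records and WANTED are hypothetical, and joining authors with "; " is just one possible choice.

```python
from collections import defaultdict

WANTED = ("AU", "EM", "FU")  # tags named in the question


def parse_records(lines, wanted=WANTED):
    """Return one dict per record; a record ends at a line starting with 'ER'."""
    records = []
    current = defaultdict(list)  # tag -> list of text pieces for this record
    tag = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines between records
        if line.startswith("ER"):
            # End of record: join the pieces of each wanted tag.
            # Authors come one per line, so separate them with "; ";
            # other fields are wrapped text, so rejoin with a space.
            records.append(
                {t: ("; " if t == "AU" else " ").join(current[t]) for t in wanted}
            )
            current = defaultdict(list)
            tag = None
        elif line[:2].strip():
            # A new field: the first two characters are the tag
            tag = line[:2]
            current[tag].append(line[3:].strip())
        elif tag is not None:
            # Indented continuation line: belongs to the previous tag
            current[tag].append(line.strip())
    # A trailing record without an ER line is ignored in this sketch
    return records
```

To use it, open the file as in the code above and call parse_records(f); each returned dict can then be written as one spreadsheet row, with AU, EM and FU in separate columns.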

lyd186568 posted on 2020-10-30 00:38:04

疾风怪盗 posted on 2020-10-30 00:09
AU Fu, L
EM fule2019@csu.edu.cn; Hakan.Engqvist@angstrom.uu.se;
FU Central South University


How do I bring the local TXT file into the code? I uploaded a file to a server at http://www.lydmaterial.com/file/1.txt. Where should I put that address?

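疾风怪盗 posted on 2020-10-30 10:46:20

lyd186568 posted on 2020-10-30 00:38
How do I bring the local TXT file into the code? I uploaded a file to a server at http://www.lydmateria ...

Huh? Didn't you upload a zip archive? That's the file, right? Try running the code, replacing html.txt in it with the name of your own file.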

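lyd186568 posted on 2020-10-30 15:00:25

疾风怪盗 posted on 2020-10-30 10:46
Huh? Didn't you upload a zip archive? That's the file, right? Try running the code, replacing html.txt in it with the name of your own file.

Hello, it raises an error when I run it.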

疾风怪盗 posted on 2020-10-30 15:12:35

lyd186568 posted on 2020-10-30 15:00
Hello, it raises an error when I run it.

The error message is quite clear: you need to have a result.xlsx file first.
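
load_workbook only opens a file that already exists, which is why the script stops when result.xlsx is missing. A minimal sketch of one way around that, assuming openpyxl is installed: reuse the file if it is there, otherwise start from a new Workbook. The function name save_rows is hypothetical.

```python
import os


def save_rows(rows, path="result.xlsx"):
    """Append rows to the workbook at path, creating the file if it is missing."""
    # openpyxl is third-party (pip install openpyxl); imported here so the
    # function can be defined even where the package is not installed
    from openpyxl import Workbook, load_workbook

    if os.path.exists(path):
        wb = load_workbook(path)  # reuse the existing file
    else:
        wb = Workbook()           # new, empty workbook held in memory
    ws = wb.active
    for row in rows:
        ws.append(row)            # one list per spreadsheet row
    wb.save(path)                 # writes the file, creating it if needed
```

For example, save_rows([["AU", "EM", "FU"]]) would write a header row, and later calls would add further rows below it.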

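Stubborn posted on 2020-10-30 15:53:01

import re

path = r"C:\Users\Administrator\Desktop\1.txt"
# Read the whole file into one string
with open(path, "r", encoding="UTF-8") as f:
    txt = f.read()

# Capture the first line of each AU, EM and FU field; DOTALL lets .*? span the lines in between
result = re.findall(r"AU (.*?)\n.*?EM (.*?)\n.*?FU (.*?)\n", txt, flags=re.DOTALL)
for r in result:
    print(r)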
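
The pattern above captures only the first line of each field, and because it searches the whole file at once, a record that lacks EM or FU can cause a match to run on into the next record. A sketch of a per-record variant, assuming records are separated by a line containing only ER and continuation lines are indented with spaces, as in the sample. The names extract and TAGS are hypothetical.

```python
import re

TAGS = ("AU", "EM", "FU")


def extract(txt, tags=TAGS):
    """Return one tuple per record, with '' for a tag the record lacks."""
    rows = []
    # Split the text at lines that hold only "ER"
    for rec in re.split(r"^ER[ \t]*$", txt, flags=re.MULTILINE):
        row = []
        for tag in tags:
            # The tag line plus any indented continuation lines after it
            m = re.search(r"^" + tag + r" (.*(?:\n[ ]+.*)*)", rec, flags=re.MULTILINE)
            if m is None:
                row.append("")
                continue
            # Authors are one per line, so join them with "; ";
            # other fields are wrapped text, so rejoin with a space
            sep = "; " if tag == "AU" else " "
            row.append(re.sub(r"\s*\n\s*", sep, m.group(1)).strip())
        if any(row):
            # Chunks with none of the tags (for example a file trailer) are skipped
            rows.append(tuple(row))
    return rows
```

Calling extract(txt) on the string read above gives a list of (AU, EM, FU) tuples, one per record, which can be printed or written to Excel row by row.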

lyd186568 posted on 2020-10-30 17:11:00

疾风怪盗 posted on 2020-10-30 15:12
The error message is quite clear: you need to have a result.xlsx file first.

Thanks, that was indeed the problem.

lyd186568 posted on 2020-10-30 17:11:46

lyd186568 posted on 2020-10-30 17:12:30

Stubborn posted on 2020-10-30 15:53


Thank you for your reply.