How to scrape data from web pages: scraping Flash data

2023-12-06

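Plenty has already been written about scraping data from HTML pages, so today I want to talk about scraping data from Flash pages instead. There is relatively little material on this, mainly because Flash sites have become rare since HTML5 took off, but it is still worth trying as a technical exercise. This article walks through my whole process of scraping the data.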

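I use 房產透明網 (cd.funi.com) as the example. The site's per-unit price listing (一房一價, "one home, one price") is rendered through Flash, and the steps below show how to get at the underlying data.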

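Disclaimer: this article is for technical study and discussion only. The data is copyrighted by 成都透明網; individuals and companies must not use it for commercial or illegal purposes. If anything in this article is inappropriate, please contact me and I will remove it.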

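I picked one housing development and inspected it with the browser's built-in developer tools. The response body shows up there as garbled text.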

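This happens because the response is in application/x-amf format, which the browser cannot parse. The next step therefore needs the packet-capture tool Charles. The unpaid version shuts down after 30 minutes; I find 30 minutes is enough, so I have been putting up with it so far.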
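To make the "garbled text" concrete: an AMF response is a binary envelope rather than text. Below is a minimal sketch that reads the first two fields of such an envelope, assuming the standard AMF packet layout (a two-byte big-endian version followed by a two-byte header count). The class name AmfEnvelopePeek and the sample bytes are made up for illustration and are not taken from the site's real traffic.

```java
import java.nio.ByteBuffer;

final class AmfEnvelopePeek {
    /** Returns {version, headerCount} read from the first four bytes of an AMF packet. */
    static int[] peek(byte[] packet) {
        if (packet == null || packet.length < 4) {
            throw new IllegalArgumentException("an AMF packet needs at least 4 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(packet);   // ByteBuffer is big-endian by default
        int version = buffer.getShort() & 0xFFFF;      // 0 = AMF0, 3 = AMF3
        int headerCount = buffer.getShort() & 0xFFFF;  // number of packet headers that follow
        return new int[] {version, headerCount};
    }

    public static void main(String[] args) {
        // Hypothetical sample: version 3, zero headers, then the start of the body section.
        byte[] sample = {0x00, 0x03, 0x00, 0x00, 0x00, 0x01};
        int[] fields = peek(sample);
        System.out.println("version=" + fields[0] + ", headers=" + fields[1]);
    }
}
```

Bytes like these have no meaningful text representation, which is why a proxy that understands AMF is the more convenient way to read them.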

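1. First, open Charles.

2. Open the 一房一價 page on the site and click a building unit; the request and its data then appear in Charles.

I captured the more important parts of the request. The HOUSEITEMLIST at the end of the response is the data we need to process.

3. The capture tool now shows the full request, so the next step is to simulate the AMF request in Java.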

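<dependency>
    <groupId>org.apache.flex.blazeds</groupId>
    <artifactId>flex-messaging-core</artifactId>
    <version>4.7.2</version>
</dependency>
<dependency>
    <groupId>org.apache.flex.blazeds</groupId>
    <artifactId>flex-messaging-common</artifactId>
    <version>4.7.2</version>
</dependency>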

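Add these two packages as dependencies first. The request code follows. I have masked some parameters as ******; to test it, paste in the corresponding values from your own capture.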

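// Imports used by this snippet (the flex.messaging classes come from flex-messaging-core).
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import flex.messaging.io.ArrayCollection;
import flex.messaging.io.ClassAliasRegistry;
import flex.messaging.io.MessageDeserializer;
import flex.messaging.io.MessageIOConstants;
import flex.messaging.io.SerializationContext;
import flex.messaging.io.amf.ASObject;
import flex.messaging.io.amf.ActionContext;
import flex.messaging.io.amf.ActionMessage;
import flex.messaging.io.amf.AmfMessageDeserializer;
import flex.messaging.io.amf.AmfMessageSerializer;
import flex.messaging.io.amf.AmfTrace;
import flex.messaging.io.amf.MessageBody;
import flex.messaging.messages.AcknowledgeMessage;
import flex.messaging.messages.RemotingMessage;
import com.funi.frontend.dto.HouseTable;

public static void main(String[] args) {
    try {
        URL urlObject = new URL("http://cd.funi.com/messagebroker/amf");
        HttpURLConnection urlConnection = (HttpURLConnection) urlObject.openConnection();
        urlConnection.setDoOutput(true);
        urlConnection.setRequestProperty("Content-type", "application/x-amf;charset=gb2312");
        urlConnection.setRequestProperty("Host", "cd.funi.com");
        urlConnection.setRequestProperty("Origin", "http://user.funi.com");
        urlConnection.setRequestProperty("Referer", "http://user.funi.com/resource/swf/house/FundateClient_www.swf?communityId=DAZXiSEGhWZLhWIrVooMiDNjk4UzP3et1CztbkK1SZrXmBDQfGR%2BAFaCxnPg5MFf&t=20181131/[[DYNAMIC]]/1");
        urlConnection.setRequestProperty("Cookie", "pgv_pvi=9961606144; pgv_si=s9152640000; Hm_lvt_77be290eccb6ceb57b524a860b6faadc=1545658648,1545745229,1545917030,1546227366; Hm_lpvt_77be290eccb6ceb57b524a860b6faadc=1546227368");
        OutputStream outputStream = urlConnection.getOutputStream();
        SerializationContext serializationContext = new SerializationContext();
        ActionContext actionContext = new ActionContext();

        // Build the request message (version 0 = AMF0, 3 = AMF3)
        ActionMessage requestMessage = new ActionMessage();
        AmfTrace amfTrace = new AmfTrace();
        RemotingMessage remotingMessage = new RemotingMessage();
        remotingMessage.setOperation("***********************************");
        remotingMessage.setSource(null);
        remotingMessage.setClientId("FF66DFC9-B00D-2C39-E122-6B6752416543");
        remotingMessage.setDestination("dEEDOCService");
        remotingMessage.setMessageId("******************************");
        remotingMessage.setHeader("DSEndpoint", "my-amf");
        remotingMessage.setHeader("DSId", "*************************");
        remotingMessage.setTimeToLive(0);
        remotingMessage.setTimestamp(0);
        // The single body argument below is the encoded request parameter copied from the capture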

remotingMessage.setBody(new Object[]{"kezlmwCvdjGPckPbY1SmeL3frogB2sfc7IgjBssaFJ2ihf5M93DgMgf5mIqLiWgMNvNwBsVQKuDfTympu4bAjLV9/3mGEHK+MfNqVZKTY0xC3uGOkDg+i2Pt9oTDxBm1xU5Cvmjmd/9mXzN/v3UOvSoqKlLNYy42g8uGAq+JFczhHpdRi7LBtP56E8OJaGq4VksJJnPhGLtMLt1T3wZZKzcV4MqJ2U7NTg7q5AmyCC89nvetx/5Gop8mUBe0tHQdSop8mhHerHn+n7y5O1BL3sRS8T3e1B9F2txtWzcNX0NBzDgAMpfa3AJAhaZ7yuhwd5VtLYD+KquXCUmxJAd/YSjjZGAYYomWjZqRMfO5x5cP/SH8AeI4BiKbTQ+2UygOvYCiTAzy+8GNG0oKpTDCnP2/j2CFhISaMutwAFTF7CZw6HCzJq+2iA8sVnNmCePQMieuZOyq7LG0PppzHRkQYGpUzGynN4FJ8Dz7TBXmuKu7bWJ7jlrYdHbsexEGhoI2fEh/hivzSuCaBfWojChwMQOrtiYKG/YYEgtxNmEUYVdDH5XUiFHVH0V3W+O16fluHZUoaJdvZ+Fbm9oJIB2cz1X9hQSOcs3Cc7i95hhJ0SdQGa1yMw7c2vJSWzbTKuc6rnFm8IDmR6qm6sEIUHRokN56IsDqS+ZHaXWNoOG4q0xR97tFCPlrURWxLcJX3tIJ4xl/imVVlifcAZX4/gXkykAGpM7tdGOy0J/hegAZqCY="});

        MessageBody amfMessage = new MessageBody(null, "/3", new Object[]{remotingMessage});
        requestMessage.addBody(amfMessage);

        // Set up the AMF message serializer and write the request to the connection
        actionContext.setRequestMessage(requestMessage);
        ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
        AmfMessageSerializer amfMessageSerializer = new AmfMessageSerializer();
        amfMessageSerializer.initialize(serializationContext, outBuffer, amfTrace);
        amfMessageSerializer.writeMessage(requestMessage);
        outBuffer.writeTo(outputStream);
        outputStream.flush();
        outputStream.close();

        // Read the response and deserialize it as an AMF message
        InputStream inputStream = urlConnection.getInputStream();
        BufferedInputStream urlConnectionInputStream = new BufferedInputStream(inputStream);
        serializationContext = new SerializationContext();
        actionContext = new ActionContext();
        ActionMessage message = new ActionMessage();
        actionContext.setRequestMessage(message);
        // The HouseTable class must exist in your project, and its package name must match exactly
        ClassAliasRegistry.getRegistry().registerAlias("DSK", "com.funi.frontend.dto.HouseTable");
        MessageDeserializer deserializer = new AmfMessageDeserializer();
        deserializer.initialize(serializationContext, urlConnectionInputStream, amfTrace);
        deserializer.readMessage(message, actionContext);

        Object result = null;
        for (Object bodyObject : message.getBodies()) {
            MessageBody msg = (MessageBody) bodyObject;
            String targetURI = msg.getTargetURI();
            if (targetURI.endsWith(MessageIOConstants.RESULT_METHOD)) {
                result = msg.getData();
                AcknowledgeMessage acknowledgeMessage = (AcknowledgeMessage) result;
                Object body = acknowledgeMessage.getBody();
                ASObject asObject = (ASObject) body;
                ArrayCollection houseitemlist = (ArrayCollection) asObject.get("HOUSEITEMLIST");
                for (Object o : houseitemlist) {
                    HouseTable houseTable = (HouseTable) o;
                    // DecodeUtils is the author's own helper class; its source is not included in this article
                    System.out.println(DecodeUtils.decode(houseTable.getUnitNo()));
                    System.out.println(DecodeUtils.decode(houseTable.getUsage()));
                    System.out.println(DecodeUtils.decode(houseTable.getTotalArea()));
                }
            } else if (targetURI.endsWith(MessageIOConstants.STATUS_METHOD)) {
                result = "Server error";
            }
        }
    } catch (Exception e) {
        // Print the full stack trace instead of a bare "error" so failures can be diagnosed
        e.printStackTrace();
    }
}
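The response loop in the code above decides what to do by looking at how each body's target URI ends. As a rough illustration of that dispatch on its own, here is a small sketch using only the JDK. It assumes that BlazeDS's MessageIOConstants.RESULT_METHOD and STATUS_METHOD are the suffixes "/onResult" and "/onStatus"; the class AmfReplyKind and the sample URIs are hypothetical.

```java
final class AmfReplyKind {
    // Assumed to mirror MessageIOConstants.RESULT_METHOD and STATUS_METHOD in BlazeDS.
    static final String RESULT_METHOD = "/onResult";
    static final String STATUS_METHOD = "/onStatus";

    /** Classifies a response body by the suffix of its target URI. */
    static String classify(String targetUri) {
        if (targetUri == null) {
            return "unknown";
        }
        if (targetUri.endsWith(RESULT_METHOD)) {
            return "result";   // the call succeeded; the body carries the returned data
        }
        if (targetUri.endsWith(STATUS_METHOD)) {
            return "status";   // the server reported a fault
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // The request above used "/3" as its response URI, so a reply to it
        // would plausibly be addressed to "/3/onResult" or "/3/onStatus".
        System.out.println(classify("/3/onResult"));
        System.out.println(classify("/3/onStatus"));
    }
}
```

Treating anything that is neither suffix as "unknown" avoids silently ignoring a response shape the code was not written for.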

package com.funi.frontend.dto;

public class HouseTable {
    private Boolean isMortgage;
    private String status;
    private String roomNo;
    private String listWaterPrice;
    private String typeHouse;
    private String huxId;
    private String buildingNo;
    private String fitmentPrice;
    private String floorNo;
    private String listPrice;
    private Boolean isSealUp;
    private String usage;
    private String totalArea;
    private Object houseTableList;
    private Object phase;
    private String unitNo;
    private String buildingId;
    private String communityId;

    public Boolean getMortgage() { return isMortgage; }
    public void setMortgage(Boolean mortgage) { isMortgage = mortgage; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
    public String getRoomNo() { return roomNo; }
    public void setRoomNo(String roomNo) { this.roomNo = roomNo; }
    public String getListWaterPrice() { return listWaterPrice; }
    public void setListWaterPrice(String listWaterPrice) { this.listWaterPrice = listWaterPrice; }
    public String getTypeHouse() { return typeHouse; }
    public void setTypeHouse(String typeHouse) { this.typeHouse = typeHouse; }
    public String getHuxId() { return huxId; }
    public void setHuxId(String huxId) { this.huxId = huxId; }
    public String getBuildingNo() { return buildingNo; }
    public void setBuildingNo(String buildingNo) { this.buildingNo = buildingNo; }
    public String getFitmentPrice() { return fitmentPrice; }
    public void setFitmentPrice(String fitmentPrice) { this.fitmentPrice = fitmentPrice; }
    public String getFloorNo() { return floorNo; }
    public void setFloorNo(String floorNo) { this.floorNo = floorNo; }
    public String getListPrice() { return listPrice; }
    public void setListPrice(String listPrice) { this.listPrice = listPrice; }
    public Boolean getSealUp() { return isSealUp; }
    public void setSealUp(Boolean sealUp) { isSealUp = sealUp; }
    public String getUsage() { return usage; }
    public void setUsage(String usage) { this.usage = usage; }
    public String getTotalArea() { return totalArea; }
    public void setTotalArea(String totalArea) { this.totalArea = totalArea; }
    public Object getHouseTableList() { return houseTableList; }
    public void setHouseTableList(Object houseTableList) { this.houseTableList = houseTableList; }
    public Object getPhase() { return phase; }
    public void setPhase(Object phase) { this.phase = phase; }
    public String getUnitNo() { return unitNo; }
    public void setUnitNo(String unitNo) { this.unitNo = unitNo; }
    public String getBuildingId() { return buildingId; }
    public void setBuildingId(String buildingId) { this.buildingId = buildingId; }
    public String getCommunityId() { return communityId; }
    public void setCommunityId(String communityId) { this.communityId = communityId; }
}
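The registerAlias call in the request code ties the alias "DSK" to com.funi.frontend.dto.HouseTable by its fully qualified name, which is presumably why the class has to sit in exactly that package. The sketch below only illustrates that idea with plain JDK reflection; it is not how BlazeDS implements its registry, and AliasLookupSketch, its methods, and the class names passed to it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class AliasLookupSketch {
    private final Map<String, String> aliasToClassName = new HashMap<>();

    /** Remembers which fully qualified class name an alias stands for. */
    void registerAlias(String alias, String className) {
        aliasToClassName.put(alias, className);
    }

    /** Creates an instance of the class registered for the alias via its no-arg constructor. */
    Object newInstanceFor(String alias) {
        String className = aliasToClassName.get(alias);
        if (className == null) {
            throw new IllegalStateException("no class registered for alias " + alias);
        }
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // A wrong package or class name ends up here as a ClassNotFoundException.
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        AliasLookupSketch registry = new AliasLookupSketch();
        // StringBuilder stands in for HouseTable so the sketch runs without extra classes.
        registry.registerAlias("DSK", "java.lang.StringBuilder");
        System.out.println(registry.newInstanceFor("DSK").getClass().getName());
    }
}
```

In this sketch a name that does not resolve to a loadable class fails at lookup time, which is the kind of error a mismatched package name would be expected to cause.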

Finally, once you have the data, Base64-decode the field values to get readable text.
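The DecodeUtils helper called in the request code is not shown in this article. A minimal stand-in for the Base64 step might look like the following, using the JDK's java.util.Base64. The class name Base64FieldDecoder is hypothetical, and decoding the bytes as UTF-8 is an assumption: the site's real character set may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64FieldDecoder {
    /** Decodes a Base64 field value into text; returns null for a null input. */
    static String decode(String encoded) {
        if (encoded == null) {
            return null;
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        // Assumption: the decoded bytes are UTF-8 text.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "MTAx" is the Base64 encoding of the ASCII string "101".
        System.out.println(decode("MTAx"));
    }
}
```

If the decoded output looks wrong for Chinese text, the character set is the first thing to check.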

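If you enjoy Java development, you can add me on QQ at 3369245209; I plan to set up an advanced Java development group later. The next article will cover how to scrape data from mobile apps.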

Original link: https://hbdhgg.com/4/187560.html
