Using the SSIS ForEach Loop Container to Process Files in Date Order

2023-10-18

One positive outcome of a recent project, in which I rewrote one of the Data Marts in our Data Warehouse environment, was that it confirmed a suspicion about the behavior of the SQL Server Integration Services (SSIS) ForEach Loop Container. I have long suspected that its ForEach File Enumerator type does not process time-stamped text files in the order a human would consider correct. For instance, Figure 1 shows a list of text files containing data on the marital statuses of FIFA 2016 Ballon D’Or nominees.

The files created on the morning of June 30th (suffixed with “AM”) contain similar data, with Lionel Messi’s marital status set to Single, as shown in Figure 2.

Later that day, Lionel Messi got married, and as a result the “2PM” file contains a change to his marital status, as shown in Figure 3.

It is interesting to note that Windows Explorer sorts these text files by file name by default, which looks incorrect because the file suffixed with “2PM” is listed ahead of the “7AM” file. This means that if we loaded data from these files in that order into a Type 2 Marital Status dimension, the latest version of the data would come from the “7AM” file.
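
The mismatch between name order and time order can be reproduced outside SSIS. The following is a minimal sketch, not part of the original package: the file names and timestamps are hypothetical stand-ins (the article only tells us the files carry “7AM” and “2PM” suffixes), and it simply sorts the same two entries by name and by time.

```csharp
using System;
using System.Linq;

public static class NameVersusTimeOrder
{
    // Hypothetical names and timestamps, used for illustration only.
    public static readonly (string Name, DateTime Created)[] Files =
    {
        ("MaritalStatus_FIFABallonDOr_7AM.txt", new DateTime(2016, 6, 30, 7, 0, 0)),
        ("MaritalStatus_FIFABallonDOr_2PM.txt", new DateTime(2016, 6, 30, 14, 0, 0)),
    };

    // Name order compares characters, so '2' sorts ahead of '7'.
    public static string[] ByName() =>
        Files.OrderBy(f => f.Name, StringComparer.OrdinalIgnoreCase)
             .Select(f => f.Name)
             .ToArray();

    // Time order compares the timestamps themselves.
    public static string[] ByTime() =>
        Files.OrderBy(f => f.Created)
             .Select(f => f.Name)
             .ToArray();

    public static void Main()
    {
        Console.WriteLine("Name order: " + string.Join(", ", ByName()));
        Console.WriteLine("Time order: " + string.Join(", ", ByTime()));
    }
}
```

Sorted by name, the “2PM” entry comes first; sorted by time, the “7AM” entry does, which is the order a Type 2 dimension load needs.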

The correct order of processing these text files is to have them sorted by date modified as shown in Figure 4 wherein the “2PM” file will be the last one to be imported.
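
As a rough illustration of ordering by date modified, the sketch below lists matching files oldest-modified first. It is an assumption-laden example rather than the author's code: the class and method names are hypothetical, the default folder and pattern are borrowed from the Script Task shown later in the article, and the file name is added as a tie-breaker so that equal timestamps still give a predictable order.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ModifiedOrder
{
    // Returns the names of files matching the pattern, oldest modification first.
    // The name is a tie-breaker for files that share a timestamp.
    public static string[] ByLastWrite(string folder, string pattern) =>
        new DirectoryInfo(folder)
            .GetFiles(pattern)
            .OrderBy(f => f.LastWriteTimeUtc)
            .ThenBy(f => f.Name, StringComparer.OrdinalIgnoreCase)
            .Select(f => f.Name)
            .ToArray();

    public static void Main(string[] args)
    {
        // C:\temp is an assumed default; pass another folder as the first argument.
        string folder = args.Length > 0 ? args[0] : @"C:\temp";
        foreach (string name in ByLastWrite(folder, "Marit*"))
            Console.WriteLine(name);
    }
}
```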

Processing of Text Files by File Name

The screenshots in Figures 1 and 4 indicate that the file listing can appear differently depending on whether you sort by file name or by date modified. Whilst Windows Explorer can sort the listing in multiple ways, it looks like the ForEach File Enumerator type only processes text files in file name order, which, as indicated in Figure 1, is incorrect for our purposes. To demonstrate this, we make use of the sample SSIS package shown in Figure 5. The package begins by using an Execute SQL Task to clear the staging table. The next step uses a Data Flow Task inside a ForEach Loop Container that iteratively loads the text files.

As shown in Figure 6, the ForEach Loop Container is configured to use the ForEach File Enumerator type and processes files whose names match MaritalStatus_FIFA*.

Following the successful execution of the SSIS package shown in Figure 5, we can view all the data that was imported into the staging table, as shown in Figure 7. As predicted, the ForEach Loop Container using the ForEach File Enumerator type processed the files in file name order. This is incorrect because the latest record for Lionel Messi (at line 5 in Figure 7) is loaded ahead of the records from the 7AM file.

Processing of Text Files by File Creation Time

The danger of relying on the ForEach File Enumerator type is that we have no control over the order in which files are processed. We can get around this limitation in two ways:

  1. Renaming Text Files to Military Time

    The simplest way of getting your time-stamped text files processed in the correct order is to adopt a file naming convention that uses military (24-hour) time instead of the 12-hour clock. As can be seen in Figure 8, renaming the hour part of the file names to military time has resulted in the files being listed in the correct order in Windows Explorer.

    Figure 8: Text files with Military Time

    Following the SSIS package execution, the data in our staging table is updated as shown in Figure 9. As can be seen, the last file processed now carries the accurate marital status for Lionel Messi.

    Figure 9: Data in Staging Table in the correct order

    One significant limitation of the military time approach is that, as SSIS developers, we often have no control over how the text files are named. I recall several instances in which my SSIS solution processed files that were prepared and dumped to an FTP location by a legacy third-party program. In such instances, you are usually given read-only permission on the FTP location and are thereby prevented from renaming or editing the files.
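
    Where renaming is permitted, the military time convention described in this option could be applied with a small helper along the following lines. This is only a sketch under assumptions: the helper name is hypothetical, and it assumes the hour-clock stamp is the last underscore-separated part of the name (for example `_7AM` before the extension), which the article does not confirm.

    ```csharp
    using System;
    using System.Globalization;
    using System.Text.RegularExpressions;

    public static class MilitaryTimeNames
    {
        // Matches a trailing hour-clock stamp such as "_7AM" or "_12PM"
        // that sits directly before the file extension.
        private static readonly Regex Stamp =
            new Regex(@"_(\d{1,2})(AM|PM)(?=\.[^.]+$)", RegexOptions.IgnoreCase);

        // Rewrites the stamp as a four-digit 24-hour value, e.g. "_7AM" becomes "_0700".
        public static string ToMilitary(string fileName) =>
            Stamp.Replace(fileName, m =>
            {
                // 12AM maps to hour 0 and 12PM maps to hour 12.
                int hour = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture) % 12;
                if (m.Groups[2].Value.Equals("PM", StringComparison.OrdinalIgnoreCase))
                    hour += 12;
                return "_" + hour.ToString("00", CultureInfo.InvariantCulture) + "00";
            });

        public static void Main()
        {
            // Hypothetical names; the real files may be named differently.
            string[] names =
            {
                "MaritalStatus_FIFABallonDOr_7AM.txt",
                "MaritalStatus_FIFABallonDOr_2PM.txt",
            };
            foreach (string name in names)
                Console.WriteLine(name + " -> " + ToMilitary(name));
        }
    }
    ```

    Because the zero-padded stamp has a fixed width, sorting the renamed files by name gives the same result as sorting them by hour.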

  2. Processing Text Files using Foreach ADO Enumerator

    The recommended approach in terms of processing multiple time-stamped text files is using Foreach ADO Enumerator type instead of ForEach File Enumerator. The switch to Foreach ADO Enumerator type requires several changes to your SSIS package as shown in Figure 10. Again, the first step involves using the Execute SQL Task to clear staging tables.

    Figure 10: SSIS Package using ADO Enumerator

    I then use a Script Task (ST – Populate ListOfFiles) that uses methods from LINQ to sort text files by creation time and insert the output into a staging table. The main code of the Script Task is shown in Script 1.

    // Requires: using System.Data.SqlClient; using System.IO; using System.Linq;
    public void Main()
    {
        const string query = "INSERT INTO dbo.ListOfFiles (FileName) VALUES (@FileName)";

        // Sort the matching files by creation time, oldest first.
        var sorted = Directory.GetFiles(@"C:\temp", "Marit*")
                              .OrderBy(f => new FileInfo(f).CreationTime);

        using (var cnn = new SqlConnection(
            "Data Source=localhost;Initial Catalog=SQLShack;Integrated Security=SSPI;"))
        {
            cnn.Open();
            foreach (string file in sorted)
            {
                // Insert one row per file, in sorted order.
                using (var myCommand = new SqlCommand(query, cnn))
                {
                    myCommand.Parameters.AddWithValue("@FileName", Path.GetFileName(file));
                    myCommand.ExecuteNonQuery();
                }
            }
        }

        Dts.TaskResult = (int)ScriptResults.Success;
    }

    Script 1

    The Execute SQL Task (EST – Populate Object Variable) retrieves the list that was built by the Script Task and stores it in a package variable of type Object. As shown in Figure 11, the ForEach Loop Container is then configured to use the Foreach ADO Enumerator type and sources its data from that object variable, varObj.

    Figure 11: ForEach ADO Enumerator type

    The rest of the settings inside the ForEach loop container are similar to the package using the Foreach File Enumerator. Following the package execution, the data stored in the staging table will be similar to what is shown in Figure 9.
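
    To make the shape of the data behind the Foreach ADO Enumerator more concrete, the sketch below builds an in-memory `DataTable` of file names ordered by creation time and then walks its rows, much as the loop would. It is a variation under stated assumptions rather than the package described above: the class and method names are hypothetical, the table is built directly in code instead of being read back from dbo.ListOfFiles, and the commented assignment to `varObj` is untested.

    ```csharp
    using System;
    using System.Data;
    using System.IO;
    using System.Linq;

    public static class FileListTable
    {
        // Builds a one-column table of file names, oldest creation time first.
        // Rows keep their insertion order, so no ORDER BY is involved later.
        public static DataTable Build(string folder, string pattern)
        {
            var table = new DataTable("ListOfFiles");
            table.Columns.Add("FileName", typeof(string));

            var ordered = new DirectoryInfo(folder)
                .GetFiles(pattern)
                .OrderBy(f => f.CreationTimeUtc)
                .ThenBy(f => f.Name, StringComparer.OrdinalIgnoreCase);

            foreach (FileInfo file in ordered)
                table.Rows.Add(file.Name);

            return table;
        }

        public static void Main(string[] args)
        {
            // C:\temp is an assumed default; pass another folder as the first argument.
            string folder = args.Length > 0 ? args[0] : @"C:\temp";
            DataTable table = Build(folder, "Marit*");

            // Inside a Script Task the table could presumably be handed to the loop with
            // Dts.Variables["User::varObj"].Value = table; (assumption, not verified here).
            foreach (DataRow row in table.Rows)
                Console.WriteLine(row["FileName"]);
        }
    }
    ```

    One reason to consider this shape is that a table filled in sorted order keeps that order, whereas a SELECT against a staging table without an ORDER BY clause is not guaranteed to return rows in insertion order.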

Conclusion

If your SSIS solution does not process multiple text files using the Foreach File Enumerator type, then you are probably not affected by the issue that has been discussed. However, for those dealing with multiple text files, consider switching over to the Foreach ADO Enumerator type.

Downloads

  • SQLShackETL
  • MaritalStatus_FIFABallonDOr_HourClock
  • MaritalStatus_FIFABallonDOr_MilitaryTime

Translated from: https://www.sqlshack.com/using-ssis-foreach-loop-containers-process-files-date-order/
