Explore the datasets used in this cookbook#
These datasets contain financial data in text format and are used to fine-tune language models for specific financial understanding tasks.
Note: This is an exploratory data analysis notebook. We will not perform any data cleaning or preprocessing. This is just to understand the datasets and get some insights.
Import the Libraries#
Let’s import the necessary libraries to explore the datasets.
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports
import pandas as pd
from io import StringIO
from datasets import load_dataset
from tqdm.notebook import tqdm
!mkdir -p ../../data  # -p avoids the "File exists" error when the directory is already there
Financial-PhraseBank#
Homepage: Financial PhraseBank
A three-class (positive, negative, neutral) sentiment dataset of sentences from financial news. It contains about 4,840 English sentences, each labelled by 5-8 annotators, and is split into subsets by annotator agreement rate. We load the dataset from the Hugging Face (HF) repository.
Note: Sentiment analysis is a well-established research area, and Financial PhraseBank is only one dataset in this domain. If you would like to train the model on more datasets, see the following links: Financial News Dataset, Financial Tweets, Financial News Sentiment
# Load the financial sentiment dataset
fin_sentiment = load_dataset("financial_phrasebank", "sentences_50agree") # 4,846 rows with text and labels; each label was agreed on by at least 50% of the annotators.
# Check the training data
fin_sentiment_df = pd.DataFrame(fin_sentiment['train'])
# View the first few rows of the dataset
fin_sentiment_df.head()
 | sentence | label |
---|---|---|
0 | According to Gran , the company has no plans t... | 1 |
1 | Technopolis plans to develop in stages an area... | 1 |
2 | The international electronic industry company ... | 0 |
3 | With the new production plant the company woul... | 2 |
4 | According to the company 's updated strategy f... | 2 |
# Check the distribution of the labels
fin_sentiment_df['label'].value_counts()
label
1 2879
2 1363
0 604
Name: count, dtype: int64
# Check the normalised distribution of the labels
fin_sentiment_df['label'].value_counts(normalize=True)
label
1 0.594098
2 0.281263
0 0.124639
Name: proportion, dtype: float64
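The integer labels are easier to read once mapped to class names. The sketch below assumes the encoding used by the Hugging Face `financial_phrasebank` dataset (0 = negative, 1 = neutral, 2 = positive), which matches the sample rows above. `sample_df` is a tiny hypothetical stand-in for `fin_sentiment_df` so that the snippet runs on its own.

```python
import pandas as pd

# Assumed label encoding of the Hugging Face `financial_phrasebank` dataset.
LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}

# Hypothetical stand-in for fin_sentiment_df (same column names).
sample_df = pd.DataFrame({
    "sentence": ["profit fell", "no change planned", "sales rose", "outlook unchanged"],
    "label": [0, 1, 2, 1],
})

# Add a readable label column and recompute the normalised distribution.
sample_df["label_name"] = sample_df["label"].map(LABEL_NAMES)
label_share = sample_df["label_name"].value_counts(normalize=True)
print(label_share)
```

Applied to `fin_sentiment_df`, the same mapping would show that the neutral class accounts for roughly 59% of the rows, so the dataset is noticeably imbalanced.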
# If you want to save the dataset to a csv file
fin_sentiment_df.to_csv('../../data/financial_phrasebank_50agree.csv', index=False)
Financial Q&A Dataset#
Homepage: FinQA
A large-scale dataset for developing the ability to analyse business financials, reason numerically, and understand heterogeneous representations (text and tables).
!mkdir -p ../../data/finqa
!wget -O ../../data/finqa/train.json https://raw.githubusercontent.com/czyssrs/FinQA/main/dataset/train.json # Approximately 15MB
!wget -O ../../data/finqa/dev.json https://raw.githubusercontent.com/czyssrs/FinQA/main/dataset/dev.json # Approximately 2MB
!wget -O ../../data/finqa/test.json https://raw.githubusercontent.com/czyssrs/FinQA/main/dataset/test.json # Approximately 2MB
finqa_train = pd.read_json('../../data/finqa/train.json')
finqa_train.head()
 | pre_text | post_text | filename | table_ori | table | qa | id | table_retrieved | text_retrieved | table_retrieved_all | text_retrieved_all |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | [interest rate to a variable interest rate bas... | [fair value of forward exchange contracts afte... | ADI/2009/page_49.pdf | [[, October 31, 2009, November 1, 2008], [Fair... | [[, october 31 2009, november 1 2008], [fair v... | {'question': 'what is the the interest expense... | ADI/2009/page_49.pdf-1 | [{'score': -0.620767951011657, 'ind': 'table_1... | [{'score': 1.251369595527649, 'ind': 'text_1'}... | [{'score': -0.620767951011657, 'ind': 'table_1... | [{'score': 1.251369595527649, 'ind': 'text_1'}... |
1 | [abiomed , inc ., and subsidiaries notes to co... | [the remaining unrecognized compensation expen... | ABMD/2012/page_75.pdf | [[, Number of Shares (in thousands), Weighted ... | [[, number of shares ( in thousands ), weighte... | {'question': 'during the 2012 year , did the e... | ABMD/2012/page_75.pdf-1 | [{'score': 1.944458127021789, 'ind': 'table_2'}] | [{'score': 1.835455536842346, 'ind': 'text_15'... | [{'score': 1.944458127021789, 'ind': 'table_2'... | [{'score': 1.835455536842346, 'ind': 'text_15'... |
2 | [the following table shows annual aircraft fue... | [as of december 31 , 2018 , we did not have an... | AAL/2018/page_13.pdf | [[Year, Gallons, Average Priceper Gallon, Airc... | [[year, gallons, average priceper gallon, airc... | {'question': 'what was the total operating exp... | AAL/2018/page_13.pdf-2 | [{'score': 1.610554456710815, 'ind': 'table_1'... | [{'score': -1.64792251586914, 'ind': 'text_9'}... | [{'score': 1.610554456710815, 'ind': 'table_1'... | [{'score': -1.64792251586914, 'ind': 'text_9'}... |
3 | [the fair value of our grants receivable is de... | [in the third quarter of 2013 , we sold our sh... | INTC/2013/page_71.pdf | [[(In Millions), Dec 28,2013, Dec 29,2012], [A... | [[( in millions ), dec 282013, dec 292012], [a... | {'question': 'what percentage of total cash an... | INTC/2013/page_71.pdf-4 | [{'score': 2.9937365055084233, 'ind': 'table_8... | [{'score': -2.141725540161133, 'ind': 'text_9'... | [{'score': 2.9937365055084233, 'ind': 'table_8... | [{'score': -2.141725540161133, 'ind': 'text_9'... |
4 | [entergy louisiana , llc management's financia... | [the retail electric price variance is primari... | ETR/2008/page_313.pdf | [[, Amount (In Millions)], [2007 net revenue, ... | [[, amount ( in millions )], [2007 net revenue... | {'question': 'what is the growth rate in net r... | ETR/2008/page_313.pdf-3 | [{'score': 3.095985174179077, 'ind': 'table_6'... | [{'score': -0.5041980147361751, 'ind': 'text_2'}] | [{'score': 3.095985174179077, 'ind': 'table_6'... | [{'score': -0.5041980147361751, 'ind': 'text_2... |
finqa_train.describe()
 | pre_text | post_text | filename | table_ori | table | qa | id | table_retrieved | text_retrieved | table_retrieved_all | text_retrieved_all |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 6251 | 6251 | 6251 | 6251 | 6251 | 6251 | 6251 | 6251 | 6251 | 6251 | 6251 |
unique | 2090 | 1862 | 2110 | 2098 | 2097 | 6203 | 6251 | 5758 | 5580 | 6192 | 6194 |
top | [.] | [.] | CME/2017/page_83.pdf | [[Current assets, $1,922], [Long-term assets, ... | [[current assets, $ 1922], [long-term assets, ... | {'question': 'what percent did purchase issuan... | ADI/2009/page_49.pdf-1 | [] | [] | [{'score': -0.5908989310264581, 'ind': 'table_... | [{'score': 2.742092132568359, 'ind': 'text_17'... |
freq | 20 | 639 | 6 | 9 | 9 | 2 | 1 | 440 | 620 | 2 | 2 |
finqa_train.dtypes
pre_text object
post_text object
filename object
table_ori object
table object
qa object
id object
table_retrieved object
text_retrieved object
table_retrieved_all object
text_retrieved_all object
dtype: object
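# Inspect the first training example
entry = finqa_train.iloc[0]
question = entry["qa"]["question"]
print(question)
# Retrieved model input: a list of (index, sentence) pairs
context = ""
for ind, each_sent in entry["qa"]["model_input"]:
    context += each_sent
    context += " "
print(context)
# Gold evidence: a dict mapping an index to its supporting sentence
context = ""
for each_con in entry["qa"]["gold_inds"]:
    context += entry["qa"]["gold_inds"][each_con]
    context += " "
print(context)
# Normalised table: a list of rows, header row first
table = entry["table"]
print(table)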
what is the the interest expense in 2009?
interest rate to a variable interest rate based on the three-month libor plus 2.05% ( 2.05 % ) ( 2.34% ( 2.34 % ) as of october 31 , 2009 ) . if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million . dollar , would have on the fair value of our forward exchange contracts as of october 31 , 2009 and november 1 , 2008: .
if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million .
[['', 'october 31 2009', 'november 1 2008'], ['fair value of forward exchange contracts asset ( liability )', '$ 6427', '$ -23158 ( 23158 )'], ['fair value of forward exchange contracts after a 10% ( 10 % ) unfavorable movement in foreign currency exchange rates asset ( liability )', '$ 20132', '$ -9457 ( 9457 )'], ['fair value of forward exchange contracts after a 10% ( 10 % ) favorable movement in foreign currency exchange rates liability', '$ -6781 ( 6781 )', '$ -38294 ( 38294 )']]
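The `table` field is a list of rows with the header row first, which is hard to read when printed. A minimal sketch of turning it into a DataFrame is shown below; `table_to_frame` and the `item` column name are hypothetical helpers introduced for illustration, and only the first two rows of the table printed above are reproduced.

```python
import pandas as pd

# First two rows of the `table` field printed above (header row first).
table = [
    ["", "october 31 2009", "november 1 2008"],
    ["fair value of forward exchange contracts asset ( liability )", "$ 6427", "$ -23158 ( 23158 )"],
]


def table_to_frame(rows):
    """Use the first row as column names and the first column as the index."""
    header, *body = rows
    frame = pd.DataFrame(body, columns=["item"] + header[1:])
    return frame.set_index("item")


table_df = table_to_frame(table)
print(table_df)
```

The cell values stay as strings here; converting them to numbers is a separate preprocessing step that this exploratory notebook does not perform.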
Flare FinQA#
As part of the PIXIU project, researchers and practitioners from China, the UK and the US created datasets that are ready to feed to an LLM in prompt format. Called the FLARE evaluation datasets, they form a set of prompt-based datasets: see HF Repo. Flare-FinQA is the FinQA dataset converted to this prompt-based format.
dataset = load_dataset("ChanceFocus/flare-finqa") #Approximate Size: 16MB
print(dataset)
DatasetDict({
train: Dataset({
features: ['id', 'query', 'answer', 'text'],
num_rows: 6251
})
test: Dataset({
features: ['id', 'query', 'answer', 'text'],
num_rows: 1147
})
valid: Dataset({
features: ['id', 'query', 'answer', 'text'],
num_rows: 883
})
})
entry = dataset['train'][5]
query = entry["query"]
answer = entry["answer"]
print(query)
print(answer)
Please answer the given financial question based on the context.
Context: the significant changes from december 31 , 2008 to december 31 , 2009 in level 3 assets and liabilities are due to : a net decrease in trading securities of $ 10.8 billion that was driven by : 2022 net transfers of $ 6.5 billion , due mainly to the transfer of debt 2013 securities from level 3 to level 2 due to increased liquidity and pricing transparency ; and net settlements of $ 5.8 billion , due primarily to the liquidations of 2013 subprime securities of $ 4.1 billion . the change in net trading derivatives driven by : 2022 a net loss of $ 4.9 billion relating to complex derivative contracts , 2013 such as those linked to credit , equity and commodity exposures . these losses include both realized and unrealized losses during 2009 and are partially offset by gains recognized in instruments that have been classified in levels 1 and 2 ; and net increase in derivative assets of $ 4.3 billion , which includes cash 2013 settlements of derivative contracts in an unrealized loss position , notably those linked to subprime exposures . the decrease in level 3 investments of $ 6.9 billion primarily 2022 resulted from : a reduction of $ 5.0 billion , due mainly to paydowns on debt 2013 securities and sales of private equity investments ; the net transfer of investment securities from level 3 to level 2 2013 of $ 1.5 billion , due to increased availability of observable pricing inputs ; and net losses recognized of $ 0.4 billion due mainly to losses on non- 2013 marketable equity securities including write-downs on private equity investments . the decrease in securities sold under agreements to repurchase of 2022 $ 9.1 billion is driven by a $ 8.6 billion net transfers from level 3 to level 2 as effective maturity dates on structured repos have shortened . the decrease in long-term debt of $ 1.5 billion is driven mainly by 2022 $ 1.3 billion of net terminations of structured notes . 
transfers between level 1 and level 2 of the fair value hierarchy the company did not have any significant transfers of assets or liabilities between levels 1 and 2 of the fair value hierarchy during 2010 . items measured at fair value on a nonrecurring basis certain assets and liabilities are measured at fair value on a nonrecurring basis and therefore are not included in the tables above . these include assets measured at cost that have been written down to fair value during the periods as a result of an impairment . in addition , these assets include loans held-for-sale that are measured at locom that were recognized at fair value below cost at the end of the period . the fair value of loans measured on a locom basis is determined where possible using quoted secondary-market prices . such loans are generally classified as level 2 of the fair value hierarchy given the level of activity in the market and the frequency of available quotes . if no such quoted price exists , the fair value of a loan is determined using quoted prices for a similar asset or assets , adjusted for the specific attributes of that loan . the following table presents all loans held-for-sale that are carried at locom as of december 31 , 2010 and 2009 : in billions of dollars aggregate cost fair value level 2 level 3 .
|in billions of dollars|aggregate cost|fair value|level 2|level 3|
|december 31 2010|$ 3.1|$ 2.5|$ 0.7|$ 1.8|
|december 31 2009|$ 2.5|$ 1.6|$ 0.3|$ 1.3|
.
Question: what was the growth rate of the loans held-for-sale that are carried at locom from 2009 to 2010
Answer:
0.97656
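Each `query` bundles an instruction, the context, and the question into one string. The sketch below splits such a prompt back into its parts. It assumes the `Context:`, `Question:` and `Answer:` markers seen in the sample above; `split_flare_query` and `toy_query` are hypothetical names used only for illustration.

```python
def split_flare_query(query):
    """Split a FLARE-style prompt into instruction, context and question.

    Assumes the 'Context:', 'Question:' and 'Answer:' markers of the sample.
    """
    instruction, _, rest = query.partition("Context:")
    # Use the last 'Question:' marker in case the context mentions the word.
    context, _, tail = rest.rpartition("Question:")
    question = tail.partition("Answer:")[0].strip()
    return instruction.strip(), context.strip(), question


# Short hypothetical prompt with the same layout as the real query.
toy_query = (
    "Please answer the given financial question based on the context.\n"
    "Context: revenue was $ 10 in 2009 and $ 12 in 2010 .\n"
    "Question: what was the growth rate of revenue from 2009 to 2010\n"
    "Answer:"
)
instruction, context, question = split_flare_query(toy_query)
print(question)
```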
flare_finqa_df = pd.DataFrame(dataset['train'])
flare_finqa_df.to_csv('../../data/flare_finqa_train.csv', index=False)
flare_finqa_df = pd.DataFrame(dataset['valid'])
flare_finqa_df.to_csv('../../data/flare_finqa_valid.csv', index=False)
flare_finqa_df = pd.DataFrame(dataset['test'])
flare_finqa_df.to_csv('../../data/flare_finqa_test.csv', index=False)
You can also download the HF dataset repositories directly with Git LFS and read the parquet files. See the example below:
import pyarrow.parquet as pq
# columns=['col1', 'col2'] to restrict loaded columns
pds = pq.read_pandas('../../data/flare-finqa/data/train.parquet', columns=None).to_pandas()
# Write the records as gzip-compressed JSON Lines (one JSON object per line); the .jsonl.gz suffix reflects the actual file format
pds.to_json(path_or_buf="../../data/flare_finqa_train.jsonl.gz", orient='records', lines=True, date_format='iso', date_unit='us', compression='gzip')
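A file written with `to_json(..., orient='records', lines=True, compression='gzip')` can be read back with `pd.read_json` using the same options. The round trip below is a sketch on a toy frame; the column names mirror the FLARE features, while the file name and values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for `pds`; values are made up for illustration.
toy = pd.DataFrame({"id": ["a", "b"], "query": ["q1", "q2"], "answer": ["yes", "no"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "toy.jsonl.gz")  # hypothetical file name
    toy.to_json(out_path, orient="records", lines=True, compression="gzip")
    # lines=True expects one JSON record per line; dtype=False disables
    # type inference so string columns are not converted to numbers.
    restored = pd.read_json(out_path, orient="records", lines=True,
                            compression="gzip", dtype=False)

print(restored)
```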
ConvFinQA Dataset#
Homepage: czyssrs/ConvFinQA
From the creators of FinQA, this dataset also targets the numerical reasoning skills of language models, but presents the questions in a conversational format.
!mkdir -p ../../data/convfinqa
!wget -O ../../data/convfinqa/data.zip https://github.com/czyssrs/ConvFinQA/raw/main/data.zip #Approximate Size: 17MB
!unzip ../../data/convfinqa/data.zip -d ../../data/convfinqa
!rm ../../data/convfinqa/data.zip
convfinqa_train = pd.read_json('../../data/convfinqa/data/train.json')
convfinqa_train.head()
 | pre_text | post_text | filename | table_ori | table | qa | id | annotation | qa_0 | qa_1 |
---|---|---|---|---|---|---|---|---|---|---|
0 | [26 | 2009 annual report in fiscal 2008 , reve... | [year ended june 30 , cash provided by operati... | JKHY/2009/page_28.pdf | [[, Year ended June 30, 2009], [2008, 2007], [... | [[2008, year ended june 30 2009 2008, year end... | {'question': 'what was the percentage change i... | Single_JKHY/2009/page_28.pdf-3 | {'amt_table': '<table class='wikitable'><tr><t... | NaN | NaN |
1 | [substantially all of the goodwill and other i... | [the above unaudited pro forma financial infor... | RSG/2008/page_114.pdf | [[, Year Ended December 31, 2008 (Unaudited), ... | [[, year ended december 31 2008 ( unaudited ),... | {'question': 'what was the percent of the grow... | Single_RSG/2008/page_114.pdf-2 | {'amt_table': '<table class='wikitable'><tr><t... | NaN | NaN |
2 | [in a new business model such as the retail se... | [.] | AAPL/2002/page_23.pdf | [[, 2002, 2001, 2000], [Net sales, $5,742, $5,... | [[, 2002, 2001, 2000], [net sales, $ 5742, $ 5... | {'question': 'what was the percentage change i... | Single_AAPL/2002/page_23.pdf-1 | {'amt_table': '<table class='wikitable'><tr><t... | NaN | NaN |
3 | [( 1 ) includes shares repurchased through our... | [.] | UPS/2009/page_33.pdf | [[, 12/31/04, 12/31/05, 12/31/06, 12/31/07, 12... | [[, 12/31/04, 12/31/05, 12/31/06, 12/31/07, 12... | {'question': 'what was the difference in perce... | Single_UPS/2009/page_33.pdf-2 | {'amt_table': '<table class='wikitable'><tr><t... | NaN | NaN |
4 | [( 1 ) includes shares repurchased through our... | [.] | UPS/2009/page_33.pdf | [[, 12/31/04, 12/31/05, 12/31/06, 12/31/07, 12... | [[, 12/31/04, 12/31/05, 12/31/06, 12/31/07, 12... | NaN | Double_UPS/2009/page_33.pdf | {'amt_table': '<table class='wikitable'><tr><t... | {'question': 'what is the roi of an investment... | {'question': 'what was the difference in perce... |
entry = convfinqa_train.iloc[0]
question = entry["qa"]["question"]
pre_text = entry["pre_text"]
table = entry["table"]
post_text = entry["post_text"]
print(question)
print(table)
print(pre_text)
print(post_text)
what was the percentage change in the net cash from operating activities from 2008 to 2009
[['2008', 'year ended june 30 2009 2008', 'year ended june 30 2009 2008', 'year ended june 30 2009'], ['net income', '$ 103102', '$ 104222', '$ 104681'], ['non-cash expenses', '74397', '70420', '56348'], ['change in receivables', '21214', '-2913 ( 2913 )', '-28853 ( 28853 )'], ['change in deferred revenue', '21943', '5100', '24576'], ['change in other assets and liabilities', '-14068 ( 14068 )', '4172', '17495'], ['net cash from operating activities', '$ 206588', '$ 181001', '$ 174247']]
['26 | 2009 annual report in fiscal 2008 , revenues in the credit union systems and services business segment increased 14% ( 14 % ) from fiscal 2007 .', 'all revenue components within the segment experienced growth during fiscal 2008 .', 'license revenue generated the largest dollar growth in revenue as episys ae , our flagship core processing system aimed at larger credit unions , experienced strong sales throughout the year .', 'support and service revenue , which is the largest component of total revenues for the credit union segment , experienced 34 percent growth in eft support and 10 percent growth in in-house support .', 'gross profit in this business segment increased $ 9344 in fiscal 2008 compared to fiscal 2007 , due primarily to the increase in license revenue , which carries the highest margins .', 'liquidity and capital resources we have historically generated positive cash flow from operations and have generally used funds generated from operations and short-term borrowings on our revolving credit facility to meet capital requirements .', 'we expect this trend to continue in the future .', 'the company 2019s cash and cash equivalents increased to $ 118251 at june 30 , 2009 from $ 65565 at june 30 , 2008 .', 'the following table summarizes net cash from operating activities in the statement of cash flows : 2009 2008 2007 .']
['year ended june 30 , cash provided by operations increased $ 25587 to $ 206588 for the fiscal year ended june 30 , 2009 as compared to $ 181001 for the fiscal year ended june 30 , 2008 .', 'this increase is primarily attributable to a decrease in receivables compared to the same period a year ago of $ 21214 .', 'this decrease is largely the result of fiscal 2010 annual software maintenance billings being provided to customers earlier than in the prior year , which allowed more cash to be collected before the end of the fiscal year than in previous years .', 'further , we collected more cash overall related to revenues that will be recognized in subsequent periods in the current year than in fiscal 2008 .', 'cash used in investing activities for the fiscal year ended june 2009 was $ 59227 and includes $ 3027 in contingent consideration paid on prior years 2019 acquisitions .', 'cash used in investing activities for the fiscal year ended june 2008 was $ 102148 and includes payments for acquisitions of $ 48109 , plus $ 1215 in contingent consideration paid on prior years 2019 acquisitions .', 'capital expenditures for fiscal 2009 were $ 31562 compared to $ 31105 for fiscal 2008 .', 'cash used for software development in fiscal 2009 was $ 24684 compared to $ 23736 during the prior year .', 'net cash used in financing activities for the current fiscal year was $ 94675 and includes the repurchase of 3106 shares of our common stock for $ 58405 , the payment of dividends of $ 26903 and $ 13489 net repayment on our revolving credit facilities .', 'cash used in financing activities was partially offset by proceeds of $ 3773 from the exercise of stock options and the sale of common stock ( through the employee stock purchase plan ) and $ 348 excess tax benefits from stock option exercises .', 'during fiscal 2008 , net cash used in financing activities for the fiscal year was $ 101905 and includes the repurchase of 4200 shares of our common stock for $ 100996 , the payment 
of dividends of $ 24683 and $ 429 net repayment on our revolving credit facilities .', 'cash used in financing activities was partially offset by proceeds of $ 20394 from the exercise of stock options and the sale of common stock and $ 3809 excess tax benefits from stock option exercises .', 'beginning during fiscal 2008 , us financial markets and many of the largest us financial institutions have been shaken by negative developments in the home mortgage industry and the mortgage markets , and particularly the markets for subprime mortgage-backed securities .', 'since that time , these and other such developments have resulted in a broad , global economic downturn .', 'while we , as is the case with most companies , have experienced the effects of this downturn , we have not experienced any significant issues with our current collection efforts , and we believe that any future impact to our liquidity will be minimized by cash generated by recurring sources of revenue and due to our access to available lines of credit. .']
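To see the kind of arithmetic this question requires, the sketch below computes the percentage change by hand from the last row of the table. It assumes, based on the post-text, that the first value column is fiscal 2009 and the second is fiscal 2008; `parse_cell` is a hypothetical helper, not part of the dataset tooling.

```python
import re


def parse_cell(cell):
    """Turn a cell such as '$ 206588' or '-2913 ( 2913 )' into a float."""
    match = re.search(r"-?\d+(?:\.\d+)?", cell.replace(",", ""))
    if match is None:
        raise ValueError(f"no number found in {cell!r}")
    return float(match.group())


# Last row of the table printed above.
row = ["net cash from operating activities", "$ 206588", "$ 181001", "$ 174247"]

# Assumption: column 1 is fiscal 2009 and column 2 is fiscal 2008.
cash_2009, cash_2008 = parse_cell(row[1]), parse_cell(row[2])

pct_change = (cash_2009 - cash_2008) / cash_2008 * 100
print(round(pct_change, 2))
```

The difference of 25587 agrees with the increase stated in the post-text, which is a useful sanity check on the column assumption.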
FLARE ConvFinQA#
Similar to other FLARE datasets, it is the prompt-ready version of the ConvFinQA dataset.
dataset = load_dataset("ChanceFocus/flare-convfinqa") #Approximate Size: 12 MB
entry = dataset['train'][0]
query = entry["query"]
answer = entry["answer"]
print(query)
print(answer)
In the context of this series of interconnected finance-related queries and the additional information provided by the pretext, table data, and posttext from a company's financial filings, please provide a response to the final question. This may require extracting information from the context and performing mathematical calculations. Please take into account the information provided in the preceding questions and their answers when formulating your response:
Context: 26 | 2009 annual report in fiscal 2008 , revenues in the credit union systems and services business segment increased 14% ( 14 % ) from fiscal 2007 . all revenue components within the segment experienced growth during fiscal 2008 . license revenue generated the largest dollar growth in revenue as episys ae , our flagship core processing system aimed at larger credit unions , experienced strong sales throughout the year . support and service revenue , which is the largest component of total revenues for the credit union segment , experienced 34 percent growth in eft support and 10 percent growth in in-house support . gross profit in this business segment increased $ 9344 in fiscal 2008 compared to fiscal 2007 , due primarily to the increase in license revenue , which carries the highest margins . liquidity and capital resources we have historically generated positive cash flow from operations and have generally used funds generated from operations and short-term borrowings on our revolving credit facility to meet capital requirements . we expect this trend to continue in the future . the company 2019s cash and cash equivalents increased to $ 118251 at june 30 , 2009 from $ 65565 at june 30 , 2008 . the following table summarizes net cash from operating activities in the statement of cash flows : 2009 2008 2007 . 
<table class='wikitable'><tr><td>1</td><td>2008</td><td>year ended june 30 2009 2008</td><td>year ended june 30 2009 2008</td><td>year ended june 30 2009</td></tr><tr><td>2</td><td>net income</td><td>$ 103102</td><td>$ 104222</td><td>$ 104681</td></tr><tr><td>3</td><td>non-cash expenses</td><td>74397</td><td>70420</td><td>56348</td></tr><tr><td>4</td><td>change in receivables</td><td>21214</td><td>-2913 ( 2913 )</td><td>-28853 ( 28853 )</td></tr><tr><td>5</td><td>change in deferred revenue</td><td>21943</td><td>5100</td><td>24576</td></tr><tr><td>6</td><td>change in other assets and liabilities</td><td>-14068 ( 14068 )</td><td>4172</td><td>17495</td></tr><tr><td>7</td><td>net cash from operating activities</td><td>$ 206588</td><td>$ 181001</td><td>$ 174247</td></tr></table> year ended june 30 , cash provided by operations increased $ 25587 to $ 206588 for the fiscal year ended june 30 , 2009 as compared to $ 181001 for the fiscal year ended june 30 , 2008 . this increase is primarily attributable to a decrease in receivables compared to the same period a year ago of $ 21214 . this decrease is largely the result of fiscal 2010 annual software maintenance billings being provided to customers earlier than in the prior year , which allowed more cash to be collected before the end of the fiscal year than in previous years . further , we collected more cash overall related to revenues that will be recognized in subsequent periods in the current year than in fiscal 2008 . cash used in investing activities for the fiscal year ended june 2009 was $ 59227 and includes $ 3027 in contingent consideration paid on prior years 2019 acquisitions . cash used in investing activities for the fiscal year ended june 2008 was $ 102148 and includes payments for acquisitions of $ 48109 , plus $ 1215 in contingent consideration paid on prior years 2019 acquisitions . capital expenditures for fiscal 2009 were $ 31562 compared to $ 31105 for fiscal 2008 . 
cash used for software development in fiscal 2009 was $ 24684 compared to $ 23736 during the prior year . net cash used in financing activities for the current fiscal year was $ 94675 and includes the repurchase of 3106 shares of our common stock for $ 58405 , the payment of dividends of $ 26903 and $ 13489 net repayment on our revolving credit facilities . cash used in financing activities was partially offset by proceeds of $ 3773 from the exercise of stock options and the sale of common stock ( through the employee stock purchase plan ) and $ 348 excess tax benefits from stock option exercises . during fiscal 2008 , net cash used in financing activities for the fiscal year was $ 101905 and includes the repurchase of 4200 shares of our common stock for $ 100996 , the payment of dividends of $ 24683 and $ 429 net repayment on our revolving credit facilities . cash used in financing activities was partially offset by proceeds of $ 20394 from the exercise of stock options and the sale of common stock and $ 3809 excess tax benefits from stock option exercises . beginning during fiscal 2008 , us financial markets and many of the largest us financial institutions have been shaken by negative developments in the home mortgage industry and the mortgage markets , and particularly the markets for subprime mortgage-backed securities . since that time , these and other such developments have resulted in a broad , global economic downturn . while we , as is the case with most companies , have experienced the effects of this downturn , we have not experienced any significant issues with our current collection efforts , and we believe that any future impact to our liquidity will be minimized by cash generated by recurring sources of revenue and due to our access to available lines of credit. .
Conversations:
Question: what is the net cash from operating activities in 2009?
Answer:
206588.0
IBM FinTabNet#
Homepage: IBM FinTabNet
Data Exploration Notebook: IBM - Fintabnet - EDA
The IBM FinTabNet dataset is a large-scale dataset of tables from financial documents, annotated with table and cell bounding boxes and with the HTML structure of each table. It was created by IBM Research AI. You can download the dataset by uncommenting the following commands. The dataset size is 16GB; the sister project SynthTabNet (synthetically generated documents) is 10GB.
#!wget -O ../../data/fintabnet.tar.gz https://dax-cdn.cdn.appdomain.cloud/dax-fintabnet/1.0.0/fintabnet.tar.gz #Approximate Size: 16GB
# FinTabNet consists of real public documents. SynthTabNet is another dataset from synthetically generated table layouts with annotations in jsonl files. If you want to work with synthetic data, you can download the SynthTabNet dataset from the following link:
#!wget -O ../../data/synthtabnet_fintabnet.zip https://ds4sd-public-artifacts.s3.eu-de.cloud-object-storage.appdomain.cloud/datasets/synthtabnet_public/v2.0.0/fintabnet.zip #Approximate Size: 10GB (a zip archive, not a tarball)
# Uncomment the following code to extract the dataset
"""
import tarfile
from os import path
#Extracting the dataset
tar = tarfile.open("../../data/fintabnet.tar.gz")
if hasattr(tarfile, 'data_filter'):
tar.extractall(filter='data')
else:
# remove this when no longer needed
print('Extracting may be unsafe; consider updating Python')
tar.extractall()
tar.close()
# Verifying the file was extracted properly
data_path = "examples/"
path.exists(data_path)
"""
# Parse the JSON file and read all the images and labels
import json
# Download the example archive with the command below, then extract it to ../../data/fintabnet-examples:
# !wget -O ../../data/fintabnet-examples.tar.gz https://dax-cdn.cdn.appdomain.cloud/dax-fintabnet/1.0.0/examples.tar.gz
with open('../../data/fintabnet-examples/FinTabNet_1.0.0_table_example.jsonl', 'r') as fp:
    images = {}
    for line in fp:
        sample = json.loads(line)
        # Index images
        if sample['filename'] in images:
            annotations = images[sample['filename']]["annotations"]
            html = images[sample['filename']]["html"]
        else:
            annotations = []
            html = ""
        for t, token in enumerate(sample["html"]["cells"]):
            if "bbox" in token:
                annotations.append({"category_id": 2, "bbox": token["bbox"]})
        # Build html table
        cnt = 0
        for t, token in enumerate(sample["html"]["structure"]["tokens"]):
            html += token
            if token == "<td>":
                html += "".join(sample["html"]["cells"][cnt]["tokens"])
                cnt += 1
        annotations.append({"category_id": 1, "bbox": sample["bbox"]})
        images[sample['filename']] = {'filepath': '../../data/fintabnet-examples/pdf/' + sample["filename"], 'html': html, 'annotations': annotations}
import fitz # PyMuPDF
for i, (filename, image) in enumerate(images.items()):
    pdf_document = fitz.open(image["filepath"])
    pdf_page = pdf_document[0] # Assuming you want to work with the first page
    pdf_width = int(pdf_page.rect.width)
    pdf_height = int(pdf_page.rect.height)
    img = pdf_page.get_pixmap()
    print("Table HTML for page #{}".format(i))
    display(HTML(image['html']))
Table HTML for page #0
2017 | 2016 | 2015 | |
Minimum rentals | $2,814 | $2,394 | $2,249 |
Contingent rentals(1) | 178 | 214 | 194 |
$2,992 | $2,608 | $2,443 |
Operating Leases | Aircraftand RelatedEquipment | Facilitiesand Other | |
TotalOperatingLeases | 2018 | $398 | $2,047 |
$2,445 | 2019 | 343 | 1,887 |
2,230 | 2020 | 261 | 1,670 |
1,931 | 2021 | 203 | 1,506 |
1,709 | 2022 | 185 | 1,355 |
1,540 | Thereafter | 175 | 7,844 |
8,019 | Total | $1,565 | $16,309 |
Table HTML for page #1
Amount Reclassified from AOCI | ||||
Affected Line Item in theIncome Statement | 2017 | 2016 | 2015 | |
Amortization of retirement plans prior servicecredits, before tax | $120 | $121 | $115 | |
Salaries and employee benefits | Income tax benefit | (44) | (45) | (43) |
Provision for income taxes | AOCI reclassifications, net of tax | $76 | $76 | $72 |
2017 | 2016 | 2015 | |
Foreign currency translation gain (loss): | |||
Balance at beginning of period | $(514) | $(253) | $81 |
Translation adjustments | (171) | (261) | (334) |
Balance at end of period | (685) | (514) | (253) |
Retirement plans adjustments: | |||
Balance at beginning of period | 345 | 425 | 425 |
Prior service credit and other arising during period | 1 | (4) | 72 |
Reclassifications from AOCI | (76) | (76) | (72) |
Balance at end of period | 270 | 345 | 425 |
Accumulated other comprehensive (loss) income at end of period | $(415) | $(169) | $172 |
Table HTML for page #2
2018 | $81 |
2019 | 71 |
2020 | 55 |
2021 | 44 |
2022 | 41 |
2017 | 2016 | GrossCarryingAmount | Accumulated Amortization | Net BookValue | GrossCarryingAmount | |
Accumulated Amortization | Net BookValue | Customer relationships | $656 | $(203) | $453 | $912 |
$(156) | $756 | Technology | 54 | (26) | 28 | 123 |
(16) | 107 | Trademarks and other | 136 | (88) | 48 | 202 |
(57) | 145 | Total | $846 | $(317) | $529 | $1,237 |
2017 | 2016 | |
Accrued Salaries and Employee Benefits | ||
Salaries | $431 | $478 |
Employee benefits, including variable compensation | 781 | 804 |
Compensated absences | 702 | 690 |
$1,914 | $1,972 | |
Accrued Expenses | ||
Self-insurance accruals | $976 | $837 |
Taxes other than income taxes | 283 | 311 |
Other | 1,971 | 1,915 |
$3,230 | $3,063 |
Table HTML for page #3
(in millions, except per share amounts) | FirstQuarter | SecondQuarter | ThirdQuarter | Fourth Quarter |
2017(1) | ||||
Revenues | $14,663 | $14,931 | $14,997 | $15,728 |
Operating income | 1,264 | 1,167 | 1,025 | 1,581 |
Net income | 715 | 700 | 562 | 1,020 |
Basic earnings per common share(2) | 2.69 | 2.63 | 2.11 | 3.81 |
Diluted earnings per common share(2) | 2.65 | 2.59 | 2.07 | 3.75 |
2016(3) | ||||
Revenues | $12,279 | $12,453 | $12,654 | $12,979 |
Operating income (loss) | 1,144 | 1,137 | 864 | (68) |
Net income (loss) | 692 | 691 | 507 | (70) |
Basic earnings (loss) per common share(2) | 2.45 | 2.47 | 1.86 | (0.26) |
Diluted earnings (loss) per common share(2) | 2.42 | 2.44 | 1.84 | (0.26) |
Table HTML for page #4
2017 | 2016 | 2015 | |
Low | 3.25% | 2.75% | 4.50% |
High | 4.50 | 4.50 | 7.00 |
Weighted-average | 4.03 | 3.82 | 5.90 |
Table HTML for page #5
Aircraft andAircraft Related | Other(1) | Total | |
2018 | $1,777 | $1,440 | $3,217 |
2019 | 1,729 | 508 | 2,237 |
2020 | 1,933 | 400 | 2,333 |
2021 | 1,341 | 309 | 1,650 |
2022 | 1,276 | 198 | 1,474 |
Thereafter | 2,895 | 499 | 3,394 |
Total | $10,951 | $3,354 | $14,305 |
B767F | B777F | Total | |
2018 | 14 | 4 | 18 |
2019 | 15 | 2 | 17 |
2020 | 16 | 3 | 19 |
2021 | 10 | 3 | 13 |
2022 | 10 | 4 | 14 |
Thereafter | 6 | - | 6 |
Total | 71 | 16 | 87 |
Table HTML for page #6
2017 | Percent of Revenue 2017 | |
Revenues | $7,401 | 100.0% |
Operating expenses: | ||
Salaries and employee benefits | 2,077 | 28.1 |
Purchased transportation | 3,049 | 41.2 |
Rentals | 353 | 4.8 |
Depreciation and amortization | 239 | 3.2 |
Fuel | 225 | 3.1 |
Maintenance and repairs | 143 | 1.9 |
Intercompany charges | 17 | 0.2 |
Other | 1,214 | 16.4 |
Total operating expenses | 7,317 | 98.9% |
Operating income | $84 | |
Operating margin | 1.1% | |
Package: | ||
Average daily packages | 1,022 | |
Revenue per package (yield) | $24.77 | |
Freight: | ||
Average daily pounds | 3,608 | |
Revenue per pound (yield) | $0.56 |
Table HTML for page #7
2017 | 2016 | TotalNumber ofSharesPurchased | AveragePrice Paidper Share | TotalPurchasePrice | TotalNumber ofSharesPurchased | |
AveragePrice Paidper Share | TotalPurchasePrice | Common stock repurchases | 2,955,000 | $172.13 | $509 | 18,225,000 |
Table HTML for page #8
2017 | 2016 | |
Funded Status of Plans: | ||
Projected benefit obligation (PBO) | $29,913 | $29,602 |
Fair value of plan assets | 26,312 | 24,271 |
Funded status of the plans | $(3,601) | $(5,331) |
Cash Amounts: | ||
Cash contributions during the year | $2,115 | $726 |
Benefit payments during the year | $2,310 | $912 |
MeasurementDate | Discount Rate |
5/31/2017 | 4.08% |
5/31/2016 | 4.13 |
5/31/2015 | 4.42 |
5/31/2014 | 4.60 |
Table HTML for page #9
Net Book Value at May 31, | Range | 2017 | |
2016 | Wide-body aircraft and related equipment | 15 to 30 years | $9,103 |
$8,356 | Narrow-body and feeder aircraft and related equipment | 5 to 18 years | 3,099 |
3,180 | Package handling and ground support equipment | 3 to 30 years | 3,862 |
3,249 | Information technology | 2 to 10 years | 1,114 |
1,051 | Vehicles | 3 to 15 years | 3,400 |
3,084 | Facilities and other | 2 to 40 years | 5,403 |
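The HTML shown above is rebuilt by interleaving the structure tokens with the text tokens of each cell. The sketch below isolates that step on a toy sample. The sample is hypothetical but shaped like `sample["html"]` in the JSONL file, and `build_table_html` is an illustrative helper. Wrapping each fragment in its own `<table>` element is an added assumption that may help keep several tables from the same page visually separate.

```python
def build_table_html(structure_tokens, cells):
    """Interleave structure tokens with cell text, one cell per '<td>' token."""
    parts = []
    cell_index = 0
    for token in structure_tokens:
        parts.append(token)
        if token == "<td>":
            # Cell text is stored as a list of character tokens.
            parts.append("".join(cells[cell_index]["tokens"]))
            cell_index += 1
    # Assumption: wrap the fragment so it renders as a standalone table.
    return "<table>" + "".join(parts) + "</table>"


# Hypothetical two-cell sample shaped like sample["html"].
toy_html = {
    "structure": {"tokens": ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]},
    "cells": [{"tokens": ["2", "0", "1", "8"]}, {"tokens": ["$", "8", "1"]}],
}
html = build_table_html(toy_html["structure"]["tokens"], toy_html["cells"])
print(html)
```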
FinTabNet OTSL#
An alternative format is OTSL, published by the same development team. The dataset at https://huggingface.co/datasets/ds4sd/FinTabNet_OTSL is a conversion of the original FinTabNet into the OTSL format.
# Load the test dataset to explore the data (even the smaller split is 300MB)
dataset = load_dataset("ds4sd/FinTabNet_OTSL", split="test") # Approximate size: 300MB