flexnetadm (4)
#1
I'm reading several different source formats and joining the information together in order to store the data in the target ER database model.

But I run into memory problems when dealing with large (> 2 GB) source data.

Is there a technique for getting data garbage-collected promptly? Once information has been loaded into the database, I no longer need it in memory.

So what I have in mind is a "piping mechanism": load the data, join it, store it in the DB, and forget it.

Is this achievable with LINQ as a way to save main memory?

If so, could you give a small example?

regards
wolfgang
fabrice.marguerie (224)
#2
Re: Avoid memory consumption when joining big lists to store in a database
Looks like you need to use some paging.
What are the source formats? XML?
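
To make the paging idea concrete, here is a minimal sketch, assuming the source can be exposed as a lazily read sequence of records. `Paging.InPages` is a hypothetical helper written for this post, not part of LINQ; it hands out one page at a time so that only the current page has to stay in memory.

```csharp
using System;
using System.Collections.Generic;

public static class Paging
{
    // Splits a lazily produced sequence into pages of at most pageSize items.
    // A fresh list is created for each page, so a page that the caller has
    // finished with (and no longer references) can be garbage-collected.
    public static IEnumerable<List<T>> InPages<T>(IEnumerable<T> source, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        var page = new List<T>(pageSize);
        foreach (var item in source)
        {
            page.Add(item);
            if (page.Count == pageSize)
            {
                yield return page;
                page = new List<T>(pageSize);
            }
        }

        // Hand out the last, possibly shorter, page.
        if (page.Count > 0) yield return page;
    }
}
```

Usage would look something like `foreach (var page in Paging.InPages(ReadRecords(path), 1000)) SaveToDatabase(page);`, where `ReadRecords` and `SaveToDatabase` are placeholders for your own reading and loading code.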
flexnetadm (4)
#3
Re: Avoid memory consumption when joining big lists to store in a database
Hello Fabrice,

Thank you for your reply. What is paging? (I bought your book :) but have not had time to read it yet :) )

The source file formats are proprietary:

* binary format (nested lists of entity attributes; data types are int and float)
=> 4-5 file types
* fixed-length flat files (entity attributes too; data types vary: int, string, ...)
=> 3-4 file types
* JPGs
* XML files, coming soon

All of this must be prepared and joined (synced at the entity level) before loading into the DB ...

So in my opinion I'm wasting a lot of main memory at run time by holding all the data in memory until the load is complete!
flexnetadm (4)
#4
Re: Avoid memory consumption when joining big lists to store in a database
Fabrice,

Actually, I must admit that I'm not loading the binary files with LINQ, because the file content consists of both metadata and user data,

so I'm not able to access the data with a relational query!

Nevertheless, I'm preparing generic lists that can be processed with LINQ in the next step. So you could say that I have an ETL process in which the E part is a non-LINQ step!

regards.
jwooley (123)
#5
Re: Avoid memory consumption when joining big lists to store in a database
LINQ is good if you need to query object structures. If you need to get the objects and manipulate them as objects, it does the job fairly well with reasonable data sizes.

When dealing with large data structures (like you have), you will likely find that just iterating over the values and not needing to hydrate full objects will be more advantageous. In this case, you may want to consider creating a custom LINQ implementation with a custom iterator that only hydrates the objects when the where clause evaluates to true (or a join clause if that is more appropriate to your situation). We discuss this option a bit at the end of the book (when discussing LINQ to Amazon).

You can also look at the implementation of .Where in the Sequence.cs file which is part of the C# samples (click Help then Samples in VS 2008 to access the samples).
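
As a rough illustration of that iterator style, here is a minimal sketch using `yield return`. The `Entity` type, the `FixedLengthSource` class, and the record layout (first six characters are the id, the rest is the name) are assumptions made up for this example, not your actual file format. Lines are read one at a time, and an object is only built for lines that pass a cheap test on the raw text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical target type, for illustration only.
public sealed class Entity
{
    public int Id;
    public string Name;
}

public static class FixedLengthSource
{
    // Yields one raw line at a time; nothing is buffered beyond the current line.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    // Hydrates an Entity only for lines accepted by the keep predicate.
    // Assumed layout: columns 0-5 hold the id, the remainder holds the name.
    public static IEnumerable<Entity> Hydrate(IEnumerable<string> lines, Func<string, bool> keep)
    {
        foreach (var line in lines)
        {
            if (!keep(line)) continue;   // rejected lines never become objects

            yield return new Entity
            {
                Id = int.Parse(line.Substring(0, 6)),
                Name = line.Substring(6).TrimEnd()
            };
        }
    }
}
```

Because both methods are iterators, nothing is read until the consumer starts a `foreach`, and each `Entity` becomes eligible for collection as soon as the consumer drops its reference to it.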

Ultimately, you may find that ETL tools like SSIS are a better option for dealing with these sorts of problems. LINQ is a great tool, but it shouldn't be the only tool in your toolbox. You need to know when to use the most appropriate tool for any given job.

Jim
flexnetadm (4)
#6
Re: Avoid memory consumption when joining big lists to store in a database
Jim,

Thank you for your detailed answer.
First, to answer your last suggestion: I think SSIS is far too heavyweight and proprietary for me. Yes, at the moment I'm loading into SQL Server, but tomorrow it may be an Oracle database :)

I think I will study your suggestion of building a custom LINQ implementation. If it's too complicated for me as a LINQ beginner, my plan is to implement the iterative approach and page through the source data.

Because this load phase is not the main part of my project, I won't spend too much time on it!

I agree with you that LINQ is not the solution for all data-driven problems, but if someone has another idea, please post it!

Nevertheless, I'm planning to implement the "TL" parts with LINQ!
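
For the join in the "T" step, something like the following might work. It is only a sketch, under the assumption that one side of the join is small enough to keep in memory. LINQ to Objects' `Join` streams the outer sequence but buffers the inner one, so the big source goes on the outside. The `StreamingJoin` class and the key/value shapes are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingJoin
{
    // bigMeasurements is the outer sequence: Join enumerates it lazily, item by item.
    // smallNames is the inner sequence: Join reads it completely into a lookup
    // the first time the result is enumerated, so it should be the small side.
    public static IEnumerable<KeyValuePair<string, int>> JoinNames(
        IEnumerable<KeyValuePair<int, int>> bigMeasurements,
        IEnumerable<KeyValuePair<int, string>> smallNames)
    {
        return bigMeasurements.Join(
            smallNames,
            m => m.Key,                 // entity id on the big side
            n => n.Key,                 // entity id on the small side
            (m, n) => new KeyValuePair<string, int>(n.Value, m.Value));
    }
}
```

The result is itself lazy, so it can be consumed in a `foreach` that writes each joined row to the DB and then forgets it.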