AusGamers Forums
Show: per page
1
Recommend me a programming solution (for fast collection.containsEle...
Nerfosaurus
Brisbane, Queensland
605 posts
So I have a problem that has been holding me up for quite some time, I think my brain is just fried and it may just need a different perspective. I figured gamers often tend to be programmers, plus I don't have any other active forum accounts, so...

I'm trying to create a class that deals extensively with a large collection of objects of type T, ie T is a generic type. So the class might look like:


class Processor < T > {
Processor(List< T > vals) {
DoStuffWith(vals);
}
}


This Processor class frequently needs to pass an object from the generic collection to various worker classes that are built to expect objects of that type (these classes are provided in the constructor, so are also generic, and are activated through an abstract parameter passing method).

so now the class might look like


class Processor < T > {
Processor(List< T > vals, Worker< T >[] workers) {
DoStuffWith(vals, workers);
}
}


These worker classes often need to check if other objects exist in the set (ie if vals contains). One example might be if the processor class is dealing with 2D points. If the worker class is passed x:4 y:5, it may want to check if x:4 y:6 exists. However the only way I understand to do this, is to create an x=4 y=6 instance and check if it exists in the set, by ensuring the generic object's .equals and .hashcode methods would return true on such a case and such a case only. (working in Java ftr, but compiling to JavaScript with GWT, though I think this is more a general Object Oriented question)

This seems dodgy to me, as I'm instancing a class only to compare (and this could happen many many many times). I could create an abstract key object that the generic class must provide to at least reduce the bloat of instancing a compare key, but this still seems wrong.

What I'd prefer is to compare on a primitive so that no new class needs to be initialised when checking if one exists in the set. (so all elements in val might now exist in a hashmap of primitive->instance, where primitive might be an Integer hash)

Or, alternatively, to know if there is some better & faster way of checking if a multi-featured object exists in a set of objects that are abstract.

If I was only working with one Val class type, I could just use a static Integer key() method, that takes the few parameters that are necessary (eg x & y). But since I'm using abstracts, I can't declare such a key generator method. (ie no abstract static methods)

Does this even make sense to any body else? And is any body else pro enough to know what solutions might exist? I tried reaching the ideal blood-alcohol concentration level to achieve programming godhood, but it just induced outsourcing my question to the internets.
09:15pm 14/08/10 Permalink
adBot
ads
Internet
--
ads keep websites free
09:15pm 14/08/10 Permalink
eXemplar
2561 posts
I'm not a real programmer, but why do you need to create a new object to do a lookup, can't iterate over the list?
09:40pm 14/08/10 Permalink
Nerfosaurus
Brisbane, Queensland
606 posts
That might be a best solution. A new object would iterate over a list checking .equals(), I'm not sure if it has some indexing tricks to speed it up based on the hashcode. Iterating over the list to find every instance seems really undesirable, the list and number of checks could be huge.

edit: I'm actually thinking in terms of solutions involving muilti-dimensional arrays that are instanced based on some 'important properties' feature of the class used in the generic/abstract slot. Wouldn't account for gaps in the set, but could be nulled by default so it just checks if one exists at those indexes. (assuming all features are essentially ints)

I'm slightly drunk now.
09:41pm 14/08/10 Permalink
hast
UK
1219 posts
why not just pre-instantiate an object that you use exclusively for looking querying the set. that way you don't need to create a new object each time. you just assign to the objects fields.
10:11pm 14/08/10 Permalink
Nerfosaurus
Brisbane, Queensland
607 posts
I was thinking something along those lines, but my head isn't clear enough to see how to do it. (there's a million other more complex things at play in the actual scenario, but I've tried to boil it down to the one thing making it difficult for me to proceed at this point)

edit: Oh wait I see what you mean now, that could be right. Will consider that my best option, big ty.
10:16pm 14/08/10 Permalink
Strik3r
Brisbane, Queensland
1785 posts
Instantiating an object to do the comparison is fine. Its a trivial task and java's garbage collection will reclaim the memory when the object goes out of scope.

Its the way you do the search thats the problem.

how big could the collection get ? (1000, 10000, 100000, 100000000 items ?) and how performance critical is it? Linear search (iterating through a list) is O(n), with an average case of ((N+1) / (K+1)) where K is the number of times the item occurs in the list.

If you are concerned that the sheer size of the collection is going to make the performance a problem, you need to consider how else you can store/index the data. You could enforce all objects in the list to satisfy the sortable interface so you can keep the list ordered and then perform binary search for o(log n) performance.
10:36pm 14/08/10 Permalink
Nerfosaurus
Brisbane, Queensland
608 posts
Essentially I want to do a lot of it (dozens of times per worker, and hundreds of workers per problem), and I would prefer for it to be able to be as fast as possible (especially if the implementation means magnitudes of difference in execution speed). It's one of the most fundamental pieces of my project, and if it's slow the quality of the whole project will suffer. I would prefer for it to be as fast as possible regardless of size, but collection sizes are expected to be in the tens of thousands.

But yes I'm starting to realise I had been basing my approach around the Java ArrayList and HashMap .contains() restrictions. I should consider creating my own space for the problem, perhaps the contains() check could be abstract, and implemented as per best for the properties of the generic object type. (which might for example involve going with hast's selection in a provided set-control object)
10:53pm 14/08/10 Permalink
Dazhel
Gold Coast, Queensland
2101 posts
It's one of the most fundamental pieces of my project, and if it's slow the quality of the whole project will suffer.


As with most performance problems, you've got to think of performance constraints as feature requirements.
e.g. is the processing over 10,000 objects fine to run in 0.1sec? 1sec? 10sec?

That way you can implement the feature, profile it and if it satisfies the performance constraints then you move on to other features. If it doesn't, then you optimise and profile again. Worrying about performance too early means you might tend to focus on it to the exclusion of other features.
If you're worried about performance in particular I'd be more likely to design in such a way that makes swapping out the implementation at a later stage easier if any performance issue arises during the profiling step (i.e. don't wed yourself too much to a particular concrete collection class).

Regarding the original problem,
Instantiating objects to compare may be the way to go for simplicity. Alternatively you could implement some sort of generic object that allows you to iterate over the collection and match another object based on constraints related to your original object. Refer to the Strategy Pattern. The List.Sort(IComparer) method in dotnet is an example of the strategy pattern.
11:24pm 14/08/10 Permalink
Nerfosaurus
Brisbane, Queensland
609 posts
Yep I know it's early to be considering when I don't even know how fast it will run, but truth be told I can't see any one complete way to do it, so am trying to decide on an approach and ensure I do it "correctly" the first time. It's a very big subsystem (with many more problems I haven't mentioned) and every single piece is inter-related and abstractable (it's a system that will be used for many different purposes), so sorting out how I'm going to stick the pieces together so that enums etc can be switched and types genericesed while still being fast is proving to be a real pain.
11:46pm 14/08/10 Permalink
Raven
Melbourne, Victoria
4608 posts
Constructing and hashing your values should be the quickest way, assuming you have a pre-sorted HashSet. Since this will be O(1) rather than O(logN), lookups will be ideal. However that comes at an insertion cost. How often will you insert new elements, and how often will collisions within the set occur?
If memory isn't a concern, you can set the load factor really low (say 0.30f rather than the default which I think is 0.75f), which will reduce how frequently hash values are recalculated, at the cost of memory.

If you're reallllly worried about the cost of instantiating objects (ie, if it's that frequent), pull out a reference to the keySet from the HashSet and create your hashCode yourself. This could be the most important factor for a large collection. A big question you should be asking is how you're recieving these co-ordinates in the first place - where do they exist for you to know what is to be checked.
06:34am 15/08/10 Permalink
Nerfosaurus
Brisbane, Queensland
611 posts
That is essentially what I was trying to do, the problem was in that I couldn't create a new hashkey without instancing a new object, as the objects are all abstract and there can't be an abstract static hash function on the important parameters.

The number of objects won't change in usage.

Now I've realised I can still have a static HashValue(params) function in my data classes, it just won't be in the abstract definition. The workers that expect that class can still use it. The question is now how does the set ensure the collection is mapped on the same hashkey. I can tell it to use the HashKey() property, but there's no guarantee that will pass through the undefined static calculator or get the same result.

Think I'm just going to make a set object, and require the implementation to handle the actual storing and looking up requests.
08:26am 15/08/10 Permalink
Nerfosaurus
Brisbane, Queensland
612 posts
God my head was really fried, eXemplar is right. I just need the workers to have access to the set and they can iterate through to look for an item with given parameters. For some reason I was intent on hiding the set from the workers.
08:38am 15/08/10 Permalink
adBot
ads
Internet
--
ads keep websites free
08:38am 15/08/10 Permalink
AusGamers Forums
Show: per page
1
This thread is archived and cannot be replied to.
 

Advertise with Us | Download Media Kit | Privacy Policy | Contact Us
© Copyright 2001-2013 AusGamers™ Pty Ltd. ACN 093 772 242.
A Mammoth Media web development, hosted by Mammoth VPS.